THE TRANSPORTER
FactsBook
Other books in the FactsBook Series:
A. Neil Barclay, Albertus D. Beyers, Marian L. Birkeland, Marion H. Brown, Simon J. Davis, Chamorro Somoza and Alan F. Williams, The Leucocyte Antigen FactsBook, 1st edn
Robin Callard and Andy Gearing, The Cytokine FactsBook
Steve Watson and Steve Arkinstall, The G-Protein Linked Receptor FactsBook
Rod Pigott and Christine Power, The Adhesion Molecule FactsBook
Shirley Ayad, Ray Boot-Handford, Martin J. Humphries, Karl E. Kadler and C. Adrian Shuttleworth, The Extracellular Matrix FactsBook
Grahame Hardie and Steven Hanks, The Protein Kinase FactsBook; The Protein Kinase FactsBook CD-Rom
Edward C. Conley, The Ion Channel FactsBook I: Extracellular Ligand-Gated Channels
Edward C. Conley, The Ion Channel FactsBook II: Intracellular Ligand-Gated Channels
Kris Vaddi, Margaret Keller and Robert Newton, The Chemokine FactsBook
Marion E. Reid and Christine Lomas-Francis, The Blood Group Antigen FactsBook
A. Neil Barclay, Marion H. Brown, S.K. Alex Law, Andrew J. McKnight, Michael G. Tomlinson and P. Anton van der Merwe, The Leucocyte Antigen FactsBook, 2nd edn
Robin Hesketh, The Oncogene and Tumour Suppressor Gene FactsBook, 2nd edn
THE TRANSPORTER
FactsBook

Jeffrey Griffith
Department of Biochemistry and Molecular Biology
University of New Mexico School of Medicine

Clare Sansom
Department of Crystallography
Birkbeck College, University of London

Academic Press
Harcourt Brace & Company, Publishers
SAN DIEGO
LONDON
SYDNEY
BOSTON
TOKYO
NEW YORK
TORONTO
This book is printed on acid-free paper. Copyright © 1998 by ACADEMIC PRESS
All Rights Reserved No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Academic Press 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.apnet.com Academic Press Limited 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/ ISBN 0-12-303965-7
Library of Congress Cataloging-in-Publication Data Griffith, Jeffrey. The transporter factsbook / by Jeffrey Griffith, Clare Sansom. p. cm. Includes index. ISBN 0-12-303965-7 (alk. paper) 1. Carrier proteins. I. Sansom, Clare. II. Title. OP552.C34075 1997 97-44438 572'69-dc21 CIP
A catalogue record for this book is available from the British Library
Typeset in Great Britain by Alden Group, Oxford. Printed in Great Britain by WBC, Bridgend, Mid Glamorgan 98 990001 0203 EB 9 8 7 6 5 43 2 1
Contents

Preface
Abbreviations

Chapter 1 Function and Structure of Membrane Transport Proteins (Peter J. F. Henderson)
Chapter 2 Amino Acid Sequence Comparisons
Chapter 3 Organization of the Data

Part 1 P-Type ATPases
  Calcium-transporting ATPase family
  Plasma membrane cation-transporting ATPase family
  Heavy metal-transporting ATPase family

Part 2 Vacuolar ATPases
  Vacuolar ATPase family

Part 3 ABC Multidrug Resistance Proteins
  White transporter family
  ABC 1 & 2 transporter family
  Yeast multidrug resistance family
  Cystic fibrosis transmembrane conductance regulator family
  P-Glycoprotein transporter family
  Peroxisomal membrane transporter family

Part 4 ABC-2 Transporters
  ABC-2 nodulation protein family
  ABC-2 polysaccharide exporter family
  ABC-2 associated (cytoplasmic) protein family

Part 5 ABC Binding Protein-Dependent Transporters: Transmembrane Elements
  ABC-associated binding protein-dependent maltose transporter family
  ABC-associated binding protein-dependent peptide transporter family
  ABC-associated binding protein-dependent iron transporter family

Part 6 ABC Binding Protein-Dependent Transporters: Cytoplasmic Elements
  Binding protein-dependent monosaccharide transporter family
  Binding protein-dependent peptide transporter family

Part 7 Other ABC-Associated (Cytoplasmic) Proteins
  Heme exporter family
  Macrolide-streptogramin-tylosin resistance family

Part 8 H+-Dependent Symporters
  H+/sugar symporter-uniporter family
  H+/rhamnose symporter family
  H+/amino acid symporter family
  H+/lactose-sucrose-nucleoside symporter family
  H+/galactoside-pentose-hexuronide symporter family
  H+/oligopeptide symporter family
  H+/fucose symporter family
  H+/carboxylate symporter family
  H+/nucleotide symporter family
  Sugar phosphate transporter family

Part 9 H+-Dependent Antiporters
  H+/vesicular amine antiporter family
  14-Helix H+/multidrug antiporter family
  4-Helix H+/multidrug antiporter family
  12-Helix H+/multidrug antiporter family
  Acriflavin-cation resistance family
  Yeast multidrug resistance family

Part 10 Na+-Dependent Symporters
  Na+/Ca2+ exchanger family
  Na+/proline symporter family
  Na+/glucose symporter family
  Na+/dicarboxylate symporter family
  Na+/PO4 symporter family
  Na+/branched amino acid symporter family
  Na+/citrate symporter family
  Na+/alanine-glycine symporter family
  Na+/neurotransmitter symporter family

Part 11 Na+-Dependent Antiporters
  Na+/H+ antiporter family

Part 12 PEP-Dependent Phosphotransferase Family
  Phosphoenolpyruvate-dependent sugar phosphotransferase family

Part 13 Other Transporters
  Anion exchanger family
  Mitochondrial adenine nucleotide translocator family
  Mitochondrial phosphate carrier family
  Nitrate transporter I family
  Nitrate transporter II family
  Spore germination transporter family
  Vacuolar membrane pyrophosphatase family
  Gluconate transporter family

Index
Preface

The Transporter FactsBook had its inception early in 1996 when Tessa Picknett at Academic Press approached the authors with the idea of preparing a volume on transport proteins. Recognizing that the book would contain several different types of transporters, and that additional transporter species were being described almost daily, it was decided that the only way to make the volume comprehensive would be to base the chapters on families of related transporters, rather than individual proteins. Using this method, we have been able to include nearly 800 transport proteins in this volume. More important, this comparative approach, which stresses the structural, mechanistic and biological properties that are common to closely related proteins, provides an objective basis for identifying potential evolutionary relationships between distantly related groups of proteins and establishes a system for classifying and characterizing newly described transporters. The authors hope that this basis for identification and classification will continue to make the volume a valuable resource even after the compilation of transporters it contains is no longer comprehensive.

An undertaking of this scope and complexity would not have been possible without the help and advice of many people. In particular, the authors would like to thank Jennifer Bryant and Peggy Moran at the University of New Mexico School of Medicine for cheery and able assistance in establishing the relationships between the nearly 800 transporters described in The Transporter FactsBook, Dr Mark Platt, now at the Louisiana State University School of Medicine, for wizardry in editing and modifying the phylogenetic trees, and Tessa Picknett and her staff at Academic Press for encouragement, support and patience in getting the manuscript into press.

There will undoubtedly be omissions and errors in this volume although we hope that they will be infrequent. We would greatly appreciate being informed of any inaccuracies by writing to the Editor, The Transporter FactsBook, Academic Press, 24-28 Oval Road, London NW1 7DX, UK, so that these can be rectified in future editions.
Jeff Griffith
Clare Sansom
Abbreviations

Å          angstrom unit
ABC        ATP binding cassette
ADP        adenosine diphosphate
Asn        asparagine
Asp        aspartic acid
ATP        adenosine triphosphate
C-         carboxyl
C-4        4-carbon
CFTR       cystic fibrosis transmembrane conductance regulator
CNS        central nervous system
3-D        three-dimensional
DNA        deoxyribonucleic acid
EB         ethidium bromide
FAD        flavin adenine dinucleotide
GABA       4-amino butyric acid
Gln        glutamine
Glu        glutamic acid
HMA        heavy metal binding sequence
kDa        kilodalton
M          molar
mol. wt    molecular weight
MDR        multidrug resistance
MFS        major facilitator superfamily
mV         millivolts
N-         amino
NADH       nicotinamide adenine dinucleotide (reduced form)
Δp         proton-motive force
ΔpH        transmembrane pH gradient
Δψ         transmembrane charge gradient
NMR        nuclear magnetic resonance
PEP        phosphoenolpyruvate
PIR        protein identification resource
PTS        phosphoenolpyruvate-dependent sugar phosphotransferase system
QUAC       quaternary ammonium compound
SD         standard deviation
USA        uniporter-symporter-antiporter
THE INTRODUCTORY CHAPTERS
Peter J. F. Henderson (Department of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK)

INTRODUCTION

The hydrophobic bilayer membrane that bounds cells is inherently impermeable to the great majority of hydrophilic solutes required for cell nutrition and to many of the waste products and/or toxins that must be excreted. Accordingly, the membrane contains proteins, the sole function of which is to catalyze the translocation of substrates through the membrane. As the substrates for many membrane processes can be obtained in radioisotope-labeled form, it has been technically feasible to characterize the functions of many of these transport proteins. The structures of the proteins themselves, however, have proved to be difficult to elucidate: they are of low natural abundance in the membrane; they are very hydrophobic and refractory to isolation methods in aqueous solutions; and, even when purified, usually in nondenaturing detergents, they are very difficult to crystallize. Where the proteins happen to be abundant - bacteriorhodopsin from Halobacterium halobium, K+/Na+ ATPases in nerve and Ca2+ ATPase from muscle, cytochrome oxidases in bacteria and mitochondria, glucose transporter from human erythrocytes, for example - progress has been made in elucidating the structure-function relationship. Yet, of these proteins the three-dimensional structure has only been determined for bacteriorhodopsin and the oxidases 1-3, and this is just the beginning of determining their molecular mechanisms of operation.

Free-living microorganisms (bacteria, algae, yeasts, parasitic protozoa) often inhabit environments where nutrients are in short supply, and different species must compete with each other for the available metabolites. Accordingly, they couple expenditure of metabolic energy to inward transport of essential nutrients (K+, NH4+, Pi, SO42-, sugars, vitamins, etc.) to achieve intracellular concentrations sufficient for optimal growth rates. This expenditure can amount to 20-30% of the organism's available energy when a carbohydrate is fermented under anaerobic conditions to yield only 2-3 moles ATP per mole sugar 4,5. Since the efficiencies of the transport steps may therefore influence cell yield and growth rate 4,6,7, an understanding of the transport processes is important to both the academic researcher seeking to understand bacterial cell physiology, and the industrial manager trying to maintain the profitability of a fermentation process. Furthermore, the process of eliminating metabolic wastes and/or toxins such as antibiotics is often coupled to the expenditure of metabolic energy, an indication of its importance for survival. Motility appears to be driven by transport processes also, although this may not consume so much energy 8.

In higher organisms, where survival functions are distributed between different organs, the energization of nutrient capture and waste efflux may be confined to specific tissues, e.g. the gut and the kidney. As a result of their activities, cells in other tissues enjoy an unchallenging environment in which their energy reserves can be channeled into other functions. Thus, their transport processes more often occur by facilitated diffusion.
As approximately 5-15% of all proteins, revealed by the current efforts in genome sequencing, are membrane transport proteins 9, we anticipate the need for a huge effort in the new millennium to determine the structures of these proteins that are vital for the capture of nutrients and hence the first stage in cell growth. Their additional roles in antibiotic resistance, toxin secretion, ATP synthesis, ion balance, generation of action potentials, synaptic neurotransmission, kidney function, intestinal absorption, tumor growth and other diverse cell functions in organisms from microbe to man presage a major investigative effort to elucidate their molecular mechanisms of action. This effort to elucidate vectorial processes can be compared to the continuing efforts to understand enzyme-mediated catalysis, though there is the possibility of an underlying uniformity of translocation mechanism despite the huge numbers of independent transport proteins that exist.

The advent of recombinant DNA technology has enabled the study of membrane transport proteins to be furthered in at least four major directions. The first is the burgeoning appearance of an enormous number of amino acid sequences of the proteins predicted from the DNA sequences of their genes in the genome mapping projects. This sequence information has enabled a second advance: the unambiguous exposure of the evolutionary relationships between proteins not thought hitherto to be related. The third is the manipulation of the genes to expedite amplified expression and purification of the proteins. Finally, the ability to mutagenize individual amino acids and to make chimeric proteins is being used to elucidate the relationship of function to structure.

A number of transport proteins play a role in human health and disease. The study of "ABC" transport systems (see later) in mammalian cells was intensified with the discovery that cystic fibrosis, the commonest inherited disease in the western world, was caused by a defect in the Cl- transport protein 10. The significance of a multidrug resistance protein, "Mdr", that catalyzes secretion of cytotoxins and the failure of anti-tumor chemotherapy similarly focused attention on a different ABC system. In both cases their similarity as ABC-type systems would have been completely obscured without the amino acid sequence information derived from the cloning and sequencing of their genes. Other transport proteins are involved in glucose/galactose malabsorption, albinism and adrenoleukodystrophy.

This FactsBook is intended to catalyze this new age of exploration of membrane transport protein structure. It is our major goal to arrive at a sensible classification of transport systems based upon both evolutionary and mechanistic considerations. The number of protein sequences now known is too large to include them all, and the expected appearance of legions more from the genome sequencing programs makes it timely to formulate a systematic approach to their classification. First it is important to describe current concepts of their functions. The treatment below is necessarily brief, and the reader is referred to the appropriate chapters in standard biochemistry textbooks 11,12 for a fuller introduction.

A watershed in the field occurred when Peter Mitchell 13-15 showed that transport processes were intimately associated with the mechanism of oxidative and photosynthetic ATP synthesis, a process which is central to energy metabolism in almost all organisms. However, because of the difficulties in studying the hydrophobic membrane proteins involved, we know very little about the molecular mechanism of such vectorial events; this contrasts with the wealth of information on the molecular mechanisms of chemical events catalyzed by water-soluble enzymes. It is quite possible that there is an underlying unity in the molecular mechanism of the translocation process, even when the direction of solute movement and any energization steps are completely different. This question is likely to be illuminated only when we elucidate the 3D structures and determine the structure-activity relationships of the transport proteins. By far the most central question in the transport field is precisely this - what are the 3D structures of the proteins involved? Before reaching this question it is useful to define some terms often used in the characterization of transport processes.
USEFUL CONCEPTS
Passive diffusion

Passive diffusion is the translocation of a solute across a membrane down its electrochemical gradient without the participation of a transport protein. The process follows Fick's law, and so obeys the relationship below in which the velocity has a linear relationship to the [solute]:

v = PAc

where v is velocity, P is the permeability coefficient for the particular solute, A is the area, and c is the difference in solute concentration across the cell membrane. Diffusion has a low temperature coefficient (v ∝ absolute temperature) and is non-specific. Typical biologically important compounds that follow this mechanism are O2, CO2, NH3, HCO2H, CH3CO2H, CH2OH.CHOH.CH2OH - small, neutral molecules that are soluble in lipid membranes.
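As a minimal sketch of the relationship above, the following Python fragment evaluates v = PAc and shows that the rate scales linearly with the concentration difference. The function name and the numerical values of P, A and c are illustrative assumptions, not data from this chapter.

```python
def passive_diffusion_rate(permeability: float, area: float, delta_c: float) -> float:
    """Rate of passive diffusion from Fick's law, v = P * A * c.

    permeability: permeability coefficient P for the solute
    area:         membrane area A
    delta_c:      difference in solute concentration across the membrane, c
    Units are left to the caller; they only need to be mutually consistent.
    """
    return permeability * area * delta_c


# Hypothetical values chosen only to illustrate the linear dependence on c.
P = 1.0e-6   # assumed permeability coefficient (cm/s)
A = 1.0e-7   # assumed membrane area (cm^2)

for c in (1.0, 2.0, 4.0):   # assumed concentration differences (arbitrary units)
    print(c, passive_diffusion_rate(P, A, c))
```

Doubling c doubles v for any P and A, in contrast to the saturable, protein-mediated kinetics of facilitated diffusion.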
Facilitated diffusion

Facilitated diffusion is the translocation of a solute across a membrane down its electrochemical gradient catalyzed by a transport protein. The Michaelis-Menten relationship 11,12 often adequately relates the initial rate of transport (v) to initial substrate concentration ([S] = c at zero time):

v = Vmax.[S]/(Km + [S])

(Vmax is maximum velocity, Km = [S] where v is Vmax/2). As with enzyme reactions, there is a high temperature coefficient and, usually, strong substrate specificity. Biological substrates that follow this mechanism are typically charged and/or larger than about the size of glycerol, with a very low inherent solubility in biological membranes. Mitchell classified such transport of a single substrate as "uniport", and glycerol transport is an example of such facilitated diffusion in E. coli 16,17. However, in free-living single-cell organisms, e.g. bacteria, yeasts, algae, the rate of capture of nutrient from the environment by this mechanism is probably too slow at the dilute concentrations that prevail in their normal environments to support competitive growth. Therefore, we usually find that transport of their vital nutrients is coupled to consumption of metabolic energy by active transport (see below) rather than facilitated diffusion. Presumably, during the course of evolution of such organisms the expenditure of precious energy reserves on transport has been a very significant survival factor. In contrast, transport of solutes between intracellular organelles in eukaryotes, or into tissue cells from the blood, often occurs by facilitated diffusion since high concentrations of solute are already established, for example by the Na+ symport system (below), so that facilitated diffusion by the tissue glucose uniporters is sufficient to support cell metabolism. The seminal example of such a facilitated transport was the GLUT1 glucose transport protein in human erythrocytes 18,19.
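The Michaelis-Menten relationship for facilitated diffusion can be sketched in the same way. The fragment below is illustrative only: the function name and the Vmax and Km values are assumptions chosen to show the half-maximal rate at [S] = Km and the approach to saturation at high [S].

```python
def transport_rate(substrate: float, vmax: float, km: float) -> float:
    """Initial rate of carrier-mediated transport, v = Vmax*[S] / (Km + [S])."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


# Hypothetical kinetic constants, not taken from the text.
VMAX = 10.0   # assumed maximum velocity (arbitrary units)
KM = 2.0      # assumed Km (same concentration units as [S])

half_max = transport_rate(KM, VMAX, KM)            # rate at [S] = Km
near_sat = transport_rate(1000 * KM, VMAX, KM)     # rate at [S] >> Km

print(half_max, near_sat)
```

At [S] = Km the rate is Vmax/2, and at very high [S] it levels off just below Vmax rather than continuing to rise linearly as it would for passive diffusion.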
Active transport

The term active transport is used to describe the net transport of a solute across a biological membrane from a low to a high electrochemical potential. Active transport shows the following characteristics.
• Accumulation of solute occurs against a concentration gradient.
• The solute is not chemically modified during translocation.
• Saturable steady-state kinetics are observed, often following the Michaelis-Menten relationship (above).
• There is a high temperature coefficient typical of enzyme-catalyzed reactions.
• Substrate specificity is restricted.
• An input of metabolic energy is required.

Active transport processes embrace a variety of molecular mechanisms, in which energy may be derived from light, oxidoreduction, ATP hydrolysis, or pre-existing solute gradients. It is conceptually helpful to classify them further into "primary" and "secondary" mechanisms 20 (Fig. 1). Secondary transport can be subdivided into "symport" or "antiport", terms introduced by Mitchell 13,14 (Fig. 1).
Primary active transport

Primary transport involves the direct conversion of chemical or photosynthetic energy into an electrochemical potential of solute across the membrane barrier. Thus, translocation of protons driven by oxidation of respiratory substrates 15,21,22, by hydrolysis of ATP 15,22-24, or by light energy absorbed by bacteriorhodopsin 1, all fall into this category. Many nutrient transport systems involving binding proteins in bacteria are of the primary type, directly energized by ATP (see below). All these examples transport one substrate in one direction and so are described as "uniport" 14.
Secondary active transport

Secondary transport involves the conversion of a pre-existing electrochemical gradient, usually of H+ or Na+ ions, into a new electrochemical gradient of the transported species. Thus the ultimate energy source for secondary transport systems is a primary chemical or photochemical conversion. In E. coli primary proton ejection by respiration or ATPase powers secondary sugar-H+ symport (obligatory coupling of H+ and solute movement in the same direction 14; see Fig. 1) or secondary Na+/H+ antiport (the obligatory coupling of H+ and solute movement in the opposite direction 13; Fig. 2). For example, the resulting Na+ gradient can be further coupled to melibiose transport by a melibiose-Na+ symport, so that net melibiose accumulation is driven by respiration (or ATPase) via H+ and Na+ gradients (Fig. 2). In E. coli the transmembrane H+ gradient would appear to be the "common currency" of many energized transport reactions, and the Na+ gradient of relatively few. However, in other organisms living in salt environments the Na+ gradient is the dominant factor maintained by a primary Na+ pump 25-27, as it is in multicellular eukaryotes.
Figure 1 Energization of sugar transport in E. coli. The large oval represents the cytoplasmic membrane of the microorganism. A transmembrane electrochemical gradient of protons is generated by respiration or ATP hydrolysis depicted on the left. This can be utilized by the proton-nutrient symport or proton-substrate antiport systems shown along the top. Some sugars can be accumulated by an alternative mechanism involving ATP, a binding protein, and two or three other proteins shown along the bottom, with other ATP-dependent primary transport systems for uptake of K+ or efflux of toxin. A phosphotransferase mechanism involving PEP and two or three proteins for sugar accumulation is shown on the upper right, and facilitated transport of glycerol on the lower right.
Group translocation

All the above mechanisms operate without chemical modification of the solute. Group translocation systems catalyze both the translocation and concomitant chemical modification of the solute. For a range of carbohydrates in many species of bacteria, phosphoenolpyruvate (PEP) is the phosphate donor to produce internal sugar phosphate from external free sugar 28 (Fig. 1); the glucose phosphotransferase system is particularly widespread amongst anaerobic organisms.
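The distinctions drawn in this section can be summarized as a small decision procedure. The Python sketch below is only one possible encoding of the definitions above: the class names, field names and energy-source labels are hypothetical, and real transport systems may not fall so cleanly into a single class.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mechanism(Enum):
    PASSIVE_DIFFUSION = "passive diffusion"
    FACILITATED_DIFFUSION = "facilitated diffusion"
    PRIMARY_ACTIVE = "primary active transport"
    SECONDARY_ACTIVE = "secondary active transport"
    GROUP_TRANSLOCATION = "group translocation"


@dataclass(frozen=True)
class TransportProcess:
    protein_mediated: bool        # is a transport protein involved?
    solute_modified: bool         # is the solute chemically modified in transit?
    energy_source: Optional[str]  # "atp", "light", "redox", "ion_gradient" or None


# Hypothetical labels for direct (primary) energy inputs.
PRIMARY_SOURCES = {"atp", "light", "redox"}


def classify(process: TransportProcess) -> Mechanism:
    """Assign a process to one of the mechanistic classes defined in the text."""
    if process.solute_modified:
        return Mechanism.GROUP_TRANSLOCATION      # e.g. PEP-dependent sugar uptake
    if not process.protein_mediated:
        return Mechanism.PASSIVE_DIFFUSION        # follows Fick's law
    if process.energy_source is None:
        return Mechanism.FACILITATED_DIFFUSION    # uniport down the gradient
    if process.energy_source == "ion_gradient":
        return Mechanism.SECONDARY_ACTIVE         # H+ or Na+ symport/antiport
    if process.energy_source in PRIMARY_SOURCES:
        return Mechanism.PRIMARY_ACTIVE
    raise ValueError(f"unknown energy source: {process.energy_source!r}")


# Example: a sugar-H+ symporter driven by the proton gradient.
print(classify(TransportProcess(True, False, "ion_gradient")).value)
```

Checking chemical modification first reflects the text's definition of group translocation as the only class in which the solute itself is altered; the remaining tests follow the order in which the concepts are introduced.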
CLASSIFICATION OF MEMBRANE TRANSPORT SYSTEMS ACCORDING TO THEIR ENERGETICS

Although thousands of transport processes, each catalyzed by its own protein, have been identified, the strategies found coupling metabolic energy to the translocation process are relatively few in number. These are now described to provide a formal basis for a preliminary classification of all the processes. While recent work indicates that a single transport system might employ more than one energization mechanism 29,30, or even that at least one novel mechanism may exist (TonB), the vast majority of biological transport systems so far fall conveniently into one of these classes. Their operation is illustrated in Fig. 1.

While previous investigators made many fundamental contributions to understanding transport processes (see review 31), it was Peter Mitchell who showed how vital are vectorial processes to the totality of energy metabolism in living organisms. Accordingly, we will now sketch in the chemiosmotic approach before focusing on individual mechanisms of solute translocation.

Figure 2 Sodium-linked transport systems. In halotolerant bacteria respiration may pump sodium ions from the inside to the outside, and the ATP synthase then utilizes the gradient of sodium ions to make ATP (depicted on the left). Similarly, instead of being driven by the proton gradient, rotation of the flagella is driven by the sodium gradient (right). Nutrients may be accumulated by a Na+-substrate symport system (top left) and toxins excreted by a Na+-substrate antiport (top right). In many bacteria, sodium ions are excreted by a Na+/H+ antiport (bottom left). In a few species there are sodium-secreting active transport systems driven by decarboxylation reactions (bottom right).
The chemiosmotic theory of oxidative and photosynthetic phosphorylation

In 1961, Peter Mitchell proposed his Chemiosmotic Theory of Oxidative and Photosynthetic Phosphorylation 32. This sought to explain how ATP synthesis is coupled to oxidative or photosynthetic electron transfer by the use of an electrochemical gradient of protons across the membrane as a high-energy intermediate between the processes. This brilliant concept generated a wealth of productive experimental investigations that, not without some controversy, arrived at an acceptance that proton transport is a fundamental feature of ATP synthesis in virtually all organisms. The molecular mechanism of these processes is just beginning to be understood, with the very recent elucidations of the structures of proton-translocating proteins and electron transfer proteins 2,3. There has also been the realization that rotation of the proteins in the membrane is a key feature of energy transmission for flagella 8,33 and ATP synthase 23,34.

The four basic parts of the chemiosmotic system, corresponding to the four postulates of the Chemiosmotic Hypothesis, can be paraphrased as follows 31,32:
1. The proton-translocating reversible ATPase system.
2. The proton-translocating oxido-reduction or light-driven electron transfer chain.
3. The exchange diffusion systems, coupling proton translocation to that of anions and cations.
4. The ion-impermeable coupling membrane, in which systems 1, 2 and 3 reside.
The chemiosmotic view of substrate transport mechanism

It is postulate 3, which predicts the involvement of transport systems in the process of balancing charge and osmolarity across the membrane, that led Peter Mitchell to consider the energetics of solute uptake into bacteria. In 1963 he suggested that the uptake of sugars into microbial cells might be energized by a transmembrane proton gradient 13. The idea required that an individual transport system catalyze the simultaneous translocation of protons with a substrate molecule, "symport", or the experimentally indistinguishable "antiport" of hydroxyl ions 13,31. In this hypothesis energy released by respiration or ATP hydrolysis and "stored" as the electrochemical gradient of protons could drive accumulation of the nutrient 13-15,31. The principle is illustrated in Fig. 1. However, this brilliant prediction remained untested until 1970, when Ian West devised experimental conditions in which the movement of lactose or substrate analogs into cells of Escherichia coli containing the lactose transport protein (LacY) evoked an alkaline pH change showing proton movement in the same direction 35-37. Since then the structure-activity relationship of the LacY protein has been explored by every practicable method of modern molecular biology 38-41. Several other sugar-H+ systems have been characterized, but, most importantly, the principles enunciated by Mitchell 13,14 have been shown to apply to diverse bacterial transport systems responsible, not just for the capture of nutrients like sugars, amino acids, vitamins and ions, but also for the extrusion of wastes and toxins including lactate, Na+ or antibiotics 31.
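To make the energetic argument concrete, the sketch below computes the proton-motive force Δp from the transmembrane charge gradient Δψ and pH gradient ΔpH, and the largest accumulation ratio that a symporter could in principle sustain. The relation Δp = Δψ - Z.ΔpH with Z = 2.303RT/F is the standard chemiosmotic expression and is not written out in this chapter. The sign convention (inside minus outside), the assumption of a neutral solute, the function names and the numerical values are all assumptions made for illustration.

```python
import math

R = 8.314462618    # gas constant, J mol^-1 K^-1
F = 96485.33212    # Faraday constant, C mol^-1


def z_mv(temp_k: float = 298.15) -> float:
    """Conversion factor Z = 2.303RT/F, in millivolts per pH unit."""
    return 1000.0 * math.log(10) * R * temp_k / F


def proton_motive_force_mv(delta_psi_mv: float, delta_ph: float,
                           temp_k: float = 298.15) -> float:
    """Proton-motive force, delta_p = delta_psi - Z * delta_pH (mV).

    Assumed convention: delta_psi = psi_in - psi_out, delta_pH = pH_in - pH_out,
    so an energized bacterial membrane gives a negative delta_p.
    """
    return delta_psi_mv - z_mv(temp_k) * delta_ph


def max_accumulation_ratio(delta_p_mv: float, protons_per_solute: int = 1,
                           temp_k: float = 298.15) -> float:
    """Equilibrium [S]in/[S]out for symport of a neutral solute with n protons."""
    return 10 ** (-protons_per_solute * delta_p_mv / z_mv(temp_k))


# Illustrative (assumed) values: inside 120 mV negative and 0.75 pH unit alkaline.
delta_p = proton_motive_force_mv(delta_psi_mv=-120.0, delta_ph=0.75)
print(round(delta_p, 1), round(max_accumulation_ratio(delta_p)))
```

With these assumed values a single coupled proton would support accumulation of a neutral nutrient by several hundred-fold at equilibrium, which illustrates why the proton gradient can serve as the driving force for nutrient capture from dilute environments.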
Many transport systems are not ion-linked

Although the Chemiosmotic Theory formed a framework to unify ideas on mechanisms of transport, it became evident that not all transport systems were linked to ion translocation. In bacteria, the seminal experiments of Berger and Heppel 42,43 showed that transport systems associated with periplasmic binding proteins were energized "directly", probably by ATP. These early ideas have been reinforced by the subsequent discovery of numerous ATP binding cassette, "ABC", transport systems in all types of organism that function to transport substrates into, or out of, whole cells or subcellular compartments. They are reviewed most recently by Higgins 10 and Boos and Lucht 44.

Furthermore, the uptake of some carbohydrates into bacteria, including, most importantly, glucose, was accompanied by simultaneous phosphorylation 28. This chemical conversion occurred at the expense of phosphoenolpyruvate (PEP) via a cascade of phosphate transfer reactions 28. The operation of such vectorial "group translocation" reactions was considered in detail by Mitchell 14,45, and the subsequent elucidation of these interesting systems has been reviewed most recently by Postma et al. 28.

CLASSIFICATION OF TRANSPORT SYSTEMS ACCORDING TO THE AMINO ACID SEQUENCES OF THEIR PROTEINS
Proteins catalyzing a single type of transport function and/or energization mechanism do not necessarily exhibit homology at the primary sequence level. Note that they might nevertheless have similar secondary and tertiary structures. Thus the sequences of the rhamnose-H+ and fucose-H+ symport proteins of E. coli are not homologous to that of the arabinose-H+, xylose-H+ or galactose-H+ symport protein of the same organism 46,47, and none of the sugar-H+ symporters are homologous to the sugar-Na+ symporters 48. In addition, some phosphotransferase enzymes II are homologous while others are not 28.

More important, perhaps, is that some proteins catalyzing different types of transport according to the above classifications exhibit a high degree of primary sequence homology. One example is the similarity of E. coli sugar-H+ symport proteins for arabinose, xylose, or galactose to the mammalian non-energized glucose uniporter, GLUT1 (Fig. 3) 47. Another example is the similarity of bacterial K+ ATPase uniport to mammalian Na+/K+ ATPase antiport and Ca2+ ATPase uniport proteins. The mitochondrial H+-Pi symport, ADP/ATP antiport, and oxoglutarate/malate antiport proteins 49 also show homology to one another.

It seems likely that our understanding of the molecular mechanisms of transport processes will be much enhanced by this rapid proliferation of information about the amino acid sequences of membrane transport proteins. In this book the transport proteins are arranged according to such evolutionary families. At least 28 families can be identified already (Table 1), and there are likely to be many more as the sequence databases grow.

TRANSPORT ACROSS PROKARYOTIC CELL MEMBRANES
Figure 3 Mammalian homologues of bacterial transport proteins. The bacterial transporters are depicted as in Figs 1 and 2, with their mammalian homologues indicated in bold type around them.

Penetration of the cell wall by solutes

The cell walls of gram-negative bacteria have a complex multilayered structure that includes lipopolysaccharide, an outer lipid membrane, peptidoglycan, the periplasm and an inner phospholipid bilayer membrane (Fig. 4 50). This wall can be regarded as having at least two global functions that are to an extent antagonistic. First, the wall has to protect the cell against external toxins and environmental changes inimical to life; secondly, it has to permit the uptake of vital nutrients. The wall must also confer mechanical strength to maintain the integrity of the cell, for example when there are changes in osmotic pressure. In E. coli and a number of other species the evidence suggests that compounds of molecular weight less than about 900 penetrate to the inner membrane at rates that do not limit cell growth 50,51. This is achieved by at least three factors. First, the lipopolysaccharide layer is permeable to hydrophilic solutes, though it may be impermeable to more hydrophobic molecules, including antibiotics 50,52. Secondly, the outer membrane contains channel-forming trimeric proteins ("porins" 52), acting as molecular sieves that permit simple diffusion of solutes of Mr up to 900, including di- and trisaccharides 50-52. Thirdly, the outer membrane also contains other porin-like proteins which exhibit some specificity for the permeant molecule, and pass the substrate (we presume) to high-affinity binding proteins in the periplasm 50,52.

In general the porins can be regarded as forming a "pore" or "channel" that enables passive diffusion of solute into the periplasm at a rate sufficient for growth. However, not all porins are non-specific. A clear example is the maltoporin, LamB, which aids the entry into the cell of oligosaccharides containing up to six glucose units. The molecular basis of this specificity has recently been elucidated with the characterization of a "greasy slide" in the pore that interacts with the hydrophobic face of the sugar molecules 53. Similarly, the preference of one porin protein for anions, of which the most important nutrient is inorganic phosphate, is explained by a positively charged region in the molecule. Thus, the porins may reflect an evolutionary bridge between passive and facilitated modes of diffusion of nutrients into the cell.

Importantly, the inner and outer membranes also have to function as conduits for secretion. Included amongst their substrates are: protein, carbohydrate and lipid components of outer layers of the cell wall; proteins and toxins secreted by pathogenic organisms that aid their infection of host cells; enzymes required for the digestion of extracellular macromolecules such as cellulose, proteins, nucleic acids and lipids present as the result of the death of other organisms; and actively secreted "assault" agents such as antibiotics.

Table 1 Families of Transport Proteins 1

Family | Example | Species | Code
Calcium-transporting ATPase | Probable calcium-transporting ATPase 4 | Saccharomyces cerevisiae | Atc4sacce
Peroxisomal Membrane | Adrenoleukodystrophy protein | Homo sapiens | Aldhomsa
ABC-2 Nodulation Protein | NodJ nodulation protein | Azorhizobium caulinodans | Nodjazoca
ABC-2 Polysaccharide Exporter | BexB capsular polysaccharide exporter | Haemophilus influenzae | Bexbhaein
ABC-2 Associated (Cytoplasmic) | ATP-binding protein NodI | Azorhizobium caulinodans | Nodiazoca
ABC-Associated Binding Protein Dependent Maltose Transporter | MalG maltose permease | Escherichia coli | Malgescco
ABC-Associated Binding Protein Dependent Peptide Transporter | DppC dipeptide transporter | Escherichia coli | Dppcescco
ABC-Associated Binding Protein Dependent Iron Transporter | BtuC vitamin B12 transport protein | Escherichia coli | Btucescco
Binding Protein Dependent Monosaccharide Transporter | L-Arabinose transport ATP binding protein | Escherichia coli | Aragescco
Binding Protein Dependent Peptide Transporter | Oligopeptide transport ATP binding protein | Escherichia coli | Oppdescco
Heme Exporter | Heme exporter CycV | Bradyrhizobium japonicum | Cycvbraja
Plasma Membrane Cation-Transporting ATPase | Calcium-transporting ATPase | Homo sapiens | Atchomsa
Macrolide-Streptogramin-Tylosin Resistance | Erythromycin resistance protein MsrA | Staphylococcus epidermidis | Msrastaep
H+-Sugar Symporter or Sugar Uniporter | Glut1 facilitative glucose transporter | Homo sapiens | Glut1homsa
H+-Rhamnose Symporter | RhaT rhamnose-H+ symporter | Escherichia coli | Rhatescco
H+-Amino Acid Symporter | PheP phenylalanine transporter | Escherichia coli | Phepescco
H+ Sucrose-Nucleoside Symporter | LacY lactose-H+ symporter | Escherichia coli | Lacyescco
H+ Pentose-Hexuronide Symporter | MelB melibiose-H+ symporter | Escherichia coli | Melbescco
H+-Oligopeptide Symporter | Pet1 oligopeptide-H+ symporter | Homo sapiens | Pet1homsa
H+-Fucose Symporter | FucP fucose-H+ symporter | Escherichia coli | Fucpescco
H+-Carboxylate Symporter | KgtP α-ketoglutarate-H+ symporter | Escherichia coli | Kgtpescco
H+-Nucleoside Symporter | NupC pyrimidine nucleoside-H+ symporter | Escherichia coli | Nupcescco
Heavy Metal-Transporting ATPase | Copper-transporting ATPase 1 | Homo sapiens | At7ahomsa
Sugar Phosphate Transporter | UhpT hexose phosphate transporter | Escherichia coli | Uhptescco
H+ Vesicular Antiporter | Vesicular amine transporter 2 (VAT2) | Homo sapiens | Vat2homsa
14-Helix H+/Multidrug Antiporter | QacA multidrug resistance protein | Staphylococcus aureus | Qacastaau
4-Helix H+/Multidrug Antiporter | QacC multidrug resistance protein | Staphylococcus aureus | Ebrstaau
12-Helix H+/Multidrug Antiporter | TetA(C) tetracycline antiporter | Escherichia coli | Tcr2escco
Acriflavin-Cation Resistance | AcrB acriflavin resistance protein | Escherichia coli | Acrbescco
Yeast Multidrug Resistance | Bmr benomyl-methotrexate resistance | Candida albicans | Bmrpcanal
Na+/Ca2+ Exchanger | Cardiac sodium/calcium exchanger | Homo sapiens | Nac1homsa
Na+-Proline Symporter | PutP proline-Na+ symporter | Escherichia coli | Putpescco
Na+-Glucose Symporter | Sglt1 glucose-Na+ symporter | Homo sapiens | Nagchomsa
Vacuolar ATPase | Vacuolar ATPase subunit | Homo sapiens | Vph1homsa
Na+-Dicarboxylate Symporter | DctA dicarboxylate-Na+ symporter | Escherichia coli | Dctaescco
Na+-PO4 Symporter | Npt1 phosphate-Na+ cotransporter | Homo sapiens | Npt1homsa
Na+ Amino Acid Symporter | BrnQ branched chain amino acid transporter | Salmonella typhimurium | Brnqsalty
Na+-Citrate Symporter | CitN citrate transporter | Klebsiella pneumoniae | Citnklepn
Na+-Alanine-Glycine Symporter | ACP alanine transporter | Thermophilic bacterium PS-3 | Alcpthep3
Na+ Symporter | Net1 noradrenalin-Na+ symporter | Homo sapiens | Ntnohomsa
Na+/H+ Antiporter | Nhe1 Na+/H+ antiporter | Homo sapiens | Nhe1homsa
Phosphoenolpyruvate-Dependent Sugar Phosphotransferase System (PTS) | PtaA N-acetylglucosamine permease II | Escherichia coli | Ptaaescco
Anion Exchanger | AE1 anion exchange protein 1 | Homo sapiens | B3athomsa
Mitochondrial Adenine Nucleotide Translocator | Ant1 ADP/ATP carrier protein | Homo sapiens | Ant1homsa
White | White protein | Drosophila melanogaster | Whitdrome
Mitochondrial Phosphate Carrier | PHC phosphate carrier protein | Homo sapiens | Mpcphomsa
Nitrate Transporter I | NarK nitrate-nitrite facilitator protein | Escherichia coli | Narkescco
Nitrate Transporter II | CrnA nitrate transporter | Emericella nidulans | Crnaemeni
Spore Germination | Spore germination protein GraII | Bacillus subtilis | Gra2bacsu
Vacuolar Membrane Pyrophosphatase | Pyrophosphate-energized vacuolar proton pump | Arabidopsis thaliana | Avp3arath
Gluconate Transporter | GntP gluconate transporter | Bacillus subtilis | Gntpbacsu
ABC 1 & 2 | ATP binding protein ABC1 | Mus musculus | Abc1musmu
Yeast Multidrug Resistance | Multidrug resistance protein Cdr1 | Candida albicans | Cdr1canal
Cystic Fibrosis Transmembrane Conductance Regulator | Cystic fibrosis transmembrane conductance regulator | Homo sapiens | Cftrhomsa
P-Glycoprotein | Multidrug resistance protein Mdr1 | Homo sapiens | Mdr1homsa

1 Data kindly provided by J.K. Griffith and C.E. Sansom.
Penetration of the inner cell membrane by solutes

The inner cell membrane, a protein-containing phospholipid bilayer (Fig. 4 54), is the barrier preventing the entry of most ambient solutes into the bacterial cell. Nutrient uptake is therefore effected by integral membrane transport proteins, either singly or in complexes, the majority of which are synthesized only in the presence of their substrate (see below). Energization of transport is effected at this inner membrane. Amongst its many other functions are the processes of respiration, ATP synthesis, maintenance of the K+ gradient, motility and osmoregulation, which are themselves transport processes ls, 55, 56. The membrane is therefore a dynamic entity of transport proteins, some of which are dependent on others (Figs 1 and 2). For example, only one inducible protein is required for lactose transport 40, but the energization of its accumulation requires the respiratory chain or ATPase activity (see Figs 1 and 2), which are more permanent features of the membrane 56,57.
The importance of proton transport across the inner membranes

The Chemiosmotic Theory of Mitchell proposed that the respiratory enzymes pump protons across the inner bacterial membrane so that energy released by substrate oxidation is conserved as an electrochemical proton gradient 58-60. This "proton-motive force" could then be used as an energy "currency" for expenditure on ATP synthesis, nutrient transport, chemotaxis, osmoregulation, etc. (Fig. 1). In organisms without respiratory enzymes an H+-ATPase could maintain the proton-motive force utilizing ATP generated by fermentative metabolism.

Figure 4 Schematic drawing of the gram-negative bacterial cell envelope. The outer membrane (om) consists of lipopolysaccharide, phospholipid and proteins, most of which are porins. Inside the outer membrane is a peptidoglycan layer (pg), which is noncovalently bonded to the outer membrane via murein lipoproteins, themselves covalently attached to the peptidoglycan. The cell membrane (cm) is composed of phospholipid and protein, and is the location of the integral membrane proteins involved in transport. The region between the outer membrane and the cell membrane is called the periplasm. The wavy lines are fatty acid residues that anchor the phospholipids and lipid A into the membrane. LPS, lipopolysaccharide; O, oligosaccharide; C, core; A, lipid A; P, porin; PL, phospholipid; MLP, murein lipoprotein; Pr, protein; om, outer membrane; pg, peptidoglycan; cm, cell membrane. [Copied, with permission, from White, D. (1995) The Physiology and Biochemistry of Prokaryotes. Oxford University Press, New York.]
The existence of the proton-motive force ($\Delta p$) across the inner membrane has been conclusively established in a diversity of bacterial species. Its magnitude is usually equivalent to 200-300 mV, made up of both electrical ($\Delta\psi$) and osmotic ($\Delta\mathrm{pH}$) components:

$$\Delta p = \Delta\psi - Z\,\Delta\mathrm{pH}$$

where $Z$ is $2.3RT/F$, the factor that converts pH units to millivolts, usually calculated at 25 °C. When the proton-motive force is used to energize solute transport by proton-coupled mechanisms (Figs 1 and 2), the gradients of solute that can be achieved are related to $\Delta p$ by the following equation:

$$\log\frac{[S]_i}{[S]_o} = \frac{(n + m)\,\Delta\psi - nZ\,\Delta\mathrm{pH}}{Z}$$

where $m$ is the substrate charge and $n$ is the proton/substrate ratio.

As already described, the Chemiosmotic Theory has been an invaluable guide for the elucidation of transport mechanisms. It is important to note that in some organisms living in alkaline and/or high salt environments, the Na+ ion has replaced H+ as the coupling cation 61,62. While in most examples the "conventional" oxidases and ATP synthase components seem simply to have adapted to pump Na+ instead of H+, in some organisms Na+ decarboxylase enzymes generate an electrochemical gradient of Na+ 62 (Fig. 2).

The diagram in Fig. 5 illustrates the following mechanisms by which bacteria are known to effect the transport of some nutrients into their cells, and some solutes out.
1. Facilitated diffusion.
2. The "ATP-Binding-Cassette" (ABC) systems ("uniport") utilizing ATP to capture nutrient or drive efflux (Figs 1 and 5).
3. The group translocation mechanism utilizing PEP as energy source (Figs 1 and 5).
4. The H+-nutrient coupled ("symport") systems utilizing the transmembrane electrochemical gradient of protons generated by respiration or ATPase (Figs 1 and 5).
5. Coupled transport of similarly charged compounds - anions or cations - in opposite directions ("antiport", Figs 1, 2 and 5), which may effect either accumulation of desired substrate or efflux of enzyme, waste or toxin.

Figure 5 Mechanisms of transport across the bacterial cell membrane. The different types of transport activity are described in the text. Panels: facilitated diffusion (glycerol); primary active transport (maltose); group translocation (mannitol); secondary active transport (H+-sugar symport of lactose; H+-antibiotic antiport of tetracycline).

The best-understood membrane transport processes have been studied in the gram-negative organisms Escherichia coli and Salmonella typhimurium, which are convenient because of their unicellular nature. Furthermore, most of their transport mechanisms appear to occur in many other microorganisms and even in man himself.
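The two proton-motive force relationships given earlier in this section can be evaluated numerically. The following is only a minimal sketch in Python: the function names are hypothetical, and it assumes the sign convention under which both equations are consistent, namely that Δψ and ΔpH are taken as outside minus inside (Δψ positive and ΔpH negative for a respiring cell with an alkaline interior). Z is computed as 2.303RT/F.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def z_factor_mV(temp_c: float = 25.0) -> float:
    """Z = 2.303 RT/F, the factor converting pH units to millivolts."""
    return 1000.0 * math.log(10) * R * (temp_c + 273.15) / F


def proton_motive_force_mV(delta_psi_mV: float, delta_pH: float,
                           temp_c: float = 25.0) -> float:
    """Delta p = Delta psi - Z * Delta pH (assumed convention: out minus in)."""
    return delta_psi_mV - z_factor_mV(temp_c) * delta_pH


def log_accumulation_ratio(delta_psi_mV: float, delta_pH: float,
                           n: int = 1, m: int = 0,
                           temp_c: float = 25.0) -> float:
    """log10([S]in/[S]out) = ((n + m) * Delta psi - n * Z * Delta pH) / Z.

    n is the proton/substrate ratio and m is the substrate charge.
    """
    z = z_factor_mV(temp_c)
    return ((n + m) * delta_psi_mV - n * z * delta_pH) / z


if __name__ == "__main__":
    # Illustrative values only (not taken from the text):
    # 120 mV membrane potential and an interior one pH unit more alkaline.
    dpsi, dph = 120.0, -1.0
    print("Z (mV per pH unit):", round(z_factor_mV(), 2))
    print("Proton-motive force (mV):", round(proton_motive_force_mV(dpsi, dph), 1))
    print("log10 accumulation, neutral solute, 1 H+:",
          round(log_accumulation_ratio(dpsi, dph, n=1, m=0), 2))
```

For a neutral substrate carried with one proton (n = 1, m = 0) the logarithm of the accumulation ratio reduces to Δp/Z, so each additional 59 mV or so of proton-motive force allows roughly a further tenfold accumulation at 25 °C.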
TRANSPORT ACROSS EUKARYOTIC CELL MEMBRANES

The considerations that apply to understanding transport in prokaryotes extend to eukaryotes, with important exceptions. Study is more difficult in multicellular organisms, where cells occur in tissues. Also, eukaryote cells have subcellular compartments bounded by membranes. The transport reactions involved in ATP synthesis are localized in mitochondria and chloroplasts, which use an H+ electrochemical gradient for energy coupling. In order to accommodate solute-H+ symporters or antiporters in the cell membrane, therefore, organisms like yeast have an H+-ATPase located there 63. Mammalian cells, however, utilize a transmembrane Na+ gradient generated by an Na+/K+-ATPase to accommodate solute-Na+ symporters or antiporters 64. Quite often the maintenance of high concentrations of nutrient in the extracellular fluid (from the blood in mammals or the vascular system in plants) obviates the need for energized transport into the cell, so higher organisms can utilize facilitated diffusion systems in their cell membranes rather than active transport.

Translocation of substrates between intracellular and extracellular compartments can have sophisticated functional implications. Examples are the release and recapture of neurotransmitter substances in nerve 65; sucrose mobilization in plants 66; antigen peptide presentation in lymphocytes 10; and protein targeting in plant and animal cells 67. Since little is known about each of the individual proteins that contributes to these processes, our understanding remains superficial at the present time.
THE NUMBER OF MEMBRANE PROTEIN COMPONENTS AND/OR DOMAINS INVOLVED IN A TRANSPORT SYSTEM

Facilitated diffusion transport systems usually contain a single protein. Similarly, secondary active transport systems usually contain one protein, if we discount those that generate the driving ion gradient. Primary active transport systems may occasionally contain one protein, for example bacteriorhodopsin. However, most appear to comprise a protein complex, involving from as few as two polypeptides (X§ ATPase 68) through six (histidine transport system) to 20 (F1Fo ATPase) and more in, for example, NADH dehydrogenase 69.

Both the ABC and phosphotransferase systems illustrate how transport systems that contained several separate polypeptides in primitive organisms may become fused together during the course of evolution, so that one polypeptide with functionally distinguishable domains effects translocation. This has been particularly well illustrated by Higgins 70 and by Postma et al. 28 (see Fig. 6).
Figure 6 Proteins of multicomponent transport systems may become fused during evolution. The transport systems illustrated are discussed in the text. The upper part of the figure shows schematically various ABC primary active transport systems and the organisms in which they are found; the different polypeptides are unfused in the example on the left, and the shading indicates different types of fusion between functionally discrete domains that has occurred in other examples. The lower part of the figure shows different group translocation transport systems, all of which are phosphotransferases found in Escherichia coli; the different polypeptides, EIIA and EIIB, associated with the membrane component, EIIC, are unfused in the example on the left and the shading indicates different types of fusion in other examples. The figures are derived from information in refs 28, 44, 70.
MANY MEMBRANE TRANSPORT PROTEINS ARE PREDICTED TO CONTAIN 12 TRANSMEMBRANE DOMAINS

Hydropathy plots are widely used to predict whether regions of a protein might span the membrane as an α-helix 71. This method is particularly applicable when a protein is predicted to contain a high proportion of hydrophobic amino acids. Some examples are shown in Fig. 7. The only authentication of their validity is the reasonable correspondence of predicted α-helices with those actually observed in bacteriorhodopsin and membrane proteins of the photosynthetic reaction centre and light-harvesting complex 72,73, and more recently cytochrome oxidases 2,3. There is discussion over which algorithm, if any, is satisfactory 74-77. Despite these uncertainties, the majority of the transport protein sequences in Table 1 are predicted to contain 12 hydrophobic regions of sufficient length (19+ amino acids) to span the membrane as α-helices 78,79. The possible exceptions are the transporters for methylenomycin and quaternary ammonium compounds, which may have 14 80,81, and the rhamnose-H+ transporter, predicted to have 10 82.

Many of the sugar transport proteins have an extensive central (i.e. between transmembrane domains 6 and 7) hydrophilic region of about 65 amino acids which is predicted to contain a substantial proportion of helix. Most of the other transport proteins also have a central hydrophilic region, although it is usually shorter than that of the sugar transport proteins. Taken with the evidence of some sequence duplication in the two halves of many of the proteins 4s, it seems reasonable to propose the existence of internal dimerization, originally resulting, perhaps, from gene duplication. This also accords with the same proposal by Lancaster 83, based on kinetic and inhibitor studies of the LacY porter.
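The sliding-window calculation behind a hydropathy plot can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure used for Fig. 7: the function names are hypothetical, the residue values are the published Kyte and Doolittle hydropathy indices, and the cut-off of 1.6 for a 19-residue window is a commonly quoted heuristic that is assumed here, not taken from the text.

```python
# Kyte and Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_segments(profile: list, window: int = 19,
                       threshold: float = 1.6) -> list:
    """Residue ranges (0-based, end exclusive) covered by runs of
    consecutive windows whose mean hydropathy exceeds the threshold."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


if __name__ == "__main__":
    # Artificial test sequence: a 21-residue leucine stretch between
    # charged flanks (not a real transporter sequence).
    toy = "K" * 10 + "L" * 21 + "D" * 10
    prof = hydropathy_profile(toy, window=19)
    print(candidate_segments(prof, window=19))
```

Repeating the calculation with several window sizes, as in Fig. 7a, shows how sensitive the number of predicted segments is to that choice, which is one reason such predictions need independent confirmation.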
Despite the differences in individual sequences, an underlying similarity between transport proteins from otherwise dissimilar groups seems to exist 79, even though some catalyze mechanistically rather different types of transport reaction - uniport, antiport, or symport (influx or efflux). One example of a 12-helix arrangement is shown in Fig. 8. In this context, it is interesting that many other groups of membrane transport proteins are predicted to have 12 membrane-spanning α-helices. One is the series of phosphate antiporters in prokaryotes 79. Another is the "ABC" group typified by the Mdr, multiple drug resistance factor 10; some individual members of this group catalyze influx of substrate and others catalyze efflux. Yet another group is the family of mitochondrial transporters, which are thought to function as a dimer, each subunit having six α-helices 84,85 (see discussion below); here again transporters of similar sequence catalyze different types of transport reaction - uniport, antiport, or symport, influx or efflux. A fourth group contains the homologous transporters for noradrenaline and gamma-aminobutyric acid 86.

Within the family of mitochondrial transport proteins each is predicted to contain six hydrophobic regions and transmembrane helices sa'ss. However, in several examples there is evidence for dimerization to form a functional unit with 12 predicted helices. Interestingly, this family has strong evidence of internal triplication in each polypeptide sa, implying that there are six equivalent domains in the functional dimer.

It is important to consider the possible arrangements of 12 helices in the membrane, and several groups have obtained evidence for the nearest-neighbor relationships of predicted transmembrane helices in individual membrane transport proteins, using fluorescence energy transfer, second site revertants of mutants, cysteine mutagenesis and other techniques. In addition, many reviewers have hypothesized as to how the arrangement might be. However, until we determine the actual 3D structures of some of these proteins such models should perhaps be regarded with caution.

Figure 7a Hydropathy plots of the L-fucose-H+ symport protein, FucP, of E. coli. The algorithm of Kyte and Doolittle (1982, see text) was used with window sizes of 7-19 amino acid residues to generate a series of hydropathy plots of FucP; the putative positions of 12 helices are indicated in the plot with a window of nine residues. Axes: hydropathy at each window size against residue number. [Copied, with permission, from Gunn et al. (1995) Molec. Microbiol. 15, 771-783.]
Figure 7b Hydropathic profiles of membrane transport proteins. The amino acid sequence of each of the indicated transport proteins was analyzed for hydropathy using the algorithm of Kyte and Doolittle with a window of 11 residues. The majority can be interpreted in terms of 12 putative transmembrane helices, but the L-rhamnose-H+ symport protein appears to have 10. Panels: galactose-H+ transport protein (GalP); arabinose-H+ transport protein (AraE); xylose-H+ transport protein (XylE); rhamnose-H+ transport protein (RhaT); fucose-H+ transport protein (FucP); lactose-H+ transport protein (LacY). Axes: hydropathy against sequence number. [Copied, with permission, from Henderson (1991) Bioscience Reports 11, 477-538, ref. 31.]
Figure 8a Model of the orientation of the L-fucose-H+ symport protein, FucP, in the membrane, based upon the hydropathy plot (Fig. 7a). Note the predominance of positive residues inside the membrane, which follows the rule of von Heijne (see text). [Copied, with permission, from Gunn et al. (1995) Molec. Microbiol. 15, 771-783.]

Figure 8b Modified model of the orientation of the L-fucose-H+ symport protein, FucP, in the membrane, based upon β-lactamase fusion data. The periplasmic and cytoplasmic sides of the membrane are marked, and negatively and positively charged residues are indicated. [Copied, with permission, from Gunn et al. (1995) Molec. Microbiol. 15, 771-783.]
KINETICS OF MEMBRANE TRANSPORT

The simplest kinetic view of translocation of solutes across membranes includes four steps: binding of substrate to the protein on one side of the membrane; occlusion and translocation; release on the other side; and reopening of the unloaded carrier to the original side of the membrane. In the case of the human GLUT1 glucose transport protein, the rabbit Na+-glucose symporter, the Na+/K+- and Ca2+-ATPases, the bacterial lactose and melibiose transporters, the bacterial glucose phosphotransferase and some others, more sophisticated models with intermediate steps have been advanced 18,87,88. By chemical modification, mutagenesis, electrophysiology, fluorescence measurements, topological proteolysis, etc., such kinetic features have been somewhat superficially associated with particular regions or even amino acid residues in a protein. In many cases the rate of transport shows a hyperbolic relationship to the concentration(s) of substrate(s), and the classic equations of steady-state kinetics 89 can be used to describe the process. They can also be used to analyze the order of addition of multiple substrates and the order of leaving of the products (note that these are usually identical to the substrates, but are simply on the other side of the membrane).
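The hyperbolic dependence of transport rate on substrate concentration mentioned above can be illustrated with the single-substrate Michaelis-Menten form, v = Vmax[S]/(Km + [S]). The Python sketch below is an assumption-laden illustration, not a method described in the text: the function names are hypothetical, and the parameter estimate uses a Hanes-Woolf linearization ([S]/v plotted against [S]) fitted by ordinary least squares, chosen here only because it needs no external libraries.

```python
def transport_rate(s: float, vmax: float, km: float) -> float:
    """Hyperbolic (Michaelis-Menten) rate for a single substrate."""
    if s < 0 or km <= 0:
        raise ValueError("s must be >= 0 and km must be > 0")
    return vmax * s / (km + s)


def fit_hanes_woolf(concentrations, rates):
    """Estimate (vmax, km) from paired data via [S]/v = [S]/Vmax + Km/Vmax.

    Uses ordinary least squares on the linearized form; all rates must be
    non-zero and at least two distinct concentrations are required.
    """
    xs = list(concentrations)
    ys = [s / v for s, v in zip(concentrations, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # equals 1 / Vmax
    intercept = mean_y - slope * mean_x   # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


if __name__ == "__main__":
    # Synthetic, noise-free data with arbitrary units (illustration only).
    true_vmax, true_km = 10.0, 2.0
    conc = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
    obs = [transport_rate(s, true_vmax, true_km) for s in conc]
    print(fit_hanes_woolf(conc, obs))
```

At [S] = Km the rate is half of Vmax, which is the usual operational meaning of Km for a transporter; with real, noisy data a direct non-linear fit is generally preferable to any linearization.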
IS THERE A UNIFYING MECHANISM OF TRANSLOCATION CATALYSIS?
Theoretical models of the mechanism of solute translocation fall into several classes, which may overlap: alternating access; alternating conformer; gated pore; ligand conduction; and mobile barrier (reviewed by Henderson 31). Despite these apparent variations many authors have considered the possibility of a unifying mechanism, two examples being due to Tanford 90 and Scarborough 91. Peter Mitchell has long advocated that solvation substitution and a mobile barrier mechanism could constitute the features sufficient for a unifying mechanism of translocation catalysis, and made the following points on this topic 60.

1) The dominant process governing the translocation of solute molecules or ions from one side of the catalytic osmotic barrier domain of a porter or osmoenzyme to the other is solvation substitution: a substrate-specific process of secondary chemistry.

2) Translocation of hydrophilic solute(s) in a porter or osmoenzyme may be best explained by a mobile barrier type of mechanism. This relies upon a specific solute-binding domain in the interior of the polypeptide system of the protein becoming alternately and exclusively accessible to the aqueous media containing the solute substrate(s) on either side only under conditions that facilitate a rocking or rolling motion of part of the polypeptide system across the specific substrate-binding domain.

3) Maloney 79 asked, how could a uniform (12 α-helix) ensemble catalyze a variety of kinetic mechanisms: uniport, symport or antiport? This is answered very simply if the mobility of the polypeptide that allows the switching of accessibility of the solute-binding osmotic-barrier domain in the porter or osmoenzyme molecule depends on solvation-substitution processes in or near that domain, which is affected by the presence or absence of the translocatable solute(s). In the case of osmoenzymes, this may also be effected by the binding of other ligands. Thus, barrier mobility would be activated: in a uniport whether solute was bound or not; in a symporter, only when both or neither of the solutes were bound; in an antiporter, only when either one or the other solute, but not both or neither, were bound; in an osmoenzyme, under appropriate conditions of binding of the translocatable solute(s) and also of other chemical group-donating and group-accepting ligands.

...The alternating access model of transport proteins...attributed to Tanford, resembles, in some respects, my mobile barrier model. But the model discussed by Tanford 90, like the gated pore type of model considered by Brooker 92, which seems to be consistent with the concept of a proton relay, discussed by Roepe et al. 93, misses the fundamental importance of solvation substitution in the proposed motion of the osmotic barrier over the solute-binding domain. One of the most attractive properties of the mobile barrier type of mechanism of solute translocation arises from its presumed dependence on the subtle secondary chemical processes of solvation substitution, both with respect to the binding of its solute substrate(s) and with respect to the kinetic activation of the mobility of the barrier across the catalytic substrate-binding domain. Thus, it would be expected to show the close interrelationships between changes of organic substrate specificity, changes of cation specificity, and changes in translocational kinetics induced by certain amino acid substitutions, already described for several transporters. The tendency for the active species of solute-translocating proteins to contain 12 α-helical components... may possibly be relevant to the mobile barrier type of mechanism.

Invoking the concept of close packing in hexagonal arrays of the cylindrical α-helices, and assuming the requirement for a cleft opening alternately above and below the catalytically active molecule, imagined with the plane of the membrane lying flat on the page, one is tempted to suggest a binary hexagonal arrangement with two hexagonal lobes sharing a pair of α-helices (the cylindrical α-helices appearing as circles from above). The catalytic solute-binding domain would lie in the region between the two shared α-helices, and extend to the neighbouring helices on either side. One of the shared helices might act as a hinge, allowing slight relative movement of the two lobes, while the other shared helix would cant outwards from its partner alternately at top and bottom, allowing accessibility of a centrally positioned solute-binding domain alternately and exclusively from above and below. Or perhaps both of the shared helices might cant outward from their partners alternately at top and bottom to give a relatively symmetrical cleft opening alternately and exclusively from above and below.

From this eclectic viewpoint transport systems can be envisaged as modular in construction, with a basic porter unit capable of carrying out solvation substitution and the molecular events of translocation (by mobilization of an internal barrier 59,60 or by conformational changes effecting alternate access to each side of the membrane 90,94). There can be additional proteins/domains to bring the initial solvation substitution under independent control, e.g. with a binding protein type of system. And/or there may be different proteins/domains to bring the translocation events under control of an ATP-hydrolyzing protein, as in the ABC transport systems or the P-type ATPases 10,44,70,95, or under control of a decarboxylating reaction, as in the bacterial Na+ transporters 26,62.
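Mitchell's third point amounts to a small truth table for when the barrier is free to move. The Python sketch below merely restates those occupancy rules for a two-solute carrier; the function name and its arguments are hypothetical, and the osmoenzyme case, which also depends on other group-donating and group-accepting ligands, is deliberately left out.

```python
def barrier_is_mobile(mode: str, first_bound: bool,
                      second_bound: bool = False) -> bool:
    """Occupancy rule for barrier mobility, as stated in point 3 above.

    uniport : mobile whether or not solute is bound
    symport : mobile only when both or neither solute is bound
    antiport: mobile only when exactly one of the two solutes is bound
    """
    if mode == "uniport":
        return True
    if mode == "symport":
        return first_bound == second_bound
    if mode == "antiport":
        return first_bound != second_bound
    raise ValueError("mode must be 'uniport', 'symport' or 'antiport'")


if __name__ == "__main__":
    # Print the full truth table for the two coupled modes.
    for mode in ("symport", "antiport"):
        for a in (False, True):
            for b in (False, True):
                print(mode, a, b, barrier_is_mobile(mode, a, b))
```

Seen this way, the symport and antiport rules are logical complements of one another, which is one way of reading the suggestion that a single 12-helix ensemble could support either mode depending on how solute binding gates the same conformational switch.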
CAN THE THREE-DIMENSIONAL STRUCTURES OF TRANSPORT PROTEINS BE DETERMINED?
Our understanding of the molecular mechanisms of membrane transport proteins is still severely handicapped by our ignorance of their three-dimensional structures 9a. The problems of determining such information for membrane proteins have been admirably reviewed by Pattus 96. Nevertheless, there are recent advances that raise hopes of determining the complete three-dimensional structure of a membrane transport protein by physical methods. The first is the elucidation of the structure of bacterial and mitochondrial cytochrome oxidases at atomic level resolution by X-ray crystallography 2,3. The bacterial enzyme is a four-subunit protein, but the most intriguing component is subunit I, which contains the heme groups and comprises 12 membrane-spanning helices. These are arranged in three groups of four helices, each group of which can be hypothesized to form a "pore" suitable for transmembrane conduction of H+. This is the structure of a primary active transport system for protons. It is a useful exercise to model the unsophisticated 12-helix representation of other transport proteins (Fig. 8) around such a structure, as an aid to hypothesizing how larger substrates might have their passage through the membrane catalyzed. The application of electron diffraction techniques and data analysis has also continued to be refined, so that high-resolution structures, e.g. of light-harvesting protein 73 and of visual rhodopsin 97, can be achieved. These techniques should be capable of further refinement to higher levels of resolution 60. There has also been improvement in crystals of the bacterial porin proteins, enabling X-ray crystallography to be improved. Finally, there is the application of biophysical techniques to determine the structure of small membrane-spanning peptides 98,99.
If individual, or a small number of combined, transmembrane domains of the lactose-H+ (or any other) transporter can be expressed, purified and reconstituted in the native form, as already partly achieved, it may be possible to determine the structure of parts of the protein separately by NMR and build up an overlapping picture of the whole.
CONCLUSIONS
We are entering an era when the amino acid sequences of a huge number of transport proteins will be available from the DNA sequence databases. For reasons of scientific curiosity and/or biomedical utility a select number of these will be chosen for detailed investigation. It is very important that we learn how to determine the three-dimensional structures of these proteins. We will then be in a position to define the structure-activity relationship of the protein to the point where it can be manipulated for the good of humanity - to design a new generation of antimicrobials, perhaps, to devise molecular-size electronic components of nanocomputers, to cure cystic fibrosis by gene therapy, and for applications not yet conceived. This book collates the information that is currently available to us and arranges it in a manner to expedite future developments.
References
1 Henderson, R. et al. (1990) J. Mol. Biol. 213, 899-929.
2 Iwata, S. et al. (1995) Nature 376, 660-669.
3 Tsukihara, T. et al. (1996) Science 272, 1136-1144.
4 Muir, M. et al. (1985) J. Bacteriol. 163, 1237-1242.
5 White, D. (1995) The Physiology and Biochemistry of Prokaryotes. Oxford University Press, Oxford.
6 Koch, A. (1971) Adv. Microb. Physiol. 6, 147-217.
7 Button, D.K. (1985) Microbiol. Rev. 49, 270-297.
8 McNab, R.M. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 123-145.
9 Goffeau, A. et al. (1997) Yeast 13, 43-54.
10 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
11 Mathews, C.K. and van Holde, K.E. (1996) Biochemistry. Benjamin/Cummings, Redwood City, CA.
12 Voet, D. and Voet, J.G. (1996) Biochemistry. John Wiley, Chichester.
13 Mitchell, P. (1963) Biochem. Soc. Symp. 22, 142-169.
14 Mitchell, P. (1973) Bioenergetics 4, 63-91.
15 Mitchell, P. (1966) Chemiosmotic Coupling in Oxidative and Photosynthetic Phosphorylation. Glynn Research, Bodmin.
16 Heller, K.B. et al. (1980) J. Bacteriol. 144, 274-278.
17 Maloney, P.C. and Wilson, T.H. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 1130-1148.
18 Stein, W.D. (1986) Transport and Diffusion across Cell Membranes. Academic Press, Orlando, FL.
19 Baldwin, S.A. (1993) Biochim. Biophys. Acta 1154, 17-50.
20 Harold, F.M. and Maloney, P.C. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 283-306.
21 Wikstrom, M. (1989) Nature 338, 776-778.
22 Kagawa, Y. (1984) In Bioenergetics (Ernster, L., ed.). Elsevier, Amsterdam, pp. 149-186.
23 Abrahams, J.P. et al. (1994) Nature 370, 621-628.
24 Fillingame, R.H. (1996) Curr. Opin. Struct. Biol. 6, 491-498.
25 Tokuda, H. (1986) Methods Enzymol. 125, 520-530.
26 Dimroth, P. (1986) Methods Enzymol. 125, 530-540.
27 Dimroth, P. (1990) Philos. Trans. R. Soc. Lond. B 326, 465-477.
28 Postma, P. et al. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 1149-1174.
29 Forward, J.A. et al. (1997) J. Bacteriol. 179 (in press).
30 Lewis, K. (1994) Trends Biochem. Sci. 19, 119-123.
31 Henderson, P.J.F. (1991) Biosci. Reports 11, 477-538.
32 Mitchell, P. (1961) Nature 191, 144-148.
33 Meister, M. et al. (1987) Cell 49, 643-650.
34 Noji, H. et al. (1997) Nature 386, 299-302.
35 West, I.C. (1970) Biochem. Biophys. Res. Commun. 41, 655-661.
36 West, I.C. and Mitchell, P. (1972) Bioenergetics 3, 445-462.
37 West, I.C. and Mitchell, P. (1973) Biochem. J. 132, 587-592.
38 Kaback, H.R. (1986) Methods Enzymol. 125, 214-230.
39 Kaback, H.R. et al. (1990) Trends Biochem. Sci. 15, 309-314.
40 Kaback, H.R. (1997) Proc. Natl Acad. Sci. USA 94, 5539-5543.
41 Henderson, P.J.F. (1990) J. Bioenerg. Biomembr. 22, 525-569.
42 Berger, E.A. (1973) Proc. Natl Acad. Sci. USA 70, 1514-1518.
43 Berger, E.A. and Heppel, L.A. (1974) J. Biol. Chem. 249, 7747-7750.
44 Boos, W. and Lucht, J.M. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 1175-1209.
45 Mitchell, P. (1977) In Microbial Energetics (Haddock, B.A. and Hamilton, W.A., eds). Cambridge U.P., Cambridge, UK, pp. 383-423.
46 Henderson, P.J.F. and Maiden, M.C.J. (1990) Philos. Trans. R. Soc. Lond. B 326, 391-410.
47 Maiden, M.C.J. et al. (1987) Nature 325, 641-643.
48 Griffith, J.K. et al. (1992) Curr. Topics Cell Biol. 4, 684-695.
49 Runswick, M.J. et al. (1987) EMBO J. 6, 1367-1373.
50 Nikaido, H. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 29-47.
51 Engel, A. et al. (1985) Nature 317, 643-645.
52 Cowan, S.W. et al. (1994) In Bacterial Cell Wall, New Comprehensive Biochemistry, vol. 27. Elsevier, Amsterdam, pp. 353-362.
53 Schirmer, T. et al. (1995) Science 267, 512-514.
54 Kadner, R.J. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 58-87.
55 West, I.C. and Mitchell, P. (1974) Biochem. J. 144, 87-90.
56 Harold, F.M. and Maloney, P.C. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 283-306.
57 Gennis, R.B. and Stewart, V. (1996) In Escherichia coli and Salmonella (Neidhardt, F.C., ed.). ASM Press, Washington DC, pp. 217-261.
58 Mitchell, P. (1970) Symp. Soc. Gen. Microbiol. 20, 121-166.
59 Mitchell, P. (1990) Res. Microbiol. 141, 286-289.
60 Mitchell, P. (1990) Res. Microbiol. 141, 384-385.
61 Skulachev, V.P. (1985) Eur. J. Biochem. 151, 199-208.
62 Dimroth, P. (1990) Philos. Trans. R. Soc. Lond. B 326, 465-477.
63 Kruckeberg, A.L. (1996) Arch. Microbiol. 166, 283-292.
64 Hirayama, B. et al. (1996) Am. J. Physiol. - Gastrointestinal and Liver Physiology 33, G919-G926.
65 Schuldiner, S. (1997) Physiol. Rev. 75, 369-392.
66 Subbaiah, C.C. et al. (1994) Plant Cell 6, 1747-1762.
67 High, S. et al. (1997) In Membrane Protein Assembly (von Heijne, G., ed.). Springer-Verlag, Heidelberg, pp. 119-134.
68 McIntosh, I. and Cutting, G.R. (1992) FASEB J. 6, 2775-2782.
69 Weiss, H. et al. (1991) Eur. J. Biochem. 197, 563-576.
70 Higgins, C.F. (1995) Cell 82, 693-696.
71 von Heijne, G. (1994) Annu. Rev. Biophys. Biomol. Struct. 23, 167-192.
72 Deisenhofer, J. et al. (1984) J. Mol. Biol. 180, 385-398.
73 Kuhlbrandt, W. and Wang, D.N. (1991) Nature 350, 130-134.
74 Lodish, H.F. (1988) Trends Biochem. Sci. 13, 332-334.
75 von Heijne, G. (1988) Biochim. Biophys. Acta 947, 307-333.
76 White, S.H. and Jacobs, R.E. (1990) J. Membr. Biol. 115, 145-158.
77 Crimi, M. and Esposti, M.D. (1991) Trends Biochem. Sci. 16, 119.
78 Baldwin, S.A. (1990) Biotech. Appl. Biochem. 12, 512-516.
79 Maloney, P.C. (1990) Res. Microbiol. 141, 374-383.
80 Neal, R.J. and Chater, K.F. (1987) Gene 58, 229-241.
81 Paulsen, I.T. et al. (1996) Microbiol. Rev. 60, 575-608.
82 Tate, C.G. and Henderson, P.J.F. (1992) J. Biol. Chem. 268, 26850-26857.
83 Lancaster, J.R. (1982) FEBS Lett. 150, 9-18.
84 Palmieri, F. et al. (1990) Biochim. Biophys. Acta 1018, 147-150.
85 Runswick, M.J. et al. (1994) DNA Sequence 4, 281-291.
86 Pacholczyk, T. et al. (1991) Nature 350, 350-354.
87 Cloherty, E.K. (1995) Biochemistry 34, 15395-15406.
88 Pourcher, T. et al. (1990) Philos. Trans. R. Soc. Lond. B 326, 411-423.
89 Henderson, P.J.F. (1992) In Enzyme Assays: A Practical Approach (Eisenthal, R. and Danson, M.J., eds). Oxford U.P., Oxford, pp. 277-316.
90 Tanford, C. (1983) Annu. Rev. Biochem. 52, 379-409.
91 Scarborough, G.A. (1985) Microbiol. Rev. 49, 214-231.
92 Brooker, R.J. (1990) Res. Microbiol. 141, 309-315.
93 Roepe, P.D. et al. (1990) Res. Microbiol. 141, 290-308.
94 Karlin, A. (1997) Proc. Natl Acad. Sci. USA 94, 5508-5509.
95 Jorgensen, P.L. and Anderson, J.P. (1988) J. Membr. Biol. 103, 95-120.
96 Pattus, F. (1990) Curr. Opin. Cell Biol. 2, 681-685.
97 Schertler, G.F. (1997) Molec. Biol. of the Cell 7, 970.
98 Barsukov, I.G. et al. (1990) Eur. J. Biochem. 192, 321-327.
99 Lemmon, M.A. et al. (1994) Nature Struct. Biol. 1, 157-163.
For many years there was little information about the amino acid sequences of membrane transport proteins, owing to the difficulty of obtaining sufficient purified quantities for conventional protein sequencing. This changed during the past decade with the cloning and sequencing of ever increasing numbers of genes and, more recently, entire genomes, from which the amino acid sequences of many integral membrane proteins have been deduced. These transport proteins have been grouped by a number of functional criteria, including mechanism (e.g. sodium-solute symporters 1), topology (e.g. 12-transmembrane helix transporters 2), intracellular location (e.g. mitochondrial transporters 3), and possession of amino acid sequence domains (e.g. ATP binding cassette (ABC) transporters 4). In some instances, the amino acid sequences of proteins grouped by functional criteria are related, for example the mitochondrial phosphate carrier and adenine nucleotide translocator families 3. In other instances, the amino acid sequences of proteins grouped by functional criteria have no apparent relationship to one another, for example most families of sodium-solute symporters 1.

It is potentially instructive to group transport proteins by the relationships between their amino acid sequences because overall similarity between amino acid sequences can indicate similar three-dimensional structures, implying similar mechanisms of action. Algorithms such as FASTA 5 and BLASTP 6 search amino acid sequence databases, for example SwissProt, PIR and GenPept, and list in order of local relatedness proteins whose amino acid sequences are similar to that of the query sequence. The relationships between the amino acid sequences of the proteins identified in this fashion can then be quantitated by pairwise comparison.
The statistical significance of the alignment score for each pairwise comparison is evaluated by comparing it to the mean score obtained from comparison of each sequence to random permutations of the other, and is expressed as the number of standard deviations (SD) by which the maximum score for the real comparison exceeds the mean of the scores for the comparisons to randomized sequences 7. If an alignment score is greater than 9 SD above the mean of randomly permuted sequences the proteins are very likely homologous, scores of 6-9 SD are taken to indicate likely relatedness, and 3-6 SD possible relatedness. The probability of obtaining an alignment score of 9 SD by chance is approximately 10^-18. Therefore, it is likely that homologous members of a family share a common evolutionary origin, implying similar three-dimensional structures and functional properties.

None the less, there are often unexpected differences between the functional attributes of homologous transporters within a family. For example, passive glucose transporters of eukaryotes and proton-dependent sugar transporters of prokaryotes are members of the same family of homologous sugar transporters, implying that the passive transporters evolved relatively recently without extensive sequence modification 2,8. There are also several families in which there is neither a structural nor a chemical relationship between many of the substrates recognized by homologous transporters 8,9. Thus, a perceived difference in function need not be a consequence of a profound difference in structure. In these instances, the structure-activity relationships between functionally dissimilar members of a family can be investigated using algorithms which cluster sequences by similarity to produce a dendrogram representing the clustering relationships. The PILEUP algorithm 10,
used herein, first aligns the two most similar sequences to produce a cluster of two sequences, then aligns this cluster with another cluster of the next two most similar sequences and so on until all sequences have been included in the dendrogram.

Amino acid sequence comparisons also reveal unexpected relationships amongst seemingly dissimilar families of proteins. For example, amino acid sequence elements that are highly conserved in the family that contains facilitative sugar transport proteins of mammals also occur in the family that contains proton-dependent tetracycline antiporters of bacteria 8,9. Although there is not significant similarity between all members of all of these families, there is significant similarity (>3 SD) between many members of different families. When the amino acid sequences of multiple families are significantly similar, the families are presumed to be derived from a common ancestor and are considered subgroups of a superfamily of related transporters 8,9,11.

One of the most functionally diverse superfamilies, the uniporter-symporter-antiporter (USA) or major facilitator (MFS) superfamily, contains uniporters, symporters, and antiporters of structurally dissimilar sugars, sugar phosphate esters, antibiotics, antiseptics, disinfectants, carboxylated compounds, catecholamines and indolamines 8,9,11. The significance of about 40% of the pairwise comparisons between families of the superfamily exceeds 3 SD, and the ALIGN scores for certain pairwise comparisons between families are as high as 8.7 SD, reflecting their presumed common ancestry 8. This predicts that they also have similar three-dimensional structures, suggests fundamentally similar molecular mechanisms, and implies that relatively subtle structural differences account for the differences in the functional properties of the proteins, such as the recognition of structurally dissimilar substrates, or the vectorial mechanism.
As pointed out previously, a perceived profound difference in function need not be a consequence of a profound difference in structure. Multiple sequence alignments generated in this manner often reveal highly conserved "signature motifs" 8,9. These may be unique to either the family or a subgroup of the family, or common to a group of families which share a functional attribute. Signature motifs of the first category can have great utility in assessing the potential relatedness of transporters which are not homologous by the criterion of the alignment score. Signature motifs of the second and third categories, i.e. those that are highly conserved in proteins with a common functional attribute, for example substrate specificity, mode of energization or vectorial mechanism, are predicted to be necessary for that attribute. These predictions can then be tested with site-directed mutagenesis and other molecular-genetic approaches. In only a few instances has it been possible to crystallize integral membrane proteins for molecular structural analysis. Therefore, most investigations of the structure-activity relationships of transport proteins have been founded on amino acid sequence comparisons of this sort.

Signature motifs that are conserved in all transporters of a superfamily may dictate structural or functional attributes that are common to all members of the superfamily. For example, alignment of the consensus sequences of the several families comprising the USA/MFS superfamily identifies several amino acid sequence motifs which are highly conserved in all or some of these diverse transporters. A "G-X-X-X-D-R/K-X-G-R-R/K" motif, which is strongly predicted to form a β-turn in most cases, is highly conserved between the second and third predicted helices of transporters in all families of the USA/MFS superfamily 2,8,9. The "G-X-X-X-D-R/K-X-G-R-R/K"
motif has been proposed to act as a cytoplasmic gate that limits the flow of substrate into and out of the cytoplasm. Site-directed and insertional mutagenesis of the TETA(B) tetracycline/H+ antiporter and LACY lactose/H+ symporter have demonstrated that several of the conserved residues of the motif are necessary for function. Similarly, a "R-X-X-X-G-X-X-X-G/A" motif is conserved in the fourth predicted helix and the preceding predicted extracellular hydrophilic loop of transporters in all families of the USA/MFS superfamily 2,8,9. The "R-X-X-X-G-X-X-X-G/A" motif has been proposed to function in energy coupling. In the ATP binding cassette (ABC) superfamily, the "G-H-S-G-A-G-K-S-T" and "I-L-L-D-E" motifs, the so-called Walker A and B motifs, define the superfamily. These motifs, the first of which is known to be involved in phosphoryl transfer, are shared by many nucleotide binding proteins 4.

Although overall amino acid sequence relatedness and the conservation of signature motifs provide strong presumptive evidence that two proteins have related functions, this is not always the case. For example, signature motifs corresponding to the ATP binding domains define the ATP binding cassette (ABC) transporter superfamily 4. However, these domains are also found in at least two families of the ABC superfamily that are neither associated with the membrane nor implicated in transport. These are the UVRA family of DNA excision repair proteins and the EF3 family of translational elongation factors. Thus, the conservation of an extended functional domain in two proteins, in this case the ATP binding cassette, does not by itself indicate that the two proteins have related functions, although in most instances this is true.

The second category of signature motif is conserved in, and thereby can define, subgroups of a superfamily.
These motifs may dictate the shared structural or functional properties of the subset, such as substrate specificity or vectorial mechanism, predictions that also can be tested by site-specific mutagenesis. For example, a "G-X-X-X-G-P-X-X-G" motif is highly conserved in the fifth predicted membrane-spanning region of transporters of all families of the USA/MFS superfamily which direct substrate export, but not in any of the transporter families which direct substrate uptake 8,9,12. Molecular modeling of the so-called "antiporter motif" predicts that a "kink" at approximately the position of the GP dipeptide, resulting in a change in helix axis direction of approximately 20 degrees, would be more stable than a regular helical conformation. The repeating pattern of glycine residues in the antiporter motif also forms a pocket, devoid of side-chains, on the surfaces of the fifth predicted helices. Site-directed mutagenesis experiments indicate that even very slight alterations in the structure of this motif, for example replacement of the hydrogen of glycine with either the small methyl side-chain of alanine or the methylol side-chain of serine, have profound and specific effects on resistance to tetracycline 12.

Intramolecular amino acid sequence comparisons are also useful in investigating structure-activity relationships. For example, there are significant similarities between the amino acid sequences of the N- and C-terminal halves of transporters in many families and superfamilies, including the acriflavin-cation resistance family and the USA/MFS superfamily 8. This implies that these proteins arose by the duplication of a half-sized ancestor, suggesting that the N- and C-terminal halves of the transporters might have evolved to contain independent functional domains. This prediction was confirmed for the USA/MFS superfamily by demonstrating that paired in-frame deletion constructs of the E. coli LACY lactose/H+ symporter
complement each other functionally 13. Using similar methods, two functional complementation groups also have been defined in the TETA(B) tetracycline/H+ antiporter, which belongs to a different family from LACY 14,15.

Intramolecular amino acid sequence comparisons have also shown that the N-terminal halves of distantly related transporters of the USA/MFS superfamily are generally much more similar than the C-terminal halves, provided the proteins being compared have structurally dissimilar substrates 8. Thus, the greater conservation of the N-terminal halves of transporters that recognize structurally dissimilar substrates has been interpreted to reflect the conservation of structures which confer the substrate binding-induced conformational change that is proposed to be common to these transporters' mechanism of action. The C-terminal halves of transporters that recognize structurally dissimilar substrates are much less conserved than their N-terminal halves, a situation frequently reversed when transporters that recognize structurally similar substrates are considered. These observations support the interpretation that substrate specificity is determined by sequence motifs contained in the C-terminal halves of these transporters. Consistent with this possibility, inhibitor, photo-affinity labeling and domain exchange studies suggest that the substrate binding sites for the USA/MFS superfamily's sugar transporters are located in their C-terminal halves 16. Likewise, mutations resulting in altered substrate specificities in various antibiotic antiporters have been found primarily in the C-terminal halves of the proteins 9.
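The signature motifs quoted in this chapter are written in a compact notation in which "X" stands for any residue and "R/K" for either of two residues. As a minimal sketch, assuming that notation maps directly onto a regular expression, a motif can be located in a sequence as follows. The function names and the toy sequence are hypothetical and are not taken from any published program or database entry; exact matching is also stricter than the "highly conserved" criterion used in the text, which tolerates occasional substitutions.

```python
import re


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Translate a dash-separated motif, e.g. 'G-X-X-X-D-R/K-X-G-R-R/K',
    into a compiled regular expression.

    'X' is treated as any residue and 'R/K' as 'R or K' (assumed notation).
    """
    parts = []
    for position in motif.split("-"):
        if position == "X":
            parts.append(".")
        elif "/" in position:
            parts.append("[" + position.replace("/", "") + "]")
        else:
            parts.append(position)
    return re.compile("".join(parts))


def find_motif(motif: str, sequence: str) -> list:
    """Return the 1-based start position of every match, including overlaps."""
    # A lookahead lets overlapping occurrences be reported as well.
    lookahead = re.compile("(?=" + motif_to_regex(motif).pattern + ")")
    return [match.start() + 1 for match in lookahead.finditer(sequence)]


# Hypothetical toy sequence invented for illustration; not a real transporter.
toy_sequence = "MAAGLLIDRFGRRAAAV"
print(find_motif("G-X-X-X-D-R/K-X-G-R-R/K", toy_sequence))
```

A search of this kind only flags candidate sites; whether a match lies between the second and third predicted helices, as described for the USA/MFS superfamily, still has to be checked against a topology prediction.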
References
1 Reizer, J. et al. (1994) Biochim. Biophys. Acta 1197, 133-166.
2 Henderson, P.J.F. (1993) Curr. Opin. Cell Biol. 5, 708-721.
3 Kuan, J. and Saier, M. (1993) CRC Crit. Rev. Biochem. Mol. Biol. 28, 209-233.
4 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
5 Lipman, D. and Pearson, W. (1985) Science 227, 1435-1441.
6 Altschul, S. et al. (1990) J. Mol. Biol. 215, 403-410.
7 Dayhoff, M. et al. (1983) Methods Enzymol. 91, 524-545.
8 Griffith, J. et al. (1992) Curr. Opin. Cell Biol. 4, 684-695.
9 Paulsen, I. et al. (1996) Microbiol. Rev. 60, 575-608.
10 Devereaux, J. et al. (1984) Nucleic Acids Res. 12, 387-395.
11 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
12 Varela, M. et al. (1995) Mol. Memb. Biol. 12, 313-319.
13 Bibi, E. and Kaback, H.R. (1990) Proc. Natl Acad. Sci. USA 87, 4325-4329.
14 Rubin, R.A. and Levy, S.B. (1991) J. Bacteriol. 173, 4503-4509.
15 Yamaguchi, A. et al. (1993) FEBS Lett. 324, 131-135.
16 Carruthers, A. (1990) Physiol. Rev. 70, 1135-1176.
3 Organization of the Data
INTRODUCTION
Two kinds of information are provided in The Transporter FactsBook. The first is a compilation of the physical and biological properties of nearly 800 transport proteins. Although every attempt was made to make this compilation comprehensive, some sequences were not included, either by design (see below) or by unintentional omission. Moreover, new transporter sequences are being added to the databases on a near daily basis. Thus, this information is best viewed as a representative, rather than an exhaustive, overview of the characteristics of membrane transport proteins.

The second kind of information is a comparison of the physical and biological properties of more than 50 families of transport proteins defined by the relatedness of their amino acid sequences. These data provide rational bases for grouping proteins and identifying relationships between their structures and functions. A key feature of these data is the consensus amino acid sequence that has been provided for each transporter family or group of families. These are displayed in the multiple amino acid sequence alignments and also in the plots of the predicted topologies. The former indicate what kinds of substitutions are permitted at a conserved residue, while the latter present the conserved residues in the context of predicted structure. The consensus sequences provide a means to classify newly identified transporters, particularly when they are not closely related to known proteins. They also define sequence elements that are conserved in multiple families with a common functional characteristic, and therefore may be necessary for the expression of that characteristic. These data are useful in predicting the locations of individual structural or functional domains, and in designing experiments to test these predictions with site-directed mutagenesis or other techniques.
Because the predictive value of the correlation between a signature sequence and a specific functional characteristic increases with the addition of each new sequence to the family, this information, rather than becoming outdated, will in fact become even more valuable as it is refined by the addition of new transporter sequences.
DEFINITION OF FAMILY
The FASTA and BLASTP algorithms 1,2 were used with default parameters to search the SwissProt, Protein Identification Resource (PIR) and GenBank/EMBL GenPept protein sequence databases for transport proteins that share local similarity with any of several query sequences representative of known classes of transport proteins. The overall (versus local) similarities of the proteins identified in each search were then quantified by pairwise comparisons using the ALIGN 3 algorithm. ALIGN calculates a score for the best alignment between any pair of sequences using an empirically derived scoring matrix and two types of penalties for breaking a sequence. The first, the gap penalty, is applied every time a gap is inserted, regardless of the length of the gap. The second, the bias, is applied according to the length of the gap. The ALIGN program utilized the normalized Dayhoff 250 PAM mutational matrix, a gap penalty of 6.0 and a bias of 6.0.
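The two penalties described above can be illustrated with a small global-alignment scorer. This is a sketch under stated assumptions, not the ALIGN program itself: a simple match/mismatch score stands in for the Dayhoff 250 PAM matrix, the "bias" is assumed to be charged once per gapped residue, and the function name and default values are hypothetical, chosen only to keep the toy example readable.

```python
NEG_INF = float("-inf")


def global_alignment_score(a, b, match=2.0, mismatch=-1.0,
                           gap_penalty=3.0, bias=1.0):
    """Best global alignment score where a gap of length L costs
    gap_penalty + bias * L (one fixed charge per gap, one per gapped residue).
    """
    n, m = len(a), len(b)
    # M: last column aligns two residues; X: residue of a against a gap;
    # Y: residue of b against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_penalty + bias * i)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_penalty + bias * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_penalty + bias; extending pays bias only.
            X[i][j] = max(M[i - 1][j] - gap_penalty - bias,
                          X[i - 1][j] - bias,
                          Y[i - 1][j] - gap_penalty - bias)
            Y[i][j] = max(M[i][j - 1] - gap_penalty - bias,
                          Y[i][j - 1] - bias,
                          X[i][j - 1] - gap_penalty - bias)
    return max(M[n][m], X[n][m], Y[n][m])


print(global_alignment_score("ACDE", "ADE"))   # three matches, one gap of length 1
print(global_alignment_score("AAAA", "AA"))    # two matches, one gap of length 2
```

Because the fixed charge is paid once per gap, a single gap of length two is cheaper than two separate gaps of length one, which is the behaviour the gap penalty is meant to produce.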
The statistical significance of each alignment score was evaluated by comparing it to the mean score obtained from comparison of each sequence to 100 random permutations of the other sequence, and is expressed as the number of standard deviations (SD) by which the maximum score for the real comparison exceeds the mean of the scores for the randomized sequences. Pairs of proteins with ALIGN scores in excess of 9 SD were considered homologous, i.e. having a common evolutionary origin 4, and together constituted a "family" 5. Hypothetical proteins, the open reading frames of unidentified genes, and partial sequences are not included. Proteins identified in each FASTA or BLASTP search that had ALIGN scores less than 9 SD with the query sequence were used as query sequences for succeeding FASTA and BLASTP searches. Additional families of homologous sequences were again identified by pairwise comparisons using ALIGN. This process was repeated until all transport proteins identified by the successive FASTA and BLASTP searches were assigned to families. "Orphan transporters", proteins which are not homologous to any other transporter in the database, were not included.
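The permutation test described above can be sketched in a few lines. The version below is illustrative only: it shuffles just one of the two sequences, and it uses a toy identity count in place of an ALIGN score. The function names, the toy scorer and the fixed random seed are all assumptions introduced here; the 6 SD and 3 SD bands are those quoted in the preceding chapter.

```python
import random
import statistics


def identity_score(a, b):
    """Toy stand-in for an alignment score: identical residues at the
    same position, with no gaps allowed."""
    return sum(x == y for x, y in zip(a, b))


def sd_score(seq_a, seq_b, score_fn=identity_score, n_shuffles=100, seed=0):
    """Number of standard deviations by which the real score exceeds the
    mean score against random permutations of seq_b."""
    rng = random.Random(seed)
    real = score_fn(seq_a, seq_b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        residues = list(seq_b)
        rng.shuffle(residues)  # same composition, randomized order
        shuffled_scores.append(score_fn(seq_a, "".join(residues)))
    spread = statistics.stdev(shuffled_scores)
    if spread == 0:
        raise ValueError("shuffled scores do not vary; SD score is undefined")
    return (real - statistics.mean(shuffled_scores)) / spread


def relatedness_category(sd):
    """Map an SD score onto the bands used in the text."""
    if sd > 9:
        return "homologous"
    if sd >= 6:
        return "likely related"
    if sd >= 3:
        return "possibly related"
    return "not significant"


example = "ACDEFGHIKLMNPQRSTVWY" * 2  # hypothetical sequence, illustration only
print(relatedness_category(sd_score(example, example)))
```

Shuffling preserves amino acid composition, so the score measures similarity of residue order rather than of composition alone, which matters for membrane proteins rich in hydrophobic residues.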
GROUPING OF FAMILIES
Families with seemingly similar activities, e.g. "H+ symporters" or "P-type ATPases", were grouped together in a section. However, the reader should bear in mind that transporters with similar functions do not necessarily have related amino acid sequences and vice versa.
ORGANIZATION OF THE DATA
Summary
The summary provides an overview of the physical and biological properties of the family, its distribution in nature, its relationship to other families, and known disease associations.
Nomenclature, biological sources and substrates
Each sequence in a family was assigned an eight- or nine-character alphanumeric code. This code was derived from three or four characters taken from the protein name, the first three characters from the genus name and the first two characters from the species name. For example, the code for the XYLE transporter of Escherichia coli is Xyleescco. In a few cases, where the species is unknown, the last two characters are "sp". In many sequences found in the SwissProt database - the main exceptions being sequences from very common higher eukaryotes (e.g. human, rat, cattle) - the sequence code is equivalent to the SwissProt code without the underscore separating the parts describing the protein and its source. Tabulated information for sequences currently present only in the EMBL/GenBank databases refers to the GenPept translations of the gene sequences.

The "Description" of each protein, taken directly from the sequence database, is listed in the second column. All known synonyms, including gene names, are included within square brackets below the description in the second column. "Organism", listed in the third column, refers to the Latin name of the species; the common name of the species, or (for most unicellular organisms) a classification such as "gram-negative bacterium" or "yeast", is included within square brackets in the third column. Substances listed in the "Substrate" column are known to be transported across the membrane. Where a protein is only known to confer resistance to a toxic compound, the compound's name is given in this column in square brackets. Where the mechanism of transporter action is known to be symport or antiport, the coupled ions are also listed here.
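The naming rule for sequence codes can be written as a short function. This is a sketch that assumes the protein part is simply the first three or four characters of the protein name; the text does not say how those characters are chosen, and the SwissProt exceptions for common higher eukaryotes are not handled. The function name is hypothetical.

```python
def sequence_code(protein, genus, species=None):
    """Build a code such as 'Xyleescco' from a protein name, a genus and a
    species. An unknown species is written as 'sp', as in the text.

    Assumption: the protein part is the first (up to) four characters of
    the protein name.
    """
    protein_part = protein[:4]
    genus_part = genus[:3]
    species_part = species[:2] if species else "sp"
    # Codes in the book are capitalized on the first letter only.
    return (protein_part + genus_part + species_part).capitalize()


print(sequence_code("XYLE", "Escherichia", "coli"))
print(sequence_code("ATC4", "Saccharomyces", "cerevisiae"))
```

Under this assumption the function gives "Xyleescco" for the XYLE example quoted above, and "Atc4sacce" for the yeast calcium-transporting ATPase 4 cited in the first family summary.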
Phylogenetic trees
Phylogenetic trees were constructed for all families containing more than two members using the PILEUP algorithm 7 with default parameters. Proteins more than 90% identical to at least one other member of the family are indicated in the text by italics and are not included in the phylogenetic trees.
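The way a dendrogram is built up from pairwise similarities can be sketched as follows. This is a simplified stand-in and not the PILEUP implementation: it assumes average-linkage clustering, takes a ready-made table of pairwise similarity scores rather than computing alignments, and uses invented names and scores for illustration.

```python
from itertools import combinations


def average_linkage_tree(names, similarity):
    """Repeatedly merge the two most similar clusters and return the
    joining order as a nested tuple.

    similarity maps frozenset({a, b}) to a pairwise similarity score.
    """
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            members_i, members_j = clusters[i][1], clusters[j][1]
            total = sum(similarity[frozenset((a, b))]
                        for a in members_i for b in members_j)
            mean = total / (len(members_i) * len(members_j))
            if best is None or mean > best[0]:
                best = (mean, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical similarity scores for four invented sequences A-D.
scores = {
    frozenset(("A", "B")): 0.90, frozenset(("C", "D")): 0.80,
    frozenset(("A", "C")): 0.30, frozenset(("A", "D")): 0.20,
    frozenset(("B", "C")): 0.25, frozenset(("B", "D")): 0.20,
}
print(average_linkage_tree(["A", "B", "C", "D"], scores))
```

In this toy case A and B are joined first, then C and D, and the two pairs are joined last, which is the order a dendrogram of these scores would display.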
Topology plots
Each topology plot is derived from a single, typical member of a transporter family. In most cases, the predicted membrane-spanning regions, indicated in the figures by the shaded rectangles, and the interhelical loops, indicated in the figures by thin solid lines, are identified from hydropathy plots and analysis of α-helix-forming propensity; in a few cases, these predictions are supported by experimental evidence derived from reporter fusions, susceptibility to proteolytic cleavage, reactivity with peptide-specific antibodies or scanning glycosylation mutagenesis. The number of the first and last residue of each predicted membrane helix is boxed. In families with more than two members, and unless there is a very high percentage identity between all family members (more than 50% of the sequence is identical in at least 75% of the proteins), the locations and identities of residues conserved in more than 75% of family members are indicated on the topology plots. Not all residues that are conserved in a family are necessarily conserved in the representative transporter shown in the topology plot. In these instances, the residue is indicated with an asterisk.

In the ABC transporter superfamily, the active transporters consist of four domains: two ATP binding domains and two transmembrane domains. These four domains may be expressed as separate chains or fused to form multidomain proteins: almost every conceivable type of domain fusion has been found 6. The sequence motifs characteristic of this superfamily are found in the ATP binding domains. In families in which the ATP binding domains are expressed separately from the transmembrane domains, the tables and alignments describe the cytoplasmic ATP binding domains associated with the transmembrane domains. Since the former chains do not cross the membrane, no topology plots are included for these families. There is great variability in the relatedness of the separately expressed transmembrane domains.
Some of the chains containing these transmembrane domains constitute discrete families of homologous proteins, for example the ABC-associated binding protein-dependent maltose, peptide and iron transporter families. Other chains are no more similar to one another than would be expected for non-related transmembrane proteins which contain many highly hydrophobic regions. These are not included.
Physical and genetic characteristics
Molecular weights and sequence lengths (in amino acids) are listed for all proteins. When available, the proteins' principal expression sites (tissue or organ specificity), Michaelis constants (Km) and chromosomal loci are listed. Where a bacterial sequence is known to be plasmid-encoded, this is also indicated. The chromosomal loci for humans, Escherichia coli, Haemophilus influenzae, Saccharomyces cerevisiae and Bacillus subtilis are taken from the Online Mendelian Inheritance in Man, Encyclopedia of E. coli Genes and Metabolism, Encyclopedia of Haemophilus influenzae Genes and Metabolism, Saccharomyces Genomic Information Resource and the Bacillus subtilis Genomic Databases, respectively.
Multiple amino acid sequence alignments Multiple amino acid sequence alignments were calculated using the PILEUP algorithm 7 with default parameters. The consensus sequences list residues present in at least 75% of the aligned sequences. Conservative substitutions were not taken into account. To ensure that the consensus sequences are not biased by the contribution of very closely related sequences, proteins more than 90% identical to at least one other member of the family (indicated in the text in italics) were not included in the alignments. Residues within the consensus sequence that are also conserved in at least one other family are indicated in bold type.
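The 90% redundancy rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: identity is computed here over the aligned columns in which both sequences have a residue, sequences are screened greedily in the order given, and the function names, sequence names and toy sequences are all hypothetical.

```python
def percent_identity(a, b, gap="."):
    """Percent identity over aligned columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def drop_redundant(aligned, cutoff=90.0):
    """Keep a sequence unless it is more than `cutoff` percent identical
    to a sequence that has already been kept (greedy, in input order)."""
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= cutoff for other in kept.values()):
            kept[name] = seq
    return kept


# Hypothetical aligned toy sequences (not taken from the tables in this book).
toy = {
    "seqA": "MKTLLVAGGA",
    "seqB": "MKTLLVAGGA",  # identical to seqA, so it is dropped
    "seqC": "MRSLIVSGNA",  # matches seqA in 5 of 10 columns, so it is kept
}
print(list(drop_redundant(toy)))
```

Which member of a closely related group is retained depends on the input order in this sketch; the text above does not say how the representative was chosen.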
Database accession numbers Information for each transporter was abstracted from the files in the SwissProt, PIR and EMBL/GENBANK databases identified by the accession numbers. No more than two accession numbers for each database are included. SwissProt was used as the primary data source as it is an extremely well annotated database.
References Supplemental references cited in the summary and recent reviews, when available, are listed at the end of each chapter. Reviews are shown in bold type.
References
1 Lipman, D. and Pearson, W. (1985) Science 227, 1435-1441.
2 Altschul, S. et al. (1990) J. Mol. Biol. 215, 403-410.
3 Dayhoff, M. et al. (1983) Methods Enzymol. 91, 524-545.
4 Reeck, G. et al. (1987) Cell 40, 667.
5 Griffith, J. et al. (1992) Curr. Opin. Cell Biol. 4, 684-695.
6 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
7 Devereux, J. et al. (1984) Nucleic Acids Res. 12, 387-395.
THE MEMBRANE TRANSPORT PROTEINS
P-Type ATPases
Summary
Calcium-transporting ATPase family
Transporters of the calcium-transporting ATPase family, an example of which is the probable calcium-transporting ATPase 4 from Saccharomyces cerevisiae (Atc4sacce), mediate active transport of calcium ions, driven by ATPase activity (EC 3.6.1.38). ATPase 4 may be involved in ribosome assembly 1. Members of this family are found only in yeasts. Statistical analysis of multiple amino acid sequence comparisons places the calcium-transporting ATPase family in the P-type ATPase superfamily (also known as E1-E2 ATPases 2,3). Proteins in this superfamily use the energy of ATP hydrolysis to pump ions across cell membranes. The hydropathy of their amino acid sequences predicts that all P-type ATPases contain at least six transmembrane helices. They have two large cytoplasmic loops separating three pairs of transmembrane helices; the larger of these loops contains the ATP binding domain. The sequences are usually extended by one or more pairs of helices. The calcium-transporting ATPase from Schizosaccharomyces pombe 4 is predicted to contain a total of 12 transmembrane helices. Many residues and some short sequence motifs are completely conserved within the calcium-transporting ATPase family, including motifs unique to the family and signature motifs of the P-type ATPase superfamily.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                         ORGANISM [COMMON NAMES]            SUBSTRATE(S)
Atc4sacce  Probable calcium-transporting ATPase 4 [DRS2, YAL026C, FUN38]  Saccharomyces cerevisiae [yeast]   Ca2+
Atc5sacce  Probable calcium-transporting ATPase 5 [YER166W, SYGP-ORF7]    Saccharomyces cerevisiae [yeast]   Ca2+
Atcxschpo  Probable calcium-transporting ATPase                           Schizosaccharomyces pombe [yeast]  Ca2+
Phylogenetic tree
[Figure: phylogenetic tree of Atc5sacce, Atcxschpo and Atc4sacce]
Proposed orientation of ATCX4 in the membrane The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded ten times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: topology plot with the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini labelled]
Physical and genetic characteristics
           AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Atc4sacce  1355         153 844  Chromosome 1
Atc5sacce  1571         177 797
Atcxschpo  1402         159 355
Multiple amino acid sequence alignments
          1                                                   50
Atc5sacce MSGTFHGDGH APMSPFEDTF QFEDNSSNED THIAPTHFDD GATSNKYSRP
Atcxschpo .......... .......... .......... .......... ..........
Atc4sacce .......... ....MNDDRE TPPKRKPGED DTLFDIDFLD DTTSHSGSRS
Consensus .......... .......... .......... .......... ..........
          51                                                 100
Atc5sacce QVSFNDETPK NKREDAEEFT FNDDTEYDNH SFQPTPKLNN GSGTFDDVEL
Atcxschpo .......... .......... .......... .......... ....MESVEE
Atc4sacce KVTNSHANGY YIPPSHVLPE ETIDLDADDD NIENDVHENL FMSNNHDDQT
Consensus .......... .......... .......... .......... ..........

          101                                                150
Atc5sacce DNDSGEPHTN .YDG..MKRF RMGTKRNKKG NPIMGRSKTL KWARKNIPNP
Atcxschpo KSKQRRWLPN .FKALRLKVY RLADRLNI.. .PLADAARV. .........E
Atc4sacce SWNANRFDSD AYQPQSLRAV KPPGLFARFG NGLKNA...F TFKRKKGPES
Consensus .......... .......... .......... .......... ..........
          151                                                200
Atc5sacce FE.....DFT KDDIDPGAIN RAQEL.RTVY YN.MPLPKDM IDEEGNPIMQ
Atcxschpo LE.....EY. .DGSDPQSLR GLQKLPRTLY FG.LPLPDSE LDDTGEAKRW
Atc4sacce FEMNHYNAVT NNELDDNYLD SRNKFNIKIL FNRYILRKNV GDAEGNGEPR
Consensus .E........ ....D..... .......... .....L.... .D..G.....
          201                                                250
Atc5sacce .......... ....YPRNKI RTTKYTPLTF LPKNILFQFH NFANVYFLVL
Atcxschpo .......... ....FPRNKI RTAKYTPIDF IPKNIFLQFQ NVANLFFLFL
Atc4sacce VIHINDSLAN SSFGYSDNHI STTKYNFATF LPKFLFQEFS KYANLFFLCT
Consensus .......... .......N.I .T.KY....F .PK.....F. ..AN..FL..
          251                                                300
Atc5sacce IILGAFQIFG .VTNPGLSAV PLVVIVIITA IKDAIEDSRR TVLDLEVNNT
Atcxschpo VILQSISIFG EQVNPGLAAV PLIVVVGITA VKDAIEDFRR TMLDIHLNNT
Atc4sacce SAIQQVPHVS .PTNRYTTIG TLLVVLIVSA MKECIEDIKR ANSDKELNNS
Consensus .......... ...N...... .L.V.....A .K..IED..R ...D...NN.
          301                                                350
Atc5sacce KTHILEGVEN ENVSTDNISL WRRFKKANSR LLFKFIQYCK EHLTEEGKKK
Atcxschpo PTLRLSHYQN PNIRTEYISY FRRFKKRISA LFRVF..... ..LAKQEEKK
Atc4sacce TAEIFSEAHD DFVEKRWI.. .......... .......... ..........
Consensus .......... .......I.. .......... .......... ..........
          351                                                400
Atc5sacce RMQRKRHELR VQKTVGTSGP RSSLDSI..D SY..RVSADY GRPSLDYDNL
Atcxschpo RAKRLNDAVP LED.MAGSES RPSYDSIFRE SFEAKRSFED SKGKVPLSAL
Atc4sacce .......... .......... .......... .......... ..........
Consensus .......... .......... .......... .......... ..........

          401                                                450
Atc5sacce EQGAG..... EANIVDRSLP PRTDCKFAKN YWKGVKVGDI VRIHNNDEIP
Atcxschpo DGTATILQSR PMDIIDYEAE ATGECHFKKT YWKDVRVGDF VKVMDNDEIP
Atc4sacce .......... .......... .......... ...DIRVGDI IRVKSEEPIP
Consensus .......... .......... .......... ......VGD. ........IP
          451                                                500
Atc5sacce ADIILLSTSD TDGACYVETK NLDGETNLKV RQSLKCTNTI RTSKDIARTK
Atcxschpo ADIVIINSSD PEGICYIETK NLDGETNLKM RHALTCGKNV VDEASCERCR
Atc4sacce ADTIILSSSE PEGLCYIETA NLDGETNLKI KQSRVETAKF IDVKTLKNMN
Consensus AD......S. ..G.CY.ET. NLDGETNLK. .......... ..........

          501                                                550
Atc5sacce FWIESEGPHS NLYTYQGNMK W..RNLADG. ...EIRNEPI TINNVLLRGC
Atcxschpo FWIESEPPHA NLYEYNGACK SFVHSEAGGS DTSQTVSEPI SLDSMLLRGC
Atc4sacce GKVVSEQPNS SLYTYEGTM. .......... .TLNDRQIPL SPDQMILRGA
Consensus ....SE.P.. .LY.Y.G... .......... ........P. ......LRG.

          551                                                600
Atc5sacce TLRNTKWAMG VVMFTGGDTK IMLNSGITPT KKSRISRELN FSVVINFVLL
Atcxschpo VLRNTKWVIG VVVFTGDDTK IMLNSGAPPL KRSRITRNLN WNVYLNFIIL
Atc4sacce TLRNTAWIFG LVIFTGHETK LLRNATATPI KRTAVEKIIN RQIIRLFTVL
Consensus .LRNT.W..G .V.FTG..TK ...N....P. K........N ......F..L
          601                                                650
Atc5sacce FILCFVSCIA NGVYYDKKGR S.RFSYEFGT IAGSAATNGF VSFWVAVILY
Atcxschpo FSMCFVCAVV EGIAWRGHSR S.SYYFEFGS IGGSPAKDGV VTFFTGVILF
Atc4sacce IVLILISSIG NVIMSTADAK HLSYLYLEGT NKAGLFFKDF LTFW...ILF
Consensus .......... .......... ........G. .......... ..F....IL.

          651                                                700
Atc5sacce QSLVPISLYI SVEIIKTAQA AFIYGDVLLY NAKLDYPCTP KSWNISDDLG
Atcxschpo QNLVPISLYI SIEIVKTIQA IFIYFDKDMY YKKLKYACTP KSWNISDDLG
Atc4sacce SNLVPISLFV TVELIKYYQA FMIGSDLDLY YEKTDTPTVV RTSSLVEELG
Consensus ..LVPISL.. ..E..K..QA ..I..D...Y ..K....... ........LG

          701                                                750
Atc5sacce QVEYIFSDKT GTLTQNVMEF KKCTINGVSY GRAYTEALAG LRKRQGIDVE
Atcxschpo QVEYIFSDKT GTLTQNVMEF KKCTINGVAY GEAFTEAMAG MAKREGKDTE
Atc4sacce QIEYIFSDKT GTLTRNIMEF KSCSIAGHCY IDKIPE.... ..........
Consensus Q.EYIFSDKT GTLT.N.MEF K.C.I.G..Y .....E.... ..........
          751                                                800
Atc5sacce TEGRREKAEI AKDRDTMIDE LRALSGNSQF YPEEVTFVSK EFVRDLKGAS
Atcxschpo ELTLQKQSFI ERDRMQMISQ MRNMHDNKYL VDDNLTFISS QFVHDLAGKA
Atc4sacce .......... ..DKTATVED .......... .GIEVGYRKF DDLKKKLNDP
Consensus .......... ..D....... .......... .......... ..........

          801                                                850
Atc5sacce GEVQQRCCEH FMLALALCHS VLVEANPDNP KKLDLKAQSP DEAALVATAR
Atcxschpo GEEQSLACYE FFLALALCHS VVADRVGD.. .RIVYKAQSP DEAALVGTAR
Atc4sacce SDEDSPIIND FLTLLATCHT VIPEFQSDGS ..IKYQAASP DEGALVQGGA
Consensus .......... F...LA.CH. V......D.. ......A.SP DE.ALV....

          851                                                900
Atc5sacce DVGFSFVGKT KK..GLIIEM QGIQKEFEIL NILEFNSSRK RMSCIVKIPG
Atcxschpo DVGFVFLDQR RD..IMVTRA LGETQRFKLM DTIEFSSARK RMSVIVK...
Atc4sacce DLGYKFIIRK GNSVTVLLEE TGEEKEYQLL NICEFNSTRK RMSAIFRF..
Consensus D.G..F.... .......... .G........ ...EF.S.RK RMS.I.....
          901                                                950
Atc5sacce LNPGDEPRAL LICKGADSII YSRLSRQSGS NSEAILEK.T ALHLEQYATE
Atcxschpo ...GPDNRYV LICKGADSII FERL....EP NEQVELRKTT SEHLRIFALE
Atc4sacce ....PDGSIK LFCKGADTVI LERLDDEANQ YVEATMR... ..HLEDYASE
Consensus .......... L.CKGAD..I ..RL...... .......... ..HL...A.E
          951                                               1000
Atc5sacce GLRTLCIAQR ELSWSEYEKW NEKYDIAAAS LANREDELEV VADSIERELI
Atcxschpo GLRTLCIAKR ELTEEEYYEW KEKYDIAASA IENREEQIEE VADLIESHLT
Atc4sacce GLRTLCLAMR DISEGEYEEW NSIYNEAATT LDNRAEKLDE AANLIEKNLI
Consensus GLRTLC.A.R .....EY..W ...Y..AA.. ..NR...... .A..IE....

          1001                                              1050
Atc5sacce LLGGTAIEDR LQDGVPDCIE LLAEAGIKLW VLTGDKVETA INIGFSCNLL
Atcxschpo LLGGTAIEDR LQEGVPDSIA LLAQAGIKLW VLTGDKMETA INIGFSCNLL
Atc4sacce LIGATAIEDK LQDGVPETIH TLQEAGIKIW VLTGDRQETA INIGMSCRLL
Consensus L.G.TAIED. LQ.GVP..I. .L..AGIK.W VLTGD..ETA INIG.SC.LL
          1051                                              1100
Atc5sacce NNEMELLVIK TTGDDVKEFG SEPSEIVDAL LSKYLKEYFN LTGSEEEIFE
Atcxschpo DAGMDMIKF. ....DVDQEV STPE..LEVI LADYLYRYFG LSGSVEELEA
Atc4sacce SEDMNLLIIN EETRDDTE.. .......... .RNLLEKINA LNEHQLSTHD
Consensus ...M...... ....D..... .......... ....L..... ..........
          1101                                              1150
Atc5sacce AKKDHEFPKG NYAIVIDGDA LKLALYGEDI RRKFLLLCKN CRAVLCCRVS
Atcxschpo AKKDHDTPSG SHALVIDGSV LKRVL.DGPM RTKFLLLCKR CKAVLCCRVS
Atc4sacce MK........ SLALVIDGKS LGFAL.EPEL EDYLLTVAKL CKAVICCRVS
Consensus .K........ ..ALVIDG.. L...L..... ....L...K. C.AV.CCRVS
          1151                                              1200
Atc5sacce PSQKAAVVKL VKDSLDVMTL AIGDGSNDVA MIQSADVGIG IAGEEGRQAV
Atcxschpo PAQKADVVQL VRESLEVMTL AIGDGANDVA MIQKADIGVG IVGEEGRAAA
Atc4sacce PLQKALVVKM VKRKSSSLLL AIASGANDVS MIQAAHVGVG ISGMEGMQAA
Consensus P.QKA.VV.. V........L AI..G.NDV. MIQ.A..G.G I.G.EG..A.

          1201                                              1250
Atc5sacce MCSDYAIGQF RYLARLVLVH GRWSYKRLAE MIPEFFYKNM IFALALFWYG
Atcxschpo MSADYAIGQF RFLSKLVLVH GRWDYNRVAE MVNNFFYKSV VWTFTLFWYQ
Atc4sacce RSADIALGQF KFLKKLLLVH GSWSYQRISV AILYSFYKNT ALYMTQFWYV
Consensus ...D.A.GQF ..L..L.LVH G.W.Y.R... .....FYK.. ......FWY.
          1251                                              1300
Atc5sacce IYNDFDGSYL YEYTYMMFYN LAFTSLPVIF LGILDQDVND TISLVVPQLY
Atcxschpo IYNNFDANYL FDYTYVMLFN LIFSSLPVIV MGVYDQDVNA DLSLRIPQLY
Atc4sacce FANAFSGQSI MESWTMSFYN LFFTVWPPFV IGVFDQFVSS RLLERYPQLY
Consensus ..N.F..... .........N L.F...P... .G..DQ.V.. ......PQLY
          1301                                              1350
Atc5sacce RVGILRKEWN QRKFLWYMLD GLYQSIICFF FPYLVYHKNM IVTSNGLGLD
Atcxschpo KRGILQLNSA RKIFIGYMLD GFYQSVICFF FSFLVINNVT TAAQNGRDTM
Atc4sacce KLGQKGQFFS VYIFWGWIIN GFFHSAIVFI GTILIYRYGF ALNMHGELAD
Consensus ..G....... ...F...... G...S.I.F. ...L...... .....G....
          1351                                              1400
Atc5sacce HR.YFVGVYV TTIAVISCNT YVLLHQY.RW DWFSGLFIAL SCLVVF.AWT
Atcxschpo AV.QDLGVYV AAPTIMVVDT YVILNQS.NW DVFSIGLWAL SCLTFW.FWT
Atc4sacce HWSWGVTVYT TSVIIVLGKA ALVTNQWTKF TLIAIPGSLL FWLIFFPIYA
Consensus .......VY. .......... .....Q.... .........L ..L.......
          1401                                              1450
Atc5sacce GIWSSAIASR EFFKAAARIY GAPSFWAVFF VAVLFCLLPR FTYDSFQKFF
Atcxschpo GVYSQSLYTY EFYKSASRIF RTPNFWAVLC GTIVSCLFPK FLFMTTQKLF
Atc4sacce SIFPHANISR EYYGVVKHTY GSGVFWLTLI VLPIFALVRD FLWKYYKRMY
Consensus .......... E......... ....FW.... ......L... F.........

          1451                                              1500
Atc5sacce YPTDVEIVRE MWQHGHFDHY PPGYDPTDPN RPKVTKAGQH GEKIIEGIAL
Atcxschpo WPYDVDIIRE SYRTKRLHEL DEEEE..... ...IENAEQS PDWASSTLQV
Atc4sacce EPETYHVIQE MQKYNISDSR PHVQQF.... ........QN AIRKVRQVQR
Consensus .P.......E .......... .......... ........Q. ..........
          1501                                              1550
Atc5sacce SDNLGGSNYS RDSVVTEEIP MTFMHGEDGS PSGYQKQETW MTSPKETQDL
Atcxschpo P..FNASSSS LATPKKEPLR L......DTN SLTLTSSMPR SFTPSYTPSF
Atc4sacce MKKQRGFAFS QAEEGGQE.K IVRMYDTTQK RGKYGELQDA SANPFNDNNG
Consensus .........S .......... .......... .......... ...P......
          1551                                              1600
Atc5sacce LQSPQFQQAQ TFGRGPSTNV RSSLDRTREQ MIATNQLDNR YSVERARTSL
Atcxschpo LEGSPVFSDE ILNRGEYMPH RGSISSSEQP LRP....... ..........
Atc4sacce LGSNDFESAE PF........ ......IENP FADGNQNSNR FSSSRDDISF
Consensus L......... .......... .......... .......... ..........
          1601           1618
Atc5sacce DLPGVTNAAS LIGTQQNN
Atcxschpo .......... ........
Atc4sacce DI........ ........
Consensus .......... ........
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the P-type ATPase superfamily.
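The 75% rule can be illustrated with a short sketch. It is an assumption-laden example rather than the PILEUP implementation: the function name is hypothetical, gaps are written as "." as in the alignment, and only exact matches count (no conservative substitutions). With three aligned sequences, 75% can only be met when all three rows carry the same residue. The rows below are columns 1171-1180 of the alignment above.

```python
import math
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Column-wise consensus: emit a residue only if it occurs in at least
    `threshold` of the aligned rows; otherwise emit the gap symbol."""
    needed = math.ceil(threshold * len(rows))
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            out.append(residue if count >= needed else gap)
        else:
            out.append(gap)  # column contains gaps only
    return "".join(out)


rows = [
    "AIGDGSNDVA",  # Atc5sacce, columns 1171-1180
    "AIGDGANDVA",  # Atcxschpo
    "AIASGANDVS",  # Atc4sacce
]
print(consensus(rows))  # AI..G.NDV.
```

The result agrees with the printed consensus for these ten columns.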
Database accession numbers
           SWISSPROT  EMBL/GENBANK
Atc4sacce  P39524     L01795; G171114
Atc5sacce  P32660     U18922; G603407
Atcxschpo  Q09891     Z67757; E208899
PIR        S30768 S30822
References
1 Ripmaster, T.L. et al. (1993) Mol. Cell. Biol. 13, 7901-7912.
2 Green, N.M. and MacLennan, D.H. (1989) Biochem. Soc. Trans. 17, 819-822; Green, N.M. (1989) Biochem. Soc. Trans. 17, 970-972.
3 Fagan, M.J. and Saier, M.H. Jr. (1994) J. Mol. Evol. 38, 57-99.
4 Barrell, B.G. et al. unpublished; EMBL/GenBank/DDBJ databases.
Plasma membrane cation-transporting ATPase family
Summary
Transporters of the plasma membrane cation-transporting ATPase family, examples of which are the human calcium-transporting ATPase 1 (Atcdhomsa) and the plasma membrane proton pump from Arabidopsis thaliana 2 (Pma2arath), mediate active transport of cations - sodium, potassium, or calcium ions - or protons, driven by ATPase activity (EC 3.6.1.-). Members of this family may mediate influx, efflux or exchange of cations: for example, the human gastric potassium-transporting ATPase 3 mediates the exchange of protons and potassium ions across the plasma membrane and is responsible for acid production in the stomach. In plants and fungi, plasma membrane proton pumps 4,5 drive the active transport of nutrients by proton symport. Plasma membrane cation-transporting ATPases are widely distributed throughout both eukaryotic and prokaryotic taxa. Statistical analysis of multiple amino acid sequence comparisons places the plasma membrane cation-transporting ATPase family in the P-type ATPase superfamily (also known as E1-E2 ATPases 6,7). Proteins in this superfamily use the energy of ATP hydrolysis to pump ions across cell membranes. The hydropathy of their amino acid sequences predicts that all P-type ATPases contain at least six transmembrane helices. They have two large cytoplasmic loops separating three pairs of transmembrane helices; the larger of these loops contains the ATP binding domain. The sequences are usually extended by one or more pairs of helices. Members of the plasma membrane cation-transporting ATPase family are predicted to contain a total of eight or ten transmembrane helices: all plasma membrane proton pumps contain eight helices, but some calcium transporters have ten. Some members of this family contain other sequence motifs - for example, the human plasma membrane calcium-transporting ATPase 1 contains two calmodulin binding domains towards the C-terminus. Some proteins may be glycosylated.
Only a few amino acid residues and short sequence motifs are conserved within the plasma membrane cation-transporting ATPase family, including motifs unique to the family and signature motifs of the P-type ATPase superfamily.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                                               ORGANISM [COMMON NAMES]                SUBSTRATE(S)
Ata1synsp  Cation-transporting ATPase [PMA1]                                                    Synechocystis sp. [cyanobacterium]     Metal ions
Atcartsf   Calcium-transporting ATPase, sarcoplasmic/endoplasmic reticulum type [Calcium pump]  Artemia sanfranciscana [brine shrimp]  Ca2+
Atcplafa   Calcium-transporting ATPase [Calcium pump]                                           Plasmodium falciparum [protozoan]      Ca2+
Atctrybr   Probable calcium-transporting ATPase [Calcium pump, TBA1]                            Trypanosoma brucei [trypanosome]       Ca2+
Atc1sacce  Calcium-transporting ATPase 1 [PMR1, SCC1, BSD1, YGL167C, G1666]                     Saccharomyces cerevisiae [yeast]       Ca2+
CODE       DESCRIPTION [SYNONYMS]                                                                             ORGANISM [COMMON NAMES]                          SUBSTRATE(S)
Atc3sacce  Calcium-transporting ATPase 3 [PMC1, YGL006W]                                                      Saccharomyces cerevisiae [yeast]                 Ca2+
Atc3schpo  Calcium-transporting ATPase 3 [CTA3]                                                               Schizosaccharomyces pombe [yeast]                Ca2+
Atcaorycu  Calcium-transporting ATPase, sarcoplasmic reticulum type, calcium pump, neonatal isoform [ATP2A1]  Oryctolagus cuniculus [rabbit]                   Ca2+
Atcbdrome  Calcium-transporting ATPase, sarcoplasmic/endoplasmic reticulum type [Calcium pump, CA-P60A]       Drosophila melanogaster [fruit fly]              Ca2+
Atcbgalga  Calcium-transporting ATPase, sarcoplasmic/endoplasmic reticulum type [Calcium pump, SERCA1]        Gallus gallus [chicken]                          Ca2+
Atcborycu  Calcium-transporting ATPase, sarcoplasmic reticulum type, adult isoform [Calcium pump, ATP2A1]     Oryctolagus cuniculus [rabbit]                   Ca2+
Atcdfelca  Calcium-transporting ATPase, sarcoplasmic reticulum type [Calcium pump, SERCA2]                    Felis catus [cat]                                Ca2+
Atcdhomsa  Calcium-transporting ATPase, sarcoplasmic reticulum type [ATP2A2, ATP2B]                           Homo sapiens [human]                             Ca2+
Atcdorycu  Calcium-transporting ATPase, sarcoplasmic reticulum type [ATP2A2]                                  Oryctolagus cuniculus [rabbit]                   Ca2+
Atcdratno  Calcium-transporting ATPase, sarcoplasmic reticulum type [ATP2A2]                                  Rattus norvegicus [rat]                          Ca2+
Atcdsussc  Calcium-transporting ATPase                                                                        Sus scrofa [pig]                                 Ca2+
Atcehomsa  Calcium-transporting ATPase, endoplasmic reticulum type [ATP2A2]                                   Homo sapiens [human]                             Ca2+
Atcesussc  Calcium-transporting ATPase, endoplasmic reticulum type [ATP2A2]                                   Sus scrofa [pig]                                 Ca2+
Atceorycu  Calcium-transporting ATPase, endoplasmic reticulum type [ATP2A2]                                   Oryctolagus cuniculus [rabbit]                   Ca2+
Atceratno  Calcium-transporting ATPase, endoplasmic reticulum type [ATP2A2]                                   Rattus norvegicus [rat]                          Ca2+
Atcfratno  Calcium-transporting ATPase 3 [Calcium pump, ATP2A3]                                               Rattus norvegicus [rat]                          Ca2+
Atclmycge  Probable cation-transporting P-type ATPase [PACL, MG071]                                           Mycoplasma genitalium [gram-negative bacterium]  Metal ions (Ca2+?)
Atclsynsp  Cation-transporting ATPase [PACL]                                                                  Synechococcus sp. [cyanobacterium]               Metal ions (Ca2+?)
Atcphomsa  Calcium-transporting ATPase plasma membrane, isoform 1B [Calcium pump, ATP2B1]                     Homo sapiens [human]                             Ca2+
CODE       DESCRIPTION [SYNONYMS]                                                                  ORGANISM [COMMON NAMES]                           SUBSTRATE(S)
Atcporycu  Calcium-transporting ATPase plasma membrane, isoform 1B [Calcium pump, ATP2B1, PMCA1B]  Oryctolagus cuniculus [rabbit]                    Ca2+
Atcpratno  Calcium-transporting ATPase plasma membrane, isoform 1B [Calcium pump, ATP2B1]          Rattus norvegicus [rat]                           Ca2+
Atcpsussc  Calcium-transporting ATPase plasma membrane, isoform 1B [Calcium pump, PMCA1B]          Sus scrofa [pig]                                  Ca2+
Atcqhomsa  Calcium-transporting ATPase plasma membrane, brain isoform 2 [Calcium pump, ATP2B2]     Homo sapiens [human]                              Ca2+
Atcqratno  Calcium-transporting ATPase plasma membrane, brain isoform 2 [Calcium pump, ATP2B2]     Rattus norvegicus [rat]                           Ca2+
Atcrhomsa  Calcium-transporting ATPase plasma membrane, isoform 4 [Calcium pump, ATP2B4]           Homo sapiens [human]                              Ca2+
Athahomsa  Potassium-transporting ATPase α chain [ATP4A]                                           Homo sapiens [human]                              K+
Athaorycu  Potassium-transporting ATPase α chain [Gastric H+/K+ ATPase α subunit, ATP4A]           Oryctolagus cuniculus [rabbit]                    K+
Atharatno  Potassium-transporting ATPase α chain [Gastric H+/K+ ATPase α subunit, ATP4A]           Rattus norvegicus [rat]                           K+
Athasussc  Potassium-transporting ATPase α chain [Gastric H+/K+ ATPase α subunit, ATP4A]           Sus scrofa [pig]                                  K+
Atmaescco  Mg2+ transport ATPase, P-type 1 [MGTA, MGT, CORB]                                       Escherichia coli [gram-negative bacterium]        Mg2+
Atmasalty  Mg2+ transport ATPase, P-type 1 [MGTA]                                                  Salmonella typhimurium [gram-negative bacterium]  Mg2+
Atmbsalty  Mg2+ transport ATPase, P-type 2 [MGTB]                                                  Salmonella typhimurium [gram-negative bacterium]  Mg2+
Atn1bufma  Sodium/potassium-transporting ATPase α1 chain [Sodium pump, Na+/K+ ATPase]              Bufo marinus [toad]                               Na+, K+
Atn1equca  Sodium/potassium-transporting ATPase α1 chain [Sodium pump, Na+/K+ ATPase]              Equus caballus [horse]                            Na+, K+
Atn1galga  Sodium/potassium-transporting ATPase α1 chain [Sodium pump, Na+/K+ ATPase]              Gallus gallus [chicken]                           Na+, K+
Atn1oviar  Sodium/potassium-transporting ATPase α1 chain [Sodium pump, Na+/K+ ATPase, ATP1A1]      Ovis aries [sheep]                                Na+, K+
CODE       DESCRIPTION [SYNONYMS]                                                              ORGANISM [COMMON NAMES]                SUBSTRATE(S)
Atn1ratno  Sodium/potassium-transporting ATPase α1 chain [Sodium pump, Na+/K+ ATPase, ATP1A1]  Rattus norvegicus [rat]                Na+, K+
Atn1sussc  Sodium/potassium-transporting ATPase α1 chain [Sodium pump, Na+/K+ ATPase, ATP1A1]  Sus scrofa [pig]                       Na+, K+
Atn1homsa  Sodium/potassium-transporting ATPase α1 chain [Sodium pump, Na+/K+ ATPase, ATP1A1]  Homo sapiens [human]                   Na+, K+
Atn1sacce  Sodium transport ATPase 1 [ENA1, PMR2, HOR6, YDR040C, YD6888.02C]                   Saccharomyces cerevisiae [yeast]       Na+
Atn2galga  Sodium/potassium-transporting ATPase α2 chain [Sodium pump]                         Gallus gallus [chicken]                Na+, K+
Atn2homsa  Sodium/potassium-transporting ATPase α2 chain [Sodium pump, ATP1A2]                 Homo sapiens [human]                   Na+, K+
Atn2ratno  Sodium/potassium-transporting ATPase α2 chain [Sodium pump, ATP1A2]                 Rattus norvegicus [rat]                Na+, K+
Atn2sacce  Sodium transport ATPase 2 [ENA2, PMR2B, YDR039C]                                    Saccharomyces cerevisiae [yeast]       Na+
Atn3galga  Sodium/potassium-transporting ATPase α3 chain [Sodium pump]                         Gallus gallus [chicken]                Na+, K+
Atn3homsa  Sodium/potassium-transporting ATPase α3 chain [Sodium pump, ATP1A3]                 Homo sapiens [human]                   Na+, K+
Atn3sussc  Sodium/potassium-transporting ATPase α3 chain [Sodium pump, ATP1A3]                 Sus scrofa [pig]                       Na+, K+
Atn3ratno  Sodium/potassium-transporting ATPase α3 chain [Sodium pump, ATP1A3]                 Rattus norvegicus [rat]                Na+, K+
Atnaartsa  Sodium/potassium-transporting ATPase α chain [Sodium pump]                          Artemia salina [brine shrimp]          Na+, K+
Atnaartsf  Sodium/potassium-transporting ATPase α chain [Sodium pump]                          Artemia sanfranciscana [brine shrimp]  Na+, K+
Atnacatco  Sodium/potassium-transporting ATPase α chain [Sodium pump]                          Catostomus commersoni [sucker]         Na+, K+
Atnadrome  Sodium/potassium-transporting ATPase α chain [Sodium pump, NA-P]                    Drosophila melanogaster [fruit fly]    Na+, K+
Atnahydat  Sodium/potassium-transporting ATPase α chain [Sodium pump, NA-P]                    Hydra attenuata [hydra]                Na+, K+
Atnatorca  Sodium/potassium-transporting ATPase α chain [Sodium pump]                          Torpedo californica [ray]              Na+, K+
CODE       DESCRIPTION [SYNONYMS]                                 ORGANISM [COMMON NAMES]                         SUBSTRATE(S)
Atxaleido  Probable E1-E2 type cation ATPase 1A                   Leishmania donovani [trypanosome]               Metal ions
Atxbleido  Probable E1-E2 type cation ATPase 1A                   Leishmania donovani [trypanosome]               Metal ions
Pma1ajeca  Plasma membrane ATPase [Proton pump, PMA1]             Ajellomyces capsulata [gram-negative bacterium]  H+
Pma1arath  Plasma membrane ATPase [Proton pump, AHA1]             Arabidopsis thaliana [mouse-ear cress]          H+
Pma1canal  Plasma membrane ATPase [Proton pump, PMA1]             Candida albicans [yeast]                        H+
Pma1klula  Plasma membrane ATPase [Proton pump, PMA1]             Kluyveromyces lactis [yeast]                    H+
Pma1lyces  Plasma membrane ATPase [Proton pump, LHA1]             Lycopersicon esculentum [tomato]                H+
Pma1neucr  Plasma membrane ATPase [Proton pump, PMA1]             Neurospora crassa [mold]                        H+
Pma1nicpl  Plasma membrane ATPase 1 [Proton pump, PMA1]           Nicotiana plumbaginifolia [tobacco]             H+
Pma1schpo  Plasma membrane ATPase 1 [Proton pump, PMA1]           Schizosaccharomyces pombe [yeast]               H+
Pma1sacce  Plasma membrane ATPase 1 [Proton pump, PMA1, YGL008C]  Saccharomyces cerevisiae [yeast]                H+
Pma1zygro  Plasma membrane ATPase [Proton pump]                   Zygosaccharomyces rouxii [yeast]                H+
Pma2arath  Plasma membrane ATPase 2 [Proton pump, AHA1]           Arabidopsis thaliana [mouse-ear cress]          H+
Pma2sacce  Plasma membrane ATPase 2 [Proton pump, PMA2, YPL036W]  Saccharomyces cerevisiae [yeast]                H+
Pma2schpo  Plasma membrane ATPase 2 [Proton pump, PMA2]           Schizosaccharomyces pombe [yeast]               H+
Pma3arath  Plasma membrane ATPase 3 [Proton pump, AHA3]           Arabidopsis thaliana [mouse-ear cress]          H+
Pma3nicpl  Plasma membrane ATPase 1 [Proton pump, PMA3]           Nicotiana plumbaginifolia [tobacco]             H+
Pma4nicpl  Plasma membrane ATPase 4 [Proton pump, PMA4]           Nicotiana plumbaginifolia [tobacco]             H+
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Atcbgalga, Atcaorycu (Atcborycu); Atcdfelca, Atcdsussc, Atcdorycu, Atceorycu, Atcdratno, Atceratno, Atcehomsa, Atcesussc (Atcdhomsa); Atcporycu, Atcpratno, Atcpsussc (Atcphomsa); Atcqratno (Atcqhomsa); Athasussc, Athaorycu, Atharatno (Athahomsa); Atmasalty (Atmaescco); Atn1bufma, Atn1galga, Atn1oviar, Atn1equca, Atn1sussc, Atn1ratno (Atn1homsa); Atn2sacce (Atn1sacce); Atn2galga, Atn2ratno (Atn2homsa); Pma2arath (Pma1arath); Pma3nicpl (Pma1nicpl).
[Figure: phylogenetic tree. Leaves in printed order: Atxaleido, Atxbleido, Pma1lyces, Pma1nicpl, Pma1arath, Pma4nicpl, Pma3arath, Pma1schpo, Pma2schpo, Pma1ajeca, Pma1neucr, Pma1sacce, Pma2sacce, Pma1klula, Pma1canal, Pma1zygro, Atcphomsa, Atcqhomsa, Atcrhomsa, Atc3sacce, Atmaescco, Atmbsalty, Atc3schpo, Atn1sacce, Atn3homsa, Atn3ratno, Atn3galga, Atn1homsa, Atn3sussc, Atnacatco, Atnatorca, Atnaartsf, Atnadrome, Atnaartsa, Atnahydat, Athahomsa, Atcartsf, Atcbdrome, Atcborycu, Atcdhomsa, Atcfratno, Atctrybr, Atcplafa, Ata1synsp, Atclsynsp, Atc1sacce, Atclmycge]
Proposed orientation of ATC18 in the membrane The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded ten times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: topology plot with the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini labelled]
Physical and genetic characteristics
           AMINO ACIDS  MOL. WT
Ata1synsp  915          98 902
Atcartsf   1003         110 343
Atcplafa   1228         139 414
Atctrybr   1011         110 313
Atc1sacce  950          104 570
Atc3sacce  1173         130 860
Atc3schpo  1037         115 327
Atcaorycu  1001         110 458
Atcbdrome  1002         109 597
Atcbgalga  994          109 023
Atcborycu  994          109 489
Atcdfelca  997          109 712
Atcdhomsa  997          109 690
Atcdorycu  997          109 644
Atcdratno  997          109 680
Atcdsussc  997          109 726
Atcehomsa  1042         114 756
Atcesussc  1042         114 791
Atceorycu  1042         114 705
EXPRESSION SITES
CHROMOSOMAL LOCUS
Chromosome 7 skeletal muscle skeletal muscle skeletal muscle heart kidney muscle heart, stomach stomach smooth muscle kidney stomach smooth muscle smooth muscle
12q23-q24.1
12q23-q24.1
AMINO ACIDS
MOL. W T
EXPRESSION SITES
Atceratno Atcfratno Atclmycge Atclsynsp Atcphomsa Atcporycu
1043 999 874 926 1220 1220
114767 109359 96317 99696 134684 134650
brain kidney (& others)
Atcpratno Atcpsussc :i!:.;i!:~:i~?i}!.~:!:~i Atcqhomsa Atcqratno Atcrhomsa Athahomsa Athaorycu Atharatno Athasussc Atmaescco Atmasalty Atmbsalty Atnlbufma Atnl equca Atnlgalga Atnlhomsa Atnlratno Atnlsussc Atnl oviar Atnlsacce Atn2galga Atn2homsa Atn2ratno Atn2sacce Atn3galga Atn3homsa Atn3sussc Atn3ratno Atnaartsa Atnaartsf Atnacatco Atnadrome
1176 1220 1198 1198 1205 1035 1035 1033 1034 898 902 908 1023 1021 1021 1023 1023 1021 1021 1091 1017 1020 1020 1091 1010 1013 1021 1013 996 1004 1027 1038
129510 134 709 132 722 132615 133 930 114090 114201 114037 114286 99466 99 782 100428 112599 112696 112231 112895 113054 112680 112657 120357 112050 112265 112217 120317 111284 111692 112653 111735 111022 110699 113313 115342
1031 1022 974 974 916 948 895 899 956 920 957 919 918 920 947 947 1010 948 956 952
114161 112429 107 449 107 305 98884 104182 97459 98259 105 103 99 886 105 155 99 883 99619
;...ii!i.-:i:i:~i ii~:;)[;i ~i ...........
..~::::~:.:,~. :::..:..~...
i/:.!ii!-::~:: :ii ::~:!::-:::~ :.:.::.:. ..................
::::::::::::::::::::::::::: ::. ::-:
:::7!!~i.iii!~:--I:-.Ii-~:!i.i
i~i~i:~!~i~:. ::!~::::..::?i::
::,.-.:.~,.~.~:~.~::~..: ~:. .................... ..........
:i'~:.!i::.iil.::-~.:.7:.I:
:ii!ii:ii:i!:::i-7.11:i.i::!
.i:::i;::iii!!:::?:::i".:.i.-i
::::::::::::::::::::::.i::.-~ ::.;':::::
:, :~:.~.. :..,.-,.: ,.
:.i;:ii):.:.4-il;.i:./-il
.-.:-.~-,~: . . . ... v: .,.-:
:.::.:/:::~::~-;: ......... 4; .~: ~::::..~...~.:~:.~.::.~ ,::.::
..;:;::::,F f-:-: .:.
.........
-...:-!~ ;-;. :.;.. ..
..
Atnahydat Atnatorca Atxaleido Atxbleido Pmalajeca Pmalarath Pma 1canal Pmalklula Pmallyces Pmalneucr Pmalnicpl Pmalschpo Pmalsacce Pmalzygro Pma2arath Pma2sacce Pma2schpo Pma3arath Pma3nicpl Pma4nicpl
erythrocyte stomach, smooth muscle & others brain stomach, smooth muscle brain brain erythrocyte stomach stomach stomach stomach
CHROMOSOMAL LOCUS
12q21-q23
3p26-p25 1q25-q32 19q13.1
92.8-1 O0 minutes mgtB locus kidney, bladder & others kidney kidney 1p13-p11 brain kidney kidney Chromosome 4 lq21-q23 brain Chromosome 4 brain kidney brain
19q12-q13.2
hypothalamus tubules, muscles, nervous system
Chromosome 3
leaf, stem, flower, root Chromosome 7
1 O0 061
104 270 102 171 110127 104318 105 112 105 188
root Chromosome 16 leaf, stem, flower, root leaf, stem, flower, root
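Because several figures in this table are printed with internal spaces (for example "103 7" and "104 5 70"), a quick plausibility check on the amino acid count and molecular weight of a row can be useful. The sketch below divides molecular weight by chain length and flags rows whose mean residue mass falls outside a window. The window of 100 to 125 Da is an assumption chosen here around the commonly quoted average of roughly 110 Da per residue; it is not taken from the source, and the function names are hypothetical.

```python
# (name, amino acids, molecular weight in Da) for a few rows of the table.
ROWS = [
    ("Ata1synsp", 915, 98902),
    ("Atcartsf", 1003, 110343),
    ("Atcplafa", 1228, 139414),
    ("Atc1sacce", 950, 104570),   # printed as "104 5 70"
    ("Atc3schpo", 1037, 115327),  # length printed as "103 7"
]


def mean_residue_mass(n_residues: int, mol_wt: float) -> float:
    """Average mass per residue in Da."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return mol_wt / n_residues


def suspicious_rows(rows, low=100.0, high=125.0):
    """Names of rows whose mean residue mass lies outside [low, high].

    The bounds are illustrative assumptions, not values from the source.
    """
    return [
        name
        for name, n_residues, mol_wt in rows
        if not low <= mean_residue_mass(n_residues, mol_wt) <= high
    ]


if __name__ == "__main__":
    for name, n_residues, mol_wt in ROWS:
        print(f"{name:10s} {mean_residue_mass(n_residues, mol_wt):6.1f} Da/residue")
    print("flagged:", suspicious_rows(ROWS))
```

A row that is flagged most likely has a dropped or misplaced digit in one of its two numeric columns rather than an unusual composition.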
//
Multiple amino acid sequence alignments

           1                                                   50
Pma2schpo  MQRNNGEGRP EGMHRISRFL HGNPFKNNAS PQDDSTTRTE VYEEGGVEDS
51 i00 Pma2schpo A VDYDNASGNA APRLTAAPNT HAQQANLQ SGNTSITHET QSTSRGQEAT Pma2sacce ............................ MS S T E A K Q Y K E K P S K E Y L H A S D Atnadrome ................................................ MA I01 150 Atxaleido ......................... MSSKK YELDAAAFED KPESHSDAEM Atxbleido MSSKK YELDAAAFED KPESHSDAEM Pmallyces ......................... MAEKP EVLDAVLKET VDLENIPIEE Pmalnicpl ........................ MGEEKP EVLDAVLKEA VDLENIPIEE Pmalarath .............................. SGLEDIKNET VDLEKIPIEE Pma4nicpl .......................... MAKA ISLEEIKNET VDLEKIPIEE P m a 3 a r ath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A SGLEDIVNEN VDLEKIPIEE P m a l s c h p o ...... M A D N A G E Y H D A E K H A P E Q Q A P P P Q Q P A H A A A P A Q D D E P D D D I D A P m a 2 s c h p o . . . . . . . . . . . . . . . . . . TS P S L S A S H E K P A R P Q T G E G S D N E D E D E D I D A P m a l a j e c a . M A H S A A S G A AS ...... A A H F E K K T P E V A HEEKKPPLPE EEDEDEDMDA i=i~?i~}.i!i,.!:i:,~P. m a l n e u c r . M A D H S A S G A P A L S T N I E S G K F D E K A A E A A A Y Q P K P K V . . E D D E D E D I D A P m a l s a c c e .... M T D T S S S S S S S S A S S V S A H Q P T Q E K P A K T Y D D A A S E S S D . D D D I D A Pma2sacce GDDPANNSAASSSSSSSTST SASSSAAAVP RKAAAASAAD DSDSDEDIDQ Pmalklula ....................... MSAATEP TKEKPVNNQD SDDEDEDIDQ P m a l c a n a l . . . . . . . . . . . . . . . . . . . M S A T E P T N E K V D K I V ...... S D D E D E D I D Q ?ii ~?!i:i;.i:,.i~i~ P m a l z y g r o . . M S D E R I T E K P P H Q Q P E S E G E P V P E E E V E E E T E E E V P D E Q S S E D D D I D G Atcphomsa ................ MGDMANNSVAYSGV KNSLKE..AN HDGDFGITLA Atcqhomsa ................ MGDM T.NSDFYS.. KNQRNE..SS HGGEFGCTME !!~!!~i:i.i~i~:!i~!!iA:;: t c r h o m s a . . . . . . . . . . . . . . . . . . . M T N P S D . . R V L P A N S M A . . E S R E G D F G C T V M Atc3sacce ................ MSRQ DENSALLANN ENNKPSYTGN ENGVYDNFKL Atmaescco .............. 
MFKEIF TRLIRHLPSR LVHRDPLPGA QQTVNTVVPP Atmbsalty ..................... MTDMNIENR KLNR.PASEN DKQHKKVFPI Atc3schpo ............................................ MVTINI Atnlsacce .......................................... MGEGTTKE Atn3homsa .......................... MGDK KDDKDSPKKN KGKERRDLDD Atn3ratno .......................... MGDK KDDKSSPKKS KAKERRDLDD Atn3galga .......................... MGD. K G E K E S P K K G K G K . . R D L D D Atnlhomsa ............. MGKGVGR DKYEPAAVSE QGDK...KGK KGKKDRDMDE A t n 2 h o m s a . . . . . . . . . . . . . M G R G A G R E Y S P A A T T A E N G G G ..... K K K Q K E K E L D E A t n 3 s u s s c . . . . . . . . . . . . . M G K G V G P D K Y E P A A V S E H G D K ..... K K A K K E R D M D E Atnacatco ............. MGVGDGR DQYELAAMSE QSGKKKSKNK KEKKEKDMDE Atnatorca ............. MGKGAAS EKYQPAATSE NAKNSKKSKS KTT...DLDE Atnaartsf . . . . . . . . . MAKG KQKKGKDLNE AtnadromeLRSD 89 Atnaartsa ....................................... M GKKQGKQLSD Atnahydat .......... MADPGDLESR GKADSYSVAE KKSAPKKISK KNANKAKLED ::::::::::::::::::::::: A t h a h o m s a ..... M G K A E N Y E L Y S V E L G P G P G G D M A A K M S K K K K A G G G G G K R K E K L E N Atctrybr ............................................ M Atclsynsp .............................................. MKGA Atclsacce .............................. MSDNPFNASL LDEDSNRERE Consensus .................................................. . .:,,~:. ~:..~. ~ ::
Atxaleido Atxbleido Pmallyces Pmalnicpl ::,i:ii:: i i?:;:; Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce ,%! h ~:~:: Pmalklula ...... : ...... Pmalcanal Pmalzygro Atcphomsa :;~:::: ::::::::::~::::::::::: Atcqhomsa ~:iiii!!:i::i;ii,iii~:iil i: Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa i~i=======================:~:( Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno ::;:i: ;~;:)/:,:::: Atctrybr Atcplafa ;:;:;;;:;::2 Atalsynsp Atclsynsp :!!:i:::ii)s :::: Atclsacce :::::_ ::: Atclmycge Consensus ........:.
151 200 TPQKPQRRQS VLSKAVSEHD ERATGPATDP VPPSK ........ GLTTEEA TPQKPQRRQS VLSKAVSEHD ERATGPATDL LPPSK ........ GLTTEEA VFENLRC ................................. TREGLTATAA VFENLRC ................................. TKEGLTATAA VFQQLKC ................................. TREGLTTQEG VFEQLKC ................................. TREGLSADEG VFQQLKC ................................. SREGLSGAEG LIEELFSEDV QEEQEDNDDA PA.AGEA..K AVPEELLQTD MNTGLTMSEV LIEDLYSQDQ EEEQVEEEES PGPAGAA..KVVPEELLETD PKYGLTESEV L I E E L E S Q D G H I D I E D D E D G E P G G A .... R P V P D E L L T T D T R H G L T D A E V L I E D L E S H D G H D A E E E E E E A T P G G G .... R V V P E D M L Q T D T R V G L T S E E V LIEELQSNHG VDDEDSDNDG PVAAGEA..R PVPEEYLQTD PSYGLTSDEV LIDELQSNYG EGDESGEEEV RTDGVHAGQRVVPEKDLSTD PAYGLTSDEV LIEDLQSHHG LDDE.SEDDE HVAAGSA..R PVPEELLQTD PSYGLTSDEV L V A D L Q S N P G A G D E E E E E E N ..... D S S F K A V P E E L L Q T D P R V G L T D D E V LIDELQSQ.E AHEEAEEDDG PAAAGEA..R KIPEELLQTD PSVGLSSDEV ELRALMELRS TDALRKIQES YG.DVYGICT KLKTSPNEGL S.GNPADLER ELRSLMELRG TEAWKIKET YG.DTEAICR RLKTSPVEGL P.GTAPDLEK ELRKLMELRS RDALTQINVH YG.GVQNLCS RLKTSPVEGL S.GNPADLEK SKSQLSDLHN PKSIRSFVRL FGYESNSLFK YLKTDKNAGI SLPEISNYRK SLSAHCLKMA VMPEEELWKT FDTHPEG ............... LNQAEVES EAE ...... A F H S P E E T L A R L N S H R Q G . . . . . . . . . . . . . . . L T I E E A S E S N P V Y F S D I K D V E S E ..... F L T S I P N G . . . . . . . . . . . . . . L T H E E A Q N NNNAEFNAYH TLTAEEAAEF IGTSLTEG .............. LTQDEFVH LKKEVAMTEH KMSVEEVCRK YNTDCVQG .............. LTHSKAQE LKKEVAMTEH KMSVEEVCRK YNTDCVQG .............. LTHSKAQE LKKEVAMTEH KMSIEEVCRK YNTDCVQG .............. LTHSKAQE LKKEVSMDDH KLSLDELHRK YGTDLSRG .............. LTSARAAE LKKEVAMDDH KLSLDELGRK YQVDLSKG .............. LTNQRAQD LKKEVSMDDH KLSLDELHRK YGTDLSRG .............. LTPARAAE LKKEVDLDDH KLSLEELHHK YGTDLSKG .............. LSNSRAEE LKKEVSLDDH KLNLDELHQK YGTDLTQG .............. 
LTPARAKE LKKELDIDFH KIPIEECYQR LGSNPETG .............. LTNAQARS LKQELDIDFH KISPEEMYQR FQTHPENG .............. LSHARAKE LKKELELDQH KIPLEELCRR LGTNTETG .............. LTSSQAKS LKKELEMTEH SMKLESLLSM YETSLEKG .............. LSENIVAR MKKEMEINDH QLSVAELEQK YQTSATKG .............. LSASLAAE ..... M E D A H A K K W E E V V D Y F G V D P E R G . . . . . . . . . . . . . . L A L E Q V K K ..... M E D G H S K T V E Q S L N F F G T D P E R G . . . . . . . . . . . . . . L T L D Q I K A ..... M E A A H S K S T E E C L A Y F G V S E T T G . . . . . . . . . . . . . . L T P D Q V K R ..... M E N A H T K T V E E V L G H F G V N E S T G . . . . . . . . . . . . . . L S L E Q V K K ..... M E E A H L L S A A D V L R R F S V T A E G G . . . . . . . . . . . . . . L T L E Q V T D LPENLPTDPA AMTPAAVAAALRVDTKVG LSSNEVEE .MEEVIKNAH TYDVEDVLKF LDVNKDNG LKNEELDD MGAFPLPPNQ YGFPHLKF.. LPPSPSTRGR HSCRFAHRSR FRSDSGAVAQ I V S A S L T D V R Q P I A H W H S . . L T V E E C H Q Q L D .... A H R N G L T A E V A . . A D ILDATAEALS KPSPSLEYCTLSVDEALEKLD ...TDKNGGLRSSNEANN ................................... MNSW TGLSEQAAIK ..................
201 250 A t x a l e i d o E E L L K K Y G R N EL. P E K K T P S W L I Y V R G L W G P M P A A L W I . . . . . . . . . . . A A t x b l e i d o E E L L K K Y G R N EL. P E K K T P S W L I Y V R G L W G P M P A A L W I . . . . . . . . . . . A
Pmallyces QERLSIFGYN KL.EEKKESK FLKFLGFMWNPLSWVMEA ........... Pmalnicpl QERLAIFGYN KL.EEKKDSK LLKFLGFMWNPLSWVMEA ........... Pmalarath EDRIVIFGPN KL.EEKKESK ILKFLGFMWNPLSWVMEA ........... Pma4nicpl A S R L Q I F G P N K L . E E K N E S K I L K F L G F M W N P L S W V M E A . . . . . . . . . . . Pma3arath ENRLQIFGPN KL.EEKKESK LLKFLGFMWNPLSWVMEA ........... Pmalschpo EERRKKYGLN QM.KEELENP FLKFIMFFVG PIQFVMEM ........... Pma2schpo EERKKKYGLN QM.KEEKTNN IKKFLSFFVG PIQFVMEL ........... Pmalajeca VARRKKYGLN QM.KEEKENL VLKFLSYFVG PIQFVMEA ........... Pmalneucr VQRRRKYGLN QM.KEEKENH FLKFLGFFVG PIQFVMEG ........... Pmalsacce LKRRKKYGLN QM.ADEKESLVVKFVMFFVG PIQFVMEA ........... Pma2sacce ARRRKKYGLN QM.AEENESL IVKFLMFFVG PIQFVMEA ........... Pmalklula T K R R K K Y G L N Q M . S E E T E N L F V K F L M F F I G P I Q F V M E A . . . . . . . . . . . Pmalcanal TKRRKRYGLN QM.AEEQENL VLKFVMFFVG PIQFVMEA ........... Pmalzygro VNRRKKYGLN QM.REESENL LVKFLMFFIG PIQFVMEA ........... Atcphomsa ..REAVFGKN FIPPKKPKTF LQLVWEALQD VTLIILEI ........... Atcqhomsa ..RKQIFGQN FIPPKKAKPF LQLVWEALQD VTLIILEI ........... Atcrhomsa ..RRQVFGHN VIPPKKPKTF LELVWEALQD VTLIILEI ........... Atc3sacce TNRYKNYGDN SLPERIPKSF LQLVWAAFND KTMQLLTV ........... Atmaescco ..AREQHGEN KLPAQQPSPWWVHLWVCYRN PFNILLTI ........... Atmbsalty ..RLKVYGRN EVAHEQVPPA LIQLLQAFNN PFIYVLMA ........... Atc3schpo ..RLSEYGEN RLEADSGVSA WKVLLRQVLNAMCWLIL ........... Atnlsacce ..RLKTVGEN TLGDDTKIDY KAMVLHQVCNAMIMVLLI ........... Atn3homsa ..ILARDGPN ALTPPPTTPEWVKFCRQLFG GFSILLWI ........... Atn3ratno ..ILARDGPN ALTPPPTTPEWVKFCRQLFG GFSILLWI ........... Atn3galga ..ILARDGPN ALTPPPTTPEWVKFCRQLFG GFSILLWI ........... Atnlhomsa ..ILARDGPN ALTPPPTTPE WIKFCRQLFG GFSMLLWI ........... Atn2homsa ..VLARDGPN ALTPPPTTPEWVKFCRQLFG GFSILLWI ........... Atn3sussc ..ILARDGPN ALTPPPTTPEWVKFCRQLFG GFSMLLWI ........... Atnacatco ILARDGPN ALTPPPTTPEWVKFCKQMFG GFSMLLWT Atnatorca ..ILARDGPN ALTPPPTTPE WIKFCRQLFG GFSILLWT ........... 
Atnaartsf ..NIERDGPN CLTPPKTTPE WIKFCKNLFG GFALLLWT ........... Atnadrome ..NLERDGPN .LTPPKQTPEWVKFCEDLF. GVAMLLWI ........... Atnaartsa ..HLEKYGPN ALTPPRTTPE WIKFCKQLFG GFQMLLWI ........... Atnahydat ..NLERDGLN ALTPPKQTPEWVKFCKQMFG GFSMLLWI ........... Athahomsa ..LLLRDGPN ALRPPRGTPE YVKFARQLAG GLQCLMWV ........... Atcartsf ..NQEKYGPNELPAEEGKSLLTLILEQFDDLLVKILLL ........... Atcbdrome ..NQKKYGPN ELPTEEGKSI WQLVLEQFDD LLVKILLL ........... Atcborycu ..HLEKYGHN ELPAEEGKSL WELVIEQFED LLVRILLL ........... Atcdhomsa ..LKERWGSN ELPAEEGKTL LELVIEQFED LLVRILLL ........... Atcfratno ..ARERYGPN ELPTEEGKSL WELVVEQFED LLVRILLL ........... Atctrybr ..RRQAFGINELPSEPPTPFWKLVLAQFEDTLVRILLL ........... Atcplafa ..RRLKYGLNELEVEKKKSI FELILNQFDDLLVKILLL ........... Atalsynsp ..RYEQYGRN ELKFKPGKPA WLRFLLQFHQ PLLYILLI ........... Atclsynsp ..RLALYGPN ELVEQAGRSP LQILWDQFAN IMLLMLLA ........... Atclsacce ..RRSLYGPN EITVEDDESL FKKFLSNFIE DRMILLLI ........... Atclmycge ..SRQEHGAN FLPEKKATPF WLLFLQQFKS LVVILLLL ........... C o n s e n s u s ....... G.N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
251 Atxaleido IIIEFAL ....... E Atxbleido IIIEFAL ....... E Pmallyces AIMAIALANG GGKPP Pmalnicpl A I M A I A L A N G G G K P P Pmalarath ALMAIALANG DNRPP
.................... .................... .................... .................... ....................
NWPDG NWPDG DWQDF DWQDF DWQDF
A A A A A A A A A A A A A A A A A A L L A S G G G G G G G G G G G G A A A A A A A A A V G A
300 AILFAIQIAN AILFAIQIAN VGIITLLIIN VGIITLLIIN VGIICLLVIN
Pma4nicpl
Pma3arath
AVMAIALANG
AIMAIALANG
PmalschpoAALAAGL
.... ========================
::::::::::::::::::::i~;[:::i;~ :~~i
R ....................
IGIICLLVIN
DWQDF
VGIVCLLVIN
DWVDF
GVICALLMLN
R ....................
DWVDF
GVICALLLLN
Pmalneucr
AVLAAGL
.......
E ....................
DWVDF
GVICGLLLLN
AILAAGL
.......
S ....................
DWVDV
GVICALLLLN
Pmalajeca
AILAAGL
Pmalsacce
AILAAGL
....... .......
Pmalklula
AILAAGL
Pmalzygro
AVLAAGL
Atcqhomsa
AIISLGLSFY
Atc3sacce
AVVSFVLGLY
Atcphomsa
i:,~:iii!!!',ii~:!:;ii:,~!:iA! t c r h o m s a :::::::::::::::::: :::::: :,
.......
DWQDF
.......
Pmalcanal
::::::::::::::::::::::: . . . . . .
GGKPP ....................
Pma2schpoAALAAGL
Pma2sacce ilil.ii!:~!)~!:::::i~;i: :::::~
DGKPP ....................
Atmaescco
AVLAAGL
.......
.......
.......
E .................... S ....................
E ....................
E ....................
E ....................
AIVSLGLSFY
QPPEGDNALC
AIISLVLSFY
RPAGEENELC
GAISYATE
HPPGEGNEGC
DWVDF DWVDF DWVDF DWVDF
DWVDF
MIISFAM
Atn3galga
AILCFLAYGI
Atn3ratno
GVICALLLLN GVICGLLFLN
GQVATTPEDE
NEAQAGWIEGAAILFSVIIV
ATAQGGAEDE
...... ELWM QPPQYDPEGN
...........................
AILCFLAYGI
GVICGLLFLN
GEGETGWIEGAAILLSVVCV
...........................
Atn3homsa
:::::::::::::::::::::::: ::::::::::::::::~: ::::::::::::::::::::::::::::::::::::::::::
GVICGLLMLN
GEVSVG.EEE
GEAEAGWIEGAAILLSVICV KIKQVDWIEG
HDWITG
QAGTE.D
..........
DPS GD...NLYLG
QAGTE.D
..........
EPS ND...NLYLG
GVISAIIVLN GVISFVIAVN
IVLAAVVIIT
QAGTE.D
Atnlhomsa
AILCFLAYSI
QAATE.E
Atxaleido
ATIGWYETIK
AGDAVAALKN
SLKPTATVYR
...... DSKW QQIDAAVLVP
Pmallyces
STISFIEENN
AGNAAAALMA
RLAPKAKVLR
...... DGKW DEEDASVLVP
AGNAAAALMA
GLAPKTKVLR
..........
DPS GD...NLYLG
LIILTMVSLS
AILCFLAYGI
=======================================
..........
VAIMIAVFVV
DLFAAGVIALMVAIS
Atmbsalty AGVSFITDYW LPLRRGE .......... E ...... TDLTGV !?i,ii~ii!i:2!?ii!i!:~i~ Atc3schpoAALSFGT ........................... TDWIEG Atnlsacce
GVICALLLLN
IVLAAVVIIT IVLAAVVIIT
EPQ ND...NLYLGVVLSAVVIIT
Atn2homsa AILCFLAYGI QAAME.D .......... EPS ND...NLYLGVVLAAVVIVT i::i!i2il}ii!iiii,i~.i!!. A t n 3 s u s s c A I L C F L A Y G I QAATE.E .......... EPQ ND...NLYLGVVLSAVVIIT Atnacatco AVLCFLAYGI LAAME.D .......... EPA ND...NLYLGVVLSAVVIIT !i:i!!!iii!:i)!!?i:}!i!!i!~iA:! t n a t o r c a A I L C F L A Y G I Q V A T V . D . . . . . . . . . . N P A N D . . . N L Y L G V V L S T V V I I T Atnaartsf AILCFLAYGI EASSGNE .......... DML KD...NLYLG IVLATVVIVT Atnadrome AILCFVAYSI QASTS.E .......... EPA DD...NLYLG IVLSAVVIVT Atnaartsa SILCFIAYTM EKYKNPD ........... VL GD...NLYLG LALLFVVIMT i:ii::iiii!!iii:::~!:::i/i Atnahydat AILCFFAFGI RAVRD.T .......... NPN MD...ELYLG IVLSVVVIIT ===================:: ======================= QAS.EGD .......... LTT DD...NLYLA IALIAVVVVT ;ii~!ii~iii!~!:i :i!!;i: A t h a h o m s a A A I C L I A F A I Atcartsf AIISLVLA .... LFEEH .......... DDE AEQLTAYVEP FVILLILIAN LVILLILIAN !~!!:~:ii::i,!#;i!:2!!~i:!A!i:!-;ti~c b d r o m e A I I S F V L A . . . . L F E E H . . . . . . . . . . E . . . E T F T A F V E P Atcborycu ACISFVLA .... WFEEG .......... E...ETITAFVEP FVILLILIAN Atcdhomsa ACISFVLA .... WFEEG .......... E...ETITAFVEP FVILLILVAN Atcfratno ALVSFVLA .... WFEEG .......... E...ETTTAFVEP LVIMLILVAN Atctrybr ATVSFAMA .... VVENN .......... A ...... ADFVEP FIILLILILN Atcplafa AFISFVLT .... LLDMK .......... HKK IE.ICDFIEPLVIVLILILN Atalsynsp GTVKAFLG .... SWTNA .................. W ..... VIWGVTLVN Atclsynsp AVVSGALD .... LRDGQ .................. FPKDA IAILVIVVLN =========================== ::~:: Atclsacce SAVVSLFM .... GNIDD .................. AVSIT LAIFIVVTV. Atclmycge SLLSFVVAIV SGLRSNW .......... NFN HDLIIEWVQP FIILLTVFAN Consensus A ................................................. ..........
Atxbleido
Pmalnicpl Pmalarath Pma4nicpl
Pma3arath
Pmalschpo
301
ATIGWYETIK
STISFIEENN STISFIEENN STISFIEENN
STISFVEENN
AVVGFVQEYQ
AGDAVAALKN AGNAAAALMA
SLKPTATVYR
RLAPKAKVLR
AGNAAAALMA
GLAPKTKVLR
AGSIVDELKK
SLALKAVVIR
AGNAAAALMA
GLAPKTKVLR
350
...... DSKW QQIDAAVLVP
...... DGRW KEEDAAVLVP ...... DGKW
SEQEAAILVP
...... DGRW SEQEAAILVP
...... DGKW SEQEASILVP ...... EGQV HELEANEVVP
Pma2schpo ATVGFVQEYQ AGSIVDELKK TMALKASVLR Pmalajeca ACVGFVQEFQ AGSIVDELKK TLALKAWLR Pmalneucr AVVGFVQEFQ AGSIVDELKK TLALKAVVLR Pmalsacce AGVGFVQEFQ AGSIVDELKK TLANTAVVIR Pma2sacce ASVGFIQEFQ AGSIVDELKK TLANTATVIR PmalklulaAAVGFIQEYQ AGSIVDELKK TLANSAVVIR Pmalcanal AFVGFIQEYQ AGSIVDELKK TLANSALVVR Pmalzygro AGVGFIQEFQ AGSIVEELKK TLANTATVIR Atcphomsa VLVTAFNDWS KEKQFRGLQS RIEQEQKFTV Atcqhomsa VLVTAFNDWS KEKQFRGLQS RIEQEQKFTV Atcrhomsa VLVTAFNDWS KEKQFRGLQC RIEQEQKFSI Atc3sacce VLVSAANDYQ KELQFAKLNK KKE.NRKIIV Atmaescco TLLNFIQEAR STKAADALKA MVSNTATVLR Atmbsalty GLLRFWQEFR TNRAAQALKK MVRTTATVLR Atc3schpo ITVGFIQEYK AEKTMDSLRT LASPMAHVTR Atnlsacce VLIGLVQEYK ATKTMNSLKN LSSPNAHVIR Atn3homsa GCFSYYQEAK SSKIMESFKN MVPQQALVIR Atn3ratno G C F S Y Y Q E A K S S K I M E S F K N M V P Q Q A L V I R Atn3galga GCFSYYQEAK SSKIMESFKN MVPQQALVIR Atnlhomsa GCFSYYQEAK SSKIMESFKN MVPQQALVIR Atn2homsa GCFSYYQEAK SSKIMDSFKN MVPQQALVIR Atn3sussc GCFSYYQEAK SSKIMESFKN MVPQQALVIR Atnacatco GCFSYYQDAK $SKIMDSFKN LVPQQALVVR Atnatorca GCFSYYQEAK SSKIMDSFKN MVPQQALVIR Atnaartsf G I F S Y Y Q E N K S S R I M D S F K N L V P Q Y A L A L R Atnadrome GVFSYYQESK SSKIMESFKN MVPQFATVIR Atnaartsa GCFAYYQDHN ASKIMDSFKN LMPQFAFVIR Atnahydat GCFSYYQESK SSKIMESFKK MIPQEALVLR Athahomsa GCFGYYQEFK STNIIASFKN LVPQQATVIR Atcartsf A V V G V W Q E K N A E S A I E A L K E Y E P E M G K V I R Atcbdrome AVVGVWQERN AESAIEALKE YEPEMGKVVR Atcborycu AIVGVWQERN AENAIEALKE YEPEMGKVYR Atcdhomsa AIVGVWQERN AENAIEALKE YEPEMGKVYR Atcfratno A I V G V W Q E R N A E S A I E A L K E Y E P E M G K V I R Atctrybr ATVGVWQENR AEGAIEALKS FVPKTAVVLR Atcplafa AAVGVWQECN AEKSLEALKE LQPTKAKVLR Atalsynsp AIIGYIQEAK AEGAIASLAK AVTTEATVLR Atclsynsp AVLGYLQESR AEKALAALKG MAAPLVRVRR Atclsacce ...GFVQEYR SEKSLEALNK LVPAECHLMR Atclmycge SLIGSIQEFK AQKSASALKS LTKSFTRVFR C o n s e n s u s ...... QE . . . . . . . . . . . . . . . . . . . V . R
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr
351 GDLVKLASGS GDLVKLASGS GDIISIKLGD GDIISIKLGD GDIVSIKLGD GDIISVKLGD GDIVSIKLGD GDILKLDEGT GDILHLDEGT GDILQVEEGT GDILQVEEGT
...... D G R V K E I E A S E I V P ...... N G R L T E V E A P E V V P ...... D G T L K E I E A P E V V P ...... D G Q L V E I P A N E V V P ...... D G Q L I E I P A N E V V P ...... D G N L V E V P S N E V V P ...... N G Q L V E I P A N E V V P ...... D G S V Q E A P A N E I V P I R G G Q .... V I Q I P V A D I T V V R A G Q .... V V Q I P V A E I V V I R N G Q .... L I Q L P V A E I V V I R N D Q .... E I L I S I H H V L V VINDKGENGW LEIPIDQLVP RGPGNIGAVQ EEIPIEELVP .... S S K T D . . A I D S H L L V P .... N G K S E . . T I N S K D V V P .... EGEK.. M Q V N A E E V V V .... EGEK.. M Q V N A E E V V V .... EGEK.. M Q L N A E E V V V .... NGEK.. M S I N A E E V V V .... EGEK.. M Q I N A E E V V V .... NGEK.. M S I N A E E V V V .... DGEK.. K Q I N A E E V V I .... DGEK.. S S I N A E Q V V V .... EGQR.. V T L K A E E L T M .... EGEK.. P S L R A E D L V L .... DGKK.. I Q L K A E E V T V .... DGKK.. I T I N A E Q C V V .... DGDK.. F Q I N A D Q L V V .... A D K T G I Q K I K A R D L V P .... Q D K S G I Q K V R A K E I V P .... A D R K S V Q R I K A R D I V P .... Q D R K S V Q R I K A K D I V P .... S D R K G V Q R I R A R D I V P .... D G . . D I K T V N A E E L V P .... D G K W E I . . I D S K Y L Y V .... DGQN.. L R I P S Q D L V I .... DNRD.. Q E I P V A G L V P .... CGQE.. S H V L A S T L V P .... N G . . E L I S I N V S E V V V . . . . . . . . . . . . . . . . . . V.
AVPADCSI...NEGVIDVDEAALTGESLPV AVPADCSI...NEGVIDVDEAALTGESLPV IIPADARLLE .GDP.LKIDQ SALTGESLPV IIPADARLLE .GDP.LKIDQ SALTGESLPV IIPADARLLE .GDP.LKVDQ SALTGESLPV IIPADARLLE .GDP.LKIDQ SALTGESLPV IIPADARLLE .GDP.LKVDQ SALTGESLPA IICADGRVVT .PDVHLQVDQ SAITGESLAV ICPADGRLIT .KDCFLQVDQ SAITGESLAV IIPADGRIVT .EEAFLQVDQ SAITGESLAV IIPADGRIVT .DDAFLQVDQ SALTGESLAV
TM TM TK TK TK TK TK DK DK DK DK
400 ........ ........ ........ ........ ........ ........ ........ ........ ........ ........ ........
Pmalsacce GDILQLEDGT VIPTDGRIVT .EDCFLQIDQ Pma2sacce GEILQLESGT IAPADGRIVT .EDCFLQIDQ Pmalklula GDILQLEDGV VIPADGRLVT .EDCFIQIDQ Pmalcanal GDILQLEDGT VIPTDGRIVS .EDCLLQVDQ Pmalzygro GDILKLEDGT VIPADGRLVT .EECFLQVDQ Atcphomsa GDIAQVKYGD LLPADGILIQ GND..LKIDE Atcqhomsa GDIAQVKYGD LLPADGLFIQ GND..LKIDE Atcrhomsa GDIAQVKYGD LLPADGILIQ GND..LKIDE Atc3sacce GDVISLQTGDVVPADCVMIS GKC...EADE Atmaescco GDIIKLAAGD MIPADLRILQ ARD..LFVAQ Atmbsalty GDVVFLAAGD LVPADVRLLA SRD..LFISQ Atc3schpo GDVVVLKTGDVVPADLRLVE TVN..FETDE Atnlsacce GDICLVKVGD TIPADLRLIE TKN..FDTDE Atn3homsa GDLVEIKGGD RVPADLRIIS AHGC..KVDN Atn3ratno GDLVEIKGGD RVPADLRIIS AHGC..KVDN Atn3galga GDLVEVKGGD RVPADLRIIS AHGC..KVDN Atnlhomsa GDLVEVKGGD RIPADLRIIS ANGC..KVDN Atn2homsa GDLVEVKGGD RVPADLRIIS SHGC..KVDN Atn3sussc GDLVEVKGGD RIPADLRIIS ANGC..KVDN Atnacatco GDLVEVKGGD RIPADLRIIS SHGC..KVDN Atnatorca GDLVEVKGGD RIPADLRIIS ACSC..KVDN Atnaartsf GDIVEVKFGD RVPADLRVLE ARSF..KVDN Atnadrome GVLVELEFGD LIPLVYRIIE ARDF..KVDN Atnaartsa GDLVEVKFGD RIPADIRITS CQSM..KVDN Atnahydat GDVVFVKFGD RIPADIRIVE CKGL..KVDN Athahomsa GDLVEMKGGD RVPADIRILA AQGC..KVDN Atcartsf GDIVEISVGDKIPADLRLIS ILSTTLRIDQ Atcbdrome GDLVEVSVGD KIPADIRITH IYSTTLRIDQ Atcborycu GDIVEVAVGD KVPADIRILS IKSTTLRVDQ Atcdhomsa GDIVEIAVGD KVPADIRLTS IKSTTLRVDQ Atcfratno GDIVEVAVGD KVPADLRLIE IKSTTLRVDQ Atctrybr GDVVEVAVGN RVPADMRVVE LHSTTLRADQ Atcplafa GDIIELSVGN KTPADARIIK IYSTSLKVEQ Atalsynsp GDIVSLASGD KVPADLRL.. LKVRNLQVDE Atclsynsp GDLILLEAGD QVPADARL.. VESANLQVKE Atclsacce GDLVHFRIGD RIPADIRI.. IEAIDLSIDE Atclmycge GDIIFVDAGD IIPADGKLLQVN..NLRCLE C o n s e n s u s G D ...... G . . . P A D . R . . . . . . . . . . . D. Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula
401 ............................ ............................ ............................ ............................ ............................ ............................ ............................ ............................ ............................ ............................ ............................ ............................ ............................ ............................
GP GP GP GP HP NP GP HY HQ HK HK HY HY RF
S A I T G E S L A V DK ........ S A I T G E S L A A E K ........ S A I T G E S L A V DK ........ S A I T G E S L A V DK ........ S S I T G E S L A V DK ........ S S L T G E S D H V KK ........ S S L T G E S D Q V RK ........ S S L T G E S D H V KK ........ SSITGESNTI QKFPVDNSLR A S L T G E S L P V EK ........ S I L S G E S L P V EK ........ A L L T G E S L P V IK ........ S L L T G E S L P V SK ........ S S L T G E S E P Q TR ........ S S L T G E S E P Q TR ........ S S L T G E S E P Q TR ........ S S L T G E S E P Q TR ........ S S L T G E S E P Q TR ........ S S L T G E S E P Q TR ........ S S L T G E S E P Q TR ........ S S L T G E S E P Q SR ........ S S L T G E S E P Q AR ........ S S L T G E S E P Q SR ........ S S L T G E S E P Q SR ........ S S L T G E S E P Q SR ........ S S L T G E S E P Q TR ........ S I L T G E S V S V I K ........ S I L T G E S V S V IK ........ S I L T G E S V S V IK ........ S I L T G E S V S V IK ........ S I L T G E S V S V TK ........ S I L N G E S V E A MK ........ S M L T G E S C S V DK ........ S A L T G E A V P V EK ........ S A L T G E A E A V QK ........ S N L T G E N E P V HK ........ S F L T G E S T P V DK ........ S . L T G E S . . . . K ........ 450 EHMPKMGSNV VRGEVEGTVQ EHMPKMGSNV VRGEVEGTVQ GDGVYSGSTC KQGEIEAVVI GDGVYSGSTC KQGEIEAIVI GQEVFSGSTC KQGEIEAVVI GDEVFSGSTC KQGELEAVVI GEEVFSGSTC KQGEIEAVVI GDPTFASSGV KRGEGLMVVT NDTMYSSSTV KRGEAFMVVT GDTCYASSAV KRGEAFMVIT GDQVFASSAV KRGEAFVVIT GDQTFSSSTV KRGEGFMVVT GDEVFSSSTV KTGEAFMVVT GDSTFSSSTV KRGEAFMIVT
Pmalcanal
Pmalzygro
...... SLDK
....................
Atcrhomsa
...... SLDK
....................
Atmbsalty
..... YDVMA
Atnlsacce
..... DANLV
Atn3ratno
..... S P D . . . . . . . . . . . . .
Atnlhomsa
..... SPD .............
Atn3sussc
..... SPD .............
Atn3homsa Atn3galga
Atn2homsa .
.
DPMLLSGTHV
MEGSGRMVVT
DCMLISGSRI
DTLCFMGTTV
MEGSGRMLVT
LSGLGRGVIT
VSGTAQAMVI
F . . . . . . G.. K E E E T S V G D R
LNLAFSSSAV
VKGRAKGIVI
CTHDNPLET
RNITFFSTNC
VEGTARGVVV
FTNENPLET
RNIAFFSTNC
FTNENPLET
RNIAFFSTNC
..... S P D . . . . . . . . . . . . .
CTHDNPLET
..... S P D . . . . . . . . . . . . .
CTHDNPLET
..... SPE .............
YSSENPLET
INLAYSSSIV
RNITFFSTNC RNITFFSTNC
RNICFFSTNC
TSGRAQAVVV
TKGRAKGICY
VEGTARGVVV VEGTARGVVI
VEGTARGIVV VEGTARGIVI
VEGTARGIVV
KNIAFFSTNC
VEGTARGIVI
KNLAFFSTNA
VEGTMRGIVI
KNIAFFSTNC
VEGTARGIVI
..... GAE .............
FTHENPLET
KNLAFFSTNA
VEALPKGVVI
Atnahydat
..... A V D . . . . . . . . . . . . .
FTHENPIET
KNLAFFSTNA
VEGTATGIVV
..... S T E . . . . . . . . . . . . .
CTNDNPLET
Atcbdrome
..... HTDAI
...... PD.. P..RAVNQDK
KNILFSGTNVAAGKARGVVI
Atcdhomsa
..... HTDPV
...... PD.. P..RAVNQDK
KNMLFSGTNIAAGKAMGVVV
Atcfratno
..... HTEPV ..... H T D A I ..... Q I E A V ..... Y A E K M
...... PD.. P..RAVNQDK ...... PD.. P..RAVNQDK ...... PD.. P..RAVNQDK
KNMLFSGTNV
...... ED..
..... L A D . . . . . . . .
SYKNCEIQLK
EL.. LPEETPLAER
QQ.. LPTDVVIGDR
ASGKALGVAV
KNILFSSTAI
VCGRCIAVVI
TNCLFQGTEV
LQGRGQALVY
LNMAYAGSFV
SCIAYMGTLV
VYGKALCVVV TFGQGTGVVV
EKSSFNDQ..
PNSIVPISER
TAALLQSVES
DLGN ..........................
. . . . . TI . . . . . . . . . . D.. S N E K A T I L E Q TNLVFSGAQV ..........................................
Atxaleido
YTGSLTFFGK
Pmallyces
ATGVHTFFGKAAHLVDSTNQ
YTGSLTFFGK
SAGKARGVVM
KNMLFSGTNI
Atclmycge Consensus
451
LEGTAQGLVV
KNMLFSGTNIAAGKALGIVA
...... KG.. R..QERFPA..CMVYSGTAI
...... AV ........ ..... T S Q T I
RNIAFFSTMC
LEGTGRGIVI
..... SPE ............. ..... H T D P V
CTHESPLET
KNLAFFFTNT
Athahomsa
Atxbleido
TAALLQSVES
KEGHGKGIVV
VYGSGVFQVE G .......
DLGN ..........................
ATGVHTFFGKAAHLVDSTNQ
V.GH ..........................
Pma4nicpl
ATGVHTFFGKAAHLVDSTNN
V.GH ..........................
Pmalschpo
ATGDSTFVGR
GTGH ..........................
Pmalarath
ATGVHTFFGK
AAHLVDSTNQ
ATGVHTFFGKAAHLVDSTNQ
AASLVNAAAG
Pma2schpo
ATADSTFVGR
Pmalsacce
ATGDNTFVGRAAALVNKAAG
Pmalajeca Pmalneucr Pma2sacce
ATGDNTFVGR ATGDNTFVGR
AASLVGAAGQ
SQGH ..........................
ATGDNTFVGR
AAALVGQASG
VEGH ..........................
ATGDSTFVGR
AAALVNKASA
GTGH ..........................
AVGVNSQTGI
IFTLLGAGGE
EEEKKDEKKK
Pmalzygro
ATGDNTFVGRAASLVNAAAG
Atcphomsa
V.GH ..........................
GTGH .......................... GSGH ..........................
ATGDSTFVGRAAALVNKAAAGSGH
Pmalcanal
V.GH ..........................
GPALVNAASA AAALVNAASG
Pmalklula
500
V.GH ..........................
Pmalnicpl
Pma3arath
MEGSGRMVVT
DPMLLSGTHV
Atnadrome
Atclsacce
.
DPLLLSGTHV
GNICLMGTNV
FTNDNPLET
Atclsynsp
.
KTGEAFMIVT
KRGEGFMIVT
PDKDKSLLDL
..... SPE .............
Atalsynsp
9
F . . . . . . Q.. M N E D V P I G D R
Atnaartsf
Atctrybr
"
..... D A H A T
DVAGKDSEQL
..... SNPLEC
FSNDNPLET
Atcplafa
t
AATTRQPEH
..... SPD .............
Atcborycu
..
..........
Atnacatco
Atcartsf
~
DVNEDGNKIA
FTHENPLET
Atnaartsa
.
HNHSKPLDIG
..... S P E . . . . . . . . . . . . .
Atnatorca
. . ~
HY GDEVFSSSTV
....................
DFKKFNSIDS
Atc3schpo
9
...... SVDK
Atc3sacce
Atmaescco
: : .
. . .
RS G D S C Y S S S T V
Atcphomsa
Atcqhomsa
:
............................
............................
GQGH .......................... ..........................
GQGH .......................... EKKNKKQDGA
IENRNKAKAQ
Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa !ii[iiii'~;:::iiii:iii.:: A t n 3 r a t n o Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca i!!~!i!i'~,!?;i!!?! A t n a a r t s f Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus ..........
AVGVNSQTGI IFTLLGAGGE EEEKKD ..... KKAKQQDGA .......... AVGVNSQTGI ILTLLGVNED DEGEK ...... KKKGKKQGV PENRNKAKTQ SVGINSVYGQ TMTSLNAEP ............................... ATGANTWFGQ LAGRVSEQES EPN ........................... A T G S R T W F G S L A K S I V G T R T QT . . . . . . . . . . . . . . . . . . . . . . . . . . . . ATGMQTQIGA IAAGLRQKGK LFQRPEKDEP NYRRKL .............. K T A L N S E I G K I A K S L Q G D S G L I S R .... DP S K S W L Q . . . . . . . . . . . . . . ATGDRTVMGR IAT..LASGL EVGK .......................... ATGDRTVMGR IAT..LASGL EVGK .......................... ATGDRTVMGR IAT..LASGL EVGK .......................... YTGDRTVMGR IAT..LASGL EGGQ .......................... ATGDRTVMGR IAT..LASGL EVGR .......................... YTGDRTVMGR IAT..LASGL EGGQ .......................... STGDRTVMGR IAT..LASGL EVGR .......................... NIGDHTVMGR IAT..LASGL EVGQ .......................... GIGDNTVMGR IAG..LASGL DTGE .......................... SCGDHTVMGR IAA..LASGL DTG ........................... NVGDDSVMGR IAC..LASSL DSGK .......................... RIGDNTVMGR IAN..LASGL GSGK .......................... NTGDRTIIGR IAS..LASGV ENEK .......................... GTGLNTAIGS IRTQMFE..TEEMK .......................... GTGLSTAIGK IRTEMSE..T EEIK .......................... TTGVSTEIGK IRDQMAA..T EQDK .......................... ATGVNTEIGK IRDEMVA..T EQER .......................... ATGLHTELGK IRSQMAA..V EPER .......................... RTGASTEIGTIERDVRE..QEEVK .......................... NIGMKTEIGHIQHAVIESNS EDTQ .......................... ATANATEMGQ ISQSMEK..Q VSLM .......................... ATGMNTELGR IATLLQS..V ESEK .......................... GTGTNTSFGA VFEMMNN..I EKPK .......................... A V G I K T Q V G K I A K T V D D S V T KL . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..G..T..G .........................................
          501                                                550
Atxaleido .......... .......... .......... .......... ..IHVILRRV
Atxbleido .......... .......... .......... .......... ..IHVILRRV
Pmallyces .......... .......... .......... .......... ......FQKV
Pmalnicpl .......... .......... .......... .......... ......FQKV
Pmalarath .......... .......... .......... .......... ......FQKV
Pma4nicpl .......... .......... .......... .......... ......FQKV
Pma3arath .......... .......... .......... .......... ......FQKV
Pmalschpo .......... .......... .......... .......... ......FTEV
Pma2schpo .......... .......... .......... .......... ......FTEV
Pmalajeca .......... .......... .......... .......... ......FTEV
Pmalneucr .......... .......... .......... .......... ......FTEV
Pmalsacce .......... .......... .......... .......... ......FTEV
Pma2sacce .......... .......... .......... .......... ......FTEV
Pmalklula .......... .......... .......... .......... ......FTEV
Pmalcanal .......... .......... .......... .......... ......FTEV
Pmalzygro .......... .......... .......... .......... ......FTEV
Atcphomsa DGAAMEMQPL KSEEGGDGDE KDKKKANLPK KEKSVLQGKL TKLAVQIGKA
Atcqhomsa ..AAMEMQPL KSAEGGDAD. .DRKKASMHK KEKSVLQGKL TKLAVQIGKA
Atcrhomsa DGVALEIQPL NSQEGIDNEE KDKKAVKVPK KEKSVLQGKL TRLAVQIGKA
Atc3sacce .......... .......... .......... .ESTPLQLHL SQLADNISVY
Atmaescco
.........................................
Atc3schpo
.................
Atn3ratno
........................................
Atmbsalty
Atnlsacce Atn3homsa Atn3galga
Atnlhomsa
Atn2homsa
Atn3sussc
Atnacatco
AFQQGISRV
......................................... NKY Y..LKVTSYY
VQRVLGLNVG
................. NTW ISTKKVTGAF .... L G T N V G ........................................ ........................................
AFDRGVNSV
TPLQRKLTVL
TPLHRKLSKL TPIAIEIEHF
TPIAIEIEHF
TPIAVEIEHF
........................................
TPIAAEIEHF
........................................
TPIAAEIEHF
........................................
TPIAMEIEHF TPISIEIEHF
........................................
Atnatorca
........................................
TPIAAEIEHF
Atnadrome
........................................
TPIAKEIHHF
Atnahydat
........................................
Atnaartsf Atnaartsa Athahomsa Atcartsf
Atcbdrome
Atcborycu
TPLQRKLDEF
Atcplafa
........................................
TPLQIKIDLF
Atclsynsp Atclsacce
Atclmycge
TPLTRKFAKF
........................................
TPLQQRLDKL
TPLQLTMDKL
........................................
...........................
SPLQQKLEKI
Consensus
..................................................
Atxaleido
MFSLCAISFM
LCMCCFIYLL
A ............
R F Y .... E T ..... F R H
Pmallyces
LTAIGNFCIC
SIAVGMIIEI
I ............
VMYPIQH
R K ..... Y R P
SIAIGIAIEI
V ............
VMYPIQH
RK ..... YRD
SIAVGIAIEI
V ............
T ............
AAF.YRS
V R ..... L A A
SSF.YRS
N P ..... I V Q
ACF.YRT
V G ..... I V $
ACF.YRT
V R ..... I V P
FWVQKRPWLA
ECTPIYIQY.
FV ..... KFF
ILILYFVIDN
FVINRRPWLP
ECTPIYIQY.
SMLLIRFMLV
MAPVVLLING
YTKGD .............
AYILFCIAII
LAIIVMAAHS
FHVTN ...............
Pmalnicpl
Pma4nicpl Pma3arath
Pmalschpo
551
MLALCAISFI LTAIGNFCIC
LTSIGNFCIC LTAIGNFCIC LTAIGNFCIC
LNGIGTILLV
LCMCCFIYLL
$IAVGMIIEI SIAIGMLVEI
LVLLTLFCIY
A ............ I ............
I ............
T ............
LNGIGTILLI
LVIFTLLIVWV
............
Pma2sacce
LNGIGIILLV
LVIATLLLVWT
............
Pmalajeca Pmalsacce Pmalklula
LNGIGTVLLI LNGIGIILLV LNGIGTILLI
Pmalcanal
LNGIGTTLLV
Atcphomsa
GLLMSAITVI
Pmalzygro
Atcqhomsa Atcrhomsa Atc3sacce
Atmaescco
Atmbsalty
Atc3schpo
TPLQVKLDEF
........................................
Pmalneucr
TPLQQKLDEF
........................................
LVILTLLCIY
TPLQQKLDEF
........................................
LNGIGTILLV
TPLQQKLDEF
........................................
Pma2schpo
..
TPIAIEIEHF
........................................
TPIALEIEHF
........................................
........................................
Pmalarath
TPIAREIEHF
Atcfratno
Atcdhomsa
Atxbleido
........................................
TPLQQKLDEF
Atalsynsp
...
TPIAKEIAHF
........................................
Atctrybr
....
........................................
LVILTLLVVWV LVIATLLLVWT LVIVTLLLVWV
FVIVTLLVVWV
............
ILVLYFTVDT
FVVNKKPWLP
SWLLIRFMLI
MVPVVLLING
VMYPIQR
AAF.YRS SSF.YRS
ASF.YRT
............
GLVMSAITVI
FLVLFTRYLF
VMYPIQH
............
T ............
GCV.SAI.IL
VMYPIQH
ACF.YRT
LVVITLLLIW
GLLMSALTVF
R F Y .... E T ..... F R H
............
LNGIGVILLV
ILVLYFVIDT
600
YIIPEDGRFH
ACF.YRT
ECTPVYVQY. DLDPAQKGSK
FSKGD .............
R A ..... Y R P R K ..... Y R D R H ..... Y R D
V R ..... L A R N S ..... I V T
NG ..... IVR N K ..... I V R V R ..... I V P F V ..... K F F F V ..... K F F F M ..... N I F
WWEA
WVEA
..... A L F
..... S L F
E V ..... S I Y
Atnlsacce AVLLFWIAVL FAIIVMASQK FDVDK ............. A t n 3 h o m s a I Q L I T G V A V F L G V S F F I L S L I .... L G Y T W L RVEA..... AIYvIE
Atn3ratno IQLITGVAVF LGVSFFILSL I......... ....LGYTWL EA.....VIF
Atn3galga IQLITGVAVF LGISFFVLSL I......... ....LGYTWL EA.....VIF
Atnlhomsa IHIITGVAVF LGVSFFILSL I......... ....LEYTWL EA.....VIF
Atn2homsa IQLITGVAVF LGVSFFVLSL I......... ....LGYSWL EA.....VIF
Atn3sussc IHIITGVAVF LGVSFFILSL I......... ....LEYTWL EA.....VIF
Atnacatco IHIITGVAVF LGVSFLLLSL V......... ....LGYSWL EA.....VIF
Atnatorca IHIITGVAVF LGVSFFILSL I......... ....LGYTWL EA.....VIF
Atnaartsf IHIITGVAVF LGVTFFIIAF V......... ....LGYHWL DA.....VVF
Atnadrome IHLITGVAVF LGVTFFVIAF I......... ....LGYHWL DA.....VIF
Atnaartsa IHIITAMAVS LAAVFAVISF L......... ....YGYTWL EA.....AIF
Atnahydat IHIVTGVAVF LGVSFLIISL A......... ....MGYHWL EA.....IIF
Athahomsa VDIIAGLAIL FGATFFIVAM C......... ....IGYTFL RA.....MVF
Atcartsf  GEQLSKVISV ICVAVWAINI GHFNDPAHGG S.......WI KG.....AIY
Atcbdrome GEQLSKVISV ICVAVWAINI GHFNDPAHGG S.......WI KG.....AIY
Atcborycu GEQLSKVISL ICVAVWLINI GHFNDPVHGG S.......WI RG.....AIY
Atcdhomsa GEQLSKVISL ICIAVWIINI GHFNDPVHGG S.......WI RG.....AIY
Atcfratno GRQLSHAISV ICVAVWVINI GHFADPAHGG S.......WL RG.....AVY
Atctrybr  GVLLSKVIGY ICLVVFAVNL VRWYATHKPT KNETFFTRYI QP.....SVH
Atcplafa  GQQLSKIIFV ICVTVWIINF KHFSDPIHGS ........FL YG.....CLY
Atalsynsp SHTLLYVIVT LAAFTFAVGW ......GRGG SPLEM..... ..........
Atclsynsp GNVLVSGALI LVAIWGLGV. ......LNGQ SWEDL..... ..........
Atclsacce GKDLSLVSFI VIGMICLVGI ......IQGR SWLEM..... ..........
Atclmycge GKWFSWFGLG LFAVVFLVQT ALLG...... .....FDNFT NN.....WSI
Consensus .......... .......... .......... .......... ..........

          601                                                650
Atxaleido ALQFAVVVLV VSIPIALEIV VTTTLAVGSK HLSKHKIIVT KLSAIEMMSG
Atxbleido ALQFAVVVLV VSIPIALEIV VTTTLAVGSK HLSKHKIIVT KLSAIEMMSG
Pmallyces GIDNLLVLLI GGIPIAMPTV LSVTMAIGSH RLAQQGAITK RMTAIEEMAG
Pmalnicpl GIDNLLVLLI GGIPIAMPTV LSVTMAIGSH RLAQQGAITK RMTAIEEMAG
Pmalarath GIDNLLVLLI GGIPIAMPTV LSVTMAIGSH RLSQQGAITK RMTAIEEMAG
Pma4nicpl GIDNLLVLLI GGIPIAMPTV LSVTMAIGSH RLSQQGAITK RMTAIEEMAG
Pma3arath GIDNLLVLLI GGIPIAMPTV LSVTMAIGSH KLSQQGAITK RMTAIEEMAG
Pmalschpo LLEYTLAITI IGVPVGLPAV VTTTMAVGAA YLAEKQAIVQ KLSAIESLAG
Pma2schpo LLEYTLAITI IGVPVGLPAV VTTTMAVGAA YLAKKKAIVQ KLSAIESLAG
Pmalajeca ILEFTLAITI IGVPVGLPAV VTTTMAVGAA YLAKKKAIVQ KLSAIESLAG
Pmalneucr ILEFTLAITI IGVPVGLPAV VTTTMAVGAA YLAKKKAIVQ KLSAIESLAG
Pmalsacce ILRYTLGITI IGVPVGLPAV VTTTMAVGAA YLAKKQAIVQ KLSAIESLAG
Pma2sacce ILRYTLGITI IGVPVGLPAV VTTTMAVGAA YLAKKQAIVQ KLSAIESLAG
Pmalklula ILRYTLAITI VGVPVGLPAV VTTTMAVGAA YLAKKQAIVQ KLSAIESLAG
Pmalcanal ILRYTLAITI IGVPVGLPAV VTTTMAVGAA YLAKKQAIVQ KLSAIESLAG
Pmalzygro ILRYTLGITI VGVPVGLPAV VTTTMAGGAA YLAKKQAIVQ KLSAIESLAG
Atcphomsa I..IGVTVLV VAVPEGLPLA VTISLAYSVK KMMKDNNLVR HLDACETMGN
Atcqhomsa I..IGVTVLV VAVPEGLPLA VTISLAYSVK KMMKDNNLVR HLDACETMGN
Atcrhomsa I..IGITVLV VAVPEGLPLA VTISLAYSVK KMMKDNNLVR HLDACETMGN
Atc3sacce I..TSITVIV VAVPEGLPLA VTLALAFATT RMTKDGNLVR VLRSCETMGS
Atmaescco ....ALSVAV GLTPEMLPMI VTSTLARGAV KLSKQKVIVK HLDAIQNFGA
Atmbsalty ....ALAVAV GLTPEMLPMI VSSNLAKGAI AMSRRKVIVK RLNAIQNFGA
Atc3schpo ....AISLGI SIIPESLIAV LSITMAMGQK NMSKRRVIVR KLEALEALGG
Atnlsacce ....AICVAL SMIPSSLVVV LTITMSVGAA VMVSRNVIVR KLDSLEALGA
Atn3homsa L....IGIIV ANVPEGLLAT VTVCLTVTAK RMARKNCLVK NLEAVETLGS
Atn3ratno L....IGIIV ANVPEGLLAT VTVCLTLTAK RMARKNCLVK NLEAVETLGS
Atn3galga L....IGIIV ANVPEGLLAT VTVCLTLTAK RMARKNCLVK NLEAVETLGS
Atnlhomsa L....IGIIV ANVPEGLLAT VTVCLTLTAK RMARKNCLVK NLEAVETLGS
Atn2homsa L....IGIIV ANVPEGLLAT VTVCLTLTAK RMARKNCLVK NLEAVETLGS
Atn3sussc L....IGIIV ANVPEGLLAT VTVCLTLTAK RMARKNCLVK NLEAVETLGS
Atnacatco L....IGIIV ANVPEGLLAT VTVCLTLTAK RMAKKNCLVK NLEAVETLGS
Atnatorca L....IGIIV ANVPEGLLAT VTVCLTLTAK RMARKNCLVK NLEAVETLGS
Atnaartsf L....IGIIV ANVPEGLLAT VTVCLTLTAK RMASKNCLVK NLEAVETLGS
Atnadrome L....IGIIV ANVPEGLLAT VTVCLTLTAK RMASKNCLVK NLEAVETLGS
Atnaartsa M....IGIIV AKVPEGLLAT VTVCLTLTAK RMAKKNCLVR NLEAVETLGS
Atnahydat L....IGIIV ANVPEGLLAT VTVCLTLTAK KMAKKNCLVK HLEAVETLGS
Athahomsa F....MAIVV AYVPEGLLAT VTVCLSLTAK RLASKNCVVK NLEAVETLGS
Atcartsf  YFKIAVALAV AAIPEGLPAV ITTCLALGTR RMAKKNAIVR SLPSVETLGC
Atcbdrome YFKIAVAVAV AAIPEGLPAV ITTCLALGTR RMAKKNAIVR SLPSVETLGC
Atcborycu YFKIAVALAV AAIPEGLPAV ITTCLALGTR RMAKKNAIVR SLPSVETLGC
Atcdhomsa YFKIAVALAV AAIPEGLPAV ITTCLALGTR RMAKKNAIVR SLPSVETLGC
Atcfratno YFKIAVALAV AAIPEGLPAV ITTCLALGTR RMARKNAIVR SLPSVETLGC
Atctrybr  CLKVAVALAV AAIPEGLPAV VTTCLALGTR RMAQHNALVR DLPSVETLGR
Atcplafa  YFKISVALAV AAIPEGLPAV ITTCLALGTR RMVKKNAIVR KLQSVETLGC
Atalsynsp .FEAAVALAV SGIPEGLPAV VTVTLAIGVN RMAKRNAIIR KLPAVEALGS
Atclsynsp .LSVGLSMAV AIVPEGLPAV ITVALAIGTQ RMVQRESLIR RLPAVETLGS
Atclsacce .FQISVSLAV AAIPEGLPII VTVTLALGVL RMAKRKAIVR RLPSVETLGS
Atclmycge ALIGAIALVV AIIPEGLVTF INVIFALSVQ KLTKQKAIIK YLSVIETLGS
Consensus .......... ...P.GL... .T........ ........V. .L.A.E....

          651                                                700
Atxaleido VNMLCSDKTG TLTLNKMEIQ .......... .......... ..........
Atxbleido VNMLCSDKTG TLTLNKMEIQ .......... .......... ..........
Pmallyces MDVLCSDKTG TLTLNKLTVD .......... .......... ..........
Pmalnicpl MDVLCSDKTG TLTLNKLTVD .......... .......... ..........
Pmalarath MDVLCSDKTG TLTLNKLSVD .......... .......... ..........
Pma4nicpl MDVLCSDKTG TLTLNKLSVD .......... .......... ..........
Pma3arath MDVLCSDKTG TLTLNKLSVD .......... .......... ..........
Pmalschpo VEVLCSDKTG TLTKNKLSLG .......... .......... ..........
Pma2schpo VEILCSDKTG TLTKNRLSLG .......... .......... ..........
Pmalajeca VEILCSDKTG TLTKNKLSLA .......... .......... ..........
Pmalneucr VEILCSDKTG TLTKNKLSLH .......... .......... ..........
Pmalsacce VEILCSDKTG TLTKNKLSLH .......... .......... ..........
Pma2sacce VEILCSDKTG TLTKNKLSLH .......... .......... ..........
Pmalklula VEILCSDKTG TLTKNKLSLH .......... .......... ..........
Pmalcanal VEILCSDKTG TLTKNKLSLH .......... .......... ..........
Pmalzygro VEILCSDKTG TLTKNKLSLH .......... .......... ..........
Atcphomsa ATAICSDKTG TLTMNRM..T VVQAYINEKH YKK.....VP EPEAIPPNIL
Atcqhomsa ATAICSDKTG TLTTNRM..T VVQAYVGDVH YKE.....IP DPSSINTKTM
Atcrhomsa ATAICSDKTG TLTMNRM..T VVQAYIGGIH YRQ.....IP SPDVFLPKVL
Atc3sacce ATAVCSDKTG TLTENVM..T VVRGFPGNSK FDDSKSLPVS EQRKLNSKKV
Atmaescco MDILCTDKTG TLTQDKIVLE .......... .......... ..........
Atmbsalty MDVLCTDKTG TLTQDNIFLE .......... .......... ..........
Atc3schpo VTDICSDKTG TITQGKMITR RVWI...... .......... ..........
Atnlsacce VNDICSDKTG TLTQGKMLAR QIWI...... .......... ..........
Atn3homsa TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atn3ratno TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atn3galga TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atnlhomsa TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atn2homsa TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atn3sussc TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atnacatco TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atnatorca TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atnaartsf TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDG
Atnadrome TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDN
Atnaartsa TSTICSDKTG TLTQNRM..T VAHMW..... .......... .......FDQ
Atnahydat TSVICSDKTG TLTQNRM..T VAHMW..... .......... .......FDK
Athahomsa TSVICSDKTG TLTQNRM..T VSHLW..... .......... .......FDN
Atcartsf  TSVICSDKTG TLTTNQM..S VSRMF..... .......... ..........
Atcbdrome TSVICSDKTG TLTTNQM..S VSRMF..... .......... ..........
Atcborycu TSVICSDKTG TLTTNQM..S VCKMF..... .......... ..........
Atcdhomsa TSVICSDKTG TLTTNQM..S VCRMF..... .......... ..........
Atcfratno TSVICSDKTG TLTTNQM..S VCRMF..... .......... ..........
Atctrybr  CTVICSDKTG TLTTNMM..S VLHAF..... .......... ..........
Atcplafa  TTVICSDKTG TLTTNQMTTT VFHLFRESDS LTEYQLCQKG DTYYFYESSN
Atalsynsp ATVVCSDKTG TLTENQM..T VQAVYAGGKH YEVSGGGYSP KGEFWQVMGE
Atclsynsp VTTICSDKTG TLTQNKM..V VQQIHTLDHD FTVTGEGYVP AGHF..LIGG
Atclsacce VNVICSDKTG TLTSNHM..T VSKLWCLDSM SNKLNVLSLD KNKKTKNSNG
Atclmycge VQIICTDKTG TLTQNQMKV. .......... .......... ..........
Consensus ....CSDKTG TLT.N..... .......... .......... ..........
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca
701 750 .................................................. .................................................. .................................................. .................................................. .................................................. .................................................. .................................................. .................................................. .................................................. .................................................. ..................................................
.................................................. .................................................. ..................................................
SYLVTGISVN .......... E L L I N A I A I N DLIVNGISIN FEENCSSSLR NDLLANIVLN ............. NHTDISG ............. HHLDVSG .PSYGYLSVD TSDAN.NPTI .PRFGTITIS NSDDPFNPNE QIHEADTTED QSGTSFDKSS QIHEADTTED QSGTSFDKSS QIHEADTTED QSGTSFDKSS QIHEADTTEN QSGVSFDKTS QIHEADTTED QSGATFDKRS QIHEADTTEN QSGVSFDKTS QIHEADTTEN QSGTSFDRSS QIHEADTTEN QSGISFDKTS
C S ............................. S. . . . . . . . . STAFENRDYK KNDKNTNGSK NMSKNLSFLD KTSE . . . . . . . . . . . . . . . . . . . . . . . . . . VKSS .......................... GTVSGL ........... EAA MQDVLKEKKQ GNVSLIPRFS PYEYSHNEDG DVGILQNFKD HTWV .......................... HTWV .......................... ATWV .......................... ATWL .......................... PTWT .......................... ATWL .......................... DTWA .......................... LSWN ..........................
Atnaartsf
TITEADTTED
QSGAQFDKSS
Atnaartsa
KIVTADTTEN
QSGNQLYRGS
Atnadrome
Atnahydat
MIVEADTTED
Atcbdrome
Athahomsa Atcartsf Atcborycu
Atcdhomsa
SSFLEFEMTG
STYE ..............
PI G E . V F L N G Q R
ILDRV.EGDT
CSLNEFTITG
STYA ..............
PI G E . V H K D D K P
IIDKV.DGDF
STYA ..............
CRLHEFTISG
TTYT ..............
SFFNKLKDEG
NVEALTDDGE
T L K ..... G D G S I K E Y E L K D S R F N EVDNVLLDGL EI..IVPNDY
PPVLEECLLT
RDLMLL.LAAG R .... E T L T I
..............
PE G E . V L K N D K P
PE G E . V R Q G E Q L IVSNSVTCEGRQ
EGSIDEADPY
SDYFSSDSKK
G .............................
.............................
Atclsacce
NLKNYLTEDV
Consensus
..................................................
Atxaleido
Atxbleido
....... VDH FCFNSTTQTD
G .............................
LARA ..........................
..................................................
Pma3arath
..................................................
Pma2schpo
.................................................. ..................................................
..................................................
..................................................
Pmalsacce
..................................................
Pma2sacce
Pmalklula
Pmalcanal Pmalzygro
Atcphomsa
..
..................................................
Pmalajeca
Pmalneucr
..................................................
Pmalarath
Pmalschpo
:
..................................................
..................................................
Pma4nicpl
...
751
Pmallyces
Pmalnicpl
.
Atcqhomsa Atcrhomsa
.................................................. ..................................................
..................................................
.................................................. ..................................................
...... AYTS K .......................................
...... AYTT K ....................................... ...... AYTS
Atc3sacce K C K S R L S F F K .
Atc3schpo
EMKNID.PSN
Atn3homsa
..................................................
Atmbsalty
Atn3galga
Atnlhomsa .
..
..
Atn2homsa
.s..
..................................................
RLYEKDLPED
QP ...................................... ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
..................................................
.................................................. ..................................................
..................................................
Atn3sussc
..................................................
Atnatorca
..................................................
Atnacatco
Atnaartsf
K .......................................
..................................................
Atn3ratno
.........
K .......................................
Atmaescco
Atnlsacce
-
CSLNEFSITG
KGFP ..........................
IFDKV.EGND
LTNDIYAGES
Atclmycge
-
PGFK ..........................
HIHTADTTED QSGQTFDQSS ETWR .......................... VFKDIPDDAAPELYQFELTGSTYE .............. PI G E . T F M Q G Q K
Atcplafa
Atclsynsp '
AGWK ..........................
LTWK ..........................
VVAEA.EAGA
Atalsynsp
QSGVQYDRTS
QSGIAHDKGS
Atcfratno Atctrybr
QIIEADTTED
Atnadrome Atnaartsa
..................................................
.................................................. .................................................. ..................................................
800
Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa
Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf
.................................................. .................................................. INA..ADYDA ............... VKEI .... TTIC ............. IKA..ADYDT ............... LQEL .... STIC ............. IRS..GQFDG ............... LVEL .... ATIC ............. VNC..HQYDG ............... LVEL .... ATIC ............. VRC..GQFDG ............... LVEL .... ATIC ............. VSSPLEQDGA ............... LTKL .... ANIA ............. MKNDLNNNNN NNNNSSRSGA KRNIPLKEMK SNENTIISRG SKILEDKINK .................................................. .................................................. .................................................. .................................................. ..................................................
801 ....... ....... ....... .......
EQC F.TFEEGNDL EQC F.TFEEGNDL KAL IEVFAKGIDA KNL IEVFAKGVDA KNL VEVFCKGVEK RNL VEVFAKGVDK ....... KNL IEVYCKGVEK ....... EPF T...VSGVSG ....... EPY C...VEGVSP ....... EPY C...VSGVDP ....... DPY T...VAGVDP ....... EPY T...VEGVSP ....... EPY T...VEGVSP ....... EPY T...VEGVDP ....... EPY T...VEGVEP ....... EPY T...VEGVSS .............. ILPPEK .............. ILPPEK .............. ILPPEK .............. GNREDD ........... RVLHSAWLN ........... RVLMLAWLN ...SDQFIPL LKTCALCNLS ...MDLFQKW LETATLANIA ........ AL SHIAGLCNRA ........ AL SHIAGLCNRA ........ AL SHIAGLCNRA ........ AL SRIAGLCNRA ........ AL SRIAGLCNRA AL SRIAGLCNRA . . . . . . . . SL A R I A G L C N R A ........ AL SRIAALCNRA ........ AL VKIAALCSRA ........ AL SRIATLCNRA ........ EL IRVASLCSRA . . . . . . . . SL A K V A A L C S R A ........ AL CRVLTLCNRA .............. MMCNDS
KSTLVLAALA AKWREPPRDA KSTLVLAALA AKWREPPRDA DTWLMAARA S..RIENQDA DMVVLMAARA S..RTENQDA DQVLLFAAMA S RVENQDA EYVLLLAARA S RVENQDA DEVLLFAARA S..RVENQDA DDLVLTACLA ASRKRKGLDA DDLMLTACLA SSRKKKGLDA EDLMLTACLA ASRKKKGIDA EDLMLTACLA ASRKKKGIDA DDLMLTACLA ASRKKKGLDA DDLMLTACLA ASRKKKGLDA DDLMLTACLA ASRKKKGLDA DDLMLTACLA ASRKKKGLDA DDLMLTACLA ASRKKKGLDA EGGLPRHV ......... GNK EGALPRQV ......... GNK EGGLPRQV ......... GNK EDQLFKNVNK GRQEPFIGSK SHYQTGLKNL LDTAVLEGTD SSSQSGARNV MDRAILRFGE TV.NQTETGE ...WVVKGEP TVFKDDATDC ...WKAHGDP VFKGGQDNIP VLKRDVAGDA VFKGGQDNIP VLKRDVAGDA VFKGGQENVP ILKRDVAGDA VFQANQENLP ILKRAVAGDA VFKAGQENIS VSKRDTAGDA VFQANQENLP ILKRAVAGDA VFLAEQIDVP ILKRDVAGDA VFQAGQDSVP ILKRSVAGDA EFKPNQSTTP ILKREVTGDA EFKGGQDGVP ILKKEVSGDA EFKTEHAHLP VLKRDVNGDA EFKPNQNDVA VLRKECTGDA AFKSGQDAVP VPKRIVIGDA AIDFNEYKQA FEK...VGEA
850 LDTMVLG... LDTMVLG... IDTAIVGML. IDAAIVGML. IDAAMVGML IDACMVGML IDAAMVGML. IDKAFLKALK IDKAFLKALR IDKAFLKSLR IDKAFLKSLK IDKAFLKSLK IDKAFLKSLI IDKAFLKSLI IDKAFLKSLI IDKAFLKSLA TECALLGLL. TECGLLGFV. TECALLGFV. TETALLSLAR E E S A ...... G R I A ...... TEIALHVFSK TEIAIQVFAT SESALLKCIE SESALLKCIE SESALLKCIE SESALLKCIE SESALLKCIE SESALLKCIE SESALLKCIE SESALLKCIE SEAAILKCVE SEAALLKCME SEAAILKFAE SETAILKFVE SETALLKFSE TETALIVLGE
Atcbdrome
..............
Atcdhomsa Atctrybr
Atcborycu Atcfratno
Atxaleido
Atxbleido
Pmallyces
Pmalnicpl
QLEHRGDDWA
V ..... VGDP
SF..SQEHAI
F ..... LGNP TDVALL..EQ
YMCLVNCNEA
TETALTCLVE
..............
..............
MLCNDS
AVCNDA
NLCNNA
NIFCNDNSQI ALVASGEHWS
VKK...FGDS
I ..... VGDP
TEAALLVMSE TELALLHFVH
TEGALLASAA
TEGSLLTVAA
TEIALLEWKD A ......
851
..................................................
..................................................
..................................................
..................................................
..................................................
NY ................................................
Pma2schpo
NY ................................................
Pmalneucr
YY ................................................
Pma2sacce
EY ................................................
YY ................................................
QY ................................................ SY ................................................
Pmalcanal
NY ................................................
Atcphomsa
..LDLKR.DY
QDVR ....................................
..TDLKQ.DY
QAVR ....................................
Pmalzygro
Atmaescco
Atmbsalty
Atc3schpo
QY ................................................ ..LDLKQ.DY
LSLGLQPGEL
EPVR .................................... QYLR ....................................
..................................................
..............................................
Atnlsacce
KMDL ..............................................
Atn3ratno
..................................................
Atnlhomsa
..................................................
Atn3sussc
..................................................
Atn3homsa
Atn3galga Atn2homsa Atnacatco
Atnatorca Atnaartsf
..................................................
.................................................. .................................................. .................................................. .................................................. ..................................................
Atnadrome
..................................................
Atnahydat
..................................................
Atnaartsa Athahomsa
Atcartsf
Atcbdrome Atcborycu
Atcdhomsa
900
..................................................
Pma3arath
Atc3sacce
VEK...IGEA
..................................................
Atcrhomsa
SLHHNAATVQ
TETALTCLVE
Pmalarath
Atcqhomsa
VLCNDA
YEK...VGEA
YEK...VGEA
............. LCLCNNA SISKDA ..... NK...TGDP ...........................................
Pmalklula
..............
ALDYNEAKGV
Atclmycge Consensus
Pmalsacce
ALDYNEAKGV
TETALTTLVE
..............
Pmalajeca
ALCNDS
ALCNDS
TETALIVLAE
YEK...VGEA
Atclsynsp
Atalsynsp
Pmalschpo
.....................
..............
..............
FEK...VGEA
SLDFNETKGV
YCYSEYDYNF
Pma4nicpl
AIDYNEFKQA
ALCNDS
Atcplafa
Atclsacce
IMCNDS
..............
.................................................. ..................................................
KLNPYNL
..........................
KMNVFNT
..........................
KLNSFSV KMNVFDT
SKAGKDR
RSAALVVRED
EVRNLSK
VERANACNSV
..........................
NKSGLDR
..........................
ELKGLSK
RSAAIACRGE IERANACNSV
Atcfratno
KMNVFDT
Atcplafa
NFDILPTFSK
Atclsynsp
KAGI .............................
Atctrybr
Atalsynsp
NTTPVQSSNK
KAGF .............................
D .......
KDKSPRGINK
RSQL .............................
EMPDIRN
TVQ .......
DLKTYYR
901
..................................................
..................................................
Pmalschpo
..................................................
Pmalajeca
..................................................
Pmalsacce
..................................................
Pmalklula
..................................................
Pmalneucr
Pma2sacce Pmalcanal
Pmalzygro
.................................................. .................................................. ..................................................
.................................................. ..................................................
..................................................
..................................................
Atcrhomsa
..................................................
Atc3sacce Atmaescco
Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga
Atnlhomsa
.................................................. .................................................. .................................................. .................................................. .................................................. ..................................................
.................................................. .................................................. ..................................................
..................................................
Atn2homsa
..................................................
Atnacatco
..................................................
Atnaartsf
..................................................
Atnaartsa
..................................................
Atn3sussc Atnatorca Atnadrome Atnahydat Athahomsa
.................................................. .................................................. .................................................. .................................................. ..................................................
Atcartsf
MDTRW
.............................................
Atcborycu
IRQLM
.............................................
Atcbdrome Atcdhomsa
Atcfratno
Atctrybr
Atcplafa
950
..................................................
Atcphomsa Atcqhomsa
V .........
..................................................
Pma4nicpl
Pma2schpo
VLP .......
.................................................. ..................................................
Pma3arath
SAVNAFRTL
FFSSKNDNSH
QKP .......
Pmalnicpl
Pmalarath
VERAGACNSV
SQAGLAS
DPEGLQR
..................................................
.
NNKMPAEYEK
DLKGLSR
Consensus
Atxbleido Pmallyces
..........................
LANF .............................
Atxaleido
.
..........................
Atclsacce
Atclmycge
.
KFANIKG
IETKW IKQLM
IKQLM
CEGKW
............................................. .............................................
.............................................
.............................................
ITSTLNENDK
NLKNANHSNY
TTAQATTNGY
EAIGENTFEH
GTSFENCFHS
Vl
Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
.................................................. .................................................. .................................................. .................................................. ..................................................
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce
951 1000 ........................... AAD LDECDNYQQL NFVPFDPTTK ........................... AAD LDECDNYQQL NFVPFDPTTK ........................... ADP KEARAGIREI HFLPFNPTDK ........................... ADP KEARAGIREI HFLPFNPTDK ........................... ADP KEARAGIREV HFLPFNPVDK ........................... ADP KEARAGIREV HFLPFNPVDK ........................... ADP KEARAGIREI HFLPFNPVDK ........................... PGP RSMLTKYKVI EFQPFDPVSK ........................... PKA KDQLSKYKVL DFHPFDPVSK ........................... PRA KSVLTQYKVL EFHPFDPVSK ........................... PRA KSVLSKYKVL QFHPFDPVSK ........................... PKA KDALTKYKVL EFHPFDPVSK ........................... PKA KDALTKYKVL EFHPFDPVSK ........................... PRA KAALTKYKLL EFHPFDPVSK ........................... PRA KAALPKYKVI EFQPFDPVSK ........................... PKA KGALTKYKVL EFHPFDPVSK ........................... N E I PE .... E A L Y K V Y T F N S V R K ........................... SQM PE .... E K L Y K V Y T F N S V R K ........................... N E V PE .... E K L Y K V Y T F N S V R K ........................... DQP MEKFNIEKVVQTIPFESSRK .............................. RSLASRWQKI DEIPFDFERR .............................. PSTKARFIKR DELPFDFVRR ...GKEDLLK TNT PHNALTGEKSTNQSNENDQSSLSQHNEKPGSAQFE FVREYPFDSEIKHI AEFPFDSTVK
Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce
........................ ........................ ........................ ........................ ........................ ........................
LSSGSV KLMRERNKKV LSSGSV KLMRERNKKV LSSGSV KVMRERNKKV LCCGSV KEMRERYAKI LSCGSV RKMRDRNPKV LCCGSV KEMRERYTKI LCCGSV KEMREKFTKV ........................ LCCGSV SQMRDRNPKI ........................ LTTGET EAIRKRNKKI ........................ LALGDV MNIRKRNKKI ........................ MSTGSV MNIRSKQKKV ........................ LSVGNV MDIRAKNKKV ........................ LTLGNA MGYRDRFPKV ..................................... KKE ..................................... KKE ..................................... KKE ..................................... KKE ..................................... QKE ..................................... KKN KLGNKINTTS THNNNNNNNN NSNSVPSECI SSWRNECKQI ...................................... RL ...................................... RQ ...................................... KV
AEIPFNSTNK AEIPFNSTNK AEIPFNSTNK VEIPFNSTNK AEIPFNSTNK VEIPFNSTNK AEIPFNSTNK VEIPFNSTNK CEIPFNSANK AEVPFNSTNK SEIPFNSANK TEIPFNSTNK CEIPFNSTNK FTLEFSRDRK FTLEFSRDRK FTLEFSRDRK FTLEFSRDRK FTLEFSRDRK ATLEFTRKRK KIIEFTRERK DSIPFESDYQ DEIPFTSERK QELPFNSKRK
Atclmycge Consensus
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
........................................ YEKAFDSIRK ........................................... PF .... K i001 1050 R T A A T L V D R R SGEK . . . . . . . . F D V T K G A P H V I L Q M V . . . . . . . . Y N Q D E R T A A T L V D R R SGEK . . . . . . . . F D V T K G A P H V I L Q M V . . . . . . . . Y N Q D E RTALTYLD.G EGKM ........ HRVSKGAP EQILNLA ........ HNKSD RTALTYLD.G EGKM ........ HRVSKGAP EQILNLA ........ HNKSD RTALTYID.S DGNW ........ HRVSKGAP EQILDLA ........ NARPD RTALTYID.N NNNW ........ HRASKGAP EQILDLC ........ NAKED RTALTFID.S NGNW ........ HRVSKGAP EQILDLC ........ NARAD KVTAYVQA.P DGTR ........ ITCVKGAP LWVLKTV ........ EEDHP KITAYVEA.P DGQR ........ ITCVKGAP LWVFKTV ........ QDDHE KVSAVVLS.P QGER ........ ITCVKGAP LSVLKTV ........ EEDHP KVVAWES.P QGER ........ ITCVKGAP LFVLKTV ........ EEDHP KVTAVVES.P EGER ........ IVCVKGAP LFVLKTV ........ EEDHP KVTAVVES.P EGER ........ IVCVKGAP LFVLKTV ........ EEDHP KVTAIVES.P EGER ........ IICVKGAP LFVLKTV ........ EEEHP KVTAIVES.P EGER ........ IICVKGAP LFVLKTV ........ EDDHP KVTAVVES.P EGER ........ IICVKGAP LFVLKTV ..... EEDHP S M S T V L K N S D GS.. YRIFSKGAS EIILKKCFKI LSANGEAKVF S M S T V I K L P D ES.. FRMYSKGAS EIVLKKCCKI LNGAGEPRVF S M S T V I R N P N GG . . . . . . . . . F R M Y S K G A S E I I L R K C N R I L D R K G E A V P F W A G L V V K Y K E GKN .... KKP F Y R F F I K G A A E I V S K N C S Y K R N S D D T L E E I RMSVVVAE.N TEHHQ ........ LVCKGAL QEILNVCSQV RHN..GEIVP RVSVLVEDAQ HGDRC ........ LICKGAV EEMMMVATHL REG..DRVVA R M . A V I Y E D Q QG ........ Q Y T V Y A K G A V E R I L E R C S T S NG ...... ST R M S S V Y Y N N H NE TYNIYGKGAF ESIISCCSSW YGKDGVKITP Y Q L S I H E . T E DPN ..... DN R Y L L V M K G A P E R I L D R C S T I L L Q . . G K E Q P Y Q L S I H E . T E DPN ..... DN R Y L L V M K G A P E R I L D R C A T I L L Q . . G K E Q P Y Q L S I H E . T E DPN ..... DN R Y L L V M K G A P E R I L D R C S T I L L Q . . G K E Q P Y Q L S I H K N P N TS ...... 
EP Q H L L V M K G A P E R I L D R C S S I L L H . . G K E Q P Y Q L S I H E . R E DS PQ S H V L V M K G A P E R I L D R C S T I L V Q GKEIP Y Q L S I H K N P N TA EP R H L L V M K G A P E R I L D R C T S I LIH GKEQP Y Q L S V H K I P S GGK ES Q H L L V M K G A P E R I L D R C A T I M I Q GKEQL Y Q L S I H E . . N D K A ..... DS R Y L L V M K G A P E R I L D R C S T I L L N . . G E D K P F Q V S I H E . N E DKS ..... DG R Y L L V M K G A P E R I L E R C S T I F M N . . G K E I D Y Q V S I H E . T E DTN ..... DP R Y L L V M K G A P E R I L E R C S T I F I N . . G K E K V Y Q V S V H E R E D KSG . . . . . . . . Y F L V M K G A P E R I L E R C S T I L I D . . G T E I P Y Q V S V H E Q E N SSG . . . . . . . . Y L L V M K G A P E K V L E R C S T I L I N . . G E E Q P F Q L S I H T L . E DPR... DP R H L L V M K G A P E R V L E R C S S I L I K . . G Q E L P S M S S Y C V P L . KAG LLSNGPKMFVKGAPEGVLDRCTHVRVG TKKV.P SMSSYCTPL. KAS...RLGT GPKLFVKGAP EGVLERCTHA RVG.TTKV.P SMSVYCSPA. KSS...RAAV GNKMFVKGAP EGVIDRCNYV RVG.TTRV.P SMSVYCTPN. KPS...RTSM S.KMFVKGAP EGVIDRCTHI RVG.STKV.P SMSVYCTPT. RAD...PKAQ GSKMFVKGAP ESVIERCSSV RVG.SRTV.P SMSVHVTSTV TGS...PASS TNNLFVKGAP EEVLRRSTHV MQDNGAVV.Q L M S V I V E N K K K ......... E I I L Y C K G A P E N I I K N C K Y Y .LTKNDIR.P YMA ........... TLHDGD GRTIYVKGSV ESLLQRCESM LLDDG.QMVS RMSVVVADLG ETTLTIREGQ PYVLFVKGSA ELILERCQHC .FGNA.QLES LMATKILN ........ PVDN KCTVYVKGAF ERILEYSTSY LKSKGKKTEK LMTVVVQKDN R .......... FIVIVKGAP DVLL ............... P . . . . . . . . . . . . . . . . . . . . . . . . . . KGAP ...L . . . . . . . . . . . . . . . .
Plasma membrane cation-transporting ATPase filmily 1051 Ii00 Atxaleido INDEVVDI .... IDSL...A A R G V R C L S V A KTD ........ QQGRWHMA. Atxbleido INDEVVDI .... IDSL...A A R G V R C L S V A KTD ........ QQGRWHMA. Pmallyces IERRVHTV .... IDKF...A E R G L R S L G V A YQEVPEGRKE SAGGPWQFI. Pmalnicpl IERRVHAV .... IDKF...A E R G L R S L G V A YQEVPEGRKE SAGGPWQFI. Pmalarath LRKKVLSC .... IDKY...A E R G L R S L A V A RQVVPEKTKE SPGGPWEFV. Pma4nicpl VRRKVHSM .... M D K Y . . . A E R G L R S L A V A RRTVPEKSKE SPGGRWEFV. Pma3arath LRKRVHST .... IDKY...A ERGLRSLAVS RQTVPEKTKE SSGSPWEFV. Pmalschpo IPEDVLSAYK D K V G D L . . . A SRGYRSLGVA RK ........ IEGQHWEIM. Pma2schpo V P E A I T D A Y R E Q V N D M . . . A SRGFRSLGVA RK ........ ADGKQWEIL. Pmalajeca IPDEVDSAYK N K V A E F . . . A T R G F R S L G V A RK ........ RGEGSWEIL. Pmalneucr IPEEVDQAYK N K V A E F . . . A T R G F R S L G V A RK ........ RGEGSWEIL. Pmalsacce IPEDVHENYE N K V A E L . . . A SRGFRALGVA RK ........ RGEGHWEIL. Pma2sacce IPEDVHENYE N K V A E L . . . A SRGFRALGVA RK ........ RGEGHWEIL. Pmalklula IPEDVRENYE N K V A E L . . . A SRGFRALGVA RK ........ RGEGHWEIL. Pmalcanal IPEDVHENYQ N T V A E F . . . A SRGFRSLGVA RK ........ RGEGHWEIL. Pmalzygro IPEDVHENYE N K V A E L . . . A SRGFRALGVA RK ........ RGEGHWEIL. A t c p h o m s a R P R D R D D I V K T V I E P M . . . A SEGLRTICLA F R D F P A G E . . . P E P E W D N E N A t c q h o m s a R P R D R D E M V K KVIEPM...A CEWLRTICVA YRDFPSS .... PEPDWDNEN AtcrhomsaKNKDRDDMVRTVIEPM...ACDGLRTICIAYRDF..DD...TEPSWDNEN Atc3sacce N E D N K K E . T D D E I K N L . . . A SDALRAISVA H K D F C E C D S W PPEQLRDKDS Atmaescco L D D I M L R K I K R V T D T L N R Q G ...LRVVAVA T K Y L P A R E G D ..YQRAD... A t m b s a l t y LTETRRELLL A K T E D Y N A Q G ...FRVLLIA T R K L D G S G N N PTLSVED... Atc3schpo LEEPDRELII A Q M E T L A A E G L R V L . A L A T K VIDKADNWE ...... TLPRD Atnlsacce L T D C D V E T I R KNVYSLSNEG L R V L . 
G F A S K SFTKDQVNDD Q L K N I T S N R A A t n 3 h o m s a L D E E M K E A F Q N A Y L E L G G L G ERVL.GFCHY YLPEEQYPQG FAFDC.DDVN Atn3ratno LDEEMKEAFQ N A Y L E L G G L G ERVL.GFCHY Y L P E E Q F P K G FAFDC.DDVN A t n 3 g a l g a LDEEMKEAFQ N A Y L E L G G L G ERVL.GFCHF Y L P E E Q Y P K G FAFDC.DDVN A t n l h o m s a L D E E L K D A F Q N A Y L E L G G L G ERVL.GFCHL FLPDEQFPEG FQFDT.DDVN A t n 2 h o m s a L D K E M Q D A F Q N A Y M E L G G L G ERVL.GFCQL N L P S G K F P R G FKFDT.DELN Atn3sussc L D E E L K D A F Q N A Y L E L G G L G E R V L . G F C H L FLPDEQFPEG FQFDT.DDVN Atnacatco L D D E I K E S F Q N A Y L E L G G L G ERVL.GFCHF YLPDEQFPEG FQFDA.DDVN A t n a t o r c a L N E E M K E A F Q N A Y L E L G G L G E R V L . G F C H L KLSTSKFPEG YPFDV.EEPN Atnaartsf M T E E L K E A F N N A Y M E L G G L G E R V L . G F C D Y L L P L D K Y P H G FAFNA.DDAN Atnadrome L D E E M K E A F N N A Y M E L G G L G ERVL.GFCDF M L P S D K Y P N G FKFNT.DDIN A t n a a r t s a LDNHMKECFN N A Y M E L G G M G ERVL.GFCDF E L P S D Q Y P R G YVFDA.DEPN Atnahydat LKDDVIEIYN KAYDELGGLG ERVL.GFCHY Y L P V D Q Y P K G FLFKTEEEQN A t h a h o m s a L D E Q W R E A F Q T A Y L S L G G L G E R V L . G F C Q L Y L N E K D Y P P G YAFDV.EAMN Atcartsf MTPAIMDKIL EVTRAYG.TG R D T L R C L A L A TIDDPMDPKD M D I I D S T K F V Atcbdrome LTSALKAKIL A L T G Q Y G . T G R D T L R C L A L A VADSPMKPDE MDLGDSTKFY A t c b o r y c u M T G P V K E K I L SVIKEWG.TG R D T L R C L A L A TRDTPPKREE MVLDDSSRFM A t c d h o m s a M T S G V K Q K I M SVIREWG.SG SDTLRCLALA THDNPLRREE MHLEDSANFI Atcfratno L S A T S R E H I L A K I R D W G . S G SHTLRCLALA T R D T P P R K E D M Q L D D C S Q F V Atctrybr LSATHRKRII E Q L D K I S . G G A N A L R C I G F A FKPTKA.VQH VRLNDPATFE Atcplafa LNETLKNEIHNKIQNM...GKRALRTLSFAYKK..LSSKDLNIKNTDDYY Atalsynsp I...DRGEIE E N V E . . D . M A Q Q G L R V L A F A KKTVEPHHHA IDHGD ..... Atclsynsp L T A A T R Q Q I L A A G E . . A . M A SAGMRVLGFA Y R . . . P S A I A DVDED ..... 
Atclsacce L T E A Q K A T I N E C A N . . S . M A SEGLRVFGFA KLTLSDSSTP LT.ED ..... Atclmycge L C N N V Q N E V K N I E N L L D Q S A G Q G L R T L A V A LKVL .... YK FDQNDQKQID Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii01 1150 Atxaleido . . . . . . . . . . . . . . . . . . . G ILTFLDPPRP DTKDTIRRSK EYGVDVKMIT Atxbleido . . . . . . . . . . . . . . . . . . . G ILTFLDPPRP DTKDTIRRSK EYGVDVKMIT
Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
................... A ................... G ................... G ................... G ................... G ................... G ................... G ................... G ................... G ................... G ................... G ................... G ................... G ................... G DIVTGLTCI .......... A DILNELTCI .......... C EILTELTCI .......... A PNIAALDLLF NSQKGLILDG ..ESDLILE .......... G ..ETELTIE .......... G VAESSLEFV .......... S TAESDLVFL .......... G FTTDNLCFV .......... G FTTDNLCFV .......... G FATDNLCFV .......... G FPIDNLCFV .......... G FPTEKLCFV .......... G FPLDNLCFV .......... G FPTENLCFV .......... G FPITDLCFV .......... G FPLTGLRFA .......... G FPIDNLRFV .......... G FPISGLRFV .......... G FPLEGLCFL .......... G FPSSGLCFA .......... G KYEQNCTFV .......... G QYEVNLTFV .......... G EYETDLTFV .......... G KYETNLTFV .......... G QYETGLTFV .......... G DVESDLTFV .......... G KLEQDLIYL .......... G .IETGLIFL .......... G .AETDLTWL .......... G .LIKDLTFT .......... G ELENNLEFL .......... G ................... G
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath
1151 GDHLLIAKEM CRMLDLDPN GDHLLIAKEM CRMLDLDPN GDQLAIGKET GRRLGMGTN GDQLAIGKET GRRLGMGTN GDQLAIGKET GRRLGMGTN
LLPLFDPPRH DSAETIRRAL NLGVNVKMIT LLPLFDPPRH DSAETIRRAL NLGVNVKMVT LLPLFDPPRH DSAETIRRAL NLGVNVKMIT LLPLFDPPRH DSAETIRRAL NLGVNVKMIT VLPLFDPPRH DSAETIRRAL DLGVNVKMIT IMPCSDPPRH DTARTISEAK RLGLRVKMLT IMPCSDPPRH DTARTIHEAI GLGLRIKMLT IMPCSDPPRH DTAKTINEAK TLGLSIKMLT IMPCMDPPRH DTYKTVCEAK TLGLSIKMLT VMPCMDPPRD DTAQTVSEAR HLGLRVKMLT VMPCMDPPRD DTAQTINEAR NLGLRIKMLT VMPCMDPPRD DTAQTVNEAR HLGLRVKMLT IMPCMDPPRD DTAATVNEAR RLGLRVKMLT VMPCMDPPRD DTAATVNEAK RLGLSVKMLT VVGIEDPVRP EVPDAIKKCQ RAGITVRMVT VVGIEDPVRP EVPEAIRKCQ RAGITVRMVT VVGIEDPVRP EVPDAIAKCK QAGITVRMVT LLGIQDPLRA GVRESVQQCQ RAGVTVRMVT YIAFLDPPKE TTAPALKALK ASGITVKILT MLTFLDPPKE SAGKAIAALR DNGVAVKVLT LVGIYDPPRT ESKGAVELCH RAGIRVHMLT LIGIYDPPRN ETAGAVKKFH QAGINVHMLT LMSMIGPPRA AVPDAVGKCR SAGIKVIMVT LMSMIDPPRA AVPDAVGKCR SAGIKVIMVT LMSMIDPPRA AVPDAVGKCR SAGIKVIMVT LISMIDPPRA AVPDAVGKCR SAGIKVIMVT LMSMIDPPRA AVPDAVGKCR SAGIKVIMVT LISMIDPPRA AVPDAVGKCR SAGIKVIMVT LMSMIDPPRA AVPDAVGKCR SAGIKVIMVT LMSMIDPPRA AVPDAVGKCR SAGIKVIMVT LMSMIDPPRA AVPDAVAKCR SAGIKVIMVT LMSMIDPPRA AVPDAVAKCR SAGIKVIMVT LMSMIDPPRA AVPDAVSKCR SAGIKVIMVT LLSMIDPPRA AVPDAVSKCR SAGIKVIMVT LVSMIDPPRA TVPDAVLKCR TAGIRVIMVT VVGMLDPPRK EVLDAIERCRAAGIRVIVIT VVGMLDPPRK EVFDSIVRCRAAGIRVIVIT VVGMLDPPRK EVMGSIQLCR DAGIRVIMIT CVGMLDPPRI EVASSVKLCR QAGIRVIMIT CVGMLDPPRP EVAACITRCS RAGIRWMIT ACGMLDPPRE EVRDAIVKCR TAGIRVVVIT GLGIIDPPRK YVGRAIRLCH MAGIRVFMIT LQGMIDPPRP EAIAAVHACH DAGIEVKMIT LMGQIDAPRP EVREAVQRCR QAGIRTLMIT LIGMNDPPRP NVKFAIEQLL QGGVHIIMIT FVSLQDPPRK ESKEAILACK KANITPIMIT ..... D P P R . . . . . . . . . . . . . G . . V . M . T
............................. ............................. ............................. ............................. .............................
1200 IL IL MY MY MY
Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
G D Q L A I A K E T G R R L G M G T N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MY G D Q L A I A K E T G R R L G M G S N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MY G D A V D I A K E T A R Q L G M G T N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IY G D A V G I A K E T A R Q L G M G T N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VY G D A V G I A R E T S R Q L G L G T N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VY GDAVGIARET SRQLGLGTN IY G D A V G I A K E T C R Q L G L G T N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IY G D A V G I A K E T C R Q L G L G T N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IY G D A V G I A K E T C R Q L G L G T N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IY G D A V G I A K E T C R Q L G L G T N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IY GDAVGIAKET CRQLGLGTN IY G D N I N T A R A I A T K C G I L H . . . P G E D . . . . . . . . . . . . . . . . . . . . . . FLC G D N I N T A R A I A I K C G I I H . . . P G E D . . . . . . . . . . . . . . . . . . . . . . FLC G D N I N T A R A I A T K C G I L T . . . P G D D . . . . . . . . . . . . . . . . . . . . . . FLC G D N I L T A K A I A R N C A I L S T D ISSEA . . . . . . . . . . . . . . . . . . . . . . YSA GDSELVAAKV CHEVGLDAGE ............................. V GDNPVVTARI CLEVGIDTHD ............................. I G D H P E T A K A I A R E V G I I P P . . . . . . . . . . . . . . . . . . FIS D R D P N M S W M V G D F V G T A K A I A Q E V G I L P T N . . . . . . . . . . . . . . . . . 
LYH Y S Q E I V D S M V GDHPITAKAI AKGVGIISEG NETVEDIAAR LNIP...VSQVNPRDAKACV GDHPITAKAI AKGVGIISEG NETVEDIAAR LNIP...VSQVNPRDAKACV GDHPITAKAI AKGVGIISEG NETVEDIAAR LNIP...VSQVNPRDAKACV GDHPITAKAI AKGVGIISEG NETVEDIAAR LNIP...VSQVNPRDAKACV GDHPITAKAI AKGVGIISEG NETVEDIAAR LNIP...MSQVNPREAKACV GDHPITAKAI AKGVGIISEG NETVEDIAAR LNIP...VSQVNPRDAKACV GDHPITAKAI AKGVGIISEG NETVEDIAAR LNIP...VNEVNPRDAKACV GDHPITAKAI AKGVGIISEG NETVEDIAAR LNIP...VNQVNPRDAKACV G D H P I T A K A I A K S V G I I S E G N E T V E D I A A R LNIP VSEVNPRDAKAAV GDHPITAKAI AKSVGIISEG NETVEDIAQR LNIP...VSEVNPREAKAAV GDHPITAKAI ARQVGIISEG HETVDDIAAR LNIP...VSEVNPRSAQAAV GDHPITAKAI AKGVGIISEG NECEEDIALR LNIPLEDLSE DQKKSAKACV GDHPITAKAIAASVGIISEG SETVEDIAAR LRVP...VDQVNRKDARACV GDNKATAEAI CRRIGVFGEDENTEGM ....................... A GDNKATAEAI CRRIGVFAED EDTTGK S GDNKGTAIAI CRRIGIFGEN EEVADR ....................... A GDNKGTAVAI CRRIGIFGQD EDVTSK ....................... A GDNKGTAVAI CRRLGIFGDT EDVLGK ....................... A GDRKETAEAI CCKLGLLSSTADTTGL ....................... S G D N I N T A R A I A K E I N I L N K N E G D D E K D . . . . . . . . . . . NY T N N K N T Q I C C G D H I S T A Q A I A K R M G I A A E G DGIA . . . . . . . . . . . . . . . . . . . . . . . . . . G D H P L T A Q A I A R D L G I T E V G HPV . . . . . . . . . . . . . . . . . . . . . . . . . . . G D S E N T A V N I A K Q I G I P V I D PKLS . . . . . . . . . . . . . . . . . . . . . . . . . V G D H L K T A T V I A K E L G I L T L D NQ . . . . . . . . . . . . . . . . . . . . . . . . . . . A GD .... A . . . . . . . G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo
1201 TADKLPQIKD TADKLPQIKD PSSALLGQTK PSSALLGQTK PSAALLGTDK PSASLLGQDK PSSSLLGKHK .NAERLGLTG
ANDLPEDLGE ANDLPEDLGE DESIA...AL DESIS...AL DSNIA...SI DSAIA...SL DEAMA...HI GGNMP...GS
KYGDMMLSVG KYGDMMLSVG PIDELIEKAD PIDELIEKAD PVEELIEKAD PIEELIEKAD PVEDLIEKAD EVYDFVEAAD
GFAQVFPEHK GFAQVFPEHK GFAGVFPEHK GFAGVFPEHK GFAGVFPEHK GFAGVFPEHK GFAGVFPEHK GFGEVFPQHK
1250 FMIVETL... FMIVETL... YEIVKRL... YEIVKRL... YEIVKKL... YEIVKKL... YEIVKKL... YAVVDIL...
Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp
.NAERLGLSG GGDMP...GS EVNDFVEAAD .NAERLGLGG GGTMP...GS EVYDFVEAAD .NAERLGLGG GGDMP...GS EVYDFVEAAD .NAERLGLGG G G D M P . . . G S E L A D F V E N A D .NAERLGLGG G G D M P . . . G S E L A D F V E N A D .NAERLGLGG G G D M P . . . G S E L A D F V E N A D .DADRLGLSG G G D M A . . . G S E I A D F V E N A D .DAERLGLGG G G S M P . . . G S E M Y D F V E N A D LEGKDFNRRI RNEKGEIEQE RIDKIWPKLR LEGKEFNRRI RNEKGEIEQE RIDKIWPKLR LEGKEFNRLI RNEKGEVEQE KLDKIWPKLR M E G T E F R K L T KNER . . . . . . . . I R I L P N L R V I G S D I E T L S D D E L A N L A Q R ...... TT.. L T G T Q V E A M S D A E L A S E V E K ...... RA.. M T G S Q F D A L S D E E V D S L . . K ...... ALCL M T G S Q F D G L S E E E V D D L . . P ...... VLPL I H G T D L K D F T S E Q I D E I L Q N ...... HTEI IHGTDLKDFT SEQIDEILQN HTEI IHGTDLKDMS SEQIDEILQN HTEI V H G S D L K D M T S E Q L D D I L K Y ...... HTEI V H G S D L K D M T S E Q L D E I L K N ...... HTEI V H G S D L K D M T S E Q L D D I L K Y ...... HTEI V H G G D L K D L $ C E Q L D D I L K Y ...... HTEI V H G T D L K D L S H E N L D D I L H Y ...... HTEI V H G G E L R D I T P D A L D E I L R H ...... HPEI V H G A E L R D V S S D Q L D E I L R Y ...... HTEI I H G N D L K D M N S D Q L D D I L R H ...... YREI I H G A K L K D I K N E E L D K I L C D ...... HTEI I N G M Q L K D M D P S E L V E A L R T ...... HPEM Y T G R E F D D L S V E G Q R D A V A R ...... SR.. Y S G R E F D D L S P T E Q K A A V A R ...... SR.. Y T G R E F D D L P L A E Q R E A C R R ...... AC.. F T G R E F D E L N P S A Q R D A C L N ...... AR.. Y T G R E F D D L S P E Q Q R Q A C R T ...... AR.. Y T G Q E L D A M T P A Q K R E A V L T ...... AV.. Y N G R E F E D F S L E K Q K H I L K N ...... TPRI F E G R Q L A T M G P A E L A Q A A E D ...... S..C L T G Q Q L S A M N G A E L D A A V R S ...... V..E Atclsacce L S G D K L D E M S D D Q L A N V I D H ...... V..N A t c l m y c g e V L G S E L D E K K ILDYR . . . . . . . . . . . . . . . Consensus ...............................
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr
1251 ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... .......
RQR RQR QAR QAR QER QER QER QQR QQR QQR QQR
GYTCAMTGDGVNDAPALKRA GYTCAMTGDGVNDAPALKRA KHICGMTGDGVNDAPALKKA KHICGMTGDGVNDAPALKKA KHIVGMTGDGVNDAPALKKA KHIVGMTGDGVNDAPALKKA KHICGMTGDGVNDAPALKKA GYLVAMTGDGVNDAPSLKKA GYLVAMTGDGVNDAPSLKKA GYLVAMTGDGVNDAPSLKKA GYLVAMTGDGVNDAPSLKKA
GFAEVFPQHK GFAEVFPQHK GFAEVFPQHK GFAEVFPQHK GFAEVFPQHK GFAEVFPQHK GFAEGFPTNK GFAEVFPQHK VLARSSPTDK VLARSSPTDK VLARSSPTDK VLARSSPEDK LFARLTPMHK VFARLTPLQK VIARCAPQTK VIARCSPQTK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK VFARTSPQQK LFARVEPFHK LFSRVEPQHK CFARVEPSHK CFARVEPSHK CFARVEPAHK LFSRTDPSHK VFCRTEPKHK VFARVAPAQK VYARVAPEHK IFARATPEHK VFARVTPQQK FA...P..K
YAVVDIL... YNVVEIL... YNVVEIL... YRVVEIL... YRVVEIL... YNVVEIL... YNAVEIL... FAVVDIL... HTLVKGIID. HTLVKGIID. HTLVKGIID. RLLVE ..... ERIVTLL... TRILQAL... VKMIEAL... VRMIEAL... LIIVEGC... LIIVEGC LIIVEGC LIIVEGC... LIIVEGC... LIIVEGC... LIIVEGC... LIIVEGC... LIIVEGC... LIIVEGC... LIIVEGV... LIIVEGC... LVIVESC... SKIVEYL... SKIVEFL... SKIVEYL... SKIVEFL... SRIVENL... MQLVQLL... KQIVKVL... LQLVEAL... LRIVESL... LNIVRAL... LAIVSAW... ...V ......
DVGIAVH.GA DVGIAVH.GA DIGIAVD.DA DIGIAVD.DA DIGIAVA.DA DIGIAVA.DA DIGIAVA.DA DTGIAVE.GA DAGIAVE.GA DTGIAVE.GA DTGIAVE.GS
1300 TDAARAAADM TDAARAAADM TDAARSASDI TDAARSASDI TDAARGASDI TDAARGASDI TDAARGASDI TDAARSAADI SDAARSAADI SDAARSAADI SDAARSAADI
Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
....... Q N R G Y L V A M T G D G V N D A P S L K K A DTGIAVE.GA TDAARSAADI ....... Q N R G Y L V A M T G D G V N D A P S L K K A DTGIAVE.GA TDAARSAADI ....... QQR G Y L V A M T G D G V N D A P S L K K A DTGIAVE.GA TDAARSAADI ....... Q S R G Y L V A M T G D G V N D A P S L K K A DTGIAVE.GA TDAARSAADI ....... Q Q R G Y L V A M T G D G V N D A P S L K K A DTGIAVE.GA TDAARSAADI .... S T V S D Q R Q V V A V T G D G T N D G P A L K K A D V G F A M G I A G T D V A K E A S D I .... S T H T E Q R Q V V A V T G D G T N D G P A L K K A D V G F A M G I A G T D V A K E A S D I .... S T V G E H R Q V V A V T G D G T N D G P A L K K A D V G F A M G I A G T D V A K E A S D I ..... T L K G M G D V V A V T G D G T N D A P A L K L A D V G F S M G I S G T E V A R E A S D I ....... KRE G H V V G F M G D G I N D A P A L R A A D I G I S V D . G A VDIAREAADI ....... QKN G H T V G F L G D G I N D A P A L R D A D V G I S V D . S A A D I A K E S S D I ....... HRR K A F V A M T G D G V N D S P S L K Q A NVGIAMGQNG SDVAKDASDI ....... HRR K K F C T M T G D G V N D S P S L K M A NVGIAMGING SDVSKEASDI ....... QRQ G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... QRQ G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... QRQ G A I V A V T G D G V N D S P A L K K A DIGVAMGIRG SDVSKQAADM ....... QRQ G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... Q R Q G A I V A V T G D G V N D S P A L K K A DIGIAMGISG SDVSKQAADM ....... QRQ G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... QRT G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... QRQ G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... QRQ G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... QRM G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... QRQ G E F V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... QRQ G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDVSKQAADM ....... 
QRL G A I V A V T G D G V N D S P A L K K A DIGVAMGIAG SDAAKNAADM ....... QGM G E I S A M T G D G V N D A P A L K K A EIGIAMG.SG TAVAKSAAEM ....... QSM N E I S A M T G D G V N D A P A L K K A EIGIAMG.SG TAVAKSAAEM ....... QSY D E I T A M T G D G V N D A P A L K K A EIGIAMG.SG TAVAKTASEM ....... QSF D E I T A M T G D G V N D A P A L K K A EIGIAMG.SG TAVAKTASEM ....... QSF N E I T A M T G D G V N D A P A L K K A EIGIAMG.SG TAVAKSAAEM ....... KDE R L I C A M T G D G V N D A P A L K K A DIGIAMG.SG TEVAKSASKM ....... KDL G E T V A M T G D G V N D A P A L K S A DIGIAMGING TEVAKEASDI ....... QEK G H I V A M T G D G V N D A P A L K R A DIGIAMGKGG TEVARESSDM ....... Q R Q G E F V A M T G D G V N D A P A L K Q A NIGVAMGITG TDVSKEASDM ....... RKR G D V V A M T G D G V N D A P A L K L S DIGVSMGRIG TDVAKEASDM ....... KEA G F T V S V T G D G V N D A P A L I K S DVGCCMGITG VDIAKDASDL .............. A.TGDGVND.PALKKA..G.A ...... D.A..A.D.
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula
1301 V L T ...... V L T ...... V L T ...... V L T ...... V L T ...... V L T ...... V L T ...... V F L ...... V F L ...... V F L ...... V F L ...... V F L ...... V F L ...... V F L ......
E E E E E E E A A A A A A A
PGLSVVVEAMLVSREVFQRM PGLSVVVEAMLVSREVFQRM PGLSVIISAV LTSRAIFQRM PGLSVIISAV LTSRAIFQRM PGLSVIISAV LTSRAIFQRM PGLSVIISAV LTSRAIFQRM PGLSVIISAV LTSRAIFQRM PGLSAIIDAL KTSRQIFHRM PGLSAIIDAL KTSRQIFHRM PGLSAIIDAL KTSRQIFHRM PGLGAIIDAL KTSRQIFHRM PGLSAIIDAL KTSRQIFHRM PGLSAIIDAL KTSRQIFHRM PGLSAIIDAL KTSRQIFHRM
1350 LSFLTYRISA TL.QLVCFFF LSFLTYRISA TL.QLVCFFF KNYTIY..AV SI.TIRIVLG KNYTIY..AV SI.TIRIVLG KNYTIY..AV SI.TIRIVFG KNYTIY..AV SI.TIRIVFG KNYTIY..AV SI.TIRIVFG YSYWYRIALSL.HLEIFLG YAYWYRIALSL.HLEIFLG YAYWYRIAL SL.HLEIFLG YAYWYRIALSI.HLEIFLG YSYWYRIALSL.HLEIFLG YSYWYRIALSL.HLEIFLG YSYWYRIALSL.HLEIFLG
Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
V F L ...... A P G L S A I I D A L K T S R Q I F H R M Y S Y V V Y R I A L S L . H L E L F L G V F L ...... A P G L S A I I D A L K T S R Q I F H R M Y A Y V V Y R I A L S L . H L E I F L G I L T ...... D D N F T S I V K A V M W G R N V Y D S I S K F L Q F Q L T V N V V A V I V A F T I L T ...... D D N F S S I V K A V M W G R N V Y D S I S K F L Q F Q L T V N V V A V I V A F T I L T ...... D D N F T S I V K A V M W G R N V Y D S I S K F L Q F Q L T V N V V A V I V A F T ILM ...... T D D F S A I V N A I K W G R C V S V S I K K F I Q F Q L I V N I T A V I L T F V I L L ...... E K S L M V L E E G V I E G R R T F A N M L K Y I K M T A S S N F G N V F S V L V I L L ...... E K D L M V L E E G V I K G R E T F G N I I K Y L N M T A S S N F V N V F S V L V V L T ...... D D N F S S I V N A I E E G R R M F D N I M R F V L H L L V S N V G E V I L L V V V L S ...... D D N F A S I L N A V E E G R R M T D N I Q K F V L Q L L A E N V A Q A L Y L I I I L L ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I T P F L L I L L ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I T P F L L I L L ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I T P F L L ILL ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I T P F L I ILL ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I T P F L L I L L ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I T P F L I I L L ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I T P F L F I L L ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I T P F L V ILL D DNFASIVTGV EEGRLIFDNL KKSIVYTLTS NIPEISPFLL ILL ...... D D N F A S I V T G V E E G R L I F D N L K K S I A Y T L T S N I P E I S P F L A ILL ...... D D N F A S I V T G V E E G R L I F D N I K K S I A Y T L T S K I P E L S P F L M I L L ...... 
D D N F A S I V T G V E E G R L I F D N L K K S I V Y T L T S N I P E I S P F L M I L L ...... D D N F A S I V T G V E Q G R L I F D N L K K S I A Y T L T K N I P E L T P Y L I V L A ...... D D N F S T I V A A V E E G R A I Y N N M K Q F I R Y L I S S N I G E V V S I F L V L A ...... D D N F S S I V S A V E E G R A I Y N N M K Q F I R Y L I S S N I G E V V S I F L V L A ...... D D N F S T I V A A V E E G R A I Y N N M K Q F I R Y L I S S N V G E V V C I F L V L A ...... D D N F S T I V A A V E E G R A I Y N N M K Q F I R Y L I S S N V G E V V C I F L V L S ...... D D N F A S I V A A V E E G R A I Y N N M K Q F I R Y L I S S N V G E V V C I F L V L A ...... D D N F A T V V K A V Q E G R A I Y N N T K Q F I R Y L I S S N I G E V V C I L V V L A ...... D D N F N T I V E A I K E G R C I Y N N M K A F I R Y L I S S N I G E V A S I F I L L T ...... D D N F A S I E A A V E E G R T V Y Q N L R K A I A F L L P V N G G E S M T I L I V L L ...... D D N F A T I V A A V E E G R I V Y G N I R K F I K Y I L G S N I G E L L T I A S V L T ...... D D D F S T I L T A I E E G K G I F N N I Q N F L T F Q L S T S V A A L S L V A L IIS ...... D D N F A T I V N G I E E G R K T F L T C K R V L L N L F L T S I A G T V V V L L ............... I ....... R ........... Y ..............
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa
1351 IACFSLTPKA YGSVDPHFQF IACFSLTPKA YGSVDPNFQF FMLLALIWK ........... FMLLALIWK ........... FMLIALIWE ........... FMFIALIWK ........... FMLIALIWK ........... LWLIIRNQL ........... LWLIIRNQL ........... LWIAILNTS ........... LWIAILNRS ........... LWIAILDNS ........... LWIAILNNS ........... LWIAILNRS ........... LWIAILNRS ........... LWIAILNHS ........... G..ACIT ........... QD
FHLPVLMFML ITLLNDG... FHLPVLMFML ITLLNDG... FDFPPFMVLI IAILNDG... FDFPPFMVLI IAILNDG... FDFSAFMVLI IAILNDG... YDFSAFMVLI IAILNDG... FDFSPFMVLI IAILNDG... LNLE..LVVF IAIFADV... LNLE..LIVF IAIFADV... LNLQ..LVVF IAIFADI... LNIE..LVVF IAIFADV... LDID..LIVF IAIFADV... LDIN..LIVF IAIFADV... LNID..LVVF IAIFADV... LDIN..LIVF IAIFADV... LDID..LIVF IAIFADV... SPLKAVQMLWVNLIMDTLAS
1400 CLMTIGYDHV CLMTIGYDHV TIMTISKDRV TIMTISKDRV TIMTISKDRV TIMTISKDRV TIMTISKDRV ATLAIAYDNA ATLAIAYDNA ATLAIAYDNA ATLAIAYDNA ATLAIAYDNA ATLTIAYDNA ATLAIAYDNA ATLAIAYDNA ATLAIAYDNA LALATEPPTE
Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce
G . . A C I T . . . . . . . . . . . QD S P L K A V Q M L W V N L I M D T F A S LALATEPPTE G . . A C I T . . . . . . . . . . . QD S P L K A V Q M L W V N L I M D T F A S LALATEPPTE S S V A S S D . . . . . . . . . . . ET S V L T A V Q L L W I N L I M D T L A A L A L A T D K P D P A S A F . . . . . . . . . . . . . LPF L P M L P L H L L I Q N L L Y D . V S Q V A I P F D N V D D A S A F . . . . . . . . . . . . . IPF L P M L A I H L L I Q N L M Y D . I S Q L S L P W D K M D K GLAFR ........ DEVHLSV FPMSPVEILW CNMITSSFPS MGLGMELAQP GLVFR ........ DENGKSV FPLSPVEVLW IIVVTSCFPA MGLGLEKAAP FIMANI ............. P LPLGTITILC IDLGTDMVPA ISLAYEAAES FIMANI ............. P LPLGTITILC IDLGTDMVPA ISLAYEAAES FIMANI ............. P LPLGTITILC IDLGTDMVPA ISLAYEAAES FIIANI ............. P LPLGTVTILC IDLGTDMVPA ISLAYEQAES FIIANI ............. P LPLGTVTILC IDLGTDMVPA ISLAYEAAES FIIANI ............. P LPLGTVTILC IDLGTDMVPA ISLAYEQAES FIIANI ............. P LPLGTVTILC IDLGTDMLPA ISLAYEAAES FIIANV ............. P LPLGTVTILC IDLGTDMVPA ISLAYERAES FILFDI ............. P LPLGTVTILC IDLGTDMVPA ISLAYEEAES SILCDI P LPLGTVTILC IDLGTDMVPA ISLAYDHAEA YILFDL ............. P LAIGTVTILC IDLGTDWPA ISMAYEGPEA FILFGI ............. P LPLGTITILC IDLGTDMVPA ISLAYEKAES YITVSV ............. P LPLGCITILF IELCTDIFPS VSLAYEKAES T A A L . . . . . . . . . . . . GLPE . A L I P V Q L L W V N L V T D G L P A TALGFNPPDL T A A L . . . . . . . . . . . . GLPE . A L I P V Q L L W V N L V T D G L P A TALGFNPPDL T A A L . . . . . . . . . . . . GLPE . A L I P V Q L L W V N L V T D G L P A TALGFNPPDL TAAL GFPE A L I P V Q L L W V N L V T D G L P A TALGFNPPDL T A I L . . . . . . . . . . . . GLPE . A L I P V Q L L W V N L V T D G L P A TALGFNPPDL T G L F . . . . . . . . . . . . GLPE . A L S P V Q L L W V N L V T D G L P A TALGFNAPDR TALL ............ GIPD .SLAPVQLLWVNLVTDGLPA TALGFNPPEH SVLL . . . . . . . . . . . . ALN. L P I L S L Q V L W L N M I N S I T M T V P L A F E A K S P APLL ............ 
GLGA VPLTPLQILW MNLVTDGIPA LALAVEPGDP STAF . . . . . . . . . . . . KLPN . P L N A M Q I L W I N I L M D G P P A Q S L G V E P V D H GLFILGQVFK TNLLQQGHDF QVFSPTQLLI INLFVHGFPA VALAVQPVKE ................................... D . . . . . . L ....... 1401 IPSERPQKWNLPVVFVSASI LAAVACGSSL IPSERPQKWNLPVVFVSASI LAAVACGSSL KPSPLPDSWK LAEIFTTGVVLGGYLAMMTV KPSPLPDSWK LAEIFTTGIV LGGYLAMMTV KPSPTPDSWK LKEIFATGIV LGGYQAIMSV KPSPMPDSWK LKEIFATGVVLGGYQALMTV KPSPTPDSWK LKEIFATGVVLGGYMAIMTV PYSMKPVKWNLPRLWGLSTV IGIVLAIGTW PYAMKPVKWNLPRLWGLATI VGILLAIGTW PFSKTPVKWNLPKLWGMSVL LGIVLAVGTW PYSQTPVKWNLPKLWGMSVL LGVVLAVGTW PYSPKPVKWNLPRLWGMSII LGIVLAIGSW PYAPEPVKWNLPRLWGMSII LGIVLAIGSW PYSPKPVKWNLRRLWGMSVI LGIILAIGTW PYDPKPVKWNLPRLWGMSIV LGIILAIGTW PFSPSPVKWNLPRLWGMSIM MGIILAAGTW SLLLRKP.YG RNKPLISRTM MKNILGHAFY TLLLRKP.YG RNKPLISRTM MKNILGHAVY SLLKRRP.YG RNKPLISRTM MKNILGHAFY NIMDRKP.RG RSTSLISVST WKMILSQATL
1450 MLLWIGLEGY SSQYYENSWF MLLWIGLEGY SSQYYENSWF IFFWAAYKTN FFPRIFGVST IFFWAAYKTN FFPHVFGVST IFFWAAHKTD FFSDKFGVRS VFFWAMHDTD FFSDKFGVKS VFFWAAYKTD FFPRTFHVRD ITNTTMI .......... AQG IVNTTMI AQG I T L T T M L . . . . . . . . . . VGS ITVTTMY .......... AQG I T L T T M F . . . . . . . . . . LP. I T L T T M F . . . . . . . . . . LP. ITLTTMF VP I T L T T M L . . . . . . . . . . LP. I T L T T M F . . . . . . . . . . LP. QLVVV ....... FTLLFAGE QLALI ....... FTLLFVGE QLIVI ....... FILVFAGE QLIVT ....... FILHFYGP
Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
EQIQKPQRWNPADL...GRF MIFFGPISSI FDILTFCLMWWVFHANTPET EFLRKPRKWD AKNI...GRF MLWIGPTSSI FDITTFALMW YVFAANNVEA DVMERLPHDN KVGIFQKSLI VDMM ............... V YGFFLGVVSL DLMDRPPHDS EVGIFTWEVI IDTF ............... A YGIIMTGSCM DIMKRQPRNP RTDKLVNERL ISMAYGQ... IGMIQALGGF FSYFVILAEN DIMKRQPRNP RTDKLVNERL ISMAYGQ... IGMIQALGGF FSYFVILAEN DIMKRQPRNP RSDKLVNERL ISMAYGQ... IGMIQALGGF FSYFVILAEN DIMKRQPRNP KTDKLVNERL ISMAYGQ... IGMIQALGGF FTYFVILAEN DIMKRQPRNS QTDKLVNERL ISMAYGQ... IGMIQALGGF FTYFVILAEN DIMKRQPQNP KTDKLVNEQL ISMAYGQ... IGMIQALGGF FTYFVILAEN DIMKRQPRNP KTDKLVNERL ISIAYGQ... IGMIQALAGF FTYFVILAEN DIMKRQPRNP KTDKLVNERL ISMAYGQ... IGMIQALGGF FSYFVILAEN DIMKRRPRNP VTDKLVNERL ISLAYGQ... IGMIQASAGF FVYFVIMAEC DIMKRPPRDP FNDKLVNSRL ISMAYGQ... IGMIQAAAGF FVYFVIMAEN D..PRKPRDP VKEKLVNERL ISMAYGQ... IGVMQAFGGF FTYFVIMGEC DIMKRHPRNP IRDKLVNERL ISLAYGQ... IGMMQATAGF FTYFIILAEN DIMHLRPRNP KRDRLVNEPLAAYSYFQ... IGAIQSFAGF TDYFTAMAQE DIMNKPPRRA D.EGLITGWL FFRYMAIGTY VGAATVGAAAHWFMMSPTGP DIMEKPPRKA D.EGLISGWL FFRYMAIGFY VGAATVGAAAWWFVFSDEGP DIMDRPPRSP K.EPLISGWL FFRYMAIGGY VGAATVGAAAWWFMYAEDGP DIMNKPPRNP K EPLISGWL FFRYLAIGCY VGAATVGAAAWWFIAADGGP DIMEKLPRNP R.EALISGWL FFRYLAIGVY VGLATVAAATWWFLYDAEGP DIMEQRPRRM E.EPIVNGWL FMRYMVIGVY VGLATVGGFLWWFLRHG... DVMKCKPRHK N.DNLINGLT LLRYIIIGTY VGIATVSIFV YWFLFYPDSD GIMQQAPRNP N.EPLITKKL .... LHRILL VSLFNW .............. TIMQRRPHNP Q.ESIFARGL G T Y M L R V G W F S A F T I .............. EVMKKPPRKR T.DKILTHDV MKRLLTTAAC IIVGTV .............. KLM..VGSFS T.KNLFYNRQ GFDLIWQSLF LSFLTL .............. ............................................
1451 HRLGLAQLPQ GKLVTMMYLK ISISDFLTLF HRLGLAQLPQ GKLVTMMYLK ISISDFLTLF LEKTATD.DF RKLASAIYLQ VSTISQALIF LEKTATD.DF RKLASAIYLQ VSIISQALIF IRDNNDE ..... LMGAVYLQ VSIISQALIF LRNSDEE ..... MMSALYLQ VSIISQALIF LRGSEHE ..... MMSALYLQ VSIVSQALIF QNRGIVQ.NF GVQDEVLFLE ISLTENWLIF QNRGIVQ.NF GVQDEVLFLQ ISLTENWLIF ENGGIVQ.NF GRTHPVLFLE ISLTENWLIF ENGGIVQ.NF GNMDEVLFLQ ISLTENWLIF .KGGIIQ.NF GAMNGIMFLQ ISLTENWLIF .NGGIIQ.NF GAMNGVMFLQ ISLTENWLIF .KGGIIQ.NF GSIDGVLFLQ ISLTENWLIF .KGGIIQ.NF GGLDGILFLQ ISLTENWLIF .KGGIIQ.NF GSIDGILFLE ISLTENWLIF KFF ......................... DI KMF ......................... QI KFF ......................... DI ELF ................... FKKHEDEI QTLFQSGWFV VGLLSQTLIV HM ....... I QALFQSGWFI E G L L S Q T L V V H M ....... L MTWVVIMYGF GTGNLSYDCN AHYHAGCNDV
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo
SSRTGGHFFF SSRTGGHFFF VTRSRSWSFV VTRSRSWSFV LTRSRSWYFV VTRSRSWSFL VTRSRSWSFT VTRCNGPFWS ITRCSGPFWS ITRANGPFWS ITRANGPFWS ITRAAGPFWS VTRAAGPFWS ITRAAGPFWS VTRAQGPFWS ITRAVGPFWS DSGRNAPLHA DSGRNAPLHS DSGRKAPLHS TSHQQQQLNA RTRRVPFIQS RTQKIPFIQS FKARSAVFAV
1500 YMPPSPILFC YVPPSPILFC ERPGL..LLV ERPGF..LLV ERPGA..LLM ERPGM..LLV ERPGY..FLL SIPSW..QLS SFPSW..QLS SIPSW..QLS SIPSW..QLS SIPSW..QLA SIPSW..QLA SIPSW..QLS SIPSW..QLS SIPSW..QLA PPSEHYTIVF PPSEHYTIIF PPSQHYTIVF ....... MTF CASWPLMIMT RATLPVLLTT VTFCILIMAV
Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
Atxaleido Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atmaescco Atmbsalty Atc3schpo Atnlsacce Atn3homsa Atn3ratno
ASFTGSLYGI NSGRLGHDCD GTYNSSCRDV YRSRSAAFAT MTWCALILAW GFLPGNLVGI RLNWDDRTVNDLEDSYGQQW TYEQRKVVEF TCH...TAFF GFLPGNLVGI RLNWDDRTVNDLEDSYGQQW TYEQRKVVEF TFH...TAFF GFLPSCLVGI RLSWDDRTIN DLEDSYGQQW TYEQRKVVEF TCH...TAFF GFLPIHLLGL RVDWDDRWIN DVEDSYGQQW TYEQRKIVEF TCH...TAFF GFLPSRLLGI RLDWDDRTMN DLEDSYGQEW TYEQRKVVEF TCH...TAFF GFLPIHLLGL RVNWDDRWIN DVEDSYGQQW TYEQRKIVEF TCH...TAFF GFLPPRLLGI RMNWDDKYIN DLEDSYGQQW TYEQRKIVEF TCH...TAFF GFLPIDLIGI REKWDELWTQ DLEDSYGQQW TYEQRKIVEY TCH...TSFF GFLPWDLFGL RKHWDSRAVNDLTDSYGQEW TYDARKQLES SCH...TAYF GFLPKKLFGI RKMWDSKAVNDLTDSYGQEW TYRDRKTLEY TCH...TAFF GFLPNRLFGL RKWWESKAYN DLTDSYGQEW TWDARKQLEY TCH...TAFF GFLPSYLFGL RSQWDDMSNN NLLDSFGSEW TYFQRKEIEL TCQ...TAFF GWFPLLCVGL RAQWEDHHLQ DLQDSYGQEW TFGQRLYQQY TCY...TVFF G . . . L N F Y Q L S H H L Q C T P E N E ....... Y F E G I D C E I F S D P . H . P M T M A L K . . . L S Y W Q L T H H L S C L G G G D ....... EF K G V D C K I F S D P . H . A M T M A L G . . . V T Y H Q L T H F M Q C T E D H P ....... HF E G L D C E I F E A P . E . P M T M A L R . . . V S F Y Q L S H F L Q C K E D N P ....... DF E G V D C A I F E S P . Y . P M T M A L Q . . . V T F H Q L R N F L K C S E D N P ....... LF A G I D C E V F E S R . F . P T T M A L .... F S W H D L T T Y T A C .... S ....... DM T N G T C L L L A N P Q T . A R A I A L MHTLINFYQL SHYNQCKAWNNFRVNKVYDM SEDHCSYFSA GKIKASTLSL ....................... ILIFGMF EWVNRTYDDL ALAR..TMAI ....................... VLMVIAY QYTQVPLPGL DPKRWQTMVF ....................... YIFV ..... KEMAEDGK VTARDTTMTF ..... L F Y S L G I I Y A I N N R D L Q T S G D L I N R A G S T C G F F . . . . . . . . . . . . .................................................. 1501 GAIISLLVST GAIISLLVST FAFFVAQLVA IAFVIAQLVA IAFVIAQLVA IAFMIAQLVA IAFWVAQLIA GAVLAVDILA GAVLVVDILA GAILLVDIIA GAIFLVDILA GAVFAVDIIA GAVFAVDIIA GAVLIVDIIA GAVLIVDIIA GAVFVVDVVA NTFVLMQLFN NTFVMMQLFN NTFVLMQLFN NTFVWLQFFT VIVMIVGIAL GLIMAIGIYI EVKNFDNSLF EVVDMRRSFF VSIVVVQWAD VSIVVVQWAD
MAASFWHKSR MAASFWHKSR TLIAVYANWS TLIAVYANWS TLIAVYADWT TLIAVYANWA TAIAVYGNWE TMFCIFGWFK TLFCIFGWFK TLFTIFGWFV TCFTIWGWFE TMFTLFGWWS TMFTLFGWWS TMFCLFGWWS TCFTLFGWWS TMFTLFGWWS EINARKIHGE EINARKIHGE EINSRKIHGE MLVSRKLDEG PFSPLASYLQ PFSPLGAMVG NLHGIPWGEW RMH..PDTDS LIICKTRRNS LIICKTRRNS
1550 PDNVLTEGLA WGQTNAEKLL PLWVWIYCIV PDNVLTEGLA WGQTNAEKLL PLWVWIYCIV FAAI . . . . . . . . . E G I G W G W A G V I W L Y N I V FAAI . . . . . . . . . E G I G W G W A G V I W I Y N L V FAKV ......... KGIGWGWAGVIWIYSIV FARV ......... KGCGWGW AGVIWLYSII FARI . . . . . . . . . K G I G W G W A G V I W L Y S I V GGHQ ......... TSI..VA VLRIWMYSFG GGHQ ......... TSI..VA VIRIWMYSFG GGQ . . . . . . . . . . T S I . . V A V V R I W V F S F G HSD .......... TSI..VAVVRIWIFSFG ENW .......... TDI..VTVVRVWIWSIG ENW .......... TDI..VSVVRVWIWSIG QNW .......... NDI..VTVVRVWIFSFG QNW .......... TDI..VTVVRTWIWSFG QNW .......... TDI..VTVVRIYIWSIG R .............. NVFEG IFNNAIFCTI R .............. NVFDG IFRNPIFCTI K .............. NVFSG IYRNIIFCSV DGISNWRGRI SAANLNFFQD LGRNYYFLTI ......... L Q A L P .... LS Y F P W L V A I L A ......... L E P L P .... LS Y F P W L V A T L L ......... N F R . . . Y F L H T L V E N K F L A W A ......... P V K . . . E F F R S I W G N Q F L F W S ......... V FQ ...... QG . M K N K I L I F G ......... V FQ ...... QG . M K N K I L I F G
Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
V S I V V V Q W A D L I I C K T R R N S ......... V FQ ...... QG . M K N K I L I F G V S I V V V Q W A D L V I C K T R R N S ......... V FQ ...... QG . M K N K I L I F G A S I V V V Q W A D L I I C K T R R N S ......... V FQ ...... QG . M K N K I L I F G V S I V V V Q W A D L V I C K T R R N S ......... V FQ ...... QG . M K N K I L I F G T S I V I V Q W A D L I I C K T R R N S ......... V FQ ...... QG . M K N K I L I F G V S I V I V Q W A D L I I C K T R R N S ......... I FQ ...... QG . M K N K I L I F G V S I V I V Q W A D L I I S K T R R N S ......... V FQ ...... QG . M R N N I L N F A I S I V V V Q W A D L I I C K T R R N S ......... I FQ ...... QG . M R N W A L N F G I S I V I V Q W T D L I I C K T R R L S ......... L FQ ...... QG . M K N G T L N F A T T I V V V Q W A D L I I S K T R R L S ......... L FQ ...... QG . M T N W F L N F G I S I E V C Q I A D V L I R K T R R L S ......... A FQ ...... QG F F R N K I L V I A S V L V T I E M L N A I N S L S E N Q S ......... L L V M P P . . . . . . W S N I W L I S A S V L V T I E M L N A M N S L S E N Q S ......... L ITMPP . . . . . . W C N L W L I G S S V L V T I E M C N A L N S L S E N Q S ......... L M R M P P . . . . . . W V N I W L L G S S V L V T I E M C N A L N S L S E N Q S ......... L L R M P P . . . . . . W E N I W L V G S S V L V T I E M C N A L N S V S E N Q S ......... L L R M P P . . . . . . W L N P W L L G A S I L V V V E M L N A L N A L S E N A S ......... L IVSRP . . . . . . S S N V W L L F A S V L V L I E M F N A L N A L S E Y N S ......... L F E I P P . . . . . . W R N M Y L V L A Q A L V A A R V I Y L L S I S Q L G R S ......... F L G Y V T G K R Q T I T K A S I L L L G TTLCLAQMGH AIAVR...SD L L T I Q T P M R .... T N P W L W L S T C F V F F D M F N A L A C R H N T K S ......... I FEI ...... G F F T N K M F N Y A . I L G A S A A L N S L N L M V D K P L ......... L M T N P ...... W F F K L V W I G S ..................................................
          1551                                              1600
Atxaleido WWFVQDVVKV LAHICMDAVD LFGCVSDASG SGPIKPYSDD MKVNGFEPVK
Atxbleido WWFVQDVVKV LAHICMDAVD LFGCVSDASG SGPIKPYSDD MKVNGFEPVK
Pmallyces TYIPLDLIKF LIRYALSGKA WDLVLEQRIA FTRKKDFGKE L..RELQWAH
Pmalnicpl FYIPLDIIKF FIRYALSGRA WDLVFERRIA FTRKKDFGKE Q..RELQWAH
Pmalarath TYFPQDILKF AIRYILSGKA WASLFDNRTA FTTKKDYGIG E..REAQWAQ
Pma4nicpl FYLPLDIMKF AIRYILSGKA WNNLLDNKTA FTTKKDYGKE E..REAQWAL
Pma3arath FYFPLDIMKF AIRYILAGTA WKNIIDNRTA FTTKQNYGIE E..REAQWAH
Pmalschpo IFCIMAGTYY ILS...ESAG FDRMMNGK.P KESRNQRSIE DLVVALQRTS
Pma2schpo IFCLIAGVYY ILS...ESSS FDRWMHGK.H KERGTTRKLE DFVMQLQRTS
Pmalajeca CFCVLGGLYY LLQ...GSAG FDNMMHGKSP KKNQKQRSLE DFVVSLQRVS
Pmalneucr IFCIMGGVYY ILQ...DSVG FDNLMHGKSP KGNQKQRSLE DFVVSLQRVS
Pmalsacce IFCVLGGFYY EMS...TSEA FDRLMNGKPM KEKKSTRSVE DFMAAMQRVS
Pma2sacce IFCVLGGFYY IMS...TSQA FDRLMNGKSL KEKKSTRSVE DFMAAMQRVS
Pmalklula VFCVMGGAYY MMS...ESEA FDRFMNGKSR RDKPSGRSVE DFLMAMQRVS
Pmalcanal VFCVMGGAYY LMS...TSEA FDNFCNGRKP QQHTDKRSLE DFLVSMQRVS
Pmalzygro IFCCLGGAYY LMS...ESET FDRLMNGKPL KENKSTRSVE DFLASMRRVS
Atcphomsa VLGTFVVQII IVQFGGKPFS CSELSIEQWL WSIFLGMGTL LWGQLISTIP
Atcqhomsa VLGTFAIQIV IVQFGGKPFS CSPLQLDQWM WCIFIGLGEL VWGQVIATIP
Atcrhomsa VLGTFICQIF IVEFGGKPFS CTSLSLSQWL WCLFIGIGEL LWGQFISAIP
Atc3sacce MAIIGSCQVL IMFFGGAPFS IARQTKSMWI TAVLCGMLSL IMGVLVRICP
Atmaescco GYMTLTQLVK GFYSRRYGWQ .......... .......... ..........
Atmbsalty SYCLVAQGMK RFYIKRFGQW F......... .......... ..........
Atc3schpo IALAAVSVFP TIYIPVINRD VFKHTYIGWE WGVVA..... ..........
Atnlsacce IIFGFVSAFP VVYIPVINDK VFLHKPIGAE WGLAI..... ..........
Atn3homsa LFEETALAAF LSYCPGMDVA LRMYPL.... .......... .....KPSWW
Atn3ratno LFEETALAAF LSYCPGMDVA LRMYPL.... .......... .....KPSWW
Atn3galga LFEETALAAF LSYCPGMDVA LRMYPL.... .......... .....KPSWW
Atnlhomsa LFEETALAAF LSYCPGMGVA LRMYPL.... .......... .....KPTWW
Atn2homsa LLEETALAAF LSYCPGMGVA LRMYPL.... .......... .....KVTWW
Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge Consensus
1601 1650 Atxaleido KPAEKSTEKA LNSSVSSASH KALEGLREDT HSPIEEASPV NVYVSRDQK.
Atxbleido Pmallyces Pmalnicpl Pmalarath Pma4nicpl
Atn3sussc LFEETALAAF LSYCPGMGVA LRMYPL.... .......... .....KPTWW
Atnacatco LFEETALAAF LSYCPGMDVA LRMYPL.... .......... .....KPNWW
Atnatorca LFEETALAAF LSYTPGTDIA LRMYPL.... .......... .....KPSWW
Atnaartsf LVFETCLAAF LSYTPGMDKG LRMYPL.... .......... .....KINWW
Atnadrome LVFETVLAAF LSYCPGMEKG LRMYPL.... .......... .....KLVWW
Atnaartsa LVFETCVAAF LSYTPGMDKG LRMYPL.... .......... .....KIWWW
Atnahydat LFFETALAAF LQYTPGVNTG LRLRPM.... .......... .....NFTWW
Athahomsa IVFQVCIGCF LCYCPGMPNI FNFMPI.... .......... .....RFQWW
Atcartsf  ICLSMTLHFV ILYVEILSTV FQICPL.... .......... .....TLTEW
Atcbdrome MALSFTLHFV ILYVDVLSTV FQVTPL.... .......... .....SAEEW
Atcborycu ICLSMSLHFL ILYVDPLPMI FKLKAL.... .......... .....DLTQW
Atcdhomsa ICLSMSLHFL ILYVEPLPLI FQITPL.... .......... .....NVTQW
Atcfratno VVMSMALHFL ILLVPPLPLI FQVTPL.... .......... .....SGRQW
Atctrybr  IFSSLSLHLI IMYVPFFAKL FNIVPLGVDP HVVQQAQPWS ILTPTNFDDW
Atcplafa  TIGSLLLHVL ILYIPPLARI FGVVPL.... .......... .....SAYDW
Atalsynsp IAVAIALQIG FSQLPFMNVL FKTAPM.... .......... .....DWQQW
Atclsynsp VIVTALLQLA LVYVSPLQKF FGTHSL.... .......... .....SQLDL
Atclsacce VGLSLLGQMC AIYIPFFQSI FKTEKL.... .......... .....GISDI
Atclmycge LA.SILVFLL IIFINPLGLV FNVLQ..... .......... .....DLTNH
Consensus .......... .......... .......... .......... ..........
Pma3arath Pmalschpo Pma2schpo Pmalajeca Pmalneucr Pmalsacce Pma2sacce Pmalklula Pmalcanal Pmalzygro Atcphomsa
KPAEKSTEKA AQRTLHGLQV AQRTLHGLQV AQRTLHGLQP AQRTLHGLQP
LNLSVSSGPH PD.PKIFSET PD.TKLFSEA KEDVNIFPEK PEATNLFNEK
KALEGLREDT TNFNELNQLA TNFNELNQLA GSYRELSEIA NSYRELSEIA
HVLNESTSPV EEAKRRAEIA EEAKRRAEIA EQAKRRAEIA EQAKRRAEMA
NAFSPKVKK. RLRELHTLKG RLRELHTLKG RLRELHTLKG RLRELHTLKG
AQRTLHGLQN TETANVVPER GGYRELSEIA NQAKRRAEIA RLRELHTLKG TRHEKGDA .......................................... THHEAEGKVT $ ....................................... TQHEKSS ........................................... TQHEKSQ ........................................... TQHEKET ........................................... TQHEKSS ........................................... TQHEKEN ........................................... TQHEKST ........................................... TQHEKGN ........................................... T S R L K F L K E A G H G T Q K E E I P EEELAEDVEE IDHAERELRR G Q I L W F R G L N
Atcqhomsa TSRLKFLKEA GRLTQKEEIP EEELNEDVEE IDHAERELRR GQILWFRGLN Atcrhomsa Atc3sacce Atc3schpo Atnlsacce Atn3homsa Atn3ratno Atn3galga Atnlhomsa Atn2homsa Atn3sussc Atnacatco Atnatorca Atnaartsf Atnadrome
TRSLKFLKEA GHGTTKEEIT ................ DEVA VAVMFYFFYV EIWKSIRRSL AFTIAFWIGA ELYKCGKRRY FCAFPYSFLI FVYDEIRKLI FCAFPYSFLI FVYDEIRKLI FCAFPYSFLI FVYDEIRKLI FCAFPYSLLI FVYDEVRKLI FCAFPYSLLI FIYDEVRKLI FCAFPYSLLI FVYDEVRKLI FCAFPYSLLI FIYDEIRKLI FCAFPYSLII FLYDEARRFI FPALPFSFLI FVYDEARKFI FPAIPFALAI FIYDETRRFY
KD..AEGLDE VKVFPAAFVQ TNPQKKGKFR FKTQRAHNPE LRRNPGGWVE LRRNPGGWVE LRRNPGGWVE IRRRPGGWVE LRRYPGGWVE IRRRPGGWVE LRRNPGGWME LRRNPGGWVE LRRNPGGWVE LRRNPGGWLE
IDHAEMELRR GQILWFRGLN ........... RFKYVFGLE RTL . . . . . . . . . . . . . . SNT NDLESNNKRD PFEAYSTSTT KETYY ............... KETYY ............... KETYY ............... KETYY ............... KETYY ............... KETYY ............... RETYY ............... QETYY ............... QETYY ............... QETYY ...............
Atnaartsa Atnahydat Athahomsa Atcartsf Atcbdrome Atcborycu Atcdhomsa Atcfratno Atctrybr Atcplafa Atalsynsp Atclsynsp Atclsacce Atclmycge
FPPMPFSLLI
LVYDECRKFL
LVPLPYGILI
FVYDEIRKLG
LPGLPFSLLI
FVYDEIRRYL
IVVLKISFPVLLL
MRRNPGGFLE
RETYY ...............
VRCCPGSWWD
QELYY ...............
LRKNPGGWVE
KETYY ...............
.... D E V L K F V A R K Y T D E F S F I K
..............
ITVMKFSIPV
V L L .... D E T L K F V A R K I A D
VPDVVVDRM
LMVLKISLPV
I L M .... D E T L K F V A R N Y L E
PAILE ...............
KAVIVFSVPV
I F L .... D E L L K F I T R R M E K
AQEKKKD
LMVLKISLPV
GVVLQMSLPV
I G L .... D E I L K F I A R N Y L E
I L L .... D E A L K Y L S R H H V D
FLVFLWSFPVIIL
V P V .... R I L A N R L D P
LLLLLISSSV
F I V .... D E L R K L W T R K K N E
AIC.LGFSLL
EKKDLK ..............
.............
.... D E I I K F Y A K R K L K E E Q R T K K I K I D
AICLLPMIPM
...........
G ...................
.........
........................
L F V .... Y L E A E K W V R H G R Y
....................
EDSTYFSNV
...........
Consensus
PVLISYSFGG VILYMGMNEV VKLIRLGYGN I ................... ..................................................
Pmallyces
HVESVVKLKG
LDIETIQQSY
TV ............................
Pmalarath
HVESVAKLKG
LDIDTAGHHY
TV ............................
HVESVVKLKG
LDIETAG.HY
Pmalnicpl Pma4nicpl
Pma3arath Atcphomsa Atcqhomsa Atcrhomsa Atc3sacce Atc3schpo Atnlsacce Atceorycu Atceratno Atcehomsa Atcesussc Consensus Atcphomsa Atcqhomsa Atcrhomsa Consensus Atcphomsa Atcqhomsa Atcrhomsa Consensus
1651
HVESVVKLKG HVESVVKLKG
LDIETIQQAY LDIETIQQHY
1700
TV ............................ TV ............................ TV ............................
RIQTQIRVVNAFRSSLYEGL
EKPESRSSIH
NFMTHPEFRI
EDSEPHIPLI
RIQTQIKVVK
AFHSSLHESI
QKPYNQKSIH
SFMTHPEFAI
EEELPRTPLL
ITTESKLSEK IHTEVNIGIK
DLEHRLFLQS RRA ........................... Q .......................................
DGISWPFVLL
IMPLVVWVYS
RIQTQIRVVK
FLRKNHTGKH
EGVSWPFVLL DGISWPFVLL
AFRSSLYEGL DDEEALLEES
EKPESRTSIH DSPESTAFY
NFMAHPEFRI
EDSQPHIPLI
.....................
IVPLVMWVYS
TDTNFSDLLW
S ...................
IMPLVIWVYS
TDTNFSDMFW
S ...................
TDTNFSDMFW
S ...................
DGISWPFVLL IMPLVIWVYS TDTNFSDMFW S ................... .................................................. 1701
DDTDAEDDAP
DDTDLEEDAA
TKR ........
LKQ ........
NSSPPPSPN
NSSPPSSLN
KNNNAVDSGI
KNNSSIDSGI
1750
HLTIEMNKSA
NLTTDTSKSA
DEEEEENPDK ASKFGTRVLL LDGEVTPYAN TNNNAVDCN... QVQLPQS. .................................................. 1751
TSSSPGSPLH
TSSSPGSPIH
1766
SLETSL
SLETSL
..... D S S L Q S L E T S V ................
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Atcbgalga, Atcaorycu (Atcborycu); Atcdfelca, Atcdsussc, Atcdorycu, Atceorycu, Atcdratno, Atceratno, Atcehomsa, Atcesussc (Atcdhomsa); Atcporycu, Atcpratno, Atcpsussc (Atcphomsa); Atcqratno (Atcqhomsa); Athasussc, Athaorycu, Atharatno (Athahomsa); Atmasalty (Atmaescco); Atnlbufma, Atnlgalga, Atnloviar, Atnlequca, Atnlsussc, Atnlratno (Atnlhomsa); Atn2sacce (Atnlsacce); Atn2galga, Atn2ratno (Atn2homsa); Pma2arath (Pmalarath); Pma3nicpl (Pmalnicpl). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the P-type ATPase superfamily.
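The two thresholds in this note (90% identity for omitting a sequence, 75% occurrence for a consensus residue) can be illustrated with a minimal Python sketch. The function names, the use of "." as the gap character and the choice to count identity over positions where neither sequence has a gap are assumptions made for illustration; the text does not state how the original percentages were computed.

```python
from collections import Counter

GAP = "."  # gap character as printed in the alignment (assumed)


def percent_identity(a, b):
    """Percent of aligned, non-gap positions at which two rows match.

    Assumption: positions where either row has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("rows must be the same aligned length")
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def consensus(rows, threshold=0.75):
    """Return one character per column: the residue present in at least
    `threshold` of all rows, otherwise the gap character."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same aligned length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != GAP)
        if counts:
            residue, count = counts.most_common(1)[0]
            out.append(residue if count / len(rows) >= threshold else GAP)
        else:
            out.append(GAP)
    return "".join(out)


# Toy rows, not taken from the alignment above.
toy = ["DDNFA", "DDNFS", "DDNFA", "DDDFA"]
print(consensus(toy))
print(percent_identity(toy[0], toy[1]))
```

With a stricter threshold such as 0.9, only columns that are identical in almost every row would survive into the consensus line.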
Database accession numbers
SWISSPROT
Ata1synsp Atcartsf Atcplafa Atctrybr Atclsacce Atc3schpo Atc3sacce Atcaorycu Atcbdrome Atcbgalga Atcborycu Atcdfelca Atcdhomsa Atcdorycu Atcdratno Atcdsussc Atcehomsa Atcesussc Atceorycu Atceratno Atcfratno Atclmycge Atclsynsp Atcphomsa Atcporycu Atcpratno Atcpsussc Atcqhomsa Atcqratno Atcrhomsa Athahomsa Athaorycu Atharatno Athasussc Atmaescco Atmasalty Atmbsalty Atnlbufma Atnlequca Atn1galga Atnlratno Atnlsussc Atnloviar Atn1homsa Atnlsacce Atn2galga Atn2homsa Atn2ratno Atn2sacce Atn3galga
P37367 P35316 Q08853 P35315 P13586 P22189 P38929 P04191 P22700 P13585 P11719 Q00779 P16614 P04192 P11508 P11606 P16615 P11607 P20647 P11507 P18596 P47317 P37278 P20020 Q00804 P11505 P23220 Q01814 P11506 P23634 P20648 P27112 P09626 P19156 P39168 P36640 P22036 P30714 P18907 P09572 P06685 P05024 P04074 P05023 P13587 P24797 P50993 P06686 Q01896 P24798
PIR
EMBL/GENBANK
S40440; $33207 S07526
X71022; G296568 X51674; G665604 X71765; G402222 M73769; G162201 M25488; G172199 J05634; G173355 U03060; G454003 M12898; G164779 M62892; G158416 M26064; G211224 M12898 Z11500; G1081 M23115; G306851 X02814; G1469 J04023; G203059 X15073; G1921 M23114; G306850 X15074; G1923 J04703; G164739 J04022; G203057 M30581; G206899 U39687; G1045747 D16436; G435123 J04027; G190133 X59069; G1675 J03753; G203047 X53456; G2061 L20977; G404702 J03754; G203049 M25874; G179163 J05451; G561634 X64694; G1471 J02649; G203037 M22724; G164384 U14003; G537084 U07843; G468207 M57715; G397973 Z11798; G62492 X16773; G871026 J03230; G211220 D10359; G220824 X03938; G1898 X02813; G1206 D00099; G219942 U24069; G790261 M59959; G212406 J05096; G179165 M14512; G203029 X67136; G5513 M59960; G212408
A45598 S05787; P W B Y R 1 A36096 A01075; P W R B F C A36691; S07050 A32792 $23444 B31981 A01076; P W R B S C B31982; S04269 S04651 /%31981 S04652 S10335; PWRBMC A31982 A34307 $36742 A30802 S 17179 A28065 S 13057 A38871 B28065 A35547 A35292; A36558 $23406 A25344 A31671; A24228 B39083 $24650; A43451 S04630 A28199 A24639; S00460 B24862 A01074; PWSHNA A24414 S05788; P W B Y R 2 B24639 $25007 B37227
Atn3homsa Atn3sussc Atn3ratno Atnaartsa Atnaartsf Atnacatco Atnadrome Atnahydat Atnatorca Atxaleido Atxbleido Pmalajeca Pmalarath Pmalcanal Pmalklula Pmallyces Pmalneucr Pmalnicpl Pmalschpo Pmalsacce Pma1zygro Pma2arath Pma2sacce Pma2schpo Pma3arath Pma3nicpl Pma4nicpl
SWISSPROT P13637 P18874 P06687 P17326 P28774 P25489 P13607 P35317 P05025 P11718 P12522 Q07421 P20649 P28877 P49380 P22180 P07038 Q08435 P09627 P05030 P24545 P19456 P19657 P28876 P20431 Q08436 Q03194
PIR S00801
C24639 S06635 JH0470 S14740; PWCCNM S03632 S00503 A27124; PXLNPD A32326; PXMUP1 A41336; PXCKP A45506 A26497; PXNCP A41779 A28454; PXZP1P A25823; PXBY1P JX0181; PXKZP A37116; PXMUP2 A32023; PXBY2P A40945; PXZP2P A33698; PXMUP3 $24959; $33548
EMBL/GENBANK M3 7457; G497763 M38445; G 164382 M14513; G203031 Y07513; G5670 X56650; G 10934 X58629; G62642 X14476; G732656 M75140; G159258 X02810; G64400 M17889; G159294 J04004; G 159295 L07305; G409249 M24107; G166746 M74075; G170818 L37875; G598435 M60166; G170464 M14085; G168761 M80489; G 170289 J03498; G173429 X03534; G4187 D 10764; G218531 J05570; G166629 J04421; G295644 M60471; G 173431 J0473 7; G 166625 M80490; G 170295 X66737; G19704
References
1 Lytton, J. and MacLennan, D.H. (1988) J. Biol. Chem. 263, 15024-15031.
2 Harper, J.F. et al. (1989) Proc. Natl Acad. Sci. USA 86, 1234-1238.
3 Maeda, M. et al. (1990) J. Biol. Chem. 265, 9027-9032.
4 Sussman, M.R. (1994) Annu. Rev. Plant Physiol. Plant Mol. Biol. 45, 211-234.
5 Assmann, S.M. and Haubrick, L.L. (1996) Curr. Opin. Cell Biol. 8, 458-467.
6 Green, N.M. and MacLennan, D.H. (1989) Biochem. Soc. Trans. 17, 819-822; Green, N.M. (1989) Biochem. Soc. Trans. 17, 970-972.
7 Fagan, M.J. and Saier, M.H. Jr. (1994) J. Mol. Evol. 38, 57-99.
8 Rudolph, H.K. et al. (1989) Cell 58, 133-145.
Heavy metal-transporting ATPase family Summary
Transporters of the heavy metal-transporting ATPase family, examples of which are the heavy metal-transporting P-type ATPases from bacteria such as Enterococcus (Atkaentfa)¹ and human copper-transporting ATPase 1² (At7ahomsa), mediate active transport of heavy metal ions driven by ATPase activity. Where the natural substrate is known, it is usually divalent copper or cadmium. The nitrogen fixation protein FIXI from Rhizobium meliloti³ is also a member of this family. In humans, mutations in copper-transporting ATPases cause the hereditary disorders Menkes' disease (Cu-transporting ATPase 1²) and Wilson's disease (Cu-transporting ATPase 2⁴). Members of the heavy metal-transporting ATPase family have a broad biological distribution that includes gram-positive and gram-negative bacteria, yeast and humans. Heavy metal-transporting ATPases from bacteria may be chromosomal or plasmid-encoded.
Statistical analysis of multiple amino acid sequence comparisons places the heavy metal-transporting ATPase family in the P-type ATPase superfamily (also known as E1-E2 ATPases⁵,⁶). Proteins in this superfamily use the energy of ATP hydrolysis to pump ions across cell membranes. P-type ATPases are all predicted, from the hydropathy of their amino acid sequences, to contain at least six transmembrane helices. They have two large cytoplasmic loops separating three pairs of transmembrane helices; the larger of these loops contains the ATP binding domain. The sequences are usually extended by one or two more pairs of helices⁵. Members of the heavy metal-transporting ATPase family are predicted to contain eight transmembrane helices⁷. They also have an N-terminal cytoplasmic domain which contains one or more repeats of a sequence associated with heavy metal binding, the HMA sequence⁷. In the human copper-transporting proteins²,⁴ this domain contains six tandem HMA sequences. Eukaryotic proteins may be glycosylated.
A few short sequence motifs are very highly conserved within the heavy metal-transporting ATPase family of transporters, including motifs unique to the family and signature motifs of the P-type ATPase superfamily.
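The hydropathy-based prediction of transmembrane helices mentioned in the Summary can be illustrated with a sliding-window average. The sketch below is a minimal illustration and not the method used by the authors: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a cut-off of 1.6, none of which are stated in the text, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index per residue.
# Assumption: the text does not say which scale was used for the predictions.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along an ungapped sequence."""
    values = [KD[aa] for aa in seq]  # raises KeyError on gap or unknown symbols
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_helices(seq, window=19, threshold=1.6):
    """Merge windows scoring at or above the threshold into 1-based (start, end) ranges."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            # Overlapping or adjacent window: extend the current segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a 19-residue leucine stretch flanked by aspartate.
    toy = "D" * 10 + "L" * 19 + "D" * 10
    print(candidate_helices(toy))
```

Because every window that is mostly hydrophobic passes the cut-off, the reported range is wider than the hydrophobic stretch itself; counting such ranges gives only a rough estimate of the number of membrane-spanning helices.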
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                  ORGANISM [COMMON NAMES]                             SUBSTRATE(S)
At7ahomsa  Copper-transporting ATPase 1            Homo sapiens [human]                                Cu2+
           [Copper pump 1, Menkes' disease-associated protein, ATP7A, MNK, MC1]
At7acrigr  Copper-transporting ATPase 1            Cricetulus griseus [hamster]                        Cu2+
At7bhomsa  Copper-transporting ATPase 2            Homo sapiens [human]                                Cu2+
           [Copper pump 2, Wilson's disease-associated protein, ATP7B, WND, PWD, WC1]
Atc2sacce  Probable calcium-transporting ATPase    Saccharomyces cerevisiae [yeast]                    Ca2+
           [PCA1, YBR295W, YBR2112]
Atcssynsp  Cation-transporting ATPase              Synechococcus sp. [cyanobacterium]                  Metal ions
           [PACS]
Atkaentfa  Potassium/copper-transporting ATPase A  Enterococcus faecalis [gram-positive bacterium]     Cu2+, K+
           [ATKA]
Atkbentfa  Potassium/copper-transporting ATPase B  Enterococcus faecalis [gram-positive bacterium]     Cu2+, K+
           [ATKB]
Atsyescco  Probable copper-transporting ATPase     Escherichia coli [gram-negative bacterium]          Cu2+
Atsysynsp  Probable copper-transporting ATPase     Synechococcus sp. [cyanobacterium]                  Cu2+
           [SYNA]
Atulsacce  Probable copper-transporting ATPase     Saccharomyces cerevisiae [yeast]                    Cu2+
           [Cu2+-ATPase, CCC2]
Cadabacfi  Probable cadmium-transporting ATPase    Bacillus firmus [gram-positive bacterium]           Cd2+
           [Cadmium efflux ATPase, CADA]
Cadastaau  Probable cadmium-transporting ATPase    Staphylococcus aureus [gram-positive bacterium]     Cd2+
           [Cadmium efflux ATPase, CADA]
Caddstaau  Probable cadmium-transporting ATPase    Staphylococcus aureus [gram-positive bacterium]     Cd2+
           [Cadmium efflux ATPase, CADA]
Ctpbraja   P-type ATPase                           Bradyrhizobium japonicum [gram-negative bacterium]  Metal ions
Ctppromi   Heavy metal-transporting P-type ATPase  Proteus mirabilis [gram-negative bacterium]         Metal ions
Ctpamycle  Cation-transporting P-type ATPase A     Mycobacterium leprae [gram-positive bacterium]      Mg2+
           [CTPA]
Ctpbmycle  Cation-transporting P-type ATPase B     Mycobacterium leprae [gram-positive bacterium]      Mg2+
           [CTPB]
Fixirhime  Nitrogen fixation protein               Rhizobium meliloti [gram-negative bacterium]        Metal ions
           [FIXI]
Phylogenetic tree
[Figure: phylogenetic tree of the family; the branching pattern is not recoverable from the text extraction. Leaf labels, in order of appearance: Ctpbraja, Fixirhime, Cadastaau, Caddstaau, Cadabacfi, Ctpbmycle, Ctpamycle, Atsysynsp, At7acrigr, At7ahomsa, At7bhomsa, Atcssynsp, Ctppromi, Atkaentfa, Atsyescco, Atulsacce, Atkbentfa, Atc2sacce.]
Proposed orientation of AT7A² in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside, and the chain is folded eight times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: topology diagram of AT7A showing the membrane with its OUTSIDE and INSIDE faces, the N- and C-termini, eight membrane-spanning helices and the conserved residues; the individual residue labels are not recoverable from the text extraction.]
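The 75% conservation criterion used for the figure and for the Consensus rows of the alignments can be sketched as a column-by-column count. This is a minimal illustration under stated assumptions: the function name is hypothetical, gaps are taken to be written as dots, and a gapped row is counted as not sharing the residue; the text does not say how gaps were treated. The six example rows are the second ten-residue column of the first six sequences in alignment block 601-650.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Return a consensus string keeping residues present in more than
    `threshold` of the rows at each column; other columns become `gap`."""
    if not rows:
        raise ValueError("no rows given")
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        conserved = residue != gap and count / len(rows) > threshold
        out.append(residue if conserved else gap)
    return "".join(out)


if __name__ == "__main__":
    rows = [
        "CAGCMAKIER",  # Ctpbraja
        "CGTCIATIEG",  # Fixirhime
        "CANCAGKFEK",  # Cadastaau
        "CANCAGKFEK",  # Caddstaau
        "CANCAGKFEK",  # Cadabacfi
        "CAACASRVET",  # Ctpbmycle
    ]
    print(consensus(rows))
```

With only six rows the result keeps more positions than the printed Consensus line, which is computed over all 18 sequences; the two paired cysteines are retained in both.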
Physical and genetic characteristics
CODE        AMINO ACIDS   MOL. WT   EXPRESSION SITES    CHROMOSOMAL LOCUS
At7ahomsa          1500    163334   endothelial cells   Xq13.3
At7acrigr          1476    160335
At7bhomsa          1443    154776   liver, kidneys      13q14.3
Atc2sacce          1216    131838
Atcssynsp           747     79732
Atkaentfa           727     78388                       copAB operon
Atkbentfa           745     81522                       copAB operon
Atsyescco           834     87782
Atsysynsp           790     83694
Atulsacce          1004    109828                       Chromosome 4
Cadabacfi           723     78207
Cadastaau           727     78811
Caddstaau           804     86882
Ctpbraja            730     77337
Ctppromi            829     87859
Ctpamycle           780     82384
Ctpbmycle           750     78224
Fixirhime           757     79559
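A quick consistency check on the table is to divide each molecular weight by the number of amino acids, which should give a mean residue mass in the region of 105 to 110 Da. The sketch below is only an illustration: the variable and function names are hypothetical, and the pairing of lengths with weights assumes that the two columns list the proteins in the same order as the codes.

```python
# (amino acids, molecular weight) per protein code, as listed in the table.
PHYSICAL = {
    "At7ahomsa": (1500, 163334), "At7acrigr": (1476, 160335),
    "At7bhomsa": (1443, 154776), "Atc2sacce": (1216, 131838),
    "Atcssynsp": (747, 79732),   "Atkaentfa": (727, 78388),
    "Atkbentfa": (745, 81522),   "Atsyescco": (834, 87782),
    "Atsysynsp": (790, 83694),   "Atulsacce": (1004, 109828),
    "Cadabacfi": (723, 78207),   "Cadastaau": (727, 78811),
    "Caddstaau": (804, 86882),   "Ctpbraja": (730, 77337),
    "Ctppromi": (829, 87859),    "Ctpamycle": (780, 82384),
    "Ctpbmycle": (750, 78224),   "Fixirhime": (757, 79559),
}


def mean_residue_mass(table):
    """Return {code: molecular weight / number of residues} in daltons."""
    return {code: mw / length for code, (length, mw) in table.items()}


if __name__ == "__main__":
    for code, mass in mean_residue_mass(PHYSICAL).items():
        print(f"{code:<10} {mass:6.1f}")
```

A value far outside this range for any row would suggest that a length and a weight had been paired with the wrong protein.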
Multiple amino acid sequence alignments
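The conserved motifs referred to in the Summary can be located in the aligned sequences with a simple pattern scan. The sketch below is a minimal illustration and not a published motif definition: the pattern GMTC, two arbitrary residues, then C is an assumption read off the At7ahomsa rows of blocks 1-50 and 151-200, and the names used are hypothetical.

```python
import re

# Alignment rows copied from the At7ahomsa lines of blocks 1-50 and 151-200.
AT7AHOMSA_ROWS = [
    "MDPSMGVNSV TISVEGMTCN SCVWTIEQQI GKVNGVHHIK VSLEEKNATI",
    "GTLEKKSGAC EDHSMAQAGE VVLKMKVEGM TCHSCTSTIE GKIGKLQGVQ",
]

# Hypothetical pattern for an HMA-like site (an assumption, see the lead-in).
HMA_LIKE = re.compile(r"GMTC..C")


def ungap(row):
    """Drop the spaces and gap dots used to lay out an alignment row."""
    return row.replace(" ", "").replace(".", "")


def find_motifs(sequence, pattern=HMA_LIKE):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    for row in AT7AHOMSA_ROWS:
        print(find_motifs(ungap(row)))
```

The start positions are relative to each ungapped row fragment, not to the full-length protein; scanning a complete sequence would be needed to count all six tandem HMA sequences.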
             1                                                   50
At7acrigr    MEPSMDVNSV TISVEGMTCI SCVRTIEQKI GKENGIHHIK VSLEEKSATI
At7ahomsa    MDPSMGVNSV TISVEGMTCN SCVWTIEQQI GKVNGVHHIK VSLEEKNATI
At7bhomsa    .......... .......... ..MPEQERQI TAREGASRKI LS.KLSLPTR
Consensus    .......... .......... .......... .......... ..........

             51                                                 100
At7acrigr    IYDPKLQTPK TLQEAIDDMG FDALLHNANP LPVLTDTLFL TVTASLTLPW
At7ahomsa    IYDPKLQTPK TLQEAIDDMG FDAVIHNPDP LPVLTDTLFL TVTASLTLPW
At7bhomsa    AWEPAMKKSF AFDNVGYEGG LDGLGPSSQV ATSTVRILGM TCQSCV....
Consensus    .......... .......... .......... .......... ..........

             101                                                150
At7acrigr    DHIQSTLLKT KGVTDIKIFP QKRTLAVTII PSIVNANQIK ELVPELSLET
At7ahomsa    DHIQSTLLKT KGVTDIKIYP QKRTVAVTII PSIVNANQIK ELVPELSLDT
At7bhomsa    KSIEDRISNL KGIISMKVSL EQDSATVKYV PSVVCLQQVC HQIGDMGFEA
Consensus    .......... .......... .......... .......... ..........
             151                                                200
At7acrigr    GTLEKRSGAC EDHSMAQAGE VVLKIKVEGM TCHSCTSTTE GKIGKLQGVQ
At7ahomsa    GTLEKKSGAC EDHSMAQAGE VVLKMKVEGM TCHSCTSTIE GKIGKLQGVQ
At7bhomsa    SIAEGKAASW PSRSLP.AQE AVVKLRVEGM TCQSCVSSIE GKVRKLQGVV
Atc2sacce    .......... .......... .......... .......... MKPEKLFSGL
Consensus    .......... .......... .......... .......... ..........

             201                                                250
At7acrigr    RIKVSLDNQE ATIVYQPHLI SVEEIKKQIE AMGFPAFVKK QPKYLKLGAI
At7ahomsa    RIKVSLDNQE ATIVYQPHLI SVEEMKKQIE AMGFPAFVKK QPKYLKLGAI
At7bhomsa    RVKVSLSNQE AVITYQPYLI QPEDLRDHVN DMGFEAAIKS KVAPLSLGPI
Atc2sacce    GTSDGEYGVV NSENISIDAM QDNRGECHRR SIEMHANDNL GLVSQRDCTN
Consensus    .......... .......... .......... .......... ..........

             251                                                300
At7acrigr    DVERLKNT.. ..PVKSLEGS QQR.PSYPSD S....TATFI IEGMHCKSCV
At7ahomsa    DVERLKNT.. ..PVKSSEGS QQRSPSYTND S....TATFI IDGMHCKSCV
At7bhomsa    DIERLQSTNP KRPLSSANQN FNNSETLGHQ GSHVVTLQLR IDGMHCKSCV
Atc2sacce    RPKITPQECL SETEQICHHG ENRTKAGLDV DDAETGGDHT NESRVDECCA
Consensus    .......... .......... .......... .......... ..........

             301                                                350
At7acrigr    SNIESALPTL QYVSSIAVSL ENRSAIVKYN ASSVTPEMLI KAIEAVSPGQ
At7ahomsa    SNIESTLSAL QYVSSIVVSL ENRSAIVKYN ASSVTPESLR KAIEAVSPGL
At7bhomsa    LNIEENIGQL LGVQSIQVSL ENKTAQVKYD PSCTSPVALQ RAIEALPPGN
Atc2sacce    EKVNDTETGL DVDSCCGDAQ TGGDHTNESC VDGCCVRDSS VMVEEVTGSC
Consensus    .......... .......... .......... .......... ..........

             351                                                400
At7acrigr    YRVSIANEVE STSS...SPS SSSLQKMPLN VVSQPLTQET VINISGMTCN
At7ahomsa    YRVSITSEVE STSN...SPS SSSLQKIPLN VVSQPLTQET VINIDGMTCN
At7bhomsa    FKVSLPDGAE GSGTDHRSSS SHSPGSPPRN QV.QGTCSTT LIAIAGMTCA
Atc2sacce    EAVSSKEQLL TSFEVVPSKS EGLQSIHDIR ETTRCNTNSN QHTGKGRLCI
Consensus    .......... .......... .......... .......... ..........

             401                                                450
At7acrigr    SCVQSIEGVV SKKPGVKSIH VSLANSFGTV EYDPLLTAPE TLREVIVDMG
At7ahomsa    SCVQSIEGVI SKKPGVKSIR VSLANSNGTV EYDPLLTSPE TLRGAIEDMG
At7bhomsa    SCVHSIEGMI SQLEGVQQIS VSLAEGTATV LYNPSVISPE ELRAAIEDMG
Atc2sacce    ESSDSTLKKR SCKVSRQKIE VSSKPECCNI SCVERIASRS CEKRTFKGST
Consensus    .......... .......... .......... .......... ..........
             451                                                500
At7acrigr    FDAVLPDMSE PLVVIAQPSL ETPLLPSTND .......... ..........
At7ahomsa    FDATLSDTNE PLVVIAQPSS EMPLLTSTNE FYTKG..... ..........
At7bhomsa    FEASVVSESC STNPLGNHSA GNSMVQTTDG TPTSVQEVAP HTGRLPANHA
Atsyescco    .......... .......... .......... .....MSQTI DLTLDGLSCG
Atc2sacce    NVGISGSSST DSLSEKFFSE QYSRMYNRYS SILKNLGCIC NYLRTLGKES
Consensus    .......... .......... .......... .......... ..........

             501                                                550
Caddstaau    ......MDSS TKTLTEDKQV YRVEGFSCAN CAGKFEKNVK ELSGVHDAKV
At7acrigr    .......QDN MMTAVHSKCY IQVSGMTCAS CVANIERNLR REEGIYSVLV
At7ahomsa    ...MTPVQDK EEGKNSSKCY IQVTGMTCAS CVANIERNLR REEGIYSILV
At7bhomsa    PDILAKSPQS TRAVAPQKCF LQIKGMTCAS CVSNIERNLQ KEAGVLSVLV
Ctppromi     ......MNTP TTLSSANRLS LPVEGMTCAS CVGRVERALK AVPEIKDAVV
Atsyescco    HCVKRVKESL EQRPDVEQAD VSITEAHVTG TASAEQLIET IKQAGYDASV
Atulsacce    .......... .....MREVI LAVHGMTCSA CTNTINTQLR ALKGVTKCDI
Atc2sacce    CCLPKVRFCS GEGASKKTKY SYRNSSGCLT KKKTHGDKER LSNDNGHADF
Consensus    .......... .......... .......... .......... ..........

             551                                                600
Ctpbraja     .......... .......... ......MHVT RDFSHY.... .VRTAGEGIK
Fixirhime    ........MS CCASSAAIMV AEGGQASPAS EELWLA.... .SRDLGGGLR
Cadastaau    .......... .......... ........MS EQKVK..... .....LMEEE
Caddstaau    NFGASKIDVF GSATVEDLEK AGAFENLKVA PEKARR.... .RVEPVVTED
Cadabacfi    .......... .......... ........MS DQKA...... ....ITSEQE
Ctpbmycle    .......... .......... ........MT ASLVED.... .TNNNHESVR
Ctpamycle    .......... .......... .......... .......... ........MQ
Atsysynsp    .......... .......... .......... .MPAAI.... .VHSADPSST
At7acrigr    ALMAGKAEVR YNPAVIQ..P PVIAEFIREL GFGATV.... .MENADEGDG
At7ahomsa    ALMAGKAEVR YNPAVIQ..P PMIAEFIREL GFGATV.... .IENADEGDG
At7bhomsa    ALMAGKAEIK YDPEVIQ..P LEIAQFIQDL GFEAAV.... .MEDYAGSDG
Atcssynsp    .......... .......... .......... .......... .....MVNQQ
Ctppromi     NLATERADIT FSSTPNP..V .........L AVSAIE.... .SSGYKVPEE
Atkaentfa    .......... .......... .......... .......... ....MATNTK
Atsyescco    SHPKAKPLAE SSIPSEA.L. .........T AVSEAL.... .PAATADDDD
Atulsacce    SLVTNECQVT YDNEVTADSI KEIIEDCGFD CEILRD.... .SEITAISTK
Atkbentfa    .......... .......... .......... .....M.... .NNGIDPENE
Atc2sacce    VCSKSCCTKM KDCAVTSTIS GTSSSEISRI VSMEPIENHL NLEAGSTGTE
Consensus    .......... .......... .......... .......... ..........
             601                                                650
Ctpbraja     HIDLAVEGVH CAGCMAKIER GLSAIPDVTL ARVNLTDRRV ALEWKAGT..
Fixirhime    QTELSVPNAY CGTCIATIEG ALRAKPEVER ARVNLSSRRV SIVWKEEVGG
Cadastaau    MNVYRVQGFT CANCAGKFEK NVKKIPGVQD AKVNFGASKI DVYGNASVEE
Caddstaau    KNVYRVEGFS CANCAGKFEK NVKQLAGVQD AKVNFGASKI DVYGNASVEE
Cadabacfi    MKAYRVQGFT CANCAGKFEK NVKQLSGVED AKVNFGASKI AVYGNATIEE
Ctpbmycle    RIQLDVAGML CAACASRVET KL.NKIPGVR ASVNFATRVA TI...DAVDV
Ctpamycle    RIQLNITGMS CAGCVAAVER RLQQTAGVEA TLVNSATRVA RL...TSAR.
Atsysynsp    SILVEVEGMK CSCCAPNGWN NLPNKLSDFS VSVNLITRLA KVDYDAALIE
At7acrigr    ILKLVVRGMT CASCVHKIES TLTKHKGIFY CSVALATNKA HIKYDPEIIG
At7ahomsa    VLELVVRGMT CASCVHKIES SLTKHRGILY CSVALATNKA HIKYDPEIIG
At7bhomsa    NIELTITGMT CASCVHNIES KLTRTNGITY ASVALATSKA LVKFDPEIIG
Atcssynsp    ..TLTLRGMG CAACAGRIEA LIQALPGVQE CSVNFGAEQA QVCYDPALTQ
Ctppromi     ITELAIEEMT CASCVGRVEK ALAQIPGVLE ATVNLATERA RVRHLSGVVS
Atkaentfa    METFVITGMT CANCSARIEK ELNEQPGVMS ATVNLATEKA SVKYTDTTTE
Atsyescco    SQQLLLSGMS CASCVTRVQN ALQSVPGVTQ ARVNLAERTA LVM...GSAS
Atulsacce    EGLLSVQGMT CGSCVSTVTK QVEGIEGVES VVVSLVTEEC HVIYEPSKT.
Atkbentfa    TNKKGAIGKN PEEKITVEQT NTKNNLQEHG KMENMDQHHT HGHMERHQQM
Atc2sacce    HIVLSVSGMS CTGCESKLKK SFGALKCVHG LKTSLILSQA EFNLDLAQGS
Consensus    .......G.. C..C...... .......... ..V....... ..........
Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa Atcssynsp Ctppromi Atkaentfa Atsyescco Atulsacce Atkbentfa Atc2sacce Consensus
651 700 ..LDPGRFIDRLEELGYKAYPFETESAEVAEVAES . . . . . . . . . . . . . RF RRTNPCDFLH AIAERGYQTH LFSPGEEEGD DLLKQ ...... _ _ _ _ - _ _ _ _ ...LEK . . . . . . . A G A F E N L K V S P E K L A N Q T I Q R V K D D T K A H K E E K T P F Y ...LEK . . . . . . . A G A F E N L K V I P E K L A N P S I Q A V K E D T K A P K E E K I P F Y LEK AGAFENL KVTPEKSARQ ASQEVKEDT...KEDKVPFY ...AVDELRQ VIEQAGYRAT ........... AHAESAVEE IDPDADYARN ...SPRPLRY VKAVRRAALC ........... TDGGEALQR RQADADNARY . . . D P T V L T T E I T G L G F R A Q L R Q D D N P L T L P I A E I P P L Q Q QR ........ . . . P R D I I H T . I G S L G F E A S L V K K D R S A S H L D H K R E I K Q W RSS ....... . . . P R D I I H T .IESLGFEAS L V K K D R S A S H L D H K R E I R Q W RRS ....... ...PRDIIKI .IEEIGFHAS L A Q R N P N A H H L D H K M E I K Q W KKS ....... ...VAAIQAA.IEAAGYHAF PLQDPWDN.. EVEAQERHRR ARSQRQLAQR ...ITDLEVA .WHAGYKPRRLSDNPANTRDLSEERREKEARS ....... ..... RLIKS .VENIGYGAI L Y D E A H K Q K I A E E K Q T Y L R K M K F D ...... . . . P Q D L V Q A . V E K A G Y G A K R L K M T L N A A S A S K K P P S L A M K R ........ ...TLETARE M I E D C G F D S N I I M D G N G N A D M T E K T V I L K V T K A F E D E S P L . . . D H G H M S G .MDHSHMDHE D M S G M N H S H M G H E N M S G M D H S M H M G N F K Q K VKDVIKHLSK TTEFKYEQIS NHGSTIDVVVPYAAKDFINE EWPQGVTELK ..................................................
701 Ctpbraja LLRCLGVAAF ATMNVMMLSI Fixirhime LILAVAVSGFAATNIMLLSV Cadastaau KKHSTLLFAT LLIAFGYLSH Caddstaau KKHSTLLFAT LLIAFGYLSH Cadabacfi KKHSTLLYAS LLITFGYLSS Ctpbmycle LLRRLIVAAL LFVPLADLST Ctpamycle LLIRLAVAAALFVPLAHLSV A t s y s y n s p .......... L Q L A I A A F L L At7acrigr FLVSLFFCTP VMGLMMYMMA At7ahomsa FLVSLFFCIP VMGLMTYMMV At7bhomsa FLCSLVFGIP VMALMIYMLI AtcssynspVWVSGLIASL LVIGSLPMML Ctppromi LRRALLIATI FTLPVFVIEM Atkaentfa LIFSAILTLP LMLAMIAMML Atsyescco FRWQAIVALA VGIPVMVWGM Atulsacce ILSSVSERFQ FLLDLGVKSI Atkbentfa FWLSLILAIP IILFSPMMGM
750 PVWSGNVSDM LPEQRDFF ............ S V W S G A D .... A A T R D L F . . . . . . . . . . . . FVNGE . . . . . . . . . . . . . . . . . . . . . . . . . FVNGE . . . . . . . . . . . . . . . . . . . . . . . . . YVNGE . . . . . . . . . . . . . . . . . . . . . . . . . .............................. .............................. .............................. .............................. .............................. .............................. .............................. .............................. .............................. .............................. EISDDMHTLT IKYCCNELGI RDLLRHLERT SF . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Atc2sacce IVERNIIRIY FDPKVIGARD LVNEGWSVPV SIAPFSCHPT IEVGRKHLVR Consensus .................................................. 751 800 Ctpbraja .................................................. Fixirhime .................................................. Cadastaau .................................................. Caddstaau .................................................. Cadabacfi .................................................. C t p b m y c l e ..... M F A I V P T N R . . . . . . . . . . . . . . . . . . . . . . . . . . FPGWGYLL.. C t p a m y c l e ..... M F A V L P S T H . . . . . . . . . . . . . . . . . . . . . . . . . . FPGWEWML.. A t s y s y n s p ..... I V S S W G H L G H W L D H P L P G T D Q L . . . . . . . . . . . . . . . . . . WFH.. A t 7 a c r i g r ..... M E H H F A T I H H N Q S M S N E E M I K N H S S M F L E R Q I L P G L S I M N L L S . . A t 7 a h o m s a ..... M D H H F A T L H H N Q N M S K E E M I N L H S S M F L E R Q I L P G L S V M N L L S . . A t 7 b h o m s a . . . . . . . . . . . . . . . . . . PS N E P .... HQS M V L D H N I I P G L S I L N L I F . . A t c s s y n s p ..... G I S . I P G I P M W L H H P G . . . . . . . . . . . . . . . . . . . . . . . . . LQ.. Ctppromi ..... G S H F I P G V H H W V T Q T L G Q Q . . . . . . . . . . . . . . . . . . LNWYIQ.. A t k a e n t f a ..... G S H . . G P I V S F F H L S L . . . . . . . . . . . . . . . . . . . . . . . . . VQ.. Atsyescco ............ IGDNMMVT ADNR .................. SLWLVI.. Atulsacce GYKFTVFSNL DNTTQLRLLS KEDEIRFWKK NSIKSTLLAI ICMLLYMIVP Atkbentfa ................................... PFQVT FPGSNWVV.. A t c 2 s a c c e V G C T T A L S I I L T I P I L V M A W A P Q L R E K I S T IS . . . . . . . . . . . . . . . . . . Consensus .................................................. Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa Atcssynsp Ctppromi Atkaentfa Atsyescco Atulsacce Atkbentfa Atc2sacce Consensus
801 850 .......................... HWLSALIALPAAAY AGQPFFRSAW .......................... HWIS ALIAGPALIY AGRFFYKSAW ......................... DNLVT SMLFVGSIVI GGYSLFKVGF ......................... DNLVT SMLFVSSIVI GGYSLFKVGF ......................... ENIVT TLLFLASMFI GGLSLFKVGL .............................. TALAAPIVTWAAWPFHRVAL .............................. TALAIPVVTWAAWPFHRVAI .............................. ALLATWALLG PGRSILQAGW .............................. LLLCLPVQFF GGWYFYIQAY .............................. FLLCVPVQFF GGWYFYIQAY .............................. FILCTFVQLL GGWYFYVQAY .............................. LGLTLPVLWA .GRSFFINAW .............................. FVLATIVMFG PGLRFFKKGI .............................. LLFALPVQFY VGWRFYKGAY .............................. GLITLAVMVF AGGHFYRSAW MMWPTIVQDR IFPYKETSFV RGLFYRDILG VILASYIQFS VGFYFYKAAW .............................. LVLATILFIY GGQPFLSGAK ............................ AS M V L A T I I Q F V I A G P F Y L N A L ................................ L . . . . . . . . G . . F .....
Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp
851 RALS.AKTTN NAIR.HGRTN QNLI.RFDFD QNLI.RFDFD QNLL.RFEFD RNAR.YRAAS HNAR.YHGAS QGLR.CGAPN
900 MDVPISIGVI LALGMSVVET I ............... HHAE MDVPIALAVS LSYGMSLHET I ............... GHGE MKTLMTVAVI GATIIGK ....................... MKTLMTVAVI GAAIIGE ....................... MKTLMTVAVI GGAIIGE ....................... METLISAGILAATGWSLSTI FVDKEPRQTH GIWQAILHSD METLISTGITAATIWSLYTV FGHHQSTEHR GVWRALLGSD M N S L V L L G T G S A Y L A S L V A L L W ....... P Q L ...... G W
At7acrigr    KALK.HKTAN MDVLIVLATT IAFAYSLII. LL.......V AMYERAKVNP
At7ahomsa    KALK.HKTAN MDVLIVLATT IAFAYSLII. LL.......V AMYERAKVNP
At7bhomsa Atcssynsp Ctppromi Atkaentfa
Atsyescco
Atulsacce Atkbentfa Atc2sacce Consensus
Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa Atcssynsp Ctppromi Atkaentfa Atsyescco Atulsacce Atkbentfa Atc2sacce Consensus
M D V L I V L A T S IAYVYSLVI. LV ....... V A V A E K A E R S P M D T L V A V G T G A A F L Y S L A V T LF ....... P Q W L T R Q G L P P M N S L V S V G T V A A Y G Y S V V S T FI ....... P Q V L . . P A G T A M D V L V A I G T S A A F A L S I Y N G FF ....... P ...... SHSH M D T L V A L G T G V A W L Y S M S V N L W ....... P Q W F P M E A . . R M D T L V C V S T T C A Y T F S V F S L V H N M F H P S S T G K L P R ..... M M T L I A M G I T V A Y V Y S V Y S F I ........ A N L I N P H T H V M M D L L I V L S T S A A Y I F S I V S F GY . . . . . . . . . F V V G R P L S T M.L . . . . . . . . A...S . . . . . . . . . . . . . . . . . . . . . . . .
901 HAYFDAAIML LTFLLVGRFL HAWFDASVTL LFFLLIGRTL ...WAEASIV VILFAISEAL ...WAEASIV VILFAISEAL ...WAEVAIV VILFAISEAL SIYFEVAAGV TVFVLAGRFF AIYFEVAAGI TVFVLAGKYY VCFFDEPVML LGFILLGRTL ITSFDTPPML FVFIALGRWL ITFFDTPPML FVFIALGRWL VTFFDTPPML FVFIALGRWL DVYYEAIAVI IALLLLGRSL NIYFEAAVVI VTLILLGRNL DLYFESSSMI ITLILLGKYL HLYYEASAMI IGLINLGHML .IVFDTSIMI I S Y I S I G K Y L D F F W E L A T L I .VIMLLGHWI Atc2sacce E Q F F E T S S L L V T L I M V G R F V C o n s e n s u s . . . . . . . . . . . . . . . . G..L
Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa Atcssynsp Ctppromi Atkaentfa Atsyescco Atulsacce Atkbentfa
KSLR.HRSAN KAFR.QNTAT PALL.RGAPD HALK.TKAPN KSLL.NGAAT ASLK.HGSGT MELK.QKSPA KSLIFSRLIE ..........
950 DQNMRRRTRA VAGNLAALKA ETAAKFVGPD DHMMRGRART AISGLARLSP RGATVVHPDG E R F S M D R S R Q S I R S L M D I A P KEALVRRNG. E R F S M D R A R Q S I R S L M D I A P KEALVRRNG. E R F S M D R A R Q S I R S L M D I A P KEALVKRNG. EARAKSKAGS ALRALAARGA KNVEVLLPNG TARAKSHASI ALLALAALSA KDAAVLQPDG EEQARFRSQA ALQNLLALQP ETTQLLTAPS E H I A K G K T S E A L A K L I S L Q A TEATIVT... E H I A K G K T S E A L A K L I S L Q A TEATIVT... E H L A K S K T S E A L A K L M S L Q A TEATVVT... E E R A K G Q T S A A I R Q L I G L Q A KTARVLR... EAKAKGNTSQ AIKRLVGLQA KTARVSR E H T A K S K T G D A I K Q M M S L Q T KTAQVLR... E A R A R Q R S S K A L E K L L D L T P PTARLVT... E T L A K S Q T S T A L S K L I Q L T P SVCSII .... EMNAVSNASD ALQKLAELLP ESVKRLKKDG SELARHRAVK SI.SVRSLQA SSAILVDKTG E . . . . . . . . . . . . . L..L .... A .......
951 i000 .......... E I S Q V P V A A I S P G D I V L L R P G E R C A V D G T V I E G R S E I D Q S .......... S R E Y R A V D E I N P G D R L I V A A G E R V P V D G R V L S G T S D L D R S .......... Q E I I I H V D D I A V G D I M I V K P G E K I A M D G I I V N G L S A V N Q A .......... Q E I M I H V D D I A V G D I M I V K P G E K I A M D G I I I N G V S A V N Q A .......... Q E I M I H V D D I A V G D I M I V K P G Q K I A M D G V V V S G Y S A V N Q T .......... A E L T I P A G E L K K Q Q H F L V R P G E T I T A D G V V I D G T A T I D M S .......... S E M V I P A N E L N E Q Q R F V V R P G Q T I A A D G L V I D G S A T V S M S SIAPQDLLEA PAQIWPVAQL RAGDYVQVLP GDRIPVDGCI VAGQSTLDTA ..LDSDNILL S E E Q V D V E L V Q R G D I I K V V P G G K F P V D G R V I E G H S M V D E S ..LDSDNILL S E E Q V D V E L V Q R G D I I K V V P G G K F P V D G R V I E G H S M V D E S ..LGEDNLII R E E Q V P M E L V Q R G D I V K V V P G G K F P V D G K V L E G N T M A D E S ..QGQ ...... E L T L P I T E V Q V E D W V R V R P G E K V P V D G E V I D G R S T V D E S ..HGE ...... I L E I P L D Q V M M G D I V V V R P G E K I P V D G E V V E G H S Y V D E S ..DGK ...... E E T I A I D E V M I D D I L V I R P G E Q V P T D G R I I A G T S A L D E S ..DEG ...... E K S V P L A E V Q P G M L L R L T T G D R V P V D G E I T Q G E A W L D E A .... SDVERN E T K E I P I E L L Q V N D I V E I K P G M K I P A D G I I T R G E S E I D E S .......... T E E T V S L K E V H E G D R L I V R A G D K M P T D G T I D K G H T I V D E S .......... K E T E I N I R L L Q Y G D I F K V L P D S R I P T D G T V I S G S S E V D E A . . . . . . . . . . . . . . . . . . . . . . . D . . . V . P G ..... DG .... G .... D..
i001 Ctpbraja LITGETLYVT A E Q G T P V Y A G FixirhimeVVNGESSPTV VTTGDTVQAG Cadastaau AITGESVPVS KAVDDEVFAG Caddstaau A I T G E S V P V A KTVDDEVFAG Cadabacfi AITGESVPVE KTVDNEVFAG Ctpbmycle A I T G E A R P V H A S P A S T V V G G Ctpamycle P I T G E A K P V R V N P G A Q V I G G Atsysynsp MLTGEPLPQP CQVGDRVCAG At7acrigr L I T G E A M P V A KKPGSTVIAG A t 7 a h o m s a L I T G E A M P V A KKPGSTVIAG A t 7 b h o m s a LITGEAMPVT KKPGSTVIAR Atcssynsp MVTGESLPVQ KQVGDEVIGA Ctppromi M I T G E P V P V A KEIGAEVVGG A t k a e n t f a MLTGESVPVE KKEKDMVFGG Atsyescco MLTGEPIPQQ KGEGDSVHAG Atulsacce LMTGESILVP KKTGFPVIAG A t k b e n t f a A V T G E S K G V K KQVGDSVIGG Atc2sacce LITGESMPVP KKCQSIVVAG Consensus ..TGE..PV ....... V..G
1050 SMNISGTLRV RVSAASEATL L A E I A R L L D N TLNLTGPLTL EATAAARDSF IAEIIGLMEA T L N E E G L I E V KITKYVEDTT ITKIIHLVEE T L N E E G L L E V KITKYVEDTT ISKIIHLVEE T L N E E G L L E V E I T K L V E D T T ISKIIHLVEE TTVLDGRLVI E A T A V G G D T Q FAAMVRLVED TVVLNGRLIV EAAAVGDETQ LAGMVRLVEQ TLNLSHRLVI R A E Q T G S Q T R LAAIVRCVAE SINQNGSLLI CATHVGADTT LSQIVKLVEE SINQNGSLLI CATHVGADTT LSQIVKLVEE SINAHGSVLI KATHVGNDTT LAQIVKLVEE TLNKTGSLTI RATRVGRETF L A Q I V Q L V Q Q TINKTGTFSF KVTKVGANTI LAQIIRLVEE TINTNGLIQI Q V S Q I G K D T V L A Q I I Q M V E D TVVQDGSVLF R A S A V G S H T T L S R I I R M V R Q SVNGPGHFYF R T T T V G E E T K LANIIKVMKE SINGDGTIEI TVTGTGENGY LAKVMEMVRK SVNGTGTLFV KLSKLPGNNT ISTIATMVDE ..N..G ............ T .... I...V..
1051 Ctpbraja A L Q A R S R Y M R L A D R A S R L Y A Fixirhime A E G G R A R Y R R IADRAARYYS Cadastaau A Q G E R A P A Q A FVDKFAKYYT Caddstaau A Q G E R A P A Q A FVDKFAKYYT Cadabacfi A Q G E R A P S Q A FVDKFAKYYT Ctpbmycle A Q V Q K A R V Q H L A D R I A A V F V Ctpamycle A Q Q Q N A N A Q R L A D R I A S V F V Atsysynsp A Q Q R K A P V Q R FADAIAGRFV At7acrigr A Q T S K A P I Q Q FADKLGGYFV A t 7 a h o m s a A Q T S K A P I Q Q FADKLSGYFV A t 7 b h o m s a A Q M S K A P I Q Q LADRFSGYFV Atcssynsp A Q A S K A P I Q R LADQVTGWFV Ctppromi A Q G S K L P I Q A LVDKVTMWFV A t k a e n t f a A Q G S K A P I Q Q IADKISGIFV Atsyescco A Q S S K P E I G Q L A D K I S A V F V Atulsacce A Q L S K A P I Q G YADYLASIFV A t k b e n t f a AQGEKSKLEF LSDKVAKWLF Atc2sacce A K L T K P K I Q N IADKIASYFV C o n s e n s u s A Q ...... Q . . . D .......
ii00 PVVHATALIT ILGWVIA ............. PAVHLLALLT FVGWMLV ............. P I I M V I A A L V A V V P P L F F G G SWDTW ..... P I I M V I A A L V A V V P P L F F G G SWDTW ..... P I I M I I A T L V A I V P P L F F D G SWETW ..... P M V F V I A G L A GASWLLAG ............ PCVFAVAALD ...RCWMA ............ YGVCAIAALT F G F W A T L G S R W W P Q V L Q Q P L PFIVLVSIAT L L V W I I I G F Q NFT ....... PFIVFVSIAT LLVWIVIGFL NFE ....... PFIIIMSTLT LVVWIVIGFI DFG ....... PAVIAIAILT FLLWFNWI ............ PAVMIGATIT FFIWLAFG ............ PIVLFLALVT LLVTGWLT ............ P V V V V I A L V S A A I W Y F F G ............ PGILILAVLT FFIWCFI ............. Y V A L V V G I I A FIAWLFLA ............ P T I I G I T W T FCVWIAVG ............ P .............................
Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa
ii01 . . . . . . . . . . . . . . . . . . . G A S W H D A I V T G VAVLIITCPC . . . . . . . . . . . . . . . . . . . E G D V R H A M L V A VAVLIITCPC . . . . . . . . . . . . . . . . . . . . . . . . . . VYQG LAVLVVGCPC . . . . . . . . . . . . . . . . . . . . . . . . . . VYQG LAVLVVGCPC . . . . . . . . . . . . . . . . . . . . . . . . . . IYQG LAVLVVGCPC . . . . . . . . . . . . . . . . . ASP D R A F S V V L G . . . V L V I A C P C . . . . . . . . . . . . . . . . . DRR E R T R P S V L G A IAVLVIACPC PGLLIHAPHH GMEMAHPHSH SPLLLALTLA ISVLVVACPC ...IVETYFP GYSRSISRTE TIIRFAFQAS ITVLCIACPC ...IVETYFP GYNRSISRTE TIIRFAFQAS ITVLCIACPC ...VVQKYFP NPNKHISQTE VIIRFAFQTS ITVLCIACPC
1150 ALGLAIPTVQ ALGLAVPVVQ ALVISTPISI ALVITTPISI ALVISTPISI TLGLATPTAM ALGLATPTAM ALGLATPTAI SLGLATPTAV SLGLATPTAV SLGLATPTAV
Atcssynsp Ctppromi Atkaentfa Atsyescco Atulsacce Atkbentfa Atc2sacce Consensus
. . . . . . . . . . . . . . . . . . GN ..VTLALITA V G V M I I A C P C A L G L A T P T S I . . . . . . . . . . . . . . . . . . PE P A L T F A L I N A V A V L I I A C P C A M G L A T P T S I . . . . . . . . . . . . . . . . . . KD WQ..LALLHS V S V L V I A C P C A L G L A T P T A I . . . . . . . . . . . . . . . . . . PA P Q I V Y T L V I A TTVLIIACPC ALGLATPMSI ...LNISANP P V A F T A N T K A D N F F I C L Q T A T S V V I V A C P C A L G L A T P T A I ..................... NLPDALERM VTVFIIACPH ALGLAIPLVV ........... IRVEKQSRS D A V I Q A I I Y A ITVLIVSCPC VIGLAVPIVF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VL...CPC .LGLATP...
1151 Ctpbraja TVASGAMFKS FixirhimeVVAAGRLFQG Cadastaau V S A I G N A A K K Caddstaau V S A I G N A A K K Cadabacfi V S A I G N A A K K Ctpbmycle M V A S G R G A Q L Ctpamycle M V A S G R G A Q L Atsysynsp L V A T G L A A E Q At7acrigr M V G T G V G A Q N At7ahomsa MVGTGVGAQN At7bhomsa MVGTGVAAQN Atcssynsp M V G T G K G A E Y Ctppromi M V G T G R A A E L Atkaentfa MVGTGVGAHN Atsyescco ISGVGRAAEF Atulsacce M V G T G V G A Q N Atkbentfa ARSTSIAAKN Atc2sacce V I A S G V A A K R Consensus .... G..A.. Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa Atcssynsp Ctppromi Atkaentfa Atsyescco Atulsacce Atkbentfa Atc2sacce Consensus
GVLLNSGDAI ERLAEADHVI FDKTGTLTLP G V M V K D G S A M ERLAEIDTVL L D K T G T L T I G G V L V K G G V Y L E K L G A I K T V A FDKTGTLTKG G V L I K G G V Y L E E L G A I K A I A FDKTGTLTKG G V L V K G G V Y L E E M G A L K A I A FDKTGTLTKG GIFIKGYRAL E T I N A I D T V V F D K T G T L T L G GILLKGHESF E A T R A V D T V V F D K T G T L T T G G I L V R G G D V L E Q L A R I K H F V FDKTGTLTQG GILIKGGEPL EMAHKVKVVVFDKTGTITHG GILIKGGEPL EMAHKVKVVVFDKTGTITHG G I L I K G G K P L EMAHKIKTVM FDKTGTITHG G I L I K S A E S L ELAQTIQTVI L D K T G T L T Q G GILFRKGEAL Q A L R D V S V V A L D K T G T L T K G GILIKGGEAL EGAAHLNSII L D K T G T I T Q G GVLVRDRDAL QRASTLDTVVFDKTGTLTEG G V L I K G G E V L E K F N S I T T F V FDKTGTLTTG GLLLKNRNAMEQANDLDVIM LDKTGTLTQG GVIFKSAESI E V A H N T S H V V F D K T G T L T E G G.L.K ..... E .......... DKTGTLT.G
1200 DLEVMNAADI KPRLVNAHEI VPVVTDFEVL VPVVTDFKVL VPAVTDYNVL QLSVSTVTST QLKVSAVTAA QFELIEIQPL TPVVNQVKVL TPVVNQVKVL VPRVMRVLLL QPSVTDFLAI RPELTDLIP. RPEVTDVIGP KPQVVAVKTF FMVVKKFLKD KFTVTGIEIL KLTWHETVR ...V ......
1201 1250 PA ........ D I F E L A G R L A L S S H H P V A A A V A Q A A G A R S P IV ........ SP ........ G R L A T A A A I A V H S R H P I A V A IQNSAGAASP IA ........ N D . . . Q V E E K ELFSIITALE Y R S Q H P L A S A IMKKAEQDNI PYSNVQV... N D . . . Q V E E K ELFSIITALE Y R S Q H P L A S A IMKKAEQDNI TYSDVRV... N K . . . Q I N E K ELLSIITALE Y R S Q H P L A S A IMKKAEEENI TYSDVQV... G G W . C S G E . . . V L A L A S A V E A A S E H S V A T A IV ...... A A Y A D P R P V . . . P G W . Q A N E . . . V L Q M A A T V E SASEHAVALA IA ...... AS TTHREPV... AD .... VDPD RLLQWAAALE A D S R H P L A T A L Q T . . A A Q A A N L A P I A A . . . VES.NKIPRS KILAIVGTAE S N S E H P L G A A V T K Y C K Q E L D TETLGTC... TES.NRISHH KILAIVGTAE SNSEHPLGTA ITKYCKQELD TETLGTC... GDV.ATLPLR KVLAVVGTAE A S S E H P L G V A V T K Y C K E E L G TETLGYC... G D . . . R D Q Q Q TLLGWAASLE N Y S E H P L A E A IVRY..GEAQ GITLSTV... A E . . . K F E Y N E I L S L V A S I E TYSEHPIAQS I V N A . . A N E A K L T L A S V . . . KE ......... IISLFYSLE H A S E H P L G K A IVAY..GAKV GAKTQPI... A D . . . V D E A Q A.LRLAAALE Q G S S H P L A R A IL .... DKAG DMRLPQV... SNWVGNVDED EVLACIKATE SISDHPVSKA IIRYCDGLNC N K A L N A W L E DE...AYQEE EILKYIGALE A H A N H P L A I G IMNYLKEKKI TPYQAQ .... GDRHNSQ ...... SLLLGLT E G I K H P V S M A IASYLKEKGV SAQNVSNTKA .................... E .S.HP..AI ....................
1251 1300 Ctpbraja G A V E E . A G Q G VRADVDGAE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fixirhime G D I R E I P G A G IEVKTEDGV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa Atcssynsp Ctppromi Atkaentfa Atsyescco Atulsacce Atkbentfa Atc2sacce Consensus
EEFTSITGRG KDFTSITGRG EDFSSITGKG ADFVAFAGCG ANFRAVPGHG SDRQQVPGLG TDFQVVPGCG IDFQVVPGCG TDFQAVPGCG TDFEAIPGSG DNFEAIPGFG TDFVAHPGAG NGFRTLRGLG SEYVLGKGIV .EQKNLAGVG VTGKRVEGTS ....... G.G
Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa Atcssynsp Ctppromi Atkaentfa Atsyescco Atulsacce Atkbentfa Atc2sacce Consensus
1301 1350 . . . . . . . . . . . . . . . . . . . . . . . I R L G R P S F C G A E A L V G D G T R L D P .... . . . . . . . . . . . . . . . . . . . . . . . Y R L G S R D F .... A V G G S G P D G R Q .... ....................... YYIGSPK LFKELNVSDF SLGFENNVKI ....................... YYIGSPR LFKELNVSDF SLEFENKVKV ....................... YYIGSPK LFKELLTNDF DKDLEQNVTT ....................... VKIGKPSWVTRNA..PC DWLESARRR ....................... VRVGKPS WIASRC..NS TTLV.TARRN ...................... SLRLGNPTWV .......... QVATAKLP TSSSMIIDAP LSNAVDT..Q QYKVLIGNRE WMIRNGL.VI SNDVDDSMID TSSSMIIDAQ ISNALNA..Q QHKVLIGNRE WMIRNGL.VI NNDVNDFMTE PASHLNEAGS LPAEKDAVPQ TFSVLIGNRE WLRRNGL.TI SSDVSDAMTD ...................... WLQIGTQR WLGELGI.ET S.ALQNQWED ...................... SVSVGADR FMKQLGL.DV S.QFASSAQK ...................... HYFAGTRK RLAEMNL.SF D.EFQEQALE ...................... ALLLGNQA LLNEQQV.GT K.AIEAEITA ................... N TYDICIGNEA LILEDAL.KK SGFINSNVDQ ....................... VKIINEK EAKRLGL.KI D...PERLKN . . . . . . . . . . . . . . . . . . . . . L K L Q G G N C R W L G H N N D P D V R K A L E ..... .......................... G .......................
~i!i~i!!!ii!~iiii:il i~:@i:~!ii;::?iii~!!
[ii ;,',i i:i ~; . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
. . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
}if!F!!!ii~!!!! N::N~;:::@i~%
...............................
. . . . . . . . . . . . . . . .
...........................
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
..........................
. . . . . . . . . . . . . .
;;~;:~;:;:~;~:~
. . . . . . . . . . . . . . .
~..o
.....................
.........................
::::::::::::::::::::::::: ................ ................. -~-,;-~-~.:t~.~:~::..~:~:~-~
...........................
Ctpbraja Fixirhime Cadastaau Caddstaau Cadabacfi Ctpbmycle Ctpamycle Atsysynsp At7acrigr At7ahomsa At7bhomsa Atcssynsp Ctppromi Atkaentfa
1351 ..... E A S I V ..... S E A I L LQNQGKTAMI LQNQGKTAMI LQNQGKTAMI RRITGETWF AELRGETAVF TGSAAATSIW HGRKGRPAVL HERKGRTAVL HEMKGQTAIL WEAAGKTVVG LGEQGKTPLY LEQAGKTVMF
IKGIVNGTT ............................... IQGNIDGTT ............................... IKGIVNGTT ............................... VSGWAEHH ............................... VSGTVAERA VSGTCDGR ................................ ISCKVTNIEG LLHKSNLKIE ENNTKNASLV QIDAINEQSS ISCKVTNIEG LLHKNNWNIE DNNIKNASLV QIDASNEQSS I G C K V S N V E G I L A H S E R P L . . . . . . . . . . . . . . . . . . . SA VQGQVEGI ................................ VSATVDGR ................................ ISGTINGV ................................ VSGEAEGH ................................ SKCQVNG ................................. LEATVEDKD ............................... YSG ...... ........................................
AFSKGAEKFI SL.DFRELAC IGTEKTILGV IGTDQTILGV IGTEKEILAV VSVDGVACGA VEIDGEQCGV LADDQQLLAC VTIDDELCGL VAVDDELCGL VAIDGVLCGM VAADGHLQAI TAIDGRLAAI LANEEQVLGM
LWVRQGLRPD AQAVIAALKA FRFEDQPRPA SRESIEALGR IAVADEVRET SKNVIQKLHQ IAVADEVRET SKNVILKLHQ IAVADEVRES SKEILQKLHQ VAIADTVKDSAADAISALCS IAVADAVKASAADAVAALHD FWLQDQPRPEAAEVVQALRS IAIADTVKPE AELAVHILKS IAIADTVKPE AELAIHILKS IAIADAVKQEAALAVHTLQS LSIADQLKPS SVAWRSLQR IAVADPIKET TPEAIKALHA IAVADQIKED AKQAIEQLQQ
1400 RNI.GIEILS LGI.ATGILS LGIKQTIMLT LGIKQTIMLT LGIKKTIMLT RGL.HTILLT RGF.RTALLT RGA.TVQILS MGL.EVVLMT MGL.EVVLMT MGV.DVVLIT LGL.QWMLT LGL.KVAMIT KGV.DVFMVT
99
Heavy metal-transporting ATPase f a m i l y Atsyescco QASQGATPVL LAVDGKAVAL LAVRDPLRSD SVAALQRLHK AGY.RLVMLT A t u l s a c c e .... G N T V S Y V S V N G H V F G L F E I N D E V K H D S Y A T V Q Y L Q R N G Y . E T Y M I T Atkbentfa YEAQGNTVSF LVVSDKLVAV IALGDVIKPE AKEFIQAIKE KNI.IPVMLT At c 2 s a c c e . . . Q G Y S V F C F S V N G S V T A V Y A L E D S L R A D A V S T I N L L R Q RGI. S L H I L S C o n s e n s u s .... G.T . . . . . . . . . . . . . . . . . D . . . . . . . . . . . . L . . . G ....... T
}~i 9 .!;i:;:~77 !i!~- :~:: i! :J
}?i:~. : ::-
:"2:."!-:..!/i.:: :
.
:-..-s..:;: -
;i!2:1 .
~i-
. ::.: ..::
....
..:..: <:....
:: ;.~::~ :..~::: :.
:ii!i!!i/iiSH
10(
1401-1450
Ctpbraja   GDREPAVKAA AHALAI..PE WRAGVTPADK IARIEEL... .....KRRG.
Fixirhime  GDRAPVVAAL ASSLGI..SN WYAELSPREK VQVCAAA... .....AEAG.
Cadastaau  GDNQGTANAI GTHVGV..SD IQSELMPQDK LDYIKKM... .....QSE..
Caddstaau  GDNQGTAEAI GAHVGV..SD IQSELLPQDK LDYIKKM... .....KAE..
Cadabacfi  GDNKGTANAI GGQVGV..SD IEAELMPQDK LDFIKQL... .....RSE..
Ctpbmycle  GDNQAAARAV AAQVGI..DT VIADMLPEAK VDVIQRL... .....RDQG.
Ctpamycle  GDNPASAAAV ASRIGI..DE VIADILPEDK VDVIEQL... .....RDRG.
Atsysynsp  GDRQTTAVAL AQQLGLESET VVAEVLPEDK AAAIAAL... .....QSQG.
At7acrigr  GDNSKTARSI ASQVGI..TK VFAEVLPSHK VAKVKQL... .....QEEG.
At7ahomsa  GDNSKTARSI ASQVGI..TK VFAEVLPSHK VAKVKQL... .....QEEG.
At7bhomsa  GDNRKTARAI ATQVGI..NK VFAEVLPSHK VAKVQEL... .....QNKG.
Atcssynsp  GDNRRTADAI AQAVGI..TQ VLAEVRPDQK AAQVAQL... .....QSRG.
Ctppromi   GDNKATAKAI AKQLGI..DE IVAEVLPDGK VAALKQL... .....SQKG.
Atkaentfa  GDNQRAAQAI GKQVGIDSDH IFAEVLPEEK ANYVEKL... .....QKAG.
Atsyescco  GDNPTTANAI AKEAGI..DE VIAGVLPDGK AEAIKHL... .....QSEG.
Atulsacce  GDNNSAAKRV AREVGISFEN VYSDVSPTGK CDLVKKI... .....QDKEG
Atkbentfa  GDNPKAAQAV AEYLGI..NE YYGGLLPDDK EAIVQRY... .....LDQG.
Atc2sacce  GDDDGAVRSM AARLGIESSN IRSHATPAEK SEYIKDIVEG RNCDSSSQSK
Consensus  GDN...A.A. A...GI.... ......P..K .......... ..........

1451-1500
Ctpbraja   .ARVLMVGDG MNDAPSLAAA HVSMS.PISA AHLSQATADL VFL......G
Fixirhime  .HKALVVGDG INDAPVLRAA HVSMA.PATA ADVGRQAADF VFM......H
Cadastaau  YDNVAMIGDG VNDAPALAAS TVGIAMGGAG TDTAIETADI ALM......G
Caddstaau  HGNVAMIGDG VNDAPALAAS TVGIAMGGAG TDTAIETADI ALM......G
Cadabacfi  YGNVAMVGDG VNDAPALAAS TVGIAMGGAG TDTALETADV ALM......G
Ctpbmycle  .HTVAMVGDG INDGPALACA DLGLAM.GRG TDVAIGAADL ILV......R
Ctpamycle  .HVVAMVGDG INDGPALARA DLGMAI.GRG TDVAIGAADI ILV......R
Atsysynsp  .DAVAMIGDG INDAPALATA AVGISL.AAG SDIAQDSAGL LLS......R
At7acrigr  .KRVAMVGDG INDSPALAMA NVGIAI.GTG TDVTIEAADV VFI......R
At7ahomsa  .KRVAMVGDG INDSPALAMA NVGIAI.GTG TDVAIEAADV VLI......R
At7bhomsa  .KKVAMVGDG VNDSPALAQA DMGVAI.GTG TDVAIEAADV VLI......R
Atcssynsp  .QVVAMVGDG INDAPALAQA DVGIAI.GTG TDVAIAASDI TLI......S
Ctppromi   .DKVAFVGDG INDAPALAQA DVGLAI.GTG TDVAIEAADV VLM......S
Atkaentfa  .KKVGMVGDG INDAPALRLA DVGIAM.GSG TDIAMETADV TLM......N
Atsyescco  .RQVAMVGDG INDAPALAQA DVGIAM.GGG SDVAIETAAI TLM......R
Atulsacce  NNKVAVVGDG INDAPALALS DLGIAI.STG TEIAIEAADI VILCGNDLNT
Atkbentfa  .KKVIMVGDG INDAPSLARA TIGMAI.GAG TDIAIDSADV VLT......N
Atc2sacce  RPVVVFCGDG TNDAIGLTQA TIGVHI.NEG SEVAKLAADV VML......K
Consensus  ...V.MVGDG .ND.PALA.A ..G.A....G .D.A...AD. ..........
1501-1550
Ctpbraja   RPLAPVAAAI DSARKALHLM RQNLWLAIGY NVLAVPVAIS GV.......V
Fixirhime  ERLSAVPFAI ETSRHAGQLI RQNFALAIGY NVIAVPIAIL GY.......A
Cadastaau  DDLSKLPFAV RLSRKTLNII KANITFAIGI KIIALLLVIP GWLTLWIAIL
Caddstaau  DDLSKLPFAV RLSRKTLNII KANITFAIGI KIIALLLVIP GWLTLWIAIL
Cadabacfi  DDLRKLPSTV KLSRKTLNII KANITFAIAI KFIASLLVIP GWLTLWIAIL
Ctpbmycle  DSLGVVPVAL DLARATMRTI RINMIWAFGY NVAAIPIASS GL.......L
Ctpamycle  DNLDVVPITL DLAAATMRTI KFNMVWAFGY NIAAIPIAAA GL.......L
Atsysynsp  DRLDSVLVAW NLSQMGLRTI RQNLTWALGY NVVMLPLAAG AFLPAYGLAL
At7acrigr  NDLLDVVASI DLSRKTVKRI RINFLFPLIY NLVGIPIAAG VFLPI.GLVF
At7ahomsa  NDLLDVVASI DLSRKTVKRI RINFVFALIY NLVGIPIAAG VFMPI.GLVL
At7bhomsa  NDLLDVVASI HLSKRTVRRI RINLVLALIY NLVGIPIAAG VFMPI.GIVL
Atcssynsp  GDLQGIVTAI QLSRATMTNI RQNLFFAFIY NVAGIPIAAG ILYPLLGWLL
Ctppromi   GDLRGVVDAI ALSQATIRNI KQNLFWTFAY NALLIPVAAG MLYPINGMLL
Atkaentfa  SHLTSINQMI SLSAATLKKI KQNLFWAFIY NTIGIPFAA. .....FG.FL
Atsyescco  HSLMGVADAL AISRATLHNM KQNLLGAFIY NSIGIPVAAG ILWPFTGTLL
Atulsacce  NSLRGLANAI DISLKTFKRI KLNLFWALCY NIFMIPIAMG VLIP.WGITL
Atkbentfa  SDPKDILHFL ELAKETRRKM IQNLWWGAGY NIIAIPLAAG ILAPI.GLIL
Atc2sacce  PKLNNILTMI TVSQKAMFRV KLNFLWSFTY NLFAILLAAG AFV..DFHI
Consensus  ..L....... ..S..T...I ..N...A..Y N....P.A.. .........L
1551-1600
Ctpbraja   TPLIAAAAMS GSSILVMLNS LR.......A RSDSREIV.. ..........
Fixirhime  TPLVAAVAMS SSSLVVVFNA LRLKRSLAAG RGATPGTLIH SGAVTS....
Cadastaau  SDMGA..... ..TILVALNS LRLMRVKDK. .......... ..........
Caddstaau  SDMGA..... ..TILVALNS LRLMRVKDK. .......... ..........
Cadabacfi  SDMGA..... ..TLLVALNG LRLMRVKE.. .......... ..........
Ctpbmycle  NPLIAGAAMA FSSFFVVSNS LRLSNFGLSQ TSD....... ..........
Ctpamycle  NPLVAGAAMA FSSFFVVSNS LRLRNFGAIL SCGTSRHRTV KRWRCPPPTR
Atsysynsp  TPAIAGACMA VSSLAVVSNS LLLRYWFRRS LNHSVSV... ..........
At7acrigr  QPWMGSAAMA ASSVSVVLSS LFLKLYRKPT YDNYELRTRS HTGQRSPSEI
At7ahomsa  QPWMGSAAMA ASSVSVVLSS LFLKLYRKPT YESYELPARS QIGQKSPSEI
At7bhomsa  QPWMGSAAMA ASSVSVVLSS LQLKCYKKPD LERYEAQAHG HMKPLTASQV
Atcssynsp  SPMLAGAAMA FSSVSVVTNA LRLRQFQPR. .......... ..........
Ctppromi   SPIFAAAAMA LSSVFVLGNA LRLKRFQAPM KTH....... ..........
Atkaentfa  NPIIAGGAMA FSSISVLLNS LSLNRKTIK. .......... ..........
Atsyescco  NPVVAGAAMA LSSITVVSNA NRLLRFKPKE .......... ..........
Atulsacce  PPMLAGLAMA FSSVSVVLSS LMLKKWTPPD IESHGISDFK SKFSIGNFWS
Atkbentfa  SPAVGAVLMS LSTVVVALNA LTLK...... .......... ..........
Atc2sacce  PPEYAGLGEL VSILPVIFVA ILLRYAKI.. .......... ..........
Consensus  .P..A...M. .S...V.... L.L....... .......... ..........

1601-1650
Ctpamycle  LRSTACSPVD ASPLRPVAHR TGVKPPTHR. .......... ..........
At7acrigr  SVHVGIDDAS RNSPRLGLLD RIVNYSRASI NSLLSDKRSL NS.VVNSEPD
At7ahomsa  SVHVGIDDTS RNSPKLGLLD RIVNYSRASI NSLLSDKRSL NS.VVTSEPD
At7bhomsa  SVHIGMDDRW RDSPRATPWD QVSYVSQVSL SSLTSDKPSR HSAAADDDGD
Atulsacce  RLFSTRAIAG EQDIESQAGL MSNEEVL... .......... ..........
Consensus  .......... .......... .......... .......... ..........

1651-1667
At7acrigr  KHS....... .......
At7ahomsa  KHSLLVGDFR EDDDTAL
At7bhomsa  KWSLLLNGRD EEQYI..
Consensus  .......... .......
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the P-type ATPases.
Database accession numbers
CODE | SWISSPROT | EMBL/GENBANK
At7ahomsa | Q04656 | L06133; G179253
At7bhomsa | P35670 | U11700; G551502
Atc2sacce | P38360 | Z29332; G547580
Atcssynsp | P37279 | D16437; G435125
Atkaentfa | P32113 | L13292; G290642
Atkbentfa | P05425 | L13292; G290643
Atsyescco | | U58330
Atsysynsp | P37385 | U04356; G436954
Atulsacce | P38995 | L36317; G538515
Cadabacfi | P30336 | M90750; G143753
Cadastaau | P20021 | J04551; G150719
Caddstaau | P37386 | L10909; G152978
Ctpbraja | | X95634
Ctppromi | | U42410
Ctpamycle | P46839 | Z46257; G559907
Ctpbmycle | P46840 | Z46257; G559912
Fixirhime | P18398 | M24144; Z21854
PIR: S37287, S46177, S36741, A45995, A29576; B45995, S48298, D42707, A32561, C32052; S39994
References
1 Odermatt, A. et al. (1993) J. Biol. Chem. 268, 12775-12779.
2 Vulpe, C. et al. (1993) Nature Genet. 3, 7-13.
3 Kahn, D. et al. (1989) J. Bacteriol. 171, 929-939.
4 Petrukhin, K. et al. (1993) Nature Genet. 5, 338-343.
5 Green, N.M. and MacLennan, D.H. (1989) Biochem. Soc. Trans. 17, 819-822; Green, N.M. (1989) Biochem. Soc. Trans. 17, 970-972.
6 Fagan, M.J. and Saier, M.H. Jr. (1994) J. Mol. Evol. 38, 57-99.
7 Bull, P.C. and Cox, D.W. (1994) Trends Genet. 10, 246-252.
Vacuolar ATPases
Vacuolar ATPase family
Summary
Transporters of the vacuolar ATPase family, examples of which are the vacuolar ATPase and vacuolar proton pump subunits from humans 1 (Vph1homsa), rodents 2 (Vpp1ratno) and yeast 3 (Stv1sacce), mediate proton transport by ATPase (H+-transporting ATP synthase; EC 3.6.1.34) activity. Other members of the vacuolar ATPase family include the mouse immune suppression factor TJ6 4 (Tj6musmu). This ATPase subunit is required for assembly as well as for ATPase activity 3. Members of the vacuolar ATPase family have only been found in eukaryotes.
Statistical analysis of multiple amino acid sequence comparisons reveals no apparent relationship between these transporters and any other ATPase or transporter family.
Members of the vacuolar ATPase family contain two domains: a hydrophilic N-terminal domain containing many charged residues, and a hydrophobic C-terminal domain. The hydrophobic domain is predicted, from the hydropathy of the amino acid sequences, to contain six transmembrane helices. Unusually for a transporter family, both the N-terminal domain and the C-terminus are predicted to be extracellular. The proteins are also known to be glycosylated.
Many amino acids, and several long sequence motifs, are conserved throughout this family. These conserved sequence motifs are more prevalent in the hydrophobic, membrane-spanning C-terminal domain of the proteins.
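The hydropathy-based prediction of transmembrane helices mentioned above can be illustrated with a sliding-window calculation. The following is only a sketch under stated assumptions: the text does not say which hydropathy scale, window length or cutoff was used, so the Kyte-Doolittle scale, a 19-residue window and a cutoff of 1.6 are assumed here, and the function names and test sequence are hypothetical. Gap characters from an alignment would have to be removed before a sequence is passed in.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the source names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along an ungapped sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_helices(sequence, window=19, cutoff=1.6):
    """Return (first, last) 0-based window-centre positions for each run of
    consecutive windows whose mean hydropathy reaches the cutoff."""
    half = window // 2
    profile = hydropathy_profile(sequence, window)
    runs, start = [], None
    for i, mean in enumerate(profile):
        if mean >= cutoff and start is None:
            start = i                                  # a hydrophobic run begins
        elif mean < cutoff and start is not None:
            runs.append((start + half, i - 1 + half))  # the run has ended
            start = None
    if start is not None:                              # run reaches the last window
        runs.append((start + half, len(profile) - 1 + half))
    return runs


# Synthetic sequence (not taken from the alignment): 19 leucines flanked
# by charged residues.
toy = "DDDDDKKKKK" + "L" * 19 + "DDDDDKKKKK"
print(candidate_helices(toy))
```

Each run reported by `candidate_helices` is a candidate membrane-spanning segment; counting such runs in the C-terminal domain is one way a figure such as "six transmembrane helices" could be arrived at, although real predictions usually combine several criteria.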
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Stv1sacce | Vacuolar ATP synthase 101 kDa subunit [V-ATPase subunit AC115, STV1, YMR054W, YM9796.07] | Saccharomyces cerevisiae [yeast] | H+
Tj6musmu | Immune suppressor factor J6B7 | Mus musculus [mouse] | H+
Vph1homsa | Vacuolar proton pump subunit [OC-116 kDa, VPP1] | Homo sapiens [human] | H+
Vph1neucr | Vacuolar ATPase 98 kDa subunit [VPH1] | Neurospora crassa [mold] | H+
Vph1sacce | Vacuolar ATP synthase 95.5 kDa subunit [VPH1, YOR270C] | Saccharomyces cerevisiae [yeast] | H+
Vpp1caeel | Putative clathrin-coated vesicle/synaptic vesicle proton pump subunit [ZK637.8] | Caenorhabditis elegans [nematode] | H+
Vpp1ratno | Clathrin-coated vesicle/synaptic vesicle proton pump 116 kDa subunit | Rattus norvegicus [rat] | H+
Phylogenetic tree
[Figure: phylogenetic tree of the vacuolar ATPase family. Leaf labels recovered: Vpp1ratno, Tj6musmu, Vph1homsa, Stv1sacce, Vph1neucr.]
Proposed orientation of VPH1 in the membrane
[Figure: membrane topology diagram with the OUTSIDE and INSIDE faces of the membrane labelled.]
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the outside and is folded six times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | EXPRESSION SITES | CHROMOSOMAL LOCUS
Stv1sacce | 890 | 101 660 | | ADH3 to centromere
Tj6musmu | 855 | 98 048 | thymus |
Vph1homsa | 829 | 93 011 | | 17q21
Vph1neucr | 856 | 97 992 | |
Vph1sacce | 840 | 95 528 | | Chromosome 15
Vpp1caeel | 1030 | 117 544 | | ZK637.8
Vpp1ratno | 838 | 96 327 | brain |
Multiple amino acid sequence alignments

1-50
Vpp1caeel  MGDYVTPGEE PPQPGIYRSE QMCLAQLYLQ SDASYQCVAE LGELGLVQFR
Vpp1ratno  .......... ..MGELFRSE EMTLAQLFLQ SEAAYCCVSE LEELGKVQFR
Tj6musmu   .......... ..MGSLFRSE SMCLAQLFLQ SGTAYECLSA LGEKGLVQFR
Vph1homsa  .......... ..MGSMFRSE EVALVQLFLP TAAAYTCVSR LGELGLVEFR
Stv1sacce  .........M NQEEAIFRSA DMTYVQLYIP LEVIREVTFL LGKMSVFMVM
Vph1sacce  ........MA EKEEAIFRSA EMALVQFYIP QEISRDSAYT LGQLGLVQFR
Vph1neucr  ........MA PKQDTPFRSA DMSMVQLYIS NEIGREVCNA LGELGLVHFR
Consensus  .......... ......FRS. .M...QL... .......... LG..G.V.FR

51-100
Vpp1caeel  DLNPDVSSFQ RKYVNEVRRC DEMERKLRYL EREIKKDQIP M.........
Vpp1ratno  DLNPDVNVFQ RKFVNEVRRC EEMDRKLRFV EKEIRKANIP I.........
Tj6musmu   DLNQNVSSFQ RKFVGEVKRC EELERILVYL VQEITRADIP L.........
Vph1homsa  DLNASVSAFQ RRFVVDVWRC EELEKTFTFL QEEVRRAGLV L.........
Stv1sacce  DLNKDLTAFQ RGYVNQLRRF DEVERMVGFL NEVVEKHAAE TWKYILHIDD
Vph1sacce  DLNSKVRAFQ RTFVNEIRRL DNVERQYRYF YSLLKKHDIK LY.......E
Vph1neucr  DLNSELSAFQ RAFTQDIRRL DNVERQLRYF HSQMEKAGIP LRKF.....D
Consensus  DLN.....FQ R..V....R. ...ER..... .......... ..........
101-150
Vpp1caeel  .......... LDTGENPDAP LPREMIDLEA TFEKLENELR EVNKNEETLK
Vpp1ratno  .......... MDTGENPEVP FPRDMIDLEA NFEKIENELK EINTNQEALK
Tj6musmu   .......... PEGEASPPAP PLKHVLEMQE QLQKLEVELR EVTKNKEKLR
Vph1homsa  .......... PPPKGRLPAP PPRDLLRIQE ETERLAQELR DVRGNQQALR
Stv1sacce  EGNDIAQPDM ADLINTMEPL SLENVNDMVK EITDCESRAR QLDESLDSLR
Vph1sacce  GDTDKYLDGS GELY...VPP SGSVIDDYVR NASYLEERLI QMEDATDQIE
Vph1neucr  PDVDI..... ......LTPP TTTEIDELAE RAQTLEQRVS SLNESYETLK
Consensus  .......... .........P .......... .....E.... ..........

151-200
Vpp1caeel  KNFSELTELK HILRKTQTFF EEVDHDRWRI LEGGSGRRGR STEREETRPL
Vpp1ratno  RNFLELTELK FILRKTQQFF DEMADP..DL LEESSS.... .........L
Tj6musmu   KNLLELVEYT HMLRVTKTFL KRNVEFEPTY EEFPAL.... .....ENDSL
Vph1homsa  AQLHQLQLHA AVLRQG.... .HEPQLAAAH TD.GAS.... .....ERTPL
Stv1sacce  SKLNDLLEQR QVIFECSKFI EVNPGIAGRA TNPEIEQEER DVDEFRMTPD
Vph1sacce  VQKND.LEQY RFILQSG... .......... .......... ..DEFFLKGD
Vph1neucr  KREVELTEWR WVLREAGGFF DRAHG..... .......... NVEEIRASTD
Consensus  .....L.E.. .......... .......... .......... ..........
201-250
Vpp1caeel  IDIGDMDDDS AARMSAQAAM LRLGYVVLGK MDRPESATIA KRDLVYVVLF
Vpp1ratno  LEPNEM.... .....GRGAP LRLG...... .......... ..........
Tj6musmu   LDYSCMQ... .......... .......... .......... ..........
Vph1homsa  LQAPGGP... .......... .......... .......... ..........
Stv1sacce  DISETLSDAF SFDDETPQDR GALG...... .......... ..........
Vph1sacce  NTDST..... SYMDEDMIDA NGEN...... .......... ..........
Vph1neucr  NDDAPL.... ......LQDV EQHN...... .......... ..........
Consensus  .......... .......... .......... .......... ..........

251-300
Vpp1caeel  VSFSFCIPLV FFPDSFLHED MIASSAESSG IGEVLSADEE ELSGRFSDAM
Vpp1ratno  .......... .......... .......... .......... ..........
Tj6musmu   .......... .......... .......... .......... ..........
Vph1homsa  .......... .......... .......... .......... ..........
Stv1sacce  .......... .......... .......... .....NDLTR NQSVEDLSFL
Vph1sacce  .......... .......... .......... .....IAAAI GASVN.....
Vph1neucr  .......... .......... .......... .....TAADV ERSFSGMNIG
Consensus  .......... .......... .......... .......... ..........
301-350
Vpp1caeel  SPLKLQLRFV AGVIQRERLP AFERLLWRAC RGNVFLRTSE IDDVLNDTVT
Vpp1ratno  ........FV AGVINRERIP TFERMLWRVC RGNVFLRQAE IENPLEDPVT
Tj6musmu   .RLGAKLGFV SGLIQQGRVE AFERMLWRAC KGYTIVTYAE LDECLEDPET
Vph1homsa  .HQDLRVNFV AGAVEPHKAP ALERLLWRAC RGFLIASFRE LEQPLEHPVT
Stv1sacce  EQGYQHRYMI TGSIRRTKVD ILNRILWRLL RGNLIFQNFP IEEPLLEGKE
Vph1sacce  ........YV TGVIARDKVA TLEQILWRVL RGNLFFKTVE IEQPVYDVKT
Vph1neucr  ........FV AGVIGRDRVD AFERILWRTL RGNLYMNQAE IPEPLIDPTI
Consensus  .........V .G.I...... ..ER.LWR.. RG.......E ....L.....

351-400
Vpp1caeel  GDPVNKCVFI IFFQGDHLKT KVKKICEGFR ATLYPCPDTP QERREMSIGV
Vpp1ratno  GDYVHKSVFI IFFQGDQLKN RVKKICEGFR ASLYPCPETP QERKEMASGV
Tj6musmu   GEVIKWYVFL ISFWGEQIGH KVKKICDCYH CHIYPYPNTA EERREIQEGL
Vph1homsa  GEPATWMTFL ISYWGEQIGQ KIRKITDCFH CHVFPFLQQE EARLGALQQL
Stv1sacce  K..VEKDCFI IFTHGETLLK KVKRVIDSLN GK...IVSLN TRSSELVDTL
Vph1sacce  REYKHKNAFI VFSHGDLIIK RIRKIAESLD ANLYDVDSSN EGRSQQLAKV
Vph1neucr  NEPVLKNVFV IFAHGKEILA KIRRISESMG AEVYNVDEHS DLRRDQVHEV
Consensus  ........F. I...G..... ....I..... .......... ..R.......

401-450
Vpp1caeel  MTRIEDLKTV LGQTQDHRHR VLVAASKNVR MWLTKVRKIK SIYHTLNLFN
Vpp1ratno  NTRIDDLQMV LNQTEDHRQR VLQAAAKNIR VWFIKVRKMK AIYHTLNLCN
Tj6musmu   NTRIQDLYTV LHKTEDYLRQ VLCKAAESVC SRVVQVRKMK AIYHMLNMCS
Vph1homsa  QQQSQELQEV LGETERFLSQ VLGRVLQLLP PGQVQVHKMK AVYLALNQCS
Stv1sacce  NRQIDDLQRI LDTTEQTLHT ELLVIHDQLP VWSAMTKREK YVYTTLNK..
Vph1sacce  NKNLSDLYTV LKTTSTTLES ELYAIAKELD SWFQDVTREK AIFEILNKSN
Vph1neucr  NARLEDVQNV LRNTQQTLEA ELAQISQSLS AWMITISKEK AVYNTLNLFS
Consensus  .....DL..V L..T...... .L........ .........K ..Y..LN...

451-500
Vpp1caeel  IDVTQKCLIA EVWCPIAELD RIKMALKRGT DESGSQVPSI LNRMETNEAP
Vpp1ratno  IDVTQKCLIA EVWCPVTDLD SIQFALRRGT EHSGSTVPSI LNRMQTNQTP
Tj6musmu   FDVTNKCLIA EVWCPEVDLP GLRRALEEGS RESGATIPSF MNTIPTKETP
Vph1homsa  VSTTHKCLIA EAWCSVRDLP ALQEALRDSS MEEG..VSAV AHRIPCRDMP
Stv1sacce  FQQESQGLIA EGWVPSTELI HLQDSLKDYI ETLGSEYSTV FNVILTNKLP
Vph1sacce  YDTNRKILIA EGWIPRDELA TLQARLGEMI ARLGIDVPSI IQVLDTNHTP
Vph1neucr  YDRARRTLIA EGWCPTNDLP LIRSTLQDVN NRAGLSVPSI INEIRTNKTP
Consensus  .......LIA E.W.P...L. .....L.... ...G...... .....T...P

501-550
Vpp1caeel  PTYNKTNKFT KGFQNIVDAY GIATYREINP APYTMISFPF LFAVMFGDMG
Vpp1ratno  PTYNKTNKFT HGFQNIVDAY GIGTYREINP APYTVITFPF LFAVMFGDFG
Tj6musmu   PTLIRTNKFT EGFQNIVDAY GVGSYREVNP ALFTIITFPF LFAVMFGDFG
Vph1homsa  PTLIRTNRFT ASFQGIVDRY GVGRYQEVNP APYTIITFPF LFAVMFGDVG
Stv1sacce  PTYHRTNKFT QAFQSIVDAY GIATYKEINA GLATVVTFPF MFAIMFGDMG
Vph1sacce  PTFHRTNKFT AGFQSICDCY GIAQYREINA GLPTIVTFPF MFAIMFGDMG
Vph1neucr  PTYLKTNKFT EAFQTIVNAY GTATYQEVNP AIPVIVTFPF LFAVMFGDFG
Consensus  PT...TNKFT ..FQ.IVD.Y G...Y.E.N. ...T..TFPF .FA.MFGD.G

551-600
Vpp1caeel  HGAIMLLAAL FFILKEKQLE AARIKDEIFQ TFFGGRYVIF LMGAFSIYTG
Vpp1ratno  HGILMTLFAV WMVLRESRIL SQKNENEMFS MVFSGRYIIL LMGLFSIYTG
Tj6musmu   HGFVMFLFAL LLVLNENHPR LSQSQ.EILR MFFDGRYILL LMGLFSVYTG
Vph1homsa  HGLLMFLFAL AMVLAENRPA VKAAQNEIWQ TFFRGRYLLL LMGLFSIYTG
Stv1sacce  HGFILFLMAL FLVLNERKFG .AMHRDEIFD MAFTGRYVLL LMGAFSVYTG
Vph1sacce  HGFLMTLAAL SLVLNEKKIN .KMKRGEIFD MAFTGRYIIL LMGVFSMYTG
Vph1neucr  HALIMLCAAL AMIYWEKPLK .KVTF.ELFA MVFYGRYIVL VMAVFSVYTG
Consensus  HG..M.L.AL ...L.E.... ......E... ..F.GRY..L LMG.FS.YTG
601-650
Vpp1caeel  FMYNDVFSKS INTFGSSW.. .......QNT IPESVIDYYL DDEKRSESQL
Vpp1ratno  LIYNDCFSKS LNIFGSSW.. ........SV RPMFTIGNWT EETLLGSSVL
Tj6musmu   LIYNDCFSKS VNLFGSGWNV CAMYSSSHSP EEQRKMVLWN DSTIRHSRTL
Vph1homsa  FIYNECFSRA TSIFPSGWSV AAMANQSG.. ........WS DAFLAQHTML
Stv1sacce  LLYNDIFSKS MTIFKSGWQW ..PSTFRKG. .......... .........E
Vph1sacce  FLYNDIFSKT MTIFKSGWKW ..PDHWKKG. .......... .........E
Vph1neucr  LIYNDVFSKS MTLFDSQWKW VVPENFKEG. .......... .........M
Consensus  ..YND.FSK. ...F.S.W.. .......... .......... ..........

651-700
Vpp1caeel  IL.PPETAFD GNPYPIGVDP VWNLAEGNKL SFLNSMKMKM SVLFGIAQMT
Vpp1ratno  QLNPAIPGVF GGPYPFGIDP IWNIA.TNKL TFLNSFKMKM SVILGIIHML
Tj6musmu   QLDPNIPGVF RGPYPFGIDP IWNLA.TNRL TFLNSFKMKM SVILGIFHMT
Vph1homsa  TLDPNVTGVF LGPYPFGIDP IWSLA.ANHL SFLNSFKMKM SVILGVVHMA
Stv1sacce  SIEAKKTGV. ...YPFGLDF AWH.GTDNGL LFSNSYKMKL SILMGYAHMT
Vph1sacce  SITATSVGT. ...YPIGLDW AWH.GTENAL LFSNSYKMKL SILMGFIHMT
Vph1neucr  TVKAVLREPN GYRYPFGLDW RWH.GTENEL LFINSYKMKM AIILGWAHMT
Consensus  .......... ...YP.G.D. .W.....N.L .F.NS.KMK. S...G..HM.

701-750
Vpp1caeel  FGVLLSYQNF IYFKSDLDIK YMFIPQMIFL SSIFIYLCIQ ILSKWLFFGA
Vpp1ratno  FGVSLSLFNH IYFKKPLNIY FGFIPEIIFM SSLFGYLVIL IFYKWTAYDA
Tj6musmu   FGVVLGIFNH LHFRKKFNVY LVSVPEILFM LCIFGYLIFM IIYKWLAYSA
Vph1homsa  FGVVLGVFNH VHFGQRHRLL LETLPELTFL LGLFGYLVFL VIYKWLCVWA
Stv1sacce  YSFMFSYINY RAKNSKVDII GNFIPGLVFM QSIFGYLSWA IVYKW.....
Vph1sacce  YSYFFSLANH LYFNSMIDII GNFIPGLLFM QGIFGYLSVC IVYKW.....
Vph1neucr  YSLCFSYINA RHFKRPIDIW GNFVPGMIFF QSIFGYLVLC IIYKW.....
Consensus  ........N. ..F....... ....P...F. ...FGYL... I.YKW.....

751-800
Vpp1caeel  VGGTVLGYKY PGSNCAPSLL IGLINMFMMK SRNAGFVDDS GETYPQCYLS
Vpp1ratno  .......... HSSRNAPSLL IHFINMFLF. .......... ..SYPESGNA
Tj6musmu   .......... ETSREAPSIL IEFINMFLFP .......... ....TSKTHG
Vph1homsa  .......... ARA.ASPSIL IHFINMFLFS .......... ....HSPSNR
Stv1sacce  .....SKDWI KDDKPAPGLL NMLINMFLAP GTIDD..Q.. ..........
Vph1sacce  .....AVDWV KDGKPAPGLL NMLINMFLSP GTIDD..E.. ..........
Vph1neucr  .....SVDWF GTGRQPPGLL NMLIYMFLQP GTLDGGVE.. ..........
Consensus  .......... ......P..L ...INMFL.. .......... ..........
801-850
Vpp1caeel  TWYPGQSFFE TIFVLVAIAC VPVMLFGKPY FLWKEEKERR EGGHRQLATI
Vpp1ratno  MLYSGQKGIQ CFLIVVAMLC VPWMLLFKPL ILRHQYLRKK HLGTLNFGGI
Tj6musmu   .LYPGQAHVQ RVLVALTVLA VPVLFLGKPL FLLWLHNGRN CFGMSRSG..
Vph1homsa  LLYPRQEVVQ ATLVVLALAM VPILLLGTPL HLLHRHRRR. ...LRRRP..
Stv1sacce  .LYSGQAKLQ VVLLLAALVC VPWLLLYKPL TLRRLNKNGG GGRPHGYQSV
Vph1sacce  .LYPHQAKVQ VFLLLMALVC IPWLLLVKPL HFKFTHKK.. ....KSHEPL
Vph1neucr  .LYPGQATVQ VILLLLAVIQ VPILLFLKPF YLRWENNRAR AKGYRGIGER
Consensus  .LY..Q...Q ..L...A... VP..L..KP. .L........ ..........
851-900
Vpp1caeel  EIILWLALVQ VPIMLFAKPY FLYRRDKQQ SRYSTLTAES NQHQSVRADI
Vpp1ratno  .......... .......... .......... .......... ....RVGNGP
Tj6musmu   .......... .......... .......... .......... ...YTLVRKD
Vph1homsa  .......... .......... .......... .......... ......ADRQ
Stv1sacce  GNI....... .......... .......... ........EH .EEQIAQQRH
Vph1sacce  PST....... .......... .......... ........EA .DA.......
Vph1neucr  SRV....... .......... .......... ........SA LDEDDEEDPS
Consensus  .......... .......... .......... .......... ..........

901-950
Vpp1caeel  NQDDAEVVHA PEQTPKPSGH GHGHGDGP.. .....LEMGD VMVYQAIHTI
Vpp1ratno  TEEDAEIIQH DQLSTHSEDA EEPTEDEV.. .....FDFGD TMVHQAIHTI
Tj6musmu   SEEEVSLLGN QDIE.EGNSR MEEGCREVTC E...EFNFGE ILMTQAIHSI
Vph1homsa  EENKAGLLDL PDASVNGWSS DEEKAGGLDD EEEAELVPSE VLMHQAIHTI
Stv1sacce  SAEGFQGMII SDVASVADSI NESVGGG... .EQGPFNFGD VMIHQVIHTI
Vph1sacce  SSEDLEAQQL ISAMDADDAE EEEVGSG... .SHGE.DFGD IMIHQVIHTI
Vph1neucr  NGDDYEGAAM LT........ HDEHGDG... .EHEEFEFGE VMIHQVIHTI
Consensus  .......... .......... .......... ........G. ....Q.IHTI
951-1000
Vpp1caeel  EFVLGCVSHT ASYLRLWALS LAHAQLSDVL WTMVFRNAFV LDGYTGAIAT
Vpp1ratno  EYCLGCISNT ASYLRLWALS LAHAQLSEVL WTMVIHIGLH VRSLAGGLGL
Tj6musmu   EYCLGCISNT ASYLRLWALS LAHAQLSDVL WAMLMRVGLR VDTTYG...V
Vph1homsa  EFCLGCVSNT ASYLRLWALS LAHAQLSEVL WAMVMRIGLG LGREVGVAAV
Stv1sacce  EFCLNCISHT ASYLRLWALS LAHAQLSSVL WDMTISNAFS SKNSGSPLAV
Vph1sacce  EFCLNCVSHT ASYLRLWALS LAHAQLSSVL WTMTIQIAFG FRGF...VGV
Vph1neucr  EFCLNSVSHT ASYLRLWALS LAHQQLSAVL WSMTMAKALE SKGLGG..AI
Consensus  E.CL.C.S.T ASYLRLWALS LAHAQLS.VL W.M....... ..........

1001-1050
Vpp1caeel  YI...LFFIF GSLSVFILVL MEGLSAFLHA LRLHWVEFQS KFYGGLGYEF
Vpp1ratno  FF...IFAAF ATLTVAILLI MEGLSAFLHA LRLHWVEFQN KFYTGTGFKF
Tj6musmu   LL.LPVMAFF AVLTIFILLV MEGLSAFLHA IRLHWVEFQN KFYVGAGTKF
Vph1homsa  VL.VPIFAAF AVMTVAILLV MEGLSAFLHA LRLHWVEFQN KFYSGTGYKL
Stv1sacce  MKVVFLFAMW FVLTVCILVF MEGTSAMLHA LRLHWVEAMS KFFEGEGYAY
Vph1sacce  FMTVALFAMW FALTCAVLVL MEGTSAMLHS LRLHWVESMS KFFVGEGLPY
Vph1neucr  FLVVA.FAMF FVLSVIILII MEGVSAMLHS LRLAWVESFS KFAEFGGWPF
Consensus  ......FA.. ......IL.. MEG.SA.LH. LRLHWVE... KF..G.G...

1051-1073
Vpp1caeel  APFSFEKILA EEREAEENL. ...
Vpp1ratno  LPFSFEHIRE GKFDE..... ...
Tj6musmu   VPFSFSLLSS KFSNDDSIA. ...
Vph1homsa  SPFTFAATDD .......... ...
Stv1sacce  EPFSFR.... ..AIIE.... ...
Vph1sacce  EPFAFEYKDM EVAVASASSS ASS
Vph1neucr  TPFSFKQQLE ESEELKEYIG ...
Consensus  .PF.F..... .......... ...
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
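The 75% rule stated above can be sketched as a column-by-column count. This is a minimal illustration rather than the procedure actually used to build the alignment: the function name is hypothetical, and it is an assumption that the fraction is taken over all aligned sequences, with gap positions counting against a residue. The input rows are residues 951-960 of the seven sequences in the alignment above.

```python
from collections import Counter


def consensus(aligned, threshold=0.75, gap="."):
    """Build a consensus string from equal-length aligned sequences.

    A residue is reported for a column when it occurs in at least
    `threshold` of all sequences; otherwise the gap symbol is emitted.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*aligned):
        # Count only real residues; gap symbols never form a consensus.
        counts = Counter(residue for residue in column if residue != gap)
        symbol = gap
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(aligned) >= threshold:
                symbol = residue
        result.append(symbol)
    return "".join(result)


# Residues 951-960 of the vacuolar ATPase alignment.
block_951_960 = [
    "EFVLGCVSHT",  # Vpp1caeel
    "EYCLGCISNT",  # Vpp1ratno
    "EYCLGCISNT",  # Tj6musmu
    "EFCLGCVSNT",  # Vph1homsa
    "EFCLNCISHT",  # Stv1sacce
    "EFCLNCVSHT",  # Vph1sacce
    "EFCLNSVSHT",  # Vph1neucr
]
print(consensus(block_951_960))  # E.CL.C.S.T
```

With seven sequences a residue must appear in at least six of them to pass the threshold, which is why the second column (F in five of seven sequences) is left as a dot. The result agrees with the consensus printed for these positions in the alignment.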
Database accession numbers
CODE | SWISSPROT | PIR | EMBL/GENBANK
Stv1sacce | P37296 | A54081 | U06465; G460160
Tj6musmu | P15920 | JH0287 | M31226; G293678
Vph1homsa | | | U45285
Vph1neucr | | | U36396
Vph1sacce | P32563 | A42970 | M89778; G173173
Vpp1caeel | P30628 | S15795 | Z11115; G1067097
Vpp1ratno | P25286 | B38656 | M58758; G206430
References
1 Li, Y.P. et al. (1996) Biochem. Biophys. Res. Commun. 218, 813-821.
2 Perin, M.S. et al. (1991) J. Biol. Chem. 266, 3877-3881.
3 Manolson, M.F. et al. (1992) J. Biol. Chem. 267, 14294-14303.
4 Lee, C.-K. and Ghoshal, K.K.D. (1990) Mol. Immunol. 27, 1137-1144.
ABC Multidrug Resistance Proteins
White transporter family
Summary
Typical transporters of the white family, the example of which is the white 1 protein of Drosophila melanogaster (Whitdrome), mediate the import of pigment precursors into cells in the compound eye by acting as ATP-dependent efflux pumps. In Drosophila the white protein dimerizes with the brown protein (Browdrome) to import guanine 2 and with the scarlet protein (Scrtdrome) to import tryptophan 3. Members of the white transporter family are also found in mammals and a few other eukaryotes. The human homolog of the white protein 4 is located on chromosome 21 and may be implicated in Down's syndrome (trisomy 21).
Statistical analysis of multiple amino acid sequence comparisons places the white transporter family in the multidrug resistance subdivision of the ATP binding cassette (ABC) superfamily 5. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes.
Transporters of the white family consist of a single ATP binding domain (containing the sequence patterns characteristic of the ABC transporter superfamily) fused to a transmembrane domain, with the ATP binding domain towards the N-terminus 2. The functional transporter complex is formed from a dimer: in the case of the Drosophila pigment proteins, a heterodimer of white with either brown or scarlet. The transmembrane domains are predicted, from the hydropathy of their amino acid sequences, to contain six membrane-spanning helices.
Several amino acids are conserved within the white transporter family, including motifs unique to the family, signature motifs of the ABC superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis.
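The summary refers to predicting membrane-spanning helices from sequence hydropathy. A minimal sketch of one common way to do this is a sliding-window average over a hydropathy scale. The text does not say which scale or parameters were used, so everything here is an assumption: the Kyte-Doolittle scale, the 19-residue window, the 1.6 cutoff, and the function names `hydropathy_profile` and `candidate_segments`.

```python
# Kyte-Doolittle hydropathy index (assumed scale; the source names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry j is the window starting at index j."""
    scores = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile

def candidate_segments(seq, window=19, cutoff=1.6):
    """Merge overlapping windows whose mean is at or above the cutoff.

    Returns 1-based, inclusive (start, end) residue ranges.
    """
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean < cutoff:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the current run
        else:
            segments.append((start, end))
    return segments

# Toy sequence (hypothetical): a hydrophobic core flanked by charged residues.
toy = "DEKRDEKRDE" + "LIVALIVALIVALIVALIV" + "DEKRDEKRDE"
print(candidate_segments(toy))
```

Merged ranges from such a scan are wider than the helices themselves, so they are only starting points for a topology model like the one drawn for the white protein.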
Nomenclature, biological sources and substrates
CODE        DESCRIPTION [SYNONYMS]                                             ORGANISM [COMMON NAMES]               SUBSTRATE(S)
Adp1sacce   Probable ATP-dependent permease precursor [ADP1, YCR11C, YCR105]   Saccharomyces cerevisiae [yeast]
Browdrome   Brown protein [BW]                                                 Drosophila melanogaster [fruit fly]   Guanine
Scrtdrome   Scarlet protein [ST]                                               Drosophila melanogaster [fruit fly]   Tryptophan
Whitanoal   Eye pigment protein [White]                                        Anopheles albimanus [mosquito]        Pigment precursors?
Whitdrome   White protein [W]                                                  Drosophila melanogaster [fruit fly]   Guanine, tryptophan
Whithomsa   White protein homolog [WHIT1]                                      Homo sapiens [human]                  Pigment precursors?
Whitmusmu   White protein homolog                                              Mus musculus [mouse]                  Pigment precursors?
Phylogenetic tree
[Figure: phylogenetic tree of the white transporter family. Leaves, top to bottom: Whithomsa, Whitanoal, Whitdrome, Scrtdrome, Adp1sacce, Browdrome.]
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Whitmusmu (Whithomsa).
Proposed orientation of white protein in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded six times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: proposed membrane topology of the white protein. Labels: OUTSIDE, INSIDE, NH2, COOH.]
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT
Adp1sacce   1049          117 231
Browdrome   675           75 943
Scrtdrome   666           74 506
Whitanoal   709           79 052
Whitdrome   687           75 672
Whithomsa   674           75 169
Whitmusmu   666           74 032
EXPRESSION SITES: head; retina
CHROMOSOMAL LOCUS: Chromosome 3; 21q22.3
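As a rough plausibility check on the figures above, each molecular weight divided by its residue count should land near the commonly quoted average of about 110 Da per amino acid residue. The sketch below does only that arithmetic; the dictionary name `white_family` and the helper `mass_per_residue` are illustrative, and the numbers are copied from the table.

```python
# (amino acids, molecular weight in Da) as listed for the white family.
white_family = {
    "Adp1sacce": (1049, 117231),
    "Browdrome": (675, 75943),
    "Scrtdrome": (666, 74506),
    "Whitanoal": (709, 79052),
    "Whitdrome": (687, 75672),
    "Whithomsa": (674, 75169),
    "Whitmusmu": (666, 74032),
}

def mass_per_residue(n_residues, mol_wt):
    """Average mass per residue in Da."""
    return mol_wt / n_residues

for code, (n, mw) in white_family.items():
    print(f"{code}: {mass_per_residue(n, mw):.1f} Da per residue")
```

A value far from the others would suggest a transcription error in one of the two columns.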
Multiple amino acid sequence alignments
           1                                                   50
Adp1sacce  MGSHRRYLYY SILSFLLLSC SVVLAKQDET PFFEGTSSKN SRLTAQDKGN
           51                                                 100
Adp1sacce  DTCPPCFNCM LPIFECKQFS ECNSYTGRCE CIEGFAGDDC SLPLCGGLSP
           101                                                150
Adp1sacce  DESGNKDRPI RAQNDTCHCD NGWGGINCDV CQEDFVCDAF MPDPSIKGTC
           151                                                200
Adp1sacce  YKNGMIVDKV FSGCNVTNEK ILQILNGKIP QITFACDKPN QECNFQFWID
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
201 250 ................................................ MA ........................................... MTINTDD ........................................... MGQEDQE .................................................. QLESFYCGLS DCAFEYDLEQ NTSHYKCNDV QCKCVPDTVL CGAKGSIDIS .................................................. ..................................................
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
251 300 A F S V G T A M N A S S Y S A E M T E P KS . . . . . . . . . . . . . . VCVS V D E V V S S N M E Q Y A D G E S K T T I S S N R R Y S T S SF . . . . . . . . . . . . . . QDQS M E D D G I N A T L L L I R G G S K H P S A E H L N N G D S GA . . . . . . . . . . . . . . ASQS CINQGFGQ.. .MSDSDSKRI D V E A P E R V E Q HE . . . . . . . . . . . . . . LQVM P V G S T I E V P S DFLTETIKGP GDFSCDLETR QCKFSEPSMN DLILTVFGDP YITLKCESGE .................................................. ..................................................
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
301 350 ATETDLLNGH LKKVDNNLTE AQRFSSLPRRAAVNIEFRDL S.YSVPEGPW TNDKATL.IQVWRPKSY... GSVKGQIPAQ DRLTYTWREI DVFGQAAIDG A K N Y G T L . L P PSPPEDS... GSGSGQLA.. E N L T Y A W H N M D I F G A V N Q P G LDSTPKL.SK RNSSERSLPL RSYSKWSPTE QGATLVWRDL CVYTNVGGSG CVHYSEIPGY KSPSKDPTVS WQGKLVLALT AVMVLALFTF ATFYISKSPL . . . . . . . . . . . . . . . . . . MQ E S G G S S G Q G G P S L C L E W K Q L N Y Y V P D Q E Q S ..................................................
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
351 400 WRKKGY ............................................ KSREPLCSRL RHCFTRQRLV KDFNPR ........................ SGWRQLVNRT RGLFCNERHI PA..PR ........................ . . . . . . . . . . . . . . . . . . . . . . . QRM . . . . . . . . . . . . . . . . . . . . . . . . FRNGLGSSKS PIRLPDEDAV NNFLQNEDDT LATLSFENIT YSVPSINSDG ....................................... N YSFWNECRKK ..................................................
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
401 ..KTLLKGIS ..KHLLKNVT ..KHLLKNVC ..KRIINNST VEETVLNEIS RELRILQDAS ..... L ....
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
451 500 ..LINGLPRD L R C F R K V S C Y I M Q D D M L L P H L T V Q E A M M V S A H L K L Q . . . E IRTLNGVPVT AEQMRARCAY VQQDDLFIPS LTTKEHLMFQ AMLRMGRDVP MRLLNGQPVD AKEMQARCAY VQQDDLFIGS LTAREHLIFQAMVRMPRHLT L..INGRRIG PF.MHRNHGY VYQDDLFLGS VSVLEHLNFM AHLRLDRRVS SIKVNGISMD RKSFSKIIGF VDQDDFLLPT LTVFETVLNS ALLRLPKALS ..VLNGMAME R H Q M T R I S S F L P Q F E I N V K T F T A Y E H L Y F M S H F K M H R R T T .... NG . . . . . . . . . . . . . . . . QDD ...... T..E ..... A .........
GKFNSGELVA GVARSGELLA GVAYPGELLA GAIQPGTLMA GIVKPGQILA GHMKTGDLIA G .... G . L . A
IMGPSGAGKS VMGSSGAGKT VMGSSGAGKT LMGSSGSGKT IMGGSGAGKT ILGGSGAGKT .MG.SGAGKT
450 TLMNILAGYR ETGMKGAV.. TLLNELAFRS PPGVKISPNA TLLNALAFRS PQGIQVSPSG TLMSTLAFRQ PAGTVVQGDI TLLDILAMKR KTG...HVSG TLLAAISQRL RGNLTGDV.. T L . . . L A ..... G .......
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
501 550 KDEGRREMVK EILTALGLLS CANTRTGS ...... LSGGQR KRLAIALELV ATPIKMHRVD EVLQELSLVK CADTIIGVAG RVKGLSGGER KRTAFRSETL YRQ.RVARVD QVIQELSLSK CQHTIIGVPG RVKGLSGGER KRLAFASEAL KEERRLI.IK ELLERTGLLSAAQTRIGSGD DKKVLSGGER KRLAFAVELL F.EAKKARVY KVLEELRIID IKDRIIG.NE FDRGISGGEK RRVSIACELV KAE.KRQRVA DLLLAVGLRDAAHTRI ...... QQLSGGER KRLSLAEELI ........ V . . . L . . . . L . . . . . T . I G . . . . . . . L S G G E R K R . . . A . E . .
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
551 NNPPVMFFDE TDPHLLLCDE TDPPLLICDE NNPVILFCDE TSPLVLFLDE TDPIFLFCDE ..P..L..DE
PTSGLDSASC PTSSLDSFMA PTSGLDSFTA PTTGLDSYSA PTSGLDASNA PTTGLDSFSA PT.GLDS..A
FQVVSLMKGL QSVLQVLKGM HSVVQVLKKL QQLVATLYEL NNVIECLVRL YSVIKTLRHL ..V...L..L
600 A ................... A ................... S ................... A ................... S ................... CTRRRIAKHS LNQVYGEDSF ....................
601 650 Whithomsa ............................................... QG. Whitanoal ............................................... MK. Whitdrome ............................................... QK. Scrtdrome ............................................... QK. Adplsacce ............................................... SDY Browdrome ETPSGESSAS GSGSKSIEMEVVAESHESLL QTMRELPALG VLSNSPNGTH Consensus ..................................................
651 GRSIICTIHQ GKTIILTIHQ GKTVILTIHQ GTTILCTIHQ NRTLVLSIHQ KKAAICSIHQ ....... IHQ
700 PSAKLFELFD QLYVLSQGQC VYRGKVCNLV PYLRDLGLNC PSSELYCLFD RILLVAEG.V AFLGSPYQSA DFFSQLGIPC PSSELFELFD KILLMAEGRV AFLGTPSEAV DFFSYVGAQC PSSQLFDNFN NVMLLADGRV AFTGSPQHAL SFFANHGYYC PRSNIFYLFD KLVLLSKGEM VYSGNAKKVS EFLRNEGYIC PTSDIFELFT HIILMDGGRI VYQGRTEQAAKFFTDLGYEL P . S . . F . L F .... L . . . G . . . . . G . . . . . . . F .... G..C
701 750 W h i t h o m s a P T Y H N P A D F V M E V . . A S G E Y G D Q N . . . S R L V R A V R E G M C D SDH ....... W h i t a n o a l P P N Y N P A D F Y V Q M L A I A P N K E T E C . . . R E T I K K I C D S F A V SPI ....... W h i t d r o m e P T N Y N P A D F Y V Q V L A V V P G R E I E S . . . R D R I A K I C D N F A I SKV ....... S c r t d r o m e P E A Y N P A D F L I G V L A T D P G Y E Q A S . . . Q R S A Q H L C D Q F A V SSA ....... Adplsacce PDNYNIADYL IDITFEAG.. PQGK...RRR IRNISDLEAG TDTNDIDNTI B r o w d r o m e P L N C N P A D F Y L K T L A D K E G K E N A G A V L R A K Y E H E T D G L Y S GS ........ ConsensusP...NPADF .......................... D ..............
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
Whithomsa Whitanoal W h i t d r ome Scrtdrome Adplsacce Br owdr ome Consensus
751 ........................................ ........................................ ........................................ ........................................ HQTTFTSSDG TTQREWAHLA AHRDEIRSLL RDEEDVEGTD ..................................... WLL .........................................
800 KRDLGGDAEV ARDI.. I E T A ARDM..EQLL AKQR..DMLV GRRGATEIDL ARSYSGDYLK R ........
801 85O N P F L W H R P S E E V K Q T K R L K G L R K D S S S M E G CHSF . . . . . . . . . . SASC.L S Q V N G D G G I E L T R T K H T T D P Y F L Q P M E G V D STGY . . . . . . . . . . R A . S W W A T K N L E K P L E . . . . . . . . . . . . . . . . QPEN GYTY KA TWF N ....... LE I H M A Q S G N F P F ...... DTE VESF . . . . . . . . . . R G V A W Y NTKLLHDKYK DSVYYAELSQ EIEEVLSEGD EESNVLNGDL PTGQQSAGFL HVQNFKK ....................................... IRWI ..................................................
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
851 TQFCILFKRT FLSIMRDSVL THLRITSHIG IGLLIGLLYL TQFYCILWRS WLSVLKDPML VKVRLLQTAM VASLIGSIYF MQFRAVLWRS WLSVLKEPLL VKVRLIQTTM VAILIGLIFL KRFHVVWLRA IVTLLRDPTI QWLRFIQKIA MAFIIGACFA QQLSILNSRS FKNMYRNPKL LLGNYLLTIL LSLFLGTLYY YQVYLLMVRF MTEDLRNIRS GLIAFGFFMI TAVTLSLMYS .Q ...... R . . . . . . . . . . . . . . . . . . . . . . . . . . G ....
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
901 950 LSNSGFLFFS MLFLMFAALM PTVLTFPLEM GVFLREHLNY WYSLKAYYLA MNINGSLFLF LTNMTFQNVF AVINVFSAEL PVFLREKRSR LYRVDTYFLG MNINGAIFLF LTNMTFQNVF ATINVFTSEL PVFMREARSR LYRCDTYFLG QAVQGALFIM ISENTYHPMY SVLNLFPQGF PLFMRETRSG LYSTGQYYAA QNRMGLFFFI LTYFGFVT.F TGLSSFALER IIFIKERSNN YYSPLAYYIS QDVGGSIFML SNEMIFTFSY GVTYIFPAAL PIIRREVGEG TYSLSAYYVA .... G..F . . . . . . . F . . . . . . . . . F . . . . . . F.RE ..... Y .... Y...
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
951 KTMAD VPFQ I M F P V A Y C S I KTIAE.LPLF IAVPFVFTSI KTIAE.LPLF LTVPLVFTAI NILAL.LPGM IIEPLIFVII KIMSEVVPLRWPPILLSLI LVLS.FVPVA FFKGYVFLSV ....... P ..... P ..... I
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
i001 LGLL.IGAAS TSLQVATFVG PVTAIPVLLF FGYL.ISCAS SSISMALSVG PPVVIPFLIF FGYL.ISCAS SSTSMALSVG PPVIIPFLLF CGCF F S T A F N S V P L A M A Y L V P L D Y I F M I T LEILTIGIIF EDLNNSIILS VLVLLGSLLF YGVF.LSSLF ESDKMASECA APFDLIFLIF .G . . . . . . . . . S...A . . . . . . . . . . . L.F
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
1051 ISYVRYGFEG VILSIY.GLD REDLHCDIDE TCHFQKSEAI LSWFRYANEA LLINQWADHR DGEIGCTRAN VTCPASGEII LSWFRYANEG LLINQWADVE PGEISCTSSN TTCPSSGKVI LSWMLYANEA MTAAQWSGVQ NITCFQESAD LPCFHTGQDV FSVFYYAYES LLINEVKTLM LKERKYGLNI EV...PGATI LSLFFYSNEA LMYKFWIDID NIDCPVN.ED HPCIKTGVEV .S...Y..E . . . . . . . . . . . . . . . . . . . . . . . . . . . G...
Whithomsa Whitanoal Whitdrome Scrtdrome Adplsacce Browdrome Consensus
900 GIGNETKK.V G.QVLDQDGV G.QQLTQVGV GTTEPSQLGV NVSNDI.SGF GIGGLTQRTV G ........ V
i000 VYWMTSQPSD AVRFVLFAAL GTMTSLVAQS TYPMIGLKAAISHYLTTLFI VTLVANVSTS AYPMIGLRAG VLHFFNCLAL VTLVANVSTS CYWLTGLRST FYAFGVTAMCVVLVMNVATA VYPMTGLNMK DNAFFKCIGI LILF.NLGIS IYASIYYTRG FLLYLSMGFL MSLSAVAAVG .Y . . . . . . . . . . . . . . . . . . . . L ....... 1050 SGFFVSFDTI P.TYLQWMSY GGFFLNSASV P.AYFKYLSY GGFFLNSGSV P.VYLKWLSY SGIFIQVNSLP VAFWWTQF SGLFINTKNI TNVAFKYLKN G G T Y M N V D T V PG ..... LKY .G.F ...... P .........
ii00 LRELDVENAK LETFNFRVED LETLNFSAAD LDKYTFNESN LSTFGFVVQN LQQGSYRNAD L .........
           1101                             1135
Whithomsa  ..LYLDFIVL GIFFISLRLI AYLVLRYKIR AER..
Whitanoal  ..FALDIGCL FALIVLFRLG ALFCLWLRSR SKE..
Whitdrome  ..LPLDYVGL AILIVSFRVL AYLALRLRAR RKE..
Scrtdrome  ..VYRNLLAM VGLYFGFHLL GYYCLWRRAR KL...
Adp1sacce  ..LVFDIKIL ALFNVVFLIM GYLALKWIVV EQK..
Browdrome  YTYWLDCFSL VVVAVIFHIV SFGLVRRYIH RSGYY
Consensus  .....D...L ......F... ....L..... .....
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Whitmusmu (Whithomsa). Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
Database accession numbers
            SWISSPROT   PIR              EMBL/GENBANK
Adp1sacce   P25371      S19421; S40914   X59720; G5381
Browdrome   P12428      A31399; FYFFB    M20630; G157014
Scrtdrome   P45843                       U39739; G1079665
Whitanoal                                L76302
Whitdrome   P10090      S07263; FYFFW    X51749; G8826
Whithomsa   P45844                       X91249; E218444
Whitmusmu                                U34920
References
1 Pepling, M. and Mount, S.M. (1990) Nucleic Acids Res. 18, 1633.
2 Dreesen, T.D. et al. (1988) Mol. Cell Biol. 8, 5206-5215.
3 Tearle, R.G. et al. (1989) Genetics 122, 595-606.
4 Chen, H. et al. (1996) Am. J. Hum. Genet. 59, 66-75.
5 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
ABC 1 & 2 transporter family
Summary
Transporters of the ABC 1 & 2 family, the example of which is the novel mouse ATP binding protein ABC1 1 (Abc1musmu), are believed to act as transporters, although their natural substrate is unknown. The two known members of this family are found only in mammals.
Statistical analysis of multiple amino acid sequence comparisons places the ABC 1 & 2 transporter family in the multidrug resistance subdivision of the ATP binding cassette (ABC) superfamily 2. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes.
Transporters of the ABC 1 & 2 transporter family consist of a single polypeptide chain made up of four domains. The N- and C-terminal halves of the protein are homologous, and each half is made up of a transmembrane domain followed by an ATP binding domain. Each transmembrane domain is predicted to contain six membrane-spanning helices by the hydropathy of the amino acid sequences and may be glycosylated.
Nomenclature, biological sources and substrates
CODE        DESCRIPTION [SYNONYMS]               ORGANISM [COMMON NAMES]   SUBSTRATE(S)
Abc1musmu   ATP binding cassette transporter 1   Mus musculus [mouse]      Unknown
Abc2musmu   ATP binding cassette transporter 2   Mus musculus [mouse]      Unknown
Proposed orientation of ABC1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded twelve times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed.
[Figure: proposed membrane topology of ABC1. Labels: OUTSIDE, INSIDE, NH2, COOH, and two ATP BINDING SITE markers.]
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT   EXPRESSION SITES      CHROMOSOMAL LOCUS
Abc1musmu   2201          246 686   uterus, many others   4A5-B3
Abc2musmu   1472          163 140   uterus, many others   2A2-B
Multiple amino acid sequence alignments 1 Abclmusmu
-GCEYFALFE
50 EQGIGVQWDN
LFESPVEEDG
FNLTTAVSMM
QYGIPRPWYF
PCTKSYWFGE EIDEKSHPGS -E E T
LFDTFLYGVM
Abc2musmu Consensus
QAC AMESRH F .......... . .C . . . . . . . . . . . . . . . . . . . . . . . E... F ...................
Abclmusmu Abc2musmu
TWYIEAVFPG
Consensus
51
.............................
i01 Abclmusmu EEEPTHLRLG Abc2musmu EEEPTHLPLV ConsensusEEEPTHL.LLV
i00
SQKGVSEICM RG ..... M
E E ............
G ......
150 VSIQNLVKVY RDGMKVAVDG LALNFYEGQI TSFLGHNGAG VCVDKLTKVY KNDKKLALNK LSLNLYENQV VSFLGHNGAG .... L . K V Y .... K . A . . . L . L N . Y E . Q . . S F L G H N G A G
151 200 Abclmusmu KTTTMSILTG LFPPTSGTAY ILGKDIRSEM SSIRQNLGVC PQHNVLFDML Abc2musmu KTTTMSILTG LFPPTSGSAT IYGHDIRTEM DEIRKNLGMC PQHNVLFDRL Consensus KTTTMSILTG LFPPTSG.A.I.G.DIR.EM .. I R . N L G . C P Q H N V L F D . L 201 250 Abclmusmu TVEEHIWFYA RLKGLSEKHV KAEMEQMALD VGLPPSKLKS KTSQLSGGMQ Abc2musmu TVEEHLWFYS RLKSMAQEEI RKETDKMIED LELS-NKRHS LVQTLSGGMK ConsensusTVEEH.WFY. RLK . . . . . . . . . E . . . M . . D . . L . . . K . . S .... LSGGM. 251 300 Abc i m u s m u R K L S V A L A F V G G S K V V I L D E P T A G V D P Y S R R G I W E L L L K Y R Q G R T I I L S T Abc2musmu RKLSVAIAFV GGSRAIILDE PTAGVDPYAR RAIWDLILKY KPGRTILLST C o n s e n s u s R K L S V A . A F V GGS... ILDE P T A G V D P Y . R R. I W . L . L K Y . . G R T I L L S T 301 350 Abclmusmu HHMDEADILG DRIAIISHGK L-CCVGSSLF LKNQLGTGYY LTLVKKDVES Abc2musmu HHMDEADLLG DRIAIISHGK LKCC-GSPLF LKGAYXDGYR LTLVKQPAEP C o n s e n s u s H H M D E A D L L G D R I A I I S H G K L . C C . G S . L F LK ..... GY. L T L V K . . . E . 351 400 Abclmusmu SLSSCRN-SS STVSCLKKED SVSQSSSDAG LGSDHESDTL TIDVSAISNL Abc2musmu GTSQEPGLAS SPSGCPR LSSCSEPQ ........ VSQF C o n s e n s u s ..S ...... S S...C . . . . . . . . . . . . . . . L . S . . E . . . . . . . . . . . S.. 401 450 Abclmusmu IRKHVSEARL VEDIGHELTY VLPYEAAKEG AFVELFHEID DRLSDLGISS Abc2musmu IRKHVASSLL VSDTSTELSY ILPSEAVKKG AFERLFQQLE HSLDALHLSS C o n s e n s u s I R K H V .... L V . D . . . E L . Y .LP.EA.K.GAF..LF ...... L..L..SS 451 500 Abclmusmu YGISETTLEE IFLKVAEES- GVDAETSD ...... GTLPAR RNRRAFGDKQ A b c 2 m u s m u F G L M D T T L E E V F L K V S E E D Q S L E N S E A D V K ESR.KDVLPGA E G L T A V G G Q A C o n s e n s u s . G . . . T T L E E .F L K V . E E . . . . . . . . . D . . . . . . . . LP . . . . . . A.G... 501 550 Abc I m u s m u S .... C LHP --FTED-DAV DPN-DSDIDP Abc2musmu GNLARCSELA QSQASLQSAS SVGSARGEEG TGYSDGYGDY RPLFDNLQDP C o n s e n s u s ..... C . . . . . . . . . L . . . . . . . . . . . . . . . . . . . . . . . . . 
P..D...DP 551 600 Abclmusmu ES---RETDL LS-GMDGKGS YQLKGWKLTQ QQFVALLWKR LLIARRSRKG Abc2musmu DNVSLQEAEM EALAQVGQGS RKLEGWWLKM RQFHGLLVKR FHCARRNSKA C o n s e n s u s ...... E . . . . . . . . . G.GS .... G W . L . . . Q F . . L L . K R ...ARR..K. 601 650 A b c l m u s m u F F A Q I V L P A V F V C I A L V F S L I V P P F G K Y P S L E L Q P W M Y N E Q Y T . . . . . FV Abc2musmu LCSQILLPAF FVCVAMTVAL SVPEIGDLPP LVLSPSQYHN -YTQPRGNFI C o n s e n s u s ...QI.LPA. F V C . A .... L . V P . . G . . P . L . L . P . . Y . . . Y ...... F. 651 700 Abclmusmu SNDAPEDMGT QELLNALTKD PGFGTRCMEG NPIPDTPCLA GEEDWTISPV Abc2musmu PYANEERQEY RLRLSPDASP QQLVSTFRLP SGVGATCVLK SPANGSLGPM C o n s e n s u s ..... E . . . . . . . L . . . . . . . . . . . . . . . . . . . . . T . . L . . . . . . . . . P.
           701                                                750
Abc1musmu  PQSIVDLFQN GNWTMKNPSP ACQCSSDKIK KMLPVC---P PGAGGLPPPQ
Abc2musmu  ....LNLSSG ESRLLAARFF DSMCL-ESFT QGLPLSNFVP PPPSPAPSDS
Consensus  ......L... .......... ...C...... ..LP.....P P.....P...
751 800 Abc imusmu RKQKTADILQ --NLTGRNIS DYLVKTYVQI IAKSLKNKIW ...... VNEF Abc2musmu PVXPDEDSLQ AWNMSLPPTA GPETWTSAPS LPRLVHEPVR CTCSAQGTGF C o n s e n s u s ...... D . L Q . .N . . . . . . . . . . . . T . . . . . . . . . . . . . . . . . . . . . . . F
751 800 A b c i m u s m u RY . . . . G G F S - - L G V S N S Q A L P - - P S H E V N D A I K Q M K K L L K L T K . . . . . . Abc2musmu SCPSSVGGHP PQMRVVTGDI LTDITGHNVS EYLLFTSDRF RLHRYGAITF C o n s e n s u s ...... GG . . . . . . V ..... L ..... H . V . . . . . . . . . . . . L . . . . . . . . 851 900 Abc lmusmu .... DTSADR FLSSLGRFMA GLDTKNNVKV WFNNKGWHAI SSFLNVINNA Abc2musmu GNVQKSIPAS FGARVPPMVR KIAVRRVAQV LYNNKGYHSM PTYLNSLNNA Consensus .......... F .................. V..NNKG.H ..... LN..NNA 901 950 Abc imusmu ILRANLQKGE -NPSQYGITA FNHPLNLTKQ QLSEVALMTT SVDVLVSICV Abc2musmu ILRANLPKSK GNPAAYXITV TNHPMNKTSA SLS-LDYLLQ GTDVVIAIFI C o n s e n s u s I L R A N L . K . . . N P .... I T . . N H P . N . T . . . L S ......... DV...I.. 951 Abclmusmu IFAMSFVPAS FWFLIQERV SKAKHLQFIS GVKPVIYWLSN Abc2musmu IVAMSFVPAS FWFLVAEKS TKAKHLQFVS GCNPVIYWLAN Consensus I.AMSFVPAS FWFL..E...KAKHLQF.SG..PVIYWL.N
i000 FVWDMCNYV YVWDMLNYL .VWDM.NY.
i001 1050 Abclmusmu VPATLVIIIF ICFQQKSYVS STNLPVLALL LLLYGWSITPL MYPASFVFK Abc2musmu VPATCCVIIL FVFDLPAYTS PTNFPAVLSL FLLYGWSITPI MYPASFWFE ConsensusVPAT...II...F .... Y.S .TN.P .... L . L L Y G W S I T P . M Y P A S F . F . 1051 ii00 Abclmusmu IPSTAYVVLT SVNLFIGING SVATFVLELF TNNK-LNDIND ILKSVFLIF Abc2musmu VPSSAYVFLI VINLFIGITA TVATFLLQLF EHDKDLKVVNS YLKSCFLIF Consensus .PS.AYV.L...NLFIGI...VATF.L.LF ...K.L...N..LKS.FLIF ii01 1150 Abc imusmu PHFCLGRGLI DMVKNQAMAD ALERFGE-NR FVSPLSWDLVG RNLFAMAVE Abc2musmu PNYNLGHGLM EMAYNEYINE YYAKIGQFDK MKSPFEWDIVT RGLVAMTVE ConsensusP...LG.GL..M..N .......... G ...... SP..WD.V.R...AM.VE 1151 1200 Abclmusmu GVVFFLITVL IQYRFFIRPR PVKAKLPPLN DEDEDVRRERQ RILDGGGQN Abc2musmu GFVGFFLTIM CQYNFLRQPQ RLPVSTKPV- EDDVDVASERQ RVLRGDADN ConsensusG.V.F..T...QY.F...P ........ P .... D.DV..ERQR.L.G...N 1201 1250 Abclmusmu DILEIKELTK IYR-RK--RK PAVDRICIGI -PPGECFGLLG VNGAGKSTT Abc2musmu DMVKIENLTK VYKSRKIGRI LAVDRLCLGV CVPGECFGLLG VNGAGKTST C o n s e n s u s D ...... L T K . Y . . R K . . R . . A V D R L C L G . . . P G E C F G L L G V N G A G K . . T
1251 1300 Abc l m u s m u F K M L T G D T P V T R G D A F L N K N S I L S N I H E V H Q N M G Y C P Q F D A I T E L L T G R E Abc2musmu FKMLTGDEST TGGEAFVNGH SVLKDLLQVQ QSLGYCPQFDV PVDELTARE C o n s e n s u s F K M L T G D . . . T . G . A F . N . . S.L ..... V. Q . . G Y C P Q F D . . . . . L T . R E 1301 1350 Abc l m u s m u H V E F F A L L R G V P E K E V G K F G E W A I R K L G L V K Y G E K Y A S N Y S G G N K R K L S T Abc2musmu HLQLYTRLRC IPWKDEAQVV KWALEKLELT KYADKPAGTYS GGNKRKLST C o n s e n s u s H ...... L R . . P . K . . . . . . . W A . . K L . L . K Y . . K . A . . Y S G G N K R K L S T 1351 1400 Abc i m u s m u A M A L I G G P P V V F L D E P T T G M D P K A R R F L W N C A L S I V K E G R S V V L T S H S M E Abc2musmu AIALIGYPAF IFLDEPTTGM DPKARRFLWN LILDLIKTGRS VVLTSHSME ConsensusA.ALIG.P...FLDEPTTGMDPKARRFLWN ..L...K.GRSVVLTSHSME 1401 1450 Abclmusmu ECEALCTRMA IMVNGRFRCL GSVQHLKNRF GDGYTIVVRIA GSNPDLKPV Abc2musmu ECEALCTRLA IMVNGRLHCL GSIQHLKNRF GDGYMITVRTK SSQ-NVKDV ConsensusECEALCTR.AIMVNGR..CLGS.QHLKNRFGDGY.I.VR...S .... K.V 1451 1500 Abclmusmu QEFFGLAFPG SVLKEKHRNM LQYQLPSSLS SLARIFSILSQ SKKRLHIED Abc2musmu VRFFNRNFPE AHAQGKTPYK VQYQLKSEHI SLAQVFSKMEQ VVGVLGIED Consensus..FF...FP ...... K ..... QYQL.S... SLA..FSS ....... L.IED 1501 1550 Abclmusmu YSVSQTTLDQ VFVNFAKDQS DDDHLKD -LS L H K N Q .... Abc2musmu YSVSQTTLDN VFVNFAKKQS DNVEQQEAEP SSLPSPLGLLS LLRPRPAPT C o n s e n s u s Y S V S Q T T L D . V F V N F A K . Q S D . . . . . . . . . . . . . . . . . . LS L ........ 1551 Abc i m u s m u - - - T V V . . . . . D V A V .... L T S F L Q D E K V K E S Y V . . . . . Abc2musmu ELRALVADEP EDLDTEDEGL ISF-EEERAQ LSFNTDTLC C o n s e n s u s ..... V . . . . . D ....... L .S F . . . E . . . . S .......
1590
Residues listed in the consensus sequence are present in both transporter sequences.
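With only two sequences, the rule above reduces to keeping a residue wherever the two rows agree. The sketch below illustrates this on the first 30 residues of the 151-200 block of the alignment; the function name `pair_consensus` and the treatment of '.' and '-' as gaps are assumptions made for illustration.

```python
GAPS = ".-"  # assumed gap characters

def pair_consensus(a, b):
    """Keep a residue where both aligned rows carry the same residue, else '.'."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        x if x == y and x not in GAPS else "."
        for x, y in zip(a, b)
    )

# Residues 151-180 of the two mouse sequences, copied from the alignment.
abc1 = "KTTTMSILTGLFPPTSGTAYILGKDIRSEM"
abc2 = "KTTTMSILTGLFPPTSGSATIYGHDIRTEM"
print(pair_consensus(abc1, abc2))
```

The result matches the printed consensus row for these positions once the column spacing is removed.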
Database accession numbers
            SWISSPROT   PIR   EMBL/GENBANK
Abc1musmu   P41233            X75926; G495257
Abc2musmu   P41234            X75927; G495259
References
1 Luciani, M.F. et al. (1994) Genomics 21, 150-159.
2 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
Yeast multidrug resistance family
Summary
Transporters of the yeast multidrug resistance family, examples of which are the multidrug resistance protein CDR1 from Candida albicans 1 (Cdr1canal) and the brefeldin A resistance protein BFR1 from Schizosaccharomyces pombe 2 (Bfr1schpo), mediate resistance to one or, often, many structurally dissimilar antifungal agents by acting as ATP-dependent efflux pumps. Members of the family are found only in yeasts. They may be encoded chromosomally or by plasmids.
Statistical analysis of multiple amino acid sequence comparisons places the yeast multidrug resistance family in the multidrug resistance subdivision of the ATP binding cassette (ABC) superfamily. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes.
Transporters of the yeast multidrug resistance family consist of a single polypeptide chain made up of four domains. The N- and C-terminal halves of the protein are homologous, and each half is made up of an ATP binding domain followed by a transmembrane domain. Each transmembrane domain is predicted to contain six membrane-spanning helices by the hydropathy of the amino acid sequences, so the functional transporters contain 12 such helices. Proteins may be glycosylated.
Many residues, among them several long sequence motifs, are well conserved within the yeast multidrug resistance family. These include motifs unique to the family, signature motifs of the ABC superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis.
Nomenclature, biological sources and substrates
CODE        DESCRIPTION [SYNONYMS]                                                     ORGANISM [COMMON NAMES]             [RESISTANCE a]
Bfr1schpo   Brefeldin A resistance protein [BFR1, HBA2]                                Schizosaccharomyces pombe [yeast]   [Brefeldin A-like antibiotics]
Cdr1canal   Multidrug resistance protein [CDR1]                                        Candida albicans [yeast]            [Cycloheximide, chloramphenicol, others]
Pdr5sacce   Suppressor of toxicity of sporidesmin [PDR5, STS1, YDR1, LEM1, YOR153W]    Saccharomyces cerevisiae [yeast]    [Cycloheximide, sulfomethuron methyl]
Pdrbsacce   ATP-dependent permease [PDR11, YIL013C]                                    Saccharomyces cerevisiae [yeast]    [4-Nitroquinoline-N-oxide]
Snq2sacce   SNQ2 protein [SNQ2, YDR011W, YD8119.16]                                    Saccharomyces cerevisiae [yeast]    [4-Nitroquinoline-N-oxide, triaziquone]
Ydr1sacce   YDR1 protein                                                               Saccharomyces cerevisiae [yeast]    [Multiple antibiotics]
a Presumed substrates; protein confers resistance to specified compounds.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Ydr1sacce (Bfr1schpo).
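The 90% identity criterion above can be sketched as a pairwise comparison over aligned sequences. The text does not define how identity was computed, so the definition used here (identical columns divided by columns in which at least one sequence has a residue) is an assumption, as are the names `percent_identity` and `near_duplicates` and the toy sequences.

```python
from itertools import combinations

GAPS = ".-"  # assumed gap characters

def percent_identity(a, b):
    """Percent of compared columns with identical residues.

    Columns where both rows are gaps are skipped; a residue facing a gap
    counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x in GAPS and y in GAPS:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

def near_duplicates(aligned, cutoff=90.0):
    """Return name pairs whose identity is at or above the cutoff."""
    pairs = []
    for (n1, s1), (n2, s2) in combinations(sorted(aligned.items()), 2):
        if percent_identity(s1, s2) >= cutoff:
            pairs.append((n1, n2))
    return pairs

# Hypothetical toy alignment, not taken from the book.
toy = {"seqA": "ACDEFGHIKL", "seqB": "ACDEFGHIKV", "seqC": "WWWWWWWWWW"}
print(near_duplicates(toy))
```

Under a rule like this, one member of each reported pair would be dropped before tree building, which is how the text describes the omitted proteins.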
[Figure: phylogenetic tree of the yeast multidrug resistance family. Leaves, top to bottom: Pdr5sacce, Cdr1canal, Bfr1schpo, Snq2sacce, Pdrbsacce.]
Proposed orientation of CDR1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: proposed membrane topology of CDR1. Labels: OUTSIDE, INSIDE, NH2, COOH; conserved residues are marked along the chain.]
Physical and genetic characteristics

           AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Bfr1schpo  1530         171 750  Plasmid pDB248'
Cdr1canal  1501         169 937
Pdr5sacce  1511         170 437  Chromosome 15
Pdrbsacce  1410         160 405  Chromosome 9
Snq2sacce  1501         168 766  Chromosome 4
Ydr1sacce  1444         163 294
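
The MOL. WT and AMINO ACIDS columns are related through the average mass of the residues in each chain. The following is a minimal sketch of that relationship, not the procedure used to compile the table: the function names are hypothetical, and the residue masses are standard average values that are assumed here rather than taken from this entry.

```python
# Average residue masses in daltons for the 20 standard amino acids
# (standard textbook values; an assumption, not data from this entry).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water accounts for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Estimate the molecular weight of an unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def mean_residue_mass(mol_wt: float, n_residues: int) -> float:
    """Average mass per residue implied by a tabulated MOL. WT value."""
    return mol_wt / n_residues


# Pdr5sacce is listed with 1511 amino acids and a MOL. WT of 170 437.
print(round(mean_residue_mass(170437, 1511), 1))
```

A mean residue mass a little above 110 Da is typical of proteins, so the check is a quick way to spot a digit lost from either column.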
Multiple amino acid sequence alignments
1 50
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
MPEAKLNN NVNDVTSYSS ASSSTENAAD LHNYNGFDEH ............ MSDSKMSS QDESKLEKAI SQDSSSENHS INEYHGFDAH MNQNSDTTHG QALGSTLNHT TEVTRISNSS DHFEDSSSNV DESLDSSNPS MSNIKSTQDS ....... SHN A V A R S S S A S F A A S E E S F T G I THDKDEQSDT .................................................. ..................................................
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
51 i00 TEARIQKLAR TLTAQSMQNS TQSAPNKSDA Q S I F S S G V E G V N P I F S D P E A TSENIQNLAR TFTHDSFKDD SSAGLLKYLT H . . . M S E V P G V N P Y . . E H E E SNEKASHTNE EYRSKGNQSY VPSSSNEPSP ESSSNSDSSS SDDSSVDRLA PADKLTKMLT G.PARDTASQ ISATVSEMAP DVVSKVE.SF ADALSRHTTR .................................................. ..................................................
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
I01 PGYDPKLDPN S E N F S S A A W V K N M A H L S A A D PDFYKPYSLG INND.QLNPD S E N F N A K F W V K N L R K L F E S D PEYYKPSKLG GDPFEL .... GENFNLKHYL RAYKDSLQRD DIITR..SSG SGAFNMDSDS DDGFDAHAIF ESFVRDADEQ GIHIR KAG . . . . . . . . . . . . . . . . . . . . . SLSKYFNPI PDASVTFDGA ............. F . . . . . . . . . . . . . . . . . . . . . . . . . G
............
150 CAWKNLSASG IGYRNLRAYG VCMRDHSVYG VTIEDVSAKG TVQLEESLGA ...... S..G
           151                                                200
Pdr5sacce  ASADVAYQST VVNIPYKILK SGLRKFQRSK ETNTFQILKP MDGCLNPGEL
Cdr1canal  VANDSDYQPT VTNALWKLAT EGFRHFQKDD DSRYFDILKS MDAIMRPGEL
Bfr1schpo  VGSGYEFLKT FPDIF...LQ P.YRAITEKQ VVE.KAILSH CHALANAGEL
Snq2sacce  VDASALEGAT FGNILCLPLT I.FKGIKAKR HQKMRQIISN VNALAEAGEM
Pdrbsacce  VQNDEESASE FKNVGHLE.. .......... .......ISD ITFRANEGEV
Consensus  V........T ..N....... .......... ......I... .......GE.

           201                                                250
Pdr5sacce  LVVLGRPGSG CTTLLKSISS NTHGFDLGAD TKISYSGYSG DDIKKHFRGE
Cdr1canal  TVVLGRPGAG CSTLLKTIAV NTYGFHIGKE SQITYDGLSP HDIERHYRGD
Bfr1schpo  VMVLGQPGSG CSTFLRSVTS DTVHYK.RVE GTTHYDGIDK ADMKKFFPGD
Snq2sacce  ILVLGRPGAG CSSFLKVTAG EIDQFAGGVS GEVAYDGIPQ EEMMKRYKAD
Pdrbsacce  VLVLGNPTSA ...LFKGLFH GHKHLKYSPE GSIRFKDNEY KQFASKCPHQ
Consensus  ..VLG.PG.G C...LK.... .......... ....Y.G... ..........

           251                                                300
Pdr5sacce  VVYNAEADVH LPHLTVFETL VTVARLKTPQ NRIKGVDRES YANHLAEVAM
Cdr1canal  VIYSAETDVH FPHLSVGDTL EFAARLRTPQ NRGEGIDRET YAKHMASVYM
Bfr1schpo  LLYSGENDVH FPSLTTAETL DFAAKCRTPN NRPCNLTRQE YVSRERHLIA
Snq2sacce  VIYNGELDVH FPYLTVKQTL DFAIACKTPA LRVNNVSKKE YIASRRDLYA
Pdrbsacce  IIYNNEQDIH FPYLTVEQTI DFALSCKFHI PKQERIE... ....MRDELL
Consensus  ..Y..E.DVH FP.LTV..TL .FA....TP. .R........ Y.........

           301                                                350
Pdr5sacce  ATYGLSHTRN TKVGNDIVRG VSGGERKRVS IAEVSICGSK FQCWDNATRG
Cdr1canal  ATYGLSHTRN TNVGNDFVRG VSGGERKRVS IAEASLSGAN IQCWDNATRG
Bfr1schpo  TAFGLTHTFN TKVGNDFVRG VSGGERKRVT ISEGFATRPT IACWDNSTRG
Snq2sacce  TIFGLRHTYN TKVGNDFVRG VSGGERKRVS IAEALAAKGS IYCWDNATRG
Pdrbsacce  KEFGLSHVKK TYVGNDYVRG VSGGERKRIS IIETFIANGS VYLWDNSTKG
Consensus  ...GL.HT.N T.VGND.VRG VSGGERKRVS I.E....... ..CWDN.TRG

           351                                                400
Pdr5sacce  LDSATALEFI RALKTQADIS NTSATVAIYQ CSQDAYDLFN KVCVLDDGYQ
Cdr1canal  LDSATALEFI RALKTSAVIL DTTPLIAIYQ CSQDAYDLFD KVVVLYEGYQ
Bfr1schpo  LDSSTAFEFV NVLRTCANEL KMTSFVTAYQ ASEKIYKLFD RICVLYAGRQ
Snq2sacce  LDASTALEYA KAIRIMTNLL KSTAFVTIYQ ASENIYETFD KVTVLYSGKQ
Pdrbsacce  LDSATALEFL SITQKMAKAT RSVNFVKISQ ASDKIVSKFD KILMLGDSFQ
Consensus  LDS.TALEF. ......A... .....V.IYQ .S...Y..FD K..VL..G.Q
Pdr5sacce .
.
.
.
...........
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
401 IYYGPADKAK IFFGKATKAK IYYGPADKAK IYFGLIHEAK VFYGTMEECL I..G .... AK
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
451 KDML...KKG IHIPQTPKEM NDYWVKSPNY PGYE DK...VPRTAQEF ETYWKNSPEY RKGF...EN..RVPRTPDEF EQMWRNSSVY KPGY___ . E N . . K V P R T A E E F E T Y W L N S P E F TPSVVSEENQ ALNINNETDL HTLWIQSPYY . . . . . . . . . . . . . P.T..E .... W . . S P . Y
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
501 ....... DQR L L N _ D D E A S R ___ .... DEY F V E C E R S N T R SSEAPEKDNF GSDISATTKH ............. VNTEKTK ............. ITSKTVQ ....................
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
551 YLLIRNMWRL RNNIGFTLFM ILGNCSMALI YGVARNFLRM KGDPSIPIFS VFGQLVMGLI YCLARSWERY INDPAYIGSM AFAFLFQSLI LCTQRGFQRI YGNKSYTVIN VCSAIIQSFI TCTVRAFERI IGDRNYLISQ FVSVVVQSLV .... R...R . . . . . . . . . . . . . . . . . . . LI
601 Pdr5sacce GSAMFFAILF Cdr1canal GAAMFFAVLF Bfr1schpo GGVLFFSILF Snq2sacce GGVLYFALLY Pdrbsacce GSLTFFSILF Consensus G...FF..LF
KYFEDoMGYV EYFEK MGWK QYFLD.MGFD PYFAK.MGYL THFHDTLQIK .YF...MG..
450 C P S R Q T T A D F LTSV . . . . . . . T S P S E R T L N C P Q R Q T T A D F LTSL TNPAEREPL C H P R E T T P D F LTAI . . . . . . . S D P K A . R F P C P P R Q A T A E F LTAL . . . . . . . T D P N G F H L I KNPNDCIIEY LTSILNFKFK ETSNSIVGLD C . . R . . T . . F L T . . . . . . . . . T.P ...... 500 KELMKEV ............. AELTKEI ADLMAEMESY DKRWTETTPA A Q M K K D I A A Y KEK ....... KHWKA . . . . . . . . ....................
EAIKEAHIAK QSKRARPSSP ETYRESHVAK QSNNTRPASP ELYRQSAVAE KSKRVKDTSP EVYDESMAQE KSKYTRKKSY ECTR ..... K D V N P D D I S P I E . . . . . . . . . . S ...... S.
NAFSSLLEIF SLYEARPITE NAFSSLLEIM SLFEARPIVE CALQSLSEIA NMFSQRPIIA Y S L M G L A N I S ..FEHRPILQ FTFLSLADMP ASFQRQPVVR .... S L . . I . . . F . . R P I . .
550 YTVSYMMQVK YTVSFFMQVR YTVTFSQQLW YTVSYWEQVK FSIPLKTQLK YTV .... Q..
600 LGSMFFKIMK KGDTSTFYFR LSSVFYNL.. S Q T T G S F Y Y R IGSIFYDM.. K L N T V D V F S R TGSLFYNT.. P S S T S G A F S R IGSLFYNI.. P L T T I G S F S R .GS.FY . . . . . . . T ..... R 650 KHRTYSLYHP SADAFASVLS KHKKYALYRP SADALASIIS KHRASALYHPAADVISSLIV KHKGYSLYHP SAEAIGSTLA KHVQLHFYYNWVETLATNFF KH .... LY.P .A .... S...
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
651 EIPSKLIIAV CFNIIFYFLV DFRRNGGVFF ELPVKLAMSM SFNFVFYFMV NFRRNPGRFF DLPFRFINIS VFSIVLYFLT NLKRTAGGFW SFPFRMIGLT CFFIILFFLS GLHRTAGSFF DCCSKFILVVIFTIILYFLA HLQYNAARFF ..P...I .... F . I . . Y F L .... R . . G . F F
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
701 GSLTKTLSEA GAVSTSISGA AGIMPNVESA SSVCDTLSQA ALIAPTLSMA ....... S.A
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
751 FESLLINEFH GIKFPCAE.Y VPRGPAYANI FESLMVNEFH GREFQCAQ.Y VPSGPGYENI FESLMINEFK ARQFECSQ~ IPYGSGYDNY FESMLNAEFH GRHMDCANTL VPSGGDYDNL MEAILSNELF NLKLDCHESI IPRGEYYDNI FES..~ ...... C ..... P.G..Y.N.
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
801 GDDFIRGTYQ GTNYLAGAYQ GSTYLYISFN GDDYLKNQFQ GRDYLKSGLK G . . Y L .....
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
851 90O LVFPR ......... SIVKRM KKRGVLTEKN ANDPENVGER SDLSSDRKM. VLFLK ......... GSLKKH KRKTAASNKG DIEAGPVAGK LDYQDEAEA. L V F R R . . . . . . . . . G H A P D A V K A A V N E G G K P L D L E T G Q D . . . . TQGGDV. L I F K K . . . . . . . . . G S K R F I A H A D E E S P D N V N D I D A K E . . . . . . . . . QF. LRWNNYLKRY CPFLNSQKKN NKSAITNNDG VCTPKTPIAN FSTSSSSVPS L.F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
MVPASMLLLA MTPATVLLLA SALGGIGVLA NSISGILMMS NLLAGILLLA ...... L . L A
YYHKDKWRGF YYNSHKWRNL YKTRQLWRNL YVYKHTWRNF YTYHHVWRNF Y ..... WRN.
700 FYLLINIVAV FSMSHLFRCV FYWLMCIWCT FVMSHLFRSI TYFLFLFIGA TCMSAFFRSL TIYLFLTMCS EAINGLFEMV IFLLFLSVYN FCMVSLFALT ...L . . . . . . . . M . . L F . . .
750 LSMYTGFAIP KKKILRWSKW IWYINPLAYL MVIYTGFVIP TPSMLGWSRW INYINPVGYV IAIYTGYAIP NIDVGWWFRW IAYLDPLQFG ISMYSTYMIQ LPSMHPWFKW ISYVLPIRYA IAMYASYVIY MKDMHPWFIW IAYLNPAMFA ...Y .... I . . . . . . . W . . W I . Y . . P .... 800 SSTESVCTVVGAVPGQDYVL SRSNQVCTAV GSVPGNEMVS PVANKICPVT SAEPGTDYVD SDDYKVCAFV GSKPGQSYVL SFSHKACAWQ GATLGNDYVR S ..... C... G . . P G . . Y V .
850 GIGMAYVVFF FFVYLFLCEY NEGAKQKGEI GITIGFAVFF LAIYIALTEF NKGAMQKGEI AIIIGYYAFL VFVNIVASET LNFNDLKGEY GILWCFLLGYVVLKVIFTEY KRPVKGGGDA GIIIGFLCFF LFCSLLAAEY ITPLFTRENL GI ...... F . . . . . . . . . E . . . . . . . . G..
           901                                                950
Pdr5sacce  .......... LQESSEEESD TYGEIG.LSK SEAIFHWRNL CYEVQIKAET
Cdr1canal  .......... VNNEKFTEKG STGSVD.FPE NREIFFWRDL TYQVKIKKED
Bfr1schpo  .......... VKESPDNEEE LNKEYEGIEK GHDIFSWRNL NYDIQIKGEH
Snq2sacce  .......... SSESSGANDE VFDDLE.... AKGVFIWKDV CFTIPYEGGK
Pdrbsacce  VSHQYDTDYN IKHPDETVNN HTKESVAMET QKHVISWKNI NYTI....GD
Consensus  .......... .......... .......... ....F.W... .Y........

           951                                               1000
Pdr5sacce  RRILNNVDGW VKPGTLTALM GASGAGKTTL LDCLAERVTM GVIT.GDILV
Cdr1canal  RVILDHVDGW VKPGQITALM GASGAGKTTL LNCLSERVTT GIITDGERLV
Bfr1schpo  RRLLNGVQGF VVPGKLTALM GESGAGKTTL LNVLAQRVDT GVVT.GDMLV
Snq2sacce  RMLLDNVSGY CIPGTMTALM GESGAGKTTL LNTLAQR.NV GIIT.GDMLV
Pdrbsacce  KKLINDASGY ISSG.LTALM GESGAGKTTL LNVLSQRTES GVVT.GELLI
Consensus  R..L..V.G. ..PG..TALM G.SGAGKTTL LN.L..R... G..T.G..LV
           1001                                              1050
Pdr5sacce  NGIPRDK..S FPRSIGYCQQ QDLHLKTATV RESLRFSAYL RQPAEVSIEE
Cdr1canal  NGHALDS..S FQRSIGYVQQ QDVHLPTSTV REALQFSAYL RQSNKISKKE
Bfr1schpo  NGRGLDS..T FQRRTGYVQQ QDVHIGESTV REALRFSAAL RQPASVPLSE
Snq2sacce  NGRPIDA..S FERRTGYVQQ QDIHIAELTV RESLQFSARM RRPQHLPDSE
Pdrbsacce  DGQPLTNIDA FRRSIGFVQQ QDVHLELLTV RESLEISCVL RG......DG
Consensus  NG...D.... F.R..GYVQQ QD.H....TV RE.L.FSA.L R........E

           1051                                              1100
Pdr5sacce  KNRYVEEVIK ILEMEKYADA VVGVAGEGLN VEQRKRLTIG VELTAKPKLL
Cdr1canal  KDDYVDYVID LLEMTDYADA LVGVAGEGLN VEQRKRLTIG VELVAKPKLL
Bfr1schpo  KYEYVESVIK LLEMESYAEA IIGTPGSGLN VEQRKRATIG VELAAKPALL
Snq2sacce  KMDYVEKIIR VLGMEEYAEA LVGEVGCGLN VEQRKKLSIG VELVAKPDLL
Pdrbsacce  DRDYLGVVSN LLRLPS..EK LVA....DLS PTQRKLLSIG VELVTKPSLL
Consensus  K..YV..VI. .L.M..YA.A .VG..G.GLN VEQRK.L.IG VEL.AKP.LL
           1101                                              1150
Pdr5sacce  VFLDEPTSGL DSQTAWSICQ LMKKLANHGQ AILCTIHQPS AILMQEFDRL
Cdr1canal  LFLDEPTSGL DSQTAWSICK LMRKLADHGQ AILCTIHQPS ALIMAEFDRL
Bfr1schpo  LFLDEPTSGL DSQSAWSIVC FLRKLADAGQ AILCTIHQPS AVLFDQFDRL
Snq2sacce  LFLDEPTSGL DSQSSWAIIQ LLRKLSKAGQ SILCTIHQPS ATLFEEFDRL
Pdrbsacce  LFLDEPTSGL DAEAALTIVQ FLKKLSMQGQ AILCTIHQPS KSVISYFDNI
Consensus  LFLDEPTSGL DSQ.AW.I.. ...KL...GQ AILCTIHQPS A.....FDRL
           1151                                              1200
Pdr5sacce  LFMQRGGKTV YFGDLGEGCK TMIDYFESHG AHKC..PADA NPAEWMLEVV
Cdr1canal  LFLQKGGRTA YFGELGENCQ TMINYFEKYG ADPC..PKEA NPAEWMLQVV
Bfr1schpo  LLLQKGGKTV YFGDIGEHSK TLLNYFESHG AVHC..PDDG NPAEYILDVI
Snq2sacce  LLLRKGGQTV YFGDIGKNSA TILNYFERNG ARKC..DSSE NPAEYILEAI
Pdrbsacce  YLLKRGGECV YFGSLPNAC. ...DYFVAHD RRLTFDREMD NPADFVIDVV
Consensus  L.L..GG.TV YFG..G.... T...YFE..G A..C...... NPAE..L.V.
           1201                                              1250
Pdr5sacce  GAAPGSHANQ D......... .......... ....YYEVWR NSEEYRAVQS
Cdr1canal  GAAPGSHAKQ D......... .......... ....YFEVWR NSSEYQAVRE
Bfr1schpo  GAGATATTNR D......... .......... ....WHEVWN NSEERKAISA
Snq2sacce  GAGATASVKE D......... .......... ....WHEKWL NSVEFEQTKE
Pdrbsacce  GSGSTNIPMD DAEKPTSSKI DEPVSYHKQS DSINWAELWQ SSPEKVRVAD
Consensus  GA........ D......... .......... ......E.W. NS.E......
Pdr5sacce Cdr1canal Bfr1schpo Snq2sacce Pdrbsacce Consensus
1251 ELDWMERELP EINRMEAELS ELDKINASFS KVQDLINDLS DLLLLEEEAR ..........
K K G . . S I T A A E D K H E F S Q S I IYQTKLVSIR KLP RDNDP EALLKYAAPL WKQYLLVSWR NSEDKKTLSK EDRSTYAMPL WFQVKMVMTR KQETKSEVG. DKPSKYATSY AYQFRYVLIR KSGVDFTTSV WSPPSYME .... QIKLITKR K . . . . . . . . . . . . . . . . . . . . . Q...V..R
1300 LFQQYWRSPD TIVQDWRSPG NFQSYWREPS TSTSFWRSLN QYICTKRDMT .....WR...

           1301                                              1350
Pdr5sacce  YLWSKFILTI FNQLFIGFTF FKAGTSLQGL QNQMLAVFMF TVIFNPILQQ
Cdr1canal  YIYSKIFLVV SAALFNGFSF FKAKNNMQGL QNQMFSVFMF FIPFNTLVQQ
Bfr1schpo  ILMSKLALDI FAGLFIGFTF YNQGLGVQNI QNKLFAVFMA TVLAVPLING
Snq2sacce  YIMSKMMLML VGGLYIGFTF FNVGKSYVGL QNAMFAAFIS IILSAPAMNQ
Pdrbsacce  YVFAKYALNA GAGLFIGFSF WRTKHNINGL QDAIFLCFMM LCVSSPLINQ
Consensus  Y..SK..L.. ...LFIGF.F ........GL QN.....FM. .....P...Q
           1351                                              1400
Pdr5sacce  YLPSFVQQRD LYEARERPSR TFSWISFIFA QIFVEVPWNI LAGTIAYFIY
Cdr1canal  MLPYFVKQRD VYEVREAPSR TFSWFAFIAG QITSEIPYQV AVGTIAFFCW
Bfr1schpo  LQPKFIELRN VFEVREKPSN IYSWVAFVFS AIIVEIPFNL VFGTLFFLCW
Snq2sacce  IQGRAIASRE LFEVRESQSN MFHWSLVLIT QYLSELPYHL FFSTIFFVSS
Pdrbsacce  VQDKALQSKE VYIAREARSN TYHWTVLLIA QTIVELPLAI SSSTLFFLCC
Consensus  ........R. ..E.RE..S. ...W...... Q...E.P... ...T..F...
           1401                                              1450
Pdr5sacce  YYPIGFYSNA SAAGQLHERG ALFWLFSCAF YVYVGSMGLL VISFNQVAES
Cdr1canal  YYPLGLYNNA TPTDSVNPRG VLMWMLVTAF YVYTATMGQL CMSFSELADN
Bfr1schpo  FYPIKFYKHI HHPGD...KT GYAWLLYMFF QMYFSTFGQA VASACPNAQT
Snq2sacce  YFPLRIF... FEASR...SA VYFLNYCIMF QLYYVGLGLM ILYMSPNLPS
Pdrbsacce  YFCCGFETSA RVAG...... .VFYLNYILF SMYYLSFGLW LLYSAPDLQT
Consensus  Y.P....... .......... .........F ..Y....G.. ..........
           1451                                              1500
Pdr5sacce  AANLASLLFT MSLSFCGVMT TPSAMPRFWI FMYRVSPLTY FIQALLAVGV
Cdr1canal  AANLATLLFT MCLNFCGVLA GPDVLPGFWI FMYRCNPFTY LVQAMLSTGL
Bfr1schpo  ASVVNSLLFT FVITFNGVLQ PNSNLVGFWH WMHSLTPFTY LIEGLLSDLV
Snq2sacce  ANVILGLCLS FMLSFCGVTQ PVSLMPGFWT FMWKASPYTY FVQNLVGIML
Pdrbsacce  AAVFVAFLYS FTASFCGVMQ PYSLFPRFWT FMYRVSPYTY FIETFVSLLL
Consensus  A.....LL.. ....FCGV.. ..S..P.FW. FM....P.TY ..........
           1501                                              1550
Pdr5sacce  ANVDVKCADY ELLEFTPPSG MTCGQYMEPY LQLAK.TGYL TDENATDTCS
Cdr1canal  ANTFVKCAER EYVSVKPPNG ESCSTYLDPY IKFA..GGYF ETRND.GSCA
Bfr1schpo  HGLPVECKSH EMLTINPPSG QTCGEYMSAF LTNNTAAGNL LNPNATTSCS
Snq2sacce  HKKPVVCKKK ELNYFNPPNG STCGEYMKPF L..EKATGYI ENPDATSDCA
Pdrbsacce  HDREVNCSTS EMVPSQPVMG QTCGQFMKPF I..DEFGGKL HINNTYTVCA
Consensus  ....V.C... E.....PP.G .TCG.YM.P. .......G.. ...N....C.
           1551                                              1600
Pdr5sacce  FCQISTTNDY LANVNSFYSE RWRNYGIFIC YIAFNYIAGV FFYWLARVPK
Cdr1canal  FCQMSSTNTF LKSVNSLYSE RWRNFGIFIA FIAINIILTV IFYWLARVPK
Bfr1schpo  YCPYQTADQF LERFSMRYTH RWRNLGIFVG YVFFNIFAVL LLFYVFRVMK
Snq2sacce  YCIYEVGDNY LTHISSKYSY LWRNFGIFWI YIFFNIIAMV CVYYLFHVRQ
Pdrbsacce  YCMYTVGDDF LAQENMSYHH RWRNFGFEWV FVCFNIAAMF VGFYLTYIKK
Consensus  .C........ L......Y.. RWRN.GIF.. ...FNI.A.. ....L..V.K
           1601                          1630
Pdr5sacce  K...N.GKLS KK........ ..........
Cdr1canal  G...NREKKN KK........ ..........
Bfr1schpo  ...LRSTWLG KKITGTG... ..........
Snq2sacce  SSFLSPVSIL NKIKNIRKKK Q.........
Pdrbsacce  IWPSVIDGIK KCIPSMRRSK TSHNPNEQSV
Consensus  .......... KK........ ..........
Residues listed in the consensus sequence are present in at least 75% of the aligned sequences. Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Ydr1sacce (Bfr1schpo). Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
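
The 75% rule for the consensus line can be illustrated with a short sketch. The code below is an assumption-laden illustration rather than the procedure used to prepare the alignment: the function name `consensus`, the treatment of `.` and `-` as gap characters, and the choice to count the threshold against all aligned rows are hypothetical. The example rows are columns 311-330 of the alignment above.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap_chars=".-"):
    """Return a consensus string for equal-length aligned rows.

    A residue is reported when it occupies at least `threshold` of the
    rows in a column; all other columns are written as '.'.
    """
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned rows must have equal length")
    result = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c not in gap_chars)
        if counts:
            residue, n = counts.most_common(1)[0]
            result.append(residue if n / len(rows) >= threshold else ".")
        else:
            result.append(".")
    return "".join(result)


# Columns 311-330 for Pdr5sacce, Cdr1canal, Bfr1schpo, Snq2sacce, Pdrbsacce.
rows = [
    "TKVGNDIVRGVSGGERKRVS",
    "TNVGNDFVRGVSGGERKRVS",
    "TKVGNDFVRGVSGGERKRVT",
    "TKVGNDFVRGVSGGERKRVS",
    "TYVGNDYVRGVSGGERKRIS",
]
print(consensus(rows))
```

With five aligned sequences the threshold means that a residue must occur in at least four rows, which is why a column such as K/N/K/K/Y is left as a dot.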
Database accession numbers

Proteins       Bfr1schpo, Cdr1canal, Pdr5sacce, Pdrbsacce, Snq2sacce, Ydr1sacce
SWISSPROT      P41820, P43071, P33302, P40550, P32568
PIR            S34702; A49730, S30918
EMBL/GENBANK   X82891; X77589; X74113; Z47047; X66732; D26548
               G609264, G454277, G395259, G763333, G295839

References
1 Prasad, R. et al. (1995) Curr. Genet. 27, 320-329.
2 Nagao, K. et al. (1995) J. Bacteriol. 177, 1536-1543.
3 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
Cystic fibrosis transmembrane conductance regulator family
Summary
Transporters of the cystic fibrosis transmembrane conductance regulator family, exemplified by the human cystic fibrosis transmembrane conductance regulator (Cftrhomsa), act as cAMP-dependent chloride channels. In humans, mutations of the CFTR gene that lead to defects in channel function cause cystic fibrosis. This disorder of exocrine gland function is the most common genetic disease in the Caucasian population, affecting 1 in 2000-2500 live births 3,4. Members of this family have been found only in vertebrates. Statistical analysis of multiple amino acid sequence comparisons places the family in the multidrug resistance subdivision of the ATP binding cassette (ABC) superfamily 5. The CFTR proteins are the only ABC transporters that function principally as channels rather than as ATP-dependent active transporters.

Transporters of this family consist of a single polypeptide chain made up of five domains. Two homologous two-domain halves, each made up of a transmembrane domain followed by an ATP binding domain, are separated by a central "R domain", which contains many charged residues and phosphorylation sites 6. This domain is unique to this family. The most common cystic fibrosis-causing mutation in the Caucasian population, ΔF508 7, occurs in the N-terminal ATP binding domain. Each transmembrane domain is predicted from the hydropathy of its amino acid sequence to contain six membrane-spanning helices, so the functional transporters contain 12 such helices. The proteins are glycosylated.

Many residues, including several long sequence motifs, are well conserved within the cystic fibrosis transmembrane conductance regulator family. These include motifs unique to the family, signature motifs of the ABC superfamily, and motifs shown to be necessary for function by site-directed mutagenesis.
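
Predicting membrane-spanning helices from hydropathy can be sketched as a sliding-window average over a hydropathy scale. The entry does not state which method was used; the sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6, a common convention. The function names are hypothetical, and the test sequence is synthetic rather than a CFTR sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_segments(sequence, window=19, cutoff=1.6):
    """Return 1-based (first, last) residue ranges covered by runs of
    consecutive windows whose mean hydropathy reaches the cut-off."""
    profile = hydropathy_profile(sequence, window)
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value >= cutoff and start is None:
            start = i
        elif value < cutoff and start is not None:
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(profile) - 1 + window))
    return segments


# Synthetic example: a 20-residue hydrophobic stretch flanked by charged residues.
toy = "D" * 20 + "L" * 20 + "D" * 20
print(candidate_segments(toy))
```

Each reported range is wider than the hydrophobic core because it covers every residue in any qualifying window; in practice the segment boundaries would be refined by eye or by a dedicated topology predictor.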
Nomenclature, biological sources and substrates

CODE       DESCRIPTION [SYNONYMS]                                ORGANISM [COMMON NAMES]         SUBSTRATE(S)
Cftrbosta  Cystic fibrosis transmembrane conductance regulator   Bos taurus [cow]                Cl-
           [CFTR, cAMP-dependent Cl- channel]
Cftrhomsa  Cystic fibrosis transmembrane conductance regulator   Homo sapiens [human]            Cl-
           [CFTR, cAMP-dependent Cl- channel]
Cftrmusmu  Cystic fibrosis transmembrane conductance regulator   Mus musculus [mouse]            Cl-
           [CFTR, cAMP-dependent Cl- channel]
Cftrorycu  Cystic fibrosis transmembrane conductance regulator   Oryctolagus cuniculus [rabbit]  Cl-
           [CFTR, cAMP-dependent Cl- channel]
Cftrsquac  Cystic fibrosis transmembrane conductance regulator   Squalus acanthias [dogfish]     Cl-
           [CFTR, cAMP-dependent Cl- channel]
Cftrxenla  Cystic fibrosis transmembrane conductance regulator   Xenopus laevis [toad]           Cl-
           [CFTR, cAMP-dependent Cl- channel]
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Cftrbosta (Cftrhomsa); Cftrorycu (Cftrmusmu).

[Figure: phylogenetic tree with branches Cftrhomsa, Cftrxenla, Cftrmusmu and Cftrsquac]
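
The 90% identity criterion used to omit near-identical proteins can be illustrated as follows. This is a sketch under stated assumptions: the entry does not define how identity was computed, so the hypothetical function `percent_identity` simply compares two rows of an existing alignment and ignores columns in which either row has a gap. The example uses the first 20 aligned positions of Cftrhomsa and Cftrmusmu.

```python
def percent_identity(row_a, row_b, gap_chars=".-"):
    """Percent identity between two aligned rows of equal length.

    Columns in which either row carries a gap character are skipped.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# First 20 aligned positions of Cftrhomsa and Cftrmusmu.
homsa = "MQRSPLEKASVVSKLFFSWT"
musmu = "MQKSPLEKASFISKLFFSWT"
print(percent_identity(homsa, musmu))
```

A short window such as this one is only illustrative; the criterion in the text applies to the full-length sequences.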
Proposed orientation of CFTR 2 in the membrane

The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside, and the polypeptide is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. More than half of the residues are conserved in at least 75% of the members of the cystic fibrosis transmembrane conductance regulator family and, therefore, are not mapped onto the model.

[Figure: proposed membrane topology of CFTR; labels: OUTSIDE, INSIDE, NH2, COOH, ATP BINDING SITE (two sites)]
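
The statement that more than half of the residues are conserved can be checked against any stretch of the consensus line by counting its non-dot positions. The sketch below is illustrative only; the function name is hypothetical, and the example string is the consensus for alignment positions 1-50 given in the alignment for this family.

```python
def conserved_fraction(consensus_line: str) -> float:
    """Fraction of alignment columns that carry a consensus residue.

    Spaces between the ten-residue groups are ignored; '.' marks a
    column without a consensus residue.
    """
    columns = consensus_line.replace(" ", "")
    if not columns:
        return 0.0
    conserved = sum(1 for c in columns if c.isalpha())
    return conserved / len(columns)


# Consensus for positions 1-50 of the CFTR family alignment.
first_block = "MQ.SPLEKAS ..SKLFFSWT .PIL.KGYRQ .LELSDIYQI PS.DSAD.LS"
print(conserved_fraction(first_block))
```

Applied block by block, the same count shows which regions of the protein are most variable.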
Physical and genetic characteristics

           AMINO ACIDS  MOL. WT
Cftrbosta  1481         167 758
Cftrhomsa  1480         168 173
Cftrmusmu  1476         167 852
Cftrorycu  1450         164 629
Cftrsquac  1492         169 384
Cftrxenla  1485         168 895

EXPRESSION SITES   lung lung heart ventricle rectal gland
CHROMOSOMAL LOCUS  7q31-q32

Multiple amino acid sequence alignments

           1                                                   50
Cftrhomsa  MQRSPLEKAS VVSKLFFSWT RPILRKGYRQ RLELSDIYQI PSVDSADNLS
Cftrmusmu  MQKSPLEKAS FISKLFFSWT TPILRKGYRH HLELSDIYQA PSADSADHLS
Cftrxenla  MQKTPLEKAS IFSQIFFSWT KPILWKGYRQ RLELSDIYQI HPGDSADNLS
Cftrsquac  MQRSPIEKAN AFSKLFFRWP RPILKKGYRQ KLELSDIYQI PSSDSADELS
Consensus  MQ.SPLEKAS ..SKLFFSWT .PIL.KGYRQ .LELSDIYQI PS.DSAD.LS

           51                                                 100
Cftrhomsa  EKLEREWDRE LA.SKKNPKL INALRRCFFW RFMFYGIFLY LGEVTKAVQP
Cftrmusmu  EKLEREWDRE QA.SKKNPQL IHALRRCFFW RFLFYGILLY LGEVTKAVQP
Cftrxenla  ERLEREWDRE VATSKKNPKL INALKRCFFW KFLFYGILLY LGEVTKAVQP
Cftrsquac  EMLEREWDRE LATSKKNPKL VNALRRCFFW RFLFYGILLY FVEFTKAVQP
Consensus  E.LEREWDRE .A.SKKNPKL INALRRCFFW RFLFYGILLY LGEVTKSVQP

           101                                                150
Cftrhomsa  LLLGRIIASY DPDNKEERSI AIYLGIGLCL LFIVRTLLLH PAIFGLHHIG
Cftrmusmu  VLLGRIIASY DPENKVERSI AIYLGIGLCL LFIVRTLLLH PAIFGLHRIG
Cftrxenla  LLLGRIIASY DRDNEHERSI AYYLAIGLCL LFVVRMLLLH PAIFGLHHIG
Cftrsquac  LCLGRIIASY NAKNTYEREI AYYLALGLCL LFVVRTLFLH PAVFGLQHLG
Consensus  LLLGRIIASY D..N..ERSI A.YL.IGLCL LF.VRTLLLH PAIFGLHHIG

           151                                                200
Cftrhomsa  MQMRIAMFSL IYKKTLKLSS RVLDKISIGQ LVSLLSNNLN KFDEGLALAH
Cftrmusmu  MQMRTAMFSL IYKKTLKLSS RVLDKISIGQ LVSLLSNNLN KFDEGLALAH
Cftrxenla  MQMRIAMFSL IYKKTLKLSS KVLDKISTGQ LVSLLSNNLN KFDEGLALAH
Cftrsquac  MQMRIALFSL IYKKILKMSS RVLDKIDTGQ LVSLLSNNLN KFDEGVAVAH
Consensus  MQMRIAMFSL IYKKTLKLDD RVLDKISTGQ LVSLLSNNLN KFDEGLALAH

           201                                                250
Cftrhomsa  FVWIAPLQVA LLMGLIWELL QASAFCGLGF LIVLALFQAG LGRMMMKYRD
Cftrmusmu  FIWIAPLQVT LLMGLLWDLL QFSAFCGLGL LIILVIFQAI LGKMMVKYRD
Cftrxenla  FVWIAPLQVL LLMGLLWDLL QASAFCGLGF LIILSLFQAR LGRMMMKYKD
Cftrsquac  FVWIAPVQVV LLMGLIWNEL TEFVFCGLGF LIMLALFQAW LGKKMMQYRD
Consensus  FVWIAPLQV. LLMGL.W.LL Q.SAFCGLGF LI.L.LFQA. LG.MMMKYRD
           251                                                300
Cftrhomsa  QRAGKISERL VITSEMIENI QSVKAYCWEE AMEKMIENLR QTELKLTRKA
Cftrmusmu  QRAAKINERL VITSEIIDNI YSVKAYCWES AMEKMIENLR EVELKMTRKA
Cftrxenla  KRAGKINERL VITSQIIENI QSVKAYCWEN AMEKIIETIR ETELKLTRKA
Cftrsquac  KRAGKINERL AITSEIIDNI QSVKVYCWED AMEKIIDDIR QVELKLTRKV
Consensus  .RAGKINERL VITSEII.NI QSVKAYCWE. AMEK.IE..R ..ELKLTRKA
           301                                                350
Cftrhomsa  AYVRYFNSSA FFFSGFFVVF LSVLPYALIK GIILRKIFTT ISFCIVLRMA
Cftrmusmu  AYMRFFTSSA FFFSGFFVVF LSVLPYTVIN GIVLRKIFTT ISFCIVLRMS
Cftrxenla  AYVRYFNSSA FFFSGFFVVF LSIVPHLLLD GISLRKIFTT ISFSIVLRMA
Cftrsquac  AYCRYFSSSA FFFSGFFVVF LSVVPYAFIH TIKLRRIFTT ISYNIVLRMT
Consensus  AY.RYF.SSA FFFSGFFVVF LSV.PY..I. GI.LRKIFTT ISF.IVLRM.
           351                                                400
Cftrhomsa  VTRQFPWAVQ TWYDSLGAIN KIQDFLQKQE YKTLEYNLTT TEVVMENVTA
Cftrmusmu  VTRQFPTAVQ IWYDSFGMIR KIQDFLQKQE YKVLEYNLMT TGIIMENVTA
Cftrxenla  VTRQFPWAVQ TWYDSLGVIN KIQEFLQKEE YKSLEYNLTT TEVAMENVSA
Cftrsquac  VTRQFPSAIQ TWYDSLGAIR KIQDFLHKDE HKTVEYNLTT KEVEMVNVTA
Consensus  VTRQFP.AVQ TWYDSLG.I. KIQDFLQK.E YK.LEYNLTT TEV.MENVTA
           401                                                450
Cftrhomsa  FWEEGFGELF EKAKQNNNNR KTSNGDDSLF FSNFSLLGTP VLKDINFKIE
Cftrmusmu  FWEEGFGELL QKAQQSNGDR KHSSDENNVS FSHLCLVGNP VLKNINLNIE
Cftrxenla  SWDEGIGEFF EKAKLEVNGG NISNEDPSAF FSNFSLHVAP VLRNINFKIE
Cftrsquac  SWDEGIGELF EKVKQNDSER KMANGDDGLF FSNFSLHVTP VLKNISFKLE
Consensus  .W.EG.GELF EKAKQ....R K.SN.D...F FSNFSL...P VLKNINFKIE
           451                                                500
Cftrhomsa  RGQLLAVAGS TGAGKTSLLM MIMGELEPSE GKIKHSGRIS FCSQFSWIMP
Cftrmusmu  KGEMLAITGS TGLGKTSLLM LILGELEASE GIIKHSGRVS FCSQFSWIMP
Cftrxenla  KGQLLAIAGS TGAGKTSLLM MIMGELEPSA GKIKHSGRIS FSPQVSWIMP
Cftrsquac  KGELLAIAGS TGSGKSSLLM MIMGELEPSD GKIKHSGRIS YSPQVPWIMP
Consensus  KG.LLAIAGS TG.GKTSLLM MIMGELEPS. GKIKHSGRIS F..Q.SWIMP
           501                                                550
Cftrhomsa  GTIKENIIFG VSYDEYRYRS VIKACQLEED ISKFAEKDNI VLGEGGITLS
Cftrmusmu  GTIKENIIFG VSYDEYRYKS VVKACQLQQD ITKFAEQDNT VLGEGGVTLS
Cftrxenla  GTIKENIVFG VSYDQYRYLS VIKACQLEED ISKFPEKDNT VLGEGGITLS
Cftrsquac  GTIKDNIIFG LSYDEYRYTS VVNACQLEED ITVFPNKDKT VLGDGGITLS
Consensus  GTIKENIIFG VSYDEYRY.S V.KACQLEED I.KF.EKDNT VLGEGGITLS
           551                                                600
Cftrhomsa  GGQRARISLA RAVYKDADLY LLDSPFGYLD VLTEKEIFES CVCKLMANKT
Cftrmusmu  GGQRARISLA RAVYKDADLY LLDSPFGYLD VFTEEQVFES CVCKLMANKT
Cftrxenla  GGQRARISLA RAVYKDADLY LLDSPFSYLD LFTEKEIFES CVCKLMANKT
Cftrsquac  GGQRARISLA RALYKDADLY LLDSPFSHLD VTTEKDIFES CLCKLMVNKT
Consensus  GGQRARISLA RAVYKDADLY LLDSPF.YLD V.TE..IFES CVCKLMANKT
           601                                                650
Cftrhomsa  RILVTSKMEH LKKADKILIL HEGSSYFYGT FSELQNLQPD FSSKLMGCDS
Cftrmusmu  RILVTSKMEH LRKADKILIL HQGTSYFYGT FSELQSLRPS FSSKLMGYDT
Cftrxenla  RILVTSKVEQ LKKADKVLIL HEGSCYFYGT FSELEDQRPE FSSHLIG...
Cftrsquac  RILVTSKLEH LKKADKILLL HEGHCYFYGT FSELQGEKPD FSSQLLGSVH
Consensus  RILVTSK.EH LKKDAKILIL HEG..YFYGT FSELQ..P.. FSS.L.G...
           651                                                700
Cftrhomsa  FDQFSAERRN SILTETLHRF SL...EGDAP VSWTETKKQS FKQ.TGEFGE
Cftrmusmu  FDQFTEERRS SILTETLRRF SV...DDSSA PWS..KPKQS FRQ.TGEVGE
Cftrxenla  FDHFNAERRN SIITETLRRC SI...DSDPS AVRNEVKNKS FKQ.VADFTE
Cftrsquac  FDSFSAERRN SILTETFRRC SVSSGDGAGL GSYSETRKAS FKQPPPEFNE
Consensus  FD.F.AERRN SILTETLRR. S....D.... ....E..K.S FKQ...EF.E

           701                                                750
Cftrhomsa  KRKNS.ILNP INSIRKFSIV QKTPLQMNGI EED..SDEPL ERRLSLVPDS
Cftrmusmu  KRKNS.ILNS FSSVRKISIV QKTPL...CI DGE..SDDLQ EKRLSLVPDS
Cftrxenla  KRKSS.IINP RKSSRKFSLM QKSQPQMSGI EEEDMPAEQG ERKLSLVPES
Cftrsquac  KRKSSLIVNP ITSNKKFSLV QTAMSYPQTN GMEDATSEPG ERHFSLIPEN
Consensus  KRK.S.I.NP ..S.RKFS.V QK.......I ..E....E.. ER.LSLVP.S

           751                                                800
Cftrhomsa  EQGEAILPRI SVISTGPTLQ ARRRQSVLNL MTH.SVNQGQ NIHRKTTAST
Cftrmusmu  EQGEAALPRS NMIATGPTFP GRRRQSVLDL MTF.TPNSGS SNLQRTRTSI
Cftrxenla  EQGEASLPRS NFLNTGPTFQ GRRRQSVLNL MTRTSISQGS NAFATRNASV
Cftrsquac  ELGEPTKPRS NIFKSELPFQ AHRRQSVLAL MTHSSTS..P NKIHARRSAV
Consensus  EQGEA.LPRS N...TGPTFQ .RRRQSVL.L MT..S...G. N.......S.

           801                                                850
Cftrhomsa  RKVSLAPQAN L.T.ELDIYS RRLSQETGLE ISEEINEEDL KECFFDDMES
Cftrmusmu  RKISLVPQIS L.N.EVDVYS RRLSQDSTLN ITEEINEEDL KECFLDDVIK
Cftrxenla  RKMSVNSYSN S.SFDLDIYN RRLSQDSILE VSEEINEEDL KECFLDDTDS
Cftrsquac  RKMSMLSQTN FASSEIDIYS RRLSEDGSFE ISEEINEEDL KECFADEEEI
Consensus  RK.S...Q.N ....E.DIYS RRLSQD..LE ISEEINEEDL KECF.DD...
           851                                                900
Cftrhomsa  IPAVTTWNTY LRYITVHKSL IFVLIWCLVI FLAEVAASLV VLWLLGNTP.
Cftrmusmu  IPPVTTWNTY LRYFTLHKGL LLVLIWCVLV FLVEVAASLF VLWLLKNNP.
Cftrxenla  QSPTTTWNTY LRFLTAHKNF IFILVFCLVI FFVEVAASSA WLWIIKRNAP
Cftrsquac  QNVTTTWSTY LRYVTTNRNL VFVLILCLVI FLAEVAASLA GLWIISGLAI
Consensus  ....TTWNTY LRY.T.HK.L .FVLI.CLVI FL.EVAASL. .LW.......
           901                                                950
Cftrhomsa  ...LQDKGNS THSRNNS.YA VIITSTSSYY VFYIYVGVAD TLLAMGFFRG
Cftrmusmu  ...VNSGNNG TKISNSS.YV VIITSTSFYY IFYIYVGVAD TLLALSLFRG
Cftrxenla  AINMTSNENV SEVSD.T.LS VIVTHTSFYY VFYIYVGVAD SLLALGIFRG
Cftrsquac  NTGSQTNDTS TDLSHLSVFS KFITNGSHYY IFYIYVGLAD SFLALGVIRG
Consensus  ........N. T..S..S... VIIT.TS.YY .FYIYVGVAD .LLALG.FRG
           951                                               1000
Cftrhomsa  LPLVHTLITV SKILHHKMLH SVLQAPMSTL NTLKAGGILN RFSKDIAILD
Cftrmusmu  LPLVHTLITA SKILHRKMLH SILHAPMSTI SKLKAGGILN RFSKDIAILD
Cftrxenla  LPLVHSLISV SKVLHKKMLH AILHAPMSTF NTMRAGRILN RFSKDTAILD
Cftrsquac  LPLVHTLVTV SKDLHKQMLH SVLQGPMTAF NKMKAGRILN RFIKDTAIID
Consensus  LPLVHTLITV SK.LH.KMLH S.L.APMST. N...AG.ILN RFSKD.AILD
           1001                                              1050
Cftrhomsa  DLLPLTIFDF IQLLLIVIGA IAVVAVLQPY IFVATVPVIV AFIMLRAYFL
Cftrmusmu  DFLPLTIFDF IQLVFIVIGA IIVVSALQPY IFLATVPGLV VFILLRAYFL
Cftrxenla  DILPLSIFDL TQLVLIVIGA ITVVSLLEPY IFLATVPVIV AFILLRSYFL
Cftrsquac  DMLPLTVFDF VQLILIVVGA ICVVSVLQPY TLLAAIPVAV IFIMLRAYFL
Consensus  D.LPLTIFDF .QL.LIVIGA I.VVS.LQPY IFLATVPV.V .FI.LRAYFL
           1051                                              1100
Cftrhomsa  QTSQQLKQLE SEGRSPIFTH LVTSLKGLWT LRAFGRQPYF ETLFHKALNL
Cftrmusmu  HTAQQLKQLE SEGRSPIFTH LVTSLKGLWT LRAFRRQTYF ETLFHKALNL
Cftrxenla  HTSQQLKQLE SKARSPIFAH LITSLKGLWT LRAFGRQPYF ETLFHKALNL
Cftrsquac  RTSQQLKQLE SEARSPIFSH LITSLRGLWT VRAFGRQSYF ETLFHKALNL
Consensus  .TSQQLKQLE SE.RSPIF.H L.TSLKGLWT LRAFGRQ.YF ETLFHKALNL

           1101                                              1150
Cftrhomsa  HTANWFLYLS TLRWFQMRIE MIFVIFFIAV TFISILTTGE GEGRVGIILT
Cftrmusmu  HTANWFMYLA TLRWFQMRID MIFVLFFIVV TFISILTTGE GEGTAGIILT
Cftrxenla  HTANWFLYLS TLRWFQMTIE MIFVIFFIAV SFISIATSGA GEEKVGIVLT
Cftrsquac  HTANWFLYLS TLRWFQMRID IVFVLFFIAV TFIAIATHDV GEGQVGIILT
Consensus  HTANWFLYLS TLRWFQMRI. MIFVLFFIAV TFISI.T.G. GEG.VGIILT

           1151                                              1200
Cftrhomsa  LAMNIMSTLQ WAVNSSIDVD SLMRSVSRVF KFIDMPTEGK P.TKSTKPYK
Cftrmusmu  LAMNIMSTLQ WAVNSSIDTD SLMRSVSRVF KFIDIQTEES MYTQIIKELP
Cftrxenla  LAMNIMNTLQ WAVNASIDVD SLMRSVSRIF RFIDLPVEEL INENKNKE..
Cftrsquac  LAMNITSTLQ WAVNSSIDVD GLMRSVSRVF KYIDIPPEGS ETKNRHNA..
Consensus  LAMNIMSTLQ WAVNSSIDVD SLMRSVSRVF KFID.P.E.. ......K...

           1201                                              1250
Cftrhomsa  NGQLSKVMII ENSHVKKDDI WPSGGQMTVK DLTAKYTEGG NAILENISFS
Cftrmusmu  REGSSDVLVI KNEHVKKSDI WPSGGEMVVK DLTVKYMDDG NAVLENISFS
Cftrxenla  .EQLSEVLIY ENDYVKKTQV WPSGGQMTVK NLSANYIDGG NTVLENISFS
Cftrsquac  .NNPSDVLVI ENKHLTKE.. WPSGGQMMVN NLTAKYTSDG RAVLQDLSFS
Consensus  ....S.VL.I EN.HVKK... WPSGGQM.VK .LTAKY...G NAVLENISFS

           1251                                              1300
Cftrhomsa  ISPGQRVGLL GRTGSGKSTL LSAFLRLLNT EGEIQIDGVS WDSITLQQWR
Cftrmusmu  ISPGQRVGLL GRTGSGKSTL LSAFLRMLNI KGDIEIDGVS WNSVTLQEWR
Cftrxenla  LSPGQRVGLL GRTGSGKSTL LSAFLRLLST QGDIQIDGVS WQTIPLQKWR
Cftrsquac  VNAGQRVGLL GRTGAGKSTL LSALLRLLST EGEIQIDGIS WNSVSLQKWR
Consensus  .SPGQRVGLL GRTGSGKSTL LSAFLRLL.T .G.IQIDGVS W.S..LQ.WR

           1301                                              1350
Cftrhomsa  KAFGVIPQKV FIFSGTFRKN LDPYEQWSDQ EIWKVADEVG LRSVIEQFPG
Cftrmusmu  KAFGVITQKV FIFSGTFRQN LDPNGKWKDE EIWKVADEVG LKSVIEQFPG
Cftrxenla  KAFGVIPQKV FIFSGSIRKN LDPYGKWSDE ELLKVTEEVG LKLIIDQFPG
Cftrsquac  KAFGVIPQKV FVFSGTFRKN LDPYEQWSDE EIWKVTEEVG LKSMIEQFPD
Consensus  KAFGVIPQKV FIFSGTFRKN LDPY..WSDE EIWKV..EVG LKS.IEQFPG

           1351                                              1400
Cftrhomsa  KLDFVLVDGG CVLSHGHKQL MCLARSVLSK AKILLLDEPS AHLDPVTYQI
Cftrmusmu  QLNFTLVDGG YVLSHGHKQL MCLARSVLSK AKIILLDEPS AHLDPITYQV
Cftrxenla  QLDFVLLDGG CVLSHGHKQL VCLARSVLSK AKILLLDEPS AHLDPITFQI
Cftrsquac  KLNFVLVDGG YILSNGHKQL MCLARSILSK AKILLLDEPT AHLDPVTFQI
Consensus  .L.FVLVDGG .VLSHGHKQL MCLARSVLSK AKILLLDEPS AHLDP.T.QI

           1401                                              1450
Cftrhomsa  IRRTLKQAFA DCTVILCEHR IEAMLECQQF LVIEENKVRQ YDSIQKLLNE
Cftrmusmu  IRRVLKQAFA GCTVILCEHR IEAMLDCQRF LVIEESNVWQ YDSLQALLSE
Cftrxenla  IRKTLKHAFA DCTVILSEHR LEAMLECQRF LVIEDNTVRQ YDSIQKLVNE
Cftrsquac  IRKTLKHTFS NCTVILSEHR VEALLECQQF LVIEGCSVKQ FDALQKLLTE
Consensus  IR.TLK.AFA .CTVIL.HER .EAMLECQ.F LVIE...V.Q YDS.QKLL.E

           1451                                             1499
Cftrhomsa  RSLFRQAISP SDRVKLFP.. HRNSSKCKSK PQIAALKEET EEEVQDTRL
Cftrmusmu  KSIFQQAISS SEKMRFFQ.. GRHSSKHKPR TQITALKEET EEEVQETRL
Cftrxenla  KSFFKQAISH SDRLKLFPLH RRNSSKRKSR PQISALQEET EEEVQDTRL
Cftrsquac  ASLFKQVFGH LDRAKLFTAH RRNSSKRKTR PKISALQEEA EEDLQETRL
Consensus  .S.F.QAIS. SDR.KLF... RNSSSK.K.R PQI.AL.EET EEEVQ.TRL
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Cftrbosta (Cftrhomsa); Cftrorycu (Cftrmusmu). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
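The 75% rule for the consensus line can be sketched in a few lines of Python. This is a minimal illustration, not the procedure the authors used: the function name `consensus`, the handling of gap characters, and the choice to count gaps in the denominator are all assumptions.

```python
from collections import Counter


def consensus(sequences, threshold=0.75, gap_chars=".-"):
    """Return a consensus string for equal-length aligned sequences.

    A residue is reported at a position only if it occurs in at least
    `threshold` of all aligned sequences; otherwise '.' is written.
    Gap characters never become consensus residues (assumption).
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")

    result = []
    for column in zip(*sequences):
        counts = Counter(res for res in column if res not in gap_chars)
        symbol = "."
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(column) >= threshold:
                symbol = residue
        result.append(symbol)
    return "".join(result)


# Last column block of the CFTR alignment (positions 1491-1499):
# human, mouse, Xenopus and dogfish sequences.
block = ["EEEVQDTRL", "EEEVQETRL", "EEEVQDTRL", "EEDLQETRL"]
print(consensus(block))
```

With four aligned sequences the threshold means a residue must appear in at least three of them, which is why a 2:2 split such as D/E at the sixth position is written as a dot.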
Database accession numbers
            SWISSPROT   PIR               EMBL/GENBANK
Cftrbosta   P35071      A39323            M76128; G163742
Cftrhomsa   P13569      A30300; DVHUCF    M28668; G180332
Cftrmusmu   P26361      A39901; A40303    L26098; G915270
Cftrorycu   Q00554      E39323            U40227; G1100985
Cftrsquac   P26362      A39322            M83785; G213870
Cftrxenla   P26363      S23756            X65256; G64623
References
1 McIntosh, I. and Cutting, G.R. (1992) FASEB J. 6, 2775-2782.
2 Riordan, J.R. et al. (1989) Science 245, 1066-1073.
3 Boat, T.F. et al. (1989) In The Metabolic Basis of Inherited Disease (Scriver, C.R. et al., eds). McGraw-Hill, New York, pp. 2649-2680.
4 Quinton, P.M. (1990) FASEB J. 4, 2709-2717.
5 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
6 Cheng, S.H. et al. (1991) Cell 66, 1027-1036.
7 Cystic Fibrosis Genetic Analysis Consortium (1990) Am. J. Hum. Genet. 47, 354-359.
P-Glycoprotein transporter family

Summary
Transporters of the P-glycoprotein family, examples of which are the human multidrug resistance (MDR) protein MDR1 [1] (Mdr1homsa) and the bacterial haemolysin [2] and cyclolysin [3] secretion proteins (e.g. Hlybescco, Cyabborpe), act as ATP-dependent efflux pumps. Many of these transporters mediate drug resistance, often to many structurally dissimilar drugs. The overexpression of MDR proteins in cancer cells is a common cause of treatment failure [4]. Other family members mediate the secretion of many different molecules, including peptides and glucans. Some family members, such as human P-glycoprotein, are also associated with chloride channel activity [5,6]; the sulfonylurea receptor, SUR [7], acts in some mammals as a mediator of insulin release. Members of this family are widely distributed throughout many taxa, including bacteria, plants and mammals.

Statistical analysis of multiple amino acid sequence comparisons places the P-glycoprotein transporter family in the multidrug resistance subdivision of the ATP binding cassette (ABC) superfamily [8]. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. Many transporters of this family (e.g. MDR1 [1]) consist of a single polypeptide chain made up of four domains. The N- and C-terminal halves of the protein are homologous, and each half is made up of a transmembrane domain followed by an ATP binding domain. Others (e.g. the antigen peptide transporter TAP [9]) consist of a single transmembrane domain followed by an ATP binding domain. In these cases the transporter functions as a dimer: in TAP, this is a heterodimer of the homologous proteins TAP1 and TAP2 (also known as RING4 and RING11). Each transmembrane domain is predicted from the hydropathy of its amino acid sequence to contain six membrane-spanning helices, so the functional transporters contain 12 such helices. Many members of this family are glycosylated.
Many residues, including several long sequence motifs, are well conserved within the P-glycoprotein transporter family, including motifs unique to the family, signature motifs of the ABC superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis.
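The Summary attributes the six-helix prediction for each transmembrane domain to the hydropathy of the sequence. A sliding-window hydropathy scan of that general kind can be sketched as follows. The source does not say which scale, window or cutoff was used, so the Kyte-Doolittle scale, the 19-residue window and the 1.6 cutoff below are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    # Unknown symbols score 0.0 so that odd characters do not raise.
    scores = [KYTE_DOOLITTLE.get(res, 0.0) for res in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_helices(sequence, window=19, cutoff=1.6):
    """Merge consecutive windows at or above the cutoff into spans.

    Spans are (start, end) residue positions, 1-based and inclusive.
    """
    profile = hydropathy_profile(sequence, window)
    spans = []
    start = None
    for i, mean in enumerate(profile):
        if mean >= cutoff and start is None:
            start = i
        elif mean < cutoff and start is not None:
            spans.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        spans.append((start + 1, len(profile) - 1 + window))
    return spans


# Synthetic example: a 19-residue leucine stretch flanked by aspartate.
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_helices(toy))
```

Because every window overlapping the hydrophobic stretch by enough residues passes the cutoff, the reported span is wider than the stretch itself; counting spans, rather than trusting their exact ends, is the more robust use of such a scan.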
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S) [RESISTANCE] a
Aprdpseae | Alkaline protease secretion ATP binding protein [APRD] | Pseudomonas aeruginosa [gram-negative bacterium] | Alkaline protease
Atm1sacce | Mitochondrial transporter ATM1 precursor [ATM1, MDY, YMR301C, YM9952.03C] | Saccharomyces cerevisiae [yeast] | Unknown
Chvaagrtu | β-(1→2)Glucan export ATP binding protein [CHVA, attachment protein] | Agrobacterium tumefaciens [gram-negative bacterium] | β-(1→2)Glucans
Comastrpn | Transport ATP binding protein [COMA] | Streptococcus pneumoniae [gram-positive bacterium] | Competence factor?
Cvabescco | Colicin V secretion ATP binding protein [CVAB] | Escherichia coli [gram-negative bacterium] | Colicin V
Cyabborpe | Cyclolysin secretion ATP binding protein [CYAB] | Bordetella pertussis [gram-negative bacterium] | Cyclolysin
Cydcescco | Transport ATP binding protein [CYDC, MDRA, MDRH, SURB] | Escherichia coli [gram-negative bacterium] | Unknown
Cydchaein | Transport ATP binding protein [CYDC, HI1156] | Haemophilus influenzae [gram-negative bacterium] | Unknown
Cyddescco | Transport ATP binding protein [CYDD, HTRD] | Escherichia coli [gram-negative bacterium] | Unknown
Cyddhaein | Transport ATP binding protein [CYDD, HI1157] | Haemophilus influenzae [gram-negative bacterium] | Unknown
Hemsentfa | Hemolysin secretion protein [CYLB] | Enterococcus faecalis [gram-positive bacterium] | Hemolysin, bacteriocin cytolysin B
Hetaanasp | Heterocyst differentiation ATP binding protein [HETA] | Anabaena sp. [alga] | Unknown
Hly2escco | Hemolysin secretion ATP binding protein [HLYB] | Escherichia coli [gram-negative bacterium] | Hemolysin A
Hlybactac | Leukotoxin secretion ATP binding protein [LKTB, AALTB] | Actinobacillus actinomycetemcomitans [gram-negative bacterium] | Leukotoxin (hemolysin)
Hlybactpl | Hemolysin secretion ATP binding protein [CLYI-B, APXIB, APPB, CLYIB] | Actinobacillus pleuropneumoniae [gram-negative bacterium] | Leukotoxin (hemolysin)
Hlybescco | Haemolysin secretion ATP binding protein [HLYB] | Escherichia coli [gram-negative bacterium] | Hemolysin A
Hlybpasha | Leukotoxin secretion ATP binding protein [LKTB] | Pasteurella haemolytica [gram-negative bacterium] | Leukotoxin (hemolysin)
Hlybprovu | Leukotoxin secretion ATP binding protein [HLYB] | Proteus vulgaris [gram-negative bacterium] | Hemolysin A
Hmt1schpo | Heavy metal tolerance protein precursor [HMT1] | Schizosaccharomyces pombe [yeast] | Metal-bound phytochelatins
Lcn3lacla | Lacticin 481/lactococcin transport ATP binding protein [LCNDR3] | Lactococcus lactis [gram-positive bacterium] | Lacticin 481/lactococcin
Lcnclacla | Lactococcin A transport ATP binding protein [LCNC] | Lactococcus lactis [gram-positive bacterium] | Lactococcin A
Mdlescco | Multidrug resistance-like ATP binding protein [MDL] | Escherichia coli [gram-negative bacterium] | Unknown
Mdl1sacce | ATP-dependent permease [MDL1, YLR188W, L9470.3] | Saccharomyces cerevisiae [yeast] | Unknown
Mdl2sacce | ATP-dependent permease [MDL2, SSH1] | Saccharomyces cerevisiae [yeast] | Unknown
Mdrleita | Multidrug resistance protein [PGPA] | Leishmania tarentolae [trypanosome] | Methotrexate
Mdrplafa | Multidrug resistance protein [Chloroquine resistance protein, MDR1] | Plasmodium falciparum [trypanosome] | Chloroquine
Mdr1caeel | Multidrug resistance protein [P-glycoprotein A, PGP1] | Caenorhabditis elegans [nematode] | Multiple drugs
Mdr1crigr | Multidrug resistance protein 1 [P-glycoprotein 1, PGP1] | Cricetulus griseus [hamster] | [Multiple drugs]
Mdr1homsa | Multidrug resistance protein 1 [P-glycoprotein 1, PGY1, MDR1] | Homo sapiens [human] | [Multiple drugs]
Mdr1leien | Multidrug resistance protein 1 [P-glycoprotein 1, PGY1] | Leishmania enriettii [trypanosome] | [Vinblastine, puromycin]
Mdr1musmu | Multidrug resistance protein 1 [P-glycoprotein 1, MDR1, MDR1B, PGY1] | Mus musculus [mouse] | [Multiple drugs]
Mdr1ratno | Multidrug resistance protein 1 [P-glycoprotein 1, PGY1, MDR1, MDR1B] | Rattus norvegicus [rat] | [Multiple drugs]
Mdr2crigr | Multidrug resistance protein 2 [P-glycoprotein 2, PGP2] | Cricetulus griseus [hamster] | [Multiple drugs]
Mdr2musmu | Multidrug resistance protein 2 [P-glycoprotein 2, PGY2] | Mus musculus [mouse] | [Multiple drugs]
Mdr3caeel | Multidrug resistance protein 3 [P-glycoprotein C, PGP3] | Caenorhabditis elegans [nematode] | [Multiple drugs]
Mdr3crigr | Multidrug resistance protein 3 [P-glycoprotein 3, PGP3] | Cricetulus griseus [hamster] | [Multiple drugs]
Mdr3homsa | Multidrug resistance protein 3 [P-glycoprotein 3, PGY3, MDR3] | Homo sapiens [human] | [Multiple drugs]
Mdr3musmu | Multidrug resistance protein 3 [P-glycoprotein 3, PGY3, MDR3, MDR1A] | Mus musculus [mouse] | [Multiple drugs]
Mdr4drome | Multidrug resistance protein homolog 50 [P-glycoprotein 50, MDR50, MDR49] | Drosophila melanogaster [fruit fly] | [Colchicine]
Mdr5drome | Multidrug resistance protein homolog 65 [P-glycoprotein 65, MDR65] | Drosophila melanogaster [fruit fly] | Unknown
Msbaescco | Probable transport ATP binding protein [MSBA] | Escherichia coli [gram-negative bacterium] | Unknown
Msbahaein | Probable transport ATP binding protein [MSBA, MSH1, HI0060] | Haemophilus influenzae [gram-negative bacterium] | Unknown
Mt2ratno | Mt2 protein | Rattus norvegicus [rat] | Peptides?
Natabacsu | ATP binding transport protein [NATA] | Bacillus subtilis [gram-positive bacterium] | Unknown
Ndvarhime | β-(1→2)Glucan export ATP binding protein [NDVA] | Rhizobium meliloti [gram-negative bacterium] | β-(1→2)Glucans
Nistlacla | Nisin transport ATP binding protein [NIST] | Lactococcus lactis [gram-positive bacterium] | Nisin
Peddpedac | Pediocin PA-1 transport ATP binding protein [PEDD] | Pediococcus acidilactici [gram-positive bacterium] | Pediocin PA-1
Pglyarath | P-glycoprotein [ATPGP1] | Arabidopsis thaliana [mouse-ear cress] | Unknown
Pgp1arath | P-glycoprotein | Arabidopsis thaliana [mouse-ear cress] | Unknown
Pmd1schpo | Leptomycin B resistance protein [PMD1] | Schizosaccharomyces pombe [yeast] | Leptomycin B; Mating factor?
Prtderwch | Proteases secretion ATP binding protein [PRTD] | Erwinia chrysanthemi [gram-negative bacterium] | Proteases A, B, C, G
Rt3bactpl | RTX toxin-III operon protein [APXIIIB, RTXB] | Actinobacillus pleuropneumoniae [gram-negative bacterium] | RTX toxin-III
Spabbacsu | Subtilin transport ATP binding protein [SPAB, SPAY, SPAT] | Bacillus subtilis [gram-positive bacterium] | Subtilin
Ste6sacce | Mating Factor A secretion protein [Multidrug resistance protein homolog, STE6, YKL209C] | Saccharomyces cerevisiae [yeast] | Mating factor A
Surcricr | Sulfonylurea receptor [SUR] | Cricetus cricetus [hamster] | Insulin
Surratno | Sulfonylurea receptor [SUR] | Rattus norvegicus [rat] | Insulin
Syrdpsesy | ATP binding protein [SYRD] | Pseudomonas syringae [gram-negative bacterium] | Syringomycin
Tap1homsa | Antigen peptide transporter 1 [Peptide supply factor 1, TAP1, PSF1, RING4, Y3] | Homo sapiens [human] | Peptides
Tap1musmu | Antigen peptide transporter 1 [Histocompatibility antigen modifier 1, TAP1, HAM1] | Mus musculus [mouse] | Peptides
Tap2homsa | Antigen peptide transporter 2 [Peptide supply factor 2, TAP2, PSF2, RING11, Y1] | Homo sapiens [human] | Peptides
Tap2musmu | Antigen peptide transporter 2 [Histocompatibility antigen modifier 2, TAP2, HAM2] | Mus musculus [mouse] | Peptides
Tap2ratno | Antigen peptide transporter 2 [TAP2, MTP2] | Rattus norvegicus [rat] | Peptides

a Presumed substrates; protein confers resistance to specified compounds.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Hlybprovu, Hly2escco (Hlybescco); Mdr1ratno (Mdr1musmu); Tap2ratno (Tap2musmu).

[Figure: phylogenetic tree. Leaves, from top to bottom: Surcricr, Surratno, Mdrleita, Hemsentfa, Lcn3lacla, Nistlacla, Spabbacsu, Cyddescco, Cyddhaein, Cydcescco, Cydchaein, Aprdpseae, Prtderwch, Hlybescco, Hlybpasha, Rt3bactpl, Hlybactac, Cyabborpe, Comastrpn, Lcnclacla, Peddpedac, Cvabescco, Pmd1schpo, Pglyarath, Pgp1arath, Mdr3enthi, Mdr2musmu, Mdr3crigr, Mdr3homsa, Mdr1musmu, Mdr2crigr, Mdr1crigr, Mdr3musmu, Mdr1homsa, Mdr4drome, Mdr4enthi, Mdr5drome, Mdr1caeel, Mdr3caeel, Mdr1leien, Msbaescco, Msbahaein, Hetaanasp, Mdrplafa, Mdl1sacce, Mdl2sacce, Mt2ratno, Tap2musmu, Tap2homsa, Tap1homsa, Tap1musmu, Chvaagrtu, Ndvarhime, Atm1sacce, Hmt1schpo, Mdlescco, Ste6sacce, Natabacsu, Syrdpsesy.]
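The 90% identity criterion used to leave near-duplicate proteins out of the tree can be illustrated with a small pairwise calculation on two rows of an alignment. This is only a sketch: the source does not state how identity was computed, so skipping gapped positions and dividing by the number of compared positions are assumptions, as are the function names `percent_identity` and `is_near_duplicate`.

```python
def percent_identity(seq_a, seq_b, gap_chars=".-"):
    """Percent identity between two rows of the same alignment.

    Positions where either row has a gap are skipped (assumption),
    and the denominator is the number of positions compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a in gap_chars or res_b in gap_chars:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_near_duplicate(seq_a, seq_b, threshold=90.0):
    """True if the pair meets the 'at least 90% identical' criterion."""
    return percent_identity(seq_a, seq_b) >= threshold


# Two nine-residue rows that differ at a single position.
print(round(percent_identity("EEEVQDTRL", "EEEVQETRL"), 1))
```

In practice the comparison would run over the full-length aligned sequences; a nine-residue fragment is used here only to keep the arithmetic visible (8 matches out of 9 positions).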
Proposed orientation of MDR1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.

[Figure: topology diagram of MDR1, with the membrane faces labelled OUTSIDE and INSIDE and the termini labelled NH2 and COOH. Conserved residue strings legible in the diagram: LTSKGGGV, GL, LGGDLRHT, EDLAS, GR V Q L NI, G G, LSGGQ QR ARA L LDE.]
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT
Aprdpseae   593           63 670
Atm1sacce   690           77 522
Chvaagrtu   588           64 651
Comastrpn   717           80 350
Cvabescco   698           78 245
Cyabborpe   712           77 969
Cydcescco   573           62 920
Cydchaein   576           64 831
Cyddescco   588           64 956
Cyddhaein   586           65 645
Hemsentfa   714           82 051
Hetaanasp   607           67 789
Hly2escco   707           79 463
Hlybactac   707           79 578
Hlybactpl   707           79 663
Hlybescco   707           79 672
Hlybpasha   708           79 712
Hlybprovu   707           79 940
Hmt1schpo   830           94 007
Lcn3lacla   691           79 834
Lcnclacla   715           79 810
Mdlescco    1143          126 083
Mdl1sacce   695           75 950
Mdl2sacce   812           89 754
Mdrleita    1548          172 235
Mdrplafa    1419          162 251
Mdr1caeel   1321          145 074
Mdr1crigr   1276          140 925
Mdr1homsa   1280          141 504
Mdr1leien   1280          139 728
Mdr1musmu   1276          140 993
Mdr1ratno   1277          141 386
Mdr2crigr   1276          141 057
Mdr2musmu   1276          140 332
Mdr3caeel   1254          138 807
Mdr3crigr   1281          140 866
Mdr3homsa   1279          140 682
Mdr3musmu   1276          140 754
Mdr4drome   1302          142 724
Mdr5drome   1302          143 736
Msbaescco   582           64 460
Msbahaein   587           64 912
Mt2ratno    703           77 811
Natabacsu   246           27 878
Ndvarhime   616           67 238
Nistlacla   600           69 210
Peddpedac   724           81 651
Pglyarath   1286          140 571
Pgp1arath   1233          135 313
Pmd1schpo   1362          149 652
Prtderwch   575           61 617
Rt3bactpl   711           80 405
Spabbacsu   614           71 188
Ste6sacce   1290          144 765
Surcricr    1581          177 015
Surratno    1580          176 750
Syrdpsesy   566           63 195
Tap1homsa   748           80 964
Tap1musmu   577           63 450
Tap2homsa   686           75 663
Tap2musmu   702           77 444
Tap2ratno   703           77 712

EXPRESSION SITES (in table order, rows without an entry omitted): intestinal cells; liver, ovary; liver, ovary; intestinal cells; liver; liver; head; head; pancreatic islets; pancreatic islets

CHROMOSOMAL LOCUS (in table order, rows without an entry omitted): 63 minutes; Chromosome 13; comAB locus; 19.5 minutes; 66.849; 19.3 minutes; 66.943; apxI operon; Plasmid pHly152; Chromosome 3; ADRIA 85LO30 op.; 10.2 minutes; Chromosome 12; Chromosome 16; H circle; Chromosome 5; Chromosome 4; 7q21.1; extrachromosomal circle; Chromosome 10; 49EF; 65A; 20.5 minutes; 3.3225; 7q21.1; ndvA locus; Plasmid pSRQ11; Plasmid pDB248'; prt locus; spa region; Chromosome 11, left arm; 6p21.3; MHC class II region; 6p21.3; MHC class II region; MHC class II region
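The AMINO ACIDS and MOL. WT columns can be cross-checked against each other, since the mean mass per residue of a typical protein falls in a narrow range. The sketch below does this for a few rows of the table. The acceptance bounds of 100 to 125 Da per residue are an assumption chosen for illustration, and the function names are hypothetical.

```python
# (code, amino acids, molecular weight in Da), copied from the
# AMINO ACIDS and MOL. WT columns of the table.
ROWS = [
    ("Aprdpseae", 593, 63670),
    ("Mdr1homsa", 1280, 141504),
    ("Natabacsu", 246, 27878),
    ("Surcricr", 1581, 177015),
]


def mean_residue_mass(amino_acids, mol_wt):
    """Average mass per residue in Da."""
    return mol_wt / amino_acids


def looks_consistent(amino_acids, mol_wt, low=100.0, high=125.0):
    """Flag rows whose mass per residue falls outside assumed bounds."""
    return low <= mean_residue_mass(amino_acids, mol_wt) <= high


for code, length, weight in ROWS:
    mass = mean_residue_mass(length, weight)
    print(f"{code}: {mass:.1f} Da per residue, "
          f"consistent={looks_consistent(length, weight)}")
```

A check of this kind is useful mainly for catching digits dropped or transposed during transcription, because such errors usually move the ratio well outside the expected range.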
Multiple amino acid sequence alignments 1 50 Surcricr ......... P L A F C G T E N H S A A Y R V D Q G V L NN.GCFVDALNVVPHVFLLF Surratno ......... P L A F C G T E N H S A A Y R V D Q G V L NN.GCFVDAL NVVPHVFLLF Mdrleita MVDNGHVTIA MADLGTVVEI AQVRCQQEAQ RKFAEQLDEL WGGEPAYTPT Hmtlschpo ....................................... M VLRYNSPRLN Consensus ..................................................
Surcricr Surratno Mdrleita Hmtlschpo Consensus
51 i00 IT ..... FPI L F I G W G S Q S S K V H I H H S T W L H F P G H N L R W I L T F I L L F V L V IT ..... FPI L F I G W G S Q S S K V H I H H S T W L H F P G H N L R W I L T F I L L F V L V V E D Q A S W F Q Q L Y Y G W I G D .... Y I Y K A A A G N I T E A D L P P P T R S T R T Y H I G ILELVLLYVG FFSIGSLNLL QKRKATSDPY RRKNRFGKEP IGIISWWILG ..................................................
Surcricr Surratno Mdrleita Mdl2sacce Mt2ratno Tap2musmu Tap2homsa Taplhomsa Hmtlschpo Consensus
i01 150 CEIAEGILSD GVTESRHLHL YMPAGMAFMA ...AITSVVY YHNIETSNFP CEIAEGILSD GVTESRHLHL YMPAGMAFMA ...AITSVVY YHNIETSNFP RKLSRQAHAD .IDASRRWQG YIGCEVVYKS EAEAKGVLRW VGHLQQSDYP .................................... MKTY VLLYGKLIMT ..................................... MAL S H P R P W A S L L ..................................... MALSYLRPWVSLL ..................................... MRL P D L R P W T S L L ............. MASSRCP APRGCRCLPG ASLAWLGTVL LLLADWVLLR IALTYVVDIS NLVIYALAVP NWWPCKTTVVCLILFLLFWI IVLISCADSK ..................................................
Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Hlybescco Hlybpasha Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco Mdllsacce Mdl2sacce Mt2ratno Tap2musmu Tap2homsa Taplhomsa Atmlsacce Hmtlschpo Consensus
151 200 KLLIALLIYW TLAFITKTIK FVKFYDHAIG FSQLRFCLTG LLVILYGMLL KLLIALLIYW TLAFITKTIK FVKFYDHAIG FSQLRFCLTG LLVILYGMLL R S L V A G V E W R . . . . . . . . . . . . . . . . . . MP P R H R R L A V L G S A A A L H N . . . . . . . . . . . . . . . . . . . . MKR L K Y V A Q G E H S E C A L A C I T M L L N Y Y G N Q S T L .................... MKIVLQNNEQ DCLLACYSMI LGYFGRDVAI ......................... MDSCH KIDYGLYALE ILAQYHNVSV ..................... MEAN...HQ RNDLGLVALT MLAQYHNISL ..................... MESQMPFNE KIDYGLHALV ILAQYHNVAV ......................... MDSQK NTNLALQALE VLAQYHNISI ......................... MDFYR EEDYGLYALT ILAQYHNIAV .................... MTSPVAQCAS VPDSGLLCLV MLARYHGLAA ............... MKFGK RH..YRPQVD QMDCGVASLA MVFGYYGSYY ............... MKFKK KN..YTSQVD EMDCGCAALS MILKSYGTEK ............... MWTQK WHKYYTAQVD ENDCGLAALN MILKYYGSDY ..MTNRNFRQ IINLLDLRWQ RRVPVIHQTE TAECGLACLA MICGHFGKNI .................................... MIVR MIRLCKGPKL TMILNTGRFE EWYKVCIIALKEKEIYVPSS PIAMLNGRLP LLRLGICRNM LVDLALLGLL QSSLGTLLPP GLPGLWLEGT LRLGV .......... LWGLL LADMALLGLL QGSLGNLLPQ GLPGLWIEGT LRLGV .......... LWGLL LVDAALLWLL QGPLGTLLPQ GLPGLWLEGT LRLGG .......... LWGLL TALPRIFSLL VPTALPLLRV WAVGLSRWAV LWLGACGVLR ATVGSKSENA . . . . . . . . . . . . ". . . . . . . . . . . . M L L L P R C P V I G R I V R S K F R S G L I R N H ALPKNADSIL KAYRLSVLYV WAIDIVFETI FIVYSPHPNE TFQGIVLADH ..................................................
Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Hlybescco Hlybpasha Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrplafa Mdllsacce Mdl2sacce Mt2ratno Tap2musmu Tap2homsa Taplhomsa Ndvarhime Atmlsacce Hmtlschpo Consensus Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Nistlacla Spabbacsu Cyddescco Cyddhaein Cydcescco
201 250 LVEVNVIRVR RYIFFKTPRE VKPPEDLQDL GVRFLQPFVNLLSKGTYWWM LVEVNVIRVR RYVFFKTPRE VKPPEDLQDL GVRFLQPFVNLLSKGTYWWM .... GVVHGE R L F W P H E D N Y L C S C E P V E Q L YVK . . . . . . . . . SK ...... V E L R E K Y G V P K G G L T I K N I R T V F D E Y G F D V S T F K S S F S N ..... Y L D L P T HELYSGEMIP PDGLSVSYLK NINMKHQVSM HVYKTDKKNS PNKIFYPKML NPEEIKHRFD TDGTGLGLTS WLLAAKSLEL KVKQVKKTID RLNF..IFLP NPEEIKHKFD LDGKGLSLTA WLLAAKSLAL KAKHIKKEIS RLHL..VNLP NPEEVKHKFD LDGKGLDLVA WLLAAKSLEL KMKRVKKSIE RLPF..IHLP NPEEIKHKFD IDGHGLNQTK WLLAAKSLGL KVRTANKTVD RLPF..LHLP NPEELKHKFD LEGKGLDLTA WLLAAKSLEL KAKQVKKAID RLAF..IALP DPEQLRHEF..AEQAFCSET IQLAARRVGL KVRRHRPAPA RLPR..APLP FLAHLRELAK TTMDGTTALG LVKVAEEIGF ETRAIKADMT LFDLPDLTFP SLASLRLLAG TTIEGTSALG IKKAAEILEF SVQALRTDAS LFEMKNAPYP MLAHLRQLAK TTADGTTVLG LVKAAKHLNL NAEAVRADMD ALTASQLPLP DLIYLRRKFN LSARGATLAG INGIAEQLGM ATRALSLELD ELRV..LKTP ....... MSL H S K K S T S T V K D N E H S L D L S I K S I P S N E K N F S T E K S E N E A S ................................ M D N D G G A P ..PPPPTLVV ................................ M D N D G G A P ..PPPPTLVV . . . . . . . . . . . . . . . . . . . M D L E A A R N G T A R R . . . L D G D F ..ELGSISNQ . . . . . . . . . . . . . . . . . . . M D L E A A R N G T A R R P G T V E G D F ..ELGSISNQ . . . . . . . . . . . . . . . . . . . M D L E A A K N G T A W R P T S A E G D F ..ELGISSKQ . . . . . . . . . . . . . . . . . . . M E F E E N L K ...... G R A D K N F ..SKMGKKSK . . . . . . . . . . . . . . . . . . . M E F E E D F S ...... A R A D K D F ..LKMGRKSK . . . . . . . . . . . . . . . . . . . M E F E E D F S ...... G R K D K N F ..LKMGRKSK . . . . . . . . . . . . . . . . . . . M E L E E D L K ...... G R A D K N F ..SKMGKKSK . . . . . . . . . . . . . . . . . . . M D L E G D R N G ..... G A K K K N F ..FKLNNKSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . MV K K E E S R L P Q A ..GDFQLKE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
ME R D E V S T S S S E ..GKSQEEAP .MLRNGSLRQ S L R T L D S F S L A P E D V L K T A I K T V E D Y E G D N I D S N G E I K I T . . . . . . . . . . . . . . . . . . . . . . . . . . . . MK K T K V N P E D D I T L G K F T P K P S ............................................... MGK L . R S Q F A S A S A L Y S T K S L F K P P M Y Q K A E I N L I I P H R .... K H F L L R S I R L LSRPRLAKLP SI.RFRSLVT PSSSQLIPLS RLCLRSPAVA KSLILQSFRC KVGGLLRLVG TFLPLLCLTN PLFFSLRALV GSTMSTSVVR VASASWGWLL KVGELLGLVG TLLPLLCLAT PLFFSLRALV GGTASTSVVR VASASWGWLL KLRGLLGFVG TLLLPLCLAT PLTVSLRALV AGASRAPPAR VASAPWSWLL GAQGWLAALK PLAAALGLAL PGLALFRELI SWGAPGSADS TRLLHWGSHP ............................................... MKI .SPVIFTV . . . . . . . . . . SK L S T Q R P L L F N S . . A V N L W N Q A Q K D I T H K K S VARLVLCVFA TAIYLTYRRK RHTHDPLDFE ERQLTEESNV NENAISQNPS .................................................. 251 NAFIKTAHKK PIDLRAIAKL PIAMRALTNY QRLCVAFDAQ NAFIKTAHKK PIDLRAIGKL PIAMRALTNY QRLCLAFDAQ .................................... YNLI PVISYWNNQH FVVIEKIKKK KVLILDPASN KRWIDISEFK PVIIQWNDNH FVVVTKIYRK NVTLIDPAIG KVKYNYNDFM .................... MDEVKEFTSK QFFYTLLTLP .................... MEVKEQLKLK ELLFIMKQMP .............................. MNKSRQKELT .............................. MNKLRQKYLQ ................................... MRALL
300 ARKDTQSPQG ARKDTQSQQG PPRPPPSPDL KNF ...... S KKF ...... S STLKLIFQLE KTFKLIFTLE RWLKQQSVIS KWLRAQQEPI PYLA ..... L
Cydchaein . . . . . . . . . . . . . . . . . . . . . . . . . MRTLL PFIR ..... L HlybesccoiL~REDGRHFILT.KISKEVNRYLIFDLEQRNPRV..LEQSEFEALYQG Hlybpasha ALVWQDNGKH FLLV.KVDTD NNRYLTYNLE QDAPQI..LS Q D E F E A C Y Q G Rt3bactpl ALIWRDDGQH VILM.KIDTQ TNRYLIFDLE E R N P K V . . L S A A E F H E I F Q G Hlybactac A L A W R D D G E H FILL.KIDQE TDRYLIFDLI Q K N P I V . . L D KNEFEERYQS Hlybactpl ALVWREDGKH FILT.KIDNE AKKYLIFDLE THNPRI..LK Q T E F E S L Y Q G Cyabborpe AIALDRQGGY FVLVPRFEPG ADQAVLIQRP GQAPAR..LG Q A E F E A L W A G Comastrpn FVAHVLKEGK L L H Y Y W T G Q DKDSIHIADP DPGVKLTKLP R E R F E E E W T G Lcnclacla FIAHVIKDQK YPHYYVITGA NKNSVFIADP DPTIKMTKLS KEAFLSEWTG Peddpedac VIVHVFKKNK LPHYYVVYQV TENDLIIGDP DPTVKTTKIS KSQFAKEWTQ Cvabescco C I L H W D F S H F W L . . . V S V K RNRYVL...H DPARGIRYIS R E E M S R Y F T G Lepbschpo ESHVVDVVKD PFEQYTPEEQ EILYKQINDT PA..KLSGYP R I L S Y A D K W D Pmdlschpo E S H V V D W K D PFEQYTPEEQ EILYKQINDT PA..KLSGYP R I L S Y A D K W D Pglyarath . . . . . . . . . . . . . . . . . . . . . . . . EEPKKA EI..RGVAFK E L F R F A D G L D Pgplarath . . . . . . . . . . . . . . . . . . . . . . . . EEPKKA EI..RGVAFK E L F R F A D G L D Mdr2musmu GREKKK KV NLIGLL T L F R Y S D W Q D Mdr3crigr . . . . . . . . . . . . . . . . . . . . . . . . GRNKKK KV..NLIGPL T L F R Y S D W Q D Mdr3homsa . . . . . . . . . . . . . . . . . . . . . . . . KRKKTK TV..KMIGVL T L F R Y S D W Q D Mdr imusmu . . . . . . . . . . . . . . . . . . . . . . . . KEKKEK .K.. PAVGVF G M F R Y A D W L D Mdr2cr igr . . . . . . . . . . . . . . . . . . . . . . . . KEKKEK EN..PNVGIF G M F R Y A D W L D Mdr icr igr . . . . . . . . . . . . . . . . . . . . . . . . KEKKEK .K.. PVVSVF T M F R Y A G W L D Mdr3musmu . . . . . . . . . . . . . . . . . . . . . . . . KEKKEK .K..PAVSVL T M F R Y A G W L D Mdrlhomsa KDKKEK K PTVSVF SMFRYSNWLD Mdr4drome .GSVVD AT RKYSYF DLFRYSTRCE Mdr5drome . . . . . . . . . . . . . . . . . . . . . . . . MAEGLE PT..EPIAFL KLFRFSTYGE Mdrlcaeel . . . . . . . . . . . . 
. . . . . . . . . . . . R D A K E E V V . . N K V S I P QLYRYTTTLE Mdr3caeel . . . . . . . . . . . . . . . . . . . . . . . . PQDSYQ G ...... NFF DVFRDADYKD Mdrlleien GIAGKDGSTR DCSGYGSQGP LFSAEEEVKG T W R E T V G P I EIFRYADATD Msbaescco . . . . . . . . . . . . . . . . . . . . . . . MHNDKDL STWQTFRRLW PTIAPFKA.. Msbahaein . . . . . . . . . . . . . . . . . . . M QEQKLQENDF STLQTFKRLW PMIKPFKA.. Hetaanasp ................ MPKS PHKLFKANSF W K E N N L . . I L R E I K H F R K I A Mdrplafa EQKEKKDGNL SIKEEVEKEL N K K S T A E L F R KIKNEKISFF LPFKCLPAQH Mdllsacce QSDIAQGKKS T K P T L K L S N A ......... N SKSSGFKDIK RLFVLSKP.E Mdl2sacce NSSKTVPETS LPSASPISKG SARSAHAKEQ SKTDDYKDII R L F M L A K R . D Mt2ratno ADYGAV.ALS L A V W A V L S P A ...GAQEKEP GQENNRALMI R L L R L X K P . D Tap2musmu AGYGAV.ALS W A V W A V L S P A ...GVQEKEP GQE.NRTLMK RLLKLSRP.D Tap2homsa VGYGAA.GLS WSLWAVLSPP ...GAQEKEQ D Q V N N K V L M W RLLKLSRP.D Taplhomsa T A F V V S Y A A A L P A A A L W H K L ...GSLWVPG GQGGSGNPVR RLLGCLGS.E Taplmusmu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MLC RMLGFLGP.K Chvaagrtu . . . . . . . . . . . . . . . . . . . . . . . . . . . . MT LFQVYTRALR Y L . . . T V H K W Ndvarhime ILAVGSRRNA LPHRAVAAPI PIPERGETVS LFQVYARALQ YL...AVHKF Atmlsacce VEQF ...... SSAPKVKTQV KKTSKAPTLS ELKILKDLFR Y I W P K G N N K V Hmtlschpo TVQLGVSAST SNFGTLKSTS KKPSDKSWAE YFRSFSTLLP YLWPTKDYRL Ste6sacce . . . . . . . . . . . . . . . . . . . . . . . . . . . . MN FLSFKTTKHY HIFRYVNIRN Syrdpsesy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MKTKQE Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Nistlacla
301 ARAIWRA.LC ARAIWRA.LC LRTLFKVHWY NILIY..AHK GYIITLSPNS KRYAIYLIVL
HAFGRRLILS STFRILADLL HAFGRRLVLS STFRILADLL HVWA ......... QILPKLL KKTKKEGKRK QFFLKSFIFT SFTKKKRISE IIFPLKKIFK N A I T A F V P L A SLFIYQDLIN
350 GFAGPLCIFG IVDHLGKENH GFAGPLCIFG IVDHLGKENH SDVTALMLPV LLEYFVK... KFKRYFFSLI ILSFVSQLLL NRNTFLYIFS L . . F I S Q I V A SVLGSGRH ............
Spabbacsu RSLFLKLIRF SIITGILPIV Cyddescco QRWLNISRLL GFVSGILIIA Cyddhaein KKLMRANIVL ATLSSFILVA Cydcescco YKRHKWMLSL GIVLAIVTLL Cydchaein FKFAKFPLIL GLVLMILGLG Aprdpseae ............. MARLGSS Prtderwch H lybescco~ii{~~$~.MNASSE
H1ybpashaQ I WS
SVVG
SLYISQELIN QAWFMARILQ QTYFLATLLD ASIGLLTLSG SSMGLLTVSG VTNEIKQALA
SLVTIRKEVS HMIMEN.IPR KLIMQN.VPR WFLSASAVAG WFLAATAIAG ASRGALRSVA
I ......... EALLLPFTLL DELIPYFLGL VAGLYSFNYM LGTL..FNFF AFSGVINLLM
RDRSLFGVLRQFRRSFWSVGIFSAVINVLM
IVSI QI A
Rt3bactpl GMILITSRAS IMG QLA.KF DFTWFIPAVI KYRKIFVETI IVSIFLQLFA Hlybactac KVILIASRAS IVG.NLA.KF DFTWFIPAVI KYRKIFIETL IVSIFLQIFA H l y b a c t p l K L I L V A S R A S IVG KLA K F D F T W F I P A V I KYRKIFIETL IVSIFLQIFA Cyabborpe ELLLCACAAS PTQ.ALA.RF DFSWFIPALV KHRHLIGEVL LISLVLQFIS Comastrpn VTLFMAPSPD YKPHKEQ.KN GLLSFIPILV KQRGLIANIV LATLLVTVIN Lcnclacla ISLFLSTTPS YHPTKEK.AS SLLSFIPIIT RQKKVILNIV IASFIVTLIN Peddpedac IAIIIAPTVK YKPIKES.RH TLIDLVPLLI KQKRLIGLII TAAAITTLIS Cvabescco VALEVWPGSE FQSETLQTRI SLRSLINSIY GIKRTLAKIF C L S W I E A I N Pmdlschpo IMLQLAGTIT GIGAGLGMPL MSLVSGQLAQ AFTDLASGKG ASS ....... Pglyarath YVLMGIGSVG AFVHGCSLPL F L R F F A D L V N S F G S N S N N V E .......... Pgplarath YVLMGIGSVG AFVHGCSLPL F L R F F A D L V N S F G S N S N N V E .......... Mdr2musmu KLFMFLGTLM AIAHGSGLPL MMIVFGEMTD KFVDNTGNFS LPVNF.SLSM Mdr3crigr KLFMLLGTIM AIAHGSGLPL MMIVFGEMTD KFVNNAGNFS LPVNF.SLSM Mdr3homsa KLFMSLGTIM AIAHGSGLPL MMIVFGEMTD KFVDTAGNFS FPVNF.SLSL Mdrlmusmu KLCMILGTLA AIIHGTLLPL LMLVFGNMTD SFTKAE..AS ILPSITNQSG Mdr2crigr KLYMVLGTLA AVLHGTSLPL LMLVFGNMTD SFTKAE..TS IWPNMTNQSE Mdrlcrigr RLYMLVGTLA AIIHGVALPL MMLVFGDMTD SFASVGNIPT ..NATNNATQ Mdr3musmu RLYMLVGTLA AIIHGVALPL MMLIFGDMTD SFASVGNVSK ..NST.NMSE Mdrlhomsa KLYMVVGTLA AIIHGAGLPL MMLVFGEMTD IFANAGNLED LMSNITNRSD Mdr4drome RFLLVVSLLV ATAASAFIPY FMIIYGEFTS LLVDRTVGVG TSSPAFALPM Mdr5drome IGWLFFGFIM C C I K A L T L P A V V I I Y S E F T S MLVDRAMQFG TSSNVHALPL Mdrlcaeel KLLLFIGTLV AVITGAGLPL MSILQGKVSQ AFINEQIVIN .......... Mdr3caeel YILFSGGLIL SAVNGALVPF NSLIFEGIAN ALMEGESQYQ NGTINMPW.. Mdrlleien RVLMIAGTAF AVACGAGMPV FSFIFGRIAM DLMSGVGSAE .......... Msbaescco ..GLIVAGVA LILNAASDTF MLSLLKPL ...... LDDGFG KTDRSV .... Msbahaein ..GLIVSGVA LVFNALADSG LIYLLKPL ...... LDDGFG KANHSF .... Hetaanasp ILAVIFSFLA ASFEGVSIGF LLSFLQKLTS P N D P I Q T G I S W V D M I L A A D A Mdrplafa RKLLFISFVC AVLSGGTLPF FISVFGVILK NM .................. 
Mdllsacce SKYIGLALLL ILISSSVSMA VPSVIGKLLD LASESDGEDE EGSKSNKLYG Mdl2sacce WKLLLTAILL LTISCSIGMS IPKVIGIVLD TLKTSSGSDF FDLKI.PIFS Mt2ratno LPFLIVAFIF LAMAVWWEMF IPHYSGRVID ILGGDFDPDA FASAIF .... Tap2musmu LPFLIAAFFF LVVAVWGETL IPRYSGRVID ILGGDFDPDA FASAIF .... Tap2homsa LPLLVAAFFF LVLAVLGETL IPHYSGRVID ILGGDFDPHA FASAIF .... Taplhomsa T R R L S L F L V L V V L S S L G E M A IPFFTGRLTD WILQDGSADT FTRNLT .... Taplmusmu KRRLYLVLVL LILSCLGEMA IPFFTGRITD WILQDKTVPS FTRNIW .... Chvaagrtu RVAVVVIANV ILAA..ITIA EPVLFGRIID AISSGTNVTP I ....... LI Ndvarhime RVGAIVIANI VLAA..ITIA EPILFGRIID AISSQKDVAP M ....... LL Atmlsacce RIRVLIALGL LISAKILNVQ VPFFFKQTID SMNIAWDDPT VALPAAIGLT Hmtlschpo QFQIFICIVL LFLGRAVNIL APRQLGVLTE KLTKHSEK.. IPWSDVILFV Ste6sacce DYRLLMIMII GTVATGLVPA ITSILTGRVF DLLSVFVANG SHQGLY .... Syrdpsesy KKARPGSIMR LLWSSHPWLT FFTLLTGLIS GVASIAVVNV INQAIHEETF Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
351 400 Surcricr VFQPKTQFLG VYFVSSQEFL GNAYVLAVLL FLALLLQRTF LQASYYVAIE Surratno VFQPKTQFLG VYFVSSQEFL GNAYVLAVLL FLALLLQRTF LQASYYVAIE Mdrleita ........... YLNADNATW GWGLGLALTI FLTNVIQSCS AHKYDHISIR Hemsentfa LLIPIATKYS IDNIRSFQEI PTYVLLLILT SFFSVL.YVVQYLKSSVVAE Lcn31aclaLWFSIILR...DILNKSHDI TYSFIMMISLVLFQTL.SLL ..MKLGAQKN Nistlacla ...................... LINIIIIY FIVQVITTVL GQLESYVSGK Spabbacsu ...................... VITIFLTY LGVSFFSELI SQISEFYNGK Cyddescco V ........................ LTFVL RAWVV ...... WLRERVGYH Cyddhaein I ........................ IGFGM R A I I L . . . . . . W A R E K I G F Q Cydcescco L ........................ PAAGV RGAAITRTAG RYFERLVSHD Cydchaein Y ........................ PSASV RGLAIGRTVM RYFEKIVTHD Aprdpseae LVPSLYMLQV YDRVLSSANE VTLLMLTLMA LGVFVFMGAL EALRSFVLVR Prtderwch LAPSVYMLQV YDRVLASGNG ITLLMLTLLM AGLCAFMGAL EWVRSLLVVR Hlybescco LITPLFFQVVMDKVLVHRGF STLNVITVAL SVVVVFEIIL SGLRTYIFAH Hlybpasha LITPLFFQVVMDKVLVHRGF STLNIITVAL AIVIIFEIVL SGLRTYVFSH Rt3bactpl LITPLFFQVVMDKVLVHRGF STLNVITVAL SVVVIFEIVL SGLRTYIFSH Hlybactac LITPLFFQVVMDKVLVHRGF STLNVITVAL AIVVLFEIIL GGLRTYVFAH Hlybactpl LITPLFFQVVMDKVLVHRGF STLNVITVAL AIVVLFEIVL NGLRTYIFAH Cyabborpe LLTPLFFQVVMDKVLVNNAM ETLNVITVGF LAAILFEALL TGIRTYLFAH Comastrpn IVGSYYLQSI IDTYVPDQMR STLGIISIGL VIVYILQQIL SYAQEYLLLV Lcnclacla ILGSYYLQSM IDSYIPNALM GTLGIISVGL LLTYIIQQVL EFAKAFLLNV Peddpedac IAGAYFFQLI IDTYLPHLMT NRLSLVAIGL IVAYAFQAII NYIQSFFTIV Cvabescco LLMPVGTQLV MDHAIPAGDR GLLTLISAAL MFFILLKAAT STLRAWSSLV P m d l s c h p o . . . . . . . . . . . . . . . . . . FQ H T V D H F C L Y F I Y I A I G V F G C S Y I Y T V T F I I P g l y a r a t h . . . . . . . . . . . . . . . . . KMM E E V L K Y A L Y F L V V G A A I W A S S W A E I S C W M W P g p l a r a t h . . . . . . . . . . . . . . . . . KMM E E V L K Y A L Y F L V V G A A I W A S S W A E I S C W M W M d r 2 m u s m u L N P G R I . . . . . LEE . . . . . . . 
E M T R Y A Y Y Y S G L G G G V L V A A Y I Q V S F W T L M d r 3 c r i g r I N P G R I . . . . . LEE . . . . . . . E M T R Y A Y Y Y S G L G G G V L V A A Y I Q V S F W T L M d r 3 h o m s a L N P G K I . . . . . LEE . . . . . . . E M T R Y A Y Y Y S G L G A G V L V A A Y I Q V S F W T L M d r l m u s m u P N S T L I I S N S SLEE . . . . . . . E M A I Y A Y Y Y T G I G A G V L I V A Y I Q V S L W C L M d r 2 c r i g r I N N T E V I S . G SLEE . . . . . . . D M A T Y A Y Y Y T G I G A G V L I V A Y I Q V S F W C L M d r l c r i g r V N A S D I F . . G KLEE . . . . . . . E M T T Y A Y Y Y T G I G A G V L I V A Y I Q V S F W C L M d r 3 m u s m u A D K R A M F . . A KLEE . . . . . . . E M T T Y A Y Y Y T G I G A G V L I V A Y I Q V S F W C L M d r l h o m s a I N D T G F F . . M NLEE . . . . . . . D M T R Y A Y Y Y S G I G A G V L V A A Y I Q V S F W C L Mdr4drome FGGGQQLTNA SKEENNQAII DDATAFGIGS LVGSVAMFLL ITLAIDLANR Mdr5drome FGGGKTLTNA SREENNEALY DDSISYGILL TIASVVMFIS GIFSVDVFNM Mdrlcaeel .NNGSTFLPT GQNYTKTDFE HDVMNVVWSYAAMTVGMWAAGQIIVTCYLY M d r 3 c a e e l . . . . . . . . . . . . . . . . . . FS S E I K M F C L R Y F Y L G V A L F L C S Y F A N S C L Y T Mdrlleien .................... EKAAKTSLIM VYVGIAMLIA CAGHVMCWTV Msbaescco ...................... LVWMPLVVIGLMILRGIT SYVSSYCISW Msbahaein ...................... LKMMAFVVVGMIILRGIT NFISNYCLAW H e t a a n a s p WP . . . . . . . . . . . . . . . . . I P P I Y R I S L L I L L S T W M R A T F N Y F G G V Y T E S Mdrplafa ................. NLG DDINPIILSL VSIGLVQFIL SMISSYCMDV Mdllsacce F .................... TKKQFFTAL GAVFIIGAVA NASRIIILKV Mdl2sacce L .................... PLYEFLSFF TVALLIGCAANFGRFILLRI Mt2ratno ......................... FMCLF S V G S S L S A G C R G G S F L F . . . T a p 2 m u s m u . . . . . . . . . . . . . . . . . . . . . . . . . FMCLF S V G S S F S A G C R G G S F L F . . . T a p 2 h o m s a . . . . . . . . . . . . . . . . . . . . . . . . . FMCLF S F G S S L S A G C R G G C F T Y . . . 
Taplhomsa ......................... LMSIL TIASAVLEFV GDGIYNN... Taplmusmu ......................... LMSIL TIASTALEFA SDGIYNI... C h v a a g r t u LW . . . . . . . . . . . . . . . . . . . . . . . . . . . . AGFGVFNTVA YVAVAREADR Ndvarhime LW ............................ AGFGVFNTIA FVLVSREADR Atmlsacce ILCYGVARFG S ................... VLFGELRNAV FAKVAQNAIR
Hmtlschpo Mdlescco Ste6sacce Syrdpsesy Consensus
IYRFLQGNMG ..................... VIGSLRSFL WVPVSQYAYR VPPKVVGIVV DGVTEQHFTT GQILMWIATM VLIAVVVYLL RYVWRVLLFG .................... SQLVQRSMAV MALGAASVPV MWLSLTSWMH QRQSL ...................... FWF V G L S W A L L F RNGASLFPAY ..................................................
401 Surcricr TGINLRGAIQ TKIYNKIMHM Surratno TGINLRGAIQ TKIYNKIMHL Mdrleita TAALFETSSM ALLFEKCFTV Hemsentfa FQYEFDFKLM FSYIDKLFSM Lcn31acla TNLLYESKIS RQIFKGIFSR Nistlacla FDMRLSYSIN MRLMRTTSSL Spabbacsu FQLNIGYKLN YKVMKKSSNL Cyddescco AGQHIRFAIR RQVLDRLQQA Cyddhaein SGQLLRNHIR QKILDKIHLV Cydcescco ATFRVLQHLR IYTFSKLLPL Cydchaein ATFRILSKLR VQVFEKIIPL Aprdpseae VSERFDGQLH GRIYAAAFER Prtderwch LGTRIDLALN QDVFNAAFAR Hlybescco STSRIDVELG AKLFRHLLAL Hlybpasha STSRIDVELG AKLFRHLLSL Rt3bactpl STSRIDVELG AKLFRHLLAL Hlybactac STSRIDVELG ARLFRHLLAL Hlybactpl STSRIDVELG ARLFRHLLAL Cyabborpe TSSKLDVELG ARLYAHLLRL Comastrpn LGQRLSIDVI LSYIKHVFHL Lcnclacla LSQRLAIDVI LSYIRHIFQL Peddpedac LGQRLMIDIV LKYVHHLFDL Cvabescco MSTLINVQWQ SGLFDHLLRL Pmdlschpo AGERIARRIR QDYLHAILSQ Pglyarath SGERQTTKMR IKYLEAALNQ Pgplarath SGERQTTKMR IKYLEAALNQ Mdr2musmuAAGRQIKKIR QKFFHAILRQ Mdr3crigrAAGRQIKKIR QNFFHAILRQ Mdr3homsaAAGRQIRKIR QKFFHAILRQ MdrlmusmuAAGRQIHKIR QKFFHAIMNQ Mdr2crigrAAGRQINKIR QKFFHAIMNQ MdrlcrigrAAGRQIHKIR QKFFHAIMNQ Mdr3musmuAAGRQIHKIR QKFFHAIMNQ MdrlhomsaAAGRQIHKIR KQFFHAIMRQ Mdr4drome IALNQIDRIR KLFLEAMLRQ Mdr5drome VALRQVTRMR IKLFSSVIRQ Mdrlcaeel VAEQMNNRLR REFVKSILRQ Mdr3caeel LCERRLHCIR KKYLKSVLRQ MdrlIeienAACRQVARIR LLFFRAVLRQ Msbaescco VSGKVVMTMR RRLFGHMMGM Msbahaein VSGKVVMTMR RRLFKHLMFM Hetaanasp AQLNLADRLH KQIFEQLQAL Mdrplafa ITSKILKTLK LEYLRSVFYQ Mdllsacce TGERLVARLR TRTMKAALDQ Mdl2sacce LSERVVARLR ANVIKKTLHQ Mt2ratno AESRINLRIREQLFSSLLRQ Tap2musmu TMSRINLRIR EQLFSSLLRQ
STSNLSMGEM STSNLSMGEM SRRSLQRPDM PLMYF..SNR PLLYF..RNN ELSDY..EQA ALKDF..ENP GPAWI..QGK GPATI..NQK SPAGL..ARY SPAVL..NRY ...... NLRA ...... NLEA .PISYFESR. .PISYFENR. .PISYFENR. .PISYFEAR. .PISYFENR. .PLAYFQAR. .PMSFFATR. .PMSFFSTR. .PMNFFTTR. .PLAFFERR. .NIGYFD.RL .DIQFFDTEV .DIQFFDTEV .EMGWFDIK. .EMGWFDIK. .EIGWFDIN. .EIGWFDVH. .EIGWFDVH. .EIGWFDVH. .EIGWFDVH. .EIGWFDVH. .DIAWYDTS. .DIGWHDLA. .EISWFDTNH .DAKWFD.ET .DIGWHDEH. .PVSFFD.KQ .PVSFFD.QN .RLSYFA.QT .DGQFHD.NN .DATFLD.TN .DAEFFD.NH .DLAFFQ.ET .DLGFFQ.ET
450 TAGQICNLVA IDTNQLMWFF TAGQICNLVA IDTNQLMWFF SVGRIMNMVG NDVDNIGSLN STGELVFRAN LNIYIRQILS SVGTIIEKIN LRTGIRDGIL DMYNIIEKVT QDSTYKPFQL EIYDKLERVT KEISYKPYQI PAGSWATLVL EQIDDMHDYY PAGSWASIML EQVENLHNFY RQGELLNRVVADVDTLDHLY RNSDLLNRLV SDVDTLDSLY GGQEASQALH DLTTLRQFIT GDGRAGLALT DLTLLRQFIT RVGDTVARVR ELDQIRNFLT RVGDTVARVR ELDQIRNFLT RVGDTVARVR ELDQIRNFLT RVGDTVARVR ELDQIRNFLT RVGDTVARVR ELDQIRNFLT RVGDSVARVR ELEHIRAFLT RTGEIVSRFT DANSIIDALA RTGEITSRFS DASSILDAIA HVGEMTSRFS DASKIIDALG KLGDIQSRFD SLDTLRATFT GAGEITTRIT TDTNFIQDGL R T S D W F A I N TDAVMVQDAI R T S D W F A I N TDAVMVQDAI GTTELNTRLT DDVSKISEGI GTTELNTRLT DDISKISEGI DTTELNTRLT DDISKISEGI DVGELNTRLT DDVSKINDGI DIGELNTRLT DDVSKINDGI DVGELNTRLT DDVSKINEGI DVGELNTRLT DDVSKINEGI DVGELNTRLT DDVSKINEVI SGSNFASKMT EDLDKLKEGI SKQNFTQSMV DDVEKIRDGI S.GTLATKLF DNLERVKEGT TIGGLTQKMS SGIEKIKDGI SPGALTARMT GDTRVIQNGI STGTLLSRIT YDSEQVASSS STGRLLSRIT YDSQMIASSS RSGELINTIT TEIERIKQGF PGSKLRSDLD FYLEQVSSGI RVGDLISRLS SDASIVAKSV KVGDLISRLG S D A Y W S R S M KTGELNSRLS SDTSLMSQWL KTGELNSRLS SDTSLMSRWL
Tap2homsa Taplhomsa Taplmusmu Chvaagrtu Ndvarhime Atmlsacce Hmtlschpo Mdlescco Ste6sacce Syrdpsesy Consensus
TMSRINLRIR EQLFSSLLRQ .DLGFFQ.ET KTGELNSRLS SDTTLMSNWL TMGHVHSHLQ GEVFGAVLRQ .ETEFFQ.QN QTGNIMSRVT EDTSTLSDSL TMGHMHGRVH REVFRAVLRQ .ETGFFL.KN PAGSITSRVT EDTANVCESI LAHGRRATLL TEAFGRIISM .PLSWHHLRG TSNALHTLLR ASETLFGLW. LAHGRRASLL TEAFGRIVSM .PLSWHSQRG TSNALHTLLR ACETLFGLW. ....... TVS LQTFQHLMKL .DLGWHLSRQ TGGLTRAMDR GTKGISQVLT ....... AIS TKALRHVLNL .SYDFHLNKR AGEVLTALTK GS.SLNTFAE ASYQLAVELR EDYYRQLSRQ .HPEFY.LRH RTGDLMARAT NDVDRVVFAA IGERQGFRIR SQILEAYLEE KPMEWYDNNE KLLGDFTQIN RCVEELRSSS ASMRIMTRLR IALCRKILGT PLEE..VDRR GAPNVLTLLT SDIPQLNATL ..................................................
451 Surcricr FLCPNLWTMP Surratno FLCPNLWAMP Mdrleita WYVMYFWSAP Hemsentfa QKVITTLIDS Lcn31acla LKIFPSLLNF Nistlacla FNAIIVELSS Spabbacsu IQAIITMTTS Cyddescco ARYLPQMALA Cyddhaein ARFLPQQSLS Cydcescco LRVISPLVGA Cydchaein LRLLAPFFTA Aprdpseae GQALFAFFDA Prtderwch GNALFAFFDV Hlybescco GQALTSVLDL Hlybpasha GQALTSVLDL Rt3bactpl GQALTSVLDL Hlybactac GQALTSILDL Hlybactpl GQALTSVLDL Cyabborpe GNAVTVLLDV Comastrpn STILSIFLDV Lcnclacla STILSLFLDL Peddpedac STTLTLFLDM Cvabescco TSVIGFIMDS Pmdlschpo GEKVGLVFFA Pglyarath SEKLGNFIHY Pgplarath SEKLGNFIHY Mdr2musmu GDKVGMFFQA Mdr3crigr GDKVGMFFQA Mdr3homsa GDKVGMFFQA Mdrlmusmu GDKIGMFFQS Mdr2crigr GDKIGMFFQS Mdrlcrigr GDKIGMFFQA Mdr3musmu GDKIGMFFQA Mdrlhomsa GDKIGMFFQS Mdr4drome GEKIVIVVFL Mdr5drome SEKVGHFVYL Mdrlcaeel GDKIGMAFQY Mdr3caeel GDKVGVLVGG Mdrlleien NDKLSQGIMN Msbaescco SGALITVVRE Msbahaein SGSLITIVRE
VQII..VGVI VQII..VGVI LQLV..LCLL LFLG..IYLF FTVF..IVII FISL..LSSL FVTL..LSSI VSVP..LLIV AIVP..VVIF FVVI..MVVT VFVI..IAMM PWFP..VYLL PWFP..LFLL LFSL..IFFA LFSF..IFFA LFSF..IFFA LFSF..IFFA MFSF IFFA VFSV..VFIA STW..IISL TIW..MTGL WILL..AVGL IMW..GVCV IATF..VSGF MATF..VSGF MATF..VSGF IATF..FAGF VATF..FAGF VATF..FAGF ITTF..LAGF IATF..LAAF MATF..FGGF MATF..FGGF MATF..FTGF IMTF..VIGI VVGF..IITV LSQF..ITGF VATF..ISGV GSMG..VIGY GASI..IGLF GAYI..ISLF
500 LLYYILGVSA LIGAAVIILL APVQYFVATK LLYYILGVSA LIGAAVIILL APVQYFVATK LLIRLVGWLR VPGMAVLFVT LPLQAVISKH LMV ..... NY SILLTIIALV LISLIAFLSI YLG ..... TI SFTLTLF.LV IMNLLYMIFS FFI..GTWNI GVAILLLIVP VLSLVLFLRV AFL..MSWNP KVSLLLLVIP VISLFYFLKI VAI .... FPS NWAAALILLG TAPLIPLFMA IAV .... FPL NWAAGLILMI TAPLVPLFMI IGLSFLDFTL AFTLGGIMLL TLFLMPPLFY IGLSFINIPL ALGLGLFLLI LLMIIPTVFY V.I .... FLF DPWLGLLSLV GALALMALAW V.L .... FLL HPWLGMLALG GTVVPGGVGL V.M .... WYY SPKLTLVILF SLPCYAAWSV V.M .... WYY SPKLTLVILG SLPCYILWSI V.M .... WYY SPKLTIVILL SLPCYIAWSI V.M .... WYY SPKLTLVVLG SLPCYVIWSV V M WYY SPKLTLVILG SLPFYMGWSI V.M .... FFY SVKLTLVVLA ALPCYFLLSL V.L .... FSQ NTNLFFMTLL ALPIYTVIIF I.L .... GLQ NMQLFLLVLL AIPLYIVVII F.L .... AYQ N I N L F L C S L V V V P I Y I S I V W M.M .... LLY GGYLTWIVLC FTTIYIFIRL V.I...AFIR HWKFTLILSS MFPAICGGIG I.V...GFTAVWQLALVTLAVVPLIAVIGG I.V...GFTAVWQLALVTLAVVPLIAVIGG I.V...GFIR GWKLTLVIMA ISPILGLSTA I.V...GFIR GWKLTLVIMA ISPILGLSAA I.V...GFIR GWKLTLVIMA ISPILGLSAA I.I...GFIS GWKLTLVILA VSPLIGLSSA I.V...GFIS GWKLTLVILA VSPLIGLSSA I.I...GFTR GWKLTLVILA ISPVLGLSAG I.I...GFTR GWKLTLVILA ISPVLGLSAG I.V...GFTR GWKLTLVILA ISPVLGLSAA V.S...AFVY GWKLTLVVLS CVPFIIAATS A.I...SFSY GWKLTLAVSS YIPLVILLNY I.V...AFTH SWQLTLVMLA VTPIQALCGF S.I...GFYM CWQLTLVMMI TVPLQLGSMY I.A...GFVF SWELTLMMIG MMPFIIVMAA I.M...MFYY SWQLSIILIV LAPIVSIAIR A.V...MFYT SWELTIVLFI IGPIIAVLIR
Hetaanasp Mdrplafa Mdllsacce Mdl2sacce Mt2ratno Tap2musmu Tap2homsa Taplhomsa Taplmusmu Chvaagrtu Ndvarhime Atmlsacce Hmtlschpo Mdlescco Ste6sacce Syrdpsesy Consensus
SGLAFVLTRI MT.V..CVYF V.V...MFSI SWQLSIISVL IFLLLAVGLS GTKFITIFTYASSF..LGLYI.W...SLIKNARLTLCITCVFPLIYVCGV TQNVSDGTRA IIQG..FVGF G.M...MSFL SWKLTCVMMI LAPPLGAMAL TQKVSDGVKA LICG..VVGV G.M...MCCL SPQLSILLLF FTPPVLFSAS SLNANILLRS LVKV..VGLYY.F...MLQVSPRLTFLSLLDLPLTIAAEK PFNANILLRS LVKV..VGLY F.F...MLQV SPRLTFLSLL DLPLTIAAEK PLNANVLLRS LVKV..VGLY G.F...MLSI SPRLTLLSLL HMPFTIAAEK SENLSLFLWY LVRG..LCLL G.I...MLWG SVSLTMVTLI TLPLLFLLPK SDTLSLLLWY LGRA..LCLL V.F...MFWG SPYLTLVTLI NLPLLFLLPK ...LEFMRTH LATF..VALV L.LIPTAMAM DLRLSFVLIG LGIVYWFIGK .~ LATA..VALM L.LIPTAFAMDVRLSLILVVLGAAYVMISK AMVFHIIPIS FEIS..VVCG I.LT...YQF GASFAAITFS TMLLYSIFTI QVVFQIGPVL LDLG..VAMV Y.FF...IKF DIYFTLIVLI MTLCYCYVTV GEGVLTLVDS LVMA..CAVL I.MMSTQI.. SWQLTLFSLLPMPVMAIMIK AEASAITFQN LVAI..CALL G .... TSFYY SWSLTLIILC SSPIITFFAV LIMPTILVES AVFLFGIAYL AYLSWVVFAI TISLMILGVA MYLLFFMGGM ..................................................
501 Surcricr L ...... SQA QRTTLEHSNE RLKQTNEMLR GMKLLKLYAW Surratno L ...... SQA QRTTLEYSNE RLKQTNEMLR GIKLLKLYAW Mdrleita V ...... QDV S E R M A S W D L RIKRTNELLS GVRIVKFMGW Hemsentfa INSHTIKRFV DKEIMEQG.N VQRIITEAIE GIETIKSANA Lcn31acla FSLISIKRQA NIQYTQQTID FTSVVQEDLN QIEQIKAQAN Nistlacla GQLEFLIQWQ RAS.SERETW YIVYLLTHDF SFKEIKLNNI Spabbacsu GQEEFFIHWK RAG.KERKSW YISYILTHDF SFKELKLYNL Cyddescco LVGMGAADAN RRN.FLALAR LSGHFLDRLR GMETLRIFGR Cyddhaein IVGIAAADNS QKN.MDTLSR LSAQFLDRLR GLETLRLFNR Cydcescco RAGKSTGQ...NL.THLRGQ YRQQLTAWLQ GQAELTIFGA Cydchaein RLGQQFGE...RL.IQARAT YRTQFLEFIQ AQAELLLFNA Aprdpseae FNERATRAPL AKA.GELSIK SGQLASNNLR NAEVIEAMGM Prtderwch AEPASDQSTA GGS.NQQSQQ ATHLADAQLR NADVIEAMGM Hlybescco FISPILRRRL DDK.FSRNAD NQSFLVESVT AINTIKAMAV Hlybpasha FISPILRRRL DEK.FARSAD NQAFLVESVT SINMIKAMAV Rt3bactpl FISPILRRRL DEK.FARNAD NQSFLVESVS AIDTIKALAV Hlybactac FISPILRRRL DDK.FARNAD NQSFLVESVT AINTIKAMAI Hlybactpl FISPILRRRL DEK.FARGAH NQSFLVESVT AINTIKALAV Cyabborpe VLTPVLRRRL HVK.FNRGAE NQAFLVETVS GIDTVKSLAV Comastrpn AFMKPFEKMN RDT.MEANAV LSSSIIEDIN GIETIKSLTS Lcnclacla IFTPLFEKQN HEV.MQTNAV LNSSIIEDIN GIETIKALAS Peddpedac LFKKTFNRLN QDT.MESNAV LNSAIIESLS GIETIKSLTG Cvabescco VTYGNYRQIS EEC.LVREARAASYFMETLY GIATVKIQGM Pmdlschpo LGVPFITKNT KGQ.IAVVAE SSTFVEEVFS NIRNAFAFGT Pglyarath IHTTTLSKLS NKS.QESLSQ AGNIVEQTVVQIRVVMAFVG Pgplarath IHTTTLSKLS NKS.QESLSQ AGNIVEQ ......... PLNK Mdr2musmuVWAKILSTFS DKE.LAAYAK AGAVAEEAPG AIRTVIAFGG Mdr3crigrVWAKILSTFS DKE.LAAYAK AGAVAEEALG AIRTVIAFGG Mdr3homsaVWAKILSAFS DKE.LAAYAK AGAVAEEALG AIRTVIAFGG Mdrlmusmu LWAKVLTSFT NKE.LQAYAK AGAVAEEVLA AIRTVIAFGG Mdr2crigr MWAKVLTSFT NKE.LQAYAK AGAVAEEVLA AIRTVIAFGG Mdrlcrigr IWAKILSSFT DKE.LQAYAK AGAVAEEVLA AIRTVIAFGG Mdr3musmu IWAKILSSFT DKE.LHAYAK AGAVAEEVLA AIRTVIAFGG MdrlhomsaVWAKILSSFT DKE.LLAYAK AGAVAEEVLA AIRTVIAFGG Mdr4dromeVVARLQGSLA EKE.LKSYSDAANVVEEVFS GIRTVFAFSG
550 ESIFCSRVEV ENIFCSRVEK EPVFLARIQD KKSFLLNWKN EKECVKRWTK SNYFIHKFGK KDYLLNKYWD GEAEIESIRS TSEQTEHIEN SDRYRTQLEN EDKLKEKMLV LGSMRGRWER LGNLRRRWLA SPQMTNIWDK APQMTDTWDK TPQMTNIWDK SPQMTNIWDK TPQMTNTWDK EPQWQRNWDR ESQRYQKIDK EQERYQKIDY EATTKKKIDT VGIRGAHWLN QDILAKLYNK ESRASQAYSS NKKTHHHQQT QNKELERYQK QNKELERYQK QNKELERYQK QQKELERYNK QNKELERYNK QKKELERYNN QKKELERYNN QKKELERYNK QEKEKERFGK
Mdr5drome YVAKFQGKLT ARE.QESYAG AGNLAEEILS SIRTVVSFGG EKSEVQRYEN
Mdrlcaeel AIAKSMSTFA IRE.TLRYAK AGKVVEETIS SIRTVVSLNG LRYELERYST
Mdr3caeel LSAKHLNRAT KNE.MSAYSN AGGMANEVIA GIRTVMAFNA QPFEINRYAH
Mdrlleien IIGSIVSKIT ESS.RKYFAK AGSLATEVME NIRTVQAFGR EDYELERFTK
Msbaescco VVSKRFRNIS KNM.QNTMGQ VTTSAEQMLK GHKEVLIFGG QEVETKRFDK
Msbahaein LVSKIFRRLS KNL.QDSMGE LTSATEQMLK GHKVVLSFGG QHVEEVHFNH
Hetaanasp TLNKRVRETS FGI.SHANAQ FTAVAVEFIN GIRTIQAFGT QEFERQRFYK
Mdrplafa ICNKKVKLNK KTS.LLYNNN TMSIIEEALM GIRTVASYCG EKTILNKFNL
Mdllsacce IYGRKIRNLS RQL QTSVGG LTKVAEEQLN ATRTIQAYGG EKNEVRRYAK
Mdl2sacce VFGKQIRNTS KDL QEATGQ LTRVAEEQLS GIKTVQSFVA EGNELSRYNV
Mt2ratno VYNPRHQAVL KEI.QDAVAK AGQVVREAVG GLQTVRSFGA EEQEFRRYKE
Tap2musmu VYNPRHQAVL KEI.QDAVAK AGQVVREAVG GLQTVRSFGA EEQEVSHYKE
Tap2homsa VYNTRHQEVL REI.QDAVAR AGQVVREAVG GLQTVRSFGA EEHEVCRYKE
Taplhomsa KVGKWYQLLE VQV.RESLAK SSQVAIEALS AMPTVRSFAN EEGEAQKFRE
Taplmusmu KLGKVHQSLA VKV.QESLAK STQVALEALS AMPTVRSFAN EEGEAQKFRQ
Chvaagrtu WVMGRTKDGQ ASV.EEHYHS VFAHVSDSIS NVSVLHSYNR IEAETKALKS
Ndvarhime VVMSRTKEGQ AAV EGHYHT VFSHVSDSIS NVSVVHSYNR IEAETRELKK
Atmlsacce KTTAWRTHFR RDA.NKADNK AASVALDSLI NFEAVKYFNN EKYLADKYNG
Hmtlschpo KITSWRTEAR RKM.VNTWRE SYAVQNDAIM NFETVKNFDA DDFENERYGH
Mdlescco RNGDALHERF KLA.QAAFSS LNDRTQESLT SIRMIKAFGL EDRQSALFAA
Ste6sacce VFSRMIHVYS EKE.NSETSK AAQLLTWSMN AAQLVRLYCT QRLERKKFKE
Syrdpsesy KFTHKVRDEF TAFNEYTHAL VFGLKELKLN GIRR.RWFSR SAIQESSVRV
Consensus ..................................................
Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Nistlacla Spabbacsu Cyddescco Cyddhaein Cydcescco Cydchaein Aprdpseae Prtderwch Hlybescco Hlybpasha Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa
551 600 TRRKEMTSLRAFAVYTSISI FMNTAIP... IAAVL ..... ITFVGHVSFF TRRKEMTSLR AFAVYTSISI FMNTAIP... IAAVL ..... ITFVGHVSFF ARSRELRCLR DVHVANVFFM FVNDATPTLV IAVVF ..... ILY..HVS.. MFTSQLLITK NKNRYIAIFG ILPEIIQSVM PALFLIIGIK LII ....... KSAQTIFSYN KILNIDGITS AFNQGFNYIC VILMMIFGIY LNQ ....... LKKGFINQDL AIARKKTYFN IFLDFILNLI NILTIFAMIL SVR ....... IKKSFIEQDT KILRKKTLLN LIYEIAVQLV GAVIIFIAIM SAF ....... ASEDFRQRTM EVLRLAFLSS GILEFFTSLS IALVAVYFGF SYLGELDFGH ATEDFRETTM DVLKLAFLSS AVLEFFTSIS IALMAVYFGF SYLGQIEFGT TEIQWLEAQR RQSELTALSQ AIMLLIGALA VILM ..... L WMAS.GGVGG TEKTWQEDQA KEAKLSGFST ALVLFLNGLL ISGM ..... L WFASNADFGT LHQAFLDQQS LASERAARINALSKYLRIAL QSLVLGLGAW LAV ....... RHYRFISLQN LASERAAAVG GASKYSRIAL QSLMLGLGAL LAI ....... QLAGYVAAGF KVTVLATIGQ QGIQLIQKTV MIINLWLG ........ AHLV QLASYVSSSF RVTVLATIGQ QGVQLIQKTV MVINLWLG ........ AHLV QLASYVSADF RVTVLATIGQ QGVQLIQKTV MIINLWLG ........ AHLV QLASYVAVSF KVTVLATIGQ QGIQLIQKAV MVINLWLG ........ AHLV QLASYVSAGF RVTTLATIGQ Q G V Q F I Q K V V M V I T L W L G ........ AHLV QLAGYVAAGL SVANVAMLAN TGVTLISRL ..... LRWESC .... GWAHRG EFVDYLKKSF TYSRAESQQK ALKKVAHLLL NVGILWMG ........ AVLV EFASYLKKAF TLQKSEAIQG LIKAIIQLTL SVTILWFG ........ ATLV LFSDLLHKNL AYQKADQGQQ AIKAATKLIL TIVILWWG ........ TFFV MKIDAINSGI KLTRMDLLFG GINTFVTACD QIVILWLG ........ AGLV YLITAQRFGI NKAIAMGLMV GWMFFVAYGV YGLAFWEGGR LL ........ ALKIAQKLGY KTGLAKGMGL GATYFVVFCC YALLLWYGGY LV ........ NIKPNDNKSD EQRRAKS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . HLENAKKIGI KKAISANISM GIAFLLIYAS YALAFWYGST LVI ....... HLENAKKIGI KKAISANISM GIAFLLIYAS YALAFWYGST LVI ....... HLENAKEIGI KKAISANISM GIAFLLIYAS YALAFWYGST LVI .......
Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Msbaescco Msbahaein Hetaanasp Mdrplafa Mdllsacce Mdl2sacce Mt2ratno Tap2musmu Tap2homsa Taplhomsa Taplmusmu Chvaagrtu Ndvarhime Atmlsacce Hmtlschpo Mdlescco Ste6sacce Syrdpsesy Consensus
NLEEAKNVGI KKAITASISI GIAYLLVYAS YALAFWYGTS LVL ....... NLEEAKNVGI KKAVTANISI GIAYLLVYAS YALAFWYGTS LVL ....... NLEEAKRLGI KKAITANISM GAAFLLIYAS YALAFWYGTS LVI ....... NLEEAKRLGI KKAITANISM GAAFLLIYAS YALAFWYGTS LVI ....... NLEEAKRIGI KKAITANISI GAAFLLIYAS YALAFWYGTT LVL ....... LLIPAENTGR KKGLYSGMGN ALSWLIIYLC MALAIWYGVT LILDE..RDL FLVPARKASQ WKGAFSGLSD AVLKSMLYLS C A G A F W Y G V N L I I D D . . R N V AVEEAKKAGV LKGLFLGISF GAMQASNFIS F A L A F Y I G V G W V ........ QLNEARRMGI RKAIILAICT AFPLMLMFTC MAVAFWYGAT LA ........ AVLYAQGRGI RKELASNLSA AVIMALMYVS YTVAFFFGSY LVEWG..RR. VSNRMRLQGM KMVSASSISD PIIQLIASLA LAFVLYAASF PSV ....... VSNDMRRKSM KMVTANSISD PVVQVIASLA LATVLYLATT PLI ....... ASTNQLNAAI KVVLAWTLVK PIAEGIA .... TTVLISLIV ISF ....... SETFYSKYIL KANFVEALHI GLINGLILVS YAFGFWYGTR IIINSATNQY EVRNVFHIGL KEAVTSGLFF GSTGLVGNTA MLSLLLVGTS MIQ ....... AIRDIFQVGK TAAFTNAKFF TTTSLLGDLS FLTVLAYGSY LVL ....... A L E R C R Q L W W R R D L E K S L Y L VIQRVMALGM QVLILNVGVQ QIL ....... A L E R C R Q L W W R R D L E K D V Y L VIRRVMALGM QVLILNCGVQ QIL ....... ALEQCRQLYW RRDLERALYL LVRRVLHLGV QMLMLSCGLQ QMQ ....... KLQEIKTLNQ KEAVAYAVNS WTTSISGMLL KVGILYIGGQ LVT ....... KLEEMKTLNK KEALAYVAEV WTTSVSGMLL KVGILYLGGQ LVI ....... FTEKLLSAQY PVLDWWAFAS ALNRTASTVS MMIILVIGTV LVK ....... FTQRLLSAQY PVLDWWALAS GLNRIASTIS MMAILVIGTV LVQ ....... SLMNYRDSQI KVSQSLAFLN SGQNLIFTTA LTAMMYMGCT GVI ....... AVDIYLKQER KVLFSLNFLN IVQGGIFTFS LAIACLLSAY RVT ....... DAEDTGKKKL RVARIDARFD PTIYIAIGMA NLLAIGGGSW M W ....... IILNCNTFFI KSCFFVAANA GILRFLTLTM FVQGFWFGSA MI ........ AKYNYIERLW FTAAENVGQL TLSLLVGCLL FAAPMF .............. ..................................................
Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Nistlacla Spabbacsu Cyddescco Cyddhaein Cydcescco Cydchaein Aprdpseae Prtderwch Hlybescco Hlybpasha Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco
601 650 KESDLSPSVA FASLSLFHIL VTPLFLLSSV VRSTVKALVS VQKLSEF... KESDFSPSVAFASLSLFHILVTPLFLLSSVVRSTVKALVSVQKLSEF... . G K V L K P E V V F P T I A L L N T M R V S F F M I P I I ISSILQCFVSAKRVTAF... .NNSLSLGSL IGFVSIVTMV MKPILSLVSS YNDFLLLNVY FQKLSEV... .GNLVSIPDL IIFQSGISLF VSAVNQIQDV MFEISRLSIY GNKISDL... .AGKLLIGNL VSLIQAISKI NTYSQTMIQN IYIIYNTSLF MEQLFEF... .AGKIMVGNV MSYIRSVSLV QNHSQSIMTS IYSIYNSNLY MNQLYEF... YDTGVTLAAG FLALILAPEF FQPLRDLGTF YHAKAQAVGA ADSLKTF... YNAPLTLFTG FFCLILAPEF YQPLRDLGTY YHDRAAGIGA ADAIVDF... NAQPGALIAL FVFCALAA.. FEALAPVTGA FQHLGQVIAS AVRISDL... DEYHTAYIAL FTFAALAA.. F E I I M P L G A A F L H I G Q V I A A A E R V T E I . . . .EGRITPGMM IAGSILMGRA LGPIDQLIGV WKQWGAARDA YRRLSGL... .DGKITPGMM IAGSILVGRV LSPIDQLIGV WKQWSSARIA WQRLTRL... ISGDLSIGQL IAFNMLAGQI VAPVIRLAQI WQDFQQVGIS VTRLGDV... ISGDLSIGQL IAFNMLSGQV IAPVIRLAQL WQDFQQVGIS VTRLGDV... ISGDLSIGQL ITFNMLSGQV IAPVVRLAQL WQDFQQVGIS ITRLGDV... ISGDLSIGQL IAFNMLAGQI ISPVIRLAQI WQDFQQVGIS VTRLGDV... ISGDLSIGQL IAFNMLSGQV IAPVIRLAQL WQDFQQVGIS VTRLGDV... GRARMTVGEL VAFNMLSGHV T Q P V I R L A Q L W N D F Q Q T G V S MQRLGDI... MDGKMSLGQL ITYNTLLVYF TNPLENIINL QTKLQTAQVA NNRLNEV... ISQKITLGQL ITFNALLSYF TNPITNIINL QTKLQKARVA NERLNEV... MRHQLSLGQL LTYNALLAYF LTPLENIINL QPKLQAARVA NNRLNEV... IDNQMTIGMF VAFSSFRGQF SERVASLTSF LLQLRIMSLH NERIADI...
Pmdlschpo HAGDLDVSKL IGCFFAVLIA SYSLANISPK MQSFVSCASA AKKIFDT.~ Pglyarath RHHLTNGGLA IATMFAVMIG GLALGQSAPS MAAFAKAKVAAAKIFRI... Pgplarath ................... C LQALGQSAPS MAAFAKAKVAAAKIFRI... Mdr2musmu .SKEYTIGNA MTVFFSILIG AFSVGQAAPC IDAFANARGA AYVIFDI... Mdr3crigr .SKEYTIGNA MTVFFSILIG AFSVGQAAPC IDAFANARGA AYVIFDI... Mdr3homsa .SKEYTIGNA MTVFFSILIG AFSVGQAAPC IDAFANARGA AYVIFDI... Mdrlmusmu .SNEYSIGEV LTVFFSILLG TFSIGHLAPN IEAFANARGA AFEIFKI... Mdr2crigr .SNEYSVGQV LTVFFSILFG TFSIGHIAPN IEVFANARGA AYEIFKI... Mdrlcrigr .SKEYSIGQV LTVFFAVLIG AFSIGQASPN IEAFANARGA AYEIFNI... Mdr3musmu .SKEYSIGQV LTVFFSVLIG AFSVGQASPN IEAFANARGA AYEVFKI... Mdrlhomsa .SGEYSIGQV LTVFFSVLIG AFSVGQASPS IEAFANARGA AYEIFKI... Mdr4drome PDRVYTPAVL VIVLFAVIMG AQNLGFASPH VEAIAVATAAGQTLFNI... Mdr5drome ENKEYTPAIL MIAFFGIIVG ADNIARTAPF LESFASARGC ATNLFKV... Mdrlcaeel HDGSLNFGDM LTTFSSVMMG SMALGLAGPQ LAVLGTAQGA ASGIYEV... Mdr3caeelAAGAVSSGAV FAVFWAVLIG TRRLGEAAPH LGAITGARLA IHDIFKV... Mdrlleien ..... DMADI ISTFLAVLMG SFGLGFVAPS RTAFTESRAAAYEIFKA... Msbaescco M.DSLTAGTI T W F S S M I A L MRPLKSLTNV NAQFQRGMAACQTLFTI... Msbahaein AEDNLSAGSF TVVFSSMLAM MRPLKSLTAV NAQFQSGMAACQTLFAI... Hetaanasp ATFTLPVASL LTFFFVLVRV IPNIQDINGT VAFLSTLQGS SENIKNI... Mdrplafa PNNDFNGASV ISILLGVLIS MFMLTIILPN ITEYMKALEA TNSLYEI... Mdllsacce .SGSMTVGEL SSFMMYAVYT GSSLFGLSSF YSELMKGAGAAARVFEL... Mdl2sacce .QSQLSIGDL TAFMLYTEYT GNAVFGLSTF YSEIMQGAGA ASRLFEL... Mt2ratno .AGEVTRGGL LSFLLYQEEV GHHVQNLVYM YGDMLSNVGA AEKVFSY... Tap2musmu .AGEVTRGGL LSFLLYQEEV GQYVRNLVYM YGDMLSNVGA AEKVFSY... Tap2homsa .DGELTQGSL LSFMIYQESV GSYVQTLVYI YGDMLSNVGA AEKVFSY... Taplhomsa .SGAVSSGNL VTFVLYQMQF TQAVEVLLSI YPRVQKAVGS SEKIFEY... Taplmusmu .RGTVSSGNL VSFVLYQLQF TQAVQVLLSL YPSMQKAVGS SEKIFEY... Chvaagrtu .NGELRVGDV IAFIGFANLL IGRLDQMRQF VTQIFEARAK LEDFFVL... Ndvarhime .RGELGVGEV IAFIGFANLL IGRLDQMKAF ATQIFEARAK LEDFFQL... Atmlsacce .GGNLTVGDL VLINQLVFQL SVPLNFLGSV YRDLKQSLID METLFKL... 
Hmtlschpo FGFNTVGDF VILLTYMIQL QQPLNFFGTL YRSLQNSIID TERLLEI Mdlescco .QGSLTLGQL TSFMMYLGLM IWPMLALAWM FNIVERGSAAYSRIRAM... Ste6sacce KKGKLNINDV ITCFHSCIML GSTLNNTLHQ IVVLQKGGVA MEKIMTL... Syrdpsesy ..AVIDAKTM SASVLAVLYI MGPLVMLVSA MPMLAQGRIA CTRLADFGFS Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
651 700
Surcricr LSSAEIREEQ CAPREPAPQG QAGKYQAVPL KVVNRKRPAR EEVRDLLGPL
Surratno LSSAEIREEQ CAPREPAPQG QAGKYQAVPL KVVNRKRPAR EEVRDLLGPL
Mdrleita IECPDTHSQV QDIASIDVPDAAAIFKGASI HTYLPVKLPR CKSR..LTAM
Hemsentfa LTYE ....... EKNDFNNKK GNVG.KEEFI YRVNNVYYTI SVFEKNI...
Lcn31acla LI.E ....... NPQRIDNIE KHSN.NAIIL KDISYSY .... ELNNYI...
Nistlacla LKRESWHKK ...IEDTEIC NQHI.GTVKV INLSYVYPNS NA..F.A...
Spabbacsu L..ELKEEKS ...QGHKKPIVEPI.HSVVFQNVSFIYPNQGE..Q.T...
Cyddescco METPLAHPQR GEAELA ...... ST.DPVTI EAEELFITSP EGKTL.A...
Cyddhaein LESDYLTVHQ NEKTIS ...... LE.SAVEI SAENLVVLST QGSAL.T...
Cydcescco TDQK.PEVTF PDTQTR ...... VA.DRVSL TLRDVQFTYP EQSQQ.A...
Cydchaein IEQK.PLVEF NGNEEF ...... ET.KVRLI SAKNLNFSYP EQETL.V...
Aprdpseae LD ....... E FPARERRMEL PEPR.GHLLL ESLDA..APP GSEAR.T...
Prtderwch IA ....... A YPPRPAAMAL PAPE.GHLSV EQVSL..RTA QGNTR .....
Hlybescco ..LNSPTESY ...HGKL.TLPEIN.GDITF RNIRFRYKPDSPV...I...
Hlybpasha ..LNSPTEQY ...QGKL.SLPEIK.GDISF KNIRFRYKPDAPT...I...
Rt3bactpl ..LNSPTENY ...QGKL.SLPEIF.GDIAF KHIRFRYKPDAPI...I...
Hlybactac ..LNSPTENN ...TASV.SLPEIQ.GEISFRNIKFRYKPDSPM...I.....
P-Glycoprotein transporter family
Hlybactpl ..LNSPTESY . . . Q G K L . A L P E I K . G D I T F R N I R F R Y K P D A P V . . . I . . . Cyabborpe ..LNCRTEVA ...GDKA.QL PALR.GSIEL DRVSFRYRPDAAD...A... Comastrpn YLVASEFEEK ...KTVE.DL SLMK.GDMTF KQVHYKY.GY GRD...V... LcnclaclaYLVPSEFEEK ...KT...EL SLSH.FNLNM SDISYQY.GFGRK...V... Peddpedac YLVESEFSKS . . . R E I T . A L E Q L N . G D I E V N H V S F N Y . G Y C S N . . . I . . . CvabesccoALHEKEEKKP . . . E I E I . V A D M G P . I S L E T N G L S Y R Y D S Q S A P . . . I . . . Pmdlschpo IDRVSPI.NA FTPTGDV.VK D.IK.GEIEL KNIRFVYPTR PEV.L.V... Pglyarath IDHKPTI.ER NSESGVE.LD S.VT.GLVEL KNVDFSYPSR PDV.K.I... Pgplarath IDHKPTI.ER NSESGVE.LD S.VT.GLVEL KNVDFSYPSR PDV.K.I... Mdr2musmu IDNNPKI.DS FSERGHK.PD N.IK.GNLEF SDVHFSYPSR ANI.K.I... Mdr3crigr IDNNPKI.DS FSERGHK.PD S.IK.GNLDF SDVHFSYPSR ANI.K.I... Mdr3homsa IDNNPKI.DS FSERGHK.PD S.IK.GNLEF NDVHFSYPSR ANV.K.I... Mdrlmusmu IDNEPSI.DS FSTKGYK.PD S.IM.GNLEF KNVHFNYPSR SEV.Q.I... Mdr2crigr IDNEPSI.DS FSTQGHK.PD S.VM.GNLEF KNVHFSYPSR SGI.K.I... Mdrlcrigr IDNKPSI.DS FSKNGYK.PD N.IK.GNLEF KNIHFSYPSR KDV.Q.I... Mdr3musmu IDNKPSI.DS FSKSGHK.PD N.IQ.GNLEF KNIHFSYPSR KEV.Q.I... Mdrlhomsa IDNKPSI.DS YSKSGHK.PD N.IK.GNLEF RNVHFSYPSR KEV.K.I... Mdr4drome IDRPSQV.DP MDEKGNR.PE NTA..GHIRF EGIRFRYPAR PDV.E.I... Mdr5drome IDLTSKI.DP LSTDGKL.LN YGLR.GDVEF QDVFFRYPSR PEV.I.V... Mdrlcaeel LDRKPVI.DS SSKAGRK..D MKIK.GDITV ENVHFTYPSR PDV.P.I... Mdr3caeel IDHEPEI.KC TSSEGKI.PE K.IQ.GKLTFDGIEFTYPTRPEL.K.I... Mdrlleien I D R V P P V . D . . I D A G G V . P V P G F K . E S I E F R N V R F A Y P T R P G M . I . L . . . MsbaesccoLDSEQ...EK ..DEGKR.VI ERAT.GDVEF RNVTFTYPGR .DV.P.A... Msbahaein LDLEP...EK ..DDGAY.KA EPAK.GELEF KNVSFAYQGK .DE.L.A... HetaanaspLQTNN...KPYLKNGKL.HFQGLK.RSIDLVSVDFGYTA..DN.L.V... Mdrplafa INRKPLV.ENNDDGETL.PN IK .... KIEF KNVRFHYDTRKDV.E.I... Mdllsacce NDRKPLI.RP ..TIGKD.PV SLAQ.KPIVF KNVSFTYPTR PKH.Q.I... MdI2sacceTDRKPSI.SP ..TVG.H.KYKPDR.GVIEF KDVSFSYPTRPSV.Q.I... Mt2ratno LDRRPNL.PN . . P G T L A . 
P P R L . E . G R V E F Q D V S F S Y P S R P E K . P . V . . . Tap2musmu LDRKPNL.PQ ..PGILA.PP WL.E.GRVEF QDVSFSYPRR PEK.P.V... Tap2homsa MDRQPNL.PS ..PGTLA.PT TL.Q.GVVKF QDVSFAYPNR PDR.P.V... Taplhomsa LDRTPRC.PP ..SGLLT.PL HL.E.GLVQF QDVSFAYPNR PDV.L.V... Taplmusmu LDRTPCS.PL ..SGSLA.PS NM.K.GLVEF QDVSFAYPNQ PKV.Q.V... Chvaagrtu EDAVKEREEP GDARELS .... NVS.GTVEF RNINFGFANT K...Q.G... Ndvarhime EDSVQDREEP ADAGELK .... GVV.GEVEF RDISFDFANS A...Q.G... Atmlsacce . R K N E V K I K N A E R P L M . L P E N V P . Y D I T F E N V T F G Y H P D R . . . K . I . . . Hmtlschpo F E E K P T W E K P N A P D L K VTQG.. .KVIF SHVSFAYDPRK P V Mdlescco LAERPVVND ..... GSE P V P E G R G E L D V N I H Q F T Y P . Q T D H P A Ste6sacce LKDGSKRNPL NKTVAHQFPL DYAT.SDLTF ANVSFSYPSR PSE.A.V... Natabacsu .......................... MITL TDCSRRFQDK KKVVK.A... Syrdpsesy INEPHPEPET SDADNVLLLD HKKSWGSIQL KNVHMNYKDP QSSSGFA... Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y ..........
701 750
Surcricr QRLA ................ PSMDGDADNF CVQIIGGFFTWTPDGIPT.. Surratno QRLT ................ PSTDGDADNF CVQIIGGFFT WTPDGIPT.. Mdrleita QRSTLWFRRR GVPETEWYEV DSPDASASSL AVHSTTVHMG STQTVITDSD Hemsentfa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lcn31acla . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nistlacla . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Spabbacsu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cyddescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cyddhaein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cydcescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Cydchaein
..................................................
Prtderwch
..................................................
Aprdpseae
Hlybescco
Hlybpasha
Rt3bactpl
Hlybactac
Hlybactpl Cyabborpe Comastrpn
Lcnclacla
..................................................
Mdr3crigr
Mdr3homsa
Mdrlmusmu
Mdr2crigr
Mdrlcrigr
..................................................
.................................................. ..................................................
..................................................
.................................................. ..................................................
..................................................
Mdr5drome
..................................................
Mdr4drome
Mdr3caeel
Mdrlleien Msbaescco Msbahaein
..................................................
..................................................
..................................................
.................................................. ..................................................
Hetaanasp
..................................................
Mdllsacce
..................................................
.................................................. ..................................................
Mt2ratno
..................................................
Tap2homsa
..................................................
Taplmusmu
..................................................
Tap2musmu
Taplhomsa
:iili:,i::<:j~<
Hmt ischpo
Atmlsacce
Mdlescco
Ste6sacce
~ :< T .
..................................................
Mdrlhomsa
Chvaagrtu
ii
..................................................
Mdr2musmu
Ndvarhime
'257;;)i(}f
..................................................
..................................................
Pgp lar ath
i:i; :i:i:[[:::ii i!;::b :;<
..................................................
Pglyarath
Mdl2sacce
i!}ii ii:>i;:
..................................................
..................................................
..................................................
Mdrplafa
~.::? ...<:: :< ;
..................................................
Pmdlschpo
Mdrlcaeel
iiiiii:i ~:!::7:i
..................................................
..................................................
..................................................
~fJii:!i(:::i;! M d r 3 m u s m u
~i~!:i~i!.<:!/;
..................................................
Peddpedac
Cvabescco
':;ii!i!(:;
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
.................................................. ..................................................
..................................................
Natabacsu
..................................................
Consensus
..................................................
Syrdpsesy
Surcricr
Surr atno
Mdrleita
..................................................
751
..........................
..........................
GAAGEDEKGE
VEEGDREYYQ
LSNI
LSNI
LVSKELLRNV
TIRIPRGQLT
TIRIPRGQLT
SLTIPKGKLT
800
MIVGQVGCGK
MIVGQVGCGK MVIGSTGSGK
161
:.r"
i
-..
.
Hemsentfa .......................... Lcn31acla .......................... Nistlacla .......................... Spabbacsu .......................... Cyddescco
...........................
Cyddhaein
...........................
Cydchaein Aprdpseae Prtderwch Hlybescco Hlybpasha Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Msbaescco Msbahaein Hetaanasp Mdrplafa Mdllsacce Mdl2sacce Mt2ratno Tap2musmu Tap2homsa Taplhomsa Taplmusmu Chvaagrtu Ndvarhime Atmlsacce Hmtlschpo Mdlescco Ste6sacce Natabacsu
.......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... .......................... ..........................
Cydcescco
:i:-
:
::) .-::: - . . .
. .
--.<-:.
...
.
..
: . . - . .
. . .
.
.
.
: .-:~..~:
u/-.:
.
.
.
.
.....:
U.::i i/
....
162
..........................
.......................... .......................... .......................... .......................... ..........................
LNGI S F D I R K G D K V A I V G R S G S G K FNNI N F S I K K G E K I A I V G K S G S G K LKNI N L S F E K G E L T A I V G K N G S G K LKHI N V S L H K G E R V A I V G P N G S G K GPL N F T L P A G Q R A V L V G R S G S G K KPL N F Q I P A N H N V A L V G Q S G A G K LKGI S L Q V N A G E H I A I L G R T G C G K LKNL TLDLEQGKKI AILGKTGSGK LRGL TLAIPAGSVVGVIGPSGSGK LQNI H F S L Q A G E T L V I L G A S G S G K LDNI N L S I K Q G E V I G I V G R S G S G K LNNV NLEIRQGEVI GIVGRSGSGK LDDV NLSVKQGEVI GIVGRSGSGK LNNI N L D I S Q G E V I G I V G R S G S G K LNDV NLSIQQGEVI GIVGRSGSGK LRNV SLRIAPGEVVGVVGRSGSGK LSDI N L T V P Q G S K V A F V G I S G S G K LSEI E L S I K E N E K L T I V G M S G S G K LEDV SLTIPHHQKI TIVGMSGSGK FSAL SLSVAPGESV AITGASGAGK LDNF S L V C P S G K I T A L V G A S G S G K LNNF C L S V P A G K T I A L V G S S G S G K LNNF CLSVPAGKTI ALVGSSGSGK LKGL NLKVKSGQTV ALVGNSGCGK LKGL NLKVQSGQTV ALVGNSGCGK LKGL NLKVQSGQTV ALVGSSGCGK LKGL NLKVKSGQTV ALVGNSGCGK LKGL NLKVQSGQTV ALVGKSGCGK LKGL NLKVQSGQTV ALVGNSGCGK LKGL NLKVKSGQTV ALVGNSGCGK LKGL NLKVQSGQTV ALVGNSGCGK LKGL TVDVLPGQTV AFVGASGCGK HRGL NIRIRAGQTV ALVGSSGCGK LRGM NLRVNAGQTV ALVGSSGCGK LKGV SFEVNPGETV ALVGHSGCGK FRDL S L K I K C G Q K V A F S G A S G C G K LRNI N L K I P A G K T V A L V G R S G S G K LNNI S F S V P A G K T V A L V G R S G S G K LNNI T L T I E R G K T T A L V G A S G A G K YKDL SFTLKEGKTY AFVGESGCGK FKDL N I T I K P G E H V C A V G P S G S G K FKNL N F K I A P G S S V C I V G P S G R G K LQGL TFTLHPGKVT ALVGPNGSGK LQGL TFTLHPGTVT ALVGPNGSGK LKGL TFTLRPGEVT ALVGPNGSGK LQGL TFTLRPGEVT ALVGPNGSGK LQGL TFTLHPGTVT ALVGPNGSGK VHDV SFTAKAGETV AIVGPTGAGK VRNV SFKAKAGQTI AIVGPTGAGK LKNA SFTIPAGWKT AIVGSSGSGK LSDI N F V A Q P G K V I A L V G E S G G G K LENV NFALKPGQML GICGPNGSGK LKNV SLNFSAGQFT FIVGKSGSGK VRDV SLTIEKGEVVGILGENGAGK
Syrdpsesy .......................... Consensus ..........................
::-:x.::::.:..
--:
::::::::::::::::::::::::.,
~?~i:::::::::::::::::::::::::::::::
Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Nistlacla Spabbacsu Cyddescco Cyddhaein Cydcescco Cydchaein Aprdpseae Prtderwch Hlybescco Hlybpasha Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Msbaescco Msbahaein Hetaanasp Mdrplafa Mdllsacce Mdl2sacce Mt2ratno Tap2musmu Tap2homsa Taplhomsa Taplmusmu
801 SSLLLATLGE SSLLLATLGE STLLGALMGE STLLKLLAGL STLFNILLGL STLVKIISGL KTFIKLLTGL SSLLNALS.G TSLMNVIL.G STLLQQLTRA SSLLQLLVRN SSLARVVLGI SSLARLLVGA STLTKLIQRF STLTKLLQRF STLTKLLQRF STLTKLIQRF STLTKLIQVF STLTRLIQRM TTLAKMMVNF STLVKLLVNF TTLAKLLVGF TTLMKVLCGL STIIGLVERF STVVSLIERF STVVSLIERF STTVQLLQRL TTTLQLLQRL STTVQLIQRL STTVQLMQRL STTVQLLQRL STTVQLLQRL STTVQLMQRL STTVQLMQRL STLIQLMQRF STCVQLLQRF STIISLLLRY STSIGLLMRF SSVIGLIQRF STIASLITRF STIANLVTRF TTLADLIPRF STILKLIERL STIASLLLRY STIALLLLRY STVAALLQNL STVAALLQNL STVAALLQNL STVAALLQNL STVAALLQNL
LGPI D L T I H S G E L V Y I V G G N G C G K L . . . . . . . . . G ..... VG.. G. GK
850 MQKVSGAVFW NSNLPDSEGE DPSSPERETA AGSDIRSRGP MQKVSGAVFW NS.LPDSEGE DPSNPERETA ADSDARSRGP Y S V E S G E L W . . . . . . . . . . . . . . . AER . . . . . . . . . . . . S L Q P S N G . . . . . . . . . . . EIL Y ..... EGYP L S N N S N N R R N . . . . . . . . . . . . . . . . . . IS Y ..... EGEV T Y G Y E N L R Q I YQPTM . . . . . . . . . . . . . . G I I Q Y D K M R S S L M P E E F Y Q K N YEVHE . . . . . . . . . . . . . . G D I L I N G I N I K E L D M D S Y M N Q FLSYQ . . . . . . . . . . . . . GS L R I . N G I E L R D L S P E S W R K H FLPYE . . . . . . . . . . . . . GS L K I . N G Q E L R E S N L A D W R K H W D P Q Q . . . . . . . . . . . . . GE I L L . N D S P I A S L N E A A L R Q T YDANQ . . . . . . . . . . . . . GE L L L . A E K P I S A Y S E E T L R H Q W P T L H G . . . . . . . . . . . SVR L . . . D G A E I R Q Y E R E T L G P R Q S P T Q G . . . . . . . . . . . KVR L . . . D G A D L N Q V D K N T F G P T YIPEN . . . . . . . . . . . . . GQ V L I . D G H D L A L A D P N W L R R Q YIPEN . . . . . . . . . . . . . GQ V L I . D G H D L A L A D P N W L R R Q YIPEN . . . . . . . . . . . . . GQ V L I . D G H D L A L A D P N W L R R Q YIPEQ . . . . . . . . . . . . . GQ V L I . D G H D L A L A D P N W L R R Q YIPEN . . . . . . . . . . . . . GQ V L I . D G H D L A L A D P N W L R R Q FVADR . . . . . . . . . . . . . GR V L I . D G H D I G I V D S A S L R R Q YDPSQ . . . . . . . . . . . . . GE I S L . G G V N L N Q I D K K A L R Q Y FQPTS . . . . . . . . . . . . . GT I T L . G G I D L Q Q F D K H Q L R R L FEPQEQ . . . . . . . . . . . HGE I Q I . N H H N I S D I S R T I L R Q Y FEPDS . . . . . . . . . . . . . GR V L I . N G I D I R Q I G I N N Y H R M YDPIG . . . . . . . . . . . . . GQ V F L . D G K D L R T L N V A S L R N Q YDPNS . . . . . . . . . . . . . GQ V L L . D G Q D L K T L K L R W L R Q Q YDPNS . . . . . . . . . . . . . GQ V L L . D G Q D L K T L K L R W L R Q Q YDPTE . . . . . . . . . . . . . GK I S I . D G Q D I R N F N V R C L R E I YDPTE . . . . . . . . . . 
. . . GT I S I . D G Q D I R N F N V R Y L R E I YDPDE . . . . . . . . . . . . . GT I N I . D G Q D I R N F N V N Y L R E I YDPLE . . . . . . . . . . . . . GV V S I . D G Q D I R T I N V R Y L R E I YDPTE . . . . . . . . . . . . . GV V S I . D G Q D I R T I N V R Y L R E I YDPTE . . . . . . . . . . . . . GV V S I . D G Q D I R T I N V R Y L R E I YDPLD . . . . . . . . . . . . . GM V S I . D G Q D I R T I N V R Y L R E I YDPTE . . . . . . . . . . . . . GM V S V . D G Q D I R T I N V R F L R E I YDPEA . . . . . . . . . . . . . GS V K L . D G R D L R T L N V G W L R S Q YDPVF . . . . . . . . . . . . . GS V L L . D D L D I R K Y N I Q W L R S N YDVLK . . . . . . . . . . . . . GK I T I . D G V D V R D I N L E F L R K N Y N Q C A . . . . . . . . . . . . . GM I K L . D G I P I Q E Y N I R W L R S T YDPIG . . . . . . . . . . . . . GA V L V . D G V R M R E L C L R E W R D Q YDIDE . . . . . . . . . . . . . GE I L M . D G H D L R E Y T L A S L R N Q YDIEQ . . . . . . . . . . . . . GE I L L . D G V N I Q D Y R L S N L R E N YDPTE . . . . . . . . . . . . . GQ I L V . D G L D V Q Y F E I N S L R R K YDPTE . . . . . . . . . . . . . GD I I V N D S H N L K D I N L K W W R S K YDVNS . . . . . . . . . . . . . GS I E F . G D E D I R N F N L R K Y R R L YNPTT . . . . . . . . . . . . . GT I T I . D N Q D I S K L N C K S L R R H YQPTG . . . . . . . . . . . . . GQ L L L . D G E P L V Q Y D H H Y L H R Q YQPTG . . . . . . . . . . . . . GQ L L L . D G E P L T E Y D H H Y L H R Q YQPTG . . . . . . . . . . . . . GQ V L L . D E K P I S Q Y E H C Y L H S Q YQPTG . . . . . . . . . . . . . GQ L L L . D G K P L P Q Y E H R Y L H R Q YQPTG . . . . . . . . . . . . . GQ L L L . D G Q R L V Q Y D H H Y L H T Q
Chvaagrtu Ndvarhime Atmlsacce Hmtlschpo Mdlescco Ste6sacce Natabacsu Syrdpsesy Consensus
Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Nistlacla Spabbacsu Cyddescco Cyddhaein Cydcescco Cydchaein Aprdpseae Prtderwch Hlybescco Hlybpasha Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco Pmdlschpo Pglyarath Pgplarath
Mdr2musmu
Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Msbaescco Msbahaein Hetaanasp Mdrplafa
T T L I N L L Q R V Y D P D S . . . . . . . . . . . . . GQ I L I . D G T D I S T V T K N S L R N S T T L V N L L Q R V H E P K H . . . . . . . . . . . . . GQ I L I . D G V D I A T V T R K S L R R S S T I L K L V F R F Y D P E S . . . . . . . . . . . . . GR I L I . N G R D I K E Y D I D A L R K V S T I M R I L L R F FDVNS . . . . . . . . . . . . . GS I T I . D D Q D I R N V T L S S L R S S S T L L S L I Q R H F D V S P . . . . . . . . . . . . . GD I R F H D . I P L T K L Q L D S W R T G S T L S N L L L R F Y D . . G Y . . . . . . . . . . . NGS I S I . N G H N I Q T I D Q K L L I E N T T M L R M I A S L L E P S Q . . . . . . . . . . . . . GV I T V D G F D T V K Q P A E V K Q R I G STLAKVFCGL YIPQEGQ .............. LLLDGAAVT DDSRGDYRDL ST...L ...................... G . . . . . . . . . . . . . . . . . . R.. 851 VAYASQKPWL VAYASQKPWL IAYVPQQAWI IFYVNQNAHI IGVVSQNMNL ISVLFQDFVK IAALFQDFMK LSSVGQNPQL IAWVGQNPLL ISVVPQRVHL ICFLTQRVHV IGYLPQDIEL IGYLPQDVQL VGVVLQDNVL IGVVLQDNVL IGVVLQDNVL VGWLQDNVL VGWLQDNVL LGVVLQESTL INYLPQQPYV INYLPQQPYI INYVPQEPFI IACVMQDDRL ISLVQQEPVL IGLVSQEPAL IGLVSQEPAL IGVVSQEPVL IGVVSQEPVL IGVVSQEPVL IGVVSQEPVL IGVVSQEPVL IGVVSQEPVL IGVVSQEPVL IGVVSQEPVL IGVVGQEPVL IAWGQEPVL VAVVSQEPAL IGIVQQEPII IGIVSQEPNL VALVSQNVHL CAVVSQQVHL MAWSQDTFI IGVVSQDPLL
900 L N A T V E E N I T FESP . . . . . . . . . . . . . . . . F N K Q R Y K M V I L N A T V E E N I T FESP . . . . . . . . . . . . . . . . F N K Q R Y K M V I M N A T L R G N I L FFDE . . . . . . . . . . . . . . . . E R A E D L Q D V I FNETIEKNIS LEFKPNS .............. SINEKKRLK R K G S L I E N I V ..SNNNS . . . . . . . . . . . . . . E E L D I Q K I N YELTIRENIG LSD ................. LSSQWEDEKI Y E M T L K E N I G FGQ . . . . . . . . . . . . . . . . . I D K L H Q T N K M PAATLRDNVL LARP ................. DA.SEQELQ LQGTIKENLL LGDV ................. QA.NDEEIN FSATLRDNLL LASP ................. GS.SDEALS FSDTLRQNLQ FASA ................. DKISDEQMI FAGTVAENIA ................... R FGEVQADKVV FKGSLAENIA ................... R FGDADPEKVV LNRSIIDNIS L.AN ................. PGMSVEKVI LNRSIRENIA L.SD ................. PGMPMERVI LNRSIRDNIA L.TD ................. PSMSMERVI LNRSIRENIA L.TN ................. PGMPMEKVI LGRSIRDNIA L.AD ................. PGMPMEKIV FNRSVRDNIA L.TR ................. PGASMHEVV FNGTILENLL LGAK ................. EGTTQEDIL FTGSILDNLL LGAN ................. ENASQEEIL FSGSVLENLL LGSR ................. PGVTQQMID F S G S I R E N I C GFAE . . . . . . . . . . . . . . . . . . E M D E E W M V FATTVFENIT YGLPDTIKGT LSKEELERRV YDAAKLAN.. F A T S I K E N I L L G . . R P . . D A .DQVEIE .... E A A R V A N . . F A T S I K E N I L L G . . R P . . D A .DQVEIE .... E A A R V A N . . F S T T I A E N I R Y G R G N V . . T M ...DEIE .... K A V K E A N . . F S T T I A E N I R Y G R G N V . . T M ...EEIK .... K A V K E A N . . F S T T I A E N I C Y G R G N V . . T M ...DEIK .... K A V K E A N . . F A T T I A E N I R Y G R E D V . . T M ...DEIE .... K A V K E A N . . F A T T I A E N I R Y G R E N V . . T M ...DEIE .... K A V K E A N . . F A T T I A E N I R Y G R E N V . . T M ...DEIE .... K A V K E A N . . F A T T I A E N I R Y G R E D V . . T M ...DEIE .... K A V K E A N . . F A T T I A E N I R Y G R E N V . . T M ...DEIE .... K A V K E A N . . F A T T I G E N I R Y G R P S A . 
. T Q ...ADIE .... K A A R A A N . . F L G T I A Q N I S Y G K P G A . . T Q ...KEIE .... A A A T Q A G . . F N C T I E E N I S L G K E G I . . T R . . . E E M V .... A A C K M A N . . FVATVAENIRMGDVLI..T...DQDIE .... E A C K M A N . . FAGTMMENVR MGKPNA ......... TDEEV VEACRQAN.. F N D T V A N N I A Y A R T E .... Q Y S R E Q I E .... E A A R M A Y . . F N D T I A N N I A Y A A Q D .... K Y S R E E I I .... A A A K A A Y . . F N T S I R D N I A Y G T S G .... A .SEAEIR .... E V A R L A N . . FSNSIKNNIK YSLYSLKDLE AMENYYEENT NDTYENKNFS
Mdllsacce IGYVQQEPLL Mdl2sacce IGIVQQEPVL Mt2ratno VVLVGQEPVL Tap2musmu VVLVGQEPVL Tap2homsa VVSVGQEPVL Taplhomsa VAAVGQEPQV Taplmusmu VAAVGQEPLL Chvaagrtu IATVFQDAGL Ndvarhime IATVFQDAGL Atmlsacce IGVVPQDTPL Hmtlschpo IGVVPQDSTL Mdlescco LAVVSQTPFL Ste6sacce ITVVEQRCTL Natabacsu VLFGGETGLY Syrdpsesy FSAVFSDFHL Consensus ...V.Q...L
FNGTILDNIL YCIPPEI ............... AEQDDRIR MSGTIRDNIT YGLT..Y ............... TPTKEEIR FSGSVKDNIA YGL.R ................. DCEDAQVM FSGSVKDNIA YGL.R ................. DCEDAQVM FSGSVRNNIA YGL.Q ................. SCEDDKVM FGRSLQENIA YGLTQ ................. KPTMEEIT FGRSFRENIA YGLNR ................. TPTMEEIT LNRSIRENIR LGRE .................. TATDAEVV MNRSIGENIR LGRE .................. DASLDEVM FNDTIWENVK FGRI .................. DATDEEVI FNDTILYNIK YAKP .................. SATNEEIY F S D T V A N N I A LG.. CPNATQQEIE FNDTLRKNIL LGSTDSVRNA DCSTNENRHL IKDACQMALL D R M T A K E N L Q YF . . . . . . . . . . . . . . . . . . . . . GRLYGLN FNRLIGPDEK ..................... EHPSTDQAQ . . . . . . . NI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
901 Surcricr EACSLQPDIDI Surratno EACSLQPDID Mdrleita RCCQLEADLAQ Hemsentfa GSMSKSKMDE Lcn31acla DVLKDVNMLE Nistlacla IKVLDNLGLD Spabbacsu HEVLDIVRAD CyddesccoAALDNAWVSE Cyddhaein QALMRSQAKE Cydcescco EILRRVGLEK Cydchaein EMLHQVGLSK Aprdpseae EAARLAGVHE PrtderwchAAAKLAGVHE Hlybescco YAAKLAGAHD Hlybpasha YAAKLAGAHD Rt3bactpl YAAKLAGAHD Hlybactac AAAKLAGAHD Hlybactpl HAAKLAGAHE Cyabborpe AAARLAGAHE Comastrpn RAVELAEIRE Lcnclacla KAVELAEIRA Peddpedac QACSFAEIKT Cvabescco ECARASHIHD Pmdlschpo ....... AYD Pglyarath ....... AHS Pgplarath ....... AHS Mdr2musmu ....... AYD Mdr3crigr ....... AYE Mdr3homsa ....... AYE Mdrlmusmu ....... AYD Mdr2crigr ....... AYD Mdrlcrigr ....... AYD Mdr3musmu ....... AYD Mdrlhomsa ....... AYD Mdr4drome ....... CHD Mdr5drome ....... AHE
950 .............................. LPHGDQTQI I .............................. LPHGDQTQI .............................. FCGGLDTEI VLL ........................... GIPQYEKTIV LVD ........................... SLPQKIFSQL FLKTNNQY .......................... VLDTQL FLKSHSSY .......................... QFDTQL FLP ........................... LLPQGVDTPV FTD ........................... KL..GLHHEI LLE ........................... DA..GLNSWL LLE ........................... QEGKGLNLWL LV ........................... L RLPQGYDTVL LI . . . . . . . . . . . . . . . . . . . . . . . . . . . L SLPNGYDTEL FIS . . . . . . . . . . . . . . . . . . . . . . . . . . . ELREGYNTIV FIS . . . . . . . . . . . . . . . . . . . . . . . . . . . ELREGYNTIV FIS . . . . . . . . . . . . . . . . . . . . . . . . . . . ELREGYNTIV FIS . . . . . . . . . . . . . . . . . . . . . . . . . . . ELREGYNTVV FIS . . . . . . . . . . . . . . . . . . . . . . . . . . . ELREGYNTIV FIC . . . . . . . . . . . . . . . . . . . . . . . . . . . QLPEGYDTML DIE RMPLNYQTEL DIE ........................... QMQLGYQTEL DIE ........................... NLPQGYHTRL VIM ........................... NMPMGYETLI FI.MT .......................... LPEQFSTNV FI.IK .......................... LPDGFDTQV FI.IK .......................... LPDGFDTQV FI.MK .......................... LPQKFDTLV FI.MK .......................... LPQKFDTLV FI.MK .......................... LPQKFDTLV FI.MK .......................... LPHQFDTLV FI.MK .......................... LPHKFDTLV FI.MK .......................... LPHKFDTLV FI.MK .......................... LPHQFDTLV FI.MK .......................... LPHKFDTLV FI.TR .......................... LPKGYDTQV FI.TN .......................... LPESYRSMI
Mdrlcaeel .......AEK FIKT...... .......... .......... .LPNGYNTLV
Mdr3caeel .......AHE FIC.K..... .......... .......... .LSDRYDTVI
Mdrlleien .......IHD TI.MA..... .......... .......... .LPDRYDTPV
Msbaescco .......AMD FIN.K..... .......... .......... .MDNGLDTVI
Msbahaein .......ALE FIE.K..... .......... .......... .LPQVFDTVI
Hetaanasp .......ALQ FIE.E..... .......... .......... .MPEGFDTKL
Mdrplafa  LISNSMTSNE LLEMKKEYQT IKDSDVVDVS KKVLIHDFVS SLPDKYDTLV
Mdllsacce RAIGKANCTK FLA....... .......... .......... NFPDGLQTMV
Mdl2sacce SVAKQCFCHN FIT....... .......... .......... KFPNTYDTVI
Mt2ratno  AAAQAACADD FIG....... .......... .......... EMTNGINTEI
Tap2musmu AAAQAACADD FIG....... .......... .......... EMTNGINTEI
Tap2homsa AAAQAAHADD FIQ....... .......... .......... EMEHGIYTDV
Taplhomsa AAAVKSGAHS FIS....... .......... .......... GLPQGYDTEV
Taplmusmu AVAVESGAHD FIS....... .......... .......... GFPQGYDTEV
Chvaagrtu EAAAAAAATD FID....... .......... .......... SRINGYLTQV
Ndvarhime AAAEAAAASD FIE....... .......... .......... DRLNGYDTVV
Atmlsacce TVVEKAQLAP LIK....... .......... .......... KLPQGFDTIV
Hmtlschpo AAAKAAQIHD RIL....... .......... .......... QFPDGYNSRV
Mdlescco  HVARLASVHD DI........ .......... .........L RLPQGYDTEV
Ste6sacce D......... .......... .......... ......RFIL DLPDGLETLI
Natabacsu RHEIKARIED LSK....... .......... .......... ..RFGMRDYM
Syrdpsesy TYLSTLGLED KVKIE . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Nistlacla Spabbacsu Cyddescco Cyddhaein Cydcescco Cydchaein Aprdpseae Prtderwch Hlybescco Hlybpasha Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa
951 G...ERGINL G...ERGINL G...EMGVNL ...SENGSNF ...FENGKNL GNWFQEGHQL GLWFDEGRQL G...DQAARL K...DGGLGI G...EGGRQL G...DGGRPL G...VGGAGL G...DGGGGL G..~ G...EQGAGL G...ELGAGL G EQGAGL G..~ G...ENGVGL T..~ S..~ S..~ G..~ G...QRGFLM G...ERGLQL G...ERGLQL G...DRGAQL G...ERGAQL G ERGAQL
SGGQRQRISV SGGQRPGISV SGGQKARVSL SGGQRQKIAL SGGQIQRLLI SGGQWQKIAL SGGQWQKIAL SVGQAQRVAV SVGQAQRLAI SGGELRRLAI SGGEQRRLGL SGGQRQRIAL SGGQRQRIGL SGGQRQRIAI SGGQRQRIAI SGGQRQRIAI SGGQRQRIAI SGGQPNRIAI SGGQRQRIGI SGGQRQRIAL SGGQKQRIAL SGGQKQRLSI SGGQKQRIFI SGGQKQRIAI SGGQKQRIAI SGGQKQRIAI SGGQKQRIAI SGGQKQRIAI SGGQKQRIAI
i000 ARALYQQTNV VFLDDPFSAL DVHLSDHLMQ ARALYQHTNV VFLDDPFSAL DVHLSDHLMQ ARAVYANRDV YLLDDPLSAL DAHVGQRIVQ ARAFYSNVNT L L L D E P T S A M D N I . S E F E V F AKSLLNNNKF IFWDEPFSSL DNQ~ ARTFFKKASI YILDEPSAAL DPV~ ARAYFREASL YILDEPSSAL DPI.AEKETF ARALLNPCSL LLLDEPAASL DAH.SEQRVM ARALLRKGDL LLLDEPTASL DAQ.SENLVL ARALLHDAPL VLLDEPTEGL DAT.TESQIL ARILLNNASI LLLDEPTEGL DRE~ A R A L Y G A P T L V V L D E P N S N L DDS.GEQALL ARAMYGDPCL LILDEPNASL DSE.GDQALM ARALVNNPKI LIFDEATSAL DYE~ ARALVNNPKI LIFDEATSAL DYE~ ARALVNNPRI LIFDEATSAL DYE~ ARALVNNPRI LIFDEATSAL DYE SENIIM ARALVNNPKI LIFDEATSAL DYESEHIIMR ARALIHRPRV LILDEATSAL DYE.SEHIIQ ARALLTDAPV LILDEATSSL DIL.TEKRIV ARALLSPAKI LILDEATSNL DMI.TEKKIL ARALLSPAQC FIFDESTSNL DTI.TEHKIV ARALYRKPGI LFMDEATSAL DSE.SEHFVN ARAVISDPKI LLLDEATSAL DSK.SEVLVQ ARAMLKNPAI LLLDEATSAL DSE.SEKLVQ ARAMLKNPAI LLLDEATSAL DSE.SEKLVQ ARALVRNPKI LLLDEATSAL DTE.SEAEVQ ARALVRNPKI LLLDEATSAL DTE.SEAEVQ ARALVRNPKI LLLDEATSAL DTE~
Mdrlmusmu G...ERGAQL SGGQKQRIAI ARALVRNPKI Mdr2crigr G...ERGAQL SGGQKQRIAI ARALVRNPKI Mdrlcrigr G...ERGAQL SGGQKQRIAI ARALVRNPKI Mdr3musmu G...ERGAQL SGGQKQRIAI ARALVRNPKI Mdrlhomsa G...ERGAQL SGGQKQRIAI ARALVRNPKI Mdr4drome G EKGAQI SGGQKQRIAI ARALVRQPQV Mdr5drome G...ERGSQL SGGQKQRIAI ARALIQNPKI Mdrlcaeel G...DRGTQL SGGQKQRIAI ARALVRNPKI Mdr3caeel G...AGAVQL SGGQKQRVAI ARAIVRKPQI Mdrlleien G...PVGSLL SGGQKQRIAI ARALVKRPPI Msbaescco G...ENGVLL SGGQRQRIAI ARALLRDSPI Msbahaein G...ENGTSL SGGQRQRLAI ARALLRNSPV Hetaanasp G...DRGVRL SGGQRQRIAI ARALLRDPEI Mdrplafa G...SNASKL S G G Q K Q R I S I A R A I M R N P K I Mdllsacce G...ARGAQL SGGQKQRIAL ARAFLLDPAV Mdl2sacce G...PHGTLL SGGQKQRIAI ARALIKKPTI Mt2ratno G...ERGSQL AVGQKQRLAI ARALVRNPRV Tap2musmu G...EKGGQL AVGQKQRLAI ARALVRNPRV Tap2homsa G . . . E K G S Q L A A G Q K Q R L A I ARALVRDPRV Taplhomsa D...EAGSQL SGGQRQAVAL ARALIRKPCV Taplmusmu G...ETGNQL SGGQRQAVAL ARALIRKPLL Chvaagrtu G...ERGNRL SGGERQRIAI ARAILKNAPI Ndvarhime G...ERGNRL SGGERQRVAI ARAILKNAPI Atmlsacce G...ERGLMI SGGEKQRLAI ARVLLKNARI Hmtlschpo G ERGLKL SGGEKQRVAV ARAILKDPSI Mdlescco G...ERGVML S G G Q K Q R I S I A R A L L V N A E I Ste6sacce G...TGGVTL SGGQQQRVAI ARAFIRDTPI Natabacsu N...RRVGGF SKGMRQKVAI ARALIHDPDI Syrdpsesy GLGYSTTTAL SYGQQKRLAL VCAYLEDRPI C o n s e n s u s G ..... G . . L S G G Q . Q R . . . ARA ....... i001 Surcricr AGILELLRDD KRTVVLVTHK Surratno AGILELLRDD KRTVVLVTHK Mdrleita D V I L G R L R G . . K T R V L A T H Q Hemsentfa SNLLDE .... KRTVITVAHR Lcn31acla KNVLENPDYK SQTIIMISHH Nistlacla DYF..VALSE NNISIFISHS Spabbacsu DTF..FSLSK DKIGIFISHR Cyddescco EALNA..ASL RQTTLMVTHQ Cyddhaein QALNE..ASQ HQTTLMITHR Cydcescco ELLAE..MMR EKTVLMVTHR Cydchaein RLILQ..HAE NKTLIIVTHR A p r d p s e a e A A I Q A L K A R . GCTVLLITHR Prtderwch QAIVALQKR. 
GATVVLITHR Hlybescco RNMHK..ICK GRTVIIIAHR Hlybpasha QNMQK..ICQ GRTVILIAHR Rt3bactpl QNMQK..ICH GRTVIIIAHR Hlybactac HNMHK..ICQ NRTVLIIAHR Hlybactpl NMHQI...CK GRTVIIIAHR Cyabborpe RNMRD..ICD GRTVIIIAHR Comastrpn DNLIA..L.. DKTLIFIAHR Lcnclacla KNLLP..L.. DKTIIFIAHR Peddpedac SKLLF..M.K DKTIIFVAHR
LQYLPHADWI LQYLPHADWI IHLLPLADYI ISTVKNFDKI LDVLKYVDRV LNAARKANKI LVAAKLADRI LEDLADWDVI IEDLKQCDQI LRGLSRFQQI LSSIEQFDKI AGVLGCADRL PALTTLAQKI LSTVKNADRI LSTVKNADRI LSTVKNADRI LSTVKNADRI LSTVKNAASI LSAVRCADRI LTIAERTEKV LSVAEMSHRI LNIASQTDKV
LLLDEATSAL DTE.SEAVVQ LLLDEATSAL DTE.SEAVVQ LLLDEATSAL DTE.SEAVVQ LLLDEATSAL DTE.SEAVVQ LLLDEATSAL DTE.SEAVVQ LLLDEATSAL DPT SEKRVQ LLLDEATSAL DYQ.SEKQVQ LLLDEATSAL DAE.SEGIVQ LLLDEATSAL DTE.SERMVQ LLLDEATSAL DRK.SEMEVQ LILDEATSAL DTE.SERAIQ LILDEATSAL DTE.SERAIQ LILDEATSAL DSV.SERLIQ LILDEATSSLDNK.SEYLVQ LILDEATSAL DSQ.SEEIVA LILDEATSAL DVE.SEGAIN LILDEATSAL DA ..... ECE LILDEATSAL DA ..... QCE LILDEATSAL DV ..... QCE LILDDATSAL DAN.SQLQVE LILDDATSAL DAG.NQLRVQ LVLDEATSAL DVE.TEARVK LVLDEATSAL DVE.TEARVK MFFDEATSAL DTH.TEQALL ILLDEATSAL DTN TERQIQ LILDDALSAVDGR.TEHQIL LFLDEAVSAL DIV.HRNLLM ILFDEPTTGL DIT.SSNIFR YLLDEWAADQ DPPFKRFFYE L . L D E . . S A L D .... E .... 1050 .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... .................... ....................
Cvabescco VAIKN..M.. NITRVIIAHR ETTLRTVDRV .......... ..........
Pmdlschpo KALDN..ASR SRTTIVIAHR LSTIRNADNI .......... ..........
Pglyarath EALDR..FMI GRTTLIIAHR LSTIRKADLV .......... ..........
Pgplarath EALDR..FMI GRTTLIIAHR LSTIRKADLV .......... ..........
Mdr2musmu AALDK..ARE GRTTIVIAHR LSTIRNADVI .......... ..........
Mdr3crigr AALDK..ARE GRTTIVIAHR LSTVRNADVI .......... ..........
Mdr3homsa AALDK..ARE GRTTIVIAHR LSTVRNADVI .......... ..........
Mdrlmusmu AALDK..ARE GRTTIVIAHR LSTVRNADVI .......... ..........
Mdr2crigr AALDK..ARE GRTTIVIAHR LSTVRNADVI .......... ..........
Mdrlcrigr AALDK..ARE GRTTIVIAHR LSTVRNADII .......... ..........
Mdr3musmu AALDK..ARE GRTTIVIAHR LSTVRNADVI .......... ..........
Mdrlhomsa VALDK..ARK GRTTIVIAHR LSTVRNADVI .......... ..........
Mdr4drome SALEL..ASQ GPTTLVVAHR LSTITNADKI .......... ..........
Mdr5drome QALDL..ASK GRTTIVVSHR LSAIRGADKI .......... ..........
Mdrlcaeel QALDK..AAK GRTTIIIAHR LSTIRNADLI .......... ..........
Mdr3caeel TALDK..ASE GRTTLCIAHR LSTIRNA... .......... ..........
Mdrlleien AALDQLIQRG GTTVVVIAHR LATIRDMDRI YYVKH..... ..........
Msbaescco AALDEL..QK NRTSLVIAHR LSTIEKADEI .....
Msbahaein SALEEL..KK DRTVVVIAHR LSTIENADEI .......... ..........
Hetaanasp ESIEKL..SV GRTVIAIAHR LSTIAKADKV .......... ..........
Mdrplafa  KTINNLKGNE NRITIIIAHR LSTIRYANTI FVLSNRERSD NNNNNNNDDN
Mdllsacce KNL.QRRVER GFTTISIAHR LSTIKHSTRV I......... ..........
Mdl2sacce YTFGQLMKSK SMTIVSIAHR LSTIRRSENV I......... ..........
Mt2ratno  QALQTWRSQE DRTMLVIAHR LHTVQNADQV L......... ..........
Tap2musmu QALQNWRSQG DRTMLVIAHR LHTVQNADQV L......... ..........
Tap2homsa QALQDWNSRG DRTVLVIAHR LQTVQRAHQI L
Taplhomsa QLLYESPERY SRSVLLITQH LSLVEQADHI L
Taplmusmu RLLYESPKRA SRTVLLITQQ LSLAEQAHHI L......... ..........
Chvaagrtu AAV..DALRK NRTTFIIAHR LSTVRDADLV .......... ..........
Ndvarhime DAI..DALRK DRTTFIIAHR LSTVREADLV .......... ..........
Atmlsacce RTIRDNFTSG SRTSVYIAHR LRTIADADKI .......... ..........
Hmtlschpo AAL..NRLAS GRTAIVIAHR LSTITNADLI .......... ..........
Mdlescco  HNLRQW..GQ GRTVIISAHR LSALTEASEI .......... ..........
Ste6sacce KAIRHW..RK GKTTIILTHE LSQIESDDYL Y......... ..........
Natabacsu EFIQQLK.RE QKTILFSSHI MEEVQA.... .......... ..........
Syrdpsesy ELLPDLK.RR GKTILIITHD DQYFQLADRI .......... ..........
Consensus .......... ..T.....HR L......D.. .......... ..........
Surcricr Surratno Mdrleita Hemsentfa Lcn31acla Nistlacla Spabbacsu Cyddescco Cyddhaein Cydcescco Cydchaein Aprdpseae Prtderwch Hlybescco Hlybpasha
1051 .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... ..........
IAMKDGTIQR IAMKDGTIQR VVLQHGSIVF ILMDNGEIVC IYIDDKKIM. VVMKDGQVED IVMDKGEIVG WVMQDGRIIE FVMQRGEIVQ IVMDNGQIIE CVIDNGRLIE LALNAGQLHL LILHEGQQQR IVMEKGKIVE IVMEKGEIVE
EGTLKDFQRS ..ECQLFEHW EGTLKDFQRS ..ECQLFEHW AGDFAAFSAT ALEETLRGEL IGKHEDLIEN SELYRSLYYK IDKHNNLLLN .DSYNSFVNE VGSHDVLLRR CQ..YYQELY IGTHEELLKT CP..LYKKMD QGRYAELSVA GGPFATLLAH QGKFTELQ.H EGFFAELLAQ QGTHAELLAR QGRYY.QFKQ EGDYNSLITK ENGFFKRLIE YGERDQVLAALNNQRAASAS MGLARDVLTE LQQRSAANQA QGKHKELLS. EPESLYSYLY QGKHHELLQ. NSNGLYSYLH
1100 KTLMNRQDQE KTLMNRQDQE KGSKDVESCS KQQFGGTK.. .......... YSEQYEDNDE ESENYMNPLE R Q E E I ..... RQQDIQ GL ........ RV QQRADYRVAG RMNPTAAMPQ Q L Q S D ..... Q L Q L N .....
Rt3bactpl Hlybactac Hlybactpl Cyabborpe Comastrpn Lcnclacla Peddpedac Cvabescco Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Msbaescco Msbahaein Hetaanasp Mdrplafa Mdllsacce Mdl2sacce Mt2ratno Tap2musmu Tap2homsa Taplhomsa Taplmusmu Chvaagrtu Ndvarhime Atmlsacce Hmtlschpo Mdlescco Ste6sacce Natabacsu Syrdpsesy Consensus
. . . . . . . . . . I V M E K G H I V E Q G K H N Q L L E . N E N G L Y Y Y L N Q L Q S N ..... . . . . . . . . . . I V M D K G E I I E Q G K H Q E L L K . D E K G L Y S Y L H Q L Q V N ..... . . . . . . . . . . I V M E K G Q I V E Q G K H K E L L R . D P N G L Y H Y L H Q L Q S E ..... .......... VVMEGGEVAE CGSHETLL..AAGGLYARLQ ALQAGEAG.. . . . . . . . . . . V V L D Q G K I V E E G K H A D L L . . A Q G G F Y A H L V NS ........ . . . . . . . . . . I V V D Q G K V I E S G S H V D L L . . A Q N G F Y E Q L Y HN ........ . . . . . . . . . . V V L D H G K I V E Q G S H R Q L L . . N Y N G Y Y A R L I H N Q E ...... . . . . . . . . . . ISI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .......... VVVNAGKIVE QGSHNELLDL ..NGAYARLV EAQKLSGGEK .......... AVLQQGSVSE IGTHDELFSK GENGVYAKLI KMQE...AAH .......... AVLQQGSVSE IGTHDELFSK GENGVYAKLI KMQE...AAH .......... AGFEDGVIVE QGSHSELMK..KEGIYFRLV NMQT...AGS .......... AGFEDGVIVE QGSHSELMQ..KEGVYFKLVNMQT...SGS .......... AGFEDGVIVE QGSHSELMK..KEGVYFKLV NMQT...SGS .......... AGFDGGVIVE QGNHDELMR..EKGIYFKLV MTQT...RGN .......... AGFDGGVIVE QGNHEELMK..EKGIYCRLV MMQT...RGN .......... AGFDGGVIVE QGNHEELMR..EKGIYFKLV MTQT...AGN .......... AGFDGGVIVE QGNHDELMR..EKGIYFKLV MTQT...AGN .......... AGFDDGVIVE KGNHDELMK..EKGIYFKLV TMQT...AGN .......... VFLKDGWAE QGTHEELME..RRGLYCELV SITQ...RKE .......... VFIHDGKVLE EGSHDDLMA..LEGAYYNMV RAGD...INM .......... ISCKNGQVVE VGDHRALMA..QQGLYYDLV TAQT...FTD ..................... STHDELISK .DDGIYASMV KAQEIERAKE .......... DGAEGSRITE SGTFDELLEL ..DGEFAAVA KMQGVLAGDA .......... VVVEDGVIVE RGTHNDLLE..HRGVYAQLH K M Q F G Q .... .......... LVIDHGEIRE RGNHKTLLE..QNGAYKQLH S M Q F T G .... .......... VVMEQGRIVE QGNYQELLE..QRGKLWKYH QMQHESGQTN NNNNNNNNNK INNEGSYIIE QGTHDSLMK. NKNGIYHLMI NNQKISSNKS .......... VLGKHGSVVE TGSFRDLIAI PNSELNALLA EQQDEEGKGG .......... VLGHDGSVVE MGKFKELYAN PTSALSQLLN EKAAPGPSDQ . . . . . . . . . . V L . K Q G Q L V E . . . 
H D Q L R D E Q D V Y A H L V Q Q R L E A ...... . . . . . . . . . . V L . K Q G R L V E . . . H D Q L R D G Q D V Y A H L V Q Q R L E A ...... .......... VL.QEGKLQK ...LAQL ....................... . . . . . . . . . . F L . E G G A I R E G G T H Q Q L M E K K G C Y W A M V Q A P A D A P E .... . . . . . . . . . . F L . R E G S V G E Q G T H L Q L M K R G G C Y R A M V E A L A A P A D .... LFLDQGRIIE KGTFDELTQR GGRFTSLLRT SGLLTEDEGQ .......... IFMDQGRVVE MGGFHELSQS NGRFAALLRA SGILTDEDVR .......... IVLDNGRVRE EGKHLELLAM PGSLYRELWT IQEDLDHLEN LCISNGRIVE TGTHEELIKR DGGRYKKMW. FQQAMGKTSA .......... IVMQHDISPS VAILMCWHNKAAGIAICIAI N ......... ........... LMKEGEVVE SGTQSELLAD PTTTFSTWYH LQNDYSDAKT ..... L C D S V I M I H S G E V I Y R G A L E S L Y E S E R S E D L N Y I F M S K L V R G I S . .......... IKLADGCIVS DVKCAVEGKR A ................... . . . . . . . . . . . . . . . G . . . . . G .... L . . . . . . . . . . . . . . . . . . . . . . .
          1101                                              1150
Surcricr  LEKETVMERK ASEPSQGLPR AMSSRDGLLL DEEEEEEEAA ESEEDDNLSS
Surratno  LEKETVMERK APEPSQGLPR AMSSRDGLLL DEDEEEEEAA ESEEDDNLSS
Mdrleita  SDVDTESATA ETAPYVAKAK GLNAEQETSL AGGEDPLR.S DVEAGRLMTT
Spabbacsu EEGSKWKEAL YQG....... .......... .......... ..........
Aprdpseae YGAPQWAAP RQGGVE.... .......... .......... ..........
Pmdlschpo DQEMVEEELE DAPREIPITS FGDDDEDNDM ASLEAPMMSH NTDTDTLNNK
Pglyarath ETAMSNARKS SARPSSARNS VSSPIMTRNS SYGRSPYSRR LSDFSTSDFS
Pgplarath ETAMSNARKS SARPSSARNS VSSPIMTRNS SYGRSPYSRR LSDFSTSDFS
Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Hetaanasp Mdrplafa Mdllsacce Mdl2sacce Chvaagrtu Ndvarhime Atmlsacce Hmtlschpo Ste6sacce Consensus
QILSEEFEV ............ ELSDEKAAG DVAPNGWKAR IFR.NSTKKS QILSQEFEV ............ ELSEEKAAD GMTPNGWKSH IFR.NSTKKS QIQSEEF.. ELNDEKAAT RMAPNGWKSR LFR HSTQKN EIEPGNNAY ............ GSQSDTDAS ELTSEESKSP LIR.RSIYRS EVELGSEAD ............ GSQSDTIAS ELTSEEFKSP SVR.KSTCRS EIELGNEVG ............ ESKNEIDNL DMSSKDSASS LIRRRSTRRS EIELGNEAC ............ KSKDEIDNL DMSSKDSGSS LIRRRSTRKS EVELENAAD ............ ESKSEIDAL EMSSNDSRSS LIRKRSTRRS ATEADEGAVA GRPLQKSQNL SDEETDDDEE DEEEDEEPEL QTSGSSRDSG PDEVEKEDSI EDTKQKSLAL FEKSFETSPL NLEKGQKNSV QFEEPIIKAL AVDSAAEGKF SRENSVARQT SEHEGLSRQA SEMDDIMNRV RSSTIGSITN DTTLDDEEDE KTHRSFHRDS VTSDEERELQ QSLARDSTRL RQSMISTTTQ KSGASVRDAK KASGHLGVIL DEADLAQLDE DVPRTARQNV PIDELAKW. o S ............................................... S N N G N D N G S D N K S S A Y K D S D T G N D A D N M N S L S I H E N E N I S N N R N C K .... VIDLDNSVAR EV ...................................... QLQIEKVIEK EDLNESKEHD DQKKDDNDDN DNNHDNDSNN QSPETKDNNS QPRPKAIAS ......................................... KSLTAA ............................................ ELKDQQEL .......................................... ETH ............................................... IVDTETEEKS IHTVESFNSQ LETPKLGSCL SNLGYDETDQ LSFYEAIYQK ..................................................
Surcricr Surratno Mdrleita Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdl2sacce Mdlescco Ste6sacce Consensus
1151 1200 V .......................................... LHQRAKI V .......................................... LHQRAKI E .......................................... EKATGKV LNEKDNVVFE DKTLQHVASE IVPNLPPADV GELNEEPKKS KKSKKNNHEI LSIDA ......................... SSYPNYRNE ..... KLAFKD LSIDA ......................... SSYPNYRNE ..... KLAFKD LKS ........................... PHQNRLDEE ..... TNELDA LKSSR ......................... AHHHRLDVD ..... ADELDA LKNSQ ......................... MCQKSLDVE ..... TDGLEA VHRKQ ......................... DQERRLSMK ...... EAVDE ICGSQ ......................... DQERRVSVK ...... EAQDE IRGPH ......................... DQDRKLSTK ...... EALDE ICGPH ......................... DQDRKLSTK ...... EALDE VRGSQ AQDRKLSTK EALDE FRAST ......................... RRKRRSQRR ..... KKKKDK IKDTN ......................... AQSAEAPPE ..... KPNFFR GPVID ......................... EKEERIGKDA LSRLKQELEE VPEWE ......................... IENAR .......... EEMIE ................................................. E .............................. NTAENEKEEK VPFFKRMFRR DDIEKCVCRT SPERRGKGG ............................... .............................. NWRRRSTTLR KIARRPSMRS RSN .................. VRTRRVKVE EENIGYALKQ QKNTESSTGP ..................................................
          1201                                              1250
Surcricr  PWRACTKYLS SAGILLLSLL VFSQL.LKHM VLVAIDYWLA KWTDSALVLS
Surratno  PWRACTKYLS SAGILLLSLL VFSQL.LKHM VLVAIDYWLA KWTDSALVLS
Mdrleita  PWSTYVAYLK SCGGLEAWGC LLATFALTEC VTAASSVWLS IWSTGSLMWS
Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
NSLTALWFIH SFVRTMIEII CLLIGILASM ICGAAYPVQA AVFARFLNIF Q...ANSFWR LAKMNSPEWK YALLGSVGSV ICGSLSAFFA YVLSAVLSVY Q...ANSFWR LAKMNSPEWK YALLGSVGSV ICGSLSAFFA YVLSAVLSVY N.VPPVSFLK VLKLNKTEWP YFVVGTVCAI ANGALQPAFS IILSEMIAIF N.VPPVSFLK VLKLNKTEWP Y F V V G T V C A I V N G A L Q P A I S IILSEMIAIF N VPPVSFLK VLKLNKTEWP YFVVGTVCAI ANGGLQPAFS VIFSEIIAIF D.VPLVSFWR ILNLNLSEWP YLLVGVLCAV INGCIQPVFA IVFSRIVGVF D.VPLVSFWG ILKLNITEWP YLVVGVLCAV INGCMQPVFS IVFSGIIGVF D.VPPISFWR ILKLNSSEWP Y F V V G I F C A I V N G A L Q P A F S IIFSKVVGVF D.VPPASFWR ILKLNSTEWP YFVVGIFCAI INGGLQPAFS VIFSKVVGVF S.IPPVSFWR IMKLNLTEWP YFVVGVFCAI INGGLQPAFA IIFSKIIGVF EVVSKVSFTQ LMKLNSPEWR FIVVGGIASV MHGATFPLWG LFFGDFFGIL ...... TFSR ILQLAKQEWC YLILGTISAV AVGFLYPAFA VIFGEFYAAL NNAQKTNLFE ILYHARPHAL SLFIGMSTAT IGGFIYPTYS VFFTSFMNVF EGAMEASLFD IFKYASPEMR NIIISLVFTL IRGFTWPAFS IVYGQLFKIL VKHAKVGFLR LMRMNKDKAW AVALGILSSV VIGSARPASS IVMGHMLRVL KKKAPNNLRI IYKEIFSYKK DVTIIFFSIL VAGGLYPVFA LLYARYVSTL FSQLWPTLKR LLAYGSPWRK PLGIAVLMMW VAAAAEVSGP LLISYFIDNM QLLSIIQIIK RMIKSIRYKK ILILGLLCSL IAGATNPVFS YTFSFLLEGI ..................................................
Surcricr Surratno Mdrleita Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
1251 1300 PAARNCSLSQ ECDLDQSVYA MVFTLLCSLG IVLCLVTSVT VEWTGLKVAK PAARNCSLSQ ECALDQSVYA MVFTVLCSLG IALCLVTSVT VEWTGLKVAK ADTYLYVY ............ L F I V F L E I F G S P L R F F L C Y Y L I R I G . . . S R TD...LSSTD FL.HKVNVFA VYWLILAIVQ FFAYAISNFA MTYAMEAVLQ Y .... NPDHE YMIKQIDKYC Y L L I G L S S A A L V F N T L Q H S F WDIVGENLTK Y .... NPDHE YMIKQIDKYC Y L L I G L S S A A L V F N T L Q H S F WDIVGENLTK GP...GDDA. VKQQKCNMFS LVFLGLGVLS FFTFFLQGFT FGKAGEILTT GP...GDDA. VKQQKCNLFS LVFLGLGVLS FFTFFLQGFT FGKAGEILTT GP...GDDA. VKQQKCNIFS LIFLFLGIIS FFTFFLQGFT FGKAGEILTR SR...DDDHE TKRQNCNLFS LFFLVMGLIS FVTYFFQGFT FGKAGEILTK TR...DDDPK TKQQNCNLFS LFFLVMGMIC FVTYFFQGFT FGKAGEILTK TR...NTDDE TKRHDSNLFS LLFLILGVIS FITFFLQGFT FGKAGEILTK TN...GGPPE TQRQNSNLFS LLFLILGIIS FITFFLQGFT FGKAGEILTK TR...IDDPE TKRQNSNLFS LLFLALGIIS FITFFLQGFT FGKAGEILTK S .... D G D D D V V R A E V L K I S MIFVGIGLMA GLGNMLQTYM FTTAGVKMTT A .... EKDPE DALRRTAVLS WACLGLAFLT GLVCFLQTYL FNYAGIWLTT A ..... GNPA DFLSQGHFWA LMFLVLAAAQ GICSFLMTFF MGIASESLTR SA...GGDDV SI..KALLNS LWFILLAFTG GISTLISGSL LGKAGETMSG GEYSATKDVE ALRSGTNLYA PLFIVFAVAN FSGWILHGF. YGYAGEHLTT F ...... DFA NLEYNSNKYS IYILLIAIAM FISETLKNYY NNKIGEKVEK ..... VAKNN L P L K V V A G L A A A Y V G L Q L F A AGLHYAQSLL FNRAAVGVVQ VP..STDGKT GSSHYLAKWS LLVLGVAAAD GIFNFAKGFL LDCCSEYWVM ..................................................
          1301                                              1350
Surcricr  RLHRSLLNRI ILAPMRFFE. .TTPLGSILN RFSSDCNTID QHIPSTLECL
Surratno  RLHRSLLNRI ILAPMRFFE. .TTPLGSILN RFSSDCNTID QHIPSTLECL
Mdrleita  NMHRDLLESI GVARMSFFD. .TTPVGRVLN RFTKDMSILD NTLNDGYLYL
Pmdlschpo RIRYHLFRTL LRQDVEFFDR SENTVGAITT SLSTKIQSLE GLSGPTLGTF
Pglyarath RVREKMLSAV LKNEMAWFDQ EENESARIAA RLALDANNVR SAIGDRISVI
Pgplarath RVREKMLSAV LKNEMAWFDQ EENESARIAA RLALDANNVR SAIGDRISVI
Mdr2musmu RLRSMAFKAM LRQDMSWFDD HKNSTGALST RLATDAAQVQ GATGTKLALI
Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
RLRSMAFKAM LRQDMSWFDD YKNSTGALST RLATDRAQVQ GATGTRLALI RLRSMAFKAM LRQDMSWFDD HKNSTGALST RLATDAAQVQ GATGTRLALI RVRYMVFKSM LRQDISWFDD HKNSTGSLTT RLASDASSVK GAMGARLAVV RLRYMVFKSM LRQDISWFDD HRNSTGALTT RLASDAANVK GAMSSRLAGI RLRYMVFKSM LRQDVSWFDN PKNTTGALTT RLANDAGQVK GATGARLAVI RLRYMVFKSM LRQDVSWFDD PKNTTGALTT RLANDAAQVK GATGSRLAVI RLRYMVFRSM LRQDVSWFDD PKNTTGALTT RLANDAAQVK GAIGSRLAVI RLRKRAFGTI IGQDIAYFDD ERNSVGALCS RLASDCSNVQ GATGARVGTM RMRAMTFNAM VNQEVGWFDD ENNSVGALSA RLSGEAVDIQ GAIGYPLSGM DLRNKLFRNV LSQHIGFFDS PQNASGKIST RLATDVPNLR TAIDFRFSTV RLRMDVFRNI MQQDASYFDD SRHNVGSLTS RLATDAPNVQAAIDQRLAEV KIRVLLFRQI MRQDINFFDI PGRDAGTLAG MLSGDCEAVH QLWGPSIGLK TMKRRLFENI LYQEMSFFDQ DKNTPGVLSA HINRDVHLLK TGLVNNIVIF QLRTDVMDAALRQPLSEFD..TQPVGQVIS RVTNDTEVIR DLYVTVVATV DLRNEVMEKL TRKNMDWFSG ENNKASEISA LVLNDLRDLR SLVSEFLSAM ..................................................
Surcricr Surratno Mdrleita Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
1351 1400 SRSTLLCVSA LTVISYVTPV FLVALLPLAV VCYFIQKYFR VASRDLQQLD SRSTLLCVSA LAVISYVTPV FLVALLPLAV VCYFIQKYFR VASRDLQQLD LEYFFSMCST VIIMVVVQPF VLVAIVPCVY SYYKLMQVYN ASNRETRRIK FQILTNIISV TILSLATGWK LGLVTLSTSP VIITAGYYRV RALDQVQEKL VQNTALMLVA CTAGFVLQWR LALVLVAVFPVVVAATVLQK MFMTGFSGDL VQNTALMLVA CTAGFVLQWR LALVLVAVFPVVVAATVLQK MFMTGFSGDL AQNTANLGTG IIISFIYGWQ LTLLLLSVVP FIAVAGIVEM KMLAGNAKRD AQNTANLGTG IIISFIYGWQ LTLLLLSVVP FIAVSGIVEM KMLAGNAKRD AQNIANLGTG IIISFIYGWQ LTLLLLAVVP IIAVSGIVEM KLLAGNAKRD TQNVANLGTG VILSLVYGWQ LTLLLVVIIP LIVLGGIIEM KLLSGQALKD TQNVANLGTG IIISLVYGWQ LTLLLVVIAP LIILSGMMEM KVLSGQALKD TQNIANLGTG IIISLIYGWQ LTLLLLAIVP IIAIAGVVEM KMLSGQALKD FQNIANLGTG IIISLIYGWQ LTLLLLAIVP IIAIAGVVEM KMLSGQALKD TQNIANLGTG IIISFIYGWQ LTLLLLAIVP IIAIAGVVEM KMLSGQALKD IQALSNFISS VSVAMYYNWK LALLCLANCP IIVGSVILEA KMMSNAVVRE ITTLVSMVAG IGLAFFYGWQ MALLIIAILP IVAFGQYLRG RRFTGKNVKS LTGIVSLFCG VGVAFYYGWNMAPIGLATEL LLVVVQSSVA QYLKFRGQRD VQTMCIIASG LVVGFIYQWK LALVALACMP LMIGCSLTRR LMINGYTKSR SHFIMLFLVS MVMSFYFCPI VAAVLTFIYF INMRVFAVRA RLTKSKEIEK LRSAALVGAM LVAMFSLDWR MALVAIMIFPVVLVVMVIYQ RYSTPIVRRV TSFVTVSTIG LIWALVSGWK LSLVCISMFP LIIIFSAIYG GILQKCETDY ..................................................
          1401                                              1450
Surcricr  DTTQLPLVSH FAETVEGLTT IRAFRYEARF QQKLLEYTDS NNIASLFLTA
Surratno  DTTQLPLLSH FAETVEGLTT IRAFRYEARF QQKLLEYTDS NNIASLFLTA
Mdrleita  SIAHSPVFTL LEESLQGQRT IATYGKLHLV LQEALGRLDV VYSALYMQNV
Pmdlschpo SAA....... .......YKE SAAFACESTS AIRTVASLNR EENVFAEYCD
Pglyarath EAA....... .......HAK GTQLAGEAIA NVRTVAAFNS EAKIVRLYTA
Pgplarath EAA....... .......HAK GTQLAGEAIA NVRTVAAFNS EAKIVRLYTA
Mdr2musmu KKE....... .......MEA AGKIATEAIE NIRTVVSLTQ ERKFESMYVE
Mdr3crigr KKA....... .......LEA AGKIATEAIE NIRTVVSLTQ ERKFESMYVE
Mdr3homsa KKE....... .......LEA AGKIATEAIE NIRTVVSLTQ ERKFESMYVE
Mdrlmusmu KKQ....... .......LEI SGKIATEAIE NFRTIVSLTR EQKFETMYAQ
Mdr2crigr KKE....... .......LEV SGKIATEAIE NFRTVVSLTR EQKFENMYAQ
Mdrlcrigr KKE....... .......LEG SGKIATEAIE NFRTVVSLTR EQKFENMYAQ
Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
KKE .............. LEG SGKIATEAIE N F R T V V S L T R EQKFETMYAQ KKE .............. LEG AGKIATEAIE N F R T V V S L T Q EQKFEHMYAQ KAS . . . . . . . . . . . . . . IEE A S Q V A V E A I T N I R T V N G L C L ERQVLDQYVQ KQV .............. IEE A C R I A T E S I T N I R T V A G L R R EADVIREYTE ASE . . . . . . . . . . . . . . FAD SGKIAIEAIE N V R T V Q A L A R EDTFYENFCE MDS .............. AIE ASRLVTESIS N W K T V Q A L T K Q E Y M Y D A F T A EGD .............. TDD T..IVTEALS N V R T V T S L N M KEDCVEAFQA KENMSSGVFA FSSDDEMFKD PSFLIQEAFY NMHTVINYGL EDYFCNLIEK RA .............. YLAD INDGFKQIIN GMSVIQQFRQ QARFGERMGE KTS .............. VAQ L E N C L Y Q I V T N I K T I K C L Q A E F H F Q L T Y H D ..................................................
Surcricr Surratno Mdrleita Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
1451 1500 ANRWLEVCME Y I G A C V V L I A A A T S I S N S L H R E L S A . . G L V G L G L T Y A L M V ANRWLEVRME Y I G A C V V L I A A A T S I S N S L H R E L S A . . G L V G L G L T Y A L M V SNRWLGVRLE FLSCVVTFMV AFIGVIGKME GASSQNIGLI SLSLTMSMTL S ..... LIKP GRESAIASLK SGLFFSAAQG VTFLINALTF W Y G S T L M R K G N LEPP LKRCFWKGQI AGSGYGVAQF CLYASYALGL W Y A S W L V K H G N LEPP LKRCFWKGQI AGSGYGVAQF CLYASYALGL W Y A S W L V K H G K LHGP YRNSVRKAHI Y G I T F S I S Q A FMYFSYAGCF RFGSYLIVNG K ..... LHEP YRNSVQMAHI Y G I T F S I S Q A FMYFSYAGCF R F G A Y L I V N G K ..... LYGP YRNSVQKAHI Y G I T F S I S Q A FMYFSYAGCF R F G A Y L I V N G S ..... LQVP Y R N A M K K A H V FGITFSFTQA MMYFSYAACF R F G A Y L V A Q Q S ..... LQIP Y R N A L K K A H V FGITFSFTQA MMYFSYAACF RFGAYLVAHQ S ..... LQIP Y R N A L K K A H V FGITFSFTQA MMYFSYAACF RFGAYLVARE S ..... LQIP Y R N A M K K A H V FGITFFFTQA MMYFSYAACF RFGAYLVTQQ S ..... LQVP YRNSLRKAHI F G I T F S F T Q A MMYFSYAGCF RFGAYLVAHK Q ..... IDRV DVACRRKVRF R G L V F A L G Q A A P F L A Y G I S M YYGGILVAEE E ..... IQRV E V L I R Q K L R W R G V L N S T M Q A SAFFAYAVAL CYGGVLVSEG K ..... LDIP HKEAIKEAFI QGLSYGCASS V L Y L L N T C A Y RMGLALIITD A ..... SKSP H R R A I V R G L W QSLSFALAGS FVMWNFAIAY M F G L W L I S N N A ..... LREE APRSVRKGII AGGIYGITQF IFYGVYALCF W Y G S K L I D K G A IDYK N K G Q K R R I I V NAALWGFSQS AQLFINSFAY W F G S F L I K R G A ..... SRSH YMARMQTLRL DGFLLR ..... PLLSLFSSL ILCGLLMLFG L KIKM Q Q I A S K R A I A TGFGISMTNM IVMCIQAIIY YYGLKLVM.. ..................................................
          1501                                              1550
Surcricr  SNYLNWMVRN LADMEIQLGA VKR....... .......... ..IHALLKTE
Surratno  SNYLNWMVRN LADMEIQLGA VKG....... .......... ..IHTLLKTE
Mdrleita  TETLNWLVRQ VAMVEANMNS VERVLHYTQE VEHEHVPEMG ELVAQLVRSE
Pmdlschpo ..EYNIVQFY TCFIAIVFGI QQAGQFFGYS ADVTKAKAAA GEIKYLSESK
Pglyarath ISDFS..KTI RVFMVLMVSA NGAAETLTLA PDFIKGGQAM RSVFELLDRK
Pgplarath ISDFS..KTI RVFMVLMVSA NGAAETLTLA PDFIKGGQAM RSVFELLDRK
Mdr2musmu HMRFK..DVI LVFSAIVLGA VALGHASSFA PDYAKAKLSA AYLFSLFERQ
Mdr3crigr HMRFR..DVI LVFSAIVFGA VALGHASSFA PDYAKAKLSA AHLFSLFERQ
Mdr3homsa HMRFR..DVI LVFSAIVFGA VALGHASSFA PDYAKAKLSA AHLFMLFERQ
Mdrlmusmu LMTFE..NVM LVFSAVVFGA MAAGNTSSFA PDYAKAKVSA SHIIRIIEKT
Mdr2crigr IMTFE..NVM LVFSAVVFGA IAAGNASSFA PDYAKAKVSA SHIIRIMEKI
Mdrlcrigr LMTFE..NVL LVFSAIVFGA MAVGQVSSFA PDYAKAKVSA SHIIMIIEKV
Mdr3musmu LMTFE..NVL LVFSAIVFGA MAVGQVSSFA PDYAKATVSA SHIIRIIEKT
Mdrlhomsa LMSFE..DVL LVFSAVVFGA MAVGQVSSFA PDYAKAKISA AHIIMIIEKT
Mdr4drome RMNYE..DII KVAEALIFGS WMLGQALAYA PNVNDAILSA GRLMDLFKRT
Mdr5drome QLPFQ..DII KVSETLLYGS MMLAQSLAFT PAFSAALIAG HRLFQILDRK
Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
Surcricr
Surratno Mdrleita Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr4enthi Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
PPTMQPMRVL RVMYAITIST STLGFATSYF PEYAKATFAG GIIFGMLRKI WST..PYTVF QVIEALNMAS MSVMLAASYF PEYVRARISA GIMFTMIRQK EAEFK..DVM IASMSILFGA QNAGEAGAFA TKLADAEASA KRVFSVIDRV ..TILVDDFM KSLFTFIFTG SYAGKLMSLK GDSENAKLSF EKYYPLMIRK FSASGTIEVG VLYAFISYLG RLNEPLIELT TQQAMLQQA ...... VVAGE IHEYTSKEMF TTFTLLLFTI MSCTSLVSQI PDISRGQRAASWIYRILDEK .................................................. 1551 1600 .... A E S Y E G L L A P S L I P K N WP .... D Q G K I Q I Q N L S V R Y D S S L K P V L K H .... A E S Y E G L L A P S L I P K N WP .... D Q G K I Q I Q N L S V R Y D S S L K P V L K H SGRGANVTETVVIESAGAAS SALHPVQAGS LVLEGVQMRY REGLPLVLRG PKI .... D T W S T E G K K V E S L Q S A A ..... I E F R Q V E F S Y P T R R H I K V L R G T E I E P D D . . . P D T T P V P D R L R . G E ..... V E L K H I D F S Y P S R P D I Q I F R D T E I E P D D . . . P D T T P V P D R L R . G E ..... V E L K H I D F S Y P S R P D I Q I F R D PLI .... D S Y S G E G L W P D K F E . G S ..... V T F N E V V F N Y P T R A N V P V L Q G PLI .... D S Y S G E G L W P D K F E . G S ..... V T F N E V V F N Y P T R A N M P V L Q G PLI .... D S Y S E E G L K P D K F E . G N ..... I T F N E V V F N Y P T R A N V P V L Q G PEI .... D S Y S T E G L K P T L L E . G N ..... V K F N G V Q F N Y P T R P N I P V L Q G PSI .... D S Y S T R G L K P N W L E . G N ..... V K F N E V V F N Y P T R P D I P V L Q G PSI .... D S Y S T G G L K P N T L E . G N ..... V K F N E V V F N Y P T R P D I P V L Q G PEI .... D S Y S T Q G L K P N M L E . G N ..... V Q F S G F V F N Y P T R P S I P V L Q G PLI .... D S Y S T E G L M P N T L E . G N ..... V T F G E V V F N Y P T R P D I P V L Q G S T Q P N P P Q S P Y N T V E K S E . . . . GD ..... I V Y E N V G F E Y P T R K G T P I L Q G .................................................. P K I Q S P M G T I K N T L A K Q L N L F . E G ..... V R Y R G I Q F R Y P T R P D A K I L N G SKI .... D S L S L A G E K . K K L Y . G K ..... 
V I F K N V R F A Y P E R P E I E I L K G S V I D N R G L T G D T P T I K . . . . . . GN ..... I N M R G V Y F A Y P N R R R Q L V L D G P D V D I E Q A G N K D L G ...... E G C D ..... I E Y R N V Q F I Y S A R P K Q V V L A S SNIDVRDDG...GIRINKNL I K G K ..... V D I K D V N F R Y I SRPNVPIYKN R V F E L M D G P R Q Q Y G N D D R P F T S G T ..... I E V D N V S F A Y . . R D D N L V L N N HNTLEVENNN ARTVGIAGHT YHGKEKKPIV SIQNLTFAYP SAPTAFVYKN ..................................................
          1601                                              1650
Surcricr  VNTLISPGQK IGICGRTGSG KSSFSLAFFR MVDMFEGRII IDG.......
Surratno  VNALISPGQK IGICGRTGSG KSSFSLAFFR MVDMFEGRII IDG.......
Mdrleita  VSFQIAPREK VGIVGRTGSG KSTLLLTFMR MVEVCGGVIH VNG.......
Pmdlschpo LNLTVKPGQF VAFVGSSGCG KSTTIGLIER FYDCDNGAVL VD.GV.....
Pglyarath LSLRARAGKT LALVGPSGCG KSSVISLIQR FYEPSSGRVM ID.GK.....
Pgplarath LSLRARAGKT LALVGPSGCG KSSVISLIQR FYEPSSGRVM ID.GK.....
Mdr2musmu LSLEVKKGQT LALVGSSGCG KSTVVQLLER FYDPMAGSVL LD.GQ.....
Mdr3crigr LSLEVKKGQT LALVGSSGCG KSTVVQLLER FYDPMAGTVL LD.GQ.....
Mdr3homsa LSLEVKKGQT LALVGSSGCG KSTVVQLLER FYDPLAGTVL LD.GQ.....
Mdrlmusmu LSLEVKKGQT LALVGSSGCG KSTVVQLLER FYDPMAGSVF LD.GK.....
Mdr2crigr LSLEVKKGQT LALVGSSGCG KSTVVQLLER FYDPMAGTVF LD.GK.....
Mdrlcrigr LNLEVKKGQT LALVGSSGCG KSTVVQLLER FYDPMAGTVF LD.GK.....
Mdr3musmu LSLEVKKGQT LALVGSSGCG KSTVVQLLER FYDPMAGSVF LD.GK.....
Mdrlhomsa LSLEVKKGQT LALVGSSGCG KSTVVQLLER FYDPLAGKVL LD.GK.....
Mdr4drome LNLTIKKSTT VALVGPSGSG KSTCVQLLLR YYDPVSGSVN LS.GV.....
Mdr5drome LDLEVLKGQT VALVGHSGCG KSTCVQLLQR YYDPDEGTIH IDHDD.....
Mdrlcaeel LSFSVEPGQT LALVGPSGCG KSTVVALLER FYDTLGGEIF ID.GS.....
Mdr3caeel FNMSANFGQT VALVGPSGCG KSTTIQLIER YYDALCGSVK ID.DS.....
Mdrlleien VNMRFGDATS NGLIGQTGCG KSTVIQMLAR FYERRSGLIS VN.GR.....
LSFTCDSKKT TAIVGETGSG KSTFMNLLLR FYDLKNDHII LKNDMTNFQD INLSVPSRNF VALVGHTGSG KSTLASLLMG YYPLTEGEIR LDGRPLSSLS M N F D M F C G Q T L G I I G E S G T G K S T L V L L L T K L Y N C E V G K I K IDGT ...... ............................................
Surcricr Surratno Mdrleita Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Consensus
1651 1700 ........................................ IDIAKLPLHT ........................................ IDIAKLPLHT ........................................ REMSAYGLRD ......................................... NVRDYNIND ......................................... DIRKYNLKA ......................................... DIRKYNLKA ......................................... EAKKLNVQW ......................................... EAKKLNIQW ......................................... EAKKLNVQW ......................................... EIKQLNVQW ......................................... EIKQLNVQW ......................................... EVNQLNVQW ......................................... EIKQLNVQW ......................................... EIKRLNVQW ......................................... PSTEFPLDT ......................................... IQHDLTLDG ......................................... EIKTLNPEH ......................................... DIRDLSVKH ......................................... DLSSLDIAE Y Q N N N N N S L V L K N V N E F S N Q SGSAEDYTVF N N N G E I L L D D I N I C D Y N L R D HTRG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..................................................
Surcricr Surratno Mdrleita Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
1701 1750 L R S R L S I I L Q D P V L F S G T I R F N L .... DPE KKCSDSTLWE A L E I A Q L K L V L R S R L S I I L Q D P V L F S G T I R FNL .... DPE K K C S D S T L W E A L E I A Q L K L V V R R H F S M I P Q D P V L F D G T V R QNV .... DPF L E A S S A E V W A A L E L V G L R E R Y R K Q I A L V S Q E P T L Y Q G T V R ENIVLGAS.. K D V S E E E M I E A C K K A N I H E F IRKHIAIVPQ EPCLFGTTIY ENIAYGHEC...ATEAEIIQAATLASAHKF IRKHIAIVPQ EPCLFGTTIY ENIAYGHEC...ATEAEIIQAATLASAHKF LRAQLGIVSQ EPILFDCSIA ENIAYGDNSR .VVPHDEIVRAAKEANIHPF LRAQLGIVSQ EPVLFDCSIA ENIAYGDNSR .VVSQDEIVRAAKAANIHPF LRAQLGIVSQ EPILFDCSIA ENIAYGDNSR .VVSQDEIVSAAKAANIHPF LRAHLGIVSQ EPILFDCSIA ENIAYGDNSR .AVSHEEIVRAAKEANIHQF LRAHLGIVSQ EPILFDCSIA ENIAYGDNSR .VVSQDEIERAAKEANIHQF LRAHLGIVSQ EPILFDCSIA ENIAYGDNSR .VVSQDEIERAAKEANIHQF LRAQLGIVSQ EPILFDCSIA ENIAYGDNSR .VVSYEEIVRAAKEANIHQF LRAHLGIVSQ EPILFDCSIA ENIAYGDNSR .VVSQEEIVRAAKEANIHAF LRSKLGLVSQ EPVLFDRTIA ENIAYGNNFR DDVSMQEIIEAAKKSNIHNF VRTKLGIVSQ EPTLFERSIA ENIAYGDN.R RSVSMVEIIAAAKSANAHSF TRSQIAIVSQ EPTLFDCSIA ENIIYGLD.P SSVTMAQVEEAARLANIHNF LRDNIALVGQ EPTLFNLTIR ENITYGLEN...ITQDQVEKAATLANIHTF WRRNISIVLQ EPNLFSGTVR ENIRYA...R EGATDEEVEEAARLAHIHHE LRNLFSIVSQEPMLFNMSIYENIKFG...REDATLEDVKRVSKFAAIDEF ..QGVAMVQQ D P V V L A D T F L A N V T L G R D .... I S E E R V W Q A L E T V Q L A D V L R K E I S V V E Q K P L L F N G T I R DNLTYGLQ.. D E I L E I E M Y D A L K Y V G I H D F ..................................................
Mdrplafa Mdlescco Ste6sacce Consensus
          1751                                              1800
Surcricr  VKALPGGLDA IITEGGENFS QGQRQLFCLA RAFVRKTSIF I.MDEATASI
Surratno  VKALPGGLDA IITEGGENFS QGQRQLFCLA RAFVRKTSIF I.MDEATASI
Mdrleita  VASESEGIDS RVLEGGSNYS VGQRQLMCMA RALLKRGSGF ILMDEATANI
Pmdlschpo ILGLPNGYNT LCGQKGSSLS GGQKQRIAIA RALIRNPKIL L.LDEATSAL
Pglyarath ISALPEGYKT YVGERGVQLS GGQKQRIAIA RALVRKAEIM L.LDEATSAL
Pgplarath ISALPEGYKT YVGERGVQLS GGQKQRIAIA RALVRKAEIM L.LDEATSAL
Mdr2musmu IETLPQKYNT RVGDKGTQLS GGQKQRIAIA RALIRQPRVL L.LDEATSAL
Mdr3crigr IETLPQKYKT RVGDKGTQLS GGQKQRLAIR RALIRQPRVL L.LDEATSAL
Mdr3homsa IETLPHKYET RVGDKGTQLS GGQKQRIAIA RALIRQPQIL L.LDEATSAL
Mdrlmusmu IDSLPDKYNT RVGDKGTQLS GGQKQRIAIA RALVRQPHIL L.LDEATSAL
Mdr2crigr IESLPDKYNT RVGDKGTQLS GGQKQRIAIA RALVRQPHIL L.LDEATSAL
Mdrlcrigr IESLPDKYNT RVGDKGTQLS GGQKQRIAIA RALVRQPHIL L.LDEATSAL
Mdr3musmu IDSLPDKYNT RVGDKGTQLS GGQKQRIAIA RALVRQPHIL L.LDEATSAL
Mdrlhomsa IESLPNKYST KVGDKGTQLS GGQKQRIAIA RALVRQPHIL L.LDEATSAL
Mdr4drome ISALPQGYDT RLG.KTSQLS GGQKQRIAIA RALVRNPKIL I.LDEATSAL
Mdr5drome IISLPNGYDT RMGARGTQLS GGQKQRIAIA RALVRNPKIL L.LDEATSAL
Mdrlcaeel IAELPEGFET RVGDRGTQLS GGQKQRIAIA RALVRNPKIL L.LDEATSAL
Mdr3caeel VMGLPDGYDT SVGASGGRLS GGQKQRVAIA RAIVRDPKIL L.LDEATSAL
Mdrlleien IIKWTDGYDT EVGYKGRALS GGQKQRIAIA RGLLRRPRLL L.LDEATSAL
Mdrplafa  IESLPNKYDT NVGPYGKSLS GGQKQRIAIA RALLREPKIL L.LDEATSSL
Mdlescco  ARSMSDGIYT PLGEQGNNLS VGQKQLLALA RVLVETPQIL I.LDEATASI
Ste6sacce VISSPQGLDT RID..TTLLS GGQAQRLCIA RALLRKSKIL I.LDECTSAL
Consensus .......... .......... .......... .......... ..........
Surcricr Surratno Mdrleita Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr Mdr3musmu Mdrlhomsa Mdr4drome Mdr5drome Mdrlcaeel Mdr3caeel Mdrlleien Mdrplafa Mdlescco Ste6sacce Consensus
1801 1850 DMATENILQKVV..MTAFAD RTVVTIAHRV HTILSADLVM VLKRGAILEF DMATENILQK VV~ RTVVTIAHRV HTILSADLVM VLKRGAILEF DPALDRQIQA TV~ YTVITIAHRL HTVAQYDKII VMDHGVVAEM DSHSEKVVQE ALN..AASQG RTTVAIAHRL SSIQDADCIF VFDGGV .... DAESERSVQE ALD..QACSG RTSIVVAHRL STIRNAHVIA VIDDGK .... DAESERSVQE ALD QACSG RTSIVVAHRL STIRNAHVIA VIDDGK DTESEKVVQE ALD..KAREG RTCIVIAHRL STIQNADLIV VIENGK .... DTESEKVVQE ALD..KAREG RTCIVIAHRL STIQNADLIV VIQNGK .... DTESEKVVQE ALD..KAREG RTCIVIAHRL STIQNADLIV VFQNGR .... DTESEKVVQE ALD..KAREG RTCIVIAHRL STIQNADLIV VIENGK .... DTESEKVVQE ALD..KAREG RTCIVIAHRL STIQNADLIV VIQNGK .... DTESEKVVQE ALD..KAREG RTCIVIAHRL STIQNADLIV VIQNGK .... DTESEKVVQE ALD..KAREG RTCIVIAHRL STIQNADLIV VIQNGK .... DTESEKVVQE ALD..KAREG RTCIVIAHRL STIQNADLIV VFQNGR .... DLESEKVVQQ ALD..EARSG RTCLTIAHRL TTVRNADLIC VLKRGV .... DLQSEQLVQQ ALD..TACSG RTCIVIAHRL STVQNADVIC VIQNGQ .... DTESEKVVQE ALD..RAREG RTCIVIAHRL N T V M N A D C I A V V S N G T .... DTESEKIVQE ALD..KARLG RTCVVIAHRL STIQNADKII VCRNGK .... DSVTEAKVQE GIEAFQAKYK VTTVSIAHRL TTIRHCDQII LLDSGC .... DSNSEKLIEK TIVDIKDKAD KTIITIAHRI ASIKRSDKIV VFNNPDRNGT DSGTEQAIQH ALA..AVREH TTLVVIAHRL STIVDAATIL VLHRGQ .... DSVSSSIINE IVKKGPPALL TMVITHSEQM MRSCNSIAVL KDGKVVERGN ..................................................
Surcricr Surratno Mdrleita
1851 1900 D K P E T L L S Q K D S V F A S F V ..... RADK ....................... DKPEKLLSQK DSVFASFV ..... RADK ....................... GSPRELVMNH QSMFHSMVES LGSRGSKDFY ELLMGRRIVQ PAVLSD ....
Pmdlschpo Pglyarath Pgplarath Mdr2musmu Mdr3crigr Mdr3homsa Mdrlmusmu Mdr2crigr Mdrlcrigr
.TCEAGTHAE LVKQ..R ..... GRYYELVVEQGLNKA ............. .VAEQGSHSH LLKNHPD ..... GIYARMIQ LQRFTHTQVI GMTSGSSSRV .VAEQGSHSH LLKNHPD ..... GIYARMIQ LQRFTHTQVI GMTSGSSSRV .VKEHGTHQQ LLAQ..K ..... GIYFSMVNIQAGTQNL ............ .VKEHGTHQQ LLAQ..K ..... GIYFSMVNIQAGAQNS ............ .VKEHGTHQQ LLAQ..K ..... GIYFSMVS VQAGTQNL ............ .VKEHGTHQQ LLAQ..K ..... GIYFSM.. VQAGAKRS ............ .VKEHGTHQQ LLAQ..K ..... GIYFSM.. VQAGAKRL ............ .VKEHGTHQQ LLAQ..K ..... GIYFSMVS VQAGAKR ............. .VKEHGTHQQ LLAQ..K ..... GIYFSMVS VQAGAKRS ............ .VKEHGTHQQ LLAQ..K ..... GIYFSMVS VQAGTKRQ ............ .VVEQGNHMQ LISQ..G ..... GIYAKLHK TQKDH ............... .IIEKGTHTQ LMSE..K ..... GAYYKLTQ KQMTEKK ............. . A I E E G T H Q T L L A R . . R . . . . . G L Y Y R L V E KQSS . . . . . . . . . . . . . . . . .IIEQGSHEE LMALGGEYKT RYDLYMSALS .................... F V Q S H G T H D E L L S A Q D .... GIYKKYVK LAK ................. .AVEQGTHQQ LLRPRDVLAD VSTATCGRRA GSQRA FDTLYNNRGE LFQIVSNQSS ............... ..................................................
          1901
Pglyarath KEDDA
Pgplarath KEDDA
Consensus .....
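The Consensus row of an alignment like this one can be derived column by column from the aligned rows. The following is a minimal sketch, not the procedure the authors used: it assumes the 75% threshold stated in the legend, treats "." as the gap character, and counts the threshold against all aligned rows. The function names `consensus` and `percent_identity` are hypothetical, and the identity measure (identical positions over columns where both sequences carry a residue) is one common convention chosen here for illustration.

```python
from collections import Counter

GAP = "."  # gap character used in the alignment rows above


def consensus(rows, threshold=0.75):
    """Return one character per column: the most common residue if it occurs
    in at least `threshold` of all rows, otherwise the gap character."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        residues = [c for c in column if c != GAP]
        if not residues:
            out.append(GAP)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Assumption: the fraction is taken over every aligned row,
        # including rows that have a gap in this column.
        out.append(residue if count / len(rows) >= threshold else GAP)
    return "".join(out)


def percent_identity(a, b):
    """Percentage of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Four ten-residue segments taken from the 1751-1800 block of the alignment.
rows = ["GGQKQRIAIA", "GGQKQRLAIR", "QGQRQLFCLA", "GGQKQRIAIA"]
print(consensus(rows))
print(percent_identity(rows[0], rows[1]))
```

With only four rows the threshold is met by three matching residues, so this toy input reports far more conserved columns than the full alignment would.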
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Hlybprovu, Hly2escco (Hlybescco); Mdrlratno (Mdrlmusmu); Tap2ratno (Tap2musmu). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
Database accession numbers
Aprdpseae Atmlsacce Chvaagrtu Comastrpn Cvabescco Cyabborpe Cydcescco Cydchaein Cyddescco Cyddhaein
SWISSPROT
PIR
EMBL/GENBANK
Q03024 P40416 P18768 Q03727 P22520 P18770 P23886 P45081 P29018 P45082
$26696
X64558; G45280 X82612; G575393 M24198; G142224 M36180; G520738 X57524; G41176 X14199; G39733 L10383; G145165 L45792; G1006503 L21749; G347240 L45793; G1006505 M38052 M31722; G 142020 M10133; G146380 X53955; G38647 M65808; G 141821 M14107; G150683 M20730; G150495 X12852; G45904 Z14055; G4972 U04057; G433323
P22638 P10089 P23702 P26760 P08716 P16532 P11599 Q02592 P37608
A32810; VXAGGA A39203 S12272; IKEC5B S02386; BVBRCB PS0228; B36888 B40632 A41464 A35391 B24433; LEECB S12601; A61378 A40366; S 18855 S10057 A32051; $29517 S05477; LEEBBV $25198
Lcnclacla Mdlescco Mdl1sacce Mdl2sacce Mdrleita Mdrplafa Mdr1caeel Mdr1crigr Mdr1homsa Mdr1leien Mdr1musmu Mdr1ratno Mdr2crigr Mdr2musmu Mdr3caeel Mdr3crigr Mdr3homsa Mdr3musmu Mdr4drome Mdr5drome Msbaescco Msbahaein Mt2ratno Natabacsu Ndvarhime Nistlacla Peddpedac Pglyarath Pgp1arath Pmd1schpo Prtderwch Rt3bactpl Spabbacsu Ste6sacce Surcricr Surratno Syrdpsesy Tap1homsa Tap1musmu Tap2homsa Tap2musmu Tap2ratno
SWISSPROT
PIR
EMBL/GENBANK
Q00564 P30751 P33310 P33311 P21441 P 13568 P34712 P21448 P08183 Q06034 P06795 P43245 P21449 P21440 P34713 P23174 P21439 P21447 Q00449 Q00748 P27299 P44407
B43943
M90969; G387688 L08627; G 146801 U17246; G577195 L16959; G311095 X17154; G9659 M29154; G 160399 X65054; G6809 M60040; G191165 M14758; G307180 L08091; G159370 M14757; G387426 M81855 M60041; G191167 J03398; G387428 X65055; G6811 M60042; G191169 M23234; G307181 M30697; G387429 M59076; G157871 M59077; G157875 Z11796; G42023 L44704; G1005423
P46903 P18767 Q03203 P36497 P36619 P23596 Q04473 P33116 P 12866 Q09427 Q09429 P33951 Q03518 P21958 Q03519 P36371 P363 72
$42681 $42682 A34207; A32547; $27337 A38696; A25059;
DVLNS DVZQF DVHY1C DVHU1
A33719; D V M S 1 B27126; DVHY2C A30409; DVMS2 $27338 JS0051; DVHU3 A34175; DVMS1A A41249 B41249 $21588; $27998 $25577 U30873; G973330 A31094; VXZRNA S38 790 D48941 A42150 $21957 $20548 S12525 B43935 S05789; DVBYS6 $27646; $37347 A41538 $27333 A37779 A40224; $27334 A44135 S19603
M20726; G152269 X68307; CA4044 M83924; G150568 X613 70 X613 70 D10695; G218550 M60395; G148484 L12145; G470686 M86869; G 143 715 X 15428; G4564 L40623; G1311522 L40624; G 1311534 M97223; G151562 X66401; G34636 M55637; G199305 X66401; G34638 M90459; G199434 X63854; G56719
References
1 Chen, C-J. et al. (1986) Cell 47, 381-389.
2 Felmlee, T. et al. (1985) J. Bacteriol. 163, 94-105.
3 Glaser, P. et al. (1988) EMBO J. 7, 3997-4004.
4 Endicott, J.A. and Ling, V. (1989) Annu. Rev. Biochem. 58, 137-171.
5 Valverde, M.A. et al. (1992) Nature 355, 830-833.
6 Gill, D.R. et al. (1992) Cell 71, 23-32.
7 Aguilar-Bryan, L. et al. (1995) Science 268, 423-426.
8 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
9 Beck, S. et al. (1992) J. Mol. Biol. 228, 433-441.
Peroxisomal membrane transporter family
Summary
Transporters of the peroxisomal membrane family, examples of which are the human adrenoleukodystrophy protein (Aldhomsa) 1,2 and the rat 70 kDa peroxisomal membrane protein 3 (Pmp7ratno), mediate import of large molecules (proteins or long chain fatty acids) through the peroxisomal membrane. Mutations in human ALDP give rise to the serious genetic disease adrenoleukodystrophy 4. As this gene is located on the X chromosome, the disease only occurs in males; it affects about 1 in 20000 live male births. Mutations in the human 70 kDa peroxisomal membrane protein have been implicated in Zellweger syndrome, an inherited defect of peroxisome assembly 5. Members of the peroxisomal membrane transporter family have only been found in mammals and fungi. Statistical analysis of multiple amino acid sequence comparisons places the peroxisomal membrane transporter family within the multidrug resistance subdivision of the ATP binding cassette (ABC) superfamily 6. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. Members of this family consist of a single transmembrane domain followed by an ATP binding domain; the functional transporter is expected to be a dimer and may be glycosylated. Unusually, the transmembrane domains of proteins in this family are predicted from the hydropathy of their amino acid sequences to contain five membrane-spanning helices, so the functional transporters will contain ten such helices. The ATP binding domain has been found experimentally to be exposed to the cytosol 3, leaving the N-terminus extracellular, similar to the topology of the transmembrane subunits of the histidine transporter from Salmonella typhimurium 7.
Many residues, including several long sequence motifs, are well conserved within the peroxisomal membrane transporter family, including motifs unique to the family, signature motifs of the ABC superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis. These sequence motifs occur throughout the protein, but are most common within the ATP binding domains.
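The summary states that membrane-spanning helices are predicted from the hydropathy of the amino acid sequence, without naming the procedure. As an illustration only, the sketch below assumes a Kyte-Doolittle sliding-window scan; the window of 19 residues and the threshold of 1.6 are conventional choices for that scale and are assumptions here, not values taken from this book. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_helices(sequence, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the threshold.

    Returns (start, end) residue ranges, 1-based and inclusive.
    Assumed cut-offs; not the procedure used for the figures in this book.
    """
    segments = []
    for i, mean in enumerate(hydropathy_profile(sequence, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            # Window overlaps the previous hydrophobic stretch: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Hypothetical toy sequence: 20 leucines flanked by charged residues.
toy = "D" * 10 + "L" * 20 + "D" * 10
print(candidate_helices(toy))
```

Because each reported range is the union of all qualifying windows, it extends a few residues beyond the hydrophobic core; counting the ranges, rather than reading off exact helix boundaries, is the intended use.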
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Aldhomsa | Adrenoleukodystrophy protein [VLCFA-CoA synthetase, ALDP] | Homo sapiens [human] | VLCFA-CoA synthetase?
Aldmusmu | Adrenoleukodystrophy protein [VLCFA-CoA synthetase homolog, ALDGH] | Mus musculus [mouse] | VLCFA-CoA synthetase?
Aldrmusmu | ALDR | Mus musculus [mouse] | VLCFA-CoA synthetase?
Pat2sacce | Peroxisomal long chain fatty acid import protein 2 [PAT2, PXA1, PAL1, SSH2, YPL147W, LPI1W] | Saccharomyces cerevisiae [yeast] | Long chain fatty acids
Pmp7homsa | 70 kDa peroxisomal membrane protein [PXMP1, PMP70] | Homo sapiens [human] | Peroxisomal matrix enzymes
Pmp7musmu | 70 kDa peroxisomal membrane protein [PMP70] | Mus musculus [mouse] | Peroxisomal matrix enzymes
Pmp7ratno | 70 kDa peroxisomal membrane protein [PMP70] | Rattus norvegicus [rat] | Peroxisomal matrix enzymes
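The transporter codes in this table appear to follow a fixed pattern: a short protein name, then the first three letters of the genus and the first two letters of the species (Pmp7 + hom + sa, Pat2 + sac + ce). The sketch below encodes that pattern as inferred from the entries above; it is an assumption rather than a rule stated in the text, and the function names are hypothetical.

```python
def make_code(protein, genus, species):
    """Build a code such as 'Pmp7homsa' from a protein name and a binomial.

    Inferred convention: capitalised protein name, 3 genus letters,
    2 species letters.
    """
    return protein.capitalize() + genus[:3].lower() + species[:2].lower()


def split_code(code):
    """Split a code into (protein name, genus prefix, species prefix)."""
    if len(code) < 6:
        raise ValueError("code too short to contain a 5-letter organism suffix")
    return code[:-5], code[-5:-2], code[-2:]


print(make_code("PMP7", "Homo", "sapiens"))
print(split_code("Aldrmusmu"))
```

Not every code in the book need follow the pattern exactly (some appear to carry a strain or subspecies letter in the final position), so the output is best checked against the tables rather than trusted blindly.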
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Aldmusmu (Aldhomsa); Pmp7musmu, Pmp7ratno (Pmp7homsa).
[Figure: phylogenetic tree of Pmp7homsa, Aldhomsa, Aldrmusmu and Pat2sacce.]
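The 90% identity cut-off used to leave near-duplicate proteins out of the tree can be sketched as follows. How the book computes identity is not stated; this version assumes identity is counted over aligned columns in which neither sequence has a gap, and it keeps the first sequence of each near-identical group as the representative. The function names and the toy sequences are hypothetical.

```python
def percent_identity(a, b, gap="."):
    """Percent identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def drop_near_duplicates(aligned, cutoff=90.0):
    """Split codes into representatives and excluded near-duplicates.

    aligned maps code -> aligned sequence. Returns (kept, excluded), where
    excluded maps each dropped code to the representative it matched.
    """
    kept, excluded = [], {}
    for code, sequence in aligned.items():
        match = next(
            (k for k in kept
             if percent_identity(sequence, aligned[k]) >= cutoff),
            None,
        )
        if match is None:
            kept.append(code)
        else:
            excluded[code] = match
    return kept, excluded


# Hypothetical toy alignment, not data from the book.
toy = {"Xxx1": "ACDEFGHIKL", "Xxx2": "ACDEFGHIKL", "Yyy1": "MMMMMMMMMM"}
print(drop_near_duplicates(toy))
```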
Proposed orientation of ALD in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the outside and is folded five times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: topology model of ALD in the membrane, with the NH2 terminus OUTSIDE and the COOH terminus INSIDE; conserved residues are labelled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT
Aldhomsa | 745 | 82 908
Aldmusmu | 736 | 81 858
Pmp7homsa | 659 | 75 475
Pmp7musmu | 660 | 75 482
Pmp7ratno | 659 | 75 315
Pat2sacce | 758 | 86 955
Aldrmusmu | 741 | 83 483
EXPRESSION SITES (in source order): liver; liver; liver; liver; liver
CHROMOSOMAL LOCUS (in source order): Xq28; 1p22-p21; Chromosome 16
Multiple amino acid sequence alignments 1
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
.....
5o MAAFS
KYLTARNSSL
AGAAFL
RPRPWRGNTL
KRTAVLLALA
MQ LDSGARIMYI
PEVELVDRQS
..... MPVLS
MIHMLNAAAY ........
RVKWTRSGAAKRAACLVA.A
...............
R .......
..........
LLCL
LHKRRRALGL
AYGAHKVYPL
VRQCL
PDDNKFMNAT
DKKKRKRIFI
AYALKTLYPI
A.L ................
.....
IGKRLKQPGH
K .......
51 Pmp7homsa HGKKS ....... GKPPLQNN Aldhomsa ..APARGLQA PAGEPTQEAS Aldrmusmu RKAKAEAYSP AENREILHCT Pat2sacce PPKDNDVYE..HDKFLFKNV Consensus ....................
I
CDVWMIQNGT LSVYVARLDG LSIYVAGLDG LSLFVAKLDG LS..VA.LDG
150 LIESGIIGRS RKDFKRYLLN RLARCIARKD PRAFGWQLLQ KIVKSIVEKK PRTFIIKLIK QIVKNIIAGR GRSFLWDLGC .I...I . . . . . R . F . . . L . .
151 Pmp7homsa FIAAMPLISL Aldhomsa WLLIALPATF Aldrmusmu WLMIAIPATF Pat2sacce WFLIAVPASY ConsensusW..IA.PA..
VNNFLKYGLN ELKLCFRVRL VNSAIRYLEG QLALSFRSRL VNSAIRYLEC KLALAFRTRL TNSAIKLLQR KLSLNFRVNL VNSAI.YL...L.L.FR.RL
200 TKYLYEEYLQ A.FTYYKMG. VAHAYRLYFS Q.QTYYRVS. VDHAYETYFA N.QTYYKVI. TRYIHDMYLD KRLTFYKLIF .... Y . . Y . . . . . TYYK...
201 Pmp7homsa ...NLDNRIA Aldhomsa ...NMDGRLR Aldrmusmu ...NMDGRLA Pat2sacce DAKASNSVIK C o n s e n s u s ...N.D.R..
NPDQLLTQDV NPDQSLTEDV NPDQSLTEDI NIDNSITNDV NPDQSLT.DV
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
i01 FCKETGYLVL LCRETGLLAL VTTETGWLCL FDKNFLLLTA ...ETG.L.L
i00 EKEGKKERAV VDKVFFSRLI QILKIMVPRT GVAAAK..AG MNRVFLQRLL WLLRLLFPRV EIICKKPAPG LNAAFFKQLL ELRKILFPKL ELERAKNSQL FYSKFLNQMN VLSKILIPTV E .... K . . . . . . . . F . . . L . . L . K I L . P . .
IAVMLVSRTY HSAALVSRTF HSVALISRTF QIFFLVMRTW .... LVSRT.
EKFCNSVVDL YSNLSKPFLD VAFAASVAHL YSNLTKPLLD MMFSQSVAHL YSNLTKPILD AKFCDATCSV FANIAKPVID ..F..SV..LYSNL.KP.LD
250 IVLYIFKLTS VAVTSYTLLR VILTSYTLIR LIFFSVYLRD .... S..L..
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
251 A I G A Q G P A .... S M M A Y L V V . . S G L F L T R L RRPIGKMTIT AARSRGAGTA WPSAIAGLVVFLTANVLRAF SPKFGELVAE TATSRGASPI GPTLLAGLVVYATAKVLKAC SPKFGSLVAE N L G T V G V A G I ...... FVNY F I T G F I L R K Y T P P L G K L A G E ..... G . . . . . . . . . . . L V V ...... L .... P . . G . L . . E
300 EQKYEGEYRY EARRKGELRY EAHRKGYLRY RSASDGDYYN E .... G . . R Y
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
301 VNSRLITNSE MHSRVVANSE VHSRIIANVE YHLNMINNSE .HSR.I.NSE
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
351 400 SIIAKYLATV VGYLVVSRPF L .............. DLSHP RHLKSTHSEL QFLMKYVWSA SGLLMVAVPI ITATGYSESD AEAVKKAALE KKEEELVSER QFLMKYVWSS CGLIMVAIPI ITATGFADGD LEDGPKQA ....... MVSDR D Y V L K Y T W S G L G Y V F A S I P I V M S T L A T G I N SEE . . . . . . . . . . . . . . . KN .... K Y . W S . . G . . . V . . P I ..............................
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
401 LEDYYQSGRM TEAFTIARNL TEAFTTARNL MKEFIVNKRL .E ........
350 EIAFYNGNKR EKQTVHSVFR KLVEHLHNFI LFRFSMGFID EIAFYGGHEV ELALLQRSYQ DLASQINLIL LERLWYVMLE EIAFYRGHKV EMKQLQKCYK ALAYQMNLIL SKRLWYIMIE EIAFYQGTAV ERTKVKELYD VLMEKMLLVD KVKFGYNMLE E I A F Y . G . . V E ....... Y . . L ..... L .... R . . Y . M . E
LLRMSQALGR LTAAADAIER LASGADAIER MLSLADAGSR L ..... A . . R
IVLAGREMTR IMSSYKEVTE IMSSYKEITE LMHSIKDISQ I ..... E.T.
450 LAGFTARITE LMQVLKDLNH LAGYTARVHE MFQVFEDVQR LAGYTARVYN MFWVFDEVKR LTGYTNRIFT LLSVLHRVHS L A G . T A R . . . . . . V ......
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
451 500 GKYE .... RT MVSQQEK ...... GIEGVQV IPLIPGAGEI IIADNIIKFD CHFK .... RP RELEDAQAGS G T I G R S G V R V E G P L K I R G Q V VDVEQGIICE GIYK .... RT VT.QEPENHS KRGGNLELPL SDTLAIKGTV IDVDHGIICE LNFNYGAVPS ILSIRTEDAS RNSNLLPTTD NSQDAIRGTI QRNFNGIRLE ........ R . . . . . . . . . . . . . . G ........... I.G ....... GI..E
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
HVPLATPN ..... GDVLIRD NIPIVTPS ..... GEVVVAS N V P I I T P A ..... GEVVASR NIDVIIPSVR ASEGIKLINK N.P..TP ...... G.V ....
LNFEVR . . . . . . . . . . . . . . . . . . . . . . . . LNIRVE . . . . . . . . . . . . . . . . . . . . . . . . LNFKVE . . . . . . . . . . . . . . . . . . . . . . . . LTFQIPLHID PITSKSNSIQ DLSKANDIKL LN..V . . . . . . . . . . . . . . . . . . . . . . . . .
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
551 ..... SGANV LICGPNGCGK ..... EGMHL LITGPNGCGK EGMHL LITGPNGCGK PFLQGSGSSL LILGPNGCGK ...... G . . L L I . G P N G C G K
SSLFRVLGEL W P L F . . G G R L SSLFRILGGL W P T Y . . G G V L SSLFRILSGL WPVY EGVL SSIQRIIAEI W P V Y N K N G L L SSLFR.L..LWP.Y...G.L
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
601 VPQRPYMT.L IPQRPYMS.V IPQRPYMS.L IPQKPYFSRG IPQRPYMS..
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
REGG...WDS REGG...WEA REGG...WDA RGVGLTYLDA REGG...WDA
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
DVEGYIYSHC RKVGITLFTV DVEGKIFQAAKDAGIALLSI DVEGKIFQAAIGAGISLLSI DMEDYLFNLL KRYRFNFISI DVEG.IF ...... GI.L.SI
501
550
....
. . . .
Pmp7homsa Aldhomsa Aldrmusmu Pat2sacce Consensus
......
..
651
701
751
GTLRDQVIYP DGREDQKRKG GSLRDQVIYP DSVEDMQRKG GSLRDQVIYP DSADDMREKG GTLRDQIIYP MSSDEFFDRG G.LRDQVIYPDS..D...KG VQDWMDVLSG MCDWKDVLSG VMDWKDVLSG IADWKDLLSG ..DWKDVLSG
ISDLVLKEYL YSEQDLEAIL YTDQDLERIL FRDKELVQIL ..D..L..IL
600 TKPERGKLFY YKPPPQRMFY YKPPPQHMFY SIPSENNIFF .KP ..... FY 650 DNVQLGHILE DVVHLHHILQ HSVHLYHIVQ VEVKLDYLLK ..V.L.HI.. 700
GEKQRMAMAR LFYHKPQFAI L D E C T S A V S V GEKQRIGMAR MFYHRPKYAL LDECTSAVSI GEKQRMGMAR MFYHKPKYAL LDECTSAVSI GEKQRVNFAR I M F H K P L Y V V L D E A T N A I S V G E K Q R . . M A R . F Y H K P . Y A . LDECTSAVS. SHRKSLWKHH THRPSLWKYH THRPSLWKYH SQRPTLIKYH .HRPSLWKYH
EYYLHMDGRG THLLQFDGEG THLLQFDGEG EMLLEIGENR ..LL..DG.G
750
NYEFKQ . . . . GWKFEKLDSA GWRFEQLDTA DGKWQLQAVG ...F ...... 800
.... ITEDTV EFGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A R L S L T E E K Q RLEQQLAGIP KMQRRLQELC Q I L G E A V A P A HVPAPSPQGP IRLTLSEEKQ KLESQLAGIP KMQQRLNELC KILGEDSVLK TIQTPEKTS. TDEAITSIDN EIEELERKLE RVKGWEDERT KLREKLEII . . . . . . . . . . . ..... TE . . . . . E . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
801
.:
Pmp7homsa . . . . . . . . Aldhomsa GGLQGAST Aldrmusmu . . . . . . . . Pat2sacce . . . . . . . . Consensus . . . . . . . .
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Aldmusmu (Aldhomsa); Pmp7musmu, Pmp7ratno (Pmp7homsa). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
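The 75% consensus rule can be illustrated with a short sketch. It assumes that a residue is reported when it occupies at least 75% of the rows in a column (so three of the four aligned sequences here) and that a dot marks every other column; the function name is hypothetical. The two example blocks are taken from the alignment above.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Consensus of equal-length aligned rows.

    A column yields its most common residue if that residue is present in
    at least `threshold` of all rows; otherwise it yields the gap symbol.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    result = []
    for column in zip(*rows):
        counts = Counter(residue for residue in column if residue != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(rows) >= threshold:
                result.append(residue)
                continue
        result.append(gap)
    return "".join(result)


# Pmp7homsa, Aldhomsa, Aldrmusmu, Pat2sacce: ATP binding motif block.
print(consensus(["LICGPNGCGK", "LITGPNGCGK", "LITGPNGCGK", "LILGPNGCGK"]))
# Same four sequences, block starting at position 101.
print(consensus(["FCKETGYLVL", "LCRETGLLAL", "VTTETGWLCL", "FDKNFLLLTA"]))
```

Both calls reproduce the consensus rows printed for those blocks (LI.GPNGCGK and ...ETG.L.L), which supports the reading of the rule assumed here.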
Database accession numbers
CODE | SWISSPROT | EMBL/GENBANK
Aldhomsa | P33897 | Z21876; G38591
Aldmusmu | P48410 | Z33637; G520955
Aldrmusmu | - | Z48670
Pat2sacce | P41909 | U17065; G619668
Pmp7homsa | P28288 | M81182; G190129
Pmp7musmu | - | L28836
Pmp7ratno | P16970 | D90038; G220862
PIR (in source order): S30059, S20313, JS0371; A35723
References
1 Mosser, J. et al. (1993) Nature 361, 726-730.
2 Aubourg, P. et al. (1993) Biochimie 75, 293-302.
3 Kamijo, K. et al. (1990) J. Biol. Chem. 265, 4534-4540.
4 Moser, H.W. and Moser, A. (1989) In The Metabolic Basis of Inherited Disease (Scriver, C.L. et al., eds). McGraw-Hill, New York, pp. 1511-1532.
5 Gartner, J. et al. (1992) Nature Genet. 1, 16-23.
6 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
7 Higgins, C.F. et al. (1982) Nature 298, 723-727.
Part 4
ABC-2 Transporters
ABC-2 nodulation protein family
Summary
Members of the ABC-2 nodulation protein family, an example of which is the NODJ nodulation protein from Azorhizobium 1 (Nodjazoca), mediate export of modified oligosaccharides, probably modified β-1,4-linked N-acetylglucosamine oligosaccharides. With the ABC transporter protein NODI they comprise a membrane transport complex involved in the nodulation process. Members of the family have been found only in gram-negative bacteria. Statistical analysis of multiple amino acid sequence comparisons indicates that the ABC-2 nodulation protein family, with the polysaccharide export family, comprises a superfamily which has been termed ABC-2-type transporters 2. These proteins are not homologous to any other families of ATP binding cassette (ABC) transporters, although their transport mechanism is believed to involve bound ATP and to be similar to that of other ABC transporters. These two families share several amino acid sequence motifs 3. They are both predicted to contain six membrane-spanning helices by the hydropathy of their amino acid sequences. Many residues, including several long sequence motifs, are well conserved within the ABC-2 nodulation protein family, including motifs unique to the family, signature motifs of the ABC-2 superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Nodjazoca | Nodulation protein J [NODJ] | Azorhizobium caulinodans [gram-negative bacterium] | Modified oligosaccharide
Nodjbraja | Nodulation protein J [NODJ] | Bradyrhizobium japonicum [gram-negative bacterium] | Modified oligosaccharide
Nodjrhiga | Nodulation protein J [NODJ] | Rhizobium galegae [gram-negative bacterium] | Modified oligosaccharide
Nodjrhile | Nodulation protein J [NODJ] | Rhizobium leguminosarum [gram-negative bacterium] | Modified oligosaccharide
Nodjrhilt | Nodulation protein J [NODJ] | Rhizobium leguminosarum [gram-negative bacterium] | Modified oligosaccharide
Phylogenetic tree
[Figure: phylogenetic tree of Nodjrhile, Nodjrhilt, Nodjrhiga, Nodjbraja and Nodjazoca.]
Proposed orientation of NODJ in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded six times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: topology model of NODJ in the membrane, with the OUTSIDE and INSIDE faces marked; conserved residues are labelled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | CHROMOSOMAL LOCUS
Nodjazoca | 254 | 27 701 | nod locus 1
Nodjbraja | 262 | 28 194 | nod gene cluster
Nodjrhiga | 168 | 17 825 | nod box
Nodjrhile | 259 | 27 698 | Plasmid pRL1JI
Nodjrhilt | 262 | 28 033 | nodIJ region
Multiple amino acid sequence alignments
Nodjrhile Nodjrhilt Nodjrhiga Nodjbraja Nodjazoca Consensus
1
...MGVATLP MSGDSVTALP MGRVSTEALP MDDGYASVMP ........ MR ......... p
AGGLNWLAVWRRNYLAWKKA ALASILGNLA GGSLNWIAVWRRNYIAWKKA ALASLLGHLA SGLLNWVAVWRRNFLAWKKV APASLLGNLA ANAYNWTAVWRRNYLAWRKV ALASLLGNLA ERMVTWAAVF QRNAMSWRRE MAASVLGSVI .... N W . A V W R R N . . A W . K . A . A S . L G . L A
50
DPVIYLFGLG EPLIYLFGLG DPMIYIFGLG DPITNLFGLG DPLIMLFGLG DP.I.LFGLG
Nodjrhile Nodjrhilt Nodjrhiga Nodjbraja Nodjazoca Consensus
51 AGLGVMVGRV AGLGVMVGRV SGLGVMLGNV FGLGLIVGRV VGLGKIVDSV .GLG..VG.V
i01 Nodjrhile EAMLYTQLTQ Nodjrhilt EAMLYTQLRL Nodjrhiga EAMLYTKLTL Nodjbraja EGILFTQLTL Nodjazoca KSMRYAPICV ConsensusE.MLYT.L.. 151
DGVSYTAFLA GGVSYTAFLA GGVSYSAFLA EGTSYIAFLA DGRSYAEFLA .G.SY.AFLA
i00 AGMIATSAMTAATFETIYAAFGRMQGQRTW AGMVATSAMTAATFETIYAAFGRMEGQRTW AGMVATSAMT ASTFETIYAT FARMRDHRTW AGMVAISAMT SATFETLYAAFARMDVKRTW CGLILTSAMS ASNYEMLYGT YSRIYVTGTL AGM.ATSAMT A.TFET.YA. F.RM...RTW
GDIVVGEMAWAATKASLAGT GDIVLGEMAWAATKAALAGA GDIVLGEMAWAATKASLAGT GDIVLGELVWAASKSVLAGT SDYLIGEVLWAAYEGVVAGT GDIV.GE..WAA.K..LAGT
150 GIGIVAAMLG YTHWLALLYA GIGVVAAALG YTQWLSLLYA AIGIVTATLA YSEWDSLIYV AIGIVAATLG YASWTSVLCA IVAVCTAFLG YIPGWSVIYI .IG.V.A.LGY..W.S..Y. 200
Nodjrhile LPVIAITGLA FASLGMVVTA LAPSYDYFIF YQTLVITPML FLSGAVFPVD
Nodjrhilt Nodjrhiga Nodjbraja Nodjazoca Consensus
Nodjrhile Nodjrhilt Nodjrhiga Nodjbraja Nodjazoca Consensus Nodjrhile Nodjrhilt Nodjrhiga Nodjbraja Nodjazoca Consensus
LPVIALTGLA FPVIALTGLA IPTIALTGLV LPDILFVALI .P.IA.TGL.
FASLGMVVTA LAPSYDYFIF FASLSMVVAALAPSYDYLVF FASLAMVVIS LAPTYDYFVF FSSTSLLVAAISRGYALFAF FASL.MVV.A LAP.YDYF.F
251 VSTALLRRRL LSTALLRRRL VSIALLRRRL ASIALFRRRL ISAKVICVRL .S.AL.RRRL
263 MR. LR. TQ. LR. DD. ..
YQTLVITPIL YQSLVITPML YQSLVLTPMV YQSIAIAPLV YQ.LVITP..
201 QLPVAFQQIA AFLPLAHSID LIRPTMLGQP IANVCLHIGV QLPIVFQTAARFLPLSHSID LIRPIMLGHPVVDVCQHVGA QLSPMLQRIT HLLPLAHSID LIRPAMLGHP VPDITLHLGA QMPDSFQHFA GLLPLAHSVD LIRPVMLERG ADNAALHVGA TGNDVISGMI HFSPLYRAVNDVRNVVYEGR GTQVGPLLLL Q ..... Q ..... LPL.HS.D LIRP.ML ......... H.G.
FLSGAVFPVD VLSGSVFPVE FLCGAVFPTS FLSGVIVPRF FLSG.VFP..
250 LCIYIVVPFL LCIYIVIPFF LCLYIVLPFF LCVYAVLPFF SLLYASVMVF LC.Y.V.PFF
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in the other family of the ABC-2 transporter superfamily.
Database accession numbers
CODE | SWISSPROT | PIR | EMBL/GENBANK
Nodjazoca | Q07757 | S35008 | L18897; G310298
Nodjbraja | P26025 | S27497 | J03685; G152118
Nodjrhiga | - | - | X87578
Nodjrhile | P06755 | B24400; S10231 | Y00548; G46213
Nodjrhilt | P24144 | S08617 | X51411; G46244
References
1 Geelen, D. et al. (1993) Mol. Microbiol. 9, 145-154.
2 Reizer, J. et al. (1992) Protein Sci. 1, 1326-1332.
3 Vazquez, M. et al. (1993) Mol. Microbiol. 8, 369-377.
ABC-2 polysaccharide exporter family
Summary
Transporters of the ABC-2 polysaccharide exporter family 1, examples of which are the BEXB capsular polysaccharide exporters of Haemophilus influenzae 2 (Bexbhaein) and the polysialic acid transporter of Escherichia coli, KPSM 3 (Kpm1escco), mediate polysaccharide export. KPSM forms, with KPST, a membrane transport complex for the transport of polysialic acid across the membrane; BEXB forms a transport complex with BEXA. These transporters are only found in gram-negative bacteria. Statistical analysis of multiple amino acid sequence comparisons indicates that the ABC-2 polysaccharide exporter family, with the nodulation transporter family, comprises a superfamily which has been termed ABC-2-type transporters 4-6. These proteins are not homologous to any other families of ATP binding cassette (ABC) transporters, although their transport mechanism is believed to involve bound ATP and to be similar to that of other ABC transporters 2. Members of the ABC-2 polysaccharide exporter family are predicted to contain six membrane-spanning helices by the hydropathy of their amino acid sequences. Many residues, including several long sequence motifs, are well conserved within the ABC-2 polysaccharide exporter family, including motifs unique to the family and signature motifs of the ABC-2 superfamily 4-6.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Bex1haein | Capsule polysaccharide export inner-membrane protein [BEXB] | Haemophilus influenzae [gram-negative bacterium] | Capsule polysaccharide
Bex2haein | Capsule polysaccharide export inner-membrane protein [BEXB] | Haemophilus influenzae [gram-negative bacterium] | Capsule polysaccharide
Bex3haein | Capsule polysaccharide export inner-membrane protein [BEXB] | Haemophilus influenzae [gram-negative bacterium] | Capsule polysaccharide
Ctrcneime | Capsule polysaccharide export inner-membrane protein [BEXB, CTRC] | Neisseria meningitidis [gram-negative bacterium] | Capsule polysaccharide
Kpm1escco | Polysialic acid transport protein [KPSM] | Escherichia coli [gram-negative bacterium] | Polysialic acid
Kpm2escco | Polysialic acid transport protein [KPSM] | Escherichia coli [gram-negative bacterium] | Polysialic acid
Vexbsalti | Vi polysaccharide export protein [VEXB] | Salmonella typhi [gram-negative bacterium] | Vi polysaccharide
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Kpm2escco (Kpm1escco); Bex2haein, Bex3haein (Bex1haein).
[Figure: phylogenetic tree of Kpm1escco, Bex1haein, Ctrcneime and Vexbsalti.]
Proposed orientation of KPM1 3 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded six times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: topology model of KPM1 in the membrane, with the OUTSIDE and INSIDE faces and the NH2 and COOH termini marked; conserved residues are labelled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | CHROMOSOMAL LOCUS
Bex1haein | 265 | 30 181 | capsulation locus
Bex2haein | 265 | 30 108 | capsulation locus
Bex3haein | 265 | 30 195 | capsulation locus
Ctrcneime | 265 | 30 168 | capsulation locus
Kpm1escco | 258 | 29 557 | polysialic acid gene cluster
Kpm2escco | 258 | 29 561 | K5 antigen gene cluster
Vexbsalti | 264 | 30 429 | Plasmid pGBM124
Multiple amino acid sequence alignments
1
Kpm1escco Bex1haein Ctrcneime Vexbsalti Consensus
........MA RSGFEVQKVT .MQYGDKTTF KQSLAIQGRV .MKALHKTSF WESLAIQRRV MNILKNNSYY FMKLITVCEL .......... ...L..Q...
Kpm1escco Bex1haein Ctrcneime Vexbsalti Consensus
51 LLILLGILGY VMHRTMPDIS FPVFLLNGLI PFFIFSSISK TFFIVMMWKF IRADKFSTLN MIAFVMTGYP MAMMWRNASN TFVIVLMWKF LKADRYSTLN IVAFAITGYP MLMMWRNASK ISITVISFQY LNRSVPISTD DISFVIAGIL PYLLFRYTIT .... V . . . . . . . . . . . . . . . . . . F . . . G . . . . . . . . . . S.
VEALFLREIR INALLMREII IGALLMREII IILLMSRDIK I.AL..REI.
50
TRFGKFRLGY LWAILEPSAH TRYGRQNIGF FWLFVEPLLM TRYGRNNIGF LWLFVEPLLM TRYNGNLLNY MMVLAVPLVW T R Y G .... G . . W . . . E P L . .
I00 RSIGAIEANQ RAIGSISANL RAVGSISSNA ATMRTHSFST R..G.IS.N.
Kpmlescco Bexlhaein Ctrcneime Vexbsalti Consensus
i01 150 GLFNYRPVKP IDTIIARALL ETLIYVAVYI LLMLIVWMTG EYFEITNFLQ SLLYHRNVRV LDTIFTRVLL EVAGASIAQI LFMAILVMID WIDAPHDVFY SLLYHRNVRV LDTILARMIL EIAGATIAQI VIMAVLIAIG WIEMPADMFY SLAVVSQVKK RHVIFSLAAI EFVNAVIIYI IISLINFLIF SRWEAQKPFL SL...R.V...DTI..R..LE...A.I..I . . M . I . . . I . . . . . . . . . F.
Kpmlescco Bexlhaein Ctrcneime Vexbsalti Consensus
151 LVLTWSLLII LSCGVGLIFMVVGKTFPEMQ MLIAWFLMAM FAFGLGLIIC AIAQQFDVFG MLMAWLLMAF FAIGLGLVIC SIAFNFEPFG IFEGMVIAWL LGLSFGYFCD ALSERFPLVY .... W . L . . . . . . G . G L . . . . . . . . F ....
Kpmlescco Bexlhaein Ctrcneime Vexbsalti Consensus
201 HSIPKQYWSY LLWNPLVHVVELSREAVMPG YISEGVSLNY HNLPAQAQSI ALWFPMIHGT EMFRHGYFGD TVVTYESIGF HNLPPKVQEY ALMIPMVHGT EMFRAGYFGS DVITYENPWY NELPYSLLSI FSWNPLLHAN EIVREGMFEG YHSLYLEPFY H.LP . . . . . . . L W . P . . H . . E . . R . G . F . . . . . . Y .... Y
Kpmlescco Bexlhaein Ctrcneime Vexbsalti Consensus
251 266 IGLALYRTRE EAMLTS LGLVMVKNFS KGVEPQ FGLAMVSKFS KGVEPQ A G L I F H L I C D TENH.. .GL . . . . . . . . . . . . .
200 KVLPILLKPL YFISCIMFPL KIWGTLSFVL LPISGAFFFV KIWGTLTFVM MPLSGAFFFV KAVPVMLRPM FLISAVFYTA K .... L . . . . . . IS..FF.. 250 LAMFTLVTLF LVVSDLALLL IVLCNLVLLL PLAFSATLFL ..... L . L L L
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Kpm2escco (Kpm1escco); Bex2haein, Bex3haein (Bex1haein). Residues indicated in boldface type are also conserved in the other family of the ABC-2 transporter superfamily.
Database accession numbers
CODE | SWISSPROT | EMBL/GENBANK
Bex1haein | P19390 | M33787; G148869
Bex2haein | P19391 | M33788; G148871
Bex3haein | P22235 | X54987; G45299
Ctrcneime | P32015 | M57677; 50252
Kpm1escco | P23889 | M57382; G146565
Kpm2escco | P24584 | X53819; G41878
Vexbsalti | P43109 | D14156; G426450
PIR (in source order): S12234; BWHIXB, S15222, A42469, S12236
References 1 2 a 4 :!i!i!i;i:
Frosch, M. et al. (1991)Mol. Microbiol. 5, 1251-1263. Kroll, J.S. and Moxon, E.R. (1990) J. Bacteriol. 172, 1374-1379. Pavelka, M.S. et al. (1991) J. Bacteriol. 173, 4603-4610. Geelen, D. et al. (1993) Mol. Microbiol. 9, 145-154. Reizer, J. et al. (1992) Protein Sei. 1, 1326-1332. Vazquez, M. et al. (1993) Mol. Microbiol. 8, 369-377.
ABC-2-associated (cytoplasmic) protein family
Summary
Most transporters of the ABC-2-associated (cytoplasmic) protein family export oligo- or polysaccharides, spermidine, putrescine, teichoic acid or sulfate; the substrates of a few members are unknown. Examples are the ATP binding proteins NODI of Azorhizobium caulinodans¹ (Nodiazoca), BEXA² of Haemophilus influenzae (Bexahaein) and CTRD³ of Neisseria meningitidis (Ctrdneime). One member of the family, the TNRB2 protein⁴ of Streptomyces longisporoflavus (Tnrbstrlo), confers resistance to the antibiotic tetronasin. NODI is involved in the nodulation process by an unknown mechanism. These transporters are found only in prokaryotes, in both gram-positive and gram-negative bacteria.
Statistical analysis of multiple amino acid sequence comparisons places the ABC-2-associated (cytoplasmic) protein family in the ATP binding cassette (ABC) superfamily⁵. Proteins in this transporter superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. Members of the family are characterized by their cytoplasmic ATP binding domains⁵, which exist as separate chains and are described in the following tables. In many, but not all, of these proteins the ATP binding domains are associated with transmembrane proteins of the ABC-2 superfamily⁶: NODI is associated with NODJ, KPST with KPSM and BEXA with BEXB. For other proteins in this family, such as ABCA from Aeromonas salmonicida⁷ (Abcaaersa), the associated transmembrane protein is unknown. Transmembrane proteins in the ABC-2 superfamily, which includes the nodulation protein family and the polysaccharide export family, are predicted from the hydropathy of their amino acid sequences to contain six membrane-spanning helices.
A small number of amino acids, scattered throughout the whole length of the sequences, are conserved within the ABC-2 family, including motifs unique to the family, signature motifs of the ABC superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis.
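The hydropathy-based prediction of membrane-spanning helices mentioned in the summary can be illustrated with a sliding-window profile. The sketch below is only an illustration under stated assumptions: it uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, whereas the text does not say which scale or parameters were used for these families. The function names hydropathy_profile and candidate_helices are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_helices(seq, window=19, threshold=1.6):
    """Return (first, last) 1-based residue ranges covered by consecutive
    windows whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # Windows start..i-1 passed; the last one ends at residue i-1+window.
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(profile) - 1 + window))
    return segments


if __name__ == "__main__":
    # Toy sequence: a 25-residue leucine stretch flanked by charged residues.
    toy = "D" * 20 + "L" * 25 + "K" * 20
    print(candidate_helices(toy))
```

Counting the segments returned for a real sequence gives a rough estimate of the number of membrane-spanning helices; the ranges are wider than the hydrophobic core because each passing window contributes its full length.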
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S) [RESISTANCE]a
Abcaaersa | Abca protein [ABCA] | Aeromonas salmonicida [gram-negative bacterium] | Unknown
Bexahaein | ATP binding protein [BEXA] | Haemophilus influenzae [gram-negative bacterium] | Capsular polysaccharide
Ctrdneime | Capsule polysaccharide export ATP binding protein C [CTRD] | Neisseria meningitidis [gram-negative bacterium] | Capsular polysaccharide
Glyustrli | tRNA-GlyU | Streptomyces lividans [gram-positive bacterium] | Unknown
Kst1escco | Polysialic acid transport ATP binding protein [KPST] | Escherichia coli [gram-negative bacterium] | Polysialic acid
Kst5escco | Polysialic acid transport ATP binding protein [KPST] | Escherichia coli [gram-negative bacterium] | Polysialic acid
Natabacsu | ATP binding transport protein [NATA] | Bacillus subtilis [gram-positive bacterium] | Unknown
Nodescco | Nodulation protein homolog | Escherichia coli [gram-negative bacterium] |
Nodiazoca | Nodulation ATP binding protein I [NODI] | Azorhizobium caulinodans [gram-negative bacterium] | Modified oligosaccharide
Nodibraja | Nodulation ATP binding protein I [NODI] | Bradyrhizobium japonicum [gram-negative bacterium] | Modified oligosaccharide
Nodirhiga | Nodulation ATP binding protein I [NODI] | Rhizobium galegae [gram-negative bacterium] | Modified oligosaccharide
Nodirhilo | Nodulation ATP binding protein I [NODI] | Rhizobium loti [gram-negative bacterium] | Modified oligosaccharide
Nodirhile | Nodulation ATP binding protein I [NODI] | Rhizobium leguminosarum [gram-negative bacterium] | Modified oligosaccharide
Nodirhime | Nodulation protein [NODI] | Rhizobium meliloti [gram-negative bacterium] | Modified oligosaccharide
Nosfpsest | Copper transport ATP binding protein [NOSF] | Pseudomonas stutzeri [gram-negative bacterium] | Copper
Potgescco | Putrescine transport ATP binding protein [POTG] | Escherichia coli [gram-negative bacterium] | Putrescine
Sppaescco | Spermidine/putrescine transport protein A | Escherichia coli [gram-negative bacterium] | Spermidine, putrescine
Sulfsynsp | Sulfate transport protein | Synechococcus sp. [cyanobacterium] | Sulfate
Taghbacsu | Teichoic acid translocation ATP binding protein [TAGH] | Bacillus subtilis [gram-positive bacterium] | Teichoic acid
Tnrbstrlo | TnrB2 protein | Streptomyces longisporoflavus [gram-positive bacterium] | [Tetronasin]
Vexcsalti | Vi polysaccharide export ATP binding protein [VEXC] | Salmonella typhi [gram-negative bacterium] | Vi polysaccharide
a Presumed substrates; protein confers resistance to specified compounds.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Nodirhime (Nodirhilo).
[Phylogenetic tree figure. Leaves, in the order shown: Bexahaein, Ctrdneime, Kst1escco, Kst5escco, Abcaaersa, Taghbacsu, Vexcsalti, Potgescco, Sulfsynsp, Sppaescco, Nosfpsest, Tnrbstrlo, Glyustrli, Nodescco, Nodirhilo, Nodibraja, Nodirhiga, Nodirhile, Natabacsu, Nodiazoca.]
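The rule used here to leave near-identical proteins out of the tree (at least 90% identity to an already listed transporter) can be sketched as a pairwise identity filter. This is a minimal illustration, not the authors' procedure: it assumes the sequences are already aligned to equal length, that gaps are written as "." and that columns containing a gap in either sequence are ignored. The names percent_identity and drop_near_duplicates are hypothetical.

```python
def percent_identity(a, b, gap="."):
    """Percent identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def drop_near_duplicates(aligned, cutoff=90.0):
    """Split a {name: aligned_sequence} mapping into kept entries and
    dropped entries, each dropped name mapped to the kept name it matches."""
    kept = {}
    dropped = {}
    for name, seq in aligned.items():
        for kept_name, kept_seq in kept.items():
            if percent_identity(seq, kept_seq) >= cutoff:
                dropped[name] = kept_name
                break
        else:
            kept[name] = seq
    return kept, dropped


if __name__ == "__main__":
    # Toy data only; these are not sequences from the tables.
    toy = {"x": "ACDEFGHIKL", "y": "ACDEFGHIKV", "z": "WWWWWWWWWW"}
    kept, dropped = drop_near_duplicates(toy)
    print(sorted(kept), dropped)
```

The result depends on input order, since each sequence is compared only with those already kept; listing the representative protein first reproduces the "paired transporter" convention used in the notes.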
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT
Abcaaersa | 308 | 34 015
Bexahaein | 217 | 24 746
Ctrdneime | 216 | 24 595
Glyustrli | 326 | 34 954
Kst1escco | 219 | 24 939
Kst5escco | 224 | 25 481
Natabacsu | 246 | 27 878
Nodescco | 308 | 34 647
Nodiazoca | 320 | 35 310
Nodibraja | 306 | 34 127
Nodirhiga | 347 | 38 435
Nodirhilo | 339 | 37 264
Nodirhile | 311 | 34 300
Nodirhime | 339 | 37 264
Nosfpsest | 308 | 33 777
Potgescco | 404 | 44 784
Sppaescco | 378 | 43 028
Sulfsynsp | 344 | 38 476
Taghbacsu | 527 | 59 243
Tnrbstrlo | 300 | 32 278
Vexcsalti | 246 | 27 368
CHROMOSOMAL LOCUS (entries in the order listed; rows not aligned in this copy): cap; kps gene cluster; 2.4-4.1 minutes; nod operon; nod gene cluster; nod box; Plasmid pRL1JI; 19 minutes; 15 minutes; 310°; Plasmid pGBM124
Multiple amino acid sequence alignments
           1                                                   50
Bexahaein  .......... .......... .......... .......... .......MIR
Ctrdneime  .......... .......... .......... .......... .......MIS
Kst1escco  .......... .......... .......... .......... .......MIK
Kst5escco  .......... .......... .......... .......... .......MIK
Abcaaersa  .......... .......... .......... .......... ...MSEPVLA
Taghbacsu  .......... .......... .......... .......... ....MKLKVS
Vexcsalti  .......... .......... .......... .......... ........MT
Potgescco  ..MPEGRTTP AGNSHHYGAL AHIQCRRAAV NDAIPRPQAK TRKALTPLLE
Sulfsynsp  .......... .......... .......... .......... MPKDKAVGIQ
Sppaescco  .......... .......... .......... .MGQSKKLNK QPSSLSPLVQ
Nosfpsest  .......... .......... .......... .......... .....MNAVE
Tnrbstrlo  .......... .......... .......... .......... ..........
Glyustrli  .......... .......... .......... .........M QTNEHEHVIE
Nodescco   .......... .......... .......... .......... ....MTIALE
Nodirhilo  ........MK RKLGPEELRR LETPAIERES HGQTSAKSSV PDSASTVAVD
Nodibraja  .......... .......... .......... .......... .MNMSNMAID
Nodirhiga  MGENMEREML RPKTIAMDQN SASARSNPER EIKTGRLEPA SNSAPTMAID
Nodirhile  .......... .......... .......... ......MDSP SGSLSPVAID
Natabacsu  .......... .......... .......... .......... .......MIT
Nodiazoca  .......... .......... .......... ....MMLMRE SPDDRHYTVS
Consensus  .......... .......... .......... .......... ..........
           51                                                 100
Bexahaein  VNNVCKKYH. .......... .....TNSGW KTVLKNINFE LQKGEKIGIL
Ctrdneime  VEHVSKQYQ. .......... .....MRGGM RTVLDDINFS LQKGEKVGIL
Kst1escco  IENLTKSYR. .......... .....TPTGR HYVFKNLNII FPKGYNIALI
Kst5escco  IENLTKSYR. .......... .....TPTGR HYVFKDLNIE IPSGKSVAFI
Abcaaersa  VSGVNKSFPI YRSPWQALWH ALNPKADVKV FQALRDIELT VYRGETIGIV
Taghbacsu  FRNVSKQYHL YKKQSDKIKG LFFP.AKDNG FFAVRNVSFD VYEGETIGFV
Vexcsalti  LKITDFSFRR DAFVFGLLGC TRYFESDKGP RVVLDKTDFV MGYHEHIGIL
Potgescco  IRNLTKSYD. .......... .....GQ... .HAVDDVSLT IYKGEIFALL
Sulfsynsp  VSQVSKQFG. .......... .....SF... .QAVKDVDLT VETGSLVALL
Sppaescco  LAGIRKCFD. .......... .....GK... .EVIPQLDLT INNGEFLTLL
Nosfpsest  IQGVSQRYG. .......... .....SM... .TVLHDLNLN LGEGEVLGLF
Tnrbstrlo  MAGLHKSFG. .......... .....RT... .HALDGLDLA VDSGEVHGFL
Glyustrli  VTDLRRVYG. .......... .....G...G FEAVRGIDFS VRRGEVFALL
Nodescco   LQQLKKTYP. .......... .....G...G VQALRGIDLQ VEAGDFYALL
Nodirhilo  FAGVTKSYG. .......... .....NK... .IVVDELSFS VASGECFGLL
Nodibraja  LVGVRKSFG. .......... .....DK... .VIVNDLSFS VARGECFGLL
Nodirhiga  LQAVTMIYR. .......... .....DK... .TVVDSLSFG VRAGECFGLL
Nodirhile  LAGVSKSYG. .......... .....GK... .IVVNDLSFT IAAGECFGLL
Natabacsu  LTDCSRRFQ. .......... .....DKKKV VKAVRDVSLT IEKGEVVGIL
Nodiazoca  ATGVWK.... .......... ......KRGG IDVLRGLDMY VRRGERYGIM
Consensus  .....K.... .......... .......... .......... ...GE.....
           101                                                150
Bexahaein  GRNGAGKSTL IRLMSGVEPP TSGTIERSM. .......... .SISWPLAFS
Ctrdneime  GRNGAGKSTL VRLISGVEPP TSGEIKRTM. .......... .SISWPLAFS
Kst1escco  GQNGAGKSTL LRIIGGIDRP DSGNIITEH. .......... .KISWPVGLA
Kst5escco  GRNGAGKSTL LRMIGGIDRP DSGKIITNK. .......... .TISWPVGLA
Abcaaersa  GHNGAGKSTL LQLITGVMQP DCGQITRTG. .......... .RVVGLLELG
Taghbacsu  GINGSGKSTM SNLLAKIIPP TSGEIEMNG. .......... .Q.PSLIAIA
Vexcsalti  AAPGSGKTTL TRLLCGLDAP DEGDFIGLR. .......... .GDALPLGAN
Potgescco  GASGCGKSTL LRMLAGFEQP SAGQIMLDGV DLSQVPPYLR PINMMF..QS
Sulfsynsp  GPSGSGKSTL LRLIAGLEQP DSGRIFLTGR DATNESVRDR QIGFVF..QH
Sppaescco  GPSGCGKTTV LRLIAGLETV DSGRIMLDNE DITHVPAENR YVNTVF..QS
Nosfpsest  GHNGAGKTTS MKLILGLLSP SEGQVKVLGR AP..NDPQVR RQLGYL.PEN
Tnrbstrlo  GPNGAGKSTT IRVLLGLLRA DSGAPSSSAR TPWKDAVALH RRLAYV.PGD
Glyustrli  GTNGAGKTST VELLEGLAAP AGGRVRVLGH DPYTERAAVR PRTGVM.LQE
Nodescco   GPNGAGKSTT IGIISSLVNK TSGRVSVFGY DLEKDVVNAK RQLGLV.PQE
Nodirhilo  GPNGAGKSTI ARMLLGMTCP DAGAITVLGV PVPARARLAR RGIGVV.PQF
Nodibraja  GPNGAGKSTI ARMLLGMISP DRGKITVLDE PVPSRARAAR VAVGVV.PQF
Nodirhiga  GPNGAGKSTI TRMLLGMATP SAGKISVLGL PVPGKARLAR ASIGVV.SQF
Nodirhile  GPNGAGKSTI TRMILGMTSP SVGKITVLGA QEPGQVRLAR AKIGIV.SQF
Natabacsu  GENGAGKTTM LRMIASLLEP SQGVITVDGF DTVKQPAEVK QRIGVLFGGE
Nodiazoca  GTNGAGKSSL INIILGLTPP DRGTVTVFGK DMRRQGHLAR ARIGVV.PQD
Consensus  G.NGAGKST. .....G...P ..G....... .......... ..........
151 GAFQGSLTGM DNLRFICRLY DV.DPDYVTR ..FTKEFSEL GAFQGSLTGM DNLRFICRIY NV.DIDYVKA ..FTEEFSEL GGFQGSLTGR ENVKFVARLY AK.RDELNER VDFVEEFSEL GGFQGSLTGR ENVKFVARLY AK.QEELKEK IEFVEEFAEL SGFNPEFTGR ENIFFNGAIL GMSQREMDDR LERILSFAAI AGLNNQLTGR DNVRLKCLMM GLTNKEIDDM YDSIVEFAEI SFILPGLTGE ENARMMASLY G L D G D E F S . . . H F C Y Q L T Q L YALFPHMTVE QNIAFGLKQD K L P K A E I A S R V N E M L G L V H M YALFKHLTVR KNIAFGLELR KHTKEKVRAR VEELLELVQL YALFPHMTVF ENVAFGLRMQ KTPAAEITPR VMEALRMVQL VTFYPQLSGR ETLRHFARLK GAALTQ .... VDELLEQVGL VELWPNLTGG EAIDLLSRLR GGLDRQ...R RDELIERFDL GGFPSELTVA ETARMWAGCV SGARPPAEV ..... LALVGL FNFNPFETVQ QIVVNQAGYY GVERKEAYIR SEKYLKQLDL DNLDQELTVR ENLLVFGRYF GMSTRQSEAV IPSLLEFARL DNLEPEFTVR ENLLVFGRYF GMSARTIEAV VPSLLEFARL DNLDMEFTVR ENLLVFGRYF QMSTRAIEKL IPSLLEFAQL DNLDLEFTVR ENLLVYGRYF RMSTREIETV IPSLLEFARL TGLYDRMTAK ENLQYFGRLY GLNRHEIKAR IEDLSKRFGM DCLEQNMTPY ENVMLYGRLC RMTASEARKR ADQIFEKFSM ....... T . . . N . . . . . . . . . . . . . . . . . . . . . . . . . . . L 201 YSSGMKARLA YSSGMKARLA YSSGMRSRLA YSSGMRSRLG YSSGMMVRLA YSSGMKSRLG YSVTMKTHLA
FALSLSVEFD FALSLAVEFD FGLSMAFKFD FGLSMAFKFD FSVIINTDPD FAISVHIDPD FAINLLLPCR
CYLIDEVIAV CYLIDEVIAV YYLIDEITAV YYIVDEVTAV VLIIDEALAV ILIIDEALSV LYIADGKLYT
200 GDYLYEPVKK GQYLYEPVKR GKYFDMPIKT GKYFDMPIKT GDFIDQPVKN GDFINQPVKN EQCYTDRVSE QEFAKRKPHQ TGLGDRYPSQ ETFAQRKPHQ AHAADRRVKT ..DPTKKGRA EAKSGTRVKQ WGKRNERARM ERKANARVSE ESKADVRVSL EAKADVRVSD ESKANTRVAD RDYMNRRVGG MDCANRPVRL ..........
250 GDSRFAEKCK YELFE.KRK. GDSRFADKCK YELFE.KRK. GDAKFKKKCS DIFDK.IRE. GDARFKEKCA QLFKE.RHK. GDDAFQRKCY ARLKQLQSQ. GDQTFYQKCV DRINEFKKQ. GDNATQLRMQAAL.ACQLQ.
Potgescco Sulfsynsp Sppaescco Nosfpsest Tnrbstrlo Glyustrli Nodescco Nodirhilo Nodibraja Nodirhiga Nodirhile Natabacsu Nodiazoca Consensus
LSGGQRQRVA LARSLAKRPK LLLLDEPMGA LDKKLRDRMQ L E W D I L E R V LSGGQRQRVA LARALAVQPQ VLLLDEPFGA LDAKVRKDLR SWLRKLHDEV LSGGQQQRVA IARAVVNKPR LLLLDESLSA LDYKLRKQMQ NELKALQRKL YSKGMRQRLG LAQALLGEPR LLLLDEPTVG LDPIATQDLY LLIDRLRQR. YSKGNRQKVP IVAALASDAE LLLLDEPTAG LDPLMEVVFQ DVILRAKAA. LSGGQRRRLD LALALLGDPE VLFLDEPSTG LDAEGRRDTW ELVGALRDQ. LSGGMKRRLM IARALMHEPK LLILDEPTAG VDIELRRSMW GFLKDLNDK. LSGGMKRCLT MARALINDPQ LIVMDEPTTG LDPHARHLIW ERLRALLAR. LSGGMKRRLT LARALINDPH LLVMDEPTTG LDPHARHLIW ERLRALLAR. LSGGMKRRLT LARALVNDPQ LLILDEPTTG LDPPARHQIW ERLRSLLIR. LSGGMKRRLT LAGALINDPQ LLILDEPTTG LDPHARHLIW ERLRSLLAR. FSKGMRQKVA IARALIHDPD IILFDEPTTG LDITSSNIFR EFIQQLKRE. LSGGMRRLTM VARALVNDPY VIILDEATVG LDAKSRSALW QQIEASNAT. .S.G ....... A ............ DE ..... D ..................
Bexahaein Ctrdneime Kstlescco Kst5escco Abcaaersa Taghbacsu Vexcsalti Potgescco Sulfsynsp Sppaescco Nosfpsest Tnrbstrlo Glyustrli Nodescco Nodirhilo Nodibraja Nodirhiga Nodirhile Natabacsu Nodiazoca Consensus
251 300 DRSIILVSHS PSAMKSYCDN A W L E N G I M H HF.EDMDKAY QYYNETQK.. DRSIILVSHS H S A M K Q Y C D N A M V L E K G H M Y QF.EDMDKAY EYYNSLP... KSHLIMVSHS ERALKEYCDV AIYLNKEGQG KFYKNVTEAI ADYK..KDL. ESSFLMVSHS LNSLKEFCDV AIVFKNSYII GYYENVQSGI DEYKMYQDLD GVTILLVSHA AGSVIELCDR AVLLDRGEV. LLQGEPKAVVHNYHKLLH.. GKTIFFVSHS IGQIEKMCDR VAWMHYGEL. RMFDETKTVVKEYKAFIDWF QKGLIVLTHN PRLIKEHCHA FGVLLHGKI. TMCEDLAQAT ALFEQYQSNQ GVTCVMVTHD QEEAMTMAGR IAIMNRGKFV QIGEPEEIYE HP.TTRYSAE HVTTVFVTHD QEEAMEVADQ IVVMNHGKVE QIGSPAEIYD NP.ATPFVMS GITFVFVTHD QEEALTMSDR IVVMRDGRIE QDGTPREIYE EP.KNLFVAG GTSIILCSHV L P G V E A H I N R A A I L A K G C L Q AVGSLSQLRA E...AGLPVR GKTVLLSSHI LAQVEKLCDR VSIIRKGRNV QSGTLTEMRH L...TRTTVE GTTVLLTTHY LEEAEHLADR L A I M H D G R . I A A T G T P A E V T A A Q P S H I S F E GTTIILTTHY LEEAEMLCRN IGIIQHGELV ENTSMKALLA KLKSETFILD GKTIILTTHF MEEAERLCDR LCVLEKGRNI AEGGPQALID EHIGCQVM.. GKTILLTTHF MEEAERLCDR LCVLESGCKI AEGKPDALID EHIGCNVI.. GKTILLTTHM MDEAERMCDR LCVLEGGRMI AEGPPLSLIE DIIGCPVI GKTILLTTHI MEEAERLCDR LCVLEAGRKI AEGRPHALIE EQIGCPVI.. QKTILFSSHI MEEVQALCDS VIMIHSGEVI YRGALESLYE SERSEDLN.. GATVLVISHL AEDLERMTDR IACICAGTVR AEWETAALLKALGALKIL.. ........ H ......... D ....... G .......................
Bexahaein Ctrdneime Kstlescco ::::::::::::::::::::::::::::::::::::::::::: Kst5escco ~:ii !:(ii i,:::ii i:::i!i?!,,~:i Abcaaersa Taghbacsu Vexcsalti Potgescco Sulfsynsp Sppaescco Nosfpsest Tnrbstrlo Glyustrli @!ii:,ii!i}!~:;!i: Nodescco Nodirhilo
301 350 .................................................. .................................................. .................................................. IE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..MEGDERAR FRYHLRQTGR GD..SYISDE STSEPKIKSA PGILSVDLQP NKLSKKEKET YKKEQTEERK KEDPEAFARF RKKKKKPKSLANAIQIAILS ATIQTEDYSF DI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FIGSVNVFEG VLKERQEDGL VLDSPGLVHP L K V D A D A S W D N V P V H V A L R FIGPVNVL ..... PNSSH IFQAGGL... DTPHPEVFLR FIGEINMFNATVIERLDEQRV.RANVEGRE CNiY~FAVEPGQKLHVLLR IRAS.GISER DSWLQRWTDA GHSARGLSES SIEVVAVNGH KLVLLRQLLG AETERAVTGL DAM ...... A GVHAVRVTER RVH.FAVDGA HLDAAIHRLG LPDGYFVGDL PPLAELGVSD HETDGRTVK. L R T R E L Q R A A T G L L V W A A Q A LAPK...SPL PKLDGYQYRL VDTATLEVEV LREQGINSVF TQL .... SEQ ...EIYGGDP HELLSL .... VKPHSQRIEV SGETLYCYAP DPDQVRTQLQ
Nodibraja ...EIYGGDL DQLREL .... IRPYARHIEV SGETLFCYAR CPDEISVHLR N o d i r h i g a ...EVYGGNP DELSLI .... V R P H V D R I E T SGETLFCYTV N S D Q V R A K L R Nodirhile EIYGGDP QELSLL IRPNARRLEI SGETLFCYTP DPEQVRAQLR i i i-~!!ii-!ii'!!!:-:Natabacsu . . . Y I F . . . . . . M S K L . . . . V R G I S . . . . . . . . . . . . . . . . . . . . . . . . . Nodiazoca ...EIDHSVS PRGKEL .... FCNHGLHVHE NDGRLSAVHP SSDFALEAVL Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ::::::::::::::::::::::::,
:.~:~i/:.:ii~i/i~?::i:/Bexahaein
351
400
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Ctrdneime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kstlescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kst5escco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A b c a a e r s a Q S T ~ Y E S K G A V L S D V H IE S F Taghbacsu ILTVFMAGTM FFNAPLRTIA SFGAIPQNEV KNHHGDAKGK SEERLTAINK Vexcsalti . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Potgescr PEKIMLCEEP PANGCNFAVG EVIHIAYLGD LSVYHVRLKS GQ.MISAQLQ Sulfsynsp PHDIEIAIDP IPETVPARID RIVHLGWEVQ A...EVRLED GQ.VLVAHLP Sppaescco PEDLRVEEIN DDNHAEGLIG YVRERNYKGM TLESVVELEN GKMVMVSEFF Nosfpsest EGEPEDIEIH QPSLEDL.YR YYMERAGDVR AQEGRL . . . . . . . . . . . . . . Tnrbstrlo EFGIRSLTSH PPTLEELMLR HYGDELAAGA GGNGAAR . . . . . . . . . . . . . Glyustrli AVELRRLDVR SASLEEAFLS IAKQVSARSD GTATDTTEKE YAA . . . . . . . Nodescco G I Q V L S M R N K A N R L E E L F V S L . . . V N E K Q G D R A . . . . . . . . . . . . . . . . . Nodirhilo GRAGLRLLLR PANLEDVF .... LRLTGREM EE Nodibraja GRTDLRVLQR PPNLEDVF .... LRLTGREM EK . . . . . . . . . . . . . . . . . . N o d i r h i g a EFPSLRLLER PANLEDVF .... LRLTGREM EK Nodirhile AYSNLRLLER PPNLEDVF .... LRLTGREM EK . . . . . . . . . . . . . . . . . . Natabacsu . . . . . . . . . . . . . . . . . NodiazocaKKEAVPTTTRFATLDDIIRFPGFPWTGVGG~GEK Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Taghbacsu Vexcsalti Potgescco Sulfsynsp Sppaescco Consensus
401 450 QGFIANEKAAAYKDQGLKQK ADVTLPFGTK VTVAAKGKQA AKIKFDGHSY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
NAHRHRKGLP T ...... WGD EVRLCWEVDS CVVLTV . . . . . . . . . . . . . . RDRYRDLQLE PEQQVFVRPK QAR.SFPLNY SI NEDDPDFDHS LDQKMAINWVESWEVVLADE EHK . . . . . . . . . . . . . . . . . .
451 500 Taghbacsu YVKQSAVATN MKHAELHATA FTSYVSQNAA SSYEYFLKFL GDSSTSIQSK
:~i!i!ii:~!~iiii.:i!!iTaghbacsu !-:i,
501 550 L N G Y T E G N K A DGRKTLNFDY EKISYVLEND KATELIFHNI SPINPASLSL
551 587 Taghbacsu SDSDVLYDSS KKRFLVNTDD QVFAVDNEEH TLTLMLK
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Nodirhime (Nodirhilo). Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
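The 75% consensus rule stated above can be sketched as a column-by-column count. The example below is an illustration under assumptions, not the authors' software: gaps are taken to be "." characters, a gap is never reported as a consensus residue, and the function name consensus is hypothetical. The input rows are positions 101-110 of the twenty aligned sequences as printed in the alignment above.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Return the consensus string for equal-length aligned rows.

    A residue is reported for a column when it occurs in at least
    `threshold` of the rows; otherwise the column is written as a gap.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    result = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(rows) >= threshold:
            result.append(residue)
        else:
            result.append(gap)
    return "".join(result)


# Positions 101-110, Bexahaein through Nodiazoca, in alignment order.
ROWS = [
    "GRNGAGKSTL", "GRNGAGKSTL", "GQNGAGKSTL", "GRNGAGKSTL", "GHNGAGKSTL",
    "GINGSGKSTM", "AAPGSGKTTL", "GASGCGKSTL", "GPSGSGKSTL", "GPSGCGKTTV",
    "GHNGAGKTTS", "GPNGAGKSTT", "GTNGAGKTST", "GPNGAGKSTT", "GPNGAGKSTI",
    "GPNGAGKSTI", "GPNGAGKSTI", "GPNGAGKSTI", "GENGAGKTTM", "GTNGAGKSSL",
]

if __name__ == "__main__":
    print(consensus(ROWS))
```

With twenty rows the 75% threshold corresponds to fifteen or more identical residues in a column, which is why positions such as the fifth (fifteen alanines) are just retained.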
Database accession numbers
Abcaaersa Bexahaein Ctrdneime Glyustrli Kst1escco Kst5escco Natabacsu Nodescco Nodiazoca Nodibraja Nodirhiga Nodirhilo Nodirhile Nodirhime Nosfpsest Potgescco Sppaescco Sulfsynsp Taghbacsu Tnrbstrlo Vexcsalti
SWISSPROT Q07698 P10640 P32016
PIR A36918 A28781; BVHIXA S 15223
P23888 P24586 P46903
S 13590 S 1223 7
Q07756 P26050
$45204 $35007 $27496
P23703 P08720 P 19844 P31134
L18897; G310297 J03685; G152117 X87578 X55705; G581510 Y00548; G46214
S 13590 S 13584 B45313 A40840 GRYCS7
X53676; G45849 M93239; G147335 M64519
$34187
U13832; G755153 X73633 D14156; G475034
P42954 P43110
EMBL/GENBANK L 11870; G304013 M19995; G148867 M57677; G150253 X65556 M57381; G 146567 X53819; G418 79 U30873; G973330
References
1 Geelen, D. et al. (1993) Mol. Microbiol. 9, 145-154.
2 Kroll, J.S. et al. (1988) Cell 53, 347-356.
3 Frosch, M. et al. (1991) Mol. Microbiol. 5, 1251-1263.
4 Linton, K.J. et al. (1994) Mol. Microbiol. 11, 777-785.
5 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
6 Reizer, J. et al. (1992) Protein Sci. 1, 1326-1332.
7 Chu, S. and Trust, T.J. (1993) J. Bacteriol. 175, 3105-3114.
ABC Binding Protein-Dependent Transporters: Transmembrane Elements
ABC-associated binding protein-dependent maltose transporter family
Summary
Transporters of the ABC-associated binding protein-dependent maltose transporter family, the example of which is the MALG maltose permease of Escherichia coli (Malgescco), mediate uptake of maltose, starch degradation products and glycerol-3-phosphate. Members of the family are found in both gram-negative and gram-positive bacteria.
Members of the ABC-associated binding protein-dependent maltose transporter family are associated with cytoplasmic elements of the ATP binding cassette (ABC) superfamily¹. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. Two cytoplasmic chains containing the ATP binding domains associate with two transmembrane domains of the ABC-associated binding protein-dependent maltose transporter family to form the active complex.
Statistical analysis of multiple amino acid sequence comparisons suggests that the ABC-associated binding protein-dependent maltose transporter family is most closely related to the ABC-associated binding protein-dependent peptide transporter family. Members of the ABC-associated binding protein-dependent maltose transporter family are predicted to form six membrane-spanning helices, on the basis of the hydropathy of their amino acid sequences and the activities of reporter gene fusions². Several amino acid sequence motifs that are highly conserved in the family are necessary for function by the criterion of site-directed mutagenesis.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Amyctheth | Starch degradation products transport system permease [AMYC] | Thermoanaerobacter thermosulfurogenes [gram-positive bacterium] | Starch degradation products
Cymgkleox | Starch utilization protein [CYMG] | Klebsiella oxytoca [gram-negative bacterium] | Starch degradation products
Maldstrpn | Maltodextrin transport system permease [MALD] | Streptococcus pneumoniae [gram-positive bacterium] | Maltodextrin
Malgescco | Maltodextrin transport system permease [MALG] | Escherichia coli [gram-negative bacterium] | Maltose
Malgsalty | Maltodextrin transport system permease [MALG] | Salmonella typhimurium [gram-negative bacterium] | Maltose
Malgentae | Maltodextrin transport system permease [MALG] | Enterobacter aerogenes [gram-negative bacterium] | Maltodextrin
Ugpeescco | sn-Glycerol-3-phosphate transport system permease [UGPE] | Escherichia coli [gram-negative bacterium] | sn-Glycerol-3-phosphate
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Malgsalty, Malgentae (Malgescco).
[Phylogenetic tree figure. Leaves, in the order shown: Cymgkleox, Maldstrpn, Malgescco, Amyctheth, Ugpeescco.]
Proposed orientation of MALG in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded six times through the membrane². The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in MALG.
[Figure: proposed topology of MALG, drawn between the OUTSIDE and INSIDE faces of the membrane, with the NH2 and COOH termini labelled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT
Amyctheth | 274 | 30 994
Cymgkleox | 277 | 30 884
Maldstrpn | 277 | 30 983
Malgescco | 296 | 32 225
Malgsalty | 296 | 32 225
Malgentae | 296 | 32 221
Ugpeescco | 281 | 31 499
CHROMOSOMAL LOCUS (entries in the order listed; rows not aligned in this copy): 91.32 minutes; 77.29 minutes
Multiple amino acid sequence alignments
           1                                                   50
Cymgkleox  MRNIISKIMT ILVY...LFL LLNALVVLGP VIWTVMSSLK PGNNLFSSGF
Maldstrpn  MILITQRRLT QSYY...LYL IGLSIVIIYP LLITIMSAFK AGNVSAFKLD
Malgescco  MAMVQPKSQK ARLFITHLLL LLFIAAIMFP LLMVVAISLR QGNFATGSLI
Amyctheth  .....MRKVH VQKYLLTFLG IVLSLLWISP FYIILVNSFK TKLELFTNTL
Ugpeescco  ...MIENRPW LTIFSHTMLI LGIA.VILFP LYVAFVAATL DKQAVYAAPM
Consensus  .......... .......... .........P .......... ..........

           51                                                 100
Cymgkleox  TE........ ..ISFTLEHY HNL.LTGT.P YLKWYKNTFI LATCNMLISL
Maldstrpn  TN........ ..IDLNFDNF KGPSLKPC.T VLGTF.NTLI IALITMAVQT
Malgescco  PEQISWDHWK LALGFSVEQA DGRITPPPFP VLLWLWNSVK VAGISAIGIV
Amyctheth  S......... LPKSLMLDNY KT..AAANLN LSEAFSNTLI ITVFSILIIA
Ugpeescco  TLIPGTHLLE NIHNIWVNGV GTNSA....P FWRMLLNSFV MAFSITLGKI
Consensus  .......... .......... .......... ......N... .A........

           101                                                150
Cymgkleox  VVVTITAFIF SRYRFKAKKK ILMSILVLQM FPAFLSMTAI YILLSKMN..
Maldstrpn  SIIVLAGYAY SRYNFLARKQ SLVFFLIIQM VPTMAALTAF FVMALMLN..
Malgescco  ALSTTCAYAF ARMRFPGKAT LLKGMLIFQM FPAVLSLVAL YALFDRLGEY
Amyctheth  IFSSMTAYAL QRVKRKSSVI IYMIFTVAML IPFQSVMIPL VAEFGKFH..
Ugpeescco  TVSMLSAFAI VWFRFPLRNL FFWMIFITLM LPVEVRIFPT VEVIANLQ..
Consensus  ......A.A. .R..F..... .........M .P........ ..........

           151                                                200
Cymgkleox  ....LIDTYI GLLLVYVTGS LPFMTWLVKG YFDAIPTSLD EAAKIDGAGH
Maldstrpn  ....ALNHNW FLIFLYVGGG IPMNAWLMKG YFDTVPMSLD ESAKLDGAGH
Malgescco  IPFIGLNTHG GVIFAYL.GG IALHVWTIKG YFETIDSSLE EAAALDGATP
Amyctheth  .....FLTRS GLVFMYLGFG SSLGVFLYYG ALKGIPTSLD EAALIDGCSR
Ugpeescco  ....MLDSYA GLTLPLMASA TA..TFLFRQ FFMTLPDELV EAARIDGASP
Consensus  .......... GL...Y.... ......L..G .F...P.SL. EAA..DGA..

           201                                                250
Cymgkleox  LTIFFEIILP LAKPILVFVA LVSFTGPWMD FILPTLILRS EDKMTLAIGI
Maldstrpn  FRRFWQIVLP LVRPMVAVQA LWAFMGPFGD YILSSFLLRE KEYFTVAVGL
Malgescco  WQAFRLVLLP LSVPILAVVF ILSFIAAITE VPVASLLLRD VNSYTLAVGM
Amyctheth  FRIYWNIILP LLNPTTITLA VLDIMWIWND YLLPSLVINK VGSRTLPLMI
Ugpeescco  MRFFCDIVFP LSKTNLAALF VITFIYGWNQ YLWPLLIITD VDLGTTVAGI
Consensus  ...F..I.LP L..P...... ...F...... .....L.... ....T...G.

           251                                               299
Cymgkleox  FSWIS.SNSA ENFTLFAAGA LLVAVPITLL FIVTQKHITT GLVSGAVKE
Maldstrpn  QTFVN.NAKN LKIAYFSAGA ILIALPICIL FFFLQKNFVS GLTSGGDKG
Malgescco  QQYLN.PQNY L.WGDFAAAA VMSALPITIV FLLAQRWLVN GLTAGGVKG
Amyctheth  FYFF..SQYT KQWNLGMAGL TIAILPVVIF YFLAQRKLVT AIIAGAVKQ
Ugpeescco  KGMIATGEGT TEWNSVMVAM LLTLIPPVVI VLVMQRAFVR GLVDSEK..
Consensus  .......... .......A.. .....P.... ....Q...V. GL..G..K.
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Malgsalty, Malgentae (Malgescco). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in the ABC-associated binding protein-dependent peptide transporter family.
Database accession numbers
CODE | SWISSPROT
Amyctheth | P37729
Cymgkleox |
Maldstrpn | Q04699
Malgescco | P07622
Malgsalty | P26468
Malgentae | P18814
Ugpeescco | P10906
PIR (entries in the order listed; rows not aligned in this copy): S37705; S17297, S55409, S32571, A24361, A60175; S20605, S05333, S03782
EMBL/GENBANK (entries in the order listed; rows not aligned in this copy): M57692; M54654; C86014; L08611; U00006; X02871; X54292; M33921; X13141; U00039
References
1 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
2 Dassa, E. and Muir, S. (1993) Mol. Microbiol. 7, 29-38.
ABC-associated binding protein-dependent peptide transporter family
Summary
Transporters of the ABC-associated binding protein-dependent peptide transporter family, the example of which is the DPPC dipeptide transporter of Escherichia coli (Dppcescco), mediate the uptake of oligopeptides. One member of the family, NIKC, mediates the uptake of nickel. Members of the family are found in both gram-negative and gram-positive bacteria.
Members of the ABC-associated binding protein-dependent peptide transporter family are associated with cytoplasmic elements of the ATP binding cassette (ABC) superfamily¹. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. Two cytoplasmic chains containing the ATP binding domains associate with two transmembrane domains of the ABC-associated binding protein-dependent peptide transporter family to form the active complex.
Statistical analysis of multiple amino acid sequence comparisons suggests that the ABC-associated binding protein-dependent peptide transporter family is most closely related to the ABC-associated binding protein-dependent maltose transporter family. Members of the ABC-associated binding protein-dependent peptide transporter family are predicted to form six membrane-spanning helices, on the basis of the hydropathy of their amino acid sequences, activities of reporter gene fusions, reaction with peptide-specific antibodies, and susceptibility to proteolysis². Several amino acid sequence motifs are highly conserved in the ABC-associated binding protein-dependent peptide transporter family.
::::::::::::::::::::::::::::: .:::: :,:~ :~::~t ::::::::::::::::::: .....
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                    ORGANISM [COMMON NAMES]                              SUBSTRATE(S)
Amidstrpn  Oligopeptide transport system permease [AMID]             Streptococcus pneumoniae [gram-positive bacterium]   Oligopeptides
Appcbacsu  Oligopeptide transport system permease [APPC]             Bacillus subtilis [gram-positive bacterium]          Oligopeptides
Dppcbacsu  Dipeptide transport system permease [DPPC]                Bacillus subtilis [gram-positive bacterium]          Dipeptides
Dppcescco  Dipeptide transport system permease [DPPC]                Escherichia coli [gram-negative bacterium]           Dipeptides
Dppchaein  Dipeptide transport system permease [DPPC]                Haemophilus influenzae [gram-negative bacterium]     Dipeptides
Nikcescco  Nickel transport system permease [NIKC]                   Escherichia coli [gram-negative bacterium]           Nickel
Oppcbacsu  Oligopeptide transport system permease [OPPC, SPO0KC]     Bacillus subtilis [gram-positive bacterium]          Oligopeptides
Oppchaein  Oligopeptide transport system permease [OPPC, HI1122]     Haemophilus influenzae [gram-negative bacterium]     Oligopeptides
Oppclacla  Oligopeptide transport system permease [OPPC]             Lactobacillus lactis [gram-positive bacterium]       Oligopeptides
Oppcsalty  Oligopeptide transport system permease [OPPC]             Salmonella typhimurium [gram-negative bacterium]     Oligopeptides
Sapcescco  Peptide transport system permease [SAPC]                  Escherichia coli [gram-negative bacterium]           Oligopeptides
Sapchaein  Peptide transport system permease [SAPC, HI1640]          Haemophilus influenzae [gram-negative bacterium]     Peptides
Sapcsalty  Peptide transport system permease [SAPC]                  Salmonella typhimurium [gram-negative bacterium]     Peptides
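The transporter codes in this table appear to join a protein name with a five-letter organism tag built from the first three letters of the genus and the first two of the species (for example, Dppcescco for DPPC of Escherichia coli). The sketch below illustrates that pattern. The rule is inferred from the entries listed here, not stated by the source, and the function names are hypothetical.

```python
def organism_tag(binomial: str) -> str:
    """Build a five-letter tag from a 'Genus species' name.

    Assumption: tag = first 3 letters of genus + first 2 of species,
    as inferred from codes such as 'escco' and 'haein' in the table.
    """
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).lower()


def split_code(code: str) -> tuple[str, str]:
    """Split a transporter code into (protein name, organism tag)."""
    if len(code) <= 5:
        raise ValueError("code too short to contain an organism tag")
    return code[:-5].upper(), code[-5:].lower()


if __name__ == "__main__":
    print(organism_tag("Escherichia coli"))
    print(split_code("Dppcescco"))
```

Splitting from the right keeps the sketch independent of the protein-name length, which is four letters for every entry in this table.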
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Sapcescco (Sapcsalty).
[Figure: phylogenetic tree of the family. Leaf labels recoverable from the original: Dppcescco, Dppchaein, Sapchaein, Sapcsalty, Oppchaein, Oppcsalty, Dciabacsu, Spobacsu, Appcbacsu, Nikcescco, Oppclacla, Amidstrpn.]
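The 90% identity cut-off used to exclude near-duplicate proteins can be illustrated with a short calculation on two aligned sequences. This is a minimal sketch: the source does not say how identity was computed, so the choice to ignore columns where either sequence has a gap (written as a dot, as in the alignments of this entry) is an assumption, and the function name is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = ".") -> float:
    """Percent of identical residues over columns where neither row is a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Residues 241-250 of Dppcescco and Dppchaein, taken from the alignment.
    print(percent_identity("PNCLAPLIVQ", "PNCLAPLIVQ"))
```

A pair scoring 90 or above under this measure would be treated as redundant in the sense used for the tree and alignments.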
Proposed orientation of DPPC in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded six times through the membrane 2. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in DPPC.
[Figure: proposed membrane topology of DPPC, drawn between the OUTSIDE and INSIDE faces of the membrane, with the N-terminus (N) and C-terminus (COOH) marked.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Amidstrpn  308          34 634
Appcbacsu  303          33 420   104°
Dppcbacsu  320          35 836   113°
Dppcescco  300          32 308   79.75 minutes
Dppchaein  295          31 840
Nikcescco  277          30 362   77.87 minutes
Oppcbacsu  305          33 621   104°
Oppchaein  311          34 631
Oppclacla  294          32 835
Oppcsalty  302          33 090
Sapcescco  296          31 548
Sapchaein  295          33 014
Sapcsalty  296          31 548
Multiple amino acid sequence alignments
Sapchaein Sapcsalty Oppchaein Oppcsalty Dppcbacsu
1 ............................. M ...................... MPYDSVYS ......... M T D Y R T Q P I N Q K N A D F V E Q V A . . . . . . . . . . . . . . . MMLSK K N S E T L E N F S MNLPVQTDER QPEQHNQVPD EWFVLNQEKN
QNKEPDEFRE EKRPPGTLRT DRIEEMQLEG EKLE...VEG READSVKRPS
50 STSIFQIWLR A ...... WRK RSLWQDAKRR RSLWQDARRR LSYTQDAWRR
Oppcbacsu Appcbacsu Dppcescco Dppchaein ii~[i!~,i~,,,ii,~i~:i! N i k c e s c c o Oppclacla Amidstrpn Consensus
.............. MQNIPK NMFEPAAANA GDAEKISKKS LSLWKDAMLR . . . . . . . . . . . . . . . . . . MS E L Q T T P S P E I R L K E N I S K K P E T M T K I F W E K ......................... MSQVT ENKVISAPVP MTPLQEFWHY .............................. MSDTPLTFAP KTPLQEFWFY .............................................. MNFF ............................... MTEKKHKNS LSLVHSIKEE .............. MSTIDK EKFQFVKRDD FASETIDAPA YSYWKSVFKQ ..................................................
Sapchaein Sapcsalty Oppchaein Oppcsalty Dppcbacsu Oppcbacsu
51 FRQNTIALFS FYSDAPAMVG FFRNKAAVAS FMHNRAAVAS LKKNKLAMAG FRSNKLAMVG
Dppcescco Dppchaein Nikcescco Oppclacla Amidstrpn Consensus
F K R N K G A W G LVYVVIVLFI A.IFANWIAP YNPAEQ.FRD ALLAPPAWQE FKQNRGALIG LIFILIVALI S.ILAPYIAP FDPTEQ.NRT ALLLPPAWYE L S S R W S V R L A LIIIALLALI A.LTSQWWLP YDP ....... QAIDLPSRLL LKKDKLAMIS TIFLVAVFLI V Y I Y S M F L K Q SNYVDVNIMD QYLA ...... FMKKKSTVVM LGILVAIILI SFIY.PMFSK FDFND...VS KVNDFSVRYI F ..... A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Sapchaein Sapcsalty Oppchaein Oppcsalty Dppcbacsu Oppcbacsu Appcbacsu Dppcescco Dppchaein Nikcescco Oppclacla Amidstrpn Consensus
I01 150 RGKIAFFFGT DDLGRDILSR L I M G T R Y T L G SALLVVFSVA IIGGALGIIA YGEVSFFLGT DDLGRDVLSR L L S G A A P T V G G A F I V T L A A T L C G L V L G V V A TMEGYHFFGT DASGRDLLVR TAIGGRISLL V G I A G A F I S V TIGTIYGAIS DMASGHYFGT DSSGRDLLVR V A I G G R I S L M V G I A A A L V A V IVGTLYGSLS P P S A D H W F G T DELGRDVFTR TWYGARISLF VGVMAALIDF L I G V I Y G G V A PPSKDHWFGT DDLGRDIFVR TWVGARISIF IGVAAAVLDL LIGVIWGSIS ..GLEHLMGT DKFGRDIFSR ILYGARVSLL VGFASVVGSI L I G T V L G A L A G G S M A H L L G T DDVGRDVLSR LMYGARLSLL VGCLVVVLSL IMGVILGLIA GGNPAYLLGT D D I G R D I L S R IIYGTRISVF AGFIIVLLSC AFGTSLGLIS SPDAQHWLGT DHLGRDIFSR L M A A T R V S L G SVMACLLLVL T L G L V I G G S A PLTNGHLLGT DNGGRDIIMM L M I S A R N S F N I A F A V T L I T L V V G N I L G V I T KPNAEHWFGT D S N G K S L F D G V W F G A R N S I L . I S V I A T V I N L V I G V F V G G I W ..... H . . G T D . . G R D . . . R ...G.R.S . . . . . . . . . . . . . . G...G...
Appcbacsu
FYLLIALIFT LYGCAGLALL LIILAFIIIF LIVLFLIALF LFILLFLFVM LIIIVLIILM
FSKNKLAILG AVILFIIIMS
A.LFASYLAP C.IFGGWIAP I.TVAPWFFP V.TVAPMLSQ A.VIGPFLSP A.IFAPMFSR
A.VFAPLIAP
YA.DNRQFIG YG.IDQQFLG FT ..... YED FT ..... YFD HS ..... W R YD ..... YST
YPQEQQSLLD
i00 QELMPPSWVD YQLLPPSWSR TDWNMMSAAP TDWGMMSSAP Q..SLTEQNL T..NLLNADK K Y K A P .....
151 200 Sapchaein GLLKGIKARF VGHIFDAFLS LPILLIAVVI STLMEP.SLW NAMFATLLAI Sapcsalty G A T H G L R S A V LNHILDTLLS IPSLLLAIIV VAFAGP.HLS HAMFAVWLAL Oppchaein GYVGGKTDML MMRFLEILSS FPFMFFVILL VTLFGQ.NIF L I F I A I G A I A Oppcsalty G Y L G G K I D S V MMRLLEILNS FPFMFFVILL V T F F G Q . N I L LIFVAIGMVS Dppcbacsu GYKGGRIDSI M M R I I E V L Y G L P Y L L V V I L L M V L M G P . G L G T I I V A L T V T G Oppcbacsu GFRGGRTDEI M M R I A D I L W A V P S L L M V I L L MVVLPK.GLF T I I I A M T I T G N~:~!i!!:i?i2~: Appcbacsu G Y F R G I V D A V IMRVVDIVLS IPDIFLLITL V T I F K P . G V D KLILIFCLTG Dppcescco GYFGGLVDNI IMRVVDIMLA LPSLLLALVL V A I F G P . S I G N A A L A L T F V A Dppchaein GYYGGVLDTI IIRLIDIMLA I P N L L L T I W V S I L E P . S L A NATLAIAVVS Nikcescco G L I G G R V D Q A TMRVADMFMT FPTSILSFFM V G V L G T . G L T N V I I A I A L S H Oppclacla GYFGGRFDLI FMRFTDFVMI LPSMMIIIVF V T I I P R F N S W SLIGIISIFS A m i d s t r p n G . I S K S V D R V MMEVYNVISN IPPLLIVIVL T Y S I G A . G F W N L I F A M S V T T C o n s e n s u s G . . . G . . D . . . M R ........ P ..... I.. V ............. A .....
           201                                                250
Sapchaein  LPYFIHTIYR AIQKELEKDY VVMLKLEGIS NQALLKSTIL PNITVIYIQE
Sapcsalty  LPRMVRSVYS MVHDELEKEY VIAARLDGAT TLNILWFAIL PNITAGLVTE
Oppchaein  WLGLARIVRG QTLSLKNKEF VEAAIVCGVP RRQIILKHII PNVLGLVAVY
Oppcsalty  WLDMARIVRG QTLSLKRKEF IEAAQVGGVS TASIVIRHIV PNVLGVVVVY
Dppcbacsu  WVGMARIVRG QVLQIKNYEY VLASKTFGAK TFRIIRKNLL RNTMGAIIVQ
Oppcbacsu  WINMARIVRG QVLQLKNQEY VLASQTLGAK TSRLLFKHIV PNAMGSILVT
Appcbacsu  WTTTARLVRG EFLSLRSREY VLAAKTIGTK THKIIFSHIL PNALGPIIVS
Dppcescco  LPHYVRLTRA AVLVEVNRDY VTASRVAGAG AMRQMFINIF PNCLAPLIVQ
Dppchaein  IPSYVRLTRA AMMNEKNRDY VTSSKVAGAG ILRLMFIVIL PNCLAPLIVQ
Nikcescco  WAWYARMVRS LVISLRQREF VLASRLSGAG HVRVFVDHLA GAVIPSLLVL
Oppclacla  WIGTTRLIRA RTMTEVNRDY VRASKTSGTS DFKIMFREIW PNLSTLVIAE
Amidstrpn  WIGIAFMIRV QILRYRDLEY NLASRTLGTP TLKIVAKNIM PQLVSVIVTT
Consensus  .....R..R. .........Y V.A....G.. ........I. PN........

           251                                                300
Sapchaein  VARAFVIAVL DISALSFISL GAQRPTPEWG AMIKDSLELL YLAP..WTVL
Sapcsalty  ITRALSMAIL DIAALGFLDL GAQLPSPEWG AMLGDALELI YVAP..WTVM
Oppchaein  ASLEVPGLIL FESFLSFLGL GTQEPMSSWG ALLSDG.AAQ MEVSPWLL.I
Oppcsalty  ASLLVPSMIL FESFLSFLGL GTQEPLSSWG ALLSDG.ANS MEVSPWLL.L
Dppcbacsu  MTLTVPAAIF AESFLSFLGL GIQAPFASWG VMANDGLPTI LSGHWWRL.F
Oppcbacsu  MTLTVPTAIF TEAFLSYLGL GVPAPLASWG TMASDGLPA. LTYYPWRL.F
Appcbacsu  ATLKVGSVIL AESALSYLGF GIQPPIASWG NMLQDAQNFT VMIQAWWYPL
Dppcescco  ASLGFSNAIL DMAALGFLGM GAQPPTPEWG TMLSDVLQFA Q..SAWWVVT
Dppchaein  MTMGISNAIL ELATLGFLGI GAKPPTPELG TMLSEARGFM Q..AANWLVT
Nikcescco  ATLDIGHMML HVAGMSFLGL GVTAPTAEWG VMINDARQYI WTQP..LQMF
Oppclacla  ATLVFAGNIG LETGLSFLGF GLPAGTPSLG TMINEATNPE TMTDKPWTWV
Amidstrpn  MTQMLPSFIS YEAFLSFFGL GLPITVPSLG RLISDYSQNV TTNA..YLFW
Consensus  ........I. ....LSFLGL G...P...WG .M..D..... ..........

           301                              334
Sapchaein  LPGFAIIFTI LLSIIFSNGL TKAINQHQE. ....
Sapcsalty  LPGAAITLSV LLVNLLGDGI RRAIIAGVE. ....
Oppchaein  FPAFFLCLTL FCFNFIGDGL RDALDPKDR. ....
Oppcsalty  FPAGFLVVTL FCFNFIGDGL RDALDPKDR. ....
Dppcbacsu  FPAFFISSTM YAFNVLGDGL QDALDPKLRR ....
Oppcbacsu  FPAGFICITM FGFNVVGDGL RDALDPKLRK ....
Appcbacsu  FPGLFILMTV LCFNFVGDGL RDALDPKNIK ....
Dppcescco  FPGLAILLTV LAFNLMGDGL RDALDPKLKQ ....
Dppchaein  IPGLVILSLV LAFNLMGDGL RDALDPKLKQ ....
Nikcescco  WPGLALFISV MAFNLVGDAL RDHLDPHLVT EHAH
Oppclacla  PATVVILIVV LAIIFIGNAL RRVADQRQAT R...
Amidstrpn  IPLTTLVLVS LSLFVVGQNL ADASDPRTHR ....
Consensus  .P........ ...N..GDGL RDA.DP.... ....
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Sapcescco (Sapcsalty). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in the ABC-associated binding protein-dependent maltose transport family.
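The 75% consensus rule stated above can be sketched in a few lines: a column contributes a residue to the consensus only if that residue occurs in at least three quarters of the aligned rows, and a dot otherwise. The function name and the dot-as-gap convention are assumptions for illustration; the input rows are residues 241-250 of the twelve aligned sequences of this family.

```python
from collections import Counter


def consensus(rows: list[str], threshold: float = 0.75, gap: str = ".") -> str:
    """Return the consensus line for equal-length aligned rows."""
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must be non-empty and of equal length")
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        # A gap character never counts as a conserved residue.
        keep = residue != gap and count / len(rows) >= threshold
        out.append(residue if keep else gap)
    return "".join(out)


if __name__ == "__main__":
    block_241_250 = [
        "PNITVIYIQE", "PNITAGLVTE", "PNVLGLVAVY", "PNVLGVVVVY",
        "RNTMGAIIVQ", "PNAMGSILVT", "PNALGPIIVS", "PNCLAPLIVQ",
        "PNCLAPLIVQ", "GAVIPSLLVL", "PNLSTLVIAE", "PQLVSVIVTT",
    ]
    print(consensus(block_241_250))  # PN........
```

With twelve rows, a residue must appear at least nine times to be reported, which is why only the leading P and N survive in this block.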
Database accession numbers
Codes: Amidstrpn Appcbacsu Dppcbacsu Dppcescco Dppchaein Nikcescco Oppcbacsu Oppchaein Oppclacla Oppcsalty Sapcescco Sapchaein Sapcsalty
SWISSPROT: P18794 P42063 P26904 P37315 P33592 P24139 P45053 Q07743 P08006 P45287 P36669
PIR: S11151 S39596 S15232 C29333 S39587 S16649
EMBL/GENBANK: X17337 U20909 X56678 L08399; U00039 U17295 X73143; U00039 X56347; M57689 U32792 L18760 X05491 X97282 U32837 X74212
References
1 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
2 Pearce, S.R. et al. (1992) Mol. Microbiol. 6, 47-57.
ABC-associated binding protein-dependent iron transporter family
Summary
Transporters of the ABC-associated binding protein-dependent iron transporter family, the example of which is the BTUC vitamin B12 transport protein of Escherichia coli (Btucescco), mediate uptake of iron. Members of the family are found in both gram-negative and gram-positive bacteria.
Members of the ABC-associated binding protein-dependent iron transporter family are associated with cytoplasmic elements of the ATP binding cassette (ABC) superfamily 1. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. Two cytoplasmic chains containing the ATP binding domains associate with two transmembrane domains of the ABC-associated binding protein-dependent iron transporter family to form the active complex.
Statistical analysis reveals no apparent relationship between the amino acid sequences of the ABC-associated binding protein-dependent iron transporter family and any other family of transporters. They are predicted to form nine membrane-spanning helices by the hydropathy of their amino acid sequences.
Several amino acid sequence motifs are highly conserved in the ABC-associated binding protein-dependent iron transporter family.
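Predicting membrane-spanning helices from hydropathy, as mentioned in the summary, is commonly done by averaging a per-residue hydrophobicity score over a sliding window and flagging stretches above a cut-off. The source does not say which scale, window, or cut-off was used; the sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6, and its function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified by the source).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[res] for res in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_segments(seq: str, window: int = 19,
                       cutoff: float = 1.6) -> list[tuple[int, int]]:
    """1-based (first, last) residue ranges covered by runs of windows >= cutoff."""
    segments, start = [], None
    for i, value in enumerate(hydropathy_profile(seq, window)):
        if value >= cutoff and start is None:
            start = i
        elif value < cutoff and start is not None:
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(seq)))
    return segments


if __name__ == "__main__":
    # Residues 51-100 of Fhubescco, taken from the alignment in this entry.
    fragment = "QMIFHYSLLPRLAISLLVGAGLGLVGVLFQQVLRNPLAEPTTLGVATGAQ"
    print(candidate_segments(fragment))
```

Reported ranges are relative to the fragment passed in, and adjacent helices can merge into one long segment, so the output is a starting point for inspection and not a helix count.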
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                  ORGANISM [COMMON NAMES]                              SUBSTRATE(S)
Btucescco  Vitamin B12 transport system permease [BTUC]            Escherichia coli [gram-negative bacterium]           Vitamin B12
Cbrberwch  Ferri-siderophore permease [CBRB]                       Erwinia chrysanthemi [gram-negative bacterium]       Ferrisiderophores
Cbrcerwch  Ferri-siderophore permease [CBRC]                       Erwinia chrysanthemi [gram-negative bacterium]       Ferrisiderophores
Fatcviban  Ferric anguibactin transport system permease [FATC]     Vibrio anguillarum [gram-negative bacterium]         Ferric anguibactin
Fatdviban  Ferric anguibactin transport system permease [FATD]     Vibrio anguillarum [gram-negative bacterium]         Ferric anguibactin
Feccescco  Ferric dicitrate transport system permease [FECC]       Escherichia coli [gram-negative bacterium]           Ferric dicitrate
Fecdescco  Ferric dicitrate transport system permease [FECD]       Escherichia coli [gram-negative bacterium]           Ferric dicitrate
Fepdescco  Ferric enterobactin transport system [FEPD]             Escherichia coli [gram-negative bacterium]           Ferric enterobactin
Fepgescco  Ferric enterobactin transport system [FEPG]             Escherichia coli [gram-negative bacterium]           Ferric enterobactin
Feubbacsu  Iron uptake protein [FEUB]                              Bacillus subtilis [gram-positive bacterium]          Iron
Fhubbacsu  Ferrichrome transport protein [FHUB]                    Bacillus subtilis [gram-positive bacterium]          Ferrichrome
Fhubescco  Ferrichrome transport protein [FHUB]                    Escherichia coli [gram-negative bacterium]           Ferrichrome
Fxuamycsm  Ferric exochelin uptake system [FXUA]                   Mycobacterium smegmatis [gram-positive bacterium]    Ferric exochelin
Hempyeren  Hemin uptake system [HEMP, HEMU]                        Yersinia enterocolitica [gram-negative bacterium]    Hemin
Phylogenetic tree
[Figure: phylogenetic tree of the family. Leaf labels recoverable from the original: Cbrcerwch, Feubbacsu, Fhubbacsu, Fecdescco, Hempyeren, Cbrberwch, Fepdescco, Feccescco, Btucescco, Fhubescco, Fatdviban, Fatcviban, Fxuamycsm, Fepgescco.]
Proposed orientation of BTUC in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded nine times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in BTUC.
[Figure: proposed membrane topology of BTUC, drawn between the OUTSIDE and INSIDE faces of the membrane.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Btucescco  326          35 007   38.56 minutes
Cbrberwch  340          34 832
Cbrcerwch  349          36 777
Fatcviban  317          34 929
Fatdviban  314          33 922
Feccescco  332          34 892   97.19 minutes
Fecdescco  318          34 131   97.17 minutes
Fepdescco  334          33 871   13.37 minutes
Fepgescco  330          34 993   13.35 minutes
Feubbacsu  334          35 896   15°
Fhubbacsu  384          40 720   286°
Fhubescco  659          70 335   3.68 minutes
Fxuamycsm  344          35 170
Hempyeren  334          35 528
Multiple amino acid sequence alignments
           1                                                  50
Fhubescco  MSKRIALFPA LLLALLVIVA TALTWMNFSQ ALPRSQWAQA AWSPDIDVIE
           51                                                 100
Fhubescco  QMIFHYSLLP RLAISLLVGA GLGLVGVLFQ QVLRNPLAEP TTLGVATGAQ
           101                                                150
Fhubescco  LGITVTTLWA IPGAMASQFA AQAGACVVGL IVFGVAWGKR LSPVTLILAG
           151                                                200
Fhubescco  LVVSLYCGAI NQLLVIFHHD QLQSMFLWST GTLTQTDWGG VERLWPQLLG
           201                                                250
Fhubescco  GVMLTLLLLR PLTLMGLDDG VARNLGLALS LARLAALSLA IVISALLVNA
251 300 Fxuamycsm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fepgescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cbrcerwch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Feubbacsu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fhubbacsu ........................ MHFHFC SKHSIKSAEK SDILKQQLII Fecdescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hempyeren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cbrberwch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fepdescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Feccescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Btucescco
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Fhubescco VGIIGFIGLF APLLAKMLGA RRLLPRLMLA SLIGALILWL SDQIILWLTR Fatdviban . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fatcviban . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
301 350 Fxuamycsm ......... M RGSRRQRVGA H W A C G D R G R RTAGGCLRGP RVLGDRDRPS Fepgescco ..................... MIYVSRRLL ITCLLLVSACVVAGIWGLRS Cbrcerwch . . . . . . . MDK R G G M D H V L V W R Q G R F S R Q I N LTTVGRVSLA LLLVLAVMVA Feubbacsu ......................... MYSKQ WTRIILITSP FA.IALSLLL Fhubbacsu IISNRKEVRQ LSQHKNIRTA SEEIQWTSRT YGAVIVLIAG LCLLCLGAFL Fecdescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M K I A L V I F I T L A L A G
Fxuamycsm Fepgescco Cbrcerwch Feubbacsu Fhubbacsu Fecdescco Hempyeren Cbrberwch Fepdescco Feccescco Btucescco Fhubescco Fatdviban Fatcviban Consensus
Hempyeren . . . . . . . . . . . . . . . . . . . . . . . . . MNCRI H P R L M L S I L L MILIILALG. Cbrberwch . . . . . . . . . . . . . . . . . . MS HAVIPTGRRI A P G Q V L A G G G VCLLALAVLS Fepdescco MS G S V A V T R A I A VPGLLLL ..... LIIATALS Feccescco . . . . . . . . . . . . . . . . . . . . . . . . . MTAIK H P V L L W G L P V A A L I I I F W L S Btucescco . . . . . . . . . . . . . . . . . . . . . M L T L A R Q Q Q R Q N I R W L L C L SVLMLLALLL F h u b e s c c o V W M E V S T G S V IALIGAPLLL WLLPRLRSIS A P D M K V N D R V A A E R Q H V L A F Fatdviban . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MTFRMILAFF Fatcviban . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
400 PRMLMALLIG PRVLMALLIG PRWLAALVG PRAAGALLIG PRTAAAALVG PRLLLALFVG PRVLLAVVVG SRTLIAIVVG PRTLAGLLAG PRSLVAVLIG PRTLAVLLVG PRIMAALFAG PRLVALILTG IKLSAIIIGG PR ....... G
401 FxuamycsmAALGVSGAIF FepgesccoAALGVSGAIF Cbrcerwch GALAVSGLIL FeubbacsuAALAVSGALM Fhubbacsu A L L A V S G A I M FecdesccoAALAVAGVLI Hempyeren CALAVSGAIM Cbrberwch A G L A V A G A L M Fepdescco G A L G L A G A L M Feccescco A S L A L A G T L L BtucesccoAALAISGAVM Fhubescco VMLAVAGCII Fatdviban SGLAMCGVIL Fatcviban SCVAISAVIF Consensus ..LA..G...
QALTRNPLGS QSLMRNPLGS QAMIRNPLAS QGITRNYLAS QGMTRNPLAE QGIVRNPLAS QGLFRNPLAD QVLTRNPLAS QTLTRNPLAD QTLTHNPMAS QALFENPLAE QRLTGNPMAS QHIVRNRFVE QALARNRILT Q...RNP...
PDIIGLNAG. PDVMGFNTG. PDILGITSG. PSIMGVSDG. PSIMGVTSG. PDILGVNHA. PGLLGISSG. PGLFGINAG. PGLLGVNAG. PSLLGINSG PGLLGVSNG. PEVLGISSG. PGTTGSLDA. PSIMGYESIY P...G...G.
450 AYTGALVALA GLGTGGQHGG A W S G V L V A M V LFG ..... QD ASAAAVFYLS FLAATL...G SAFIITLCMV LLPQSSSI.. SAFAVSIAFA FFPGLSAM.. ASLASVGALL LMP.SLPVMV AAL.CVGLII V M P F S L P P L V AMF .... FLI V C V S L F P K V A ASF .... AIV LGAALFGYSS AAL AMA LTSALSPTPI A G V G L I A A V L LGQGLTP... AAFGVVLMLF LVPG .... NA AKLGILVSIV MLPSSDKLER LVWQALLLLF V G T S G S A V L G A ...................
451 Y.YAVAGGAL L.TAIALSAM A.HYLPLAAM ...EMMIYSF ...GLVLWSF L .... PLLAF ALYSHMVGAF MSVWL.WSAF AQEQL.AMAF
VGGLITAAAV VGGIVTSLLV IGAATAALAV IGSALGAVLV AGAGLGASTV AGGMA.GLIL IGSLAISAII AGAAVAGCLV AGALVASLIV
YALS..YRNG WLLA..WRNG YWLA..WQAG FGLAAMMPNG MGIGMFSRGG LKMLAKTHQP FTLSRWGHGN WLIGTMGKGS AFTGSQGGGQ
500 L A G Y R L I V V G IGVGAVLSSV IDTFRLIIIG IGVRAMLVAF VSPQRLVLTG VGVSALLMAA FTPVQLAIIG T V T S M L L S S L L T P V K L A L A G TAVTYFFTGI M . . . K L A L T G VALSA.CWAS L S . . R L L L A G IAINALCGAA LNPLRMVLAGAAITAMF.AA LSPVRLTLAG VALAAVL.EG
351 .... TPIT.. P V D V L R V L V G T N T T F D R . . V W L E W . . . R M .... GAVTLE T S Q V F A A L M G D A P R S M T . . M V V T E W . . . R L SLGLGKLMLS PWEVLRALWS SQPEGAA..L IVQQL...RL SILYGAKHLS T D I V F T S L I H FDPGNTDHQI IW.HS...RI SISLGAADIH L R T V W E A I F H YQPTKTSHQI IH.DL...RL CALLSLHMGV IPVPWRALLT D W Q A G H E H Y Y V L M E Y . . . R L .... SANMGA L T L S F R T L W H A S . L D D A M W H IWLNI...RL L M V . G P V W I A PSQVLGALWH P D P L N V S H I L V . T S T . . . R L LLI.GAKSLP ASVVLEAFS. GTCQSADCTI V . L D A . . . R L LFCYSAIPVS G A D A T R A L L P G H T P T L P E A L V . Q N L . . . R L SLCAGEQWIS PGDWFTP ......... RGEL FVWQI...RL ALAGGVLLLM AVVALSFGRD AHGWTWASGA LLEDLMPWRW T L C A T S L F F G A N Q I E W S L L P TFNEKAWLPI I ..... ASRL .MTSLNLNFR VSVVLVILLS IAFIFINSGF DLEYIIPRRL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RL
Fxuamycsm Fepgescco Cbrcerwch Feubbacsu Fhubbacsu Fecdescco Hempyeren Cbrberwch Fepdescco
Feccescco A G Y S L S F I A A C G G G V S W L L V MTAGGGFRHT HDRNKLILAG Btucescco .NWALGLCAI RGALIITLIL LRFA...RRH LSTSRLLLAG Fhubescco FGWLLPAGSL GAAVT...LL IIMIAAGRGG FSPHRMLLAG Fatdviban MFFAVLFC.. FAAGLVYIAI IRKVKFSNTA LVPVIGLMFG F a t c v i b a n V V G N F W S A V LILLYSFVIQ FWVLKRFQHD M..HQVLLIG Consensus ........... G ....................... L...G
IALSAFC.MG VALGIICSAL MALSTAFTML SVLSALAE.. FVLTMVLTTV ..........
Fxuamycsm Fepgescco Cbrcerwch Feubbacsu Fhubbacsu Fecdescco Hempyeren Cbrberwch Fepdescco Feccescco Btucescco Fhubescco Fatdviban Fatcviban Consensus
501 550 NQWIVIKLDH HTAVTASVWQ QGTLNGLTWS QVVPMTVCLAVVTVALLAMG NTWLLLKASL E T A L T A G L W N A G S L N GL T W A KTSPSAPIII LMLIAAALLV TTFMLVFSPL TTTLSAYVWL TGSVYGASWR ETRELGGWLL LIAPWLVLLA SAAMSIYF.. QISQDLSFWY SARLHQMSPD FLKLAAPFFL IGIIMAISLS STAIAIRF.. DVAQDISFWY AGGVAGVKWS GVQLLLIAGA VGLTLAFFIA LTDYLMLSRP QDVNNALLWL TGSLWGRDWS FVKIAIPLMI LFLPLSLSFC VGVLTYISDD QQLRQFSLWS MGSLGQAQWS TLMVAASLIL PACVLGLLQA F S Q A M L W N Q EGLDTVLFWL AGSVADRELA DGAAADGLLLAALVGALLLS LTSGIALLNP DVYDQLRFWQ AGSLDIRNLH TLKVVLIPVL IAGATALLLS LTRITLLLAE DHAYGIFYWL AGGVSHARWQ D V W Q L L P V V V T A V P V V L L L A MTWAIYFSTS VDLRQLMYWM MGGFGGVDWR QSWLMLALIP VLLWIC.CQS L.MMLQASGD PRMAQVLTWI SGSTYNATDA QVWRTGIVMV ILLAITPLCR .... FYAYQN NILQSMSGWL MGDFSKVV.Q EHYEIIFLIL PITLLTYLYA AQFIQIRISP GEFSIFQGLS YTSFERAKPS TLLFAGTVLS ILALFANKWV .................. W . . G ............................
Fxuamycsm Fepgescco Cbrcerwch Feubbacsu Fhubbacsu Fecdescco Hempyeren Cbrberwch Fepdescco Feccescco Btucescco Fhubescco Fatdviban Fatcviban Consensus
551 600 PQLPVLQMGD DAAGGLGVNP ERVRLSYLVA GVALVALACAAAGPISFVAL RRMRLLEMGD DTACALGVRL ERSRLLMMLV AVVLTAAATA LAGPISFIAL RQVRVQQLDD GLAQGIGVGV EWLAVALLLL SVRLAGAAIA WGAAMAFVGL KKVTAVSLGD DISKSLGQKK KTIKIMAMLS VIILTGSAVA LAGKIAFVGL RSVTVLSLGD DLAKGLGQYT SAVKLVGMLIVVILTGAAVS IAGTIAFIGL RDLDLLALGD ARATTLGVSV PHTRFWALLL AVAMTSTGVA ACGPISFIGL RQLNLLQLGD EEAHYLGVNV KQAKLRLLLL SAILIGAAVA VSGVIGFIGL GQVNVLNAGE AIARGLGHGT G R I R L L M S L L V V A L A G G A V A MAGSIGFVGL RALNSLSLGS DTATALGSRV ARTQLIGLLA ITVLCGSATA IVGPIAFIGL NQLNLLNLSD STAHTLGVNL TRLRLVINML VLLLVGACVS VAGPVAFIGL RPMNMLALGE ISARQLGLPL W F W R N V L V A A T G W M V G V S V A LAGAIGFIGL RWLTILPLGG DTARAVGMAL TPTRIALLLLAACLTATATM TIGPLSFVGL HRFTVMGMGE DIASNLGISY AMTAALGLIL V S I T V A V T V V T V G A I H F V G L SELDVIGLGR DQAMSLGLND AHYIPKYFSV IAILVAISTS LIGPTAFMGV ........ G . . . A . . L G . . . . . . . . . . . . . . . . . . . . . . . . . G...F.GL
601 FxuamycsmAAPQLARRLT Fepgescco VAPHIARRIS Cbrcerwch IAPHIRKRLV FeubbacsuVVPHITRFLV Fhubbacsu IIPHITRFLV FecdesccoWPHMMRSIT HempyerenVVPHLIRMRI Cbrberwch IVPHMARKLL Fepdescco MMPHMARWLV F e c c e s c c o LVPHLARFWA Btucescco VIPHILRLCG Fhubescco MAPHIARMMG
650 A S . P G V A L V P A A A M . G A V L L LASDLVAQHL FTANELPVGA GT A R W G L T Q A A L C GALLL LAADLCAQQL FMPYQLPVGV AP G F A G Q A A M A F L S G A G L V MVADLCGRTL FLPLDLPAGI GS.DYSRLIP CSCILGGIFL TLCDLASRFI NYPFETPIEV GV.DYRWIIP CSAVLGAVLL VFADIAARLV NAPFETPVGA GG.RHRRLLP VSALTGALLLVVADLLARII HPPLELPVGV GA.DHRWLLP GAALGGACLL LTADTLARTL VAPAEMPVGL PA.DHRWLLP GCALLGACLL L L A D I L A R V V I V P Q E V P V G V GA.DHRWSLP VTLLATPALL LFADIIGR.V IVPGELRVSV GF.DQRNVLP VSMLLGATLM LLADVLARAL AFPGDLPAGA LT.DHRVLLP GCALAGASAL LLADIVARLA LAAAELPIGV FR.RTMPHIV ISALVGGLLL VFADWCGRMV LFPFQIPAGL
Fatdviban VIPNLVALKY GD.HLKNTLP IVALGGASLL IFCDVISRVV LFPFEVPVGL Fatcviban FIANIAYSIT GSPQYRHTLP VACTIAIVMF LTAQLMVEHF F.NYKTTVSI Consensus . . P H . . R . . . . . . . . . . . . P ..... G..LL ..AD...R .... P . . . P . G . ::(:i!" i:ii:::!s
Fxuamycsm Fepgescco Cbrcerwch Feubbacsu Fhubbacsu Fecdescco Hempyeren Cbrberwch Fepdescco Feccescco
Btucescco Fhubescco Fatdviban Fatcviban Consensus
651 673 VTVSLGGIYL VYLLVTQARR . . . VTVSLGGIYL IVLLIQESRK K.. FVSALGAPFF LYLLIKQRH . . . . VTSIIGVPFF LYLIKRKGGE QNG LTSLIGVPFF FYLARRERRG L.. L T A I I G A P W F V W L L V R M R ..... ITSLLGGPYF LWLILRQREQ RSG MTALFGAPFF IFLLRRGGRY G.. VSAFIGAPVL IFLVRRKTRG GA V L A L I G S P C F V W L V R R R G ..... VTATLGAPVF IWLLLKAGR . . . . LSTFIGAPYF IYLLRKQSR . . . . TASAVGGVMF LAFLLKGAKA . . . LVNVLCGGYF LIITMRARSQ L.. .....
G...F
..L
. . . . . . . . . .
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
Database accession numbers
Codes: Btucescco Cbrberwch Cbrcerwch Fatcviban Fatdviban Feccescco Fecdescco Fepdescco Fepgescco Feubbacsu Fhubbacsu Fhubescco Fxuamycsm Hempyeren
SWISSPROT: P06609 P37737 P37738 P15030 P15029 P23876 P23877 P40410 P06972
PIR: A24498; S04777 S54821 S54822 B41671 A41671 JS0113 JS0114 S16296; S16305 S16297 S07318; S45222
EMBL/GENBANK: M14031 X87208 X87208 M74068 M74068 M26397; U14003 M26397; U14003 X57471; X59402 X57471 L19954 X93092 X05810; D26562 U10425 X77867
References
1 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
ABC Binding Protein-Dependent Transporters: Cytoplasmic Elements
Binding protein-dependent monosaccharide transporter family
Summary
Transporters of the binding protein-dependent monosaccharide transporter family, examples of which are the L-arabinose transport ATP binding protein ARAG from Escherichia coli 1 (Aragescco) and the ribose transport ATP binding protein, also from E. coli 2 (Rbsaescco), mediate import of monosaccharides. These transporters are found mostly in gram-negative bacteria; one member of the family is found in the gram-positive species Bacillus subtilis 3.
Statistical analysis of multiple amino acid sequence comparisons places the binding protein-dependent monosaccharide transporter family in the ATP binding cassette (ABC) superfamily 4. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. In transporters of this family the two ATP binding domains form one chain, separate from any transmembrane domains. The family is characterized by the cytoplasmic ATP binding domains 4, which are described in the following tables.
Each transporter is associated with two transmembrane subunits. In the arabinose transport system, ARAG is associated with two copies of the ARAH transmembrane protein 1; in contrast, in the ribose transport system, RBSA is associated with two homologous transmembrane proteins, RBSC and RBSD 5. Unusually, the transmembrane proteins of the ribose transport system are predicted to contain eight membrane-spanning helices by the hydropathy of their amino acid sequences 5.
A large number of amino acids are conserved within the binding protein-dependent monosaccharide transporter family, including several long sequence motifs unique to the family, signature motifs of the ABC superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                             ORGANISM [COMMON NAMES]                            SUBSTRATE(S)
Aragescco  L-Arabinose transport ATP binding protein [ARAG]                   Escherichia coli [gram-negative bacterium]         L-Arabinose
Mglaescco  Galactoside transport ATP binding protein [MGLA]                   Escherichia coli [gram-negative bacterium]         Galactosides
Mglahaein  Galactoside transport ATP binding protein [MGLA, HI0823]           Haemophilus influenzae [gram-negative bacterium]   Galactosides
Mglatrepa  Methylgalactoside transport ATP binding protein [MGLA]             Treponema pallidum [gram-negative bacterium]       Methylgalactosides
Rbsabacsu  Ribose transport ATP binding protein [RBSA]                        Bacillus subtilis [gram-positive bacterium]        Ribose
Rbsaescco  Ribose transport ATP binding protein [RBSA]                        Escherichia coli [gram-negative bacterium]         Ribose
Rbsahaein  Ribose transport ATP binding protein [RBSA, HI0502]                Haemophilus influenzae [gram-negative bacterium]   Ribose
Xylgescco  D-Xylose transport ATP binding protein [XYLG]                      Escherichia coli [gram-negative bacterium]         D-Xylose
Xylghaein  D-Xylose transport ATP binding protein [XYLG, HI1110]              Haemophilus influenzae [gram-negative bacterium]   D-Xylose
Phylogenetic tree
[Figure: phylogenetic tree of the family. Leaf labels recoverable from the original: Mglaescco, Mglahaein, Mglatrepa, Rbsaescco, Rbsahaein, Xylgescco, Xylghaein, Rbsabacsu, Aragescco.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT
Aragescco  504          55 018
Mglaescco  506          56 401
Mglahaein  506          56 567
Mglatrepa  496          55 157
Rbsabacsu  453          49 911
Rbsaescco  501          55 041
Rbsahaein  493          54 157
Xylgescco  513          56 470
Xylghaein  503          55 679
CHROMOSOMAL LOCUS: 42.77 minutes 48.14 minutes 47.757 84.69 minutes 80.36 minutes 64.177
Multiple amino acid sequence alignments
           1                                                   50
Mglaescco  MVSSTTPSSG EYLLEMSGIN KSFPGVKALD NVNLKVRPHS IHALMGENGA
Mglahaein  MTAQTQCQDS QVLLTMTNVC KSFPGVKALD NANLTVRSHS VHALMGENGA
Mglatrepa  .........M CDVLTIRDLS KSFARNRVLN GVNFRMGKGA VVGLMGENGA
Rbsaescco  .........M EALLQLKGID KAFPGVKALS GAALNVYPGR VMALVGENGA
Rbsahaein  .........M ETLLKISGVD KSFPGVKALN NACLSVYAGR VMALMGENGA
Xylgescco  .........M PYLLEMKNIT KTFGSVKAID NVCLRLNAGE IVSLCGENGS
Xylghaein  .......... MALLEMKHIT KKFGDVTALH NISIELEAGE ILSLCGENGS
Rbsabacsu  .......... .MQIEMKDIH KTFGKNQVLS GVSFQLMPGE VHALMGENGA
Aragescco  ......MQQS TPYLSFRGIG KTFPGVKALT DISFDCYAGQ VHALMGENGA
Consensus  .......... ...L...... K.F..V.AL. ........G. ...L.GENGA
           51                                                 100
Mglaescco  GKSTLLKCLF GIYQKD..SG TILFQGKEID FHSAKEALEN GISMVHQELN
Mglahaein  GKSTLLKCLF GIYAKD..EG EILFLGEPVN FKTSKEALEN GISMVHQELN
Mglatrepa  GKSTLMKCLF GMYAKD..TG QILVDGSPVD FQSPKEALEN GVAMVHQELN
Rbsaescco  GKSTMMKVLT GIYTRD..AG TLLWLGKETT FTGPKSSQEA GIGIIHQELN
Rbsahaein  GKSTLMKVLT GIYSKD..AG TIEYLNRSVN FNGPKASQEA GISIIHQELN
Xylgescco  GKSTLMKVLC GIYPHGSYEG EIIFAGEEIQ ASHIRDTERK GIAIIHQELA
Xylghaein  GKSTLMKILC GIYPCGDYSG DIYFSESELK ARNIRDTEEK GISIIHQELT
Rbsabacsu  GKSRLMNILT GLHKAD..KG QISINGNETY FSNPKEAEQH GIAFIHQELN
Aragescco  GKSTLLKILS GNYAPT..TG SVVINGQEMS FSDTTAALNA GVAIIYQELH
Consensus  GKSTL.K.L. G.Y......G .I...G.... F......... GI...HQEL.

           101                                                150
Mglaescco  LVLQRSVMDN MWLGR.YPTK GMFVDQDKMY RETKAIFDEL DIDIDPRARV
Mglahaein  LVRQTSVMDN LWLGR.YPLK GPFVDHAKMY RDTKAIFDEL DIDVDPKEKV
Mglatrepa  QCLDRTVMDN LFLGR.YPAR FGIVDEKRML DDSLTLFASL KMDVNPRAVM
Rbsaescco  LIPQLTIAEN IFLGREFVNR FGKIDWKTMY AEADKLLAKL NLRFKSDKLV
Rbsahaein  LVGNLTIAEN IFLGREFKTS WGAINWQKMH QEADKLLARL GVTHSSKQLC
Xylgescco  LVKELTVLEN IFLGNEITHN .GIMDYDLMT LRCQKLLAQV SLSISPDTRV
Xylghaein  LVKNMSVLEN IFLGNEITHK .GLTADNEMY LRCKNLLQQV QLDADPNTRV
Rbsabacsu  IWPEMTVLEN LFIGKEISSK LGVLQTRKMK ALAKEQFDKL SVSLSLDQEA
Aragescco  LVPEMTVAEN IYLG.QLPHK GGIVNRSLLN YEAGLQLKHL GMDIDPDTPL
Consensus  L.....V..N ..LG...... .G......M. .........L ..........

           151                                                200
Mglaescco  GTLSVSQMQM IEIAKGFSYN AKIVIMDEPT SSLTEKEVNH LFTIIRKLKE
Mglahaein  AKLSVSQMQM IEIAKAFSYN AKIVIMDEPT SSLSEKEVEH LFKIIDKLKQ
Mglatrepa  RSMSVSQRQM VEIAKAMSYN AKIIVLDEPT SSLTEREIVR LFAIIRDLSK
Rbsaescco  GDLSIGDQQM VEIAKVLSFE SKVIIMDEPT DALTDTETES LFRVIRELKS
Rbsahaein  AELSIGEQQM VEIAKALSFE SKVIIMDEPT DALTDTETEA LFNVIRELKA
Xylgescco  GDLGLGQQQL VEIAKALNKQ VRLLILDEPT ASLTEQETSI LLDIIRDLQQ
Xylghaein  GELGLGQQQL VEIAKALNKQ VRLLILDEPT ASLTEKETEI LLNLIKDLKA
Rbsabacsu  GECSVGQQQM IEIAKALMTN AEVIIMDEPT AALTEREISK LFEVITALKK
Aragescco  KYLSIGQWQM VEIAKALARN AKIIAFDEPT SSLSAREIDN LFRVIRELRK
Consensus  ...S..Q.QM .EIAKA.... ....I.DEPT ..L...E... LF..I..L..

           201                                                250
Mglaescco  RGCGIVYISH KMEEIFQLCD EVTVLRDGQW IAT.EPLAGL TMDKIIAMMV
Mglahaein  RGCGIIYISH KMDEIFKICD EITILRDGKW INT.VNVKES TMEQIVGMMV
Mglatrepa  KGVAFIYISH KMDEIFQICS EVIVLRDGVL TLS.QSIGEV EMSDLITAMV
Rbsaescco  QGRGIVYISH RMKEIFEICD DVTVFRDGQF IAE.REVASL TEDSLIEMMV
Rbsahaein  ENRGIVYISH RLKEIFQICD DVTVLRDGQF IGE.RVMAEI TEDDLIEMMV
Xylgescco  HGIACIYISH KLNEVKAISD TICVIRDGQH IGT.RDAAGM SEDDIITMMV
Xylghaein  HNIACIYISH KLNEVKAISD KICVIRDGEH VGT.KDASTM TEDDIITMMV
Rbsabacsu  NGVSIVYISH RMEEIFAICD RITIMRDGKT VDT.TNISET DFDEVVKKMV
Aragescco  EGRVILYVSH RMEEIFALSD AITVFKDGRY VKTFTDMQQV DHDALVQAMV
Consensus  .G....YISH ...EIF.I.D ...V.RDG.. .......... ..D.....MV

           251                                                300
Mglaescco  GRSLNQRFPD KENKPGEVIL EVRNLTSLRQ .....PSIRD VSFDLHKGEI
Mglahaein  GRELTQRFPE KTNVPKEVIL QVENLTAKNQ .....PSIQD VSFELRKGEI
Mglatrepa  GRTLDKRFPD ADNTVGDDYL EIRGLSTRYA .....PQLRD ISLSVKRGEI
Rbsaescco  GRKLEDQYPH LDKAPGDIRL KVDNLCG... .....PGVND VSFTLRKGEI
Rbsahaein  GRRLDEQYPH LSQEKGECVL DVKHVSG... .....SGIDD VSFKLHAGEI
Xylgescco  GRELTALYPN EPHTTGDEIL RIEHLTAWHP VNRHIKRVND VSFSLKRGEI
Xylghaein  GREITSLYPH EPHEIKDEIL RVENLSAWHP INTHIKRVDN VSFSLHEGEI
Rbsabacsu  GRELTERYPK RTPSLGDKVF EVKNASVK.. .....GSFED VSFYVRSGEI
Aragescco  GRDIGDIYGW QPRSYGEERL RLDAVKA... ....PGVRTP ISLAVRSGEI
Consensus  GR.L....P. .....G...L .......... .........D VSF....GEI

           301                                                350
Mglaescco  LGIAGLVGAK RTDIVETLFG I.REKSAGTI TLHGKQINNH NANEAINHGF
Mglahaein  LGIAGLVGAK RTDIVEAIFG V.RELIEGTI KLHGKTVKNH TALEAINNGF
Mglatrepa  FGLYGLVGAG RSELLEAIFG L.RTIADGEI SLAGKKIRLK SSRDAMKLNF
Rbsaescco  LGVSGLMGAG RTELMKVLYG ALP.RTSGYV TLDGHEVVTR SPQDGLANGI
Rbsahaein  VGVSGLMGAG RTELGKLLYG ALP.KTAGKV RLKNQEIENR SPQDGLDNGI
Xylgescco  LGIAGLVGAG RTETIQCLFG VWPGQWEGKI YIDGKQVDIR NCQQAIAQGI
Xylghaein  LGVAGLVGSG RTDMVQCLFG SYEGKFEGNI FINQKQVNIK NCAQAIEHKI
Rbsabacsu  VGVSGLMGAG RTEMMRALFG VDRLDT.GEI WIAGKKTAIK NPQEAVKKVS
Aragescco  VGLFGLVGAG RSELMKGMFG G.TQITAGQV YIDQQPIDIR KPSHAIAAGM
Consensus  .G..GL.GAG RT.......G .......G.. .......... ....A.....

           351                                                400
Mglaescco  ALVTEERRST GIYAYLDIGF NSLISNIRNY KNKVGLLDNS RMKSDTQWVI
Mglahaein  ALVTEERRST GIYSNLSIEF NSLISNMKSY LTPWKLLSTK KMKSDTQWVI
Mglatrepa  AFVPEERKLN GMFAKGSIEY NTTIANLPAY K.RYGLLSKK KLQEAAEREI
Rbsaescco  VYISEDRKRD GLVLGMSVKE NMSLTALRYF SRAGGSLKHA DEQQAVSDFI
Rbsahaein  VYISEDRKGD GLVLGMSVKE NMSLTSLDHF SQKGG.IRHQ AEKMAVDDFI
Xylgescco  AMVPEDRKRD GIVPVMAVGK NITLAALNKF TGGISQLDDA AEQKCILESI
Xylghaein  VMVPEDRKKH GIVSIMGVGK NITLSSLKSY CFGKMVVNEA KEEQIIGSAI
Rbsabacsu  ALLQRIARMK G......... .......... .......... ..........
Aragescco  MLCPEDRKAE GIIPVHSVRD NINISARRKH VLGGCVINNG WEENNADHHI
Consensus  ....E.R... G......... N......... .......... .........I

           401                                                450
Mglaescco  DSMRVKTPGH RTQIGSLSGG NQ.QKVIIGR WLLTQPEILM LDEPTRGIDV
Mglahaein  DSMNVKTPSH RTTIGSLSGG NQ.QKVIIGR WLLTQPEILM LDEPTRGIDI
Mglatrepa  KAMRVKCVSP SELISALSGG NQ.QKVIIGK WLERDPDVLL LDEPTRGIDV
Rbsaescco  RLFNVKTPSM EQAIGLLSGG NQ.QKVAIAR GLMTRPKVLI LDEPTRGVDV
Rbsahaein  LMFNIKTPNR DQQVGLLSGG NQ.QKVAIAR GLMTRPNVLI LDEPTRGVDV
Xylgescco  QQLKVKTSSP DLAIGRLSGG NQ.QKAILAR CLLLNPRILI LDEPTRGIDI
Xylghaein  KRLKVKTFSP DLPIGRLSGG NQ.QKAILAK CLSLNPKILI LDEPTRGIDV
Rbsabacsu  ...SCSTASP ETHARHLSGG KPGKKVVIAK WIGIGPKVLI LDEPTRGVDV
Aragescco  RSLNIKTPGA EQLIMNLSGG NQ.QKAILGR WLSEEMKVIL LDEPTRGIDV
Consensus  .....KT... ...I..LSGG NQ.QK..... .L...P..L. LDEPTRG.DV

           451                                                500
Mglaescco  GAKFEIYQLI AELAKKGKGI IIISSEMPEL LGITDRILVM SNGLVSGIVD
Mglahaein  GAKFEIYQLI QELAKKDKGI IMISSEMPEL LGVTDRILVM SNGKLAGIVE
Mglatrepa  GAKYEIYQLI IRMAREGKTI IVVSSEMPEI LGITNRIAVM SNYRLAGIVD
Rbsaescco  GAKKEIYQLI NQFKADGLSI ILVSSEMPEV LGMSDRIIVM HEGHLSGEFT
Rbsahaein  GAKKEIYQLI NEFKKEGLSI LMISSDMPEV LGMSDRVLVM REGKISAEFS
Xylgescco  GAKYEIYKLI NQLVQQGIAV IVISSELPEV LGLSDRVLVM HEGKLKANLI
Xylghaein  GAKYEIYKLI NQLAQEGIAI IVISSELPEV LGISDRVLVM HQGKLKASLI
Rbsabacsu  GAKREIYTLM NELTERGVAI IMVSSELPEI LGMSDRIIVV HEGRISGEIH
Aragescco  GAKHEIYNVI YALAAQGVAV LFASSDLPEV LGVADRIVVM REGEIAGELL
Consensus  GAK.EIY.LI ......G..I I..SSE.PE. LG..DR..VM ..G.......
           501                     525
Mglaescco  TKTTTQNEIL RLASLHL... .....
Mglahaein  SAKTSQEEIL QLAAKYL... .....
Mglatrepa  TKSTDQEALL RLSARYL... .....
Rbsaescco  REQATQEVLM AAAVGKLNRV NQE..
Rbsahaein  REEATQEKLL AAAIGK.... .....
Xylgescco  NHNLTQEQVM EAALRSEHHV EKQSV
Xylghaein  NTALTQEQVM ETALKE.... .....
Rbsabacsu  AREATQERIM TLATGGR... .....
Aragescco  HEQADERQAL SLAMPKVSQA VA...
Consensus  .....QE... ..A....... .....

Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.

Database accession numbers
CODE       SWISSPROT  PIR     EMBL/GENBANK
Aragescco  P08531     S01074  X06091; G40946
Mglaescco  P23199     B37277  M59444; G146854
Mglahaein  P44884             L45461; G1005849
Mglatrepa                     U45323
Rbsabacsu  P36947             Z25798; G397497
Rbsaescco  P04983     B26304  M13169; G147513
Rbsahaein  P44735             L45143; G1003888
Xylgescco  P37388             U00039; G466705
Xylghaein  P45046             L45746; G1006414
References
1 Scripture, J.B. et al. (1987) J. Mol. Biol. 197, 37-46.
2 Buckel, S.D. et al. (1986) J. Biol. Chem. 261, 7659-7662.
3 Woodson, K. and Devine, K.M. (1994) Microbiology 140, 1829-1838.
4 Higgins, C.F. et al. (1992) Annu. Rev. Cell Biol. 8, 67-113. 5 Burland, V.D. et al. (1993) Genomics 16, 551-561.
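The 75% criterion stated for the consensus rows of the alignments in this entry can be illustrated with a short sketch. The function name `consensus`, its `threshold` parameter, and the use of "." for both gaps and non-conserved positions are assumptions made for illustration only; the text gives the 75% rule but does not describe the procedure actually used. The sample rows are columns 41-50 of the nine aligned sequences.

```python
from collections import Counter

def consensus(rows, threshold=0.75, gap="."):
    """Return a consensus string for equal-length aligned rows.

    A column contributes a residue only if that residue occurs in at
    least `threshold` of the rows; otherwise the gap symbol is written.
    (Hypothetical helper; not the authors' procedure.)
    """
    if not rows:
        return ""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned rows must all have the same length")
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(rows) >= threshold:
            out.append(residue)
        else:
            out.append(gap)
    return "".join(out)

# Columns 41-50 of the alignment, in the printed row order.
columns_41_50 = [
    "IHALMGENGA",  # Mglaescco
    "VHALMGENGA",  # Mglahaein
    "VVGLMGENGA",  # Mglatrepa
    "VMALVGENGA",  # Rbsaescco
    "VMALMGENGA",  # Rbsahaein
    "IVSLCGENGS",  # Xylgescco
    "ILSLCGENGS",  # Xylghaein
    "VHALMGENGA",  # Rbsabacsu
    "VHALMGENGA",  # Aragescco
]
print(consensus(columns_41_50))
```

With nine sequences, a residue must appear in at least seven rows to be reported, which is why the final A (7 of 9) is kept while the leading V (6 of 9) is not.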
Binding protein-dependent peptide transporter family
Summary
The majority of transporters of the binding protein-dependent peptide transporter family, examples of which are the oligopeptide transport ATP binding protein OPP ¹, found in both gram-positive and gram-negative bacteria (e.g. Oppdbacsu, Oppdhaein), and the glutamine ² and histidine ³ transporters of Escherichia coli (Glnqescco, Hispescco), mediate import of amino acids and small peptides. Some members of this family import mono- or oligosaccharides (e.g. Lackagrra) ⁴, metals (e.g. Nikdescco) ⁵ or nitrate (e.g. Nrtsynsp) ⁶. A few mediate antibiotic resistance (Sap transporters mediate export of, and confer resistance to, antimicrobial peptides), and the substrates of a few are unknown ⁷. These transporters are only found in unicellular organisms; most are found in bacteria (both gram-positive and gram-negative), but a few are found in simple eukaryotes.

Statistical analysis of multiple amino acid sequence comparisons places the binding protein-dependent peptide transporter family in the ATP binding cassette (ABC) superfamily ⁸. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. In transporters of this family each ATP binding domain forms one chain, with a single ATP binding motif. Two ATP binding domains and two transmembrane domains form the active complex. The family is characterized by the cytoplasmic ATP binding domains ⁸, which are described in the following tables.

The ATP binding domains of each transporter system are associated with two transmembrane subunits. Most transport systems in this family contain two homologous ATP binding domains (e.g. OPPD and OPPF ¹); others, such as the histidine transporter, contain only one ATP binding component (HISP ³ in this example), which presumably functions as a homodimer. In most transporters in this family, each domain of the associated transmembrane proteins is predicted from the hydropathy of its amino acid sequence to contain six membrane-spanning helices. In the case of the transmembrane domains OPPB and OPPC from the oligopeptide transport system of Salmonella typhimurium, this has been confirmed by β-lactamase fusions ⁹. Two exceptions to the six-transmembrane-helix rule occur in this family. The transmembrane domains HISQ and HISM, associated with HISP, each have five transmembrane helices ¹⁰, and the MALF protein, associated with the E. coli maltose/maltodextrin transport ATP binding protein MALK, has been shown experimentally to have eight such helices ¹¹. The two N-terminal helices can be deleted without loss of transport function ¹².

Many residues and short sequence motifs are conserved within the binding protein-dependent peptide transporter family, including motifs unique to the family, signature motifs of the ABC superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis.
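The summary refers to membrane-spanning helices predicted from the hydropathy of the amino acid sequence, without naming the method. As a rough illustration only, the sketch below uses the widely cited Kyte-Doolittle hydropathy scale with a 19-residue sliding window and a mean-hydropathy cutoff of 1.6; the scale, window, cutoff, and the function names `hydropathy_profile` and `candidate_helices` are all assumptions, not the procedure used for the predictions reported here.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def candidate_helices(seq, window=19, cutoff=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the cutoff.

    Returns 1-based (start, end) spans; each span is only a candidate
    membrane-spanning segment, not a confirmed helix.
    """
    spans = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean >= cutoff:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans

# Toy sequence: a 20-residue hydrophobic stretch flanked by charged residues.
toy = "D" * 10 + "L" * 20 + "D" * 10
print(candidate_helices(toy))
```

Counting the spans returned for each transmembrane subunit gives a helix estimate of the kind quoted above, which is why such predictions are usually checked experimentally, as with the fusion studies cited for OPPB and OPPC.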
Nomenclature, biological sources and substrates
CODE; DESCRIPTION [SYNONYMS]; ORGANISM [COMMON NAMES]; SUBSTRATE(S) [RESISTANCE] a
Abchaein; ATP binding protein [ABC]; Haemophilus influenzae [gram-negative bacterium]; Unknown
Abcxantsp; Probable ATP dependent transporter [YCF16]; Antithamnion sp. [red alga]; Unknown
Abcxcyapa; Probable ATP dependent transporter [YCF16]; Cyanophora paradoxa [flagellate protozoan]; Unknown
Abcxgalsu; Probable ATP dependent transporter [YCF16]; Galdieria sulphuraria [alga]; Unknown
Abcxodosi; Probable ATP dependent transporter [YCF16]; Odontella sinensis [diatom]; Unknown
Amiestrpn; Oligopeptide transport ATP binding protein [AMIE]; Streptococcus pneumoniae [gram-positive bacterium]; Oligopeptides
Amifstrpn; Oligopeptide transport ATP binding protein [AMIF]; Streptococcus pneumoniae [gram-positive bacterium]; Oligopeptides
Appdbacsu; Oligopeptide transport ATP binding protein [APPD]; Bacillus subtilis [gram-positive bacterium]; Oligopeptides
Appfbacsu; Oligopeptide transport ATP binding protein [APPF]; Bacillus subtilis [gram-positive bacterium]; Oligopeptides
Artpescco; Arginine transport ATP binding protein [ARTP]; Escherichia coli [gram-negative bacterium]; Arginine
Artphaein; Arginine transport ATP binding protein [ARTP, HI1180]; Haemophilus influenzae [gram-negative bacterium]; Arginine
Brafpseae; High-affinity branched-chain amino acid transport ATP binding protein [BRAF]; Pseudomonas aeruginosa [gram-negative bacterium]; Branched amino acids
Bztrhoca; BztABCD transporter ATP binding protein; Rhodobacter capsulatus [gram-negative bacterium]; Gln/Glu/Asn/Asp
Cysaescco; Sulfate transport ATP binding protein [CYSA]; Escherichia coli [gram-negative bacterium]; Sulfate, thiosulfate
Cysasynsp; Sulfate transport ATP binding protein [CYSA]; Synechococcus sp. [cyanobacterium]; Sulphur-containing compounds
Devaanasp; DevA protein; Anabaena sp. [alga]; Unknown
Dppdbacsu; Dipeptide transport ATP binding protein [DPPD, DCIAD]; Bacillus subtilis [gram-positive bacterium]; Dipeptides
Dppdescco; Dipeptide transport ATP binding protein [DPPD]; Escherichia coli [gram-negative bacterium]; Dipeptides
Dppdhaein; Dipeptide transport ATP binding protein [DPPD, HI1185]; Haemophilus influenzae [gram-negative bacterium]; Dipeptides
Dppfescco; Dipeptide transport ATP binding protein [DPPF, DPPE]; Escherichia coli [gram-negative bacterium]; Dipeptides
Dppfhaein; Dipeptide transport ATP binding protein [DPPF, HI1184]; Haemophilus influenzae [gram-negative bacterium]; Dipeptides
Drrastrpe; Daunorubicin resistance ATP binding protein [DRRA]; Streptomyces peucetius [gram-positive bacterium]; [Daunorubicin, doxorubicin]
Feceescco; Fe3+-Dicitrate transport ATP binding protein [FECE]; Escherichia coli [gram-negative bacterium]; Fe3+-Citrate
Fecehaein; Fe3+-Dicitrate transport ATP binding protein [FECE homolog, FECE, HI0361]; Haemophilus influenzae [gram-negative bacterium]; Fe3+-Citrate
Ftseescco; Cell division ATP binding protein [FTSE]; Escherichia coli [gram-negative bacterium]; Unknown
Ftsehaein; Cell division ATP binding protein [FTSE, HI0769]; Haemophilus influenzae [gram-negative bacterium]; Unknown
Glnqbacst; Glutamine transport ATP binding protein [GLNQ]; Bacillus stearothermophilus [gram-positive bacterium]; Glutamine
Glnqescco; Glutamine transport ATP binding protein [GLNQ]; Escherichia coli [gram-negative bacterium]; Glutamine
Gltlescco; Glutamate/aspartate transport ATP binding protein [GLTL]; Escherichia coli [gram-negative bacterium]; Glutamate, aspartate
Gltlhaein; Glutamate/aspartate transport ATP binding protein [GLTL, HI1078]; Haemophilus influenzae [gram-negative bacterium]; Glutamate, aspartate
Gluacorgl; Glutamate transport ATP binding protein [GLUA]; Corynebacterium glutamicum [gram-positive bacterium]; Glutamate
Hispescco; Histidine transport ATP binding protein [HISP]; Escherichia coli [gram-negative bacterium]; Histidine
Hispsalty; Histidine transport ATP binding protein [HISP]; Salmonella typhimurium [gram-negative bacterium]; Histidine
Lackagrra; Lactose transport ATP binding protein [LACK]; Agrobacterium radiobacter [gram-negative bacterium]; Lactose
Livgescco; High-affinity branched-chain amino acid transport ATP binding protein [LIV-I protein G, LIVG]; Escherichia coli [gram-negative bacterium]; Branched amino acids
Livgsalty; High-affinity branched-chain amino acid transport ATP binding protein [LIV-I protein G, LIVG]; Salmonella typhimurium [gram-negative bacterium]; Branched amino acids
Malkentae; Maltose/maltodextrin transport ATP binding protein [MALK]; Enterobacter aerogenes [gram-negative bacterium]; Maltose, maltodextrin
Malkescco; Maltose/maltodextrin transport ATP binding protein [MALK]; Escherichia coli [gram-negative bacterium]; Maltose, maltodextrin
Malksalty; Maltose/maltodextrin transport ATP binding protein [MALK]; Salmonella typhimurium [gram-negative bacterium]; Maltose, maltodextrin
Mbpxmarpo; Probable transport protein [MBPX]; Marchantia polymorpha [liverwort]; Unknown
Mklmycle; Possible ribonucleotide transport ATP binding protein [MKL]; Mycobacterium leprae [gram-positive bacterium]; Ribonucleotides?
Modcescco; Molybdenum transport ATP binding protein [MODC, CHLD, NARD]; Escherichia coli [gram-negative bacterium]; Molybdenum
Modchaein; Molybdenum transport ATP binding protein [MODC, HI1691]; Haemophilus influenzae [gram-negative bacterium]; Molybdenum
Modcrhoca; Molybdenum transport ATP binding protein [MODC, MOLD]; Rhodobacter capsulatus [gram-negative bacterium]; Molybdenum
Moddazovi; Molybdenum transport ATP binding protein [MODD]; Azotobacter vinelandii [gram-negative bacterium]; Molybdenum
Msmkstrmu; Multiple sugar-binding transport ATP binding protein [MSMK]; Streptococcus mutans [gram-positive bacterium]; Melibiose, raffinose, isomaltotriose
Nasdklepn; Nitrate transport protein [NASD]; Klebsiella pneumoniae [gram-negative bacterium]; Nitrate
Nikdescco; Nickel transport ATP binding protein [NIKD]; Escherichia coli [gram-negative bacterium]; Nickel
Nikeescco; Nickel transport ATP binding protein [NIKE]; Escherichia coli [gram-negative bacterium]; Nickel
Nocpagrtu; Nopaline permease ATP binding protein [NOCP]; Agrobacterium tumefaciens [gram-negative bacterium]; Nopaline
Nrtcsynsp; Nitrate transport ATP binding protein [NRTC]; Synechococcus sp. [cyanobacterium]; Nitrate
Nrtdsynsp; Nitrate transport ATP binding protein [NRTD]; Synechococcus sp. [cyanobacterium]; Nitrate
Occpagrtu; Octopine permease ATP binding protein P [OCCP]; Agrobacterium tumefaciens [gram-negative bacterium]; Octopine
Oppdbacsu; Oligopeptide transport ATP binding protein [OPPD, SPO0KD]; Bacillus subtilis [gram-positive bacterium]; Oligopeptides
Oppdhaein; Oligopeptide transport ATP binding protein [OPPD, HI1121]; Haemophilus influenzae [gram-negative bacterium]; Oligopeptides
Oppdlacla; Oligopeptide transport ATP binding protein [OPPD]; Lactococcus lactis [gram-positive bacterium]; Oligopeptides
Oppdmycge; Oligopeptide transport ATP binding protein [OPPD, MG079]; Mycoplasma genitalium [gram-negative bacterium]; Oligopeptides
Oppdsalty; Oligopeptide transport ATP binding protein [OPPD]; Salmonella typhimurium [gram-negative bacterium]; Oligopeptides
Oppfbacsu; Oligopeptide transport ATP binding protein [OPPF, SPO0KE]; Bacillus subtilis [gram-positive bacterium]; Oligopeptides
Oppfhaein; Oligopeptide transport ATP binding protein [OPPF, HI1120]; Haemophilus influenzae [gram-negative bacterium]; Oligopeptides
Oppflacla; Oligopeptide transport ATP binding protein [OPPF]; Lactococcus lactis [gram-positive bacterium]; Oligopeptides
Oppfsalty; Oligopeptide transport ATP binding protein [OPPF]; Salmonella typhimurium [gram-negative bacterium]; Oligopeptides
Opuabacsu; Glycine betaine transport ATP binding protein [OPUAA]; Bacillus subtilis [gram-positive bacterium]; Glycine betaine
P29mycge; Probable ABC transporter ATP binding protein P29 [MG290]; Mycoplasma genitalium [gram-negative bacterium]; Unknown
Pebccamje; Probable ABC transporter ATP binding protein [PEB1C]; Campylobacter jejuni [gram-negative bacterium]; Amino acid?
Phncescco; Phosphonates transport ATP binding protein [PHNC]; Escherichia coli [gram-negative bacterium]; Alkylphosphonates
Phnkescco; Phosphonates transport ATP binding protein [PHNK]; Escherichia coli [gram-negative bacterium]; Alkylphosphonates
Phnlescco; Phosphonates transport ATP binding protein [PHNL]; Escherichia coli [gram-negative bacterium]; Alkylphosphonates
Potaescco; Spermidine/putrescine transport ATP binding protein [POTA]; Escherichia coli [gram-negative bacterium]; Spermidine, putrescine
Potahaein; Spermidine/putrescine transport ATP binding protein [POTA, HI1347]; Haemophilus influenzae [gram-negative bacterium]; Spermidine, putrescine
Provescco; Glycine betaine/L-proline transport ATP binding protein [PROV]; Escherichia coli [gram-negative bacterium]; Glycine betaine, L-proline
Provsalty; Glycine betaine/L-proline transport ATP binding protein [PROV]; Salmonella typhimurium [gram-negative bacterium]; Glycine betaine, L-proline
Pstbescco; Phosphate transport ATP binding protein [PSTB, PHOT]; Escherichia coli [gram-negative bacterium]; Phosphate
Sapdhaein; Peptide transport system ATP binding protein [SAPD, HI1641]; Haemophilus influenzae [gram-negative bacterium]; [Antimicrobial peptides]
Sapdsalty; Peptide transport system ATP binding protein [SAPD]; Salmonella typhimurium [gram-negative bacterium]; [Antimicrobial peptides]
Sapfescco; Peptide transport system ATP binding protein [SAPF]; Escherichia coli [gram-negative bacterium]; [Antimicrobial peptides]
Sapfhaein; Peptide transport system ATP binding protein [SAPF, HI1642]; Haemophilus influenzae [gram-negative bacterium]; [Antimicrobial peptides]
Sapfsalty; Peptide transport system ATP binding protein [SAPF]; Salmonella typhimurium [gram-negative bacterium]; [Antimicrobial peptides]
Sfucserma; Fe3+-transport ATP binding protein [SFUC]; Serratia marcescens [gram-negative bacterium]; Fe3+
Ugpcescco; sn-Glycerol-3-phosphate transport ATP binding protein [UGPC]; Escherichia coli [gram-negative bacterium]; sn-Glycerol-3-phosphate
a Presumed substrates; protein confers resistance to specified compounds.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Hispsalty (Hispescco); Livgsalty (Livgescco); Malksalty, Malkentae (Malkescco); Sapfsalty, Sapfhaein (Sapfescco); Provsalty (Provescco).
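The 90% identity criterion used to drop near-duplicate sequences from the tree can be illustrated with a small sketch. The text does not say how identity was calculated, so the definition used here (identical positions divided by aligned positions, ignoring columns where both sequences have a gap) and the function name `percent_identity` are assumptions. The example compares columns 71-80 of two rows from the alignment of this family.

```python
def percent_identity(a, b, gap="."):
    """Percent identity between two rows of the same alignment.

    Columns where both rows carry a gap are skipped; a residue facing
    a gap counts as a mismatch. (Assumed definition, for illustration.)
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # shared gap: not an aligned position
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

abcxantsp = "HAIMGKN..G"  # Abcxantsp, columns 71-80
abcxgalsu = "HAIMGPN..G"  # Abcxgalsu, columns 71-80
print(percent_identity(abcxantsp, abcxgalsu))
```

Applied to full-length rows, a pair scoring at or above 90 would be treated as redundant under the rule stated above.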
[Figure: phylogenetic tree of the family. Leaves, in the order printed: Abcxcyapa, Abcxodosi, Abcxantsp, Abcxgalsu, Sapdhaein, Sapdsalty, Amifstrpn, Oppfbacsu, Oppfhaein, Oppfsalty, Dppfescco, Dppfhaein, Appfbacsu, Oppflacla, Sapfescco, Nikeescco, Dppdescco, Dppdhaein, Oppdhaein, Oppdsalty, Dppdbacsu, Oppdbacsu, Appdbacsu, Amiestrpn, Oppdlacla, Oppdmycge, Phnkescco, Nikdescco, Modcescco, Modchaein, Modcrhoca, Moddazovi, P29mycge, Phncescco, Brafpseae, Livgescco, Nasdklepn, Nrtcsynsp, Nrtdsynsp, Opuabacsu, Provescco, Artpescco, Artphaein, Hispescco, Nocpagrtu, Occpagrtu, Gltlescco, Gluacorgl, Glnqbacst, Pebccamje, Glnqescco, Gltlhaein, Ftseescco, Ftsehaein, Abchaein, Lackagrra, Malkescco, Msmkstrmu, Ugpcescco, Potaescco, Potahaein, Cysaescco, Cysasynsp, Sfucserma, Devaanasp, Mbpxmarpo, Pstbescco, Mklmycle, Phnlescco, Feceescco, Fecehaein, Drrastrpe, Bztrhoca.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT
Abchaein   345          37 877
Abcxantsp  251          28 205
Abcxcyapa  259          28 995
Abcxgalsu  257          29 072
Abcxodosi  251          28 316
Amiestrpn  355          39 546
Amifstrpn  308          34 743
Appdbacsu  328          36 311
Appfbacsu  329          37 112
Artpescco  242          26 844
Artphaein  243          27 138
Brafpseae  255          28 284
Bztrhoca   263          30 012
Cysaescco  365          41 054
Cysasynsp  344          38 476
Devaanasp  244          26 723
Dppdbacsu  335          36 681
Dppdescco  327          35 844
Dppdhaein  330          36 328
Dppfescco  334          37 560
Dppfhaein  327          36 917
Drrastrpe  330          35 700
Feceescco  255          28 190
Fecehaein  306          34 251
Ftseescco  222          24 439
Ftsehaein  218          24 349
Glnqbacst  242          27 436
Glnqescco  240          26 731
Gltlescco  241          26 661
Gltlhaein  257          28 894
Gluacorgl  242          26 485
Hispescco  257          28 667
Hispsalty  258          28 771
Lackagrra  363          39 324
Livgescco  255          28 427
Livgsalty  255          28 452
Malkentae  294          32 247
Malkescco  371          40 974
Malksalty  369          40 755
Mbpxmarpo  370          42 799
Mklmycle   347          37 583
Modcescco  352          39 144
Modchaein  351          39 582
Modcrhoca  363          38 545
Moddazovi  380          41 573
Msmkstrmu  377          41 964
Nasdklepn  261          28 725
Nikdescco  253          26 503
Nikeescco  268          29 619
Nocpagrtu  257          28 188
Nrtcsynsp  659          72 346
Nrtdsynsp  274          30 365
Occpagrtu  262          28 954
Oppdbacsu  336          37 196
Oppdhaein  323          35 720
Oppdlacla  338          37 349
Oppdmycge  402          45 494
Oppdsalty  335          36 864
Oppfbacsu  307          34 912
Oppfhaein  332          36 760
Oppflacla  319          35 976
Oppfsalty  334          37 214
Opuabacsu  418          46 468
P29mycge   245          28 006
Pebccamje  242          27 262
Phncescco  262          29 430
Phnkescco  252          27 831
Phnlescco  226          24 705
Potaescco  378          43 028
Potahaein  381          43 429
Provescco  400          44 162
Provsalty  400          44 210
Pstbescco  257          29 027
Sapdhaein  349          39 478
Sapdsalty  330          37 611
Sapfescco  268          30 570
Sapfhaein  269          30 294
Sapfsalty  268          30 671
Sfucserma  345          36 692
Ugpcescco  356          39 523
CHROMOSOMAL LOCUS (Abchaein to Oppdhaein) 35.757; ATP synthase operon; ATP synthase operon; ami locus; ami locus; app operon; app operon; 19.47 minutes; 68.119; bra operon; cysTWAM gene cluster; dciA operon; 79.73 minutes; 68.309; 68.255; drrAB locus; 97.15 minutes; 45.418; 77.57 minutes; glnQH operon; 18.23 minutes; glu gene cluster; 52.21 minutes; histidine transport operon; lac operon; 77.38 minutes; liv cluster; malB region; 91.41 minutes; malB region; cosmid B1790; 17.21 minutes; 96.211; msm operon; nasFEDCBA operon; 77.89 minutes; 77.9 minutes; Plasmid pTiC58; Plasmid pTiA6; opp operon; 64.847
CHROMOSOMAL LOCUS (Oppdlacla to Ugpcescco) opp operon; 64.793; opp operon; opp operon; upstream of amyE; 93.11 minutes; 92.98 minutes; 92.96 minutes; 25.46 minutes; 77.723; 60.38 minutes; proU locus; 84.1 minutes; 29.05 minutes; 93.27; Plasmid pEG6162; sfu region; 77.27 minutes
Multiple amino acid sequence alignments
Abcxcyapa Abcxodosi Abcxantsp Abcxgalsu Sapdhaein Sapdsalty Amifstrpn Oppfbacsu Oppfhaein
Oppfsalty Dppfescco Dppfhaein Appfbacsu Oppflacla Sapfescco Nikeescco Dppdescco Dppdhaein Oppdhaein
Oppdsalty Dppdbacsu Oppdbacsu Appdbacsu Amiestrpn
1 ..................... ..................... ....................... ....................... .......................... .......................... ........................ ....................
50 MSTEKTKIL EVKNLKAQV . . . . . . . . . . . MNTN. YPIL EIKNLKACI . . . . . . . . . . . MNNRILL NIKNLDVTI . . . . . . . . . . . MKHKSLL QIKNLHVKL . . . . . . . . . . . MALL DICNLNIEIQ T . . . . . . . . . MPLL DIRNLTIEFK T . . . . . . . . . MSEKLV EIKDLEISFG . . . . . . . . . MNELTEKLL EIKHLKQHFV . . . . . . . . M TVSNNKELLL
.................. MN AVIEQRKVLL ................. MST QEATLQQPLL .................. MT NEVKENTPLL .................... MTAANQETIL ......................... MSEIL ..................... MIETLL .......................... MTLL .......................... MALL .......................... MALL ......................... MNPLL
............ MSLSETAT QAPQPANVLL ......................... MEKVL ...................... MIRVTRLL ......................... MSTLL ..................... MTKEKNVIL
EVNHLGVSFK
EIADLKVHFD QAIDLKKHYP NAIGLKKYYP ELRDVKKYFP NLKDLKVYYP EVRNLSKTFR NISGLSHHYA NVDKLSVHFG DVKELSVHFG DVKNLYVRFK
EVNDLRVTFA SVQNLHVSFT EVKDLAISFK EVNNLKTYFF TARDIVVEFD
iKNDKSLFFA
E T
IKEGKQWFWQ VK.. KGMFA. VK.. KGLFA. IR.. SGLFQR IR.. SGFFNR Y R T G W .... F .... H G G F N G D ......... D ......... T .........
T T T R V
......... ......... ......... ......... .........
Oppdlacla
MESENIL
........
Nikdescco
.........................
Modchaein
.................................
Moddazovi
..........
Phnkescco
Modcescco
.......................
Oppdmycge
Modcrhoca
MA LKRSNFFVDK
DQQLKDNLIL
........................
MNTSFEPGTK
GETGALSGGL
Brafpseae
........................
Nasdklepn
.........................
Nrtdsynsp
.............
Nrtcsynsp
MENKPIL
.........................
MQTII
MSRPIL
........................
......................... MTAILPS
MKPLI
MSVFL
TAATVNTGFL
MSVDEKPIKI
Artpescco
...........................
Hispescco
........................
Provescco
Artphaein
Nocpagrtu
..... MAIKL
KVEKVSKIFG
MSQPLL
Opuabacsu
EIKNLYKIFG
......................
.....................
Gluacorgl
SVNNLTHLY
ELRNIAL
I .........
V .........
...........
.............
MLELNFS
Q ........
MISARFS
G ........
PADGIRARFR
SVNGLMMRF
...........
QVQGVTSVLA
P ........
P
HFDCVGKTFP
T ........
P
EQGLSKEQIL
E ........
AVDHVHQVFD
L ........
...........
VAEDVHKNF
...........
...........
Pebccamje
............................
MI ELKNVNKYY
...........
Gltlhaein
..........................
Glnqescco
............................
Ftseescco
............................
Abchaein
............................
Malkescco
..........................
Ugpcescco
..........................
Ftsehaein
Lackagrra Msmkstrmu Potaescco
NLNHIYKKY
.....................
MASV
MAGL
KQPSSLSPLV
QVSQVSKQF
Mbpxmarpo
...........................
Mklmycle
..............
Devaanasp
Pstbescco
....................... ...................
MAAIGG
MRQEAVI
DGRMPMGVAI
Phnlescco
............................
Fecehaein
..................................
Feceescco Drrastrpe
..................................
Abcxcyapa
DGTE..
Abcxodosi
G
........... LH ...... QQ
M TLRTENLTVS
..................................................
51
N
...........
MNTQPT
..............................................
Consensus
K ........
...........
QVRNLNFYY
Bztrhoca
G
..........
EVKGLTKSF
MDSFST
G
..........
AIKSLNHYYG
MI NVQNVSKTFI
D
G
..........
ELHGIGKSY
.......................................
D
..........
MSI LIYKVSKSL
M SMVETAPSKI
P
G
..........
MSI EIANIKKSF
MSTL
.......... ..........
MPKDKAVGI
..........................
G
KLQAVTKSW
QLAGIRKCF
P
..........
..........
ELRSIKKSY
Sfucserma
L ........
QLQNVTKAW
ENQLQNKPII
...........................
..........
MI KLNNIXKIFE
MVEL
Cysasynsp
..........
MI KFSNVSKAYH
..........................
MKLGRKPKLV
...........
MI RFEHVSKAYL
RLTDIRKSY
..........
...........
KVSNIQKNF
MAEV
MGQSKKLN
...........
MI EFKNVSKHF
..........................
............
...........
MI YFHQVNKYY
............................
Potahaein
Cysaescco
MI TLKNVSKWY
MSML
K
...........
NVIDLHKRY
MI KMTGVQKYF
............................
A
...........
............................
Glnqbacst
P
K ........
...........
............................
R
...........
QLKDIRKNF
Gltlescco
MPNPVRPAV
D
...........
MAI RVKNLNFFY
MDATQPTL
V ........
EVSGLTMRF
MSI QLNGINCFY
MSENKL
Q
...........
RVEKLAKTF
T
K ........
SFEKVSIIY
KQTKKAVQMLANGKTKKEIL
EHPQRAFKYI
...........................
Occpagrtu
DITDLHVNFK
MLQINVK
................................. .......................
Livgescco
MPQQI
.................................
P29mycge
Phncescco
MNQPLL
EAKQVSVAFR
SIWVNDVTVR
RAIETSGLVK
MSDT
100 ILKG
NENE..ILKD
VNLTINSGEI
LNLKIHKGEI
HAIMGPN..
HAIMGPN..G
G SGKSTFSKIL
SGKSTFSKVL
AG...HPAYQ AG...HPAYN
Abcxantsp Abcxgalsu Sapdhaein Sapdsalty Amifstrpn Oppfbacsu Oppfhaein Oppfsalty Dppfescco Dppfhaein Appfbacsu Oppflacla Sapfescco Nikeescco Dppdescco Dppdhaein Oppdhaein Oppdsalty Dppdbacsu Oppdbacsu Appdbacsu Amiestrpn Oppdlacla Oppdmycge Phnkescco Nikdescco
GETQ..ILNS LNLSIKPGEI HAIMGKN..G SGKSTLAKVI AG...HPSYK ANTEEYILNG INLNVNQGEI HAIMGPN..G SGKSTLSKVI AG...HSLYK SNGRIKIVDG VNLSLNEGEI SGLVGES..G SGKSLIAKVI CNAIKEN.WI SEGWVKAVDR VSMTLSEGEI RGLVGES..G SGKSLIAKAI CGVAKDN.WR G S K K F V A V K N A N F F I N K G E T F S L V G E S . . G S G K T T I G R A I IGLNDT .... P R G T V K A V D D L S F D I Y K G E T L G L V G E S . . G C G K S T T G R S I I R L Y E A .... K P Q T L K A V K D V S F K L Y A G E T L G V V G E S . . G C G K S T L A R A I I G L V E A .... P P K T L K A V D G V T L R L Y E G E T L G V V G E S . . G C G K S T F A R A I I G L V K A .... P E R L V K A L D G V S F N L E R G K T L A V V G E S . . G C G K S T L G R L L T M I E M P .... K P Q Q V K A L D G V S F Q L E R G K T L A V V G E S . . G C G K S T L G R L L T M I E E P .... K V G D V K A V D G V S F S L K K G E T L G I V G E S . . G C G K S T A G R T M IRLYKP .... V T D N V L A V D G V D L T I H E G E T V G L V G E S . . G S G K S T I G K T I V G L E Q M .... R R Q T V E A V K P L S F T L R E G Q T L A I I G E N . . G S G K S T L A K M L A G M I E P .... K H Q H Q A V L N N V S L T L K S G E T V A L L G G T . . G C G K S T L A R L L V G L E S P .... ESAPFRAVDR ISYSVKQGEV VGIVGES..G SGKSVSSLAI MGLID.YPGR KKTPFKAVDR ISYQVAQGEV LGIVGES..G SGKSVSSLAI MGLID.HPGR P D G W T A V N D L N F T L N A G S T L G I V G E S . . G S G K S Q T A F A L MGLLAANGE. P D G D V T A V N D L N F T L R A G E T L G I V G E S . . G S G K S Q T A F A L MGLLATNGR. YGGTVQAVRG VSFDLYKGET FAIVGES..G CGKSVTSQSI MGLLPPYSAK YGGEVQAIRG VNFHLDKGET LAIVGES..G SGKSVTSQAI MKLIPMPPGY KKEPIPAVDG VDFHISKGET VALVGES..G SGKSITSLSI MGLVQSSGGK RDKVLTAIRG VSLELVEGEV LALVGES..G SGKSVLTKTF RGMLEENG.R AGKFQKAIYD IDLSLKRGEV LAIVGES..G SGKSTFATAV MGLHNPNQTQ KDGILHAVRG IDLKVERGSI VGIVGES..G SGKSVSVKSI IGFNDNAQTK ..APGKGFSD V S F D L W P G E V L G I V G E S . . G S G K T T L L K S I SARLTPQQ.. .QAAQPLVHG V S L T L Q R G R V L A L V G G S . . G S G K S L T C A A T L G I L P A . G V R Modcescco L G N H C L . . 
. T I N E T L P A N G I T A I F G V S . . G A G K T S L I N A I SGL ..... TR M o d c h a e i n L G Q L A L . . . Q A N I Q V P D Q G V T A I F G L S . . G S G K T S L I N L V SGL ..... IQ M o d c r h o c a Q G D F T L . . . D A A F D V P G Q G V T A L F G P S . . G C G K T T V L R C M AGL ..... TR M o d d a z o v i Y A G F A L . . . D V D L T L P G H G V T A L F G H S . . G S G K T T L L R C V AGL ..... ER P29mycge ..KKAPLLQNISFKVMAKENVCLLGKS..GVGKSSLL .... N S V T . . N T K Phncescco ..NQHQALHA VDLNIHHGEM VALLGPS..G SGKSTLLRHL SGLIT..GDK B r a f p s e a e ..GGLLAVNG V N L K V E E K Q V V S M I G P N . . G A G K T T V F N C L TGFY ..... Q L i v g e s c c o ..GGLLAVNN V N L E L Y P Q E I V S L I G P N . . G A G K T T V F N C L TGFY ..... K N a s d k l e p n ..AQFLALQN V S F D I Y E G E T I S L I G H S . . G C G K S T L L N L I AGI ..... AL N r t c s y n s p G G G Q Y I A L K D V S L N I R P G E F I S L I G H S . . G C G K S T L L N L I AGL ..... AQ N r t d s y n s p . R G P Y V A I E D V N L S V Q Q G E F I C V I G H S . . G C G K S T L L N L V SGF ..... SQ O p u a b a c s u T . G S T V G V N Q A D F E V Y D G E I F V I M G L S . . G S G K S T L V R M L NRL ..... IE P r o v e s c c o T G L S L G V K D A S L A I E E G E I FVIMGLS G SGKSTMVRLLNRL IE A r t p e s c c o ..GAHQALFD I T L D C P Q G E T L V L L G P S . . G A G K S S L L R V L NLL ..... EM A r t p h a e i n ..GSSQTLFD I N L E A E E G D T V V L L G P S . . G A G K S T L I R T L NLL ..... EV H i s p e s c c o ..GEHEVLKG V S L Q A N A G D V I S I I G S S . . G S G K S T F L R C I NFL ..... EK N o c p a g r t u ..GTLEILKG I S L T A N K G D V V S I I G S S . . G S G K S T F L R C M NFL ..... ET O c c p a g r t u ..GNLEVLHG V S L S A N E G E V I S I L G S S . . G S G K S T L L R C V NML ..... EV G l t l e s c c o ..GHFQVLTD C S T E V K K G E V V V V C G P S . . G S G K S T L I K T V NGL ..... EP G l u a c o r g l ..GDFHALTD I D L E I P R G Q V V V V L G P S . . G S G K S T L C R T I NRL ..... 
ET G l n q b a c s t ..GDFHVLKD I N L T I H Q G E V V V I I G P S . . G S G K S T L V R C I NRL ..... ET P e b c c a m j e ..GTHHVLKI F N L S V K E G E K L V I I G P S . . G S G K S T T I R C M NGL ..... EE G l n q e s c c o ..GPTQVLHN I D L N I A Q G E V V V I I G P S . . G S G K S T L L R C I NKL ..... EE G l t l h a e i n ..NGNHVLKG I D F E I N K G E V V A I L G P S . . G S G K T T F L R C L NLL ..... ER F t s e e s c c o G . G R Q . A L Q G V T F H M Q P G E M A F L T G H S . . G A G K S T L L K L I CGI ..... ER F t s e h a e i n G . A T Q P A L Q G L N F H L P V G S M T Y L V G H S . . G A G K S T L L K L I MGM ..... EK A b c h a e i n X . K K L T A L D N V S L N I E K G Q I C G V I G A S . . G A G K S T L I R C V N L L ..... EK L a c k a g r r a S . L E . . V I K G V N L E V S S G E F V V F V G P S . . G C G K S T L L R M I AGL ..... ED
Malkescco Msmkstrmu Ugpcescco Potaescco Potahaein
E . V V . . V S K D I N L D I H E G E F V V F V G P S . . G C G K S T L L R M I A G L ..... ET N . S S H Y S V E D F D L D I K N K E F I V F V G P S . . G C G K S T T L R M V A G L ..... ED G . K T Q . V I K P L T L D V A D G E F I V M V G P S . . G C G K S T L L R M V A G L ..... ER G . K E . . V I P Q L D L T I N N G E F L T L L G P S . . G C G K T T V L R L I A G L ..... ET S . N T . . I I N D F N L T I N N G E F V T I L G P S . . G C G K T T V L R L L A G L ..... EE Cysaescco R . T Q . . V L N D I S L D I P S G Q M V A L L G P S . . G S G K T T L L R I I A G L ..... EH C y s a s y n s p S . F Q . . A V K D V D L T V E T G S L V A L L G P S . . G S G K S T L L R L I A G L ..... EQ S f u c s e r m a A . I R . . V L E H I D L Q V A A G S R T A I V G P S . . G S G K T T L L R I I AGF ..... EI D e v a a n a s p A . L K R Q I L F D I N L E I Y P G E I V I M T G P S . . G S G K T T L L S L I GGL ..... RS M b p x m a r p o . . G N L K I L D R V S L Y V P K F S L I A L L G P S . . G S G K S S L L R I I A G L ..... DN Pstbescco ..GKFHALKN INLDIAKNQV TAFIGPS..G CGKSTLLRTF NKMFELYPEQ Mklmycle ..GSSRIWEDVTLDIPAGEVSVLLGPS..GTGKSVFLKSL IGLL ..... R Phnlescco NGVRLPVLNR ASLTVNAGECVVLHGHS..G S G K S T L L R S L YA ........ F e c e e s c c o Y G T D . K V L N D V S L S L P T G K I T A L I G P N . . G C G K S T L L N C F SRLL ..... M F e c e h a e i n Y N N G H T A I H N M T F S L N S G T I C A L V G V N . . G S G K S T L F K S I M G L V ..... K D r r a s t r p e V Y N G T R A V D G L D L N V P A G L V Y G I L G P N . . G A G K S T T I R M L A T L L ..... R Bztrhoca SFVRTEMLAP RPAPVSQVGA IKWMRENLFS GPLNTALTVF GLLATVWLVQ A b c x c y a p a V T G G E I L F K N KNLL . . . . . . . . . . . . . . . E L E P E E R A R A G V F L A F Q Y P I E A b c x o d o s i I L S G D I L F K G SSIL . . . . . . . . . . . . . . . N L D P E E R S H M G I F L A F Q Y P I E Abcxantsp ITNGQILFEN QDVT ............... E IEPEDRSHLG IFLAFQYPVE AbcxgalsuVVKGEIIFQD QNLL ............... 
N YTIEDRANLG IFLAFQYPLE $ a p d h a e i n I T A D R F R F H D V E L L K L . . . . . . . . . . SPNK R R K L V G K E I S M I F Q N P L S C L S a p d s a l t y V T A D R M R F D D I D L L R L . . . . . . . . . . SSRE R R K L V G H N V S M I F Q E P Q S C L A m i f s t r p n . S N G D I I F D G Q K I N G K . . . . . . . . . . KSRE Q A A E L I R R I Q M I F Q D P A A S L O p p f b a c s u . T D G E V L F N G E N V H G R . . . . . . . . . . KSRK K L L E F N R K M Q M I F Q D P Y A S L O p p f h a e i n . S E G E I L W L G K H L R K Q . . . . . . . . . . SAKQ W K E T . R K D I Q M I F Q D P L A S L O p p f s a l t y . T D G K V A W L G K D L L G M . . . . . . . . . . KADE W R E V . R S D I Q M I F Q D P L A S L Dppfescco .TGGELYYQG QDLLK ........... HDPQ AQKLRRQKIQ IVFQNPYGSL D p p f h a e i n . T K G E L Y Y K G H N F L E . . . . . . . . . . . NDSE T K A L R R K K I Q I V F Q N P Y A S L A p p f b a c s u . T E G Q I L F K G Q D I S N L . . . . . . . . . . SEEK L R K S V R K N I Q M V F Q D P F A S L O p p f l a c l a . T S G Q L I Y K G Q D V S K K . . . . . . . . . . KIRN Q L K . Y N K D V Q M I F Q D A F S S L S a p f e s c c o . T S G E L L I D D H P L H F G . . . . . . . . . . DY$ .... F R S Q R I R M I F Q D P S T S L Nikeescco .AQGNISWRG EPLAKL .......... N.RA QAKAFRRDIQ MVFQDSISAV D p p d e s c c o V M A E K L E F N G Q D L Q R I . . . . . . . . . . SEKE R R N L V G A E V A M I F Q D P M T S L Dppdhaein VSAESLQFEN TDLLTL .......... ESKA KRQLIGADVA MIFQDPMTSL O p p d h a e i n . V E G S A I F E G K E L V N L . . . . . . . . . . PNAE L N K I R A E Q I S M I F Q D P M T S L O p p d s a l t y . I G G S A T F N G R E I L N L . . . . . . . . . . PERE L N T L R A E Q I S M I F Q D P M T S L D p p d b a c s u V T D G R I L F K N K D L C R L . . . . . . . . . . SDKE M R G I R G A D I S M I F Q D P M T A L O p p d b a c s u F K R G E I L F E G K D L V P L . . . . . . . . . . SEKE M Q N V R G K E I G M I F Q D P M T S L Appdbacsu IMDGSIKLED KDLTSF .......... 
TEND YCKIRGNEVS MIFQEPMTSL A m i e s t r p n I A E G S I D Y R G Q D L T A L S . . . . . . . . . SHKD W E Q I R G A K I A T I F Q D P M T S L O p p d l a c l a I T . G S I L L D D E E V I G . K . . . . . . . . . TGDS M A S I R G S K V G M I F Q N P L T A L O p p d m y c g e . . A K L M N F K N V D I T K L K . . . . . . . . . KH.Q W K Y Y R G T Y V S Y I S Q D P L F S L P h n k e s c c o . . . G E I H Y E N R S L Y A M . . . . . . . . . . SEAD R R R L L R T E W G V V H Q H P L D G L Nikdescco QTAGEILADG KPVSP ................ CALRGIKIA TIMQNPRSAF Modcescco P Q K G R I V L N G R V L N D . . . . . . . . . . A E K G I C L T P E K R R V G Y V F Q D . . A R L Modchaein PDEGFICLND RTLVD .......... MESQE SLPTHLRKIG YVFQD..ARL M o d c r h o c a L P G G H L V V N G V T W Q E . . . . . . . . . . GRQ.. I T P P H R R A V G Y V F Q E . . A S L ModdazoviAAEARLEING ELWQD .......... SAAGV FLPTHRRALG YVFQE..ASL P29mycge I V K S G L V Y F D G V A S N K .... K E Y K K L . . . . . . . . . KKQCS Y L D Q I . . P N L Phncescco S V G S H I E L L G R T V Q R E G R L A R D I R K S . . . . . . . . . R A H T G Y I F Q Q . . F N L Brafpseae PTGGLIRLDG EEIQGLPG ............. HKIARKGVVRTFQN..VRL Livgescco PTGGTILLRD QHLEGLPG ............. QQIARMGVVRTFQH..VRL C o n s e n s u s ...G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FQ ..... L
Binding protein-dependent peptide transporter family
Nasdklepn Nrtcsynsp Nrtdsynsp Opuabacsu Provescco Artpescco Artphaein Hispescco Nocpagrtu Occpagrtu Gltlescco Gluacorgl Glnqbacst Pebccamje Glnqescco Gltlhaein Ftseescco Ftsehaein Abchaein Lackagrra Malkescco Msmkstrmu Ugpcescco Potaescco Potahaein Cysaescco Cysasynsp Sfucserma Devaanasp Mbpxmarpo Pstbescco Mklmycle Phnlescco Feceescco Fecehaein Drrastrpe Bztrhoca Consensus
i01 150 PTEGGLLCDN REIAG ................... PGPERAWFQN..HSL PSSGGIILEG RQVTE ................... PGPDRMWFQN..YSL PTSGGVYLDG QPIQE ................... PGPDRMWFQN..YSL PTAGNIYIDG DMIT .......... NMSKDQ LREVRRKKIS MVFQK..FAL PTRGQVLIDG VDIA .......... KISDAE LREVRRKKIA MVFQS..FAL P R S G T L N I A G N H F D F T K T . P S ...... D K A I R D L R R . N V G M V F Q Q . . Y N L P K S G E L S I A N N E F N L S N A M A N ...... P K A I Q Q L R Q . D V G M V F Q Q . . Y H L PSEGSIVVNG QTINLVRDKD GQLKVADKNQ LRLLRT.RLT MVFQH..FNL PNKGRIAVGQ EEWVKTDAAGRLIGVDRKK IERMRM.QLG MVFQS..FNL PNAGSVAIMG EEIALEHRAG RLARPKDLKQVNRLRE.RAAMVFQG..FNL V Q Q G E I T V D G IVVN ..... D K ...... KTD L A K L R S . R V G M V F Q H . . F E L I E E G T I E I D G KVLP ..... E E ...... G K G L A N L R A . D V G M V F Q S . . F N L I S S G E L I V D N V K V N ..... D K ...... HID I N Q L R R . N I G M V F Q H . . F N L V S S G E V V V N N L V L N ..... H K ...... N.K I E I C R K . Y C A M V F Q H . . F N L I T S G D L I V D G L K V N ..... D P ...... KVD E R L I R Q . E A G M V F Q Q . . F Y L P E Q G I L E F T D G S L K ..... I D F S Q K I S K A D E L K L R R . R S S M V F Q Q . . Y N L PSAGKIWFSG HDIT .......... RLKNRE VPFLRR.QIG MIFQD..HHL ANAGQIWFNG HDIT .......... RLSKYE IPFLRR.QIG MVHQD..YRL PTSGSVIVDG VELT .......... KLSDRE LVLARR.QIG MIFQH..FNL I S S G E L T I G G T V M N . . . . . . . . . . . . D .... V D P S K R G I A M V F Q T . . Y A L I T S G D L F I G E KRMN . . . . . . . . . . . . D .... T P P A E R G V G M V F Q S . . Y A L I T K G E L K I D G E V V N . . . . . . . . . . . . D .... K A P K D R D I A M V F Q N . . Y A L V T E G D I W I N D Q R V T . . . . . . . . . . . . E .... M E P K D R G I A M V F Q N . . Y A L V D S G R I M L D N E D I T . . . . . . . . . . . . H .... V P A E N R Y V N T V F Q S . . Y A L L D S G S I I L D G E D I T . . . . . . . . . . . . N .... V P A E K R H I N T V F Q S . . Y A L Q T S G H I R F H G TDVS . . . . . . . . . . . . R .... L H A R D R K V G F V F Q H . 
. Y A L PDSGRIFLTG RDAT ............ N .... ESVRDRQIG FVFQH..YAL PDGGQILLQG QAMG ............ NGSGWVPAHLRGIG FVPQD..GAL V Q E G N L Q F L G VELS . . . . . . . . . . G A S Q N K L V Q I R . R S I G Y I F Q A . . H N L CDYGNIWLHG IDVTN ................ ISTQYRRMS FVFQH..YAL R A E G E I L L D G DNI . . . . . . . . . . . . L T N S Q D I A L L R A K V G M V F Q K . . P T P P E R G S I L I D G T D I I E C S A K E LYEI . . . . . . . . . . . R T L F G V L F Q D . . G A L ..... N Y L P D E G Q I Q I K H G D E W V D L V T A P A R K V V E I R K T T V G W V S Q F L R V PQSGTVFLGD NPINMLSS ............. RQLARRLSL LPQHHLTPE. P Q Q G E I K L C D LPI .... S . . . . . . . . . . . . . Q A L K R N L V A Y V P Q S E E V D W PDGGTARVFG HDVT..SE ............. PDTVRRRIS VTGQYASVDE AAAPWLLHGVWNANSLTECR AIIAERWGPE ATGACWAVIR VRWNQFLFGF ...G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FQ ..... L
Abcxcyapa Abcxodosi Abcxantsp Abcxgalsu Sapdhaein Sapdsalty Amifstrpn Oppfbacsu Oppfhaein Oppfsalty Dppfescco Dppfhaein Appfbacsu
151 IAGVSNIDFL IPGVSNEDFL IPGVTNADFL IAGVNNIDFL DPSRKIGKQL DPSERVGRQL NERATVDYII NPRMTVADII NPRMNIGEII NPRMTIGEII NPRKKVGQIL NPRKKIGSIL NPRKTLRSII
200 RLAYNNRRKE E ............................. RLAYNSKQKF L ............................. RIAYNAKRAF D ............................. RLAYNSKLKF N ............................. IQNIPNWTFK NKWWKWFGW ..................... MQNIPAWTYK GRWWQRLGW ..................... SEGLYN .................................. AEGLDI .................................. AEPLKI .................................. AEPLRT .................................. EEPLLI .................................. EEPLII .................................. KEPFNT ..................................
Oppflacla Sapfescco Nikeescco Dppdescco Dppdhaein Oppdhaein Oppdsalty Dppdbacsu Oppdbacsu Appdbacsu Amiestrpn Oppdlacla Oppdmycge Phnkescco Nikdescco
Modcescco
Modchaein Modcrhoca Moddazovi P29mycge Phncescco Brafpseae Livgescco Nasdklepn Nrtcsynsp Nrtdsynsp Opuabacsu Provescco Artpescco Artphaein Hispescco Nocpagrtu Occpagrtu Gltlescco Gluacorgl Glnqbacst Pebccamje
Glnqescco
Gltlhaein Ftseescco Ftsehaein Abchaein Lackagrra Malkescco Msmkstrmu Ugpcescco Potaescco Potahaein Cysaescco Cysasynsp Sfucserma Devaanasp Mbpxmarpo Pstbescco
NPRKTIYDII AEPIRN .................................. NPRQRISQIL DFPLRL .................................. NPRKTVREIL REPMR ................................... NPCYTVGFQI MEAIKV .................................. NPAYTVGFQI MEALKT .................................. NPYMKIGEQL MEVLQL .................................. NPYMRVGEQL MEVLML .................................. NPTLTVGDQL GEALLR NPTMKVGKQI TEVLFK .................................. NPVLTIGEQI TEVLIY .................................. DPIKTIGSQI TEVIVK .................................. NPLMKIGQQI KEMLAV NPTMTIGKQV KEAIYVASKR RYFQAKSDLK FALSNKEIDK KTYKSKLKEI RRQVSAGGNI GERLMATGAR HY ............................ N P L H T M H T H A RE T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FPHYKVRGNL RYGMS ................................... FPHYTVKGNL RYGMK ................................... FTHLSVRENL VYGLR ................................... FPHLSVRRNL EYGMK ................................... IDTDYVYEAI LRSAKQKLTWL ....... QKLI .................. V N R L S V L E N V L I G A L G S T P F W . . . . . . . R T CF . . . . . . . . . . . . . . . . . . F K E M T A V E N L L V A Q H R H L N T N F L A G L F K T P AF . . . . . . . . . . . . . . . . . . F R E M T V I E N L L V A Q H Q Q L K T G L F S G L L K T P SF . . . . . . . . . . . . . . . . . . LPWLTCFDNV ALAVDQVFR. R ............................. LPWRTVRQNI ALAVDSVL.. H ............................. LPWKSARDNI ALAVKAA.R. P ............................. F P H R T I L E N T E Y G .... L E L Q . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M P H M T V L D N T A F G .... M E L A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CAHLTVQQNL IEA...PCRV L ............................. WPHLTVIENL IEA...PMKV R ............................. WSHMTVLENV MEA...PIQV L ............................. WGHMTVLQNV MEG...PLHV L ............................. WSHQTILQNV MEA...PVHV Q ............................. 
FPHLSIIENL TLA...QVKV L ............................. FPHLTIKDNV TLA...PIKV R ............................. YPHMTVLQNI TLA... PMKV L ............................. YPHMTVLQNL TLA...PMKL Q ............................. FPHLTALENV MFG...PLRV R ............................. FPHRSALENV MEG... MVVV Q ............................. L M D R T V Y D N V AI .... P L I I A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . L T D R T V V E N V A L .... P L I I A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . LSSRTVFENVAL .... P L E L E . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y P H M T V R E N M G F A L R F .... A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y P H L S V A E N M S F G L K P .... A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y P H M S V Y D N M A F G L K L .... R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y P H M S V E E N M A W G L K I .... R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F P H M T V F E N V A F G L R M .... Q . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F P H M T I F E N V A F G L R M .... Q . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FRHMTVFDNI AFGLTVLPRR E ............................. F K H L T V R K N I A F G L E L .... R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FPHFTVAGNI GFGLK ................................... LGFLTARQNV QMAVELNEHI .............................. F K H M T V Y E N I S F G .... L R L R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FP.MSIYDNI AFGVRLFEKL S .............................
Mklmycle Phnlescco Feceescco Fecehaein Drrastrpe Bztrhoca Consensus
Abcxcyapa Abcxodosi Abcxantsp Abcxgalsu Sapdhaein Sapdsalty Amifstrpn Oppfbacsu Oppfhaein Oppfsalty Dppfescco Dppfhaein Appfbacsu Oppflacla Sapfescco Nikeescco Dppdescco Dppdhaein Oppdhaein Oppdsalty Dppdbacsu Oppdbacsu Appdbacsu Amiestrpn Oppdlacla Oppdmycge Phnkescco Nikdescco Modcescco Modchaein Modcrhoca Moddazovi P29mycge Phncescco Brafpseae Livgescco Nasdklepn Nrtcsynsp Nrtdsynsp Opuabacsu Provescco Artpescco Artphaein Hispescco Nocpagrtu
FGSMNLYDNT AFPLREHTK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I P R I S A L E V V M Q P L .................................... ..GITVQELV SYGRNPWLSL W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Q F P V S V Y D V V M M G R Y G Y M N F L ............................. GLTGT.ENLV MMGR . . . . . L Q . . . . . . . . . . . . . . . . . . . . . . . . . . . . . YPVDQYWRLF VTFAGLFLAL APVLFDALPR KLIWGTLLYP
LAAFWLL...
201 ...GLTELDP LTFYSIVKEK LNVVKMD.PH FLNRNVNEGF ...NKDEVDP ISFFTIINEK LKLVDMS.PV FLSRNVNEGF ...NKEELDP LSFFSFIENK ISNIDLN.ST FLSRNVNEGF ...QRNPVDP LKFLEIVYPK LKLVGLD.ES FLYRKVNEGF ............ KKRRAIEL LHRVGIKDHR DIMASYPNEL ............ RKRRAIEL LHRVGIKDHK EPMRSFPYEL ...HRLFKDE EERKEKVQSI IREVGLL..A EHLTRYPHEF ...HKLAKTK KERMQRVHEL LETVGLN..K EHANRYPHEF ...YQPHLSA AEVKEKVQAM MLKVGLL..P NLINRYPHEF ...YHPKLSR QDVRDRVKAM MLKVGLL..P NLINRYPHEF .... NTSLSK EQRREKALSM MAKVGLK..T EHYDRYPHMF NTKLSA KERREKVLSM MEKVGLR A EFYDRYPHMF .... HNMYTM RERNEKVEEL LARVGLH..P SFAGRYPHEF ...FEKIDAN TENK.RIHEL LDIVGLP..K QALEQYPFQF .... NTDLEP EQRRKQIIET MRMVGLL..P DHVSYYPHML ...HLLSLKK SEQLARARQM LKAVDLD..D SVLDKRPPQL .... HQGGNK STRRQRAIDL LNQVGIPDPA SRLDVYPHQL .... HEGGTK KARKDRTLEL LKLVGIPDPE SRIDVYPHQL .... HKGYDK QTAFAESVKM LDAVKMPEAK KRMGMYPHEF .... HKGMSK AEAFEESVRM LDAVKMPEAR KRMKMYPHEF .... HKKMSK KAARKEVLSM LSLVGIPDPG ERLKQYPHQF .... HEKISK EAAKKRAVEL LELVGIPMPE KRVNQFPHEF HKNMKK KEARQRAVEL LQMVGFSRAE QIMKEYPHRL .... HQGKTA KEAKELAIDY MNKVGIPDAD RRFNEYPFQY .... HDVYPE NQYESRIFQL LEQVGIPNPK RVVNQFPHQL KQTYQQKIKP INVEKKTLEI LQFIGINDAK KRLKAFPSEF .......... GDIRATAQKW LEEVEIP..A NRIDDLPTTF .... CLALGK P A D D A T L T A A I E A V G L E N A A R V L K L Y P F E M ..KSMV D .... QFDKL VALLGIE..P .LLDRLPGSL ...... NVSQ D .... DFNYI VDLLGIT..H .LLKRYPLTL ...... RARG PLR.ISEAEV TQLLGID..P .LLRRPTATL ...... RVDA ASRQVSWERV LELLGIG..H .LLERLPGRL ...... CFEP KWIKDKILAI LKEVNLN..D YVSCII.KDL ...... SWFT GEQKQRALQA LTRVGMV..H FAHQRV.STL ...... RRSE REAMEYAAHW LEEVNLT..E FA.NRSAGTL ...... RRAQ SEALDRAATW LERIGLL..E HA.NRQASNL ...... SMSK GERKEWIEHN LERVQMG..H .ALHKRPGEI ...... DRNR TERRTIIEET IDLVGLR..A .AADKYPHEI ...... HLST SEQRQVVDHH LELVGLT..E .AQHKRPDQL ...... GVDK QERQQKALES LKLVGLE..G FE.HQYPDQL ...... GINA EERREKALDA LRQVGLE..N YA.HSYPDEL ...... GLSK DQALASAEKL LERLRLK..P YS.DRYPLHL ...... 
GVSE NEAKTDAMEL LKRLRLE..Q LA.DRFPLHL GLSK QEARERAVKY LAKVGID E RAQGKYPVHL KQPK GEVRDRAMDF LDKVGIA N K.HAAYPSQL
250 SGGEKKRNEI SGGEKKRNEI SGGEKKKNEI SGGEKKKNEI TEGEGQKVMI TDGECQKVMI SGGQRQRIGI SGGQRQRIGI SGGQCQRIGI SGGQCQRIGI SGGQRQRIAI SGGQRQRIAI SGGQRQRIGI SGGQQQRIGI APGQKQRLGL SGGQLQRVCL SGGMSQRVMI SGGMSQRVMI SGGMRQRVMI SGGMRQRVMI SGGMRQRIVI SGGMRQRVVI SGGMRQRVMI SGGMRQRIVI SGGMRQRVMI SGGMRQRIVI SGGMQQRLQI SGGMLQRMMI SGGEKQRVAI SGGEKQRVAI SGGERQRVAI SGGERQRVGI SAGQKQRVEI SGGQQQRVAI AYGQQRRLEI AYGDQRRLEI SGGMKQRVGI SGGMKQRVAI SGGMKQRVAI SGGMQQRVGL SGGMRQRVGL SGGQQQRVAI SGGQQQRVAI SGGQQQRVSI SGGQQQRVSI
..................................................
Occpagrtu Gltlescco Gluacorgl Glnqbacst Pebccamje Glnqescco Gltlhaein Ftseescco Ftsehaein Abchaein Lackagrra Malkescco Msmkstrmu Ugpcescco Potaescco Potahaein Cysaescco Cysasynsp Sfucserma Devaanasp Mbpxmarpo Pstbescco Mklmycle Phnlescco Feceescco Fecehaein Drrastrpe Bztrhoca Consensus
...... GRDR KACRDEAEAL LERVGIA..S K.RDAYPSEL ...... KRDK APAREKALKL LERVGLS..A HA.NKFPAQL ...... KMKK SEAEKLAMSL LERVGIA..N QA.DKYPAQL ...... RIPE KEAKETAMYY LEKVGIP..D KA.NAYPSEL ...... KKSK KEAEETAFKY L K W G L L . . D KA.NVYPATL ...... GANK EEAEKLAREL LAKVGLA..E RA.HHYPSEL KQDK AQAREKALSL LEKVGLK N KA DLFPSQL ...... GASG DDIRRRVSAALDKVGLL..D KA.KNFPIQL ...... GMHP KDANTRAMAS LDRVGLR..N KA.HYLPPQI ...... SESK AKIQEKITAL LDLVGLS..E KR.DAYPSNL ...... GMAK DEIERRVNAAAKILELD... ALMDRKPKAL ...... GAKK EVINQRVNQV AEVLQLA... HLLDRKPKAL ...... HYSK EAIDKRVKEA AQILGLT... EFLERKPADL ...... GMGK QQIAERVKEA ARILELD... GLLKRRPREL ...... KTPA AEITPRVMEA LRMVQLE... TFAQRKPHQL ...... KVPN EEIKPRVLEA LRMVQLE... EMADRKPTQL ...... R P N A A A I K A K V T K L LEMVQLA... HLADRYPAHV ...... KHTK EKVRARVEEL LELVQLT... GLGDRYPSQL ....... GGK REKQRRIEAL MEMVALD..R RLAALWPHEL ........ SQ EEAIAKAEAM LTAVGLE..N RV.DYYPENL ...... GFSA QKITNKVNDL LNCLRIA..D ISFE.YPAQL ...... RADM DERVQWALTKAALWNET..K DKLHQSGYSL ....... KKE SEIRDIVMEK LQLVGLG..G DE.KKFPGEI ...LDTGVPR EACAAKAARL LTRLNVP..E RLWHLAPSTF ...... GRLS AEDNARVNVA MNQTRI...N HLAVRRLTEL ...... RIPK AIDKQKVQEA MQRVNI...E HLAHRQIGEL ...... GYSW ARARERAAEL IDGFGL...G DARDRLLKTY ............... WGGPI WGPVSVLAGF AILGLLFTAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . P...
Abcxcyapa Abcxodosi Abcxantsp Abcxgalsu Sapdhaein Sapdsalty Amifstrpn Oppfbacsu Oppfhaein Oppfsalty Dppfescco Dppfhaein Appfbacsu Oppflacla Sapfescco Nikeescco Dppdescco Dppdhaein Oppdhaein Oppdsalty Dppdbacsu Oppdbacsu Appdbacsu
251 LQMALLNPSL LQMILLDSEL LQMSLLNSKL LQMSLLDAKL AMAVANQPRL AIALANQPRL ARALVMQPDF ARALAVDPEF ARALIIEPKM ARALILEPKL ARGLMLDPDV ARGLMLDPDV ARALTLNPEL ARAVATNPKL ARALILRPKV ARALAVEPKL AMAIACRPKL AMAIACRPKL AMALLCRPKL AMALLCRPKL AMALICEPDI AMALAANPKL AIALSCNPKL
AILDETDSGL SILDETDSGL AILDETDSGL AILDETDSGL LIADEPTNAL LIADEPTNSM VIADEPISAL IIADEPISAL IICDEPVSAL IICDDAVSAL VIADEPVSAL VVADEPVSAL IIADEPVSAL IVADEPVSAL IIADEALASL LILDEAVSNL LIADEPTTAL LIADEPTTAL LIADEPTTAL LIADEPTTAL LIADEPTTAL LIADEPTTAL LIADEPTTAL
SGGQQQRAAI SGGQQQRVAI SGGQQQRVAI SGGQQQRVAI SGGQQQRVAI SGGQQQRVAI SGGQQQRVGI SGGEQQRVGI SGGEQQRVDI SGGQKQRVAI SGGQRQRVAI SGGQRQRVAI SGGQRQRVAM SGGQRQRVAM SGGQQQRVAI SGGQQQRIAI SGGQKQRVAL SGGQRQRVAL SGGQQQRVAL SGGQKQRVAI SGGQKQRVAL SGGQQQRLCI SGGMRNVPGL SGGEQQRVNI SGGQRQRAFL SGGQKKRVFL SGGMRRRLDI APKLGVPVSA SGG..QR..I
300 DIDALRIVAE GVNQLSNK...ENSIILITH DIDALKIISK GINTFMNQ...NKAIILITH DIDALKTIAK QINSLKTQ...ENSIILITH DIDALQDISN AIKSILKMSQ FKQSIIIITH ESTTALQVFR .L.LSSMNQN QGTTILLTSN EPTTQAQIFR .L.LTRLNQN SNTTILLISH DVSVRAQVLN .L.LKKFQKE LGLTYLFIAH DVSIQAQVVN.L.MKELQKE KGLTYLFIAH DVSIQAQVVN.L.LKSLQKE MGLSLIFIAH DVSIQAQVVN.L.LQQLQRE MGLSLIFIAH DVSVRAQVLN .L.MMDLQQE LGLSYVFISH DVSVRAQVLN .L.MMDLQDE LGLSYVFISH DVSIQPQVIN .L.MEELQEE FNLTYLFISH DLSVQAQVLN .F.MKLIQKD LGIAFLFISH DMSMRSQLIN .L.MLELQEK QGISYIYVTQ DLVLQAGVIR .L.LKKLQQQ FGTACLFITH DVTIQAQIIE .L.LLELQQK ENMALVLITH DVTIQAQIME .L.LLELQKK ECMSLILITH DVTVQAQIMT .L.LNELKRE FNTAIIMITH DVTVQAQIMT .L.LNELKRE FNTAIIMITH DVTIQAQILE .L.FKEIQRK TDVSVILITH DVTIQAQILE .L.MKDLQKK IDTSIIFITH DVTIQAQVLE .L.MKDLCQK FNTSILLITH
Amiestrpn AIALACRPDV LICDEPTTAL Oppdlacla AIAIANDPDL IIADEPTTAL Oppdmycge AIAVATEPDL IIADEPTTAL Phnkescco ARNLVTHPKL VFMDEPTGGL Nikdescco AMAVLCESPF IIADEPTTDL Modcescco GRALLTAPEL LLLDEPLASL Modchaein GRALLTDPDI LLMDEPLSAL Modcrhoca GRALLSQPEL LLMDEPLSAL Moddazovi ARALLTSPRL LLMDEPLAAL P29mycge AKLFFKSPKL LLVDEPTTGL Phncescco ARALMQQAKV ILADEPIASL Brafpseae ARCMMTRPRI LMLDEPAAGL Livgescco ARCMVTQPEI LMLDEPAAGL Nasdklepn ARALAMKPKV LLLDEPFGAL Nrtcsynsp ARGLAIRPKL LLLDEPFGAL Nrtdsynsp ARALSIRPEV LILDEPFGAL Opuabacsu ARALTNDPDI LLMDEAFSAL Provescco ARALAINPDI LLMDEAFSAL Artpescco ARALMMEPQV LLFDEPTAAL Artphaein ARALMMKPQV LLFDEPTAAL Hispescco ARALAMEPEV LLFDEPTSAL Nocpagrtu ARALAMQPSA LLFDEPTSAL Occpagrtu ARALAMRPDV MLFDEPTSAL Gltlescco ARALCMDPIA MLFDEPTSAL Gluacorgl ACALAMNPKI MLFDEPTSAL Glnqbacst ARGLAMKPKI MLFDEPTSAL Pebccamje ARSLCTKKPY ILFDEPTSAL Glnqescco ARALAVKPKM MLFDEPTSAL Gltlhaein ARALAVKPDI ILLDEPTSAL Ftseescco ARAVVNKPAV LLADEPTGNL Ftsehaein ARAIVHKPQL LLADEPTGNL Abchaein ARALASDPKV LLCDEATSAL Lackagrra GRAIVRQPDV FLFDEPLSNL Malkescco GRTLVAEPSV FLLDEPLSNL Msmkstrmu GRAIVRDAKV FLMDEPLSNL Ugpcescco GRAIVRDPAV FLFDEPLSNL Potaescco ARAVVNKPRL LLLDESLSAL Potahaein ARAVVNKPKV LLLDESLSAL Cysaescco ARALAVEPQI LLLDEPFGAL Cysasynsp ARALAVQPQV LLLDEPFGAL Sfucserma ARALSQQPRL MLLDEPFSAL Devaanasp ARALVNNPPL VLADEPTAAL Mbpxmarpo ARSLAIQPDF LLLDEPFGAL Pstbescco ARGIAIRPEV LLLDEPCSAL Mklmycle ARALVLDPQI ILCDEPDSGL Phnlescco ARGFIVDYPI LLLDEPTASL Feceescco AMVLAQNTPV VLLDEPTTYL Fecehaein ARALAQQSPI ILLDEPFTGV DrrastrpeAASIVVTPDL LFLDEPTTGL Bztrhoca G I G L W A A L F WLY..AAAPI Consensus A ...... P ..... DEP...L
DVTIQAQIID .L.LKSLQNE YHFTTIFITH DVTIQAQILD .L.ILEIQKK KNAGVILITH DVTIQAKVLT .L.IKQLRDL LNITIIFISH DVSVQARLLD .L.LRGLVVE LNLAVVIVTH D W A Q G G A S S .ICWKHYAKQ CGNAA..GDH DIPRKRELLP Y.LQRLTRE INIPMLYVSH DVPRKRELMQ Y.LERLSKE INIPILYVTH DRISRDEILP Y.LERLHAS LQMPVILVSH DLKRKNEILP Y.LERLHDE LDIPMLFVSH DPLTASKIMD .L.ITDFVKR EKITLVFVTH DPESARIVMD .T.LRDINQN DGITVVVTLH NPKETDDLKA .L.IAKLRSE HNVTVLLIEH NPKETKELDE .L.IAELRNH HNTTILLIEH DALTRAHLQD .A.VMQIQQS LNTTIVMITH DALTRGNLQE Q LMRICQE AGVTAVMVTH DAITKEELQE E LLNIWEE ARPTVLMITH DPLIRKDMQD E LLDLHDN VGKTIIFITH DPLIRTEMQD E LVKLQAK HQRTIVFISH DPEITAQIVS .I.IRELAET N.ITQVIVTH DPEITAQVVD .I.IKELQET G.ITQVIVTH DPELVGELLR .I.MQQLAEE GK.TMVVVTH DPELVGEVLK .V.IRKLAEE GR.TMVVVTH DPELVGEVLK .V.MRDLAAE GR.TMLIVTH DPEMINEVLD .V.MVELANE G.MTMMVVTH DPEMVNEVLD V MASLAKE G MTMVCVTH DPETIGEVLD .V.MKQLAKE G.MTMVVVTH DPETIQEVLD .V.MKEISHQ SNTTMVVVTH DPELRHEVLK .V.MQDLAEE G.MTMVIVTH DPELVGEVLQ .T.LKMLAQE GW.TMIIVTH DDALSEGILR .L.FEEFNR. VGVTVLMATH DDELSLGIFN .L.FEEFNR. LGMTVLIATH DPATTQSILK .L.LKEINRT LGITILLITH DAELRVHMRV .E.IARLHKE LNATIVYVTH DAALRVQMRI E ISRLHKR LGRTMIYVTH DAKLRVSMRA .E.IAKIHRR IGATTIYVTH DAKLRVQMRL .E.LQQLHRR LKTTSLYVTH DYKLRKQMQN .E.LKALQRK LGITFVFVTH DYKLRKQMQQ .E.LKMLQRQ LGITFIFVTH DAQVRKELRR .W.LRQLHEE LKFTSVFVTH DAKVRKDLRS .W.LRKLHDE VHVTTVFVTH DTGLRAATRK .A.VAELLTE AKVASILVTH DKQSGRDVVE .I.MQRLAKD QGTSILLVTH DGELRRHLSK .W.LKRYLQD NKITTIMVTH DPISTGRIEE .L.ITELKQD ..YTVVIVTH DPVRTAYLSQ .L.IMDINAQ IDATILIVTH D A K N S A A W E ...LIREAKT RGAAIVGIFH DINHQVDLMR ...LMGELRT QGKTVVAVLH DVKTENAIVD ...LLQQLRE EGHLILVSTH DPRSRNQVWD ...IVRALVD AGTTVLLTTQ EAALQSALPL ALPEVDSDQF GGFLLALVIG D ............................ H
301 350 Abcxcyapa YQRLLDYIVP DYIHVMQNGR ILKTGGAELA KELEIKGYDW LNELEMVKK.
Abcxodosi YQRLLDYVQP Abcxantsp YQRLLDYIKP Abcxgalsu YQRILNYIQP Sapdhaein DIKSISEW.C Sapdsalty DLQMLSQW.A Amifstrpn DLSVVRFI.S Oppfbacsu DLSMVKYI.S Oppfhaein DLAWKHI.S Oppfsalty DLAVVKHI S Dppfescco DLSWEHI A Dppfhaein DLSVVEHI.A Appfbacsu DLSVVRHI S Oppflacla DLGVVRHM.T Sapfescco HIGMMKHI.S Nikeescco DLRLVERF.C Dppdescco DLALVAEA.A Dppdhaein DLALVAEA A Oppdhaein DLGVVAGI.C Oppdsalty DLGVVAGI.C Dppdbacsu DLGVVAQV.A Oppdbacsu DLGVVANV.A Appdbacsu DLGVVSEA.A Amiestrpn DLGVVASI.A Oppdlacla DLGVVAEV.A Oppdmycge NISLIANF.C Phnkescco DLGVARLL.A Nikdescco DMGVVARL.A Modcescco SLDEILHL.A Modchaein SLDELLRL.A Modcrhoca DLSEVERL.A Moddazovi LPDEVARL.A P29mycge DIDLALKY.S Phncescco QVDYALRY.C Brafpseae DMKLVMSI.S Livgescco DMKLVMGI.S Nasdklepn DVDEAVLL.S Nrtcsynsp DVDEALLL.S Nrtdsynsp DIDEALFL.A Opuabacsu DLDEALRI.G Provescco DLDEAMRI.G Artpescco EVEVARKT.A Artphaein EVNVAQKV.A Hispescco EMGFARHV.S Nocpagrtu EMGFARDV.S Occpagrtu EMDFARDV.S Gltlescco EMGFARKV.A Gluacorgl EMGFARKA.A Glnqbacst EMGFAREV.A Pebccamje EMGFAKEV.A Glnqescco EIGFAEKV.A Gltlhaein EMQFAKDV.A Ftseescco DINLISRR.S Ftsehaein DINLIQQK.P Abchaein EMEVVKQI.C
N Y V H V M Q N G K I I K T G T A D L A K E L E S K G Y E W LK ........ D Y I H V M Q K G E I I Y T G G S D T A M K L E K Y G Y D Y LNK ....... D Y I H V M Y K G K I I K T G D A S L A N Q L E S Q G Y E W LASE ...... D Q I S V L Y C G Q N ........ T E . . . S A P T E I L . I E S P H H P Y D K I N V L Y C G Q T ........ V E . . . T A P S K D L . V T M P H H P Y D R I A V I Y K G V I ........ V E . . . V A E T E E L . F N N P I H P Y D R I G V M Y F G K L ........ V E . . . L A P A D E L . Y E N P L H P Y D R V L V M Y L G N A ........ M E . . . L G S D V E V . Y N D T K H P Y DRVLVMYLGH A V E LGTYDE V Y H N P L H P Y DEVMVMYLGR C V E KGTKDQ I F N N P R H P Y D E V M V M Y L G R C ........ I E . . . K G T T E Q I . F S N P Q H P Y DRVGVMYLGK M M E LTGKHE L Y D N P L H P Y D N I A V M H N G R I ........ V E . . . K G T R R D I . F D E P Q H I Y D Q V L V M H Q G E V ........ V E . . . R G S T A D V . L A S P L H E L Q R V M V M D N G Q I ........ V E T Q V V G E K L T F . S S D A G R V L H K I I V M Y A G Q V ........ V E . . . T G D A H A I . F H A P R H P Y ERIIVMYAGQ I V E EGTAKD I F R E P K H P Y D Q V M V M Y A G R T ........ M E . . . Y G T A E Q I . F Y H P T H P Y D K V L V M Y A G R T ........ M E . . . Y G K A R D V . F Y Q P V H P Y D R V A V M Y A G K M ........ A E . . . I G T R K D I . F Y Q P Q H P Y D R V A V M Y A G Q I ........ V E . . . T G T V D E I . F Y D P R H P Y D R V I V M Y C G Q V ........ V E . . . N A T V D D L . F L E P L H P Y D K V A V M Y A G E I ........ V E . . . Y G T V E E V . F Y D P R H P Y D T V A V M Y A G Q L ........ V E . . . K T S V E E L . F Q N P K H P Y D F V Y V M Y A G K I ........ V E . . . Q G L V E E I . F T N P L H P Y D R L L V M K Q G Q V ........ V E . . . S G L T D R V . L D D P H H P Y D D V A V M S D G K I ........ V E . . . Q G D V E T L . 
F N A P K H T V DRVMVLENGQ VKAFGALEEV WGSSVMNPWL P.KEQQSSIL DRVVLMENGI VKAYDRVEKIWNSPIFAPWK G.ESEQSSVL DTLVLMEAGR VRAAGPIAAM QADPNL.PLI H.RPDLAAVI DHVVLLDQGR VTAQGSLQDI MARLDL.PTA F.HEDAGVVI TRIIALKNH ........... ALVLDRLTEK L.TKEQLYKI ERIVALRQG ........... HVFYDGSSQQ F.DNERFDHL DHIVVINQGA P ........... LADGTPEQ I.RDNP..DV DRIYVVNQGT P ........... LANGTPEQ I.RNNP..DV DRVLMMTNGPAATVGEILDV NLPRPANRVQ L.ADDSRYHH D R V V M L T N G P A A Q I G Q I L E V DFPRPRQRLE M.METPHYYD DRVVMMTNGPAATIGEVLEI PFDRPREREA V.VEDPRYAQ D R I V L M K D . . . . . . . . . . G. N I V Q I G T P E E I . L M N P S N E Y D R I A I M Q N . . . . . . . . . . G. E W Q V G T P D E I . L N N P A N D Y S R V V Y M E N . . . . . . . . . . G. H I V E Q G D A S C . . F T E P Q T E A T K W Y M E Q . . . . . . . . . . G. K I V E M G S A D C ..FENPKTEQ T H V I F L H Q . . . . . . . . . . G. K I E E E G A P E Q L . F G N P Q S P R S K V L F L E K . . . . . . . . . . G. Q I E E Q G T P Q E V . F Q N P T S P R S R T V F L H Q . . . . . . . . . . G. V I A E E G P S S E M . F A H P R T D R N R V I F M D E . . . . . . . . . . G. K I V E D S P K D A F . F D D P K S D R D R V L F M A D . . . . . . . . . . G. L I V E D T E P D S F . F T N P K S D R D R I V F M D Q . . . . . . . . . . G. R I L E E A P P E E F . F S N P K E E R D R I I F M E D . . . . . . . . . . G. A I V E E N I P S E F . F S N P K T E R S R L I F I D K . . . . . . . . . . G. R I A E D G N P Q V L . I K N P P S Q R D R V I L M A D . . . . . . . . . . G. H I V E Q N T A D K F . F T C P Q H E R Y R M L T L S D . . . . . . . . . . G. H L H G G V G H E . . . . . . . . . . . K P C L V L E Q . . . . . . . . . . G. YLRY . . . . . . . . . . . . . . . . D Q V A V I D Q . . . . . . . . . . G. R L V E Q G T V G E I . F A N P K T E L
Lackagrra Malkescco Msmkstrmu Ugpcescco Potaescco Potahaein Cysaescco Cysasynsp Sfucserma Devaanasp Mbpxmarpo Pstbescco Mklmycle Phnlescco Feceescco Fecehaein Drrastrpe Bztrhoca Consensus
DQVEAMTL.A DQVEAMTL~ DQTEAMTL.A DQVEAMTL.A DQEEALTM S DQEEAITM.S DQEEATEV.A DQEEAMEV.A DQSEALSF.A D.NRILDI.A DQKEAISM.A NMQQAARC.S NVNIARTV.P DEAVRNDV.A DLNQASRY,C NLGSVPDF.C YLDEADQL.A VTAIVVSLPL ..........
Sapdhaein Sapdsalty Amifstrpn Oppfbacsu Oppfhaein Oppfsalty Dppfescco Dppfhaein Appfbacsu Oppflacla Sapfescco Nikeescco Dppdescco Dppdhaein Oppdhaein Oppdsalty Dppdbacsu Oppdbacsu Appdbacsu Amiestrpn Oppdlacla Oppdmycge Phnkescco Nikdescco Modcescco Modchaein Modcrhoca Moddazovi P29mycge Phncescco Brafpseae Livgescco Nasdklepn
351 400 TQALINA.VP DFTQPLGFKT KLGTLEGTAP ILEQMPI.GC RLGPRCPFAQ TQALIRA.IP DFGSAMPHKS RLNTLPGAIP LLEQLPI.GC RLGPRCPYAQ TQALLSA.VP IPDPILERKK VLKVYEGSQH .................... TKSLLSA.IP LPDPDYERNR C.SEYDPSVH .................... TKALMSA.VP IPDPKLERNK SIELLEGDLP SPINPP.SGC VFRTRCLKAD TKALMSA.VP IPDPDLERNK KIQLLEGELP SPINPP.SGC VFRTRCPIAG TQALLSA.TP ..RLNPDDRR ERIKLSGELP SPLNPP.PGC AFNARCRRRF TKALLSA.TP ..RLSPNLRR ERIKLTGELP SPINPP.KGC AFNPRCWKAT TQALLSS.VP VTRKRGSVKR ERIVLKGELP SPANPP.KGC VFHTRCPVAK TKRLLSA.IP SIDVTRRAEN RKNRLKVEQD FEDKKA.NFY DKDGHALPLK TKRLIAG.HF GEALTADAWR KDR . . . . . . . . . . . . . . . . . . . . . . . . . . . QNAVLPA.FP VRRRTTEKV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TQALLRA.LP ..EFA.QDKE RLASLPGVVP GKYDRP.NGC LLNPRCPYAT TQALLRS.LP ..EFA.EGKS RLESLQGVVP GKYDRP.TGC LLNPRCPYAT SIGLMDA.IP ..RLDGNE.E HLVTIPGNPP NLLHLP.KGC PFSPRCQFAT SIGLLNA.VP ..RLDSEG.A EMLTIPGNPP NLLRLP.KGC PFQPRCPHAM TKGLLGS.VP ..RLDLNG.A ELTPIDGTPP DLFSPP.PGC PFAARCPNRM TWGLLAS.MP ..TLESSGEE ELTAIPGTPP DLTNPP.KGD AFALRSSYAM TEGLLTS.IP ..VID.GEID KLNAIKGSVP TPDNLP.PGC RFAPRCPKAM TWSLLSS.LP ..QLA.DDKG DLYSIPGTPP SLYTDL.KGD AFALRSDYAM TRSLLRS.NP ..S.AETVSD DLYVIPGSVP SLSKIEYDKD LFLARVPWMK TWALISS IP E QKDKNK PLTSIPGVIP NMLTPP.KGD AFASRNQYAL TQLLVSS.VL ..QN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TRTLVSAHLA LYGMDLAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . KVTVLEHHSA LRDDRLALGD QHLWVNKLDE PLQAALRIRI QASDVSLVLQ ALPVHLHNPP YKMTALSLGE QVLWIHQVPA NVGERVRVCI YSSDVSITLQ E G W I A L D P A YGLSTLQVPG GRIVVPGNLG PIGARRRLRV PATDVSLGRH ESVVAEHDDH YHLTRLAFPG GAVLVARRPE APGQRLRLRV HARDVSLANS YDN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . YRSINRVEEN AKAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IKAYLGEA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IRAYLGEA . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . LRQQILHFLY E ..... KQPK AA . . . . . . . . . . . . . . . . . . . . . . . . . . . .
DKIVVMRG .......... G. IVEQVGAPLA L.YDDPDNMF D K I W L D A .......... G. RVAQVGKPLE L.YHYPADRF DRIVIMSSTK NEDGSGTIG. RVEQVGTPQE L.YNRPANKF QRVMVMNG .......... G. VAEQIGTPVE V.YEKPASLF DRIVVMRD G RIEQDGTPRE I YEEPKNLF DRIVLLRK .......... G. KIAQDGSPRE I,YEDPANLF DRVVVMSQ .......... G. NIEQADAPDQ V.WREPATRF DQIVVMNH .......... G. KVEQIGSPAE I.YDNPATPF DQVAVMRS .......... G. RLAQVGAPQD L.YLRPVDEP DRIVEMEDG . . . . . . . . . . . . ILARDSQTA I.VSYDSGAW DEIVILKE .......... G. RLLQQGKPKN L.YDQPINFF DHTAFM ........... YLG ELIEFSNTDD L.FTKPAKKQ DNMGMLFRKH LVMFGPREVL LTSDEPVVRQ F~ DRLHPMGASS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DQLVVMANGH VMAQGTPE . . . . . . . . . . . . . . . . . . . . . . D Q . W M I N R T VIAAGKTE . . . . . . . . . . . . . . . . DTFNQH DRIAVIDHGR VIAEGTTGEL KSSLGSNVLR LRLHDAQSRA GILLALGRQS DMLIVKSLSV GIIEFVRGVP LITLLFTASL D .......................................
Nrtcsynsp Nrtdsynsp Opuabacsu Provescco Artpescco Artphaein Hispescco Nocpagrtu Occpagrtu Gltlescco Gluacorgl Glnqbacst Pebccamje Glnqescco Gltlhaein Abchaein Lackagrra Malkescco Msmkstrmu Ugpcescco Potaescco Potahaein Cysaescco Cysasynsp Sfucserma Devaanasp Mbpxmarpo Pstbescco Mklmycle Feceescco Fecehaein Drrastrpe Bztrhoca Consensus
LRNELINFLQ QQRRAKRRAKAAAPAPAVAASQQKTVRLGF LPGNDCAPLA LRTEALDFLY RRFAHDDD ................................ V E K F V E D V D L S K V L T A G H I M K R A E T V R I D K .... G P R V A L T L M K N L G I S S VRTFFRGVDI SQVFSAKDIA RRTPNGLIRK TPGFGPRSAL KLLQDEDREY FKNYLSH ........................................... FKHYLSH ........................................... LQRFLKGSLK ........................................ CRAFLSSVL ......................................... F R Q F L R R D G G TSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A K D F L . . A K I LH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A K D F L . . G K I LAH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A K V F L . . S R I LNH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A R L F L . . G K I LKN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . LQEFL..QHV S ....................................... TKQFLLQAKI PLELDYYI ................................ AQEFIRSTFH ISLPDEYLEN LTDTPKHSKA YPIIKFEFTG RSVDAPLLSQ VAGFIGSPRM NFLPAVVIGQ A.E...GGQV TVALKARPDT QLTVACATPP VAGFIGSPKM NFLPVKVTAT AID...QVQV ELPMPNRQQV WLPVES.RDV V A G F I G S P A M N F F D . . V T I K D G H L V S K D G L T I A V T E G Q L K MLESKGFK.. V A S F I G S P A M N L L T G R V N N E G T H F E L D G G I E L P L N G G ...... YRQYA.. V A G F I G . . E I N M F N A T V I E R LDE ....... Q R V R A N V E G R E C N I Y V N F A V V A R F I G . . E I N V F E A T V I E R KSE ....... Q V V L A N V E G R I C D I Y T D M P V V L E F M G . . E V N R L Q G T I R G G QFH ....... V G A H R W P L G . . . . . . . . YTP V M S F I G . . P V N V L P N S . . S H IFQ ....... A G G L D T P . . . . . . . . . . . . . T A S F L G E T L V . . . . . . . . . . . . . . . LTAEL A H G W A D C A L G R I A V D . . . D R NETP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . V G I F L G .... 
L L I E I P K L N E S I T L K N I P S K T P Q N L K K F A F D P I W V K I F A N TEDYITGRYG ........................................ IGMSEEKDES TMAEEAALLE AGHYAGGAEE VEGVPPQITV TPGMPKRKAV ..EVMTPGLL . . . . . . . . . . . . . . . . . . . . R T V F S V E A E I H P E P V S G R P M NLEIVFGGVL .................... RHIKLLGENL HNDE.DKRSV EAERLLSAEL GVTIHRDSDP TALSARIDDP RQGMRALAEL SRTHLEVRSF LLQYFLPPGT NFDLILRVVI LVTLFAAAYI AEVIRGGLAALPRGQYEAAD ..................................................
401 Sapdhaein KKCM.EKPRR Sapdsalty RECI.ITPRL Amifstrpn DY.ETDKPSM Oppfbacsu QLKDGETMEF Oppfhaein ENCAKQKPPF Oppfsalty PECAQTRPVL Dppfescco GPCTQLQPQL Dppfhaein EKCRENQPHL Appfbacsu PICKEQIPEF Oppflacla KLSESHWAAL Dppdescco DRCRAEEP.A Dppdhaein EYCRQVEP.Q Oppdhaein EQC.QIAP.K Oppdsalty EIC.NNAP.P DppdbacsuVVCDRVYP.G Oppdbacsu KIDFEQEP.P Appdbacsu DKCWTNQP.S Amiestrpn QIDFEQKA.P
450 L K I K Q H E F S C H Y P I N L R E K N F K E K T T A T P F ILNCKGNE.. TGAKNHLYAC HFPLNMERE ..................... V E I R P G H Y V W A N Q A E L A R Y Q KGLN . . . . . . . . . . . . . . . . REVKPGHFVM CTEAEFKAFS .................... T S Q N N S H F V A CLKVL . . . . . . . . . . . . . . . . . . . . . . . . . EG.SFRHAVS CLKVDPL ....................... K D Y . G G Q L V A C F A V D Q D E N P QR . . . . . . . . . . . . . . . . . . E Q H T D G K L I A CFHID . . . . . . . . . . . . . . . . . . . . . . . . . K E A A P S H F V A CHLYS . . . . . . . . . . . . . . . . . . . . . . . . . PKGGENVESN Y ............................. LNMLADGRQS K..CHYPLDD AGRPTL .............. L H H I . G S R K V K . . C H T P L N E Q G N P V E Y Q G A .......... L T T F N H G Q L R N . . C W L S A E K FNL . . . . . . . . . . . . . . . . . L E A F S P G R L R A . . C F K P V E E LL . . . . . . . . . . . . . . . . . . Q T I R S D S H T V N . . C W L Q D Q R A E H A V L S G D A KD ........ M F K V S D T H Y V K . . S W L L H P D A . P K V E P P E A .......... L L T H K S G R T V R . . C F L Y E E E GAEQS . . . . . . . . . . . . . . . QFSVSETHWA K..TWLLHED APKVEKPAVI ANLHDKIREK
Oppdlacla Oppdmycge Modcescco Modchaein Modcrhoca Moddazovi Nrtcsynsp Opuabacsu Provescco Abchaein Lackagrra Malkescco Msmkstrmu Ugpcescco Potaescco Potahaein Cysaescco Cysasynsp Sfucserma Mbpxmarpo Mklmycle
Feceescco
Fecehaein Drrastrpe Bztrhoca Consensus
EEAQKVISEK MTEISSNHFV RGQAWKKFEF PDQKLKGGEK .......... A I D F E Y H P P . . . F F E V T K T H KAATWLLHPQ APKVEPPQAV IDNITLTKKA PPQQTSIRNV L R A K W N S Y D DNG ...... Q VEVELEVGGK TLWARISPWA KPEQTSIRNI LRGKITQIEI QDS ...... R VDLAVLVEGH KIWASISKWA APTDTTILNA L P A V I L G . A E A A E G Y Q I T V R LALGASGEGA SLLARVSRKS RIEDSSITNV LPATVREVVE ADTPAHVLVR LE .... AEGT PLIARITRRS IAQELGLFQD LGLSVELQSF LTWEALEDSI RLGQLEGALM MAAQPLAMTM IYAVDKQKKL LGVIYASDAK KAAESDLSLQ DILNTEFTTV PENTYLTEIF GYVIERGNKF VGAVSIDSLK T A L T Q Q Q G L D A A L I D A P L A V DAQTPLSELL ASKKFGVELS ILTSQIDYAG GVKFGYTIAE VEGDEDAITQ TKVYLMENNV QGGDAVTVGV RPEHFLPA ..... GSGDTQL TAHVD ...... VVEHLGNTS QVGANMSLGI RPEHLLPS ..... DIADVIL EGEVQ ...... VVEQLGNET ..NKNLIFGI RPEDISSSLL VQETYPDATV D A E W ...... VSELLGSET ..GRKMTLGI RPEHIALS ..... SQAEGGV PMVMD ...... TLEILGADN EPGQKLHVLL RPEDLRVEEI NDDNHAEGLI GYVRERNYKG MTLESVVELE EKDQKLQVLL RPEDIVIEEL DENEHSKAII GHIIDRTYKG MTLESTVEFD AYQGPVDLFL RPWEVDIS.. RRTSLDSPLP VQVLEASPKG HYTQLVVQPL .... HPEVFL RPHDIEIA ....... IDPIP ETVPARIDRI VHLGWEVQAE QRSGPARIML RPEQIQIGLS DPAQRGQAVI TG .......... IDFAGFVS RSINKYRFFL RPYEFCIKSE MDLEATPVQI KTIIYKRTFV QLDLFVTSFL ARRQARVRAM LPTLPKGAQA AILDDLEGAH NYQAHEFGD ........... CLMR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TVLTDDEKAV VFYGETKQDP PAPTTQNCHF EDCPYKSAVK NKRD ...... SLGQSSLDEV FLALTGHPAD DRSTEEAAEE EKVA ................ ALGLDYWQAQ RLIIMPQALK ISIPGIVSSF IGLFKDTTLV AFVGLFDPLK ..................................................
451 500 MGFAHLAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . LQFKDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RDELAIKPGL WLYAQIKSVS ITA . . . . . . . . . . . . . . . . . . . . . . . . . . . QNELRFAIGQ DVYVQIKAVS VM . . . . . . . . . . . . . . . . . . . . . . . . . . . . FDLLGFQPGE QVVARLKAMA LSAPAQTGG . . . . . . . . . . . . . . . . . . . . . CDQLGIAPGR RMWAQIKAVA LLG . . . . . . . . . . . . . . . . . . . . . . . . . . . GLGGHRPFAI ATPLTVSRNG GAIALSRRYL NAGVRSLEDL CQFLAATPQR DVVSDANIPI AVVDEKQRMK GIVVRGALIG ALAGNNEYIN AEGTNEQTQD SHVGQAPCAV PVVDEDQQYV GIISKGMLLR AL ....... D REGVNNG... RVEVLGYVQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . YVYAHTVPGE QIIIEQEERR HGGRYGDEIA VGISAKTSFL FDASGRRIR. QIHIQIPSIR QNLVYRQNDV VLVEEGATFA IGLPPERCHL FREDGTACRR MLYLKL..GQ TEFAARVDAR DFHEPGEKVS LTFNVAKGHF FDAETEAAIR LAHGRW..GE QKLVVRLAHQ ERPTAGSTLW LHLAENQLHL FDGETGQRV. ...NGKMVMV SEFFNEDDPD FDHSLDQKMA I N W V E S W E V V L A D E E H K . . . ..HNGMRVLV SEFFNEDDPH MDHSIGQRVG ITWHEGWEVV LNDEDNQ... Cysaescco GWYNEPLTVV MH ...... GD DAPQRGERLF VGLQHARLYN GDERIETRDE Cysasynsp VRLEDGQVLV AHLPRDRYRD LQLEPEQQVF VRPKQARSFP LNYSI ..... Sfucserma TLNLQMAATG AQLEIKTVSR EGLRPGAQVT LNVMGQAHIF AG ........ Devaanasp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M b p x m a r p o W N L T I P I G Y Q SFRNLHIESF MQTLYIKPRL QVFLRAYPIL TNIKKN .... Bztrhoca G I S N W R S D M AWKGTYWEPY IFVALIFFLF NFSMSRYSMY LERKLKRDHR Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Amiestrpn Oppdmycge Modcescco Modchaein Modcrhoca Moddazovi Nrtcsynsp Opuabacsu Provescco Abchaein Lackagrra Malkescco Msmkstrmu Ugpcescco Potaescco Potahaein
           501                                                550
Nrtcsynsp  LRLAIPDPIA MPALLLRYWL ASAGLNPEQD VELVGMSPYE MVEALKAGDI
Opuabacsu  PSAQEVK... .......... .......... .......... ..........
Malkescco  LHKEPGV... .......... .......... .......... ..........
Cysaescco  ELALAQSA.. .......... .......... .......... ..........
Consensus  .......... .......... .......... .......... ..........
           551                                                600
Nrtcsynsp  DGFAAGEMRI ALAVQAGAAY VLATDLDIWA GHPEKVLGLP EAWLQVNPET
           601                                                650
Nrtcsynsp  AIALCSALLK AGELCDDPRQ RDRIVEVLQQ PQYLGSAAGT VLQRYFDFGL
           651                                                700
Nrtcsynsp  GDEPTQILRF NQFHVDQANY PNPLEGTWLL TQLCRWGLTP LPKNRQELLD
           701                                                750
Nrtcsynsp  RVYRRDIYEA AIAAVGFPLI TPSQRGFELF DAVPFDPDSP LRYLEQFEIK
           751        764
Nrtcsynsp  APIQVAPIPL ATSA
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignment: Hispsalty (Hispescco); Livgsalty (Livgescco); Malksalty, Malkentae (Malkescco); Sapfsalty, Sapfhaein (Sapfescco); Provsalty (Provescco). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
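The 90% identity cut-off used above to drop near-duplicate sequences can be illustrated with a minimal Python sketch. The text does not state how identity was calculated, so the definition used here (matches divided by the number of columns in which both aligned sequences carry a residue) is an assumption, as are the function names `percent_identity` and `is_redundant` and the toy sequences.

```python
def percent_identity(a: str, b: str, gap_chars: str = ".-") -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are ignored,
    and identity is matches / compared columns. Other conventions exist.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_redundant(a: str, b: str, threshold: float = 90.0) -> bool:
    """True if the pair meets the identity threshold (90% in the text)."""
    return percent_identity(a, b) >= threshold


# Toy aligned pair (hypothetical, not taken from the alignment): 9 of 10 match.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

Under this convention a sequence such as Hispsalty would be left out whenever `is_redundant` is true against its paired sequence Hispescco.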
Database accession numbers

Abchaein Abcxantsp Abcxcyapa Abcxgalsu Abcxodosi Amiestrpn Amifstrpn Appdbacsu Appfbacsu Artpescco Artphaein Brafpseae Bztrhoca Cysaescco Cysasynsp Devaanasp Dppdbacsu Dppdescco Dppdhaein Dppfescco Dppfhaein Drrastrpe Feceescco Fecehaein Ftseescco Ftsehaein

SWISSPROT
P44785 Q02856 P48255 P35020 Q00830 P18765 P18766 P42064 P42065 P30858 P45092 P21629 P16676 P14788 P26905 P37314 P45095 P37313 P45094 P32010 P15031 P44662 P10115 P44871

PIR
S37635 S39521 S21682 S11152 S11153 S31694 D36125 C35402; QRECSA A30301; GRYCS7 A55541 S16650 S27707 JS0115; QRECM3 S03131; CEECFE

EMBL/GENBANK
L45262; G1005459 X63382; G14178 U30821; G1016162 X67814; G429179 X60752; G11945 X17337; G47346 X17337; G47347 U20909; G677943 U20909; G677944 X86160; G769790 L45815; G1006549 D90223; G216866 U37407 M32101; G145661 J04512; G142152 X56678; G48807 L08399; G349228 L45820; G1006559 L08399; G349229 L45819; G1006557 M73758; G153230 M26397; G145928 L45003; G1003608 X04398; G41499 L45407; G1005744
Glnqbacst Glnqescco Gltlescco Gltlhaein Gluacorgl Hispescco Hispsalty Lackagrra Livgescco Livgsalty Malkentae Malkescco Malksalty Mbpxmarpo Mklmycle Modcescco Modchaein Modcrhoca Moddazovi Msmkstrmu Nasdklepn Nikdescco Nikeescco Nocpagrtu Nrtcsynsp Nrtdsynsp Occpagrtu Oppdbacsu Oppdhaein Oppdlacla Oppdmycge Oppdsalty Oppfbacsu Oppfhaein Oppflacla Oppfsalty Opuabacsu P29mycge Pebccamje Phncescco Phnkescco Phnlescco Potaescco Potahaein Provescco Provsalty Pstbescco Sapdhaein Sapdsalty Sapfescco Sapfhaein Sapfsalty Sfucserma Ugpcescco

SWISSPROT
P27675 P10346 P41076 P45022 P48243 P07109 P02915 Q01937 P22730 P30293 P18813 P02914 P19566 P10091 P30769 P09833 P45321 Q08381 P37732 Q00752 P39459 P33593 P33594 P35116 P38045 P38046 P35117 P24136 P45052 Q07733 P47325 P04285 P24137 P45051 Q07734 P08007 P46920 P47532 P45677 P16677 P16678 P16679 P23858 P45171 P14175 P17328 P07655 P45288 P36636 P36637 P45289 P36638 P21410 P10907

PIR
A42478 S03183; QRECGQ A27835 A03412; QREBPT 34734 F37074 JH0670 S05328 A03411; MMECMK S05329; S20602 S01592; BVLVMX S31144 B26871; BVECHD C36914 S31045 E42400 S39597 S39598 G42600 S30893 S30894; S36604 C41044; C42600 S15233; D38447 A03413; QREBOT S15234; E38447 D29333; QREBOF D35718 C35719 D35719 A40840 JS0128; BVECPV S05374; QREBVT Q00616; BVECZB S39588 S39589 C35108; QRSEUC S03783; QRECUC

EMBL/GENBANK
M61017; G142988 X14180; G581098 U10981; G624632 L45715; G1006354 X81191; G732701 Y00455; G41705 V01373; G47734 X66596 J05516; G146635 D12589; G217074 U00006; G409797 X54292; G47772 X04465; G11666 Z14314; G581333 U27192; G973216 L46321; G1008029 L06254; G310274 X69077; G49180 M77351; G153741 L27431; G473439 X73143; G404848 X73143; G404849 X61625; G48971 X61625; G48972 M80607; G154771 X56347; G580898 L45757; G1006435 U09553; G495177 U39688; G1045756 X05491; G47805 X56347; G580899 L45756; G1006433 L18760; G308851 X05491; G47806 U17292; G984803 U39709; G1045987 L13662; G388564 D90227; G216591 D90227; G216600 D90227; G216601 M64519; G147326 L45980; G1007359 M24856; G147373 X52693; G47831 X02723; G42398 L46271; G1007931 X74212; G414211 U08190; G470683 L46272; G1007933 X74212; G414212 M33815; G152862 X13141; G43249
1 Perego, M. et al. (1991) Mol. Microbiol. 5, 173-185.
2 Nohno, T. et al. (1986) Mol. Gen. Genet. 205, 260-269.
3 Kraft, R. and Leinwand, L.A. (1987) Nucleic Acids Res. 15, 8568.
4 Williams, S.G. et al. (1992) Mol. Microbiol. 6, 1755-1768.
5 Sofia, H.J. et al. (1994) Nucleic Acids Res. 22, 2576-2586.
6 Omata, T. et al. (1993) Mol. Gen. Genet. 236, 193-202.
7 Parra-Lopez, C. et al. (1993) EMBO J. 12, 4053-4062.
8 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
9 Pearce, S.R. et al. (1992) Mol. Microbiol. 6, 47-57.
10 Higgins, C.F. et al. (1982) Nature 298, 723-727.
11 Froshauer, S. et al. (1988) J. Mol. Biol. 200, 501-511.
12 Ehrmann, M. et al. (1990) Proc. Natl Acad. Sci. USA 87, 7574-7578.
Other ABC-Associated (Cytoplasmic) Proteins
Heme exporter family

Summary
Most transporters of the heme exporter family, exemplified by the heme exporter CYCV from Bradyrhizobium japonicum 1 (Ccmabraja), mediate export of heme into the periplasm for the biogenesis of C-type cytochromes. The COB transporters 2, which transport cobalt for the synthesis of vitamin B12 (cobalamin), and the NIK transporters 3, which transport nickel (e.g. Nikeescco), are also members of this family. These transporters are found only in gram-negative bacteria.

Statistical analysis of multiple amino acid sequence comparisons places the heme exporter family in the ATP binding cassette (ABC) superfamily 4. Proteins in this transporter superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. Transporters of the heme exporter family exist as four or five separate chains: two separate ATP binding domains and two separate transmembrane domains. The family is characterized by the cytoplasmic ATP binding domains 4, which are described in the following tables. The associated transmembrane domains are predicted to contain six membrane-spanning helices by the hydropathy of their amino acid sequences.

A relatively small number of amino acids, scattered throughout the whole length of the sequences, are conserved within the family, almost all of which are conserved in other families of the ABC superfamily.
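The summary mentions that membrane-spanning helices are predicted from the hydropathy of the amino acid sequence. The text does not say which scale or parameters were used, so the following Python sketch is only an illustration under stated assumptions: it uses the Kyte-Doolittle hydropathy scale with a 19-residue sliding window and a cut-off of 1.6, which are commonly quoted defaults. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence.

    Unknown symbols score 0.0 (an arbitrary choice for this sketch).
    """
    scores = [KD.get(res, 0.0) for res in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one
        profile.append(total / window)
    return profile


def candidate_helices(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Merge overlapping windows at or above the threshold into 1-based (start, end) ranges."""
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the previous range
        else:
            segments.append((start, end))
    return segments


# Toy sequence: a 19-residue leucine stretch flanked by aspartates.
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_helices(toy))
```

A transmembrane domain predicted to contain six membrane-spanning helices would ideally yield six separate ranges from `candidate_helices`, although the result depends on the chosen window and threshold.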
Nomenclature, biological sources and substrates

CODE       DESCRIPTION [SYNONYMS]                                                                      ORGANISM [COMMON NAMES]                               SUBSTRATE(S)
Cbiosalty  Cobalt transport ATP binding protein [CBIO]                                                 Salmonella typhimurium [gram-negative bacterium]      Co2+
Ccmabraja  Heme exporter protein A [Cytochrome C type biogenesis ATP binding protein A, CYCV]          Bradyrhizobium japonicum [gram-negative bacterium]    Heme
Ccmaescco  Heme exporter protein A [Cytochrome C type biogenesis ATP binding protein A, CCMA]          Escherichia coli [gram-negative bacterium]            Heme
Ccmahaein  Heme exporter protein A [Cytochrome C type biogenesis ATP binding protein A, CCMA, HI1089]  Haemophilus influenzae [gram-negative bacterium]      Heme
Ccmarhoca  Heme exporter protein A [Cytochrome C type biogenesis ATP binding protein A, HELA]          Rhodobacter capsulatus [gram-negative bacterium]      Heme
Nikeescco  Nickel transport ATP binding protein [NikE]                                                 Escherichia coli [gram-negative bacterium]            Ni2+
Phylogenetic tree

[Figure: phylogenetic tree of the heme exporter family. Leaves, top to bottom: Ccmaescco, Ccmahaein, Ccmabraja, Ccmarhoca, Cbiosalty, Nikeescco.]
Physical and genetic characteristics

           AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Cbiosalty  271          30 147   cob operon
Ccmabraja  200          21 132   cyc
Ccmaescco  205          22 865   49.42 minutes
Ccmahaein  212          24 009
Ccmarhoca  214          22 168   hel
Nikeescco  268          29 619   77.9 minutes
Multiple amino acid sequence alignments

           1                                                   50
Ccmaescco  .......... .MLEARELLC ERDERTLFSG LSFTLNAGEW VQITGSNGAG
Ccmahaein  ......MFEQ HKLSLQNLSC QRGERVLFRA LTCDFNSGDF VQIEGHNGIG
Ccmabraja  .......... MQLSGRRVIC VRGGREVFAG LDFEAVSGEA VAVVGRNGSG
Ccmarhoca  .........M TLLAVDQLTV SRGGLAVLEG VSFSLAAGHA LVLRGPNGIG
Cbiosalty  .......... .MLATSDLWF RYQNEPVLKG LNMDFSLSPV TGLVGANGCG
Nikeescco  MTLLNISGLS HHYAHGGFNG KHQHQAVLNN VSLTLKSGET VALLGGTGCG
Consensus  .......... ..L....... .......... .......G.. ....G.NG.G

           51                                                 100
Ccmaescco  KTTLLRLLTG LSRPDAGEVL WQGQPLHQVR DSYHQ..... .NLLWIGHQP
Ccmahaein  KTSLLRILAG LVRPLEGEVR WDSEAISKQR EQYHQ..... .NLLYLGHLS
Ccmabraja  KTSLLRLIAG LLIPAGGTIA LDGG...DAE LTLPE..... .QCHYLGHRD
Ccmarhoca  KTTLLRTLAG LQPPLAGRVS MP........ ...PE..... .GIAYAAHAD
Cbiosalty  KSTLFMNLSG LLRPQKGAVL WQGKPLDYSK RGL....... .....LARRQ
Nikeescco  KSTLARLLVG LESPAQGNIS WRGEPLAKLN RAQAKAFRRD IQMVFQDSIS
Consensus  K..L.R.L.G L..P..G... .......... .......... ..........

           101                                                150
Ccmaescco  GIKTRLTALE NLHFYHRD.. .......... GDTAQC.... LEALAQAGLA
Ccmahaein  GVKPELTAWE NLQFYQRI.. .......... SQAEQNTDML WDLLEKVGLL
Ccmabraja  ALKPALSVAE NLSFWADF.. .......... LGGERLDA.. HESLATVGLD
Ccmarhoca  GLKATLSVRE NLQFWAAI.. .......... HATDTVET.. ..ALARMNLN
Cbiosalty  QVATVFQDPE QQIFYTDIDS DIAFSLRNLG GPEAEITRRV DEALTLVDAQ
Nikeescco  AVNPRKTVRE ILR....... ..EPMRHLLS LKKSEQLARA RQMLKAVDLD
Consensus  .........E .......... .......... .......... ...L......

           151                                                200
Ccmaescco  G.FEDIPVNQ LSAGQQRRVA LARLWLTRAT LWILDEPFTA IDVNGVDRLT
Ccmahaein  G.REDLPAAQ LSAGQQKRIA LGRLWLSQAP LWILDEPFTA IDKKGVEILT
Ccmabraja  H.ATHLPAAF LSAGQRRRLS LARLLTVRRP IWLLDEPTTA LDVAGQDMFG
Ccmarhoca  A.LEHRAAAS LSAGQKRRLG LARLLVTGRP VWVLDEPTVS LDAASVALFA
Cbiosalty  H.FRHQPIQC LSHGQKKRVA IAGALVLQAR YLLLDEPTAG LDPAGRTQMI
Nikeescco  DSVLDKRPPQ LSGGQLQRVC LARALAVEPK LLILDEAVSN LDLVLQAGVI
Consensus  .......... LS.GQ..R.. LA........ ...LDEP... .D........

           201                                                250
Ccmaescco  QRMAQHTEQ. GGIVILTTHQ PLNVAESKIR RISLTQTRAA ..........
Ccmahaein  ALFDEHAQR. GGIVLLTSHQ ..EVPSSHLQ KLNLAAYKAE ..........
Ccmabraja  GLMRDHLAR. GGLIIAATHM ALGIDSRELR IGGVA..... ..........
Ccmarhoca  EAVRAHLAA. GGAALMATHI DLGLS..EAR VLDLAPFKAR PPEAGGHRGA
Cbiosalty  AIIRRIVAQ. GNHVIISSHD IDLIYEISDA VYVLRQGQIL THGAPGEVFA
Nikeescco  RLLKKLQQQF GTACLFITHD LRLVERFCQR VMVMDNGQIV ETQVVGEKLT
Consensus  .......... G.......H. .......... .......... ..........

           251                                            296
Ccmaescco  .......... .......... .......... .......... ......
Ccmahaein  .......... .......... .......... .......... ......
Ccmabraja  .......... .......... .......... .......... ......
Ccmarhoca  FDHGFDGAFL .......... .......... .......... ......
Cbiosalty  CTEAMEHAGL TQPWLVKLHT QLGLPLCKTE TEFFHRMQKC AFREAS
Nikeescco  FSSDAGRVLQ NAVLPAFPVR RRTTEKV... .......... ......
Consensus  .......... .......... .......... .......... ......
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
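The 75% consensus rule stated above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to prepare the alignments: the function name `consensus` is hypothetical, and the sketch assumes that the percentage is taken over all aligned sequences, so that gap positions count against a residue. The input rows are columns 41-50 of the heme exporter alignment (Ccmaescco, Ccmahaein, Ccmabraja, Ccmarhoca, Cbiosalty, Nikeescco).

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Consensus of equal-length aligned rows.

    A residue is reported at a column if it occurs in at least
    `threshold` of all rows; otherwise the gap symbol is written.
    """
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(rows) >= threshold:
                out.append(residue)
                continue
        out.append(gap)
    return "".join(out)


# Columns 41-50 of the six heme exporter family sequences.
rows = [
    "VQITGSNGAG",
    "VQIEGHNGIG",
    "VAVVGRNGSG",
    "LVLRGPNGIG",
    "TGLVGANGCG",
    "VALLGGTGCG",
]
print(consensus(rows))  # ....G.NG.G
```

With six sequences, a residue must occur in at least five of them to reach 75%, which is why the valine present in four of the six sequences at column 41 does not appear in the consensus.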
Database accession numbers

           SWISSPROT  PIR     EMBL/GENBANK
Cbiosalty  Q05596             L12006; G154435
Ccmabraja  P30963     A39741  M60874; G152074
Ccmaescco  P33931             U00008; G405926
Ccmahaein  P45032             L45726; G1006376
Ccmarhoca  P29959     S23663  X63462; G46024
Nikeescco  P33594     S39598  X73143; G404849
References
1 Ramseier, T.M. et al. (1991) J. Biol. Chem. 266, 7793-7803.
2 Roth, J.R. et al. (1993) J. Bacteriol. 175, 3303-3316.
3 Navarro, C. et al. (1993) Mol. Microbiol. 9, 1181-1191.
4 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
Macrolide-streptogramin-tylosin resistance family

Summary
Transporters of the macrolide-streptogramin-tylosin resistance family, exemplified by the erythromycin resistance protein MSRA 1 of Staphylococcus epidermidis (Msrastaep) and the tylosin resistance protein TLRC 2 of Streptomyces fradiae (Tlrcstrfr), confer resistance to antibiotics by acting as ATP-dependent efflux pumps. These transporters are found mostly in gram-positive bacteria from the genera Staphylococcus and Streptomyces. Several members of this family are plasmid-encoded.

Statistical analysis of multiple amino acid sequence comparisons places the macrolide-streptogramin-tylosin resistance family in the ATP binding cassette (ABC) superfamily 3. Proteins in this superfamily use the energy of ATP hydrolysis to pump substrates across cell membranes. In transporters of the macrolide-streptogramin-tylosin resistance family the two ATP binding domains form one chain, separate from any transmembrane domains. The MsrA protein is characterized by a long "Q-linker" domain between the two ATP binding domains. Unusually, it is not known to associate with a specific transmembrane protein or complex. Instead it is believed to interact with other transmembrane resistance proteins to alter their specificity 4. Other members of this family are associated with specific transmembrane proteins.

The macrolide-streptogramin-tylosin resistance family is characterized by the cytoplasmic ATP binding domains 3, which are described in the following tables. Where present, the associated transmembrane domains are predicted to contain six membrane-spanning helices by the hydropathy of their amino acid sequences.

Several amino acid sequence motifs are highly conserved in the macrolide-streptogramin-tylosin resistance family, including motifs unique to the family, signature motifs of the ABC superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis.
Nomenclature, biological sources and substrates

CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | RESISTANCE (a)
Abcsacer | ABC transporter [ertx] | Saccharopolyspora erythraea [gram-negative bacterium] | Unknown
Lmrcstrli | LmrC protein | Streptomyces lincolnensis [gram-negative bacterium] | [Lincomycin]
Msrastaep | Erythromycin resistance ATP binding protein [MSRA] | Staphylococcus epidermidis [gram-positive bacterium] | [Antibiotics]
Srmbstram | SrmB protein polyketide synthase | Streptomyces ambofaciens [gram-negative bacterium] | [Streptogramin B]
Tlrcstrfr | Tylosin resistance ATP binding protein [TLRC] | Streptomyces fradiae [gram-negative bacterium] | [Tylosin]
Vgastaau | VgA protein | Staphylococcus aureus [gram-positive bacterium] | [Virginiamycin A-like antibiotics]

(a) Presumed substrates; protein confers resistance to specified compounds.
Phylogenetic tree
[Figure: phylogenetic tree of the family. Leaves, top to bottom: Vgastaau, Msrastaep, Srmbstram, Tlrcstrfr, Abcsacer, Lmrcstrli.]
Physical and genetic characteristics

CODE | AMINO ACIDS | MOL. WT | CHROMOSOMAL LOCUS
Abcsacer | 481 | 52 832 |
Lmrcstrli | 579 | 62 803 |
Msrastaep | 488 | 55 912 | Plasmid pUL5050
Srmbstram | 550 | 60 146 |
Tlrcstrfr | 548 | 59 129 |
Vgastaau | 522 | 60 184 |
Multiple amino acid sequence alignments

           1                                                   50
Vgastaau   ....MKIMLE GLNIKHYVQD RLLLNINRLK IYQNDRIGLI GKNGSGKTTL
Msrastaep  ...MEQYTIK FNQINHKLTD LRSLNIDHLY AYQFEKIALI GGNGTGKTTL
Srmbstram  ...MSIAQYA LHDITKRYHD CWLDRVGFS IKPGEKVGVI GDNGSGKSTL
Tlrcstrfr  MRTSPSSQLS LHGVTKRYDD RWLSQVSLA ISPGEKAGII GDNGAGKSTL
Abcsacer   ....MVNLIN LESVSKSYGV RPLLDEVSLG VGASDRIGVV GLNGGGKTTL
Lmrcstrli  MADASIVC TNLSFSWPDE TPVFDGLSFA LGDG.RCGLV GPNGAGKSTL
Consensus  .......... .......... ...L...... .......G.. G.NG.GK.TL

           51                                                 100
Vgastaau   LHILYKKIVP EEGIVKQFSH C......... .......... ..........
Msrastaep  LNMIAQKTKP ESGTVETNGE I......... .......... ..........
Srmbstram  LKILAGRVEP DNGALTVVAP GGVGYLAQTL ELPLDATVQD AVDLALSDLR
Tlrcstrfr  LRLLAGEERP DAGEVTVIAP GGVGYLPQTL GLPPRATVQD AIDLAMTELR
Abcsacer   LEVLSGSVDR DSGRVSHSRD LRMAVVTQRT ELPEGSTVRN AV........
Lmrcstrli  LRLAVGELTP TAGSIT...A QDMSVPAESL PL.IDGTVDE A.......WR
Consensus  L........P ..G....... .......... .......... ..........

           101                                                150
Vgastaau   .......... .......... .......... ........EL IPQLKL....
Msrastaep  .......... .......... .......... ........QY FEQLNMDVEN
Srmbstram  ELE.AAMREA EAELGESDEN GSERELSAGL QRYAALVEQY QARGGYEADV
Tlrcstrfr  VLE.AELRRT EAALAEA... ATDEALQDAL TAYARLTEQY EVRDGYGADA
Abcsacer   .LD.P..... .......... .......... .......HGF TAEHEWAADA
Lmrcstrli  SLHRAALHAI ESGDVDEAHF TT........ .......... .VGDHWDIEE
Consensus  .......... .......... .......... .......... ..........

           151                                                200
Vgastaau   .......... ........IE STKSGGEVTR NYIRQALDKN PELLLADEPT
Msrastaep  DFNTLDGSLM SELHIPMHTT DSMSGGEKAK YKLANVISNY SPILLLDEPT
Srmbstram  RVEVALHGLG LPSLDRDRKL GTLSGGERSR LALAATLASS PELLLLDEPT
Tlrcstrfr  RVDAALHGLG LPGLPRDRRL GTLSGGERSR LALAATLASQ PELLLLDEPT
Abcsacer   KVRSVLTGLG MTSLGLDTPV ADFSGGERRR VALAA.LVRE LDLLVLDEPT
Lmrcstrli  RTTIVLDRLG LGDVSLDRPL RSLSGGQVLA IGLAAQLLKR PDVLILDEPT
Consensus  ........L. .......... ...SGGE... ..LA...... ...L.LDEPT

           201                                                250
Vgastaau   TNLDNNYIEK LEQDLKNWHG AFIIVSHDRA FLDNLCTTIW EI.DEGRITE
Msrastaep  NHLDKIGKDY LNNILKYYYG TLIIVSHDRA LIDQIADTIW DIQEDGTIRV
Srmbstram  NDLDDRAMEW LEDHLAGHRG TVIAVTHDRV FLDRLTTTIL EV.DSGSVTR
Tlrcstrfr  NDLDDRAVHW LEEHLSGHRG TVVTVTHDRV FLDRLTATVL EV.DGRGVSR
Abcsacer   NHLDVEGVRW LADHLLQRRC ALVIVTHDRW FLDTVCNRTW EV.VQGRVEQ
Lmrcstrli  NNLDLAARQR LYQVVEEWKG ALLVVSHDRE LLDRV.DTIA EL.QASELRL
Consensus  N.LD...... L...L....G ....V.HDR. .LD....T.. E.........

           251                                                300
Vgastaau   YKGNYSNYVE QKELERHREE LEYEKYEKEK KRLEKAINIK EQKAQ...RA
Msrastaep  FKGNYTQYQN QYEQEQLEQQ RKYEQYISEK QRLSQASKAK RNQAQQMAQA
Srmbstram  YGNGYEGYLT AKAVERE... RRLREYEEWR AELDRNRGLI TSNVARMDGI
Tlrcstrfr  HGDGYAGYLA AKAAERR... RRQQQYDEWR AELDRNRRLA EANVARLDGI
Abcsacer   YEGGYRLGLR PRRAGPG WR ........ SRPRRSARTSPAR ....
Lmrcstrli  YGGNFTAYTE AVELEQENVQRAVLRADRSC~TSARRRRAQERAQRRASN
Consensus  ....Y..Y.. ....E..... .......... .......... ..........

           301                                                350
Vgastaau   TKKPKNLSLS EGKIKGAKPY FAGKQKKLRK TVKSLETRLE K.LESVEKRN
Msrastaep  SSKQKNKSIA PDRLSASKE. KGTVEKAAQK QAKHIEKRME .HLEEVEKPQ
Srmbstram  PRKMSLSVFG HGAYRRRGRD HGAMVRI..R NAKQRVAQLT E..NPVHAPA
Tlrcstrfr  PRKMGKAAFG HGAFRARGRD HGAMSRV..R NAKERVERLT A..NPVAPPA
Abcsacer   . ..SSRGCSGGAKART..S KPRFRVEAAEALIADVPPPR
Lmrcstrli  AKRNKVRRGCPVSTRAPLQRQAQESAG RAASVHQDRVSQAKAKLDEAS
Consensus  .......... .......... .......... .......... ........P.

           351                                                400
Vgastaau   ELPPLKMDL. VNL...ESVK NRTIIRGEDV SGTIEGRVLW KAKSFS..IR
Msrastaep  SYHEFNFPQN KIY...DIHN NYPII.AQNL TLVKGSQKLL TQVRFQ..IP
Srmbstram  DPLSFAARID T....AGPEA EEAVAELTDV ..RVAGR..L AVDSLT..IR
Tlrcstrfr  DRLSLTARIA T....ADGPG EAPAAELDGV ..VVGSR..L RVPKLR..LG
Abcsacer   DTVEL...VS F....AKRRL GKTVLELENV DLRIADR..V LLEDLTWLIG
Lmrcstrli  QGMREEARLA ITLPQTSVPA GRTVLTCHEA NVRYGERTLF TGSGVDLGIR
Consensus  .......... .......... .......... ......R... ........I.

           401                                                450
Vgastaau   GGDKMAIIGS NGTGKTTFIK KIV...HGNP G.ISLSPSVK IGYFSQKIDT
Msrastaep  YGKNIALVGA NGVGKTTLLE AIY...HQIE G.IDCSPKVQ MAYYRQLAYE
Srmbstram  PGERLLVTGP NGAGKSTLLR VLSGELEPDG GSVRVG..CR VGHLRQDETP
Tlrcstrfr  AAERLLITGP NGAGKSTLLS VLAGELSPDA GAVSVP..GR VGHLRQEETP
Abcsacer   PGDRIGLVGV NGSGKTTLLR LLAGERDADA GRRIEGKTVR LAHLTQELHD
Lmrcstrli  GPERIALLGP NGSGKSTLLK LIAGELEPSS GTVTAP.TDR VSYLSQRLDL
Consensus  ........G. NG.GK.TLL. .......... G......... .....Q....

           451                                                500
Vgastaau   LELDKSILEN VQ........ SSSQQNETLI RTILARMHFF RDDVYKPISV
Msrastaep  DMRDVSLLQY LM........ DETDSSESFS RAILNNLGL. NEALERSCNV
Srmbstram  WAPGLTVLRA FAQ....... GREGYLEDHA EKLLSLGLFS PSDLRRRVKD
Tlrcstrfr  WPAKLTVLEA FAH....... NRPGDRDEQA DRRLSLGLFE PEALRLRVGE
Abcsacer   LPGDWRVLEA IEDVAERVTL DKYELTASQL GERFGFG... KGRQWTPVSD
Lmrcstrli  LDLDASVLDN LRRFA..... ..PHLQDGEV RYRLAQFLFR GDRVHRTAGW
Consensus  .......L.. .......... .......... ...L...... ..........

           501                                                550
Vgastaau   LSGGERVKVA LTKVFLS..E VNTLVLDEPT NFLDMEAIEA FESLLKEYNG
Msrastaep  LSGGERTKLS LAVLFST..K ANMLILDEPT NFLDIKTLEA LEMFMNKYPG
Srmbstram  LSYGQRRRIE IARLVSD..P MDLLLLDEPT NHLTPVLVEE LEQALADYRG
Tlrcstrfr  LSYGQRRRIE LARLVSE..P VGLLLLDEPT NHLSPALVEE LEEALTGYGG
Abcsacer   LSGGERRSVQ LARLLMA..E PNVLVLDEPT NDLDIDTLQQ LEDLLDTWPG
Lmrcstrli  LSGGERLRAT LACVLSTDPA PQLLLLDEPT NNLDLNSAAQ LENALNAFQG
Consensus  LS.G.R.... LA........ ...L.LDEPT N.L....... LE..L....G

           551                                                600
Vgastaau   SIIFVSHDRK FIEKV..ATR IMTIDNKEIK IFDGTYEQFK QAEKPTRNIK
Msrastaep  IILFTSHDTR FVKHV..SDK KWELTGQSIH DIT....... ..........
Srmbstram  AVVVVTHDRR MRSRF..TGA RLTMGDGRIA EFSAG..... ..........
Tlrcstrfr  ALVLVTHDRR MRSRF..TGS HLELREGVVS GAR....... ..........
Abcsacer   TLVVVSHDRY LVERVCDTST RCSATGG... .......... ..........
Lmrcstrli  AFVVVSHDQA FL.RAIGVSR WLRLADGTLE EIAEADDAWP HRISEAAAAR
Consensus  ......HD.. .......... .......... .......... ..........

           601                                             647
Vgastaau   EDKKLLLETK ITEVLSRLSI EPSE.....E LEQEFQNLIN EKRNLDK
Msrastaep  .......... .......... .......... .......... .......
Srmbstram  .......... .......... .......... .......... .......
Tlrcstrfr  .......... .......... .......... .......... .......
Abcsacer   .......... .......... .......... .......... .......
Lmrcstrli  VVRGVNPVAR VRNWPYAVGI RPVC...... .......... .......
Consensus  .......... .......... .......... .......... .......
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated in boldface type are also conserved in at least one other family of the ABC transporter superfamily.
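The 75% rule stated above can be reproduced directly from the aligned rows. The following is a minimal sketch, not the authors' procedure: it assumes that gap characters never count toward the consensus and that the threshold is applied to the total number of aligned sequences (so with six sequences a residue must occur in at least five). The function name consensus is hypothetical. The input is alignment columns 201-250 of this entry.

```python
from collections import Counter

# Alignment columns 201-250 from this entry ('.' marks a gap).
BLOCK_201_250 = {
    "Vgastaau":  "TNLDNNYIEKLEQDLKNWHGAFIIVSHDRAFLDNLCTTIWEI.DEGRITE",
    "Msrastaep": "NHLDKIGKDYLNNILKYYYGTLIIVSHDRALIDQIADTIWDIQEDGTIRV",
    "Srmbstram": "NDLDDRAMEWLEDHLAGHRGTVIAVTHDRVFLDRLTTTILEV.DSGSVTR",
    "Tlrcstrfr": "NDLDDRAVHWLEEHLSGHRGTVVTVTHDRVFLDRLTATVLEV.DGRGVSR",
    "Abcsacer":  "NHLDVEGVRWLADHLLQRRCALVIVTHDRWFLDTVCNRTWEV.VQGRVEQ",
    "Lmrcstrli": "NNLDLAARQRLYQVVEEWKGALLVVSHDRELLDRV.DTIAEL.QASELRL",
}


def consensus(rows, num=3, den=4, gap="."):
    """Consensus string: a residue is kept if it fills >= num/den of a column."""
    rows = list(rows)
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            # Integer comparison avoids floating-point rounding at the threshold.
            if count * den >= num * len(rows):
                out.append(residue)
                continue
        out.append(gap)
    return "".join(out)


print(consensus(BLOCK_201_250.values()))
```

For this block the result agrees with the printed consensus row, N.LD......L...L....G....V.HDR..LD....T..E..........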
Database accession numbers

CODE | SWISSPROT | PIR | EMBL/GENBANK
Abcsacer | | S47441 | X80735
Lmrcstrli | | S44975 | X79146
Msrastaep | P23212 | S11158; YESAEE | X52085; G47001
Srmbstram | | S25202 | X63451
Tlrcstrfr | P25256 | JQ1142 | M57437; G153508
Vgastaau | | JC1204 | M90056
References
1 Ross, J.I. et al. (1990) Mol. Microbiol. 4, 1207-1214.
2 Rosteck, P.R. et al. (1991) Gene 102, 27-32.
3 Higgins, C.F. (1992) Annu. Rev. Cell Biol. 8, 67-113.
4 Ross, J.I. et al. (1995) Gene 153, 93-98.
H+-Dependent Symporters
H+/sugar-symporter-uniporter family
Summary
Transporters of the H+/sugar-symporter-uniporter family, the example of which is the GLUT1 facilitative glucose transporter of humans (Gtr1homsa), mediate either symport (H+-coupled substrate uptake) or uniport (facilitative uptake) of structurally dissimilar sugars, including mono- and disaccharides, aldohexoses and aldopentoses, and carboxylated compounds¹⁻⁶. They also serve as ion and water channels⁷,⁸. Possible transport of nicotinamide by the GLUT1 glucose transporter has also been reported⁹. Some members of the family are inhibited by forskolin, cytochalasin B, or both, while others are insensitive to both antibiotics⁵. Members of the H+/sugar-symporter-uniporter family have a broad biological distribution that includes bacteria, plants and humans.

Statistical analysis of multiple amino acid sequence comparisons places the H+/sugar-symporter-uniporter family in the uniporter-symporter-antiporter (USA) superfamily, also known as the major facilitator superfamily (MFS)¹⁰. Members of the H+/sugar-symporter-uniporter family are predicted to contain 12 membrane-spanning helices by the hydropathy of their amino acid sequences⁵, reaction with peptide-specific antibodies¹¹ and glycosylation-scanning mutagenesis¹². Eukaryotic proteins are glycosylated⁷ and may exist as oligomers¹³. There is considerable similarity between the sequences of the N- and C-terminal halves of these proteins, implying that they arose through duplication of a gene encoding an ancestral six-helix protein⁵,¹⁰.

Several amino acid sequence motifs are highly conserved in the H+/sugar-symporter-uniporter family, including motifs unique to the family, signature motifs of the USA/MFS superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis³,⁵,⁷,¹⁰.
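Prediction of membrane-spanning helices from hydropathy, as mentioned in the summary, is commonly done with a sliding-window average of per-residue hydropathy values. The sketch below is only an illustration of that idea. It assumes the Kyte-Doolittle scale and a 19-residue default window, neither of which is stated in this entry, and the function name hydropathy_profile is hypothetical. The test fragment is the start of Gtr1homsa as printed in alignment columns 101-135, with gaps removed.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the entry does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along an ungapped sequence."""
    if window < 1 or len(seq) < window:
        return []
    values = [KD[aa] for aa in seq]
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


# First 34 residues of Gtr1homsa (alignment columns 101-135, gaps removed).
fragment = "MEPSSKKLTGRLMLAVGGAVLGSLQFGYNTGVIN"
profile = hydropathy_profile(fragment)
best = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {best + 1}")
```

Windows whose mean exceeds a chosen cutoff are candidate membrane-spanning segments. The cutoff and window length are tuning choices, and a 34-residue fragment is too short to say anything about the 12-helix model.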
Nomenclature, biological sources and substrates

CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Araeescco | Arabinose-H+ symporter [ARAE] | Escherichia coli [gram-negative bacterium] | H+/arabinose
Araekleox | Arabinose-H+ symporter | Klebsiella oxytoca [gram-negative bacterium] | H+/arabinose
Gal2sacce | Facilitative galactose transporter [GAL2, IMP1] | Saccharomyces cerevisiae [yeast] | Galactose
Galpescco | Galactose-H+ symporter [GALP] | Escherichia coli [gram-negative bacterium] | H+/galactose
Glcpsynsp | Glucose transporter [GLCP, GTR] | Synechocystis sp. strain PCC 6803 [cyanobacterium] | Glucose
Glfzymmo | Glucose facilitated diffusion protein [GLF] | Zymomonas mobilis [gram-negative bacterium] | Glucose
Gtr1leido | Membrane transporter 1 [D1] | Leishmania donovani [trypanosome] | Glucose
Gtr1bosta | Facilitative glucose transporter type 1 [GTR1, GLUT1] | Bos taurus [cow] | Aldopentoses, aldohexoses
Gtr1galga | Facilitative glucose transporter type 1 [GTR1, GLUT1] | Gallus gallus [chicken] | Aldopentoses, aldohexoses
Gtr1homsa | Facilitative glucose transporter type 1 [GTR1, GLUT1, SLC2A1] | Homo sapiens [human] | Aldopentoses, aldohexoses
Gtr1musmu | Facilitative glucose transporter type 1 [GTR1, GLUT1, GT1] | Mus musculus [mouse] | Aldopentoses, aldohexoses
Gtr1orycu | Facilitative glucose transporter type 1 [GTR1, GLUT1] | Oryctolagus cuniculus [rabbit] | Aldopentoses, aldohexoses
Gtr1ratno | Facilitative glucose transporter type 1 [GTR1, GLUT1] | Rattus norvegicus [rat] | Aldopentoses, aldohexoses
Gtr1sussc | Facilitative glucose transporter type 1 [GTR1, GLUT1] | Sus scrofa [pig] | Aldopentoses, aldohexoses
Gtr2galga | Facilitative glucose transporter type 2 [GTR2, GLUT2] | Gallus gallus [chicken] | Glucose
Gtr2homsa | Facilitative glucose transporter type 2 [GTR2, GLUT2, SLC2A2] | Homo sapiens [human] | Glucose
Gtr2leido | Membrane transporter 2 [D2] | Leishmania donovani [trypanosome] | Glucose
Gtr2musmu | Facilitative glucose transporter type 2 [GTR2, GLUT2] | Mus musculus [mouse] | Glucose
Gtr2ratno | Facilitative glucose transporter type 2 [GTR2, GLUT2] | Rattus norvegicus [rat] | Glucose
Gtr2sacsp | Facilitative glucose transporter type 2 [GTR2, GLUT2] | Saccharum sp. [sugar cane] | Glucose
Gtr3canfa | Facilitative glucose transporter type 3 [GTR3, GLUT3] | Canis familiaris [dog] | Glucose
Gtr3galga | Facilitative glucose transporter type 3 [CEF-GT3, GTR3, GLUT3] | Gallus gallus [chicken] | Glucose
Gtr3homsa | Facilitative glucose transporter type 3 [GTR3, GLUT3, SLC2A3] | Homo sapiens [human] | Glucose
Gtr3musmu | Facilitative glucose transporter type 3 [GTR3, GLUT3] | Mus musculus [mouse] | Glucose
Gtr3oviar | Facilitative glucose transporter type 3 [GTR3, GLUT3] | Ovis aries [sheep] | Glucose
Gtr3ratno | Facilitative glucose transporter type 3 [GTR3, GLUT3] | Rattus norvegicus [rat] | Glucose
Gtr4homsa | Insulin responsive facilitative glucose transporter type 4 [GTR4, GLUT4, SLC2A4] | Homo sapiens [human] | Glucose
Gtr4musmu | Insulin responsive facilitative glucose transporter type 4 [GTR4, GLUT4] | Mus musculus [mouse] | Glucose
Gtr4ratno | Insulin responsive facilitative glucose transporter type 4 [GTR4, GLUT4, GT2] | Rattus norvegicus [rat] | Glucose
Gtr5homsa | Facilitative glucose transporter type 5 [GTR5, GLUT5, SLC2A5] | Homo sapiens [human] | Fructose
Gtr5orycu | Facilitative glucose transporter type 5 [GTR5, GLUT5] | Oryctolagus cuniculus [rabbit] | Fructose
Gtr5ratno | Facilitative glucose transporter type 5 [GTR5, GLUT5] | Rattus norvegicus [rat] | Fructose
Gtr7ratno | Facilitative glucose transporter type 7 [GTR7, GLUT7] | Rattus norvegicus [rat] | Glucose
Gtrkluma | Glucose transporter [GTR, KHT2] | Kluyveromyces marxianus [yeast] | Glucose
Hex6ricco | Hexose carrier protein [HEX6] | Ricinus communis [castor oil plant] | Hexoses
Hgt1klula | High-affinity glucose transporter [HGT1] | Kluyveromyces lactis [yeast] | Glucose
Hup1chlke | Hexose-H+ cotransporter [HUP1] | Chlorella kessleri [alga] | H+/hexose
Hup2chlke | Hexose transporter [HUP2] | Chlorella kessleri [alga] | Hexoses
Hxt0sacce | Hexose transporter [HXT10, YFL011W] | Saccharomyces cerevisiae [yeast] | Glucose
Hxt1sacce | High-affinity glucose transporter [HXT1, YHR094C] | Saccharomyces cerevisiae [yeast] | Glucose, mannose
Hxt2sacce | High-affinity glucose transporter [HXT2] | Saccharomyces cerevisiae [yeast] | Glucose
Hxt3sacce | High-affinity glucose transporter [HXT3] | Saccharomyces cerevisiae [yeast] | Glucose
Hxt4sacce | Low-affinity glucose transporter [HXT4, RAG1, LGT1, YHR092C] | Saccharomyces cerevisiae [yeast] | Glucose
Hxt5sacce | Probable glucose transporter [HXT5] | Saccharomyces cerevisiae [yeast] | Glucose
Hxt6sacce | Hexose transporter [HXT6] | Saccharomyces cerevisiae [yeast] | Glucose
Hxt7sacce | Hexose transporter [HXT7] | Saccharomyces cerevisiae [yeast] | Glucose
Hxt8sacce | Hexose transporter [HXT8, HRA569, YJL214W] | Saccharomyces cerevisiae [yeast] | Glucose
Hxtasacce | Low-affinity hexose transporter [HXT11, LGT3] | Saccharomyces cerevisiae [yeast] | Glucose
Hxtchlke | Hexose transport protein homolog | Chlorella kessleri [alga] | Hexoses
Hxtcsacce | Hexose transporter [HXT13, HXT8, YEL069C] | Saccharomyces cerevisiae [yeast] | Glucose
Hxtdsacce | Hexose transporter [HXT14, HXT9, NO345] | Saccharomyces cerevisiae [yeast] | Glucose
Itr1sacce | Myo-inositol transporter 1 [ITR1] | Saccharomyces cerevisiae [yeast] | Myo-inositol
Itr2sacce | Myo-inositol transporter 2 [ITR2] | Saccharomyces cerevisiae [yeast] | Myo-inositol
Lacpklula | Lactose permease [LACP, LAC12] | Kluyveromyces lactis [yeast] | Lactose
Ma3tsacce | Maltose permease [MAL3T, MA3T, MAL31, YBR298C, YBR2116] | Saccharomyces cerevisiae [yeast] | Maltose
Ma6tsacce | Maltose permease [MAL6T, MAL61] | Saccharomyces cerevisiae [yeast] | Maltose
Qayneucr | Quinate permease [QA-Y] | Neurospora crassa [mold] | Quinate
Qutdemeni | Quinate permease [QUTD] | Emericella nidulans [mold] | Quinate
Rag1klula | Low-affinity glucose transporter [RAG1] | Kluyveromyces lactis [yeast] | Glucose
Sgt1schma | Glucose transporter [SGT1, SGTP1] | Schistosoma mansoni [fluke] | Glucose
Sgt2schma | Glucose transporter [SGT2, SGTP2] | Schistosoma mansoni [fluke] | Glucose
Sgt4schma | Glucose transporter [SGT4, SGTP4] | Schistosoma mansoni [fluke] | Glucose
Snf3sacce | High-affinity glucose transporter [SNF3] | Saccharomyces cerevisiae [yeast] | Glucose, mannose, fructose
Stl1sacce | Sugar transporter [STL1] | Saccharomyces cerevisiae [yeast] | Glucose
Stp1arath | Glucose-H+ symporter [STP1] | Arabidopsis thaliana [mouse-ear cress] | H+/glucose
Stp4arath | Monosaccharide transporter [STP4] | Arabidopsis thaliana [mouse-ear cress] | Monosaccharides
Sugricco | Sugar carrier protein [RCSTC] | Ricinus communis [castor oil plant] | Hexoses
Tgtptaeso | Facilitative glucose transporter [TGTP1] | Taenia solium [tapeworm] | Glucose
Xyleescco | Xylose-H+ symporter [XYLE] | Escherichia coli [gram-negative bacterium] | H+/xylose

Cotransported ions are listed for known symporters.
Phylogenetic tree
Proteins listed below are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Araekleox (Araeescco); Gtr1musmu, Gtr1orycu, Gtr1sussc, Gtr1bosta, Gtr1ratno (Gtr1homsa); Gtr2musmu (Gtr2ratno); Gtr3ratno (Gtr3musmu); Gtr4ratno, Gtr4musmu (Gtr4homsa); Hxt6sacce (Hxt7sacce).
[Figure: phylogenetic tree of the family. Leaves, top to bottom: Itr1sacce, Itr2sacce, Gtr1leido, Sgt1schma, Tgtptaeso, Sgt4schma, Gtr5homsa, Gtr5ratno, Gtr5orycu, Gtr1galga, Gtr1homsa, Gtr3canfa, Gtr3oviar, Gtr3homsa, Gtr3musmu, Gtr3galga, Gtr4homsa, Gtr2homsa, Gtr2ratno, Gtr2galga, Gtr7ratno, Sgt2schma, Hxt0sacce, Hxt2sacce, Hxt4sacce, Hxt7sacce, Gtrkluma, Gal2sacce, Hxt1sacce, Hxt3sacce, Hxt5sacce, Rag1klula, Hxtasacce, Hxt8sacce, Hxtcsacce, Hxtdsacce, Snf3sacce, Qayneucr, Qutdemeni, Stp1arath, Sugricco, Mst1nicta, Stp4arath, Gtr2ricco, Hex6ricco, Hup1chlke, Hxtchlke, Hup2chlke, Araeescco, Galpescco, Glfzymmo, Xyleescco, Glcpsynsp, Hgt1klula, Lacpklula, Stl1sacce, Ma6tsacce, Gtr2leido.]
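The 90% identity cutoff used to exclude near-duplicate proteins from the tree can be illustrated with a pairwise identity count over aligned rows. The sketch below rests on one stated assumption: identity is counted only over columns where both sequences have a residue (the entry does not say how identity was computed). The function name percent_identity is hypothetical, and the inputs are the Gtr1homsa and Gtr1galga fragments printed in alignment columns 101-135, so the figure it prints describes that fragment only, not the full-length proteins.

```python
def percent_identity(a, b, gap="."):
    """Return (matches, compared columns, percent identity) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    matches = compared = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: columns with a gap in either row are skipped
        compared += 1
        if x == y:
            matches += 1
    percent = 100.0 * matches / compared if compared else 0.0
    return matches, compared, percent


# Alignment columns 101-135 as printed in this entry ('.' marks a gap).
GTR1_HOMSA = "MEPSSKKLTGRLMLAVGGAVLG.SLQFGYNTGVIN"
GTR1_GALGA = "MESGS.KMTARLMLAVGGAVLG.SLQFGYNTGVIN"

matches, compared, percent = percent_identity(GTR1_HOMSA, GTR1_GALGA)
print(f"{matches}/{compared} identical columns ({percent:.1f}%)")
```

Applied to full-length aligned sequences, pairs scoring at or above 90 would be collapsed to a single representative before tree building.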
Proposed orientation of human GLUT1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content⁵,¹²,¹³. The N-terminus of the protein is illustrated on the inside, and the protein is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in GLUT1.
[Figure: membrane topology model of human GLUT1. Labels: OUTSIDE, INSIDE, NH2, COOH; conserved residues shown: GR RG, S P R, GRR E, E F.]
Physical and genetic characteristics

CODE | AMINO ACIDS | MOL. WT | EXPRESSION SITES | Km | CHROMOSOMAL LOCUS
Araeescco | 472 | 51 684 | | Arabinose: 140-320 µM ¹⁵ | 64.17 minutes
Araekleox | 472 | 51 732 | | |
Gal2sacce | 574 | 63 738 | | | Chromosome 12
Galpescco | 464 | 50 982 | | Galactose: 42-49 µM ¹⁵ | 66.48 minutes
Glcpsynsp | 468 | 49 747 | | |
Glfzymmo | 473 | 50 200 | | |
Gtr1bosta | 492 | 54 131 | erythrocyte, brain | |
Gtr1galga | 490 | 54 086 | erythrocyte, brain | |
Gtr1homsa | 492 | 54 117 | erythrocyte, brain | Glucose: 17 mM; 2-Deoxyglucose: 7 mM; 3-Methylglucose: 18 mM ²,¹⁶ | 1p35-p31.3
Gtr1leido | 547 | 58 787 | | |
Gtr1musmu | 492 | 53 991 | erythrocyte, brain | |
Gtr1orycu | 492 | 54 097 | erythrocyte, brain | |
Gtr1ratno | 492 | 53 962 | erythrocyte, brain | Glucose: 20 mM; 2-Deoxyglucose: 5 mM; 3-Methylglucose: 20 mM; Fructose: 17 mM ²,¹⁶ |
Gtr1sussc | 451 | 49 777 | erythrocyte, brain | |
Gtr2galga | 533 | 57 699 | liver, intestine, kidney, β-cells | |
Gtr2homsa | 524 | 57 489 | liver, intestine, kidney, β-cells | Glucose: 66 mM; 2-Deoxyglucose: 17 mM; 3-Methylglucose: 42 mM; Fructose: 36 mM ²,¹⁶ | 3q26.1-q26.3
Gtr2leido | 558 | 60 215 | promastigotes | |
Gtr2musmu | 523 | 57 075 | liver, intestine, kidney, β-cells | |
Gtr2ratno | 522 | 57 085 | liver, intestine, kidney, β-cells | 2-Deoxyglucose: 7 mM ² |
Gtr2ricco | 518 | 55 737 | | |
Gtr3canfa | 495 | 54 282 | brain | |
Gtr3galga | 496 | 54 174 | brain | |
Gtr3homsa | 496 | 53 924 | brain | 2-Deoxyglucose: 2 mM; 3-Methylglucose: 11 mM; Fructose: 6 mM ²,¹⁶ | 12p13.3
Gtr3musmu | 493 | 53 478 | brain | |
Gtr3oviar | 494 | 54 194 | brain | |
Gtr3ratno | 493 | 53 580 | brain | |
Gtr4homsa | 509 | 54 787 | muscle, adipocytes | Glucose: 5 mM; 2-Deoxyglucose: 5 mM ¹⁶ | 17p13
Gtr4musmu | 510 | 54 951 | muscle, adipocytes | |
Gtr4ratno | 509 | 54 895 | muscle, adipocytes | Glucose: 2 mM; 3-Methylglucose: 3 mM ¹⁶ |
Gtr5homsa | 501 | 54 974 | intestine | Fructose: 6 mM ² | 1p31
Gtr5orycu | 486 | 53 854 | intestine | |
Gtr5ratno | 502 | 55 543 | intestine | |
Gtr7ratno | 528 | 58 393 | liver microsomes | |
Gtrkluma | 566 | 62 727 | | |
Hex6ricco | 510 | 55 594 | | |
Hgt1klula | 551 | 60 761 | | |
Hup1chlke | 533 | 57 463 | | |
Hup2chlke | 540 | 58 343 | | |
Hxt0sacce | 546 | 60 662 | | | Chromosome 6
Hxt1sacce | 570 | 63 261 | | | Chromosome 8
Hxt2sacce | 541 | 59 840 | | | Chromosome 13
Hxt3sacce | 567 | 62 557 | | | Chromosome 4
Hxt4sacce | 576 | 63 910 | | | Chromosome 8
Hxt5sacce | 592 | 66 251 | | | Chromosome 8
Hxt6sacce | 570 | 62 719 | | | Chromosome 4
Hxt7sacce | 570 | 62 735 | | | Chromosome 4
Hxt8sacce | 569 | 63 492 | | | Chromosome 10
Hxtasacce | 567 | 62 857 | | | Chromosome 15
Hxtchlke | 534 | 57 771 | | |
Hxtcsacce | 564 | 62 734 | | | Chromosome 5
Hxtdsacce | 515 | 58 153 | | | Chromosome 14
Itr1sacce | 584 | 63 605 | | Myo-inositol: 100 µM ⁶ | Chromosome 4
Itr2sacce | 612 | 67 041 | | Myo-inositol: 400 µM ⁶ | Chromosome 14
Lacpklula | 587 | 65 383 | | Lactose: 1 mM ⁶ |
Ma3tsacce | 614 | 68 262 | | |
Ma6tsacce | 614 | 68 225 | | |
Qayneucr | 537 | 60 103 | | |
Qutdemeni | 533 | 59 476 | | |
Rag1klula | 567 | 63 123 | | |
Sgt1schma | 521 | 56 815 | | |
Sgt2schma | 489 | 54 060 | | |
Sgt4schma | 505 | 55 009 | | |
Snf3sacce | 884 | 96 718 | | | Chromosome 4
Stl1sacce | 536 | 59 943 | | | Chromosome 4
Stp1arath | 522 | 57 596 | | |
Stp4arath | 514 | 57 095 | | |
Sugricco | 523 | 57 768 | | |
Tgtptaeso | 510 | 55 191 | | |
Xyleescco | 491 | 53 608 | | Xylose: 70-170 µM ¹⁵ | 91.29 minutes
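The Km values in the table can be compared through the Michaelis-Menten relation v/Vmax = [S]/(Km + [S]). The sketch below is illustrative only: it assumes simple Michaelis-Menten kinetics, which this entry does not state, and a hypothetical substrate concentration of 5 mM glucose. The Km values are the glucose entries listed in the table for Gtr1homsa, Gtr2homsa and Gtr4homsa; the function name fractional_rate is hypothetical.

```python
def fractional_rate(substrate_mM, km_mM):
    """v/Vmax under the assumed Michaelis-Menten relation S / (Km + S)."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return substrate_mM / (km_mM + substrate_mM)


# Glucose Km values (mM) as listed in the table above.
GLUCOSE_KM_MM = {"Gtr1homsa": 17.0, "Gtr2homsa": 66.0, "Gtr4homsa": 5.0}

GLUCOSE_MM = 5.0  # hypothetical concentration chosen for illustration
for code, km in GLUCOSE_KM_MM.items():
    rate = fractional_rate(GLUCOSE_MM, km)
    print(f"{code}: v/Vmax at {GLUCOSE_MM} mM glucose = {rate:.2f}")
```

A transporter runs at half of its maximal rate when the substrate concentration equals its Km, so at a fixed concentration a lower Km gives a higher fractional rate.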
Multiple amino acid sequence alignments

           1                                                   50
Itr1sacce  .......... .....MGIHI PYLTSKTSQS NVGDAVGNAD SVEFN.....
Itr2sacce  MAEMKNSTAA SSRWTKSRLS HFFPSYTNSS GMGAASTDQS STQGEELHHR
Hxt4sacce  .......... .......... .......... .....MS..E EAAYQEDTAV
Hxt7sacce  .......... .......... .......... .....MS..Q DAAIAEQTPV
Gtrkluma   .......... .......... .......... .....MSELE TGTAAHGTPV
Gal2sacce  .......... .......... .......... .......... MAVEENNVPV
Hxt1sacce  .......... .......... .......... .......... MNSTPDLISP
Hxt3sacce  .......... .......... .......... .......... MNSTPDLISP
Hxt5sacce  .......... .......... .....MSELE NAHQGPLEGS ATVSTNSNSY
Rag1klula  .......... .......... .......... ......MSNQ MTDSTSAGSG
Hxtasacce  .......... .......... .......... .......... ......MSGV
Hxt8sacce  .......... .......... .......... .......... MTDRKTNLPE
Snf3sacce  .......... .....MDPNS NSSSETLRQE KQGFLDKALQ RVKGIALRRN

           51                                                 100
Itr1sacce  ...SEHDSPS KRGKIHIESH EIQ...RAPA SDDEDRIQIK PVNDEDDTSV
Itr2sacce  KHCEEDNDGQ KPKKSPVSTS TMQIKSRQDE DEDDGRIVIK PVNDEDDTSV
Sgt1schma Gtr5homsa Gtr5ratno Gtr4homsa Gtr2galga Hxt0sacce Hxt2sacce Hxt4sacce Hxt7sacce Gtrkluma Gal2sacce Hxt1sacce
Itrlsacce Itr2sacce Gtrlleido Sgtlschma Tgtptaeso Sgt4schma Gtr5homsa Gtr5ratno Gtr5orycu Gtrlgalga Gtrlhomsa Gtr3musmu
................................................. M ............................................ MEQQDQ ............................................ MEKEDQ ...................................... MP S G F Q Q I G S E D ............................................. MDGKS ......... M VSSSVSILGT SAKASTSLSR K...DEIKLT PETREASLDI ..... M S E F A T S R V E S G S Q Q T S I H S T P I V Q K L E T D E S P I Q T K S E Y T N A E L QNTPADALSP VES..DSNSA LSTPSNKAER DDMKDFDENH EESNNY.VEI EH ..... L S A V D S . . A S H S V L S T P S N K A E R D E I K A Y G E G . E E H E P V . V E I EN ......... KS..VSSSQASTPTNVGSRDDLKVDDDNH ..SVDA.IEL VSQQPQAGED VISSLSKDSH LSAQSQKYSN DELKAGESGP EGSQSVPIEI Q K S N S S N S Y E L E S G R S K A M N T P E ...... G K N E S F H D N L S E S Q V Q P A V A P Q K S S E N S N A D L P S N S S Q V M N M P E ...... E K . . G V Q D D F . Q A E A D Q V L T N NEKSGNSTAP GTAGYNDNLA QAKPVSSYIS HEGPPKDELE ELQKEVDKQL T E H S V D T N T A L K A G S P N D L K . . . . . . . . VS HE .... E D L N D L E K T A E E T L N N T S A N D L S T T E S N S N S V A N A P .... S V K T E H N D S K N S L N L D A T E P P I D L E P I F E E A E D D G C P S I E N S S H L S V P T V E E N K D F S E Y N G E E A EE ..... V V V .... M S S A Q S S I D S D G D V R D A D I H V A P P V E K E W S D G F D D N E V I N G D N V E P ........................... MGR DYNVTIKYLD DKEENIEGQA NSNKDHTTDD TTGSIRTPTS LQRQNSDRQS NMTSVFTDDI STIDDNSILF ......................................... MTLLALKED ......................................... MSILALVED ..................................... MPA GGFVV..GDG ..................................... MPA VGGIPPSGGN ..................................... MAG GGGIGP..GN ....................................... M AGGFVSQTPG ................................... MAGGG MAALGVKTER ........................................ MAAGLAITSE ................................ MAGGGVVVVSGRGLSTGD ................................. MAGGAIV ASGGASRSSE ................................. 
MAGGGPV ASTTTNRASQ ........................................ MVTINTESAL ............................................... MPD ............................................. MNPSS ................................ MSLKNWLL LRDIQYEGTF ....................... MADHSSS SSSLQKKPIN TIEHKDTLGN .MKGLSSLIN RKKDRNDSHL DEIENGVNAT EFNSIEMEEQ GKKSDFDLSH ......................... MTLKK RSSAPELPTS LDEDEEEDSP ..................................................
           101                                                150
Itr1sacce  MITFNQSLSP FIITLTFVAS IS.GFMFGYD TGYIS..... ..........
Itr2sacce  IITFNQSISP FIITLTFVAS IS.GFMFGYD TGYIS..... ..........
Gtr1leido  .......... MRASVMLCAA LG.GFLFGYD TGVIN..... ..........
Sgt1schma  GVASNNGITG KLVLTVLITC VGSSFLIGYN LGVLN..... ..........
Tgtptaeso  ....MKGISG PLVLAIFTTC FGSSFLLGYN LGVAN..... ..........
Sgt4schma  .MGSGKKFTK SLSLSVLLAC LGSSFTIGYN LGVLN..... ..........
Gtr5homsa  S.MKEGRLTL VLALATLIAA FGSSFQYGYN VAAVN..... ..........
Gtr5ratno  E..KTGKLTL VLALATFLAA FGSSFQYGYN VAAVN..... ..........
Gtr5orycu  EKKKEGRLTL VLALRTLIAA FGSSFQYAYN VSVCN..... ..........
Gtr1galga  MESGS.KMTA RLMLAVGGAV LG.SLQFGYN TGVIN..... ..........
Gtr1homsa  MEPSSKKLTG RLMLAVGGAV LG.SLQFGYN TGVIN..... ..........
Gtr3musmu  ..MGTTKVTP SLVFAVTVAT IG.SFQFGYN TGVIN..... ..........
Gtr3canfa  ..MGTQKVTV SLIFALSIAT IG.SFQFGYN TGVIN..... ..........
Gtr3oviar  ..MGTTKVTT PLIFAISIAT IG.SFQFGYN TGVIN..... ..........
Gtr3homsa  ..MGTQKVTP ALIFAITVAT IG.SFQFGYN TGVIN..... ..........
Gtr3galga  .MADKKKITA SLIYAVSVAA IG.SLQFGYN TGVIN..... ..........
Gtr4homsa  GEPPQQRVTG TLVLAVFSAV LG.SLQFGYN IGVIN..... ..........
Gtr2homsa  ..MTEDKVTG TLVFTVITAV LG.SFQFGYD IGVIN..... ..........
Gtr2ratno  ..MSEDKITG TLAFTVFTAV LG.SFQFGYD IGVIN..... ..........
Gtr2galga  KMQAEKHLTG TLVLSVFTAV LG.FFQYGYS LGVIN..... ..........
Gtr7ratno  ..MSDDSLTA TLSLFVFTAV LG.SFQFGYD IGVIN..... ..........
Sgt2schma  ......MRQL KFFLPYCIIT LGSSFPFGYH TGVIN..... ..........
Hxt0sacce  PYKPIIAYWT VMG.LCLMIA FG.GFIFGWD TGTIS..... ..........
Hxt2sacce  PAKPIAAYWT VIC.LCLMIA FG.GFVFGWD TGTIS..... ..........
Hxt4sacce  PKKPASAYVT VSI.CCLMVA FG.GFVFGWD TGTIS..... ..........
Hxt7sacce  PKRPASAYVT VSI.MCIMIA FG.GFVFGWD TGTIS..... ..........
Gtrkluma   PKKPRSAYIT VSI.LCLMVA FG.GFVFGWD TGTIS..... ..........
Gal2sacce  PKKPMSEYVT VSL.LCLCVR FG.GFMFGWD TSTIS..... ..........
Hxt1sacce  PNTGKGVYVT VSI.CCVMVA FG.GFIFGWD TGTIS..... ..........
Hxt3sacce  PNTGKGAYVT VSI.CCVMVA FG.GFVFGWD TGTIS..... ..........
Hxt5sacce  EKKSKSDLLF VSV.CCLMVA FG.GFVFGWD TGTIS..... ..........
Rag1klula  QQKPAKEYIF VSL.CCVMVA FG.GFVFGWD TGTIS..... ..........
Hxtasacce  PQKPLSAYTT VAI.LCLMIA FG.GFIFGWD TGTIS..... ..........
Hxt8sacce  PEKPASAYAT VSI.MCLCMA FG.GFMSGWD TGTIS..... ..........
Hxtcsacce  PKRGLIGYLV IYL.LCYPIS FG.GFLPGWD SGITA..... ..........
Hxtdsacce  AKISHNASLH IPVLLCLVIS LG.GFIFGWD IGTIG..... ..........
Snf3sacce  SEPPQKQSMM MSICVGVFVA VG.GFLFGYD TGLIN..... ..........
Qayneucr   RPTPKAVYNW RVYTCAAIAS FA.SCMIGYD SAFIG..... ..........
Qutdemeni  RPTPREVYNW RVYLLAAVAS FT.SCMIGYD SAFIG..... ..........
Stp1arath  QKAYPGKLTP FVLFTCVVAA MG.GLIFGYD IGISG..... ..........
Sugricco   RKVYPGNLTL YVTVTCVVAA MG.GLIFGYD IGISG..... ..........
Mst1nicta  GKEYPGNLTL YVTVTCIVAA MG.GLIFGYD IGISG..... ..........
Stp4arath  VRNYNYKLTP KVFVTCFIGA FG.GLIFGYD LGISG..... ..........
Gtr2ricco  AAQYKGRMTL AVAMTCLVAA VG.GAIFGYD IGISG..... ..........
Hex6ricco  GGQYNGRMTS FVALSCMMAA MG.GVIFGYD IGVSG..... ..........
Hup1chlke  Y...RGGLTV YVVMVAFMAA CG.GLLLGYD NGVTG..... ..........
Hxtchlke   Y...QGGLTA YVLLVALVAA CG.GMLLGYD NGVTG..... ..........
Hup2chlke  YGYARGGLNW YIFIVALTAG SG.GLLFGYD IGVTG..... ..........
Araeescco  TPRSLRDTRR MNMFVSVAAA VA.GLLFGLD IGVIA..... ..........
Galpescco  AKKQGRSNKA MTFFVCFLAA LA.GLLFGLD IGVIA..... ..........
Glfzymmo   ..MSSESSQG LVTRLALIAA IG.GLLFGYD SAVIA..... ..........
Xyleescco  ..MNTQYNSS YIFSITLVAT LG.GLLFGYD TAVIS..... ..........
Glcpsynsp  SPSQSTANVK FVLLISGVAA LG.GFLFGFD TAVIN..... ..........
Hgt1klula  YKKFPHVYNI YV..IGFIAC IS.GLMFGFD IASMS..... ..........
Lacpklula  DRDHKEALNS DNDNTSGLKI NGVPIEDARE EVLLP..... ..........
Stl1sacce  .......... .......... ....MKDLKL SNFKG..... ..........
Ma6tsacce  LEYGPGSLIP NDNNEEVPDL LDEAMQDAKE ADESE..... ..........
Gtr2leido  QPLSNTPFFS MKNLIVATPI ILTPLLYGYN LGFVGPYSTM YGYASNCQLY
Consensus  .......... .......... .G.....G.. .G........ ..........
Itrlsacce Itr2sacce Gtrlleido Sgtlschma Tgtptaeso
151
.... S A L I S I SALISI .... A A L F Q M .... L P R R N I .... L P G D N I
200
GT ...................................... NR KD ...................................... EIYFNETVVP .NTPE ....... LDS ............... KKFLVNYYKP DNSSA ....... LNA ...............
.... .....
Sgt4schma
.... L P G E N I
Gtr5ratno
.... S P S E F M
QQFYNDTYYD
.... R P Q K V I
EDFYNHTWLY
Gtr5homsa
Gtr5orycu
Gtrlgalga
EEFYNQTWVH
Gtr3canfa
.... A P E T I I
KDFLNYTLEE
Gtr3homsa
.... A P E K I I
Gtr3musmu
Gtr3oviar
.... A P E A I I .... A P E K I I
RYE ...........................
RYG ...........................
KDFLNYTLEE
RLE ...........................
KDFLNYTLEE
RSE ...........................
QAFYNRTLSQ
RSG ...........................
KEFINKTLTD
KSE ...........................
KGN ...........................
EQSYNETWLG
RQGPE
.... A P Q E V I
ISHYRHVLGV
PLDDRRATIN
Gtr2galga
.... A P Q K V I
EAHYGRMLGA
.... A P A D L I
KSFINTTLAARSV
.... G F V N . Q
TDFKRRF..G
Sgt2schma
...............
RTG ...........................
.... A P Q K V I
.... A P Q E V I
NLVTP
RNK ...........................
.... A P Q Q V I
Gtr7ratno
ISHYRHVLGV
ISHYRHVLGV
.........................
PLDDRKAINN
YVINSTDELP
IPMVRHATNT
SRDNATITVT
YDINGTDT.P
PLDDRRATIN
YDINGTDT.P
TISYSMNPKP LIVTPAHTTP
IPGTEAWGSS
LIVTPAHTTP
...........................
Hxt0sacce
.... G F I N . Q T D F K R R F . . G E L Q . . R . D G S
F ...................
Hxt4sacce
.... G F V A . Q
TDFIRRF..G
M.K..HHDGT
Y ...................
Gtrkluma
.... G F V N . Q
TDFIRRF..G
Q.E..KADGS
H ...................
Hxt2sacce Hxt7sacce
.... G F I N . Q
TDFIRRF..G
Hxt3sacce
.... G F V A . Q T D F L R R F . . G M . K . . H K D G S
Raglklula
.... G F V N . Q
Hxtasacce
H ...................
.... G F V A . Q T D F L R R F . . G M . K . . H H D G S
.... G F V R . Q
TDFIRRF..G
.... G F V N . L
SDFIRRF..G
H ................... Y ...................
S.T..RANGT
T ...................
Q.K..NDKGT
Y ...................
TDFLRRF..G
Q.E..KADGS
.... G F V N . Q
TDFLRRF..G
NYS..HSKNT
Hxtdsacce
.... G M T N . M
VSFQEKF..G
TTNIIHDDET
H ................... Y ...................
.... G F I N . M D N F K M N F . . G S Y K . . H S T G E
Snf3sacce
.... S I T S . M
Qutdemeni
.... T T L S . L
Qayneucr
N ...................
M.K..HKDGT
Hxt8sacce
Hxtcsacce
Y ...................
M.K..HKDGT
.... G F V V . Q
Hxt5sacce
TDFLRRF..G
QMK..S.DGT
Gal2sacce
Hxtlsacce
.... T T L A . L
..... NY..V
PSFTKEF
KSHVAPNHDS
............
QSFQNEF
............
Y ...................
I ...................
F ...................
D F ...................
N W ...................
Stplarath
.... G V T S . M
PSFLKRFFPS
VY.RKQQEDA
Mstlnicta
.... G V T S . M
DSFLSRFFPS
VF.RKQKADD
S ...................
Gtr2ricco
.... G V T S . M
DPFLEKFFPV
VF.HR.KNSG
G ...................
Sugricco
Stp4arath
Hex6ricco
.... G V T S . M .... G V T S . M .... G V T S . M
DSFLKKFFPS
EPFLEEFFPY
VY.RKKKADE VY.KKMKSAH
DPFLKKFFPD
VY.RKMKEDT
EQFERKFFPD
VYEKKQQIVE
Huplchlke
.... G V V S . L
EAFE.KFFPDVWAKKQEVHE
Hup2chlke
.... G V T S . M
PEFLQKFFPS
Hxtchlke
Araeescco
.... A P E T I L
KNASEAENTA
RTG ...........................
Gtr2homsa Gtr2ratno
TEFYNDTYYD
.... A P Q K V I
Gtr4homsa
.......
.... S P S E L M
KEFLSRTMLG
QQFYNETYYG
Gtrlhomsa
Gtr3galga
.... S P A L L M
Galpescco
Glfzymmo Xyleescco
Glcpsynsp
.... G V A S . M
.... G A L P . . . . . . . .
.... G A L P . . . . . . . .
FITD
FIAD
IYDRTQQPSD
S ................... ....................
E ................... ....................
....................
S ...................
HFV ........................... E F Q . . . . . ._. . . . . . . . . . . . . . . . . . .
AZCT
PVDI H~ZAPRHLSA
GAVA
ALQK
.... G T V E . . . . . . . .
S ...................
SLNT VFVAPQNLSE
S ...................
HF ........
Q
...CFITYLC
AT...MQGYD
Hgtlklula
.... S M I G . T
DVY .....................................
Stllsacce
.... K F I S R T
SHW..GLTGK
Gtr21eido
SAKKSCETLTAAKCRWFNAS
Lacpklula
Ma6tsacce
Consensus
.... G Y L S K .
RGMPLM
QYY..KLYGL
KLRYFITIAS
TALKTYPKAAAWSLLVSTTL
TYVSNTTYGE
GALMGSIYTE
MTGFSLFGYD
QGLMASLITG
VCGWADRTTC
FLKYSDEAGC
IQEGYDTAIL
..................................................
GAFYALPVFQ
....................
Itrlsacce
SIFAGTAADI
............ HFGFSEHS ......................
..W.QYALIV SFFYTHVS
AIAIAGAFVG TIFVVAAAIG
AFISGFISAA AFSCGWVADG
Sgt4schma
......................
SFLYAQVS
TAFVVAGAIG
AFSCGAIADC
Gtr5ratno Gtr5orycu Gtrlgalga
............ ............ ............
..TLLWSLTV ..TLLWSVTV ..TTLWSLSV
SMFPFGGFIG SMFPSGGFAG AIFSVGGMIG
SLMVGFLVNN SLLVGPLVNK SFSVGLFVNR
..TSLWSLSV
AIFSVGGMIG
Itr2sacce
Gtrlleido Sgtlschma
Gtr5homsa
...........
.......... ..
........................... ...........
........... ...............
Gtrlhomsa Gtr3musmu
............ ..... i:-..~-))-:" .................... .....................
Gtr3canfa Gtr3oviar
.................
Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa .................
...........
............
Gtr2ratno Gtr2galga Gtr7ratno Sgt2schma Hxt0sacce
Hxt2sacce Hxt4sacce
Snf3sacce
Qayneucr Qutdemeni Stplarath
Sugricco Mstlnicta Stp4arath Gtr2ricco Hex6ricco Huplchlke Hxtchlke Hup2chlke Araeescco
Galpescco Glfzymmo Xyleescco Glcpsynsp
..YGEKELITAATSLGALIT
NFLYGQVT
EFMEDFPL
..TLLWSVTV
............ ............
ESILPTTL DLPSEGLL
..TTLWSLSV ..TALWSLCV
............ ............
ENIESFTL ELIDEFPL EPISPATL
NLPTEVLL
TPPSSVLL
..TSLWSLSV
............ APPSEVLL ............ ETISPELL .......... GPSSIPPGTL T P W A E ..... E E T V A A A Q L I
..TSLWSLSV ..TSLWSLSV ..TTLWALSV ..TMLWSLSV
D A W . E ..... E E T E G S A H I V EGTLAPSAGF EDPTVSPHIL DAWE ...... EETEGSAHIV ............. TCDERFI
..TMLWSLSV ..TMYWSLSV ..TMLWSLSV ..DLLWSLCV
............ ............
QLS ....... YLS .......
............ ............ ............
YLS ....... YLS ....... YLS .......
............
Hxtcsacce
DLDNKVLT
............
Hxt5sacce
Raglklula Hxtasacce Hxt8sacce
DLDHKVLT
......................
............ ............ ............
Hxtdsacce
............
Hxt7sacce Gtrkluma
Gal2sacce Hxtlsacce Hxt3sacce
............
250
..YGEKEIVTAATSLGALIT
Tgtptaeso
201
............ ............ ............
............
DVRTGLIV DVRTGLIV
YLS ....... YLS ....... YLS .......
KVRTGLIV KVRTGLIV NVRTGLIV
YLS .......
DVRTGLMV
SVLVICAAIA
SMFPFGGFIG
AIFSVGGMIG AIFSVGGMIG AIFSVGGMIG
AIFSVGGMIG AIFSVGGMIG AIFSVGGMIS SSFAVGGMTA
SFSVGLFVNR SFSVGLFVNR
SFSVGLFVNR SFSVSLFFNR SFLIGIISQW SFFGGWLGDT
SSFAVGGMVA SMFAVGGMVS SSFAVGGMVA TSFLLGGFFG
SFFGGWLGDK SFTVGWIGDR SFFGGWLGDK GLIGGVLANK
SIFNIGCAIG SIFNIGCAIG SIFNIGCAIG
GIILAKLGDM GIILSKLGDM GIILSKLGDM
SIFNIGCAIG
GIVLSKLGDM
GIFNIGCALG GIFNIGCAFG
GLTLGRLGDI GLTLGRLGDM
SIFNIGCAVG SIFNIGCAIG SIFNVGSAIG
GIVLSNIGDR GIVLSKVGDI CLFLSKLGDI
SIFNISCGVG SFLSLGTFFG
ALTLSKIGDW ALTAPFISDS
..SQTLTMFT SSLYLAALIA ..SQTLTMFT SSLYLAALLS ..SQLLTLFT SSLYVAALVS ..NQGLAAFT SSLYLAGLVA ..SQLLTSFT SSLYVAGLVA ..NAKLQLFV SSLFLAGLVS ..NPKLQLFV SSLFLAGLIS ..DQKLQLFT SSFFLAGMFV ..SRLQEWVVSSMMLGAAIG
SLVASTITRK SLVASTVTRK SLFASTITRV SLVASPVTRN SFFASSVTRA CLFASWITRN CIFSAWITRN SFFAGSVVRR ALFNGWLSFR
NVRMGLLV
AMFSIGCAIG
............ ............ ............
ASYTPGAL ..ALLQSNIV ESLNTD ..... LISANIV TNQYCQYD ..SPTLTMFT
SVYQAGAFFG SLYQRGAFFG SSLYLAALIS
............ ............ ............ ............
I A A T
T A A D
SFSVGLFVNR SFSVGLFVNR
NVRTGLIV KVRMGLIV NVRTGLIV
YLS .......
...... ...... ...... ......
SLLVGPLVNK
AIFNIGCAFG SIFNIGCAIG SIFNIGCAIG
YLS ....... YLS ....... YLS .......
SNQYCQYD TNQYCKFD ENEYCRFD KNNYCKYD ISNYCKFD DSPYCTYD TSPYCTYD KDPYCTYD L ...... T
AFTCGWVADG
NVRTGLIV KVRTGLIV KVRTGLIV
............ FVSTKKLT ..DLQIGLII ................... T ..AQQMSILV
............ ............ ............ ............ ............ ............ ............ ............ ............
SVGAGTAADV
..SHTQEWVV SSMMFGAAVG ..ASLSGMVVVAVLVGCVTG ..NSLLGFCV ASALIGCIIG ..SLLTGLSV SLALLGSALG
GIILSKGGDM GIVLAKLGDM GIILAKLGDM
GLIFARLADT
CLFAYATSYF ALFAYPIGHF SLVASTVTRK
AVGSGWLSFK SLLSGWIGIR GALGGYCSNR AFGAGPIADR
H+/sugar-symporter-uniporter family Hgtlklula Lacpklula Stllsacce Ma6tsacce Gtr21eido Consensus
............ KDYFSNPD ..SLTYGGIT ASMAGGSFLG SLISPNFSDA DAYLKYY ....... HLDINS ..SSGTGLVF SIFNVGQICG AFFVPL.MDW KQFNYEFPAT KENGDHDRHA ..TVVQGATT SCYELGCFAG SLFVMFCGER KKYGSLNSNT GDYEISVSWQ ..IGLCLCYM AGEIVGLQVT GPSVDYMGNR LSDSACKWSY SANTCGNQVG YSSIQSGVFA GSLVIGSTMG ALMGGYLTKR ................................... G ..............
Itrlsacce Itr2sacce Gtrlleido Sgtlschma Tgtptaeso Sgt4schma Gtr5homsa Gtr5ratno Gtr5orycu Gtrlgalga Gtrlhomsa Gtr3musmu Gtr3canfa Gtr3oviar Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa Gtr2ratno Gtr2galga Gtr7ratno Sgt2schma Hxt0sacce Hxt2sacce Hxt4sacce Hxt7sacce Gtrkluma Gal2sacce Hxtlsacce Hxt3sacce Hxt5sacce Raglklula Hxtasacce Hxt8sacce Hxtcsacce Hxtdsacce Snf3sacce Qayneucr Qutdemeni Stplarath Sugricco Mstlnicta Stp4arath Gtr2ricco Hex6ricco Huplchlke
251 300 FGRKRCLMGS NLMFVIGAI .................. LQV SAHTFWQMAV F G R R P C L M F S N L M F L I G A I . . . . . . . . . . . . . . . . . . LQI T A H K F W Q M A A FGRRPCIAVA DALFVIGSV .................. LMGAAPNVEVVLV LGRRNGLILN NVIGIIGGV ............... IVGPCV LVKQPALLYV LGRKRSLMVNNGIGIVGSV ............... ISSVCV VANQPALLYV LGRRNGLIVN SLLAIIGGI ............... LVGPCV AYSQPALLFV FGRKGALLFN NIFSIVPAI ............... LMGCSR VATSFELIII LGRKGALLFN NIFSILPAI ............... LMGCSK IAKSFEIIIA FGRKGALLFN NIFSIVPAI ............... LMGCSK VARSFELIII FGRRNSMLMS NILAFLAAV ............... LMGFSK MALSFEMLIL FGRRNSMLMM NLLAFVSAV ............... LMGFSK LGKSFEMLIL FGRRNSMLLV NLLAIIAGC ............... LMGFAK IAESVEMLIL FGRRNSMLMV NLLAVAGGC ............... LMGFCK IAQSVEMLIL FGRRNSMLIV NLLAIAGGC ............... LMGFCK IAESVEMLIL FGRRNSMLIV NLLAVTGGC ............... FMGLCK VAKSVEMLIL FGRRNSMLLV NVLAFAGGA ............... LMALSK IAKAVEMLII LGRKRAMLVNNVLAVLGGS ............... LMGLANAAASYEMLIL LGRIKAMLVA NILSLVGAL ............... LMGFSK LGPSHILIIA LGRIKAMLAANSLSLTGAL ............... LMGCSK FGPAHALIIA LGRVKAMLVVNVLSIAGNL ............... LMGLAK MGPSHILIIA LGRIKAMLAANSLSLTGAL ............... LMGCSK FGPSHALIIA LGRKNSLFLL SIPTVIGSL ............... LMMFSK MAQSFEMIII YGRKIGLMC. VILVYVVGI ................ VIQIA SSDKWYQYFI YGRRIGLMC. VVLVYIVGI ................ VIQIA SSDKWYQYFI YGRKMGLIV. VVVIYIIGI ................ IIQIA SINKWYQYFI YGRKVGLIV. VVVIYIIGI ................ IIQIA SINKWYQYFI YGRRIGLMI. VVLIYVVGI ................ IIQIA SIDKWYQYFI YGRKKGLSI. VVSVYIVGI ................ IIQIA SINKWYQYFI YGRRIGLIV. VVVIYTIGI ................ IIQIA SINKWYQYFI YGRKMGLIV. VVVIYIIGI ................ IIQIA SINKWYQYFI YGRKIGLMT. VVVIYSIGI ................ IIQIA SIDKWYQYFI WGRRIGLIT. VIIIYVIGI ................ IIQIA SVDKWYQYFI YGRRIGLIT. VTAIYVVGI ................ LIQIT SINKWYQYFI YGRCMGLII. VIVVYMVGI ................ VIQIA SIDKWYQYFI L G R R L A I V I . V V L V Y M V G A . . . . . . . . . . . . . . . . 
IIQIS S N H K W Y Q Y F V IGRKGGIWF. ALVVYCIGI ................ TIQIL SYGRWYFLTL YGRKPTIIFS TIFIFSIGN ................ SLQVG AGGITL.LIV LGRRKSLIAF SVVFIIGA .............. AIMLAADG QGRGIDPIIA WGRRWGLMFS ALIFFLGA .............. GMMLGANG .DRGLGLIYG FGRRLSMLFG GILFC.AG ................. ALING FAKHVWMLIV FGRKLSMLFG GVLFC.AG ................. AIINGAAKAVWMLIL LGRRLSMLCG GVLFC.AG ................. ALING FAQNVAMLIV FGRKWSMFLG GFTFF.IG ................. SAFNG FAQNIAMLLI YGRKASIVCG GVSFL.IG ................. AALNVAAVNLAMLIL FGRKPSILLG GXVFL.AX ................. AALGGAAVNVYMLIF WGRKVTMGIG GAFFV.AG ................. GLVNA FAQDMAMLIV
Hxtchlke Hup2chlke Araeescco Galpescco Glfzymmo Xyleescco Glcpsynsp Hgtlklula Lacpklula Stllsacce Ma6tsacce Gtr21eido Consensus
WGRKASMGIG GIFFIAAG . . . . . . . . . . . . . . . . . GLVNA FAQDIAMLIV WGRKPTMLIA SVLFLAGA . . . . . . . . . . . . . . . . . GL.NA GAQDLAMLVI LGRKYSLMAG AILFVLGSIG SAF . . . . . . . . . . . . . . . . . . ATSVEMLIA LGRKKSLMIG AILFVAGSLF SAA . . . . . . . . . . . . . . . . . . APNVEVLIL FGRRGGLLMS SICFVAAGFGAALTEKLFGT GGSA .......... LQIFCF FGRRDSLKIA AVLFFISGVG SAWPELGFTS INPDNTVPVY LAGYVPEFVI HGRIKTMILA AVLFTLSSIG SGLPFTIWD . . . . . . . . . . . . . . . . . . FIF FGRKVSLHICAALWIIG .................. AILQCAAQDQAMLIV KGRKPAILIG CLGVVIGAII S .............. SLTT .... TKSALIG IGRKPLILMG SVITIIGAVI S .............. TCAFR GYWALGQFII YTLIMALFFLAAFIFI ...................... LY FCKSLGMIAV LDYCKSFLFI GLLSVIGNV . . . . . . . . . . . . . . . LTHVAT GLFHYWVLFV .GR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Itrlsacce Itr2sacce Gtrlleido Sgtlschma Tgtptaeso Sgt4schma Gtr5homsa Gtr5ratno Gtr5orycu Gtrlgalga Gtrlhomsa Gtr3musmu Gtr3canfa Gtr3oviar Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa Gtr2ratno Gtr2galga Gtr7ratno Sgt2schma Hxt0sacce Hxt2sacce Hxt4sacce Hxt7sacce Gtrkluma Gal2sacce Hxtlsacce Hxt3sacce Hxt5sacce Raglklula Hxtasacce Hxt8sacce
Hxtcsacce
Hxtdsacce Snf3sacce Qayneucr Qutdemeni
301 GRLIMGFGVG GRLIMGFGVG SRVIVGLAIG GRFVIGINSG GRAISGLNSG GRVFNGFNFG SRLLVGICAG SRLLVGICAG SRLLVGICAG GRFIIGLYSG GRFIIGVYCG GRLLIGIFCG GRLIIGLFCG GRLIIGLFCG GRLVIGLFCG GRFIIGLFCG GRFLIGAYSG GRSISGLYCG GRSVSGLYCG GRAITGLYCG GRSVSGLYCG GRFTIGIACG GRIVSGMGVG GRIISGMGVG GRIISGLGVG GRIISGLGVG GRIISGLGVG GRIISGLGVG GRIISGLGVG GRIISGLGVG GRIISGLGVG GRIISGLGVG GRIISGLGVG GRIIAGIGAG GKIIYGLGAG GRAVTGIGVG GRVISGIGIG GRVLAGIGVG GRVLAGIGVG
IGSLIAPLFI IGSLISPLFI ISSATIPVYL ITIGIASLYL LSIGIAAMFL ISMGIAPMYL VSSNVVPMYL ISSNWPMYL VSSNWPMYL LTTGFVPMYV LTTGFVPMYV LCTGFVPMYI LCTGFVPMYI LCTGFVPMYI LCTGFVPMYI LCTGFVPMYI LTSGLVPMYV LISGLVPMYI LISGLVPMYI LSSGLVPMYV LISGLVPMYI AHTVVGPMFL GVAVLSPTLI GIAVLSPTLI GIAVLSPMLI GIAVLSPMLI GISVLSPMLI GIAVLCPMLI GITVLSPMLI GIAVLSPMLI GITVLAPMLI GITVLSPMLI GIAVLSPMLI SISVLAPMLI GCSVLCPMLL VTTVLVPMFL AISAVVPLYQ GASNMVPIYI AGSNICPIYI
350 SEIAPKMIRG RLTVINSLWL TGGQLVAYGC SEIAPKMIRG RLTVINSLWL TGGQLIAYGC AEVTSPKHRG ATIVLNNLFL TGGQFVAAGF TEVAPRDLRG GIGACHQLAV TVGIAFSYFI TEIAPRHLRG MIGACNQLAI TIGIVISYVL TEIAPLSLRG GIGSLHQLAL TIGILVSYLM GELAPKNLRG ALGVVPQLFI TVGILVAQIF GELAPKNLRG A L G W P Q L F I TVGILVAQLF GELAPKNLRG ALGVESQLFI TLGILVAQIF GEVSPTALRG ALGTFHQLGI VLGILIAQVF GEVSPTAFRG A L G T L H Q L G I W G I L I A Q V F GEVSPTALRG A F G T L N Q L G I V V G I L V A Q I F GEISPTALRG AFGTLNQLGI VIGILVAQIF GEISPTALRG AFGTLNQLGI VIGILVAQIF GEISPTALRG A F G T L N Q L G I W G I L V A Q I F SEVSPTSLRG AFGTLNQLGIVVGILVAQIF GEIAPTHLRG ALGTLNQLAI VIGILIAQVL GEIAPTALRG ALGTFHQLAI VTGILISQII GEIAPTTLRG ALGTLHQLAL VTGILISQIA SEVSPTALRGALGTLHQLAI VTGILISQVL GEISPHTLRGAAGTLLQLGI TVGIIISQIL SEIAPVNFRGAAGTFNQFVI VSAILLSQVL SEISPKHLRG TCVSFYQLMI TLGIFLGYCT SETAPKHIRG TCVSFYQLMI TLGIFLGYCT SEVSPKHIRG TLVSCYQLMI TLGIFLGYCT SEVSPKHLRG TLVSCYQLMI TAGIFLGYCT SETAPKHIRG TLVSFYQLMI TFGIFLGYCT SEIAPKHLRG TLVSCYQLMI TAGIFLGYCT SEVAPSEMRG TLVSCYQVMI TLGIFLGYCT SEVAPKEMRG TLVSCYQLMI TLGIFLGYCT SEVSPKQLRG TLVSCYQLMI TFGIFLGYCT SETAPKHLRG TLVSCYQLMI TFGIFLGYCT SEVAPKQIRG TLVQLYQLMC TMGIFLGYCT SETAPKHIRG TLLACWQLMV TFAIFLGYCT SEIAPTDLRG GLVSLYQLNM TFGIFLGYCS SENSPLKIRG SMVSTYQLIV TFGILMGNIL AEATHKSLRG AIISTYQWAI TWGLLVSSAV SELAPPAVRG RLVGIYELGW QIGGLVGFWI SEMAPSAIRG RLVGVYELGW QIGGVVGFWI
Stplarath Sugricco Mstlnicta Stp4arath Gtr2ricco Hex6ricco Huplchlke Hxtchlke Hup2chlke Araeescco Galpescco Glfzymmo Xyleescco Glcpsynsp Hgtlklula Lacpklula Stllsacce Ma6tsacce Gtr21eido Consensus
Itrlsacce Itr2sacce Gtrlleido Sgtlschma Tgtptaeso Sgt4schma Gtr5homsa Gtr5ratno Gtr5orycu Gtrlgalga Gtrlhomsa Gtr3musmu Gtr3canfa Gtr3oviar Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa Gtr2ratno Gtr2galga Gtr7ratno Sgt2schma Hxt0sacce Hxt2sacce Hxt4sacce Hxt7sacce Gtrkluma Gal2sacce Hxtlsacce Hxt3sacce Hxt5sacce Raglklula
GRILLGFGIG FANQAVPLYL SEMAPYKYRG GRILLGFGIG FANQSVPLYL SEMAPYKYRG GRILLGFGIG FANQSVPLYL SEMAPYKYRG GRILLGFGVG FANQSVPVYL SEMAPPNLRG GRIMLGVGIG FGNQAVPLYL SEMAPAHLRG GRVLLGVGVG FANQAVPLYL SEMAPPRYRG GRVLLGFGVG LGSQVVPQYL SEVAPFSHRG GRVLLGFGVG LGSQVVPQYL SEVAPFSHRG GRVLLGFGVG GGNNAVPLYL SECAPPKYRG ARVVLGIAVG IASYTAPLYL SEMASENVRG SRVLLGLAVG VASYTAPLYL SEIAPEKIRG FRFLAGLGIGWSTLTPTYI AEIRPPDKRG YRIIGGIGVG LASMLSPMYI AELAPAHIRG WRVLGGIGVGAASVIAPAYI AEVSPAHLRG GRVISGMGIG FGSSAAPVYC SEISPPKIRG GRWFVAFFAT IANAAAPTYC AEVAPAHLRG GRVVTGVGTG LNTSTIPVWQ SEMSKAENRG GQALCGMPWG CFQCLTVSYA SEICPLALRY ARIVLGFPLG WQSITSSHYT DKFAPANHAK G R . . . G . . . G ...... P .... E . . P . . . R G
ALNIGFQLSI TIGILVAEVL ALNIGFQLSI TIGILVANVL ALNLGFQLSI TIGILVANVL AFNNGFQVAI IFGIVVATII GLNIMFQLAT TLGIFTANLI AINNGFQFSV GIGALSANLI MLNIGYQLFV TIGILIAGLV MLNIGYQLFV TIGILIAGLV GLNMMFQLAV TIGIIVAQLV KMISMYQLMV TLGIVLAFLS SMISMYQLMI TIGILGAYLS QMVSGQQMAI VTGALTGYIF KLVSFNQFAI IFGQLLVYCV RLGSLQQLAI VSGIFIALLS TISGLFQFSV TVGIMVLFYI KVAGLYNTLW SVGSIVAAFS LLVNLEGSTI AFGTMIAYWI YLTTYSNLCW TFGQLFAAGI TLGTLFQVSV STGIFVTSFF ...... QL .... GI ......
351 400 G A . . . G L N Y V ......... N N G W R I L V G L S L I P T A V Q . F T C L C F L P D T P R G A . . . G L N H V ......... K N G W R I L V G L S L I P T V L Q . F S F F C F L P D T P R T A I M V V F T S K ......... N I G W R V A I G I G A L P A V V Q A F C L L F F L P E S P R TFTF . . . . . . . . . . . LLNTL N L W P L A V A L G A V P A A I S L V T L P . F C P E S P R TLSH . . . . . . . . . . . LLNTP T L W P V A M G V G A I P A V I A L I I S P . F T V E S P R TLTY . . . . . . . . . . . TLNTP T L W P I S V A V G S V P A L I A L I L L P . Y C P E S P R GLRN . . . . . . . . . . . L L A N V D G W P I L L G L T G V P A A L Q L L L L P . F F P E S P R GLRS . . . . . . . . . . . VLASE E G W P I L L G L T G V P A G L Q L L L L P . F F P E S P R GLRS . . . . . . . . . . . I.RQQ K G W P I L L G L T G G P A A A A . . C P P . F F P E S P R GLDL . . . . . . . . . . . IMGND S L W P L L L G F I F V P A L L Q C I I L P . F A P E S P R GLDS . . . . . . . . . . . IMGNK D L W P L L L S I I F I P A L L Q C I V L P . F C P E S P R GLDF . . . . . . . . . . . ILGSE E L W P G L L G L T I I P A I L Q S A A L P . F C P E S P R GLKV . . . . . . . . . . . IMGTE E L W P L L L G F T I I P A V L Q S A A L P . F C P E S P R G L K V . . . . . . . . . . . ILGTE D L W P L L L G F T I L P A I I Q C A A L P . F C P E S P R GLEF . . . . . . . . . . . ILGSE E L W P L L L G F T I L P A I L Q S A A L P . F C P E S P R GLEG . . . . . . . . . . . IMGTE A L W P L L L G F T I V P A V L Q C V A L L . F C P E S P R GLES . . . . . . . . . . . L L G T A S L W P L L L G L T V L P A L L Q L V L L P . F C P E S P R GLEF . . . . . . . . . . . ILGNY D L W H I L L G L S G V R A I L Q S L L L F . F C P E S P R GLSF . . . . . . . . . . . ILGNQ D Y W H I L L G L S A V P A L L Q C L L L L . F C P E S P R GLDF . . . . . . . . . . . LLGND E L W P L L L G L S G V A A L L Q F F L L L . L C P E S P R GLDN . . . . . . . . . . . SSGNV N T W P H L L S L S R I P A A L Q P A I L P . F P P E S P P SLPE . . . . . . . . . . . 
V M G T T E L W P Y L L A L C T V S S V I H I L L L F . T C P E S P T N Y G T K K Y .... S ...... NS I Q W R V P L G L C F A W A I F M V I G M V . M V P E S P R N Y G T K D Y .... S ...... NS V Q W R V P L G L N F A F A I F M I A G M L . M V P E S P R N Y G T K T Y .... T ...... NS V Q W R V P L G L G F A W A L F M I G G M T . F V P E S P R N F G T K N Y .... S ...... NS V Q W R V P L G L C F A W A L F M I G G M T . F V P E S P R N Y G T K T Y .... S ...... NS V Q W R V P L G L C F A W A I F M I T G M L . F V P E S P R N Y G T K S Y .... S ...... NS V Q W R V P L G L C F A W S L F M I G A L T . L V P E S P R N F G T K N Y .... S ...... NS V Q W R V P L G L C F A W A L F M I G G M M . F V P E S P R N F G T K N Y .... S ...... NS V Q W R V P L G L C F A W A L F M I G G M T . F V P E S P R N F G T K N Y .... S ...... NS V Q W R V P L G L C F A W S I F M I V G M T . F V P E S P R N Y G T K N Y .... S ...... NS V Q W R V P L G L C F A W A I F M V L G M M . F V P E S A R
Hxtasacce Hxt8sacce Hxtcsacce Hxtdsacce Snf3sacce Qayneucr Qutdemeni Stplarath Sugricco Mstlnicta Stp4arath Gtr2ricco Hex6ricco Huplchlke Hxtchlke Hup2chlke Araeescco Galpescco Glfzymmo Xyleescco Glcpsynsp Hgtlklula Lacpklula Stllsacce Ma6tsacce Gtr21eido Consensus
N Y G T K N Y .... H ...... NA T Q W R V G L G L C F A W T T F M V S G M M . F V P E S P R N Y G T K T Y .... S ...... NS V Q W R V P L G L C F A W A I I M I G G M T . F V P E S P R V Y G T R K Y .... D ...... NT A Q W R V P L G L C F L W A L I I I I G M L . L V P E S P R N F I C E R C Y K D PT ...... QN I A W Q L P L F L G Y I W A I I I G M S L V . Y V P E S P Q SQGTHA .... RN ...... DA S S Y R I P I G L Q Y V W S S F L A I G M F . F L P E S P R N Y G V N T T M A P TR ........ S Q W L I P F A V Q L I P A G L L F L G S F . W I P E S P R N Y G V D E T L A P SH ........ K Q W I I P F A V Q L I P A G L L I I G A L . L I R E S P R NYFFAKIKGG W .......... GWRLSLGGAVVPALIITIG SL.VLPDTPN NYFFAKIKGG W .......... GWRLSLGGA MVPALIITVG SL.VLPDTPN NYFFAKIH.. W . . . . . . . . . . G W R L S L G G A M V P A L I I T I G S L . F L P E T P N NYFTAQMKGN I .......... GWRISLGLA CVPAVMIMIG AL.ILPDTPN NYGTQNIK.P W .......... GWRLSLGLAAAPALLMTLA GL.FLPETPN NYGTEKIEGG W .......... GWRISLAMA AVPAAILTFG AL.FLPETPN NYAVRDWEN ............ GWRLSLGLAAAPGAILFLG SL.VLPESPN NYGVRNWDN ............ GWRLSLGLA AVPGLILLLG AI.VLPESPN NYGTQTMNN ............ GWRLSLGLA GVPAIILLIG SL.LLPETPN DTAFSYSG ............. NWRAMLGVL ALPAVLLIILVV.FLPNSPR DTAFSYTG ............. AWRWMLGVI IIPAILLLIG VF.FLPDSPR T W L L A H F G S I D ..... WVNA S G W C W S P A S E G L I G I A F L L L L L . T A P D T P H N Y F I A R S G D A S ..... WLNT D G W R Y M F A S E C I P A L L F L M L L Y . T V P E S P R NWFIALMAGG SAQNPWLFGAAAWRWMFWTE LIPALLYGVC AF.LIPESPR G Y G C H F I D G A .......... A A F R I T W G L Q M V P G L I L M V G V F . F I P E S P R T Y G T N K . . . . . . . . NFPNSS K A F K I P L Y L Q MMFPGLVCI. F G W L I P E S P R DFGL . . . . . . . . . . SYTNSS V Q W R F P V S M Q IVFALFLLA. F M I K L P E S P R M K N S Q N . . . . . . . . KYANSE L G Y K L P F A L Q W I W P L P L A V G I F . L A P E S P W G L V L G N T I Q .... Y D A A S N A N T M G R M Q G L V S V S T L L S I F V V F L P L I T K D G . . . . . . . . . . . . . . . 
. . . . . . . W . . . . . . . . . . A . . . . . . . . . . . PESPR
Itrlsacce Itr2sacce Gtrlleido Sgtlschma Tgtptaeso Sgt4schma Gtr5homsa Gtr5ratno Gtr5orycu Gtrlgalga Gtrlhomsa Gtr3musmu Gtr3canfa Gtr3oviar Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa Gtr2ratno Gtr2galga Gtr7ratno Sgt2schma Hxt0sacce Hxt2sacce Hxt4sacce
401 450 Y Y V M K . G D L A R A T E V L K R S ...... YTDTS E E I I E R K V E E L V T L N Q S I P G Y Y V M K . G D L K R A K M V L K R S ...... YVNTE D E I I D Q K V E E L S S L N Q S I P G W L L S K . G H A D R A K A V A D K F ...... EVDLC E F . . . Q E G D E LPS ....... F L Y M K K H K E A E A R K A F L Q L N V K E . N V D T F I G E L R E E I E V A KNQPVFK... W L Y L K K K D E K A A R E A F A R I N G S E . N V D M F I A E M R E E L E V A QNQPEFK... F L F I K K G K E A K A R K A F Q R L N C I D . D I N E T F N E M K R E M H E A EKRPKFK... Y L L I Q K K D E A A A K K A L Q T L R G W D . S V D R E V A E I R Q E D E A E KAAGFIS... Y L L I Q K K N E S A A E K A L Q T L R G W K . D V D M E M E E I R K E D E A E KAAGFIS... YLLIG.QEPR CRQKALQSLR GWD.SVDREL EEIRREDEAARAAGLVS... F L L I N R N E E N K A K S V L K K L R G T T . D V S S D L Q E M K E E S R Q M MREKKVT... F L L I N R N E E N R A K S V L K K L R G T A . D V T H D L Q E M K E E S R Q M MREKKVT... F L L I N K K E E D Q A T E I L Q R L W G T S . D V V Q E I Q E M K D E S V R M SQEKQVT... F L L I N R K E E E N A K E I L Q R L W G T Q . D V S Q D I Q E M K D E S A R M AQEKQVT... F L L I N R K E E E K A K E I L Q R L W G T E . D V A Q D I Q E M K D E S M R M SQEKQVT... F L L I N R K E E E N A K Q I L Q R L W G T Q . D V S Q D I Q E M K D E S A R M SQEKQVT... F L L I N K M E E E K A Q T V L Q K L R G T Q . D V S Q D I S E M K E E S A K M SQEKKAT... Y L Y I I Q N L E G P A R K S L K R L T G W A . D V S G V L A E L K D E K R K L ERERPLS... Y L Y I K L D E E V K A K Q S L K R L R G Y D . D V T K D I N E M R K E R E E A SSEQKVS... Y L Y L N L E E E V R A K K S L K R L R G T E . D I T K D I N E M R K E K E E A STEQKVS... Y L Y I K L G K V E E A K K S L K R L R G N C . D P M K E I A E M E K E K Q E A ASEKRVS... W L T I D I D D E G N A K R I L Q S L Q G Y D . E V S H E L Q E I K D E S Q K E EAETFLT... Y L Y I I K G D R R R S E N A L V Y L R G Q D C D V H A E L E L L K L E T E Q S STHKS.N... 
YLVEK.GKYE EARRSLAKSN KVTVTDPGVVFEFDTIVANM ELERAVGNAS FLVEK.GRYE DAKRSLAKSN KVTIEDPSIV AEMDTIMANV ETERLAGNAS YLVEV.GKIE EAKRSIALSN KVSADDPAVM AEVEVVQATV EAEKLAGNAS
Hxt7sacce Gtrkluma Gal2sacce Hxtlsacce Hxt3sacce Hxt5sacce Raglklula Hxtasacce Hxt8sacce Hxtcsacce Hxtdsacce Snf3sacce Qayneucr Qutdemeni Stplarath Sugricco Mstlnicta Stp4arath Gtr2ricco Hex6ricco Huplchlke Hxtchlke Hup2chlke Araeescco Galpescco Glfzymmo Xyleescco Glcpsynsp Hgtlklula Lacpklula Stllsacce Ma6tsacce Gtr21eido Consensus
YLAEV.GKIE EAKRSIAVSN KVAVDDPSVL AEVEAVLAGV EAEKLAGNAS FLVEK.DRID EAKRSIAKSN KVSYEDPAVQ AEVDLICAGV EAERLAGSAS YLCEV.NKVE DAKRSIAKSN KVSPEDPAVQ AELDLIMAGI EAEKLAGNAS YLVEA.GRID EARASLAKVNKCPPDHPYIQ YELETIEASV EEMRAAGTAS YLVEA.GQID EARASLSKVN KVAPDHPFIQ QELEVIEASV EEARAAGSAS YLVEV.GKIE EAKRSLARAN KTTEDSPLVT LEMENYQSSI EAERLAGSAS FLVET.DQIE E A R K S L A K T N K V S I D D P V V K Y E L L K I Q S S I E L E K A A G N A S YLIEV.GKDE EAKRSLSKSN KVSVDDPALL AEYDTIKAGI ELEKLAGNAS FLVQV.GKIE QAKASFAKSN KLSVDDPAVV AEIDLLVAGV EAEEAMGTAS YLIEC.ERHE EARASIAKIN KVSPEDPWVL KQADEINAGV LAQRELGEAS YLAKIKNDVP SAKYSFARMN GIPATDSMVI EFIDDLLENN YNNEETNNES YYV.LKDKLD EAAKSLSFLR GVPVHDSGLL EELVEIKATY DYEASFGSSN W L Y A N G K R E E .AMKVLCWIR N L E P T D R Y I V Q E V S F I D A D L E R Y T R Q V G N G W L F L R G N R E K .GIETLAWIR N L P A D H I Y M V E E I N M I E Q S L E Q Q R V K I G L G SMIERGQHEE .AKTKLRRIR GVD .... DVS Q E F D D L V A A S K E S Q . . S I E H SMIERGQHEE . A R A H L K R V R G V E .... D V D E E F T D L V H A S E D S K . . K V E H SMIERGNHDE .AKARLKRIR GID .... DVD E E F N D L W A S E A S R . . K I E N SLIERGYTEE .AKEMLQSIR GTN .... EVD E E F Q D L I D A S E E S K . . Q V K H SLIERGRVEE .GRRVLERIR GTA .... DVD A E F T D M V E A S E L A N . . T I E H SLIQRSNDHE R A K L M L Q R V R GTT .... DVQ A E L D D L I K A S I I S R . . T I Q H F L V E K G K T E K .GREVLQKLR GTS .... EVD A E F A D I V A A V E I A R P I T M R Q F L V E K G R T D Q .GRRILEKLR GTS .... HVE A E F A D I V A A V E I A R P I T M R Q S L I E R G H R R R .GRAVLARLR RTE .... AVD T E F E D I C A A A E E S T R Y T L R Q W L A E K G R H I E A E E V L R M L R D TSE .... KAR E E L N E I R E S L K L . . K Q G G W A W F A A K R R F V D A E R V L L R L R D TSA .... EAK R E L D E I R E S L Q V . . K Q S G W A W L V M K G R H S E A S K I L A R L . E PQA .... DPN L T I Q K I K A G F D K A M D K S S A G W L M S R G K Q E Q A E G I L R K I . M GNT .... LAT Q A V Q E I K H S L D H G R . 
K T G G R Y L V A Q G Q G E K A A A I L W K V . E GGD .... VPS R . I E E I Q A T V S L D H K P R F S D W L A N H D R W E E T S L I V A N I V A N G D V N N E Q V R F Q L E E I K E Q V IIDSAAK.NF WLVGVGREEE AREFIIKYHL NGDRTHPLLD MEMAEIIESF HGTDLSNPLE W L I S Q S R T E E A R Y L V G T L D D ADPN ..... D E E V I T E V A M L H D A V N R T K H E WLV.KKGRID QARRSLERIL SGKGPEKELL VSMELDKIKT TIEKEQKMSD YSKSRRGDYE GENSEDASRKAAEE .......................... .L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E . . . . . . . . . . . . . . . . . .
Itrlsacce Itr2sacce Gtrlleido Sgtlschma Tgtptaeso Sgt4schma Gtr5homsa Gtr5ratno Gtr5orycu Gtrlgalga Gtrlhomsa Gtr3musmu Gtr3canfa Gtr3oviar Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa
451 KNVPEKVWNT KNPITKFWNM ...... VRID .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... .......... ..........
IKELHTVPSN VKELHTVPSN YRPLMARDMR FTQLFTQRDL FTELFRRRDL FFRLFTQRDL VLKLFRMRSL VWKLFRMQSL VRALCAMRGL IMELFRSPMY ILELFRSPAY VLELFRSPNY VLELFRSRSY VLELFRAPNY VLELFRVSSY VLELFRSPNY LLQLLGSRTH IIQLFTNSSY
LRALIIGCGL FRALIIGCGL FR.VVLSSGL RMPVLIACLI RMPVIIAVLI RMPVLIACII RWQLLSIIVL RWQLISTIVL AWQLISVVPL RQPILIAIVL RQPILIAWL VQPLLISIVL RQPIIISIML RQPIIISIML RQPIIISIVL RQPIIISITL RQPLIIAWL RQPILVALML
500 QAIQQFTGWNSLMYFSGTIF QAIQQFTGWNSLMYFSGTIF QIIQQFSGIN TIMYYSSVIL QVLQQLSGIN AVITYSSLML QVMQQLSGIN AVVANSSEML QVFQQLSGIN AVITYSSTML MGGQQLSGVN AIYYYADQIY MAGQQLSGVN AIYYYADQIY MW.QQLSGVN AIYYY.DQIY QLSQQLSGIN AVFYYSTSIF QLSQQLSGIN AVFYYSTSIF QLSQQLSGIN AVFYYSTGIF QLSQQLSGIN AVFYYSTGIF QLSQQLSGIN AVFYYSTGIF QLSQQLSGIN AVFYYSTGIF QLSQQLSGIN AVFYYSTGIF QLSQQLSGIN AVFYYSTSIF HVAQQFSGIN GIFYYSTSIF
Gtr2ratno .......... VIQLFTDPNY RQPIVVALML HLAQQFSGIN GIFYYSTSIF Gtr2galga .......... IGQLFSSSKY RQAVIVALMV QISQQFSGIN AIFYYSTNIF GtrTratno .......... LIELLRSRDT RWPSLLIAFL MQSQQTSGVNGIFYYHQHIY Sgt2schma .......... VCDLLRIPYL RWGLIVALVP HIGQQFSGIN GILYYFVSLF H x t 0 s a c c e WH . . . . . . . . . E L F S N K G A I L P R V I M G I V I Q S L Q Q L T G C N Y F F Y Y G T T I F H x t 2 s a c c e WG . . . . . . . . . E L F S N K G A I L P R V I M G I M I Q S L Q Q L T G N N Y F F Y Y G T T I F Hxt4sacce WG ......... EIFSTKTKV FQRLIMGAMI QSLQQLTGDN YFFYYGTTVF H x t T s a c c e WG . . . . . . . . . E L F S S K T K V L Q R L I M G A M I Q S L Q Q L T G D N Y F F Y Y G T T I F Gtrkluma IK . . . . . . . . . E L F S T K T K V F Q R L I M G M L I Q S F Q Q L T G N N Y F F Y Y G T T I F G a l 2 s a c c e WG . . . . . . . . . E L F S T K T K V F Q R L L M G V F V Q M F Q Q L T G N N Y F F Y Y G T V I F H x t l s a c c e WG . . . . . . . . . E L F T G K P A M F Q R T M M G I M I Q S L Q Q L T G D N Y F F Y Y G T I V F Hxt3sacce WG ......... ELFTGKPAM FKRTMMGIMI QSLQQLTGDN YFFYYGTTVF H x t 5 s a c c e WG . . . . . . . . . E L V T G K P Q M F R R T L M G M M I Q S L Q Q L T G D N Y F F Y Y G T T I F Raglklula WG ......... ELITGKPSM FRRTLMGIMI QSLQQLTGDN YFFYYGTTIF H x t a s a c c e WS . . . . . . . . . E L L S T K T K V F Q R V L M G V M I Q S L Q Q L T G D N Y F F Y Y G T T I F H x t 8 s a c c e WK . . . . . . . . . E L F S R K T K V F Q R L T M T V M I N S L Q Q L T G D N Y F F Y Y G T T I F H x t c s a c c e WK . . . . . . . . . E L F S V K T K V L Q R L I T G I L V Q T F L Q L T G E N Y F F F Y G T T I F Hxtdsacce KKQSLVKRNT FEFIMGKPKL WLRLIIGMMI MAFQQLSGIN YFFYYGTSVF Snf3sacce FIDCFISS ...... KSRPKQ TLRMFTGIAL QAFQQFSGIN FIFYYGVNFF Qayneucr F ...... W K P F . L S L K Q R K V Q W R F F L G G M L F F W Q N G S G I N A I N Y Y S P T V F Q u t d e m e n i F ...... W K P F K A A W T N K R I L Y R L F L G S M L F L W Q N G S G I N A I N Y Y S P R V F S t p l a r a t h P ...... 
W R N LL .... R R K Y R P H L T M A V M I P F F Q Q L T G I N V I M F Y A P V L F Sugricco P ...... W R N LL .... Q R K Y R P H L S M A I A I P F F Q Q L T G I N V I M F Y A P V L F M s t l n i c t a P ...... W R N LL .... Q R K Y R P H L T M A I M I P F F Q Q L T G I N V I M F Y A P V L F S t p 4 a r a t h P ...... W K N IM .... L P R Y R P Q L I M T C F I P F F Q Q L T G I N V I T F Y A P V L F G t r 2 r i c c o P ...... FRN IL .... E P R N R P Q L V M A V C M P A F Q I L T G I N S I L F Y A P V L F H e x 6 r i c c o P ...... FKN IM .... R R K Y R P Q L V M A V A I P F F Q Q V T G I N V I A F Y A P I L F H u p l c h l k e S ...... WAS LF .... T R R Y M P Q L L T S F V I Q F F Q Q F T G I N A I I F Y V P V L F Hxtchlke S ...... WRS LF .... T R R Y M P Q L L T S F V I QFFQQFTGINAIIFYVPVLF H u p 2 c h l k e S ...... W A A L F .... S R Q Y S P M L I V T S L I A M L Q Q L T G I N A I M F Y V P V L F A r a e e s c c o L ...... F K . . I .... N R N V R R A V F L G M L L Q A M Q Q F T G M N I I M Y Y A P R I F G a l p e s c c o L ...... F K . . E .... N S N F R R A V F L G V L L Q V M Q Q F T G M N V I M Y Y A P K I F Glfzymmo L ...... F A . . F .... G... I T V V F A G V S V A A F Q Q L V G I N A V L Y Y A P Q M F X y l e e s c c o L ...... L M . . F .... G... V G V I V I G V M L S I F Q Q F V G I N V V L Y Y A P E V F G l c p s y n s p L ...... L S . . R .... R G G L L P I V W I G M G L S A L Q Q F V G I N V I F Y Y S S V L W H g t l k l u l a G ...... Y K D LF .... R K K T L P K T I V G V S A Q M W Q Q L C G M N V M M Y Y I V Y I F L a c p k l u l a M L D V R S L F . R ..... T R S D R Y . R A M L V I L M A W F G Q F S G N N V C S Y Y L P T M L S t l l s a c c e K H S L S S L F S R ..... G R S Q N L Q R A L I A A S T Q F F Q Q F T G C N A A I Y Y S T V L F Ma6tsacce EGTYWD ........ CVKDGI NRRRTRIACL CWIGQCSCGA SLIGYSTYFY Gtr21eido .............. YTMTQM IGPILNGVAM GCVTQLTGIN ANMNFAPTIM Consensus ................................. Q Q . . G . N .... Y .... F
            501                                                 550
Itr1sacce   ETVGFKN... SSAVSIIVSG TNFIFTLVAF F.SIDKIGRR TILLIGLPGM
Itr2sacce   ETVGFKN... SSAVSIIVSG TNFVFTLIAF F.CIDKIGRR YILLIGLPGM
Gtr1leido   YDAGFRDAIM PVVLSIPLAF MNALFTAVAI F.TVDRFGRR RMLLISVFGC
Sgt1schma   ELAGIPDVYL QY.CVFAIGV LNVIVTVVSL P.LIERAGRR TLLLWPTVSL
Tgtptaeso   KSAKVSPDML EY.FVVGLGL LNVICTIVAL P.LLEKAGRR TLLLWPSLVV
Sgt4schma   KTAGIPLVYI QF.CVVAVPA INVLMTVLSV Y.LIERAGRR TLLLWPTVLL
Gtr5homsa   LSAGVPEEHV QY.VTAGTGA VNVVMTFCAV F.VVELLGRR LLLLLGFSIC
Gtr5ratno   LSAGVKSNDV QY.VTAGTGA VNVFMTMVTV F.VVELWGRR NLLLIGFSTC
Gtr5orycu   LSP..LDTDT QY.YTAATGA VNVLMTVCTV F.VVESWA.R LLLLLGFSPL
Gtr1galga   EKSGV..EQP VY.ATIGSGV VNTAFTVVSL F.VVERAGRR TLHLIGLAGM
Gtr1homsa   EKAGV..QQP VY.ATIGSGI VNTAFTVVSL F.VVERAGRR TLHLIGLAGM
Gtr3musmu Gtr3canfa Gtr3oviar Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa Gtr2ratno Gtr2galga Gtr7ratno Sgt2schma Hxt0sacce Hxt2sacce Hxt4sacce Hxt7sacce Gtrkluma Gal2sacce Hxtlsacce Hxt3sacce Hxt5sacce Raglklula Hxtasacce Hxt8sacce Hxtcsacce Hxtdsacce Snf3sacce Qayneucr Qutdemeni Stplarath Sugricco Mstlnicta Stp4arath Gtr2ricco Hex6ricco Huplchlke Hxtchlke Hup2chlke Araeescco Galpescco Glfzymmo Xyleescco Glcpsynsp Hgtlklula Lacpklula Stllsacce Ma6tsacce Gtr21eido Consensus
KDAGV..QEP I Y . A T I G A G V V N T I F T V V S L F.LVERAGRR KDAGV..EEP I Y . A T I G A G V V N T I F T V V S L F.LVERAGRR KDAGV..QEP V Y . A T I G A G V V N T I F T V V S V F.LVERAGRR KDAGV..QEP I Y . A T I G A G V V N T I F T V V S L F.LVERAGRR ERAGI~ V Y . A T I G A G V V N T V F T V V S L F.LVERAGRR ETAGV..GQP A Y . A T I G A G V V N T V F T L V S V L.LVERAGRR QTAGI..SKP V Y . A T I G V G A V N M V F T A V S V F.LVEKAGRR QTAGI..SQP VY.ATIGVGA INMIFTAVSV L.LVEKAGRR QRAGV..GQP V Y . A T I G V G V V N T V F T V I S V F.LVEKAGRR KQAGA..QDP A Y . V T L G S G S V N F L T T V V S L I.VVEKAGRR ISNGLTKQVA SY.ANLGTGV TILIGAFASI F.VIDRKGRR NAVGM...QD S F E T S I V L G A V N F A S T F V A L Y.IVDKFGRR NAVGM...KD S F Q T S I V L G I V N F A S T F V A L Y.TVDKFGRR TAVGL...ED S F E T S I V L G I V N F A S T F V G I F.LVERYGRR KAVGL...SD S F E T S I V L G I V N F A S T F V G I Y.VVERYGRR NSVGM DD S F E T S I V L G I V N F A S T F V A I Y VVDKFGRR KSVGL...DD S F E T S I V I G V V N F A S T F F S L W.TVENLGRR QAVGL...SD S F E T S I V F G V V N F F S T C C S L Y.TVDRFGRR NAVGM...SD S F E T S I V F G V V N F F S T C C S L Y.TVDRFGRR QAVGL...ED S F E T A I V L G V V N F V S T F F S L Y.TVDRFGRR QSVGM...DD S F E T S I V L G I V N F A S T F F A L Y.TVDHFGRR KSVGL...KD S F Q T S I I I G V V N F F S S F I A V Y.TIERFGRR KSVGM...ND S F E T S I V L G I V N F A S C F F S L Y.SVDKLGRR KSVGL~ G F E T S I V L G T V N F F S T I I A V M.VVDKIGRR KGVGI...KD P Y I T S I I L S S V N F L S T I L G I Y.YVEKWGHK NKTGV...SN S Y L V S F I T Y A V N V V F N V P G L F.FVEFFGRR RSIGITGTDT GFLTTGIFGV VKMVLTIIWL LWLVDLVGRR KSIGVSGGNT SLLTTGIFGV VTAVITFVWL LYLIDHFGRR NTIGFT.TDA S L M S A V V T G S V N V G A T L V S I Y.GVDRWGRR DTIGFG.SDA A L M S A V I T G L V N V F A T M V S I Y.GVDKWGRR KTIGFG.ADA S L M S A V I T G G V N V L A T V V S I Y.YVDKLGRR QTLGFG.SKA SLLSAMVTGI IELLCTFVSV F.TVDRFGRR QSMGFG.GNA SLYSSVLTGA VLFSSTLISI G.TVDRLGRR RTIGLE.ESA SLLSSIVTGL VGSASTFISM L.IVDKLGRR SSLGSA.NSA A L L N T V V V 
G A V N V G S T L I A V M.FSDKFGRR SSLGSA.SSA A L L N T V V V G A V N V G S T M I A V L.LSDKFGRR SSFGTA.RHA A L L N T V I I G A V N V A A T F V S I F.SVDKFGRR KMAGFTTTEQ QMIATLVVGL TFMFATFIAV F.TVDKAGRK ELAGYTNTTE QMWGTVIVGL TNVLATFIAI G.LVDRWGRK QNLGFG.ADT A L L Q T I S I G V V N F I F T M I A S R.VVDRFGRK KTLGAS TDI ALLQTIIVGV INLTFTVLAI M TVDKFGRK RSVGFT.EEK SLLITVITGF INILTTIVAI A.FVDKFGRK NMAGYT.GNT NLVASSIQYV LNVVMTIPAL F.LIDKFGRR RNVGMKSVSL NVLMNGVYSI VTWISSICGA F.FIDKIGRR NKTIKLDYRL SMIIGGVFAT IYALSTIGSF F.LIEKLGRR EKAGVS.TDT AFTFSIIQYC LGIAATFVS. WWASKYCGRF SNLGL .... Q P L V G N I I V M A W N M L A T F . C V IPLSRRFSMR ...G .............. G . . N ............... GRR
TLHMIGLGGM TLHMIGLGGM TLHLIGLGGM TLHMIGLGGM TLHLVGLGGM TLHLLGLAGM SLFLIGMSGM TLFLAGMIGM SLFLAGLMGM TLFLAGMIGM PLLMFGTSVC KCLLWGSASM KCLLGGSASM RCLLWGAASM TCLLWGAASM KCLLWGAAAM KCLLLGAATM NCLMWGAVGM NCLLYGAIGM NCLLWGCVGM NCLLYGCVGM TCLLWGAASM RCLLLGAATM KCLLFGAAGM TCLLYGSTNL KVLVVGGVIM RILFIGAAGG NLLLVGAAGG FLFLEGGTQM FLFLEGGVQM FLFLEGGIQM ILFLQGGIQM KLLISGGIQM ALFIFGGVQM FLLIEGGIQC FLLIEGGITC GLFLEGGIQM PALKIGFSVM PTLTLGFLVM PLLIWGALGM PLQIIGALGM PLLLMGSIGM PVLIIGGIFM EGFLGSISGA KLFLLGATGQ DLYAFGLAFQ TLFLFCGFVG ..... G ....
            551                                                 600
Itr1sacce   TMALVVCSIA F......HFL GIKFDGAVAV VVSSGFSSWG IVIIVFIIVF
Itr2sacce   TVALVICAIA F......HFL GIKFNGADAV VASDGFSSWG IVIIVFIIVY
Gtr1leido   LVLLVVIAII G......FFI GTRI...... ....SYSVGG GLFLALLAVF
Sgtlschma ALSLLLLTIFVN ............. LADSG PQSTKN.AMG IISIILILIY Tgtptaeso AIILLLLVIF VN ............. IANYG GVVNKT.PFV LVSAVLVFIY Sgt4schma AFSLLCLTISVN ............. IASST KDPTTARTAG IISAVLIILY Gtr5homsa LIACCVLTAALA . . . . . . . . . . . . . L Q D . . T V S W .... M P Y I S I V C V I S Y G t r 5 r a t n o L T A C I V L T V A L A . . . . . . . . . . . . . L Q N . . T I S W .... M P Y V S I V C V I V Y Gtr5orycu APTCCVLTAALA . . . . . . . . . . . . . L Q D . . T V S W .... M P Y I S I V C I I V Y G t r l g a l g a A G C A I L M T I A L T . . . . . . . . . . . . . L L D . . Q M P W .... MS Y L S I V A I F G F G t r l h o m s a A G C A I L M T I A L A . . . . . . . . . . . . . L L E . . Q L P W .... MS Y L S I V A I F G F G t r 3 m u s m u A V C S V F M T I S L L . . . . . . . . . . . . . L K D . . D Y E A .... MS F V C I V A I L I Y G t r 3 c a n f a A V C S I L M T I S L L . . . . . . . . . . . . . L K D . . N Y N W .... MS F V C I G A I L V F G t r 3 o v i a r A F C S I L M T I S L L . . . . . . . . . . . . . L K D . . N Y S W .... MS F I C I G A I L V F G t r 3 h o m s a A F C S T L M T V S L L . . . . . . . . . . . . . L K D . . N Y N G .... MS F V C I G A I L V F G t r 3 g a l g a A V C A A V M T I A L A . . . . . . . . . . . . . L K E . . K W ...... IR Y I S I V A T F G F G t r 4 h o m s a C G C A I L M T V A L L . . . . . . . . . . . . . L L E . . R V P A .... MS Y V S I V A I F G F G t r 2 h o m s a F V C A I F M S V G L V . . . . . . . . . . . . . L L N . . K F S W .... MS Y V S M I A I F L F G t r 2 r a t n o F F C A V F M S L G L V . . . . . . . . . . . . . L L D . . K F T W .... MS Y V S M T A I F L F G t r 2 g a l g a L I S A V A M T V G L V . . . . . . . . . . . . . L L S . . Q F A W .... M S Y V S M V A I F L F G t r 7 r a t n o F F C A V F M S L V L V . . . . . . . . . . . . . L L D . . K F T W .... MS Y V S M T A I F L F S g t 2 s c h m a L F S L L L F T L T LI . . . . . . . . . . . . . I K Q V T E I N K .... 
L T I L S I V L T Y T F Hxt0sacce AICFVIFATV GVTRLWP ......... QGKD QP..SSQSAG NVMIVFTCFF Hxt2sacce AICFVIFSTV GVTSLYP ......... NGKD QP..SSKAAG NVMIVFTCLF Hxt4sacce TACMVVFASV GVTRLWP ......... NGKK NG..SSKGAG NCMIVFTCFY Hxt7sacce TACMVVYASV GVTRLWP ......... NGQD QP..SSKGAG NCMIVFACFY Gtrkluma TACMWFASV GVTRLWP ......... DGAN HPETASKGAG NCMIVFACFY Gal2sacce MACMVIYASV GVTRLYP HGKS QP..SSKGAG NCMIVFTCFY Hxtlsacce VCCYVVYASV GVTRLWP ......... NGQD QP..SSKGAG NCMIVFACFY Hxt3sacce VCCYVVYASV GVTRLWP ......... NGEG NG..SSKGAG NCMIVFACFY Hxt5sacce ICCYWYASV GVTRLWP ......... NGQD QP..SSKGAG NCMIVFACFY Raglklula VACYWYASV GVTRLWP ......... DGPD HPDISSKGAG NCMIVFACFY Hxtasacce LCCFAVFASV GVTKLWP ......... QGSS HQDITSQGAG NCMIVFTMFF Hxt8sacce TACMVIYASV GVTRLYP ......... NGKS EP..SSKGAG NCTIVFTCFY Hxtcsacce MACMVIFASI GVKCLYP ......... HGQD GP..SSKGAG NAMIVFTCFY Hxtdsacce LFYMMTYATV GT...FG ......... RETD FSNI ....... VLIIVTCCF Snf3sacce TIANFIVAIV GCS ................. LKTVAAAKVMIAFICLF eayneucr S L C M W F I G A Y IKI ADPGSNKAEDAKLTSGG IAAIFFFYLW Q u t d e m e n i S V C L W I V G G Y IKI . . . . . . . . . . A K P E . N N P E G T Q L D S G G I A A I F F F Y L W S t p l a r a t h L I C Q A V V A A C IG . . . . . . . . . . . A K F G V D G T P G E L P K W Y A I V V V T F I C I Y Sugricco L I C Q A I V A A C IG . . . . . . . . . . . A K F G V D G A P G D L P Q W Y A V V V V L F I C I Y M s t l n i c t a L I C Q I A V S I C IA IKFGVNG TPGDLPKWYA IVVVIFICVY S t p 4 a r a t h L V S Q I A I G A M IG . . . . . . . . . . . V K F G V A G T . G N I G K S D A N L I V A L I C I Y Gtr2ricco IVCQVIVAVI LG ........... AKFGAD...KQLSRSYS IAVVVVICLF Hex6ricco FVAQIMVGSI MA AELGDH GGIGKGYA YIVLILICIY Huplchlke CLAMLTTGWLA ........... IEFAKYG T.DPLPKAVA SGILAVICIF Hxtchlke CLAMLAAGIT LG VEFGQYG T EDLPHPVS AGVLAVICIF Hup2chlke FIGQWTAAV LG ........... VELNKYG T.N.LPSSTA AGVLVVICVY Araeescco ALGTLVLGYC L ............... MQFD N.GTASSGLS WLSVGMTMMC GalpesccoAAGMGVLG.T M ............... 
MHI...GIHSPSAQ YFAIAMLLMF Glfzymmo AAMMAVLGCC F ....................... WFKVGG VLPLASVLLY Xyleescco AIGMFSLGTA F ....................... YTQAPG IVALLSMLFY Glcpsynsp TITLGILSVV FG ........... GATVVNG Q.PTLTGAAG IIALVTANLY Hgtlklula FTWLFSVAGI LATYSVPAPG GVNGDDTVTI QIPSENTSAANGVIASSYLF Lacpklula ALALTGLSIC TARY ................. EKTKKKSAS NGALVFIYLF Stllsacce AVSFT...IT FACL ................. VKENKENAR GAAVGL.FLF M a 6 t s a c c e A I M F F I I G G L GC . . . . . . . . . . . . . . . . . . . . . SDTHGAK MGSGALLMVV --
Gtr21eido SLCCVFLGGI PVY ................ P GVTKSDKAIS GIAITGIAIF Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ./
601 ItrlsacceAAFYALGIGT VPWQ.QSELF Itr2sacceAAFYALGIGT VPWQ.QSELF Gtrlleido LALYAPGIGC IPWVIMGEIF Sgtlschma ICSFALGLGP VPALIVSEIF Tgtptaeso VAAFAMGLGP MPALIVAEIF Sgt4schma ICGFALGLGP IPGVIVAEIF Gtr5homsa VIGHALGPSP IPALLITEIF Gtr5ratno VIGHAVGPSP IPALFITEIF Gtr5orycu VIGHAIGP.A IRSLY.TEIF Gtrlgalga VAFFEIGPGP IPWFIVAELF Gtrlhomsa VAFFEVGPGP IPWFIVAELF Gtr3musmu VAFFEIGPGP IPWFIVAELF Gtr3canfa VAFFEIGPGP IPWFIVAELF Gtr3oviar VAFFEIGPGP IPWFIVAELF Gtr3homsa VAFFEIGPGP IPWFIVAELF Gtr3galga VALFEIGPGP IPWFIVAELF Gtr4homsa VAFFEIGPGP IPWFIVAELF Gtr2homsa VSFFEIGPGP IPWFMVAEFF Gtr2ratno VSFFEIGPGP IPWFMVAEFF Gtr2galga VIFFEVGPGP IPWFIVAELF Gtr7ratno VSFFEIGPIP IPFFGVREWF Sgt2schma LFGFSVS... IPWFLVSELF Hxt0sacce IFSFAITWAP IAYVIVAETY Hxt2sacce IFFFAISWAP IAYVIVAESY Hxt4sacce LFCFATTWAP IPFVVNSETF Hxt7sacce IFCFATTWAP IPYVVVSETF Gtrkluma IFCFATSWAP IAYVVVAESY Gal2sacce IFCYATTWAP VAWVITAESF Hxtlsacce IFCFATTWAP IAYVVISECF Hxt3sacce IFCFATTWAP IAYVVISETF Hxt5sacce IFCFATTWAP VAYVLISESY Raglklula IFCFATTWAP IAYVVISESY Hxtasacce IFSFATTWAG GCYVIVSETF Hxt8sacce IFCFSCTWGP VCYVIISETF Hxtcsacce IFCFATTWAP VAYIVVAESF Hxtdsacce IFWFAITLGP VTFVLVSELF Snf3sacce IAAFSATWGGVVWVISAELY Qayneucr TAFYTPSWNG TPWVINSEMF Qutdemeni TAFYTPSWNG TPWVINSEMF Stplarath VAGFAWSWGP LGWLVPSEIF Sugricco VSGFAWSWGP LGWLVPSEIF Mstlnicta VAGFAWSWGP LGWLVPSEIF Stp4arath VAGFAWSWGP LGWLVPSEIS Gtr2ricco VLAFGWSWGP LGWTVPSEIF Hex6ricco VAGFGWSWGP LGWLVPSEIF Huplchlke ISGFAWSWGP MGWLIPSEIF Hxtchlke IAGFAWSWGP MGWLIPSEIF Hup2chlke VAAFAWSWGP LGWLVPSEIQ Araeescco IAGYAMSAAPVVWILCSEIQ Galpescco IVGFAMSAGP LIWVLCSEIQ
650 PQNVRGIGTS YATATNWAGS LVIASTFLT. PQNVRGVGTS YATATNWAGS LVIASTFLT. PTHLRTSAAS VATMANWGAN VLVSQVFPI. RQGPRAAAYS LSQSIQWLSN LIVLCSYPV. RQGPRAAAYS LSQSIQWACN LIVVASFPS. RQEPRAAAYS LSQGVNLLCN LLVLFSYPS. LQSSRPSAFM VGGSVHWLSN FTVGLIFPF. LQSSRPSAYM IGGSVHWLSN FIVGLIFPF. LQSGRPPTWW..GQVHWLSN FTVGLVFPL. SQGPRPAAFA VAGLSNWTSN FIVGMGFQY. SQGPRPAAIA VAGFSNWTSN FIVGMCFQY. SQGPRPAAIA VAGCCNWTSN FLVGMLFPS. SQGPRPAAMA VAGCSNWTSN FLVGLLFPS. GQGPRPAAMA VAGCSNWTSN FLVGLLFPS. SQGPRPAAMA VAGCSNWTSN FLVGLLFPS. SQGPRPAAMA VAGCSNWTSN FLVGMLFPY. SQGPRPAAMA VAGFSNWTSN FIIGMGFQY. SQGPRPAALA IAAFSNWTCN FIVALCFQY. SQGPRPTALA LAAFSNWVCN FIIALCFQY. SQGPRPAAIA VAGFCNWACN FIVGMCFQY TQIWRPGAIV CVATLDWVPN FKKGICFQS. TQENRDAAVS IAAATNWLCN AIVALIFPQ. PLRVKNRAMA IAVGANWMWG FLIGFFTPF. PLRVKNRAMA IAVGANWIWG FLIGFFTPF. PLRVKSKCMA IAQACNWIWG FLIGFFTPF. PLRVKSKAMS IATAANWLWG FLIGFFTPF. PLRVKAKCMA IATASNWIWG FLNGFFTPF. PLRVKSKCMA LASASNWVWG FLIAFFTPF. PLRVKSKCMS IASAANWIWG FLISFFTPF. PLRVKSKAMS IATAANWLWG FLIGFFTPF. PLRVRGKAMS IASACNWIWG FLISFFTPF. PLRVKGKAMA IASASNWIWG FLIGFFTPF. PLRVKSRGMA IATAANWMWG FLISFFTPF. PLRVRSKCMS VATAANLLWG FLIGFFTPF. PSKVKSRAMS ISTACNWLWQ FLIGFFTPF. PLRTRAISMA ICTFINWMFN FLISLLTPM. PLGVRSKCTA ICAAANWLVNFICALITPY. DQNTRSLGQA SAAANNWFWNFIISRF... DPTVRSLAQA CAAASNWLWNFLISRF .... PLEIRSAAQS ITVSVNMIFT FIIAQIFLT. P L E I R S A A Q S V N V S V N M F F T FVVAQVFLI. PLEIRSAAQS INVSVNMIFT FIVAQVFLT. PLEIRSAAQA INVSVNMFFT FLVAQLFLT. PLETRSAGQS ITVAVNLLFT FAIAQAFLS. PLEIRSAGQS IVVAVSFLFT FVVAQTFLS. TLETRPAGTA VAVVGNFLFS FVIGQAFVS. TLETRPAGTA VAVMGNFLFS FVIGQAFVS. TLETRGAGMS MAVIVNFLFS FVIGQAFLS. PLKCRDFGIT CSTTTNWVSN MIIGATFLT. PLKGRDFGIT CSTATNWIAN MIVGATFLT.
Glfzymmo IAVFGMSWGP VCWVVLSEMF Xyleescco VAAFAMSWGP VCWVLLSEIF Glcpsynsp VFSFGFSWGP IVWVLLGEMF Hgtlklula VCFFAPTWGI GIWIYCSEIF Lacpklula GGIFSFAFTP MQSMYSTEVS Stllsacce ITFFGLSLLS LPWIYPPEIA Ma6tsacce AFFYNLGIAPVVFCLVSEMP Gtr21eido IALYEMGVGP CFYVLAVDVF Consensus ......... P ....... E.F
Itr1sacce Itr2sacce Gtr1leido Sgt1schma Tgtptaeso Sgt4schma Gtr5homsa Gtr5ratno Gtr5orycu Gtr1galga Gtr1homsa Gtr3musmu Gtr3canfa Gtr3oviar Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa Gtr2ratno Gtr2galga Gtr7ratno Sgt2schma Hxt0sacce Hxt2sacce Hxt4sacce Hxt7sacce Gtrkluma Gal2sacce Hxt1sacce Hxt3sacce Hxt5sacce Rag1klula Hxtasacce Hxt8sacce Hxtcsacce Hxtdsacce Snf3sacce Qayneucr Qutdemeni Stp1arath Sugricco Mst1nicta Stp4arath
PSSIKGAAMP IAVTGQWLAN ILVNFLFKV. PNAIRGKALA IAVAAQWLAN YFVSWTFPM. NNKIRAAALS VAAGVQWIAN FIISTTF... NNMERAKGSA LSAATNWAFN FALAMFVPS. TNLTRSKAQL LNFWSGVAQ FVNQFATPK~ SMKVRASTNA FSTCTNWLCN FAVVMFTPI. SSRLRTKTII LARNAYNVIQVVVTVLIMY. PESFRPIGSS ITVGVMFIFN LIINICYPIA .... R . . . . . . . . . . N W . . . F . . . . . . . . .
651 700 ...M ...... L Q N I T P A G T F A F F A G L S C L S T I F C Y F C Y P . . . E L S G L E L E ...M ...... L Q N I T P T G T F S F F A G V A C L S T I F C Y F C Y P . . . E L S G L E L E ...L ...... M G A I G V G G T F T I I S G L M A L G C I F V Y F F A V . . . E T K G L T L E ...I . . . . . . . Q K N I G G Y S F L P F L V V V V I C W I F F F L F M P . . . E T K N R T F D ...L . . . . . . . N E L L K G Y V Y L P Y L V V V A V C W V V F F L F M P . . . E T K N R T F D ...I . . . . . . . N D A I G G Y S F L P F L V I V I I C W I F F F L Y M I . . . E T K N R T C D ...I . . . . . . . Q E G L G P Y S F I V F A V I C L L T T I Y I F L I V P . . . E T K A K T F I ...I . . . . . . . Q V G L G P Y S F I I F A I I C L L T T I Y I F M V V P . . . E T K G R T F V ...I . . . . . . . Q W A . G L Y S F I I F G V A C L S T T V Y T F L I V P . . . E T K G K S F I ...I . . . . . . . A Q L C G S Y V F I I F T V L L V L F F I F T Y F K V P . . . E T K G R T F D ...V . . . . . . . E Q L C G P Y V F I I F T V L L V L F F I F T Y F K V P . . . E T K G R T F D A AAYLGAYVF IIFAAFLIFF LIFTFFKVP ETKGRTFE ...A . . . . . . . A F Y L G A Y V F I I F T G F L I V F L V F T F F K V P . . . E T R G R T F E ...A . . . . . . . T F Y L G A Y V F I V F T V F L V I F W V F T F F K V P . . . E T R G R T F E ...A . . . . . . . A H Y L G A Y V F I I F T G F L I T F L A F T F F K V P . . . E T R G R T F E ...A . . . . . . . E K L C G P Y V F L I F L V F L L I F FIFTYFKVP...ETKGRTFE ...V . . . . . . . A E A M G P Y V F L L F A V L L L G F F I F T F L R V P . . . E T R G R T F D ...I . . . . . . . A D F C G P Y V F F L F A G V L L A F T L F T F F K V P . . . E T K G K S F E ...I . . . . . . . A D F L G P Y V F F L F A G V V L V F T L F T F F K V P . . . E T K G K S F D ...I . . . . . . . A D L C G P Y V F V V F A V L L L V F FLFAYLKVP...ETKGKSFE ...L . . . . . . . R D F K G P Y H F W A F H G V V I V W Y G N Y W F K V P . . . E T K G K S F D ...L . . . . . . . 
V I Y I G I Y A F I P F I C A L L V V L I F V G L Y L P . . . E T K G K T P A ...I ...... T R S I G F S Y G Y V F M G C L . I F S YFYVFFFVC...ETKGLTLE ...I ...... T S A I G F S Y G Y V F M G C L . V F S FFYVFFFVC...ETKGLTLE ...I ...... S G A I D F Y Y G Y V F M G C L . V F S YFYVFFFVP...ETKGLTLE ...I ...... T G A I N F Y Y G Y V F M G C L . V F M F F Y V L L V V P . . . E T K G L T L E ...I ...... T S A I H F Y Y G Y V F M G C L . V A M F F Y V F F F V P . . . E T K G L T L E ...I ...... T S A I N F Y Y G Y V F M G C L . V A M F F Y V F F F V P . . . E T K G L S L E ...I ...... T G A I N F Y Y G Y V F M G C M . V F A Y F Y V F F F V P . . . E T K G L S L E ...I ...... T G A I N F Y Y G Y V F M G C M . V F A Y F Y V F F F V P . . . E T K G L T L E ...I ...... T S A I N F Y Y G Y V F M G C M . V F A Y F Y V F F F V P . . . E T K G L T L E I TSAIHFYYGY VFMGCM VFA FFYVYFFVP ETKGLTLE ...I ...... T G A I N F Y Y G Y V F L G C L . V F A Y F Y V F F F V P . . . E T K G L T L E ...I ...... T S A I N F Y Y G Y V F M G C L . A F S YFYVFFFVP...ETKGLTLE ...I ...... T G S I H F Y Y G Y V F V G C L . V A M F L Y V F F F L P . . . E T I G L S L E ...I ...... V S K I D F K L G Y I F A A C L . L A L I I F S W I L V P . . . E T R K K N E Q ...IVDTGSHTSSLGAKIFF IWGSLN.AMGVIVVYLTVY...ETKGLTLE ...... T P Q M F I K M E Y G V . Y FFFASLMLLS IVFIYFFLP...VTKSIPLE TPQM FTSMGYGV Y FFFASLMILS IVFVFFLIP ETKGVPLE ...MLCHLKFGL FLVFAFFVVVM SIFVYIFLP ETKGIPIE ...MLCHLKFGL F IFFSFFVLIM SIFVYYFLP ETKGIPIE ...MLCHLKFGL FLFFAFFVVIMTVFIYFFLP ETKNIPIE ......... MLCHMKFGL.F FFFAFFVVIMTIFIYLMLP...ETKNVPIE
Gtr2ricco ......... LLCAFKFGI.FLFFAGWITVMTVFVCVFLP...ETKGVPIE Hex6ricco ......... M LCHFKSGI.F FFFGGWVVVM TAFVHFLLP...ETKKVPIE Huplchlke ......... M LCAMEYGV.F LFFAGWLVIM VLCAIFLLP...ETKGVPIE Hxtchlke ......... M LCAMKFGV.F LFFAGWLVIM VLCAIFLLP...ETKGVPIE Hup2chlke ......... M MCAMRWGV.F LFFAGWVVIM TFFVYFCLP...ETKGVPVE Araeescco ......... L LDSIGAAGTF WLYTALNIAF VGITFWLIP...ETKNVTLE Galpescco ......... M LNTLGNANTFWVYAALNVLF ILLTLWLVP...ETKHVSLE Glfzymmo ...ADGSPALNQTFNHGFSYLVFAALSILGGLIVARFVP...ETKGRSLD Xyleescco ...MDKNSWL VAHFHNGFSY WIYGCMGVLA ALFMWKFVP...ETKGKTLE G l c p s y n s p ....... P P L L D T V G L G P A Y G L Y A T S A A I S I F F I W F F V K . . . E T K G K T L E H g t l k l u l a . . . A F K N I S W K ....... TY I I F G V F S V A L T I Q T F F M F P . . . E T K G K T L E L a c p k l u l a . . . A M K N I K Y ....... W F Y V F Y V F F D I F E FIVIYFFFV...ETKGRSLE S t l l s a c c e . . . F I G Q S G W ....... G C Y L F F A V M N Y L Y IPVIFFFYP...ETAGRSLE Ma6tsacce ...QLNSEKWNWGAKSGF...FWGGFCLATLAWAVVDLP...ETAGRTFI Gtr21eido TEGISGGPSG NPNKGQAVAF IFFGCIGVVA CVIEYFFLQPWVEPEAKMTD Consensus ...................................... P...ETKG...E
Itrlsacce Itr2sacce Gtrlleido Sgtlschma Tgtptaeso Sgt4schma Gtr5homsa Gtr5ratno Gtr5orycu Gtrlgalga Gtrlhomsa Gtr3musmu Gtr3canfa Gtr3oviar Gtr3homsa Gtr3galga Gtr4homsa Gtr2homsa Gtr2ratno Gtr2galga GtrTratno Sgt2schma Hxt0sacce Hxt2sacce Hxt4sacce Hxt7sacce Gtrkluma Gal2sacce Hxtlsacce Hxt3sacce Hxt5sacce Raglklula Hxtasacce Hxt8sacce Hxtcsacce Hxtdsacce
701 750 EVQTILKDGFNIKASKALAK KRKQQVARV...HELKYEPTQEIIEDI... E V Q T I L K D G F N I K A S K A L A K K R K Q Q V A E G A A H H K L K F E P T Q E I V E S .... QIDNMFRKRA GLPPRFHEEG ESGESGAGYR EDGDLGRLAT EDVCDLSSLG EVARDLAFGN IVVGKRTTAL EDRNLTVFTK QGNNEGPASE SLLYPRSDND EVARDLAFGS IVVGKRTAAL QA...PVFTK EDEEAATA ..... LRRSDEE SNARDLATAKWACQRPSRL T Y K N E E P F Y S DE . . . . . . . . . . . . . . . . . . E I N Q I F T K M N K V S E V Y .... P E K E . . E L K E L P P V T S E Q . . . . . . . . . . . . E I N Q I F A K K N K V S D V Y .... P E K E E K E L N D L P P A T R E Q . . . . . . . . . . . . E I I R R F I R M N K V . E V S .... P D R E . . E L K D F P P D V S E . . . . . . . . . . . . . E I A Y R F R Q G G A S Q S ...... D K T P D E . F H S L G A D S Q V . . . . . . . . . . . . . E I A S G F R Q G G A S Q S ...... D K T P E E L F H P L G A D S Q V . . . . . . . . . . . . . DIARAFEGQA HSGKGP ....... AGVELNS MQPVKETPGN A ......... E I T R A F E G Q G Q D A N R A .... E K G P I V E M N S M Q P V K E T A T V . . . . . . . . . . E I T R A F E G Q V Q T G T R G .... E K G P I M E M N S I Q P T K D T N A . . . . . . . . . . . D I T R A F E G Q A H G A D R S .... G K D G V M E M N S I E P A K E T T T N V . . . . . . . . . DISRGFEEQV ETSSPSSPPI EKNPMVEMNS IEPDKEVA ............ QISAAFH ..... RTPSLLEQ EVKPSTELEY LGPDEND ............. EIAAEFQKKS GSAHR ...... PKAAVEMKF LGATETV ............. EIAAEFRKKS GSAPP ...... RKATVQMEF LGSSETV ............. EIAAAFRRKK LPA ......... KSMTELED LRGGEEA ............. E I A A E F R K K H G G R P P . . . . . . K L R W I T A N F I I A S D Q V K K M K N D ....... SIEDYFMRVC GFRGTEAHEN PTFTDIIDDT TQY ................. EVNEMYEERI KPWKSGGWIP SSR.RTPQPT SSTPLVIVDS K ......... EVNEMYVEGV KPWKSGSWIS KEK.RVSEE ..................... EVNTLWEEGV LPWKSPSWVP PNK.RGTDYN ADDLMHDDQP FYKKMFGKK. EVNTMWEEGV LPWKSASWVP PSR.RGANYD AEEMTHDDKP LYKRMFSTK. E V Q E M W E E G V L P W K S S S W V P S S R . R N A G Y D V D A L Q H D E K P W Y K A M L .... 
EIQELWEEGV LPWKSEGWIP SSR.RGNNYD LEDLQHDDKP WYKAMLE... EVNDMYAEGV LPWKSASWVP VSK.RGADYN ADDLMHDDQP FYKSLFSRK. EVNDMYAEGV LPWKSASWVP TSQ.RGANYD ADALMHDDQP FYKKMFGKK. EVNEMYEENV LPWKSTKWII oSR.RTTDYD LDATRNDPRP FYKRMFTKEK E V N E M Y S E G V L P W K S S S W V P S S R . R G A E Y D V D A L Q H D D K P W Y K A M L .... EVNTMWLEGV PAWKSASWVP PER.RTADYD ADAIDHDDRP IYKRFFSS.. E V D E M W M D G V L P W K S E S W V P A S R . R D G D Y D N E K L Q H D E K P F Y K R M F .... EIQLLYEEGI KPWKSASWVP PSR.RGISSE ESKTEKKDWK KFLKFSKNSD EINKIFEPE .........................................
Snf3sacce E I D E L Y I K S S T G V V S P K F N K D I R E R A L K F Q Y D P L Q R L E D G K N T F V A K R N N Q a y n e u c r A M D R L F E I K P V Q N A N K N L M A E L N F D R N P E R E E S S S L D D K D RVTQTENAV. Qutdemeni SMETLFDKKP VWHAHSQLIR EL...RENEE AFRADMGASG KGGVTKEYVE Stp lar ath E M G Q V W R S H . . . W Y W S R F V E DGEYG .... N .A L E M G K N S N QAGTKHV... Sugr icco E M G Q V W K Q H W Y W S R Y V V DEDYP N G G L E M G K E G R IP.. KNV MstlnictaEMVIVWKEH...WFWSKFMTEVDYPGTRNGTSVEMSKGS..AGYKIV... Stp4ar ath E M N R V W K A H . . . W F W G K F I P DEA .... VNM G A A E M Q Q K S V .......... Gtr 2r icco E M V L L W R K H . . . W F W K K V M P V D M P L E D G W G A A P A S N N H K . . . . . . . . . . . H e x 6 r i c c o K M D I V W R D H . . . W F W K K I I G E E A A E E N N K M EAA . . . . . . . . . . . . . . . . . Huplchlke RVQALYARH...WFWNRVMG PAAAEVIAED EKRVAAASAI IKEEELSKAM Hxtchlke RVQALYARH...WFWKKVMG PAAQEIIAED EKRVAASQAI MKEERISQTM Hup2chlke TVPTMFARH...WLWGRVMG EKGRALVAAD EARKAGTVAF KVESGSEDGK AraeesccoHIERKLMAG...EKLRNIGV .............................. GalpesccoHIERNLMKG...RKLREIGAHD ............................ Glfzymmo EIEEMWRSQ K ......... XyleesccoELEALWEPE...TKKTQQTATL ............................ G l c p s y n s p QM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hgtlklula EIDQMWVDNI PAWRTANYIP QLPIVKDEEG NKLGLLGNPQ HLEDVHSNEK Lacpklula ELEVVFEAPN PRKASVDQAF LAQVRATLVQ RNDVRVANAQ NLKEQEPLKS Stllsacce EIDIIF . . . . . . . . . . AKAY E D G T Q P W R V A N H L P K L S L Q E V E D H A N A L G S Ma6tsacce EINELFRLGV PARKFKSTKV DPFAAAKAAA AEINVKDPKE DLETSVVDEG G t r 2 1 e i d o D L D G A A V P E G KHD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Consensus ..................................................
Tgtptaeso Snf3sacce Qutdemeni Hup ichlke Hxtchlke Hup2chlke Sgtlklula Lacpklula Stllsacce Ma6tsacce Gtr21eido Consensus
751 800 DAKVDA ............................................ FDDETPRNDF RNTISGEIDH SPNQKEVHSI PERVDIPTST EILESPNKSS EA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . K ................................................. K ................................................. PASDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G L L D R S D S A S NSN D A D H V E K L S E AESV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y D D E M E K E D F GEDR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RSTPSVVNK ......................................... .................................................. ..................................................
801 850 Gtrlleido AAIKAAPHEP K ....................................... Snf3sacce GMTVPVSPSL QDVPIPQTTE PAEIRTKYVD LGNGLGLNTY NRGPPSLSSD 851 900 Snf3sacce S S E D Y T E D E I G G P S S Q G D Q S N R S T M N D I N D Y M A R L I H S T S T A S N T T D K F S 901 950 Snf3sacce G N Q S T L R Y H T A S S H S D T T E E D S N L M D L G N G L A L N A Y N R G P P S I L M N S S D E 951 i000 Snf3sacce E A N G G E T S D N L N T A Q D L A G M K E R M A Q F A Q S Y I D K R G G L E P E T Q S N I L S T S
i001 1034 Snf3sacce L S V M A D T N E H N N E I L H S S E E N A T N Q P V N E N NDLK
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Araekleox (Araeescco); Gtr1musmu, Gtr1orycu, Gtr1sussc, Gtr1bosta, Gtr1ratno (Gtr1homsa); Gtr2musmu (Gtr2ratno); Gtr3ratno (Gtr3musmu); Gtr4ratno, Gtr4musmu (Gtr4homsa); Hxt6sacce (Hxt7sacce). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in at least one other family of the USA/MFS superfamily.
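The 75% rule for the consensus line can be illustrated with a short Python sketch. It is not the procedure used to prepare the alignments in this book; the function name `consensus`, the use of `.` as the gap character and the four toy rows are assumptions made for illustration only.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Return a consensus line for equal-length aligned rows.

    A residue is reported at a column only if it occurs in at least
    `threshold` of ALL rows (gaps count against it); otherwise the
    gap character is written, as in the printed alignments.
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")

    out = []
    for column in zip(*rows):
        # Count residues only; a gap can never be the consensus residue.
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            out.append(residue if n / len(rows) >= threshold else gap)
        else:
            out.append(gap)
    return "".join(out)


# Hypothetical toy alignment (not rows taken from the tables above).
toy_rows = ["VERAGRR", "VEKAGRR", "VDRFGRR", "IDKAGRR"]
print(consensus(toy_rows))
```

With four rows, a residue must appear in at least three of them to be reported, so columns split two against two are left as gaps.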
Database accession numbers
Araeescco Araekleox Gal2sacce Galpescco Glcpsynsp Glfzymmo Gtr1bosta Gtr1galga Gtr1homsa Gtr1leido Gtr1musmu Gtr1orycu Gtr1ratno Gtr1sussc Gtr2galga Gtr2homsa Gtr2leido Gtr2musmu Gtr2ratno Gtr2ricco Gtr3canfa Gtr3galga Gtr3homsa Gtr3musmu Gtr3oviar Gtr3ratno Gtr4homsa Gtr4musmu Gtr4ratno Gtr5homsa Gtr5orycu Gtr5ratno Gtr7ratno Gtrkluma Hex6ricco Hgt1klula Hup1chlke Hup2chlke Hxt0sacce Hxt1sacce Hxt2sacce Hxt3sacce Hxt4sacce Hxt5sacce Hxt6sacce Hxt7sacce
SWISSPROT
PIR
EMBL/GENBANK
P09830 P45598 P13181 P37021 P15729 P21906 P27674 P46896 P11166 Q01440 P17809 P13355 P11167 P20303
B26430
J03732 X79598 M68547; M81879 U28377 X15988; X16472 M60615 M60448 L07300 K03195; M20653 M85072 M22998; M23384 M21747 M13979; M22061 X17058 Z22932 J03810 M85073 X16986; X15684 J03145 L21753 L35267 M37785 M20681 M75135; X61093 L39214 D13962 M20747; M91463 M23383 X14771; J04524 M55531 D26482 L05195; D13871 X66031 Z47080 L08188 U22525 Y07520 X66855 D50617; Z46255 L07079; M82963 M33270 L07080; S52309 M81960; X67321 X77961; U00060 Z31691 Z31692
P11168 Q01441 P14246 P12336
A33865; JQ0383 S06973; S10014 A37855 A27217 A48442 A30310; S09705 A30797 A25949 S04223 S37476 A31318 S06920; S05319 A31556
P28568 P11169 P32037
A41264 A31986 A41751
Q07647 P14672 P14142 P19357 P22732 P46408 P43427 Q00712
S38981 A33801; A49158 B30310 S03349; A32101 A36629
P15686
S07096
P43581 P32465 P23585 P32466 P32467 P38695 P39003 P39004
A39728; S38798 S12200 S31294 S31314; S39817 S43742; S46726 S43185 S43186
S24344 S51081
Hxt8sacce Hxtasacce Hxtchlke Hxtcsacce Hxtdsacce Itr1sacce Itr2sacce Lacpklula Ma3tsacce Ma6tsacce Qayneucr Qutdemeni Rag1klula Sgt1schma Sgt2schma Sgt4schma Snf3sacce Stl1sacce Stp1arath Stp4arath Sugricco Tgtptaeso Xyleescco
SWISSPROT P40886 P40885
P39924 P42833 P30605 P30606 P07921 P38156 P15685 P11636 P15325 P18631
P10870 P39932 P23586
P09098
PIR S45159 S45153 S38435
A40538 B40538 A31776 S46182 S07686 S04254; G31277 S08498 S11295 A53153 B53153 C53153 A31928 S12042 S25009 A26430; A27418
EMBL/GENBANK Z34098; Z49489 Z34098; X82621 X75440 U18795 Z46259 D90352 D90353 X06997 Z36167 M27823 X14603 X13525 X53752 L25065 L25066 L25067 J03246 L07492 X55350 X66857 L08196 U39197 J02812; X0663
References
1 Olson, A.L. and Pessin, J.E. (1996) Annu. Rev. Nutr. 16, 235-256.
2 Thorens, B. (1996) Am. J. Physiol. 270, G541-G553.
3 Baldwin, S.A. (1993) Biochim. Biophys. Acta 1154, 17-50.
4 Mueckler, M. (1993) Eur. J. Biochem. 219, 713-725.
5 Henderson, P.J.F. (1993) Curr. Opin. Cell Biol. 5, 708-721.
6 Bisson, L.F. et al. (1993) CRC Crit. Rev. Biochem. Mol. Biol. 28, 259-308.
7 Fischbarg, J. and Vera, J.C. (1995) Am. J. Physiol. 268, C1077-C1089.
8 Boorer, K.J. et al. (1994) J. Biol. Chem. 269, 20417-20424.
9 Sofue, M. et al. (1992) Biochem. J. 288, 669-674.
10 Griffith, J. et al. (1992) Curr. Opin. Cell Biol. 4, 684-695.
11 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
12 Davies, A. et al. (1987) J. Biol. Chem. 262, 9347-9352.
13 Hresko, R. et al. (1994) J. Biol. Chem. 269, 20482-20488.
14 Herbert, D. and Carruthers, A. (1992) J. Biol. Chem. 267, 23829-23838.
15 Henderson, P.J.F. (1992) Int. Rev. Cytol. 137, 149-208.
16 Gould, G. et al. (1991) Biochemistry 30, 5139-5145.
H+/rhamnose symporter family
Summary
Transporters of the H+/rhamnose symporter family, the example of which is the RHAT rhamnose-H+ symporter of Escherichia coli (Rhatescco), mediate symport (H+-coupled substrate uptake) of rhamnose. The two known members of the family are found only in gram-negative bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the H+/rhamnose symporter family and any other family of transporters. They are predicted to contain ten membrane-spanning helices by the hydropathy of their amino acid sequences and activities of reporter gene fusions 1. The amino acid sequences of the two known members of the H+/rhamnose symporter family are more than 90% identical.
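As an illustration of what a hydropathy-based prediction involves, the following Python sketch computes a sliding-window hydropathy profile and reports stretches whose mean score exceeds a cutoff. It is not the analysis cited above: the Kyte-Doolittle scale, the 19-residue window, the 1.6 cutoff and the function names are assumptions chosen for the example. The test fragment is the first 50 residues of Rhatescco as printed in the alignment for this family.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_segments(profile, window=19, cutoff=1.6):
    """1-based (first, last) residue ranges covered by consecutive
    windows whose mean hydropathy exceeds the cutoff."""
    segments = []
    start = None
    for i, score in enumerate(profile):
        if score > cutoff and start is None:
            start = i
        elif score <= cutoff and start is not None:
            # The last qualifying window started at i - 1.
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(profile) - 1 + window))
    return segments


# First 50 residues of Rhatescco, copied from the alignment in this entry.
n_term = "MSNAITMGIF WHLIGAASAA CFYAPFKKVK KWSWETMWSV GGIVSWIILP".replace(" ", "")
profile = hydropathy_profile(n_term)
print(candidate_segments(profile))
```

Segments reported this way are only candidates; in the source, the ten-helix model also rests on reporter gene fusion data.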
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Rhatescco | Rhamnose-H+ symporter [RHAT] | Escherichia coli [gram-negative bacterium] | H+/L-rhamnose
Rhatsalty | Rhamnose-H+ symporter [RHAT] | Salmonella typhimurium [gram-negative bacterium] | H+/L-rhamnose
Cotransported ions are listed.
Proposed orientation of RHAT in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded ten times through the membrane 2. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed.
[Figure: predicted topology of RHAT in the membrane, with the OUTSIDE and INSIDE faces and the NH2 and COOH termini labelled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT
Rhatescco | 344 | 37 319
Rhatsalty | 344 | 37 390
Km: Rhamnose: 28 µM 2
CHROMOSOMAL LOCUS: 88.25 minutes
Multiple amino acid sequence alignments
Rhatescco    1  MSNAITMGIF WHLIGAASAA CFYAPFKKVK KWSWETMWSV GGIVSWIILP   50
Rhatescco   51  WAISALLLPN FWAYYSSFSL STRLPVFLFG AMWGIGNINY GLTMRYLGMS  100
Rhatescco  101  MGIGIAIGIT LIVGTLMTPI INGNFDVLIS TEGGRMTLLG VLVALIGVGI  150
Rhatescco  151  VTRAGQLKER KMGIKAEEFN LKKGLVLAVM CGIFSAGMSF AMNAAKPMHE  200
Rhatescco  201  AAAALGVDPL YVALPSYVVI MGGGAIINLG FCFIRLAKVK DLSLKADFSL  250
Rhatescco  251  AKSLIIHNVL LSTLGGLMWY LQFFFYAWGH ARIPAQYDYI SWMLHMSFYV  300
Rhatescco  301  LCGGIVGLVL KEWNNAGRRP VTVLSLGCVV IIVAANIVGI GMAN        344
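The AMINO ACIDS and MOL. WT entries can be checked against the listed sequence. The book does not state how its molecular weights were calculated, so the sketch below is an assumption: it sums standard average residue masses and adds one water, and the names `RESIDUE_MASS` and `average_mass` are hypothetical. The result may differ slightly from the tabulated value.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Rhatescco sequence as printed in the alignment section.
RHAT_ECOLI = (
    "MSNAITMGIFWHLIGAASAACFYAPFKKVKKWSWETMWSVGGIVSWIILP"
    "WAISALLLPNFWAYYSSFSLSTRLPVFLFGAMWGIGNINYGLTMRYLGMS"
    "MGIGIAIGITLIVGTLMTPIINGNFDVLISTEGGRMTLLGVLVALIGVGI"
    "VTRAGQLKERKMGIKAEEFNLKKGLVLAVMCGIFSAGMSFAMNAAKPMHE"
    "AAAALGVDPLYVALPSYVVIMGGGAIINLGFCFIRLAKVKDLSLKADFSL"
    "AKSLIIHNVLLSTLGGLMWYLQFFFYAWGHARIPAQYDYISWMLHMSFYV"
    "LCGGIVGLVLKEWNNAGRRPVTVLSLGCVVIIVAANIVGIGMAN"
)


def average_mass(seq):
    """Average molecular mass of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


print(len(RHAT_ECOLI))                 # number of residues
print(round(average_mass(RHAT_ECOLI)))  # approximate mass in daltons
```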
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parenthesis and therefore are not included in the alignments: Rhatsalty (Rhatescco).
Database accession numbers
CODE | SWISSPROT | PIR | EMBL/GENBANK
Rhatescco | P27125 | B42436; S26145 | M85158; X60699
Rhatsalty | P27135 | A42436 | M85157
References
1 Tate, C. and Henderson, P.J.F. (1993) J. Biol. Chem. 268, 26850-26857.
2 Muir, J. et al. (1993) Biochem. J. 290, 833-842.
H+/amino acid symporter family
Summary
Transporters of the H+/amino acid symporter family, the example of which is the PHEP phenylalanine transporter of Escherichia coli (Phepescco), mediate proton-dependent uptake of one or more amino acids *. Members of the family occur in both gram-positive and gram-negative bacteria and in various fungi. Statistical analysis reveals no apparent relationship between the amino acid sequences of the H+/amino acid symporter family and any other family of transporters. They are predicted to form 12 membrane-spanning helices by the hydropathy of their amino acid sequences and the activities of reporter gene fusions ~. Eukaryotic transporters of the H+/amino acid symporter family may be glycosylated. Several amino acid sequence motifs are highly conserved in the H+/amino acid symporter family.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Alp1sacce | High-affinity basic amino acid transporter [ALP1] | Saccharomyces cerevisiae [yeast] | Lysine, arginine
Anspsalty | L-Asparagine transporter [ANSP] | Salmonella typhimurium [gram-negative bacterium] | L-Asparagine
Aropescco | Aromatic amino acid transporter [AROP] | Escherichia coli [gram-negative bacterium] | Aromatic amino acids
Aropcorgl | Aromatic amino acid transporter [AROP] | Corynebacterium glutamicum [gram-positive bacterium] | Aromatic amino acids
Can1sacce | High-affinity arginine transporter [CAN1, YEL063c] | Saccharomyces cerevisiae [yeast] | Arginine
Can1canal | High-affinity basic amino acid transporter [CAN1] | Candida albicans [yeast] | Lysine, arginine
Cycaescco | Serine-alanine-glycine transporter [CYCA, DAGA] | Escherichia coli [gram-negative bacterium] | Serine, alanine, glycine
Dip5sacce | Dicarboxylic amino acid transporter [DIP5] | Saccharomyces cerevisiae [yeast] | Dicarboxylic acids
Gabpbacsu | GABA transporter [4-amino butyrate transporter, GABP] | Bacillus subtilis [gram-positive bacterium] | 4-amino butyrate
Gabpescco | GABA transporter [4-amino butyrate transporter, GABP] | Escherichia coli [gram-negative bacterium] | 4-amino butyrate
Gap1sacce | General amino acid transporter [GAP1, YKR039w] | Saccharomyces cerevisiae [yeast] | Many amino acids
Hip1sacce | High-affinity histidine transporter [HIP1, G7572] | Saccharomyces cerevisiae [yeast] | Histidine
Hutmbacsu | Putative histidine transporter [HUTM, EE57d] | Bacillus subtilis [gram-positive bacterium] | Histidine
Ina1triha | General amino acid transporter [INA1, INDA1] | Trichoderma harzianum [fungus] | Many amino acids
Isp5schpo | Putative amino acid transporter [ISP5] | Schizosaccharomyces pombe [yeast] | Many amino acids
Lyp1sacce | High-affinity lysine transporter [LYP1] | Saccharomyces cerevisiae [yeast] | Lysine
Lyspescco | Lysine transporter [LYSP, CADR] | Escherichia coli [gram-negative bacterium] | Lysine
Pap1sacce | General amino acid transporter [PAP1] | Saccharomyces cerevisiae [yeast] | Many amino acids
Phepescco | Phenylalanine transporter [PHEP] | Escherichia coli [gram-negative bacterium] | Phenylalanine
Proysalty | Proline transporter [PROY] | Salmonella typhimurium [gram-negative bacterium] | Proline
Put4sacce | Proline transporter [PUT4] | Saccharomyces cerevisiae [yeast] | Proline, 4-amino butyrate
Putxemeni | High-affinity proline transporter [PUTX, PRNB] | Emericella nidulans [mold] | Proline
Roccbacsu | Amino acid transporter [ROCC, IPA78d] | Bacillus subtilis [gram-positive bacterium] | Arginine, ornithine
Rocebacsu | Amino acid transporter [ROCE] | Bacillus subtilis [gram-positive bacterium] | Arginine, ornithine
Tat2sacce | High-affinity tryptophan transporter [TAT2, SCM2, LTG3, TAP2] | Saccharomyces cerevisiae [yeast] | Tryptophan
Val1sacce | Valine-tyrosine-tryptophan transporter [VAL1, VAP1, TAT1, TAP1, YBR069c, YBR0710] | Saccharomyces cerevisiae [yeast] | Valine, alanine, tyrosine, tryptophan
Phylogenetic tree
[Figure: phylogenetic tree of the family. Leaves, in the order shown: Aropescco, Phepescco, Proysalty, Cycaescco, Anspsalty, Gabpbacsu, Gabpescco, Aropcorgl, Roccbacsu, Rocebacsu, Hutmbacsu, Lyspescco, Pap1sacce, Val1sacce, Gap1sacce, Hip1sacce, Tat2sacce, Ina1triha, Isp5schpo, Put4sacce, Putxemeni, Alp1sacce, Can1sacce, Lyp1sacce, Can1canal, Dip5sacce.]
Proposed orientation of PHEP in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane 2. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in PHEP.
[Figure: predicted topology of PHEP in the membrane, with the OUTSIDE and INSIDE faces, the NH2 and COOH termini and the conserved residues labelled.]
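The 75% conservation rule used for the consensus can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name `consensus` is hypothetical, gaps are assumed to count against conservation, and a residue is reported only when it occupies strictly more than 75% of the rows. The input is a ten-column excerpt (alignment positions 161-170) from eight of the aligned transporters.

```python
from collections import Counter

# Alignment positions 161-170 for eight of the 26 aligned transporters.
ROWS = [
    "AFLIMRQLGE",  # Aropescco
    "AFLIMRQLGE",  # Phepescco
    "AYIIMRALGE",  # Proysalty
    "VIFIMRMLGE",  # Gabpbacsu
    "VVMIMRMLAE",  # Gabpescco
    "VVLVMQMLGE",  # Aropcorgl
    "MYLVMLCLGE",  # Roccbacsu
    "MYLVMQCLGE",  # Hutmbacsu
]


def consensus(rows, threshold=0.75, gap="."):
    """Return one character per column: the residue present in more than
    `threshold` of all rows, or the gap character otherwise."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    result = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(rows) > threshold:
                result.append(residue)
                continue
        result.append(gap)
    return "".join(result)


print(consensus(ROWS))  # ....M..LGE
```

With only eight rows the methionine at position 165 passes the cutoff, whereas the published consensus over all 26 sequences retains only LGE in this stretch; the outcome depends on which sequences are included.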
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | CHROMOSOMAL LOCUS
Alp1sacce | 574 | 64 066 |
Anspsalty | 497 | 54 004 |
Aropescco | 456 | 49 809 | 2.58 minutes
Aropcorgl | 463 | 49 268 |
Can1sacce | 590 | 65 785 | Chromosome 5
Can1canal | 571 | 63 317 |
Cycaescco | 470 | 51 659 | 95.4-95.42 minutes
Dip5sacce | 608 | 68 097 | Chromosome 16
Gabpbacsu | 469 | 51 084 | 52°
Gabpescco | 466 | 51 080 | 60.17 minutes
Gap1sacce | 602 | 65 655 | Chromosome 11
Hip1sacce | 603 | 66 006 | Chromosome 7
Hutmbacsu | 475 | 51 581 | 335°
Ina1triha | 573 | 62 850 |
Isp5schpo | 543 | 60 074 |
Lyp1sacce | 611 | 68 118 | Chromosome 14
Lyspescco | 488 | 53 471 | 48.34 minutes
Pap1sacce | 566 | 62 707 | Chromosome 11
Phepescco | 458 | 50 667 | 12.96 minutes
Proysalty | 292 | 31 824 |
Put4sacce | 627 | 68 786 | Chromosome 15
Putxemeni | 570 | 63 101 |
Roccbacsu | 470 | 51 730 | 327°
Rocebacsu | 467 | 51 634 | 354°
Tat2sacce | 592 | 65 404 | Chromosome 15
Val1sacce | 619 | 68 757 | Chromosome 2
Multiple amino acid sequence alignments
1 Pap1sacce ...............
Val1sacce Gap1sacce Hip1sacce Tat2sacce Ina1triha Put4sacce Alp1sacce Can1sacce Lyp1sacce Can1canal Dip5sacce
............... MDDSV SFIAKEASPA .................... MSNTSSYEKN ................. MPR NPLKKEYWAD ..............................
QYSHSLHERT HSEKQKRDFT NPDNLKHN...GITIDSEFL VVDGFKPATS PAFENEKEST MTEDFISSVK RSNEELKERK ................... M ...MVNILPF HKNNRHSAGV VTCADDVSGD GSGGDTKKEE NVVQVTESPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MDE TVNI QM . . . . . . . . . . . . . . . . . . . . . . . . MTNSKE DADIEEKHMY NEPVTTLFHD MGRFSNIITS NKWDEKQNNI GEQSMQELPE DQIEHEMEAI DPSNKTTPYS ................................................. M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MKMPLKKMFT STSPRNSSSL
50 MEKSA EFEVTD.. SA LYNNFNTSTT ASLTP . . . . .
51 i00 .................................................. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MKNASTV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MCTPRG LTPPWFFIVL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MVDQVKV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MKTQ TTHAAEQHAA .................................................. .................................................. .................................................. .................................................. .................................................. ..................................................
Aropescco Phepescco Proysalty Cycaescco Anspsalty Gabpbacsu Gabpescco Aropcorgl Roccbacsu Rocebacsu Hutmbacsu Lyspescco Pap1sacce Val1sacce Gap1sacce Hip1sacce Tat2sacce Ina1triha Isp5schpo Put4sacce Putxemeni Alp1sacce Can1sacce Lyp1sacce Can1canal Dip5sacce Consensus
ITEKQDEVSG QTAEPRRTDS KSILQRKCKE FFDSFKRQLP P..DRNSELE TQEPITIPSN GSAVSIDETG SGSKWQDFKD ...SFKRVKP IEVDPNLSEA TFVTELTSKT DSAFPLSSKD SPGINQTTND .ITSSDRFRR NE ...... DT SNFGFVEYKS KQLTSSSSHN SNSSHHDDDN .QHGKRNIFQ RCVDSFKSPL SKEESGHVTP EKGDNVVDYQ ASTTVLPSEG PERDANWFTR NGLNV..DSF ........ MP AMKRKKLDME SSRWFPKGET C...FQRWYR SFLPP..EDG SGSRNNHRSD NEKDDAIRME KISKNQSASS NGTIREDLIM DV..DLEKSP . . . . . . . . . . . . . . . . . . . . . . . . . MSPPS AKSMEEGRTP SVQYGYGDPK SKEGQYEINS SSIIKEEEFV DEQYSGENVT KAITTERKVE DDDAAKETE. VEASQTHHRR GSIPLKDEKS KELYPLRSFP TRVNGEDTFS MEDGIGDED. IDEKQYNTKK KHGSLQGGAI AD...VNSIT NSLTRLQVVS HEPDIDEDE. PEDYEKYRMG SSNESHQKSV QPISSSISKS NKKTKHQTDF VQDSDIIEA. DSDHDAYYSK QNPDNFPVKE QEIYNIDLEE NNVSSRSSTS TSPSARDDSF ..................................................
Aropescco Phepescco Proysalty Cycaescco Anspsalty Gabpbacsu Gabpescco
i01 _MMEGQQHGE SEDTASNQEP IGQKLMESNN VADDQAPAEQ KRRWLNAHEE ..MNQS..QS ..MGQSSQPH
I I . I I I E i K E HSEESRIIII NGLVHR[III FVDSFRRAESQRLEEDNDLE
QLKRGLKNRH TLHRGLHNRH KLKRGLSTRH SLRRNLTNRH GYHKAMGNRQ GLKKELKTRH ELGGGLKSRH
IQLIALGGSI IQLIALGGAI IRFMALGSAI IQLIAIGGAI VQMIAIGGAI MTMISIAGVI VTMLSIAGVI
GTGLFLGSAS GTGLFLGIGP GTGLFYGSAD GTGLFMGSGK GTGLFLGAGA GAGLFVGSGS GASLFVGSSV
150 VIQSAGP.GI AIQMAGP AV AIKMAGP.SV TISLAGP.SI RLQMAGP.AL VIHSTGP.GA AIAEAGP.AV
Aropcorgl Roccbacsu Rocebacsu Hutmbacsu Lyspescco Pap1sacce Val1sacce Gap1sacce Hip1sacce Tat2sacce Ina1triha Isp5schpo Put4sacce Putxemeni Alp1sacce Can1sacce Lyp1sacce Can1canal Dip5sacce Consensus
.... MAKSNE G L G T G L R T R H L T M M G L G S A I .... M Q N H K N E L Q R S M K S R H L F M I A L G G V I .MNTNQDNGN Q L Q R T M K S R H L F M I S L G G V I .MNLQENSSQ Q L K R T M K S R H L F M I S L G G V I VSETKTTEAP GLRRELKARH LTMIAIGGSI DGTKSMKSNN HLKKSMKSRHVVMMSLGTGI SQEK ..... N N L T K S I K S R H L V M I S L G T G I EKVAIITAQT PLKHHLKNRH LQMIAIGGAI EQEDI..NNT NLSKDLSVRH LLTLAVGGAI DGS.. FDTS N L K R T L K P R H L I M I A I G G S I KKKHYGPGMV ELERPMKARH LHMIAIGGSI KPQ ....... K L K R T L T A R H IQMIGIGGAI S V D G D S E P H K .LKQGLQSRH V Q L I A L G G A I TLEGEIEEHT ATKRGLSSRQ LQLLAIGGCI ..SSPQ.ERR E V K R K L K Q R H I G M I A L G G T I ..EGEV.QNA E V K R E L K Q R H IGMIALGGTI ..EEAHYEDK H V K R A L K Q R H IGMIALGGTI ~ EVKRDLKARH VSMIAIGGTI AVPDGKDENT RLRKDLKARH ISMIAIGGSL . . . . . . . . . . . L ...... RH ...... GG.I
GAGLFLGTGV GIRAAGP.AV GTGLFLGSGF TISQAGPLGA GTGFFLGTGF TINQAGPLGA GTGLFLSTGY TLHQAGPGGT GTGLFVASGA TISQAGPGGA GTGLLVANAK GLSLAGPGSL GTGLLVGNGQ VLGTAGPAGL GTGLLVGSGT ALRTGGPASL GTGLYVNTGA ALSTGGPASL GTGLFVGSGK AIAEGGPLGV GAGFFVGSGG ALAKGGPGSL GTGVWVGSKN TLREGGAASV GTGLLVGTSS TLHTCGPAGL GTGLFVGTST VLTQTGPAPL GTGLIIGIGP PLAHAGPVGA GTGLFIGLST PLTNAGPVGA GTGLFVGIST PLSNAGPVGS GTGLFISTGS LLHTTGPVMS GTGLLIGTGT ALLTGGPVAM G T G L L . G . . . . . . . . GP...
151-200
Aropescco  ILGYAIAGFI AFLIMRQLGE MVVEEPVA.. ...GSFSHFA YKYWGSFAGF
Phepescco  LLGYGVAGII AFLIMRQLGE MVVEEPVS.. ...GSFAHFA YKYWGPFAGF
Proysalty  LLAYIIGGVA AYIIMRALGE MSVHNPAA.. ...SSFSRYA QENLGPLAGY
Cycaescco  IFVYMIIGFM LFFVMRAMGE LLLSNLEY.. ...KSFSDFA SDLLGPWAGY
Anspsalty  ALVYLICGIF SFFILRALGE LVLHRPSS.. ...GSFVSYA REFLGEKAAY
Gabpbacsu  VVSYALAGLL VIFIMRMLGE MSAVNPTS.. ...GSFSQYA HDAIGPWAGF
Gabpescco  LLAYLFAGLL VVMIMRMLAE MAVATPDT.. ...GSFSTYA DKAIGRWAGY
Aropcorgl  LLAYIIAGAI VVLVMQMLGE MAAARPAS.. ...GSFSRYG EDAFGHWAGF
Roccbacsu  IAAYIIGGFL MYLVMLCLGE LAVAMPVA.. ...GSFQAYA TKFLGQSTGF
Rocebacsu  VLSYLVGGFI MFLTMLCLGE LAVAFPVS.. ...GSFQTYA TKFISPAFGF
Hutmbacsu  ILAYVIGGLM MYLVMQCLGE LSVANAVT.. ...GSFQKYA TTFIGPSTGF
Lyspescco  LLSYMLIGLM VYFLMTSLGE LAAYMPVS.. ...GSFATYG QNYVEEGFGF
Pap1sacce  VIGYVMVSFV TYFMVQAAGE MGVTYPTLP. ...GNFNAYN SIFISKSFGF
Val1sacce  VLGYGIASIM LYCIIQAAGE LGLCYAGLT. ...GNYTRYP SILVDPSLGF
Gap1sacce  LIGWGSTGTM IYAMVMALGE LAVIFP.IS. ...GGFTTYA TRFIDESFGY
Hip1sacce  VIDWVIISTC LFTVINSLGE LSAAFP.VV. ...GGFNVYS MRFIEPSFAF
Tat2sacce  VIGWAIAGSQ IIGTIHGLGE ITVRFP.VV. ...GAFANYG TRFLDPSISF
Ina1triha  FVDFLIIGIM MFNVVYALGE LAIMYP.VS. ...GSFYTYS ARFIDPAWGF
Isp5schpo  LICYSLVGSM VLMTVYSLGE LAVAFP.IN. ...GSFHTYG TRFIHPSWGF
Put4sacce  FISYIIISAV IYPIMCALGE MVCFLPGDGS DSAGSTANLV TRYVDPSLGF
Putxemeni  LMSYIVMASI VWFVMNVLGE MTTYLPIRG. ...VSVPYLI GRFTEPSIGF
Alp1sacce  LISYLFMGTV IYSVTQSLGE MATFIPVTS. ....SFSVFA QRFLSPALGA
Can1sacce  LISYLFMGSL AYSVTQSLGE MATFIPVTS. ....SFTVFS QRFLSPAFGA
Lyp1sacce  LIAYIFMGTI VYFVTQSLGE MATFIPVTS. ....SITVFS KRFLSPAFGV
Can1canal  LISFLFVTTI CFSVTQSLGE MATYIPISG. ....SFAQFV TRWVSKSCGA
Dip5sacce  LIAYAFVGLL VFYTMACLGE MASYIPLDG. .....FTSYA SRYVDPALGF
Consensus  ...Y...G.. .......LGE .....P.... .....F.... ........G.
201 250 Ar opescco A S G W N Y W V L Y V L V A M A E L T A VGKYIQFW.. YPEIPT... W V S A A V F F V V I P h e p e s c c o L S G W N Y W V M F V L V G M A E L T A AGIYMQYW.. F P D V P T . . . W I W A A A F F I I I
P r o y s a l t y I T G W T Y C F E I L I V A I A D V T A FGIYMGVW.. F P A V P H . . . W I W V L S V V L I I C y c a e s c c o F T G W T Y W F C W V V T G M A D V V A ITAYAQFW.~ FPDLSD W VASLAVIVLL Anspsalty VAGWMYFINW AMTGIVDITA VALYMHYWGA FGDVPQ W VFALGALTIV G a b p b a c s u T I G W L Y W F F W V I V I A I E A I A GAGIIQYW.. FHDIPL W LTSLILTIVL G a b p e s c c o T I G W L Y W W F W V L V I P L E A N I A A M I L H S W . . VPGIPI W LFSLVITLAL A r o p c o r g l S L G W L Y W F M L I M V M G A E M T G A A A I M G A W . . F.GVEP W IPSLVCWFF R o c c b a c s u M I G W L Y W F S W A N T V G L E L T S AGILMQRW.. L P S V P I . . . W I W C L V F G I V I R o c e b a c s u A F G W L Y W L G W A V T C A I E F L S AGQLMQRW.. F P H I D V . . . W I W C L V F A A L M H u t m b a c s u M V G I M Y W I N W V V T V G S E F T A SGILMQRW.. F P D S S V . . . W M W S A I F A A L L L y s p e s c c o A L G W N Y W Y N W A V T I A V D L V A AQLVMNWW.. F P D T P G . . . W I W S A L F L G V I P a p l s a c c e A T T W L F C I Q W L T V L P L E L I T S S M T V K Y W .... N . D T I N A D V F I V I F Y V F L V a l l s a c c e A V S V V Y T I Q W L T V L P L Q L V T A A M T V K Y W ...... T S V N A D I F V A V V F V F V G a p l s a c c e A N N F N Y M L Q W L V V L P L E I V S A S I T V N F W .... G T D P K Y R D G F V A L F W L A I H i p l s a c c e A V N L N Y L A Q W L V L L P L E L V A A S I T I K Y W .... N . D K I N S D A W V A I F Y A T I T a t 2 s a c c e V V S T I Y V L Q W F F V L P L E I I A A A M T V Q Y W .... N . S S I D P V I W V A I F Y A V I I n a l t r i h a A M G W N Y V L Q W A A V L P L E L T V C G I T I S Y W .... N S E . I T T A A W I S L F L G V I Isp5schpo T L G W N Y L A S F L A T Y P L E L I T A S I C L Q F W ..... IN.INSG I W I T V F I A L L P u t 4 s a c c e A T G W N Y F Y C Y V I L V A A E C T A A S G W E Y W T T A V P K G V ..... W I T I F L C V V Putxemeni ASGYNYWYSF AMLLACEVST M.ALLSFLSCWNPDNVGHCL GLIIEYWNPP Alplsacce TNGYMYWLSW CFTFALELSV LGKVIQYWT..EAVPLA... 
AWIVIFWCLL Canlsacce ANGYMYWFSW AITFALELSV VGQVIQFWT..YKVPLA... AWISIFWVII Lyplsacce SNGYMYWFNW AITYAVEVSV IGQVIEYWT..DKVPLA... AWIAIFWVII Canlcanal ANGWLYWFSW AVTFGLELSV VGQVIQFWT..DAVPLA... AWISIFFVIL Dip5sacce AIGYTYLFKY FILPPNQLTAAALVIQYWIS RDRVNPG...VWITIFLVVI C o n s e n s u s ..G..Y . . . . . . . . . . E . . . . . . . . . . W . . . . . . . . . . . . . . . . . . . . . .
Aropescco Phepescco Proysalty Cycaescco Anspsalty Gabpbacsu Gabpescco Aropcorgl Roccbacsu Rocebacsu Hutmbacsu Lyspescco Pap1sacce Val1sacce Gap1sacce Hip1sacce Tat2sacce Ina1triha Isp5schpo Put4sacce Putxemeni Alp1sacce Can1sacce Lyp1sacce Can1canal Dip5sacce Consensus
251 300 NAINLTNVKV FGEMEFWFAI IKVIAVVAMI I..FGGW.LL FSGNGGPQAT NAVNLVNVRL YGETEFWFAL IKVLAIIGMI G..FGLW.LL FSGHGGEKAS CAINLMSVKV FGELEFWFSF FKVATIIIMI VAGIGII VWGIGNGGQPTG LTLNLATVKM FGEMEFWFAM IKIVAIVSLIWGLVMVAMH FQSPTGVEAS GTMNMIGVKW FAEMEFWFAL IKVLAIVIFLWG.TIFLGT GQPLEGNATG TLTNVYSVKS FGEFEYWFSL IKVVTIIAFL IVGFAFIFGF A...PGSEPV TGSNLLSVKN YGEFEFWLAL CKVIAILAFI FLGAVAISGF Y PYAEVS AVVNLVAVRG FGEFEYWFAF IKVAVIIAFL IIGIALIFGW L PGSTFV FLINALSVRS FAEMEFWFSS IKVAAIILFI VIGGAAVFGL IDFKGGQETP FILNAITTKA FAESEFWFSG IKILIILLFI ILGGAAMFGL IDLKGGEQAP FICNAFSVKL FAETEFWFSS VKIVTIILFI ILGGAAMFGL ISLNGTADAP F L L N Y I S V R G F G E A E Y W F S L I K V T T V I V F I I V G V L M I I G I ..FKGAQPAG L F I H F F G V K A Y G E T E F I F N S C K I L M V A G F I I L S V V I N C G G AGVD .... GY I I I N L F G S R G Y A E A E F I F N S C K I L M V I G F V I L A I I I N C G G AGDR .... RY V I I N M F G V K G Y G E A E F V F S F I K V I T V V G F I I L G I I L N C G G GPTG .... GY A L A N M L D V K S F G E T E F V L S M I K I L S I I G F T I L G I V L S C G G GPHG .... GY V S I N L F G V R G F G E A E F A F S T I K A I T V C G F I I L C W L I C G G GPDH .... EF IIINLFGALG YAEEEFWASC FKLAATVIFM IIAFVLVLGG GPKDGRYHEY CFVNMFGVRG YGEVEFFVSS LKVMAMVGFI ICGIVIDCGG VRTDHR..GY V I L N F S A V K V Y G E S E F W F A S I K I L C I V G L I I L S F I L F W G G G P N H D R .... VSVGLWIAIV LVESEFWFAG LKILAIIGLI ILGVVLFFGG GPNHER T S M N M F P V K Y Y G E F E F C I A S I K V I A L L G F I I F S F C V V C G A G Q S D G P .... T I M N L F P V K Y Y G E F E F W V A S I K V L A I I G F L I Y C F C M V C G A G . V T G P .... T L M N F F P V K V Y G E F E F W V A S V K V L A I M G Y L I Y A L I I V C G G SH.QGP .... T I F N F F P V K F Y G E V E F W I A S I K I I A V F G W I I Y A F I M V C G A G K . T G P .... V A I N W G V K F F G E F E F W L S S F K V M V M L G L I L L L F I I M L G G G P N H D R .... ...N...V .... E.EF ..... K . . . . . . . . . . . . . . . . G . . . . . . . . . . .
Aropescco Phepescco Proysalty Cycaescco Anspsalty Gabpbacsu Gabpescco Aropcorgl Roccbacsu Rocebacsu Hutmbacsu Lyspescco Pap1sacce Val1sacce Gap1sacce Hip1sacce Tat2sacce Ina1triha Isp5schpo Put4sacce Putxemeni Alp1sacce Can1sacce Lyp1sacce Can1canal Dip5sacce Consensus
301 .VSNLWDQGG .IDNLWRYGG .IHNLWSNGG .FAHLWNDGG .FHLITDNGG GFSNLTGKGG GISRLWDSGG GTSNFIGDHG FLSNFMTDRG FLTHFYED.G MLSNFTDHGG W.SNWTIGEA IGGKYWRDPG IGAEYWHNPG IGGKYWHDPG IGGKYWHDPG IGAKYWHDPG WGARYWYDPG IGATI.FRKN LGFRYWQHPG LGFRYWQDPG IGFRYWRNPG VGFRYWRNPG IGFRYWRNPG VGFRYWRNGY LGFRYWRDPG ......... G
351 Aropescco TAAEADNPEQ Phepescco TAAEARDPEK Proysalty TAGEAKDPEK Cycaescco TAAETKDPEK AnspsaltyAAGECKDPQK GabpbacsuAAGETSNPIE GabpesccoAAAESDTPEK AropcorglAAAESDKPRE RoccbacsuAAGESESPEK RocebacsuAAGESEDPEK Hutmbacsu TAGESANPQK LyspesccoAAGESEDPAK Paplsacce SINEQSNPRK Vallsacce SAAEQENPTK GaplsacceAASESVEPRK Hiplsacce SAAESKNPRE Tat2sacce ASGET..DPK InaltrihaAAAESTNPTK Isp5schpoAASETKNPAK Put4sacce TSAECADQRR PutxemeniAAGEVEAPRR Alplsacce TAGEAANPRK Canlsacce TAGEAANPRK Lyplsacce TAGEAANPRK
350 FLPHA . . . . . . . . . . . FTGL V M M M A I I M F S F G . G L E L V G I FFATG . . . . . . . . . . . WNGL I L S L A V I M F S F G . G L E L I G I FFSNG . . . . . . . . . . . WLGM I M S L Q M V M F A Y G . G I E I I G I W F P K G . . . . . . . . . . . LSGF F A G F Q I A V F A F V . G I E L V G T FFPHG . . . . . . . . . . . L L P A L V L I Q G V V F A F A . S I E L V G T FFPEG . . . . . . . . . . . ISSV L L G I V V V I F S F M . G T E I V A I FMPNG . . . . . . . . . . . FGAV L S A M L I T M F S F M . G A E I V T I FMPNG . . . . . . . . . . . I S G V A A G L L A V A F A F G . G I E I V T I LFPNG ........... VLAV MFTLVMVNFS FQ.GTELVGI L F P N G . . . . . . . . . . . IKAM L I T M I T V N F A F Q . G T E L I G V L F P N G . . . . . . . . . . . FLAV F I A M I S V S F A F S . G T E L I G V P F A G G . . . . . . . . . . . FAAM I G V A M I V G F S F Q . G T E L I G I S F A E G S G A T R ...... FKGI C Y I L V S A Y F S F G . G I E L F V L PFAHG . . . . . . . . . . . FKGV C T V F C Y A A F S Y G . G I E V L L L A F A G D T P G A K ...... FKGV C S V F V T A A F S F A . G S E L V G L A F V G H S S G T Q ...... FKGL C S V F V T A A F S Y S ~ CLANG . . . . . . . . . . . FPGV L S V L V V A S Y S L G . G I E M T C L A F K N G . . . . . . . . . . . FKGF C S V F V T A A F S F S . G T E L V G L A F I H G . . . . . . . . . . . FHGF C S V F S T A A F S Y A . G T E Y I G I AFAHHLTGGS LGN...FTDI YTGIIKGAFA FILGPELVCM AFNPYLVPGD TGK...FLGF WTALIKSGFS FIFSPELITT AWGPGIISSN KNEGRFL.GW VSSLINAAFT YQ.GTELVGI AWGPGIISKD KNEGRFL.GW VSSLINAAFT FQ.GTELVGI AWGPGIISSD KSEGRFL.GW VSSLINAAFT YQ.GTELVGI AWGDGILV.~ N N N G K Y V A A F V S G L I N S I F T F Q . G S E L V A V A F K E Y S T A I T G G K G K F V . S F V A V F V Y S L F S YT G I E L T G I .... G . . . . . . . . . . . . . . . . . . . . . . . F .... G.E .... 400 S I P K A T N Q V I Y R I L I F Y I G S L A V L L S L M P W T R V T A ..... S I P K A V N Q V V Y R I L L F Y I G S L V V L L A L Y P W VEVKS ..... S I P R A I N S V P M R I W Y F . 
. . . . . . . . . . . . . . . MSA ..... S L P R A I N S I P I R I I M F Y V F A L I V I M S V T P W SSVVP ..... M V P K A I N S V I W R I G L F Y V G S V V L L V L L L P W N A Y Q A ..... S V T K A T R S V V W R I I V F Y V G S I A I V V A L L P W .~ H I V R A T N S V I W R I S I F Y L C S I F V V V A L I P W ...NMP...G AISLAVRAVI WRISVFYLGS VLVITFLMPY ESINGA...D TLPKSIRNVI WRTLFFFVLA MFVLVAILPY KTAGVI...E TIPRSIKQTV WRTLVFFVLS IIVIAGMIPW KQAGVV...E DIPRSIRNVA WRTVIFFIGA VFILSGLISW KDAGVI...E NIPRAVRQVF WRILLFYVFA ILIISLIIPY TDPSLL...R S T P V A A K R S V Y R I L I I Y L L T M I L I G F N V P H NNDQLMG... SIPNACKKVVYRILLIYMLT TILVCFLVPY NSDELLG..S SVPKAAKQVF WRITLFYILS LLMIGLLVPY NDKSLI...G TIPKAAKRTF WLITASYVTI LTLIGCLVPS NDPRLLN..G GLPSAIKQVF WRILFFFLIS LTLVGFLVPY TNQNLL...G NMPGAIKQVF WRITIFYILG LFFVGLLINS DDPALLS..S AFPKAVKQVF IRVSLFYILA LFVVSLLISG RDERLTT..L NIAKASRRFV WRLIFFYVLG TLAISVIVPY NDPTLVNALA NIPKATKRFI YRVFTFYILG SLVIGVTVAY NDPTLEAGVE A L P R A I K K V V V R I L V F Y I L S L F F I G L L V P Y N D P K L .... D S V P R A I K K V V F R I L T F Y I G S L L F I G L L V P Y N D P K L .... T T V P R A I N K V V F R I V L F Y I M S L F F I G L L V P Y N D S R L .... S
Canlcanal TAGEA.. SPR ALRSAIRKVM FRILVFYVLC MLFMGLLVPY NDPKL .... T Dip5sacce V C S E A E N P R K SVPKAIKLTV YRIIVFYLCT VFLLGMCVAY N D P R L L S T . K Consensus .A.E...P .... P ........ RI..FY ........... P ...........
401-450
Aropescco  ......DTSP FVLIFHELGD TFVANALNIV VLTAALSVYN S...CVYCNS
Phepescco  ......NSSP FVMIFHNLDS NVVASALNFV ILVASLSVYN S...GVYSNS
Proysalty  ......RCSS LCLSIRGIRS AQTAVHLC.. .......... ..........
Cycaescco  ......EKSP FVELFVLVGL PAAASVINFV VLTSAASSAN S...GVFSTS
Anspsalty  ......GQSP FVTFFSKLGV PYIGSIMNIV VLTAALSSLN S...GLYCTG
Gabpbacsu  ....ILE.SP FVAVLEHIGV PAAAQIMNFI VLTAVLSCLN S...GLYTTS
Gabpescco  ....LKAVGS YRSVLELLNI PHAKLIMDCV ILLSVTSCLN S...ALYTAS
Aropcorgl  ....TAAESP FTQILAMANI PGTVGFMEAI IVLALLSAFN A...QIYATS
Roccbacsu  ........SP FVAVLDQIGI PFSADIMNFV ILTAILSVAN S...GLYAAS
Rocebacsu  ........SP FVAVFEQIGI PYAADIMNFV ILIALLSVAN S...GLYAST
Hutmbacsu  ........SP FVAVFAEIGI PYAADIMNFV ILTALLSVAN S...GLYAST
Lyspescco  NDVKDISVSP FTLVFQHAGL LSAAAVMNAV ILTAVLSAGN S...GMYAST
Pap1sacce  SGGSATHASP YVLAASIHKV RVIPHIINAV ILISVISVAN S...ALYAAP
Val1sacce  SDSSGSHASP FVIAVASHGV KVVPHFINAV ILISVISVAN S...SLYSGP
Gap1sacce  ASSVDAAASP FVIAIKTHGI KGLPSVVNVV ILIAVLSVGN S...AIYACS
Hip1sacce  SSSVDAASSP LVIAIENGGI KGLPSLMNAI ILIAVVSVAN S...AVYACS
Tat2sacce  GSSVD..NSP FVIAIKLHHI KALPSIVNAV ILISVLSVGN S..~
Ina1triha  AAYADSKASP FVLVGKYAGL KGFDHFMNLV ILASVLSIGV S...GVYGGS
Isp5schpo  SATA...ASP FILALMDAKI RGLPHVLNAV ILISVLTAAN G...ITYTGS
Put4sacce  QGKPGAGSSP FVIGIQNAGI KVLPHIINGC ILTSAWSAAN AFM...FAST
Putxemeni  SGGSGAGASP FVVAIKTL.. .VLEGSTMSS MLPSGSLPGH PVTHGCYAGS
Alp1sacce  SDGIFVSSSP FMISIENSGT KVLPDIFNAV VLITILSAGN S...NVYIGS
Can1sacce  QSTSYVSTSP FIIAIENSGT KVLPHIFNAV ILTTIISAAN S...NIYVGS
Lyp1sacce  ASSAVIASSP FVISIQNAGT YALPDIFNAV VLITVVSAAN S...NVYVGS
Can1canal  QDGGFTRNSP FLIAMENSGT KVLPHIFNAV IVTTIISAGN S...NIYSGS
Dip5sacce  GKSMSAAASP FVVAIQNSGI EVLPHIFNAC VLVFVFSACN S...DLYVSS
Consensus  ........SP F......... .......N.. .L....S..N S.....Y...
Aropescco Phepescco Proysalty Cycaescco Anspsalty Gabpbacsu Gabpescco Aropcorgl Roccbacsu Rocebacsu Hutmbacsu Lyspescco Pap1sacce Val1sacce Gap1sacce Hip1sacce Tat2sacce Ina1triha Isp5schpo Put4sacce
451 500 RMLFGLAQQG NAPKALASVD KRGVPVNTIL V S A L V T A L C V LINYLAPE.. RMLFGLSVQG NAPKFLTRVS RRGVPINSLM L S G A I T S L V V L I N Y L L P Q . . .................................................. RMLFGLAQEG VAPKAFAKLS KRAVPAKGLT FSCICLLGGV VMLYVNPSVI RILRSMSMGG SAPKFMAKMS RQHVPYAGIL A T L V V Y V V G V FLNYLVPS.. RMLYSLAERN EAPRRFMKLS KKGVPVQAIV A G T F F S Y I A V VMNYFSPD.. RMLYSLSRRG DAPAVMGKIN RSKTPYVAVL L S T G A A F L T V V V N Y Y A P A . . RLVFSMANRQ DAPRVFSKLS TSHVPTNAVL LSMFFAFVSV GLQYWNPA.. RMMWSLSSNQ MGPSFLTRLT KKGVPMNALL ITLGISGCSL LTSVMAAE.. RILYAMANEG QAFKALGKTN QRGVPMYSLI V T M A V A C L S L LTKFAQAE.. RMMWSLANEN MISSRFKKVT SKGIPLNALM ISMAVSCLSL VSSIVAPG.. RMLYTLACDG KAPRIFAKLS RGGVPRNALY ATTVIAGLCF LTSMFGNQ.~ RLMCSLAQQG YAPKFLNYID R E G R P L R A L V VCSLVGVVGF VA..CSPQEE RLLLSLAEQG VLPKCLAYVD RNGRPLLCFF VSLVFGCIGF VA. oTSDAEE RTMVALAEQR FLPEIFSYVD R K G R P L V G I A VTSAFGLIAF V A . . A S K K E G RCMVAMAHIG NLPKFLNRVD KRGRPMNAIL LTLFFGLLSF V A . . A S D K Q A RTLCSMAHQG LIPWWFGYID RAGRPLVGIM ANSLFGLLAF LV..KSGSMS RTLTALAQQG YAPKLFTYID KSGRPLPSVI FLILFGFIAY VS..LDATGP RTLHSMAEQG HAPKWFKYVD R E G R P L L A M A FVLCFGALGY IC..ESAQSD RSLLTMAQTG QAPKCLGRIN KWGVPYVAVG VSFLCSCLAY L N . . V S S S T A
Putxemeni Alp1sacce Can1sacce Lyp1sacce Can1canal Dip5sacce Consensus
501-550
Aropescco  SAFGLLMALV VSALVINWAM ISLAHM.... .......... ..KFRRA.KQ
Phepescco  KAFGLLMALV VATLLLNWIM ICLAHL.... .......... ..RFRAA.MR
Proysalty  .......... .......... .......... .......... ..........
Cycaescco  GAFTMITTVS AILFMFVWTI ILCSYL.... .......... ..VYRKQ.RP
Anspsalty  RVFEIVLNFA SLGIIASWAF IMVCQM.... .......... ..RLRQAIKE
Gabpbacsu  TVFLFLVNSS GAIALLVYLV IAVSQL.... .......... ..KMRKKLEK
Gabpescco  KVFKFLIDSS GAIALLVYLV IAVSQL.... .......... ..RMRKIL.R
Aropcorgl  GLLDFLLNAV GGCLIVVWAM ITLSQL.... .......... ..KLRKEL.Q
Roccbacsu  TVYLWCISIS GMVTVVAWMS ICASQF.... .......... ..FFRRRFLA
Rocebacsu  TVYMVLLSLA GMSAQVGWIT ISLSQI.... .......... ..MFRRKYIR
Hutmbacsu  TVYVVMVAIA GFAGVVVWMS IALSQL.... .......... ..LFRKRFLK
Lyspescco  TVYLWLLNTS GMTGFIAWLG IAISHY.... .......... ..RFRRGYVL
Pap1sacce  QAFTWLAAIA GLSELFTWSG IMLSHI.... .......... ..RFRKAMKV
Val1sacce  QVFTWLLAIS SLSQLFIWMS MSLSHI.... .......... ..RFRDAMAK
Gap1sacce  EVFNWLLALS GLSSLFTWGG ICICHI.... .......... ..RFRKALAA
Hip1sacce  EVFTWLSALS GLSTIFCWMA INLSHI.... .......... ..RFRQAMKV
Tat2sacce  EVFNWLMAIA GLATCIVWLS INLSHI.... .......... ..RFRLAMKA
Ina1triha  VVFDWLLAIS GLAALFTWGS VCLAHI.... .......... ..RFRKAWKY
Isp5schpo  TVFDWLLSIS NLATLFVWLS INVSYI.... .......... ..IYRLAFKK
Put4sacce  DVFNWFSNIS TISGFLGWMC GCIAYL.... .......... ..RFRKAIFY
Putxemeni  TVFYWFTNIT TVGGFINWVL IGIAYLVCFP PSLHLNTPDQ KQRFRKALQF
Alp1sacce  KAFNWLLNIT GVAGFFAWLL ISFSHI.... .......... ..RFMQAIRK
Can1sacce  KVFEWLLNIT GVAGFFAWLF ISISHI.... .......... ..RFMQALKY
Lyp1sacce  TAFNWLINIS TLAGLCAWLF ISLAHI.... .......... ..RFMQALKH
Can1canal  KAFTWLLNIT ATAGLISWGF ISVSHI.... .......... ..RFMKTLQR
Dip5sacce  KIFNYFVNVV SMFGILSWIT ILIVYI.... .......... ..YFDKACRA
Consensus  ..F....... .......W.. I......... .......... ....R.....
Aropescco Phepescco Proysalty Cycaescco Anspsalty Gabpbacsu Gabpescco Aropcorgl Roccbacsu Rocebacsu Hutmbacsu Lyspescco Paplsacce Vallsacce Gaplsacce Hiplsacce
EKLYSLAGEG QAPKIFTRTN RTGVPYVAVL ATWTIGLLSF LN..LSSSGQ RVLYSLSKNS LAPRFLSNVT RGGVPYFSVL STSVFGFLAF LE..VSAGSG RILFGLSKNK LAPKFLSRTT KGGVPYIAVF VTAAFGALAY ME..TSTGGD RVLYSLARTG NAPKQFGYVT RQGVPYLGVVCTAALGLLAF LV..VNNNAN RILYGLAQAG VAPKFFLRTN KGGVPFFAVA FTAAFGALGY LA.~ RNLYALAIDG KAPKIFAKTS RWGVPYNALI LSVLFCGLAY MN..VSSGSA R . . . . . . . . . . . P . . . . . . . . . G.P . . . . . . . . . . . . . . . . . . . . . . . . .
551 600 EQGVVTRF.L ..LLYPLGNW ICLLFMAAVL VIMLMT.RGM GIWVYLIPVW R Q G R E T Q F K A ..LLYPFGNY L C I A F L G M I L L L M C T M . D D M R L S A I L L P V W .................................................. H L H E K S I Y K M ..PLGKLMCW V C M A F F V F V V V L L T L E . D D T R Q A L L V T P L W G K A A D V S F K L ..PGAPFTSW L T L L F L L S V L V L M A F D Y P N G T Y T I A S L P L I TNPEALKIKM W..LFPFLTY LTIIAICGIL VSMAFIDSMR DELLLTGVIT AEGSEIRLRM W LYPWLTW LVIGFITFVLVVMLFRPAQQ LEVISTGLLA ANDEISTVRM W..AHPWLGI LTLVLLAGLVALMLGDAASR SQVYSVAIVY E G G N V N D L E F R T P L Y P L V P I L G F C L Y G C V L ISLIFIP . . . . . . . . DQRIG E G G K I E D L K F K T P L Y P V L P L I G L T L N T V V L ISLAFDP . . . . . . . . E Q R I A K G G D V K D L T F R T P L Y P L M P I A A L L L C S A S C IGLAFDP . . . . . . . . N Q R I A QGHDINDLPY RSGFFPLGPI FAFILCLIIT LGQNYEAFLK DTIDWGGVAA QGRSLDEVGY KANTGIWGSY YGVFFNMLVF MAQFWVALSP IGN..GGKCD QGRSMNEVGY KAQTGYWGSW LAVLIAIFFL VCQFWVAIAPVNE..HGKLN Q G R G L D E L S F K S P T G V W G S Y W G L F M V I I M F I A Q F Y V A V F P VGDS.PS... Q E R S L D E L P F I S Q T G V K G S W Y G F I V L F L V L I A S F W T S L F P LGGSGAS...
Tat2sacce QGKSLDELEF VSAVGIWGSA YSALINCLIL IAQFYCSLWP IGGWTSGKER Inaltriha HGHTLDEIPF KAAGGVYGSY LGLFICVIVL MAQFYTAIAAPPGS.PGVGT Isp5schpoQGKSYDEVGYHSPFGIYGAC Y G A F I I I L V F I T E F Y V S I F P IGAS PD... Put4sacce NGLY.DRLPF KTWGQPYTVWFSLIVIGIIT I T N G Y A I F I P ...... K Y W R P u t x e m e n i H G M L . D M L P F K T P L Q P Y G T Y Y V M F I I S I L T L T N G Y A V F F P ...... G R F T Alplsacce RGISRDDLPY KAQMMPFLAY YASFFIALIV LIQGFTAF.A PTFQ C a n l s a c c e R G I S R D E L P F K A K L M P G L A Y Y A A T F M T I I I I I Q G F T A F . A ...... P K F N L y p l s a c c e R G I S R D D L P F K A K L M P Y G A Y Y A A F F V T V I I F I Q G F Q A F . C ...... P . F K Canlcanal RGISRDTLPF KAFFMPFSAY YGMVVCFIVVLIQGFTVF ........ WDFN D i p 5 s a c c e Q G I D K S K F A Y V A P G Q R Y G A Y F A L F F C I L I A L I K N F T V F L G ...... H K F D Consensus .................................................. Aropescco Phepescco Proysalty Cycaescco Anspsalty Gabpbacsu Gabpescco Aropcorgl Roccbacsu Rocebacsu Hutmbacsu Lyspescco Paplsacce Vallsacce Gaplsacce Hiplsacce Tat2sacce Inaltriha Isp5schpo Put4sacce Putxemeni Alplsacce Canlsacce Lyplsacce Canlcanal Dip5sacce Consensus
601 650 L I V L G I G Y . L F K E K T A K A V K AH . . . . . . . . . . . . . . . . . . . . . . . . . . . . IVFLFMAFKT LRRK .................................... .................................................. FIALGLGWLF IGKKRAAELR K ............................. A I L L V A G W F G V R R R V A E I H R T A P V T A D S T E S V V L K E E A A T .......... GIVLISYLVF RKRKVSEKAAANPVTQQQPD ILP . . . . . . . . . . . . . . . . . IGIICTVPIM ARWKKLVLWQ KTPVHNTR ...................... GFLVLLSFVTVNSPLRGGRT PSDLN ......................... LYCGVPIIIF CYAYYHLSIK KRINHETIEK KQTEAQ .............. L Y C G V P F M I I C Y I I Y H V V I K KRQQ .... AN R Q L E L . . . . . . . . . . . . . . . L F C G V P C I I L C Y L I Y H F . . . K R N V T K A K K I S Q E E Y P A D H I L ......... TYIGIPLFLI IWFGYKLIKG THFVRYSEMK FPQNDKK ............. AQAFFESYLA APLWIFMYVG YMVYK .......... RDFTF LNPLDKIDLD VKVFFQNYLA MPIVLFAYFG HKIYF .......... KSWSF WIPAEKIDLD AEGFFEAYLS FPLVMVMYIG HKIY .......... KRNWKL FIPAEKMDID AESFFEGYLS FPILIVCYVG HKLY .......... TRNWTL MVKLEDMDLD AKIFFQNYLC ALIMLFIFIV HKIYYKC ...... QTGKWWG VKALKDIDLE A E D F F K Q Y L A A P V V L G F W I V G W L W . . . . . . . . . . . . KRQP F L R T K N I D V D A G A F F Q S Y L C F P V V V I V F I A HALI . . . . . . . . . . . . T R Q K F R K L S E I D L D VADFIAAYIT LPIFLVLWFG HKLYT ......... RTWRQW WLPVSEIDVT ASDFLVSYIV FAIFLALYAG HKIWY ......... RT..PW LTKVSEVDIF P I D F V A A Y I S V F L F L A I W L S FQVWF . . . . . . . . . K C R L L W K . . L Q D I D I D G V S F A A A Y I S I F L F L A V W I L FQCIF . . . . . . . . . R C R F I W K . . I G D V D I D VSEFFTSYIS LILLAVMFIG CQIYY ......... KCRFIW K..LEDIDID ASDFFTAYIS VILFVVLWVG FHFFFYGFGK DSFKMSNILV P..LDECDID YKTFITGYIG LPVYIISWAG YKLIYKTKVI KSTDVDLYTF KEIYDREEEE ..................................................
Paplsacce Vallsacce Gaplsacce Hiplsacce Tat2sacce Inaltriha Isp5schpo Put4sacce Putxemeni Alplsacce Canlsacce Lyplsacce
651 700 FHRRGLRP .......................................... SHRNIFVSPS LTEIDKVDDN DDLKEYENSE SSENPNSSRS RKFFKRMTNF TGRREVD ................ LDLLKQE IAEEKAIMAT KPRWYRIWNF TGRKQVD ................ LTLRREE MRIERETLAK RSFVTRFLHF TDRKDID ................ IEIVKQE IAEKKMYLDS RPWYVRQFHF T G L R E F D W D E I N A E R T R I A P L P A W R R I I H H TF . . . . . . . . . . . . . . . . . . TGFSKYDRLE ESDKGPMTAK SLAKSVLSFC V ................... T G L V E I E E K S R E I E E M R L P P T G F K D K F L D A LL . . . . . . . . . . . . . . . . . . T G K D E I D R L C E . . N D M E R Q P R N W L E R V W W W IF . . . . . . . . . . . . . . . . . . SDHRQIEELV WIEPECKTR...W.QRVWDV LS . . . . . . . . . . . . . . . . . . SDRRDIEAIV WEDHEPKTF...W.DKFWNV VA .................. SDRREIEAIIWEDDEPKNL...W.EKFWAAVA ..................
H+/amino acid symporter family Canlcanal SGVRDINDAE F D I P P P K N A . . . W . D K F W A N VA . . . . . . . . . . . . . Dip5sacce GRMKDQEKEE RLKSNGKNME WFY.EKFLGN IF . . . . . . . . . . . . . Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
701
Vallsacce WC
Gaplsacce WC
Hiplsacce WC
Tat2sacce WC
Consensus ..
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
Database accession numbers
SWISSPROT Alplsacce P38971 Anspsalty P40812 Aropescco P15993 Aropcorgl Canlsacce P04817 Canlcanal P43059 Cycaescco P39312 Dip5sacce Gabpbacsu P46349 Gabpescco P25527 Gaplsacce P19145 Hiplsacce P06775 Hutmbacsu P42087 Inaltriha P34054 Isp5schpo P40901 Lyplsacce P32487 Lyspescco P25737 Paplsacce P41815 Phepescco P24207 Proysalty P37460 Put4sacce P15380 Putxemeni P18696 Roccbacsu P39636 Rocebacsu P39137 Tat2sacce P38967 Vallsacce P38085
PIR S44329 S45191; JS0447 S52754 A23922 S38111 A24519 S33212 S35896; S45492 S34931 S24560 A39431 S35983 JQ0127 S04547 S39733 S48084; S47926 S45932; S48083
EMBL/GENBANK X74069 U04851 X17333; D26562 X85965 X03784; M11724 X76689 U14003 X95802 U31756 M88334; X65104 X52633; Z28264 M11980; X82408 D31856; X82174 Z22594 D14062 X67315 M89774; U00007 X75076 M58000 X74420 M30583 X73124 X81802 X79150; L33461 U10503; X79151
References
1 Malandro, M. and Kilberg, M. (1996) Annu. Rev. Biochem. 65, 305-336.
2 Pi, J. and Pittard, J. (1996) J. Bacteriol. 178, 2650-2655.
H+/lactose-sucrose-nucleoside symporter family
Summary
Transporters of the H+/lactose-sucrose-nucleoside symporter family, the example of which is the LACY lactose-H+ symporter of Escherichia coli, mediate symport (H+-coupled substrate uptake) of structurally dissimilar sugars and nucleosides 1-4. Known members of the family are found only in gram-negative bacteria. Statistical analysis of multiple amino acid sequence comparisons suggests that the H+/lactose-sucrose-nucleoside symporter family is distantly related to the uniporter-symporter-antiporter (USA) superfamily, also known as the major facilitator superfamily (MFS) 5. Members of the family are predicted to contain 12 membrane-spanning helices by the hydropathy of their amino acid sequences 6 and the activities of reporter gene fusions 7. Based on complementation studies, LACY is predicted to exist as a homodimer 8. Several amino acid sequence motifs are highly conserved in the H+/lactose-sucrose-nucleoside symporter family, including motifs unique to the family, elements of the signature motifs of the USA/MFS superfamily, and motifs necessary for activity by the criterion of site-directed mutagenesis 1-4.
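The summary above refers to predicting membrane-spanning helices from the hydropathy of an amino acid sequence. The entry does not say which scale or window was used, so the following is only a minimal sketch under stated assumptions: it uses the Kyte-Doolittle hydropathy scale, a 19-residue sliding window and a cutoff of 1.6, all of which are illustrative choices rather than the authors' method. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the book does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along an ungapped sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, cutoff=1.6):
    """Start indices (0-based) of windows whose mean hydropathy reaches the cutoff."""
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value >= cutoff]
```

Runs of consecutive start indices returned by `hydrophobic_windows` mark candidate membrane-spanning segments; merging them into discrete helices and deciding their orientation needs further evidence, such as the reporter gene fusions mentioned in the text.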
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]             ORGANISM [COMMON NAMES]                           SUBSTRATE(S)
Cscbescco  Sucrose-H+ symporter [CSCB]        Escherichia coli [gram-negative bacterium]        H+/sucrose
Lacycitfr  Lactose-H+ symporter [LACY]        Citrobacter freundii [gram-negative bacterium]    H+/lactose
Lacyescco  Lactose-H+ symporter [LACY]        Escherichia coli [gram-negative bacterium]        H+/lactose
Lacyklepn  Lactose-H+ symporter [LACY]        Klebsiella pneumoniae [gram-negative bacterium]   H+/lactose
Nupgescco  Nucleoside-H+ symporter [NUPG]     Escherichia coli [gram-negative bacterium]        H+/nucleosides
Rafbescco  Raffinose-H+ symporter [RAFB]      Escherichia coli [gram-negative bacterium]        H+/raffinose
Xapbescco  Xanthosine permease [XAPB]         Escherichia coli [gram-negative bacterium]        H+/xanthosine
Cotransported ions are listed.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Lacycitfr (Lacyescco).
[Figure: phylogenetic tree; leaves in the order shown: Nupgescco, Xapbescco, Lacyescco, Lacyklepn, Rafbescco, Cscbescco]
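The 90% identity rule used to leave near-duplicate proteins out of the tree can be sketched as follows. This is an illustration only: the book does not state how identity was computed, so the sketch assumes identity is counted over alignment columns in which both sequences carry a residue, with `.` as the gap character. The function names are hypothetical.

```python
def percent_identity(row_a, row_b, gap="."):
    """Percent identity over columns where both aligned rows have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def drop_near_duplicates(aligned, cutoff=90.0):
    """Keep a row only if it is below the cutoff against every row already kept."""
    kept = {}
    for name, row in aligned.items():
        if all(percent_identity(row, other) < cutoff for other in kept.values()):
            kept[name] = row
    return kept
```

Because rows are examined in insertion order, the first member of a near-identical pair is the one retained; listing Lacyescco before Lacycitfr would reproduce the choice made in the text.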
Proposed orientation of LACY in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in LACY.
[Figure: proposed membrane topology of LACY; the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini are labeled]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT
Cscbescco  415          46 923
Lacycitfr  416          46 537
Lacyescco  417          46 503
Lacyklepn  416          46 220
Nupgescco  418          46 389
Rafbescco  425          46 693
Xapbescco  418          45 736
Km: Lactose: 1.7 mM 2; Raffinose: 1 mM 9
CHROMOSOMAL LOCUS: 7.77 minutes
Multiple amino acid sequence alignments
1 50
Nupgescco .......... MNLKLQLKIL SFLQFCLWGS WLTTLGSYMF VTLKFDGASI
Xapbescco .......... MSIAMRLKVM SFLQYFIWGS WLVTLGSYMI NTLHFTGANV
Lacyescco .....MYYLK NTNFWMFGLF FFFYFFIMGA YFPFFPIWLH DINHISKSDT
Lacyklepn MKLSELAPRE RHNFIYFMLF FFFYYFIMSA YFPFFPVWLA EVNHLTKTET
101 150
Nupgescco AQVTTPEAMF LVILINSFAY MPTLGLINTI SYYRLQNAGM DIVTDFPPIR
Xapbescco ASVTDPDMMF WVMLVNAMAF MPTIALSNSV SYSCLAQAGL DPVTAFPPIR
Lacyescco FGPLLQYNIL VGSIVGGIYL GFCFNAGAPA VEAFIEKVSR RSNFEFGRAR
Lacyklepn FSPLLQMNIM AGALVGGVYL GIVFSSRSGA VEAYIERVSR ANRFEYGKVR
Rafbescco FAPLLHLNIW AGALTGGVFI GFVFSAGAGA IEAYIERVSR SSGFEYGKAR
Cscbescco YEPLLQSNFS VGLILGALFF GLGYLAGCGL LDSFTEKMAR NFHFEYGTAR
Consensus .......... .......... .......... .......... .........R
Nupgescco Xapbescco Lacyescco Lacyklepn Rafbescco Cscbescco Consensus
151 IWGTIGFIMA MWVVSLSGFE LSHMQLYIGA VFGTVGFIVA MWAVSLLHLE LSSLQLYIAS MFGCVGWALC ASIVGIMFTI NNQFVFWLGS VSGCVGWALC ASITGILFSI DPNITFWIAS MFGCLGWALC ATMAGILFNV DPSLVFWMGS AWGSFGYAIG AFFAGIFFSI SPHINFWLVS ..G..G . . . . . . . . . . . . . . . . . . . . . . . S
201 250
Nupgescco KQQANQSWTT LLGLDAFALF KNKRMAIFFI FSMLLGAELQ ITNMFGNTFL
Xapbescco EKKATTSLAS KLGLDAFVLF KNPRMAIFFL FAMMLGAVLQ ITNVFGNPFL
Lacyescco ATVANAVGAN HSAFSLKLAL ELFRQPKLWF LSLYVIGVSC TYDVFDQQF.
Lacyklepn AEVIDALGAN RQAFSMRTAA ELFRMPRFWG FIIYVVGVAS VYDVFDQQF.
Rafbescco AMVMNALGAN SSLISTRMVF SLFRMRQMWM FVLYTIGVAC VYDVFDQQF.
Cscbescco CIAADAGGVK KEDF.....I AVFKDRNFWV FVIFIVGTWS FYNIFDQQF.
Consensus .......... .......... ...R...... F......... ....F...F.
251 300
Nupgescco HSFDKDPMFA SSFIVQHASI IMSISQISET LFILTIPFFL SRYGIKNVMM
Xapbescco HDFARNPEFA DSFVVKYPSI LLSVSQMAEV GFILTIPFFL KRFGIKTVML
Lacyescco ANFFTSFFAT GEQGTRVFGY VTTMGELLNA SIMFFAPLII NRIGGKNALL
Lacyklepn ANFFKGFFSS PQRGTEVFGF VTTGGELLNA LIMFCAPAII NRIGAKNALL
Rafbescco AIFFRSFFDT PQAGIKAFGF ATTAGEICNA IIMFCTPWII NRIGAKNTLL
Cscbescco PVFYAGLFES HDVGTRLYGY LNSFQVVLEA LCMAIIPFFV NRVGPKNALL
Consensus ..F....... .......... .......... ......P... .R.G.KN..L
Nupgescco Xapbescco Lacyescco Lacyklepn
301 ISIVAWILRF MSMVAWTCAL LAGTIMSVRI IAGLIMSVRI
200 ALSAIL.VLF TLTLPHIPVA GASLLL.SAY ALTLPKIPVA GCALILAVLL FFAKTDAPSS GFALILGVLLWVSKPESSNS GGALLLLLLL YLARPSTSQT LFGAVF.MMI NMRFKDKDHQ ..... L . . . . . . . . . . . . . .
Nupgescco Xapbescco Lacyescco Lacyklepn Rafbescco Cscbescco Consensus
51 i00 GAVYSSLGIA AVFMPALLGI VADKWLSAKW .VYAICHTI G . . A I T L F M A GMVYSSKGIA AIIMPGIMGI IAVQ.MRARR TCIHAVSPGV C GV.LFYA GIIFAAISLF SLLFQPLFGL LSDK.LGLRK YLLWIITGML VMFAPFFIFI GIVFSCISLF AIIFQPVFGL ISDK.LGLRK HLLWTITILL ILFAPFFIFV GIVFSCLSLF AISFQPLLGV ISDR.LGLKK NLIWSISLLL VFFAPFFLYV Cscbescco G T L Y S V N Q F T S I L F M M F Y G I V Q D K . L G L K K P L I W C M S F I L V L T G P F M I Y V ConsensusG ................. G...D..L ........................
Nupgescco Xapbescco Lacyescco Lacyklepn Rafbescr
R a f b e s c c o . .MNSASTHK N T D F W I F G L F F F L Y F F I M A T C F P F L P V W L S D V V G L S K T D T C s c b e s c c o . .MALNIPFR N A Y Y R F A S S Y S F L F F I S W S L W W S L Y A I W L K G H L G L T G T E L Consensus ..................... F ............................
ALFAYGDPTP ASSPYGDPST IGSSFATSAL LGSSFATSAV
FGTVLLVLSM TGFILLLLSM EVVILKTLHM EVIILKMLHM
IVYGCAFDFF IVYGCAFDFF F .... EVPFL F .... EIPFL
350 NISGSVFVEK NISGSVFVEQ LVGCFKYITS LVGTFKYISS
Rafbescco VAGGIMTIRI TGSAFATTMT EVVILKMLHA L....EVPFL LVGAFKYITG
Cscbescco IGVVIMALRI LSCALFVNPW IISLVKLLHA I....EVPLC VISVFKYSVA
Consensus ........R. .......... ....L..L.. ........F. ..........
351 400
Nupgescco EVSPAIRASA QGMFLMMTNG FGCILGGIVS GKVVEMYTQN GITDWQTVWL
Xapbescco EVDSSIRASA QGLFMTMVNG VGAWVGSILS GMAVDYFSVD GVKDWQTIWL
Lacyescco QFEVRFSATI YLVCFCFFKQ LAMIFMSVLA GNMYE..... .SIGFQGAYL
Lacyklepn AFKGKLSATL FLIGFNLSKQ LSSVVLSAWV GRMYD..... .TVGFHQAYL
Rafbescco VFDTRLSATV YLIGFQFSKQ LAAILLSTFA GHLYD..... .RMGFQNTYF
Cscbescco NFDKRLSSTI FLIGFQIASS LGIVLLSTPT GILFD..... .HAGYQTVFF
Consensus .......A.. .......... ......S... G......... .....Q....
401 439
Nupgescco IFAGYSVVLA FAFMAMFKYK HVRVPTGTQT VSH......
Xapbescco VFAGYALFLA VIFFFGFKYN HDPEKIKHRA VTH......
Lacyescco VLGLVALGFT LISVFTLSGP GPLSLLRRQV NEVA.....
Lacyklepn ILGCITLSFT VISLFTLKGS KTLLPATA.. .........
Rafbescco VLGMIVLTVT VISAFTLSSS PGIVHPSVEK APVAHSEIN
Cscbescco AISGIVCLML LFGIFFLSKK REQIVMETPV PSAI.....
Consensus .......... ....F..... .......... .........
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignments: Lacycitfr (Lacyescco). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in at least one other family of the USA/MFS superfamily.
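The 75% consensus rule stated above can be illustrated with a short sketch. It assumes that the fraction is taken over all aligned rows (gaps count against a residue) and that `.` marks both gaps and non-consensus positions, as in the alignments printed here; the book does not spell out these details, and the function name is hypothetical. The six example rows are the last ten columns (291-300) of the six aligned sequences.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Consensus line: a residue where it fills at least `threshold` of the rows, else a dot."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned rows must have equal length")
    line = []
    for column in zip(*rows):
        counts = Counter(residue for residue in column if residue != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            line.append(residue if n / len(rows) >= threshold else gap)
        else:
            line.append(gap)
    return "".join(line)


# Columns 291-300 of Nupgescco, Xapbescco, Lacyescco, Lacyklepn, Rafbescco, Cscbescco.
rows_291_300 = [
    "SRYGIKNVMM",
    "KRFGIKTVML",
    "NRIGGKNALL",
    "NRIGAKNALL",
    "NRIGAKNTLL",
    "NRVGPKNALL",
]
print(consensus(rows_291_300))  # .R.G.KN..L
```

With six sequences, a residue must occur in at least five rows to pass the 75% threshold, which is why a residue present in four of six rows is not reported.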
Database accession numbers
CODE       SWISSPROT  EMBL/GENBANK
Cscbescco  P30000     X63740; X81461
Lacycitfr  P47234     U13675
Lacyescco  P02920     J01636; V00295
Lacyklepn  P18817     M11441; X14154
Nupgescco  P09452     X06174
Rafbescco  P16552     M27273
Xapbescco  P45562     X73828
PIR S19880 A03418; S21314 JT0487; C24925 A26226 B43717
References
1 Kaback, H.R. et al. (1996) J. Bioenerg. Biomembr. 28, 29-34.
2 Varela, M. and Wilson, T.H. (1996) Biochim. Biophys. Acta 1276, 21-34.
3 Kaback, H.R. (1992) Biochim. Biophys. Acta 1101, 210-213.
4 Kaback, H.R. (1992) Int. Rev. Cytol. 137, 97-125.
5 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
6 Henderson, P.J.F. (1993) Curr. Opin. Cell Biol. 5, 708-721.
7 Calamia, J. and Manoil, C. (1990) Proc. Natl Acad. Sci. USA 87, 4937-4941.
8 Bibi, E. and Kaback, H.R. (1990) Proc. Natl Acad. Sci. USA 87, 4325-4329.
9 Henderson, P.J.F. (1992) Int. Rev. Cytol. 137, 149-208.
H+/galactoside-pentose-hexuronide symporter family
Summary
Transporters of the H+/galactoside-pentose-hexuronide symporter family, the example of which is the MELB melibiose-H+ symporter of Escherichia coli (Melbescco), mediate symport (cation-coupled substrate uptake) of structurally dissimilar sugars and glucuronides. The favored cation for cotransport can be either H+, Na+ or Li+. The transport activity of the H+/galactoside-pentose-hexuronide symporter family is regulated at the level of enzyme activity by the phosphoenolpyruvate:sugar phosphotransferase system (PTS), either by direct interaction of C-terminal elements of the transporter with unphosphorylated phosphoryl transfer protein IIA (e.g. MELB) or by the phosphorylation of an extra IIA-like domain in the C-terminus of the transporter by PEP, heat-stable protein HPR and enzyme I (e.g. LACS) 1. Members of the family are found in both gram-positive and gram-negative bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the H+/galactoside-pentose-hexuronide symporter family and any other family of transporters. However, the LACS and RAFP proteins contain a C-terminal hydrophilic extension of approximately 160 residues that is homologous to a domain in the PTGA protein of the PTS family 1. Members of the H+/galactoside-pentose-hexuronide symporter family are predicted to contain 12 membrane-spanning helices by the hydropathy of their amino acid sequences 1 and by the activities of reporter gene fusions 2. Several amino acid sequence motifs are highly conserved in the H+/galactoside-pentose-hexuronide symporter family, including motifs necessary for function by the criterion of site-directed mutagenesis 1,3. Cation specificity can be altered by single amino acid substitutions 1.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                             ORGANISM [COMMON NAMES]                              SUBSTRATE(S)
Lacsstrtr  Lactose-H+ symporter, lactose permease [LACY, LACS]                Streptococcus thermophilus [gram-positive bacterium]  H+/lactose
Lacslacde  Lactose-H+ symporter, lactose permease [LACY, LACS]                Lactobacillus delbrueckii [gram-positive bacterium]   H+/lactose
Melbescco  Melibiose-Li+/Na+ symporter, melibiose permease [MELB, MEL4]       Escherichia coli [gram-negative bacterium]            H+, Na+, Li+/melibiose
Melbsalty  Melibiose-Li+/Na+ symporter, melibiose permease [MELB]             Salmonella typhimurium [gram-negative bacterium]      H+, Na+, Li+/melibiose
Melbklepn  Melibiose-Li+/Na+ symporter, melibiose permease [MELB]             Klebsiella pneumoniae [gram-negative bacterium]       H+, Li+/melibiose
Rafppedpe  Raffinose-H+ symporter [RAFP]                                      Pediococcus pentosaceus [gram-positive bacterium]     H+/raffinose
Uidbescco  Glucuronide-H+ symporter [UIDB, GUSB, UIDP]                        Escherichia coli [gram-negative bacterium]            H+/glucuronides
Cotransported ions are listed.
Phylogenetic tree
[Figure: phylogenetic tree; leaves in the order shown: Lacslacde, Lacsstrtr, Rafppedpe, Melbescco, Melbsalty, Melbklepn, Uidbescco]
Proposed orientation of MELB in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane 2. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: proposed membrane topology of MELB; the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini are labeled]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT
Lacsstrtr  634          69 453
Lacslacde  627          68 288
Melbescco  469          52 217
Melbsalty  476          52 758
Melbklepn  471          52 329
Rafppedpe  641          69 913
Uidbescco  457          49 892
Km: Melibiose: 300 µM 4; Glucuronate: 132 µM
CHROMOSOMAL LOCUS: 93.51 minutes; 36.42 minutes
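As a reading aid for the Km values listed above, the sketch below shows how a Km relates substrate concentration to transport rate under simple Michaelis-Menten kinetics. Treating these transporters as obeying that rate law is an assumption made for illustration, the function name is hypothetical, and `vmax` is an arbitrary placeholder rather than a value from the book.

```python
def transport_rate(substrate, km, vmax=1.0):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S]); substrate and km share units."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be non-negative and km positive")
    return vmax * substrate / (km + substrate)


# At a substrate concentration equal to Km the rate is half of Vmax.
half_maximal = transport_rate(substrate=300.0, km=300.0)
```

The useful property for comparing entries is that a lower Km means the transporter reaches half of its maximal rate at a lower substrate concentration.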
Multiple amino acid sequence alignments
1
FGNDVFYATL STYFIVFVTT HLFNAGDHKM FGNDVFYATL STYFIMFVTT HLFNTGDPKQ KGNDAFYSIL SGYLIIFITS HLFDTGNKAL F G K D F A I G I V Y M Y L M Y Y Y T . . . . . . . . . DV F G K D F A I G I V Y M Y L M Y Y Y T . . . . . . . . . DV F G K D F A I G I V Y M Y L M Y Y Y T . . . . . . . . . DI V A N N F A F A M G A L F L L S Y Y T . . . . . . . . . DV .G.D . . . . . . . . YL .... T . . . . . . . . . . .
Lacslacde Lacsstrtr Rafppedpe Melbescco Melbsalty Melbklepn Uidbescco Consensus
51 .... I F I I T N L I T A I R I G E V L L D P L I G N A I D R T E S R W G K F NSHYVLLITN IISILRILEV FIDPLIGNMI DNTNTKYGKF DNRMVSLVTL IIMVLRIVEL FIDPFIGNAI DRTKNSPGHF VGLSVGLVGT LFLVARIWDA INDPIMGWIV NATRSRWGKF VGLSVGLVGT LFLVARIWDA INDPIMGWIV NATRSRWGKF VGLSVGVVGT LFLVARILDA IADPIMGWIV NCTRSRWGKF AGVGAAAAGT MLLLVRVFDA FADVFAGRVVDSVNTRWGKF . . . . . . . . . . . . . . . RI . . . . . D P . . G . . . . . T .... GKF
Lacslacde Lacsstrtr Rafppedpe Melbescco Melbsalty Melbklepn Uidbescco Consensus
i01 SSLALLALFT DFGGINQSKPVVYLVIFGIV SSITLLLLFT DLGGLNKTNP FLYLVLFGII SSIILLLLFT NLGGLYAKNA MIYLVVFAIL NSVI.LFLLF SAHLFEGTTQ IVFVCV...T NSLV.LFLLF SAHLFEGTAQVVFGCV...T NSVV.LYMLF SAHHFSGGAL LAWVWL...T LMIFSVLVFW VPTDWSHGSKVVYAYL...T .S...L . . . . . . . . . . . . . . . . . . . . . . . .
Lacslacde Lacsstrtr Rafppedpe Melbescco Melbsalty Melbklepn Uidbescco Consensus
151 ALSLDSRERE ALSLDSHERE SLTTDSRERE TITLDKRERE TITLDKRERE TITLDKRERE AMTQQPQSRA .... D . . E R E
Lacslacde Lacsstrtr Rafppedpe Melbescco Melbsalty Melbklepn Uidbescco Consensus
201 250 GWFFFALIVA IVGILTSITV GLGTHEV... KSALRESNEK T.TLKQVFKV GWFWFAFIVA LIGVITSIAV GIGTREV... ESKIRDNNEK T.SLKQVFKV GWFIFALIIC LIALISAWGV GLGTREV... DSDIRKNKQD TVGVMEIFKA GFQMFTLVLI AFFIVSTIIT LRNVHEVFS. SDNQPSAGS. HLTLKAIVAL GFQMFTLVLI AFFIASTIVT LRNVHEVYS. SDNGVTAGRP HLTLKTIVGL GFQMFTLVLI AFFVVSTLVT LRNVHEVYS. SDSGVSEDSS HLSLRQMVAL VYHFWTIVLA IAGMVLYFIC FKSTRENVVR IVAQPSLNIS LQTLKRNRPL G . . . F . L . . . . . . . . . . . . . . . . . . EV . . . . . . . . . . . . . . . . L ......
50
Lacslacde ..___MKKKL VSRLSYAAGA Lacsstrtr MEKSKGQM KSRLSYAAGA Rafppedpe MQEEHNYKWVGGRLIYGFGA M e l b e s c c o ......... M T T K L S Y G F G A M e l b s a l t y ..... M S I S L T T K L S Y G F G A M e l b k l e p n ..... M S I S M T T K L S Y G F G A Uidbescco ...MNQQLSW RTIVGYSLGD Consensus ............. L.Y..GA
i00 KPWVVGGGII KPWVVGGGII RPWVVVGGTV KPWILIGTLA KPWILIGTLT KPWILIGTIT RPFLLFGTAP .PW...G...
150 YLIMDIFYSF KDTGFWAMIP YLVMDVFYSI KDIGFWSMIP YITMDIFYSF KDVGFWSMLP YILWGMTYTI MDIPFWSLVP YILWGMTYTI MDIPFWSLVP YLLWGFTYTI MDVPFWSLVP YMGLGLCYSLVNIPYGSLAT Y ...... Y . . . D . . F W S . . P
200 KTSTFARVGS TIGANLVGVVITPIILFFSA SKANPNGDKQ KMATFARIGS TIGANIVGVA IMPIVLFFSM TNNSGSGDKS KTATFARLGS TIGGGLVGVL VMPAVIFFSA .KATSTGDNR QLVPYPRFFA SLAGFVTAGV TLPFVNYVG ..... GGDRGF QLVPFPRFFA SLAGFVTAGI TLPFVSYVG ..... GADRGF QLVPYPRFFA SLAGFVTAGV TLPFVNAVG ..... GADRGF RLGAARGIAASLTFVCLAFL IGPSIKNSS ..... PEEMVS ...... R . . . . . . . . . . . . . . . P . V . . . . . . . . . . . . . . .
251 300 Lacslacde LGQNDQLLWL AFAYWFYGLG INTLNALQLY YFSYILGDAR GYSLLYTINT Lacsstrtr LGQNDQLMWL SLGYWFYGLG INTLNALQLY YFTFILGDSG KYSILYGLNT
Rafppedpe Melbescr Melbsalty Melbklepn Uidbescco Consensus
301 350 Lacslacde FVGLISASFF PSLAKKFNRNRLFYACIAV..MLLGIGVFSVASG...SLA LacsstrtrVVGLVSVSLF PTLADKFNRKRLFYGCIAV..MLGGIGIFS IAGT...SLP Rafppedpe FLGLIATSLF PVLSKKFSRK GVFAGCLVF..MLGGIAIFT IAGS...NLW M e l b e s c c o A A N L V T L V F F P R L V K S L S R R ILWAGASIL. P V L S C G V L L L M A L M S Y H N V V M e l b s a l t y A A N L L T L I V F P R L V K M L S R R ILWAGASVM. P V L S C A G L F A M A L A D I H N A A M e l b k l e p n A A N L L T L I L F P R L V K G L S R R ILWAGASIM. P V L G C G V L L L M A L G G V Y N I A Uidbescco VQNLVGTVAS APLVPGMVAR IGKKNTFLIG ALLGTCGYLL FFWVSVWSLP C o n s e n s u s ...L ..... F P . L ..... R . . . . . . . . . . . . . L . . . . . . . . A ........ Lacslacde Lacsstrtr Rafppedpe Melbescco Melbsalty Melbklepn Uidbescco Consensus
351 LSLVGAEFFF IPQPLAFLVVLMIISDAVEY IILTAAELFF IPQPLVFLVVFMIISDSVEY LVLLAATMFG FPQQMVFLVVLMVITDSVEY LIVIAGILLN VGTALFWVLQ VIMVADIVDY LIVAAGIFLN IGTALFWVLQ VIMVADTVDY LISLAGVLLN IGTALFWVLQ VIMVADTVDY VALVALAIAS IGQGVTMTVM WALEADTVEY .... A . . . . . . . . . . . . . . . . . . . A D . V . Y
Lacslacde Lacsstrtr Rafppedpe Melbescco Melbsalty Melbklepn Uidbescco Consensus
401 DKLGGALSNW FVSLIALTAG DKLGGAMSNW LVSTFAVAAG DKFGGAISNGVVGQIAIISG VKGGSAFAAF FI...AWLG VKGGSAFAAF FI...ALVLG VKAGSAFAAW FI...AIVLG RKCGQAIGGS IP...AFILG .K.G.A . . . . . . . . . A...G
MTTGATASTI TAHGQMVFKL MTTGASASTI TTHQQFIFKL MTTGATASSI TAAGQLHFKL MIGYVPNVEQ STQALLGMQF LIGYTPNVAQ SAQTLQGMQF IIGYVPNVVQ SSHTLLGMQA LSGYIANQVQ TPEVIMGIRT .....................
Lacslacde Lacsstrtr Rafppedpe Melbescco Melbsalty Melbklepn Uidbescco Consensus
451 LIAVSIFAKK VFLTEEKHAE LIGAFIVARK ITLTEARHAK L I A I G I F S K Q IFLTEEKHAE MVTLILYFRF YRLNGDTLRR MMTLVLYFRY YRLNGDMLRK ALTLFLYFRY YKLNGDMLRR LLAFVI.IWF YPLTDKKFKE . . . . . . . . . . . . L .......
500 IVDQLE .... T Q F G Q S H A Q K P A Q . A E S F T L IVEELE .... H R F S V A T S E N E V K . A N W S L IVAELERTWR TKFDNTTDQV AEKVVTSLDL IQIHLLDKYR KVPP...EPV HADIPVGAVS IQIHLLDKYR KTPPFVEQPD SPAISVVATS I Q I H L L D K Y R R V P E N D V E P E RPIVVPNQV. I V V E I D N R K K V Q Q Q L I S D I T N ......... I...L . . . . . . . . . . . . . . . . . . . . . . . . .
Lacslacde Lacsstrtr Rafppedpe Melbescco Melbsalty Consensus
501 550 ASPVSGQLMN LDMVDDPVFA DKKLGDGFAL VPADGKVYAP FAGTVRQLAK VTPTTGYLVD LSSVNDEHFA SGSMGKGFAI KPTDGAVFAP ISGTIRQILP ATPIAGQVIP LAQVNDPTFA AGTLGDGFAI KPSDGRILAP FDATVRQVFT DVKA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DVKA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..................................................
LAKNDQLLWA ALAYLFYGVG INILGSLEVY YFTYIMGKPK SFSILSIINI IYKNDQLSCL LGMALAYNVA SNIITGFAIY YFSYVIGDAD LFPYYLSYAG IYKNDQLSCL LGMALAYNIA SNIINGFAIY YFTYVIGDAD LFPYYLSYAG IYKNDQLACL LGMALAYNTA ANIIAGFAIY YFTYVIGSAE MFPYYMSYAG F M L C I G A L C V L I S T F A . . . . . . . V S A S S L F Y V R Y V L N D T G LFTVLVL... ...NDQL . . . . . . . . . Y .... N ....... Y Y F . Y . . G . . . . . . . . . . . . .
GQLKTGHRDE GQWKTGHRDE GQLKLGHRDE GEYKLHVRCE GEFKLNIRCE GEYTMNIRCE GEYLTGVRIE G ...... R.E
400 ALTLSVRPLV SLTLSVRPLI SLALSVRPLI SIAYSVQTMV SIAYSVQTMV SIAYSVQTLV GLTYSLFSFT .... SV .... 450 AMFALPAVML GMFAFPAATM TMFAFPALML IMIALPTLFF IMIVLPVLFF IMIALPTLFF SIALVPCGFM M...P ....
Lacslacde Lacsstrtr Rafppedpe Consensus
551 600 TRHSlVLENE HGVLVLIHLG LGTAKLNGTG FVSYVEEGSQ VEAGQQILEF TRHAVGIESE DGVIVLIHVG IGTVKLNGEG FISYVEQGDR VEVGQKLLEF TRHAVGLVGD NGIVLLIHIG LGTVKLRGTG FISYVEEGQH VQQGDELLEF ..................................................
Lacslacde Lacsstrtr Rafppedpe Consensus
601 650 WDPAIKQAKL DDTVIVTVIN SETFANSQML LPIGHSVQAL DDVFKLEGKN WSPIIEKNGL DDTVLVTVTN SEKFSAFHLE QKVGEKVEAL SEVITFKKGE WDPTIKQAGL DDTVIMTVTN STEFTMMDWL VKPGQAVKAT DNILQLHTKA ..................................................
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
Database accession numbers
CODE       SWISSPROT  EMBL/GENBANK
Lacsstrtr  P23936     M23009; M38175
Lacslacde  P22733     M55068
Melbescco  P02921     K01991; U14003
Melbsalty  P30878     X62101
Melbklepn  Q02581     M97257
Rafppedpe  P43466     Z32771; L32093
Uidbescco  P30868     M14641
PIR A32241 A38538 A03421 S23576 B44166
References
1 Poolman, B. et al. (1996) Mol. Microbiol. 19, 911-922.
2 Pourcher, T. et al. (1996) Biochemistry 35, 4161-4168.
3 Reizer, J. et al. (1994) Biochim. Biophys. Acta 1197, 133-166.
4 Henderson, P.J.F. et al. (1992) Int. Rev. Cytol. 137, 149-208.
H+/oligopeptide symporter family
Summary
Transporters of the H+/oligopeptide symporter family, the example of which is the PET1 oligopeptide-H+ symporter of humans (Petlhomsa), mediate the uptake of oligopeptides, chlorate and nitrate. The mechanism of transport, where known, is symport (H+-coupled substrate uptake). In plants, mutations in the CHL1 nitrate, chlorate-H+ symporter confer resistance to the herbicide chlorate. Members of the family have a broad biological distribution that includes bacteria, plants and humans. Statistical analysis reveals no apparent relationship between the amino acid sequences of the H+/oligopeptide symporter family and any other family of transporters. Members of the H+/oligopeptide symporter family are predicted to contain 11, 12 or 13 membrane-spanning helices by the hydropathy of their amino acid sequences. Eukaryotic transporters of the H+/oligopeptide symporter family are glycosylated. Several amino acid sequence motifs are highly conserved in the H+/oligopeptide symporter family and are necessary for function by the criterion of site-directed mutagenesis.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                  ORGANISM [COMMON NAMES]                          SUBSTRATE(S)
Chl1arath  Nitrate, chlorate-H+ symporter [CHL1]                   Arabidopsis thaliana [mouse-ear cress]           H+/nitrate, chlorate
Dtptlacla  Dipeptide-H+ symporter [DTPT]                           Lactobacillus lactis [gram-positive bacterium]   H+/[illegible] residue oligopeptides
Pt2aarath  Peptide transporter [PT2a, PTR2, PTR2a]                 Arabidopsis thaliana [mouse-ear cress]           Peptides
Pt2barath  High-affinity peptide transporter [PT2a, PTR2b, NTR1]   Arabidopsis thaliana [mouse-ear cress]           Peptides, histidine
Ptr2sacce  Peptide transporter [PTR2, YKR093w, YKR413]             Saccharomyces cerevisiae [yeast]                 Oligopeptides
Ptr2canal  Peptide transporter [PTR2]                              Candida albicans [yeast]                         Oligopeptides
Pet1ratno  Oligopeptide-H+ symporter 1 [PET1, PEPT1]               Rattus norvegicus [rat]                          H+/2-4 residue oligopeptides
Pet1homsa  Oligopeptide-H+ symporter 1 [PET1, PEPT1]               Homo sapiens [human]                             H+/2-4 residue oligopeptides
Pet1orycu  Oligopeptide-H+ symporter 1 [PET1, PEPT1]               Oryctolagus cuniculus [rabbit]                   H+/2-4 residue oligopeptides
Pet2homsa  Oligopeptide-H+ symporter 2 [PET2, PEPT2]               Homo sapiens [human]                             H+/[illegible] residue oligopeptides
Pet2orycu  Oligopeptide-H+ symporter 2 [PET2, PEPT2]               Oryctolagus cuniculus [rabbit]                   H+/[illegible] residue oligopeptides
Cotransported ions are listed for known symporters.
Phylogenetic tree
[Figure: phylogenetic tree. Leaf order: Pt2aarath, Ptr2sacce, Ptr2canal, Pet2homsa, Pet2orycu, Pet1homsa, Pet1ratno, Pet1orycu, Chl1arath, Pt2barath, Dtptlacla.]
Proposed orientation of PET1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is shown on the inside, and the polypeptide chain crosses the membrane 12 times. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: proposed orientation of PET1 in the membrane, drawn between OUTSIDE and INSIDE with the NH2 and COOH termini labelled.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT  EXPRESSION SITES  CHROMOSOMAL LOCUS
Chl1arath  590          64 921
Dtptlacla  463          50 630
Pt2aarath  610          67 518
Pt2barath  585          64 421
Ptr2sacce  601          68 043                     Chromosome 11
Ptr2canal  623          69 941
Pet1ratno  710          78 928   small intestine
Pet1homsa  708          78 805   small intestine
Pet1orycu  707          78 927   small intestine
Pet2homsa  729          81 940   kidney
Pet2orycu  729          81 664   kidney

Multiple amino acid sequence alignments

1 50
Pt2aarath  ......MSSI EEQITKSDSD FIISEDQSYL SKEKKADGSA TINQADEQSS
Ptr2sacce  MLNHPSQGSD DAQDEK.QGD FPVIEEE... .KTQAVTLKD SYVSDDVANS
Ptr2canal  .......... .....MVSSD F...ENEKQP DVVQVLTDEK NISLDDKYDY
Pet2homsa  .......... .......... .......... .......... .....MNPFQ
Pet2orycu  .......... .......... .......... .......... .....MNPFQ
Pet1homsa  .......... .......... .......... .......... ..........
Pet1ratno  .......... .......... .......... .......... ..........
Pet1orycu  .......... .......... .......... .......... ..........
Chl1arath  .......... .......... .......... .......... ..........
Pt2barath  .......... .......... .......... .......... MGSIEEEARP
Dtptlacla  .......... .......... .......... .......... ..........
Consensus  .......... .......... .......... .......... ..........

51 100
Pt2aarath  TDELQKSMST GVLVNGDLYP SPTEEELATL PSVCGTIPWK AFIIIIVELC
Ptr2sacce  TERYNLSPSP ....EDEDFE GPTEEEMQTL RHVGGKIPMR CWLIAIVELS
Ptr2canal  EDPKNYSTNY VDDYNPKGLR RPTPQESKSL RRVIGNIRYS TFMLCICEFA
Pet2homsa  KNESKETLFS PVSIEEVPPR PPSPPKKPSP TICGSNYPLS IAFIVVNEFC
Pet2orycu  QNESKETLFS PVSTEETPPR LSSPAKKTPP KICGSNYPLS IAFIVVNEFC
Pet1homsa  .......... .......... .....MGMSK SHSFFGYPLS IFFIVVNEFC
Pet1ratno  .......... .......... .....MGMSK SRGCFGYPLS IFFIVVNEFC
Pet1orycu  .......... .......... .....MGMSK SLSCFGYPLS IFFIVVNEFC
Chl1arath  .....MSLPE TK..SDDILL DAWDFQGRPA DRSKTGGWAS AAMILCIEAV
Pt2barath  LIEEGLILQE VKLYAED... GSVDFNGNPP LKEKTGNWKA CPFILGNECC
Dtptlacla  .......... .......... .......... .......... ..........
Consensus  .......... .......... .......... .......... ...I...E..

101 150
Pt2aarath  ERFAYYGLTV PFQNYMQF.. .......... ...GPKDATP GALNLGETGA
Ptr2sacce  ERFSYYGLSA PFQNYMEY.. [illegible] GVLSLNSQGA
Ptr2canal  ERASYYSTTG ILTNYIQRRI DPDSPHGWGA PPP[illegible] GALGKGLQAA
Pet2homsa  ERFSYYGMKA VLILYF.... .......... ..LYF..... ..LHWNEDTS
Pet2orycu  ERFSYYGMKA VLTLYF.... .......... ..LYF..... ..LHWNEDTS
Pet1homsa  ERFSYYGMRA ILILYF.... .......... ..TNF..... ..ISWDDNLS
Pet1ratno  ERFSYYGMRA LLVLYF.... .......... ..RNF..... ..LGWDDDLS
Pet1orycu  ERFSYYGMRA LLILYF.... .......... ..RNF..... ..IGWDDNLS
Chl1arath  ERLTTLGIGV NLVTYL.... .......... ..TGT..... ..MHLGNATA
Pt2barath  ERLAYYGIAG NLITYL.... .......... ..TTK..... ..LHQGNVSA
Dtptlacla  .......MRA ILVYYL.... .......... ..YALTTADN AGLGLPKAQA
Consensus  ER..YYG... .L..Y..... .......... .......... ..........
151 200
Pt2aarath  DGLSNFFTFW CYVTPVGAAL IADQFLGRYN TIVCSAVIYF IGILILTCTA
Ptr2sacce  TGLSYFFQFW CYVTPVFGGY VADTFWGKYN TICCGTAIYI AGIFILFITS
Ptr2canal  SALTNLLTFL AYVFPLIGGY LGDSTIGRWK AIQWGVFFGF VAHLFFIFAS
Pet2homsa  TSIYHAFSSL CYFTPILGAA IADSWLGKFK TIIYLSLVYV LGHVIKSLGA
Pet2orycu  TSVYHAFSSL CYFTPILGAA IADSWLGKFK TIIYLSLVNV LGHVIKSLSA
Pet1homsa  TAIYHTFVAL CYLTPILGAL IADSWLGKFK TIVSLSIVYT IGQAVTSVSS
Pet1ratno  TAIYHTFVAL CYLTPILGAL IADSWLGKFK TIVSLSIVYT IGQAVISVSS
Pet1orycu  TVIYHTFVAL CYLTPILGAL IADAWLGKFK TIVWLSIVYT IGQAVTSLSS
Chl1arath  ANTVTNFLGT SFMLCLLGGF IADTFLGRYL TIAIFAAIQA TGVSILTLST
Pt2barath  ATNVTTWQGT CYLTPLIGAV LADAYWGRYW TIACFSGIYF IGMSALTLSA
Dtptlacla  MAIVSIYGAL VYLSTIVGGW VADRLLGASR TIFLGGILIT LGHVALATPF
Consensus  .......... .Y..P..G.. .AD...G... TI........ .G........

201 250
Pt2aarath  IP.SVIDAGK S......... .....MGGFV VSLIIIGLGT GGIKSNVSPL
Ptr2sacce  IP.SVGNRDS A......... .....IGGFI AAIILIGIAT GMIKANLSVL
Ptr2canal  IPQAIENANA G......... .....LGLCV IAIITLSAGS GLMKPNLLPL
Pet2homsa  .LPILGGQ.. .......... ..VVHTVLSL IGLSLIALGT GGIKPCVAAF
Pet2orycu  .FPILGGK.. .......... ..VVHTVLSL VGLCLIALGT GGIKPCVAAF
Pet1homsa  .INDLTDHNH DGTPDSL... ..PVHVVLSL IGLALIALGT GGIKPCVSAF
Pet1ratno  .INDLTDHDH DGSPNNL... ..PLHVALSM IGLALIALGT GGIKPCVSAF
Pet1orycu  .VNELTDNNH DGTPDSL... ..PVHVAVCM IGLLLIALGT GGIKPCVSAF
Chl1arath  IIPGLRPPRC NPTTSSHCEQ ASGIQLTVLY LALYLTALGT GGVKASVSGF
Pt2barath  SVPALKPAEC ...IGDFCPS ATPAQYAMFF GGLYLIALGT GGIKPCVSSF
Dtptlacla  GLSSL..... .......... .........F VALFLIILGT GMLKPNISNM
Consensus  .......... .......... .......... ..L.LI.LGT G..K......

251 300
Pt2aarath  MAEQLPKIPP YVKTKKNGSK VIVDPVVTTS RAYMIFYWTI NVGSLSVLAT
Ptr2sacce  IADQLPKRKP SIKVLKSGER VIVDSNITLQ NVFMFFYFMI NVGSLSLMAT
Ptr2canal  VLDQYPEERD MVKVLPTGES IILDREKSLS RITNVFYLAI NIGAFLQIAT
Pet2homsa  GGDQFEEKHA EERTR..... .......... .YFSVFYLSI NAGSLISTFI
Pet2orycu  GGDQFEEKHA EERTR..... .......... .YFSGFYLAI NAGSLISTFI
Pet1homsa  GGDQFEEGQE KQRNR..... .......... .FFSIFYLAI NAGSLLSTII
Pet1ratno  GGDQFEEGQE KQRNR..... .......... .FFSIFYLAI NAGSLLSTII
Pet1orycu  GGDQFEEGQE KQRNR..... .......... .FFSIFYLAI NAGSLLSTII
Chl1arath  GSDQFDETEP KERSKMTY.. .......... .FFNRFFFCI NVGSLLA..V
Pt2barath  GADQFDDTDS RERVRKAS.. .......... .FFNWFYFSI NIGALVS..S
Dtptlacla  VGHLY....S KDDSRRDT.. .......... .GFNIFVVGI NMGSLIAPLI
Consensus  ..DQ...... .......... .......... ..F..FY..I N.GSL.....

301 350
Pt2aarath  TSLES..... ....TKGFVY AYLLPLCVFV IPLIILAVSK TAFTSTLLPP
Ptr2sacce  TELEY..... ....HKGFWA AYLLPFCFFW IAVVTLIFGK KQY...IQRP
Ptr2canal  SYCER..... ....RVGFWL AFFVPMILYI IVPIFLFIVK PKL..KIKPP
Pet2homsa  TPMLRGDVQC F..GEDCYAL AFGVPGLLMV IALVVFAMGS KIY..NKPPP
Pet2orycu  TPMLRGDVQC F..GEDCYAL AFGVPGLLMV IALVVFAMGS KMY..KKPPP
Pet1homsa  TPMLRVQQCG IHSKQACYPL AFGVPAALMA VALIVFVLGS GMY..KKFKP
Pet1ratno  TPILRVQQCG IHSQQACYPL AFGVPAALMA VALIVFVLGS GMY.[illegible]
Pet1orycu  TPMVRVQQCG IHVKQACYPL AFGIPAILMA VSLIVFIIGS GMY..KKFKP
Chl1arath  TVLVYVQ... ...DDVGRKW GYGICAFAIV LALSVFLAGT NRY..RFKKL
Pt2barath  SLLVWIQ... ...ENRGWGL GFGIPTVFMG LAIASFFFGT PLY..RFQKP
Dtptlacla  VGTV...... ..GQGVNYHL GFSLAAIGMI FALFAYWYGR LRHFPEIGRE
Consensus  .......... .......... ....P..... ........G. .........P
351 400
Pt2aarath  VPSLFV.LVK CSSLLLKTNL ISKKLN.... HLALLL.... .......LER
Ptr2sacce  IGD....KVI AKSFKVCWIL TKNKFD...F NAAKPS.... .......VHP
Ptr2canal  QGQVMTNVVK ILAVLFSGNF IKRLWNGTFW DHARPSHMEA RGTIYYNSKK
Pet2homsa  EGNIVAQVFK ...CIWFAIS NRFKN..... .......... .........R
Pet2orycu  EGNIVAQVVK ...CIWFAIS NRFKN..... .......... .........R
Pet1homsa  QGNIMGKVAK ...CIGFAIK NRFRH..... .......... .........R
Pet1ratno  QGNIMGKVAK ...CIRFAIK NRFRH..... .......... .........R
Pet1orycu  QGNILSKVVK ...CICFAIK NRFRH..... .......... .........R
Chl1arath  IGSPMTQVAA ...VIVAAWR NRKLELPADP SYLYDVDDII AAEGSMKGKQ
Pt2barath  GGSPITRISQ ...VVVASFR KSSVKVPEDA TLLYETQD.. .KNSAIAGSR
Dtptlacla  PSNPMDAKAK RNFIITLTIV LIVALIGFFL IYQASPANFI NNFINVLSII
Consensus  .G........ .......... .......... .......... ..........

401 450
Pt2aarath  YVKDQWDDLF ID........ .......... .......... ELKRALRACK
Ptr2sacce  EKNYPWNDKF VD........ .......... .......... EIKRALAACK
Ptr2canal  KSAITWSDQW IL........ .......... .......... DIKQTFDSCK
Pet2homsa  SGDIPKRHDW LDWAAEKYPK Q......... .......LIM DVKALTRVLF
Pet2orycu  SEDIPKRQHW LDWAAEKYPK Q......... .......LIM DVKTLTRVLF
Pet1homsa  SKAFPKREHW LDWAKEKYDE R......... .......LIS QIKMVTRVMF
Pet1ratno  SKAFPKRNHW LDWAKEKYDE R......... .......LIS QIKIMTKVMF
Pet1orycu  SKQFPKRAHW LDWAKEKYDE R......... .......LIA QIKMVTRVLF
Chl1arath  KLPHTEQFRS LDKAAIRDQE AGVTSNVFNK WTLSTLTDVE EVKQIVRMLP
Pt2barath  KIEHTDDCQY LDKAAVISEE ESKSGDYSNS WRLCTVTQVE ELKILIRMFP
Dtptlacla  GIVVPIIYFV MMFTSKKVES D......... .......... ERRKLTAYIP
Consensus  .......... .D........ .......... .......... ..K.......

451 500
Pt2aarath  TFLFYPIYWV CYGQMTNNLI SQAGQMQTG. ........NV SNDLFQAFDS
Ptr2sacce  VFIFYPIYWT QYGTMISSFI TQASMMELH. ........GI PNDFLQAFDS
Ptr2canal  IFLYYIIFNL ADSGLGSVET SLIGAMKLD. ........GV PNDLFNNFNP
Pet2homsa  LYIPLPMFWA LLDQQGSRWT LQAIRMNRNL G.....FFVL QPDQMQVLNP
Pet2orycu  LYIPLPMFWA LLDQQGSRWT LQATKMNGNL G.....FFVL QPDQMQVLNP
Pet1homsa  LYIPLPMFWA LFDQQGSRWT LQATTMSGKI G.....ALEI QPDQMQTVNA
Pet1ratno  LYIPLPMFWA LFDQQGSRWT LQATTMTGKI G.....TIEI QPDQMQTVNA
Pet1orycu  LYIPLPMFWA LFDQQGSRWT LQATTMSGRI G.....ILEI QPDQMQTVNT
Chl1arath  IWATCILFWT VHAQLTTLSV AQSETLDRSI G.....SFEI PPASMAVFYV
Pt2barath  IWASGIIFSA VYAQMSTMFV QQGRAMNCKI G.....SFQL PPAALGTFDT
Dtptlacla  LFLSAIVFWA IEEQSSTIIA VWGESRSNLN PTWFGFTFHI DPSWYQLLNP
Consensus  .......FW. ...Q...... .Q...M.... .......... ..........

501 550
Pt2aarath  IALIIFIPIC DNIIYPLLRK ...YNIPFKP ILRITLGFMF ATASMIYAAV
Ptr2sacce  IALIIFIPIF EKFVYPFIRR ...Y.TPLKP ITKIFFGFMF GSFAMTWAAV
Ptr2canal  LTIIILIPIL EYGLYPLLNK ...FKIDFKP IWRICFGFVV CSFSQIAGFV
Pet2homsa  LLVLIFIPLF DFVIYRLVSK ...CGINFSS LRKMAVGMIL ACLAFAVAAR
Pet2orycu  LLVLIFIPLF DLVIYRLISK ...CGINFTS LRKMAVGMVL ACLAFAAAAT
Pet1homsa  ILIVIMVPIF DAVLYPLIAK ...CGFNFTS LKKMAVGMVL ASMAFVVAAI
Pet1ratno  ILIVIMVPIV DAVVYPLIAK ...CGFNFTS LKKMTVGMFL ASMAFVVAAI
Pet1orycu  ILIIILVPIM DAVVYPLIAK ...CGLNFTS LKKMTIGMFL ASMAFVAAAI
Chl1arath  GGLLLTTAVY DRVAIRLCKK LFNYPHGLRP LQRIGLGLFF GSMAMAVAAL
Pt2barath  ASVIIWVPLY DRFIVPLARK FTGVDKGFTE IQRMGIGLFV SVLCMAAAAI
Dtptlacla  LFIVLLSPIF VRIWNKLGDR QP......ST IVKFGLGLML TGASYLIMTL
Consensus  ....I..P.. ......L..K .......... ......G... .......AA.
551 600
Pt2aarath  LQAKIYQRGP CYANFTDTCV SNDISVWIQI PAYVLIAFSE IFASITGLEF
Ptr2sacce  LQSFVYKAGP WYNEPLGHNT PNHVHVCWQI PAYVLISFSE IFASITGLEY
Ptr2canal  LQKQVYEQSP CGYYATNCDS PAPITAWKAS SLFILAAAGE CWAYTTAYEL
Pet2homsa  VEI...KINE .MAPAQPGPQ EVFLQVLNLA DDEVKVTVVG .....NENNS
Pet2orycu  VEI...KINE .MAPPQPGSQ EILLQVLNLA DDEVKLTVLG .....NNNNS
Pet1homsa  VQV...EIDK .TLPVFPKGN EVQIKVLNIG NNTMNISLPG .....EM...
Pet1ratno  VQV...EIDK .TLPVFPSGN QVQIKVLNIG NNDMAVYFPG .....KN...
Pet1orycu  LQV...EIDK .TLPVFPKAN EVQIKVLNVG SENMIISLPG .....QT...
Chl1arath  VELKRLRTAH .AHG[illegible] TLPLGFYLLI PQYLIVGIGE ALIYTGQLDF
Pt2barath  VEIIRLHMAN .DLGLVESGA PVPISVLWQI PQYFILGAAE VFYFIGQLEF
Dtptlacla  PGLLNGTSGR .ASALW.... .LVLMFAVQM AGELLVSPVG LSVSTKLAPV
Consensus  .......... .......... .......... .......... ..........

601 650
Pt2aarath  AFTKAPPSMK SIITALFLFT NAFGAILSIC ISSTAVNPKL TWMYTGIAVT
Ptr2sacce  AYSKAPASMK SFIMSIFLLT NAFGSAIGCA LSPVTVDPKF TWLFTGLAVA
Ptr2canal  AYTRSPPALK SLVYALFLVM SAFSAALSLA ITPALKDPNL HWVFLAIGLA
Pet2homsa  LLIESIKSFQ KTPHYSKLHL KTKSQDFHFH LKYHNLSLYT EHSVQEKNWY
Pet2orycu  LLADSIKSFQ KTPHYSKIHL NTKSQDFYFH LKYHNLSIYT EHSVEERNWY
Pet1homsa  ...VTLGPMS QTNAFMTFDV NKLTRINISS PGSP.VTAVT .DDFKQGQRH
Pet1ratno  ...VTVAQMS QTDTFMTFDV DQLTSINVSS PGSPGVTTVA .HEFEPGHRH
Pet1orycu  ...VTLNQMS QTNEFMTFNE DTLTSINITS ..GSQVTMIT .PSLEAGQRH
Chl1arath  FLRECPKGMK GMSTGLLLST LALG.FFFSS VLVTIVEKFT GK..AHPWIA
Pt2barath  FYDQSPDAMR SLCSALALLT NALG.NYLSS LILTLVTYFT TRNGQEGWIS
Dtptlacla  AFQSQMMAMW FLADSTSQAI NAQITPIFKA ATEVHFFAIT GIIGIIVGII
Consensus  .......... .......... .......... .......... ..........

651 700
Pt2aarath  AFI.....AG IMFWVCFHHY DA.MEDEQNQ LE.FKRNDAL TKKDVEKEVH
Ptr2sacce  CFI.....SG CLFWLCFRKY ND.TEEEMNA MD.YEEEDEF DLNPISAPKA
Ptr2canal  GFL.....CA IVMLAQFWNL DKWMENETNE RERLDREEEE EANRGIHDVD
Pet2homsa  SLVIREDGNS ISSMMVK.DT ESRTTNGMTT VRFVNTLHKD VNISLSTDTS
Pet2orycu  SLIIREDGKS ISSIMVK.DM ENETTYGMTA IRFINTLQEN VNISLGTDIS
Pet1homsa  TLLVW..APN HYQVVKD.GL NQKPEKGENG IRFVNTFNEL ITITMSGKVY
Pet1ratno  TLLVW..GPN LYRVVKD.GL NQKPEKGENG IRFVSTLNEM ITIKMSGKVY
Pet1orycu  TLLVW..APN NYRVVND.GL TQKSDKGENG IRFVNTYSQP INVTMSGKVY
Chl1arath  DDLNKGRLYN FYWLVAVLVA LNFLIFLVFS KWYVYKEKRL AEVGIELDDE
Pt2barath  DNLNSGHLDY FFWLLAGLSL VNMAVYFFSA AR..YKQKKA SS........
Dtptlacla  LLIIKKPILK LMGDVR.... .......... .......... ..........
Consensus  .......... .......... .......... .......... ..........

701 750
Pt2aarath  DSYSMADESQ YNLEKANC.. .......... .......... ..........
Ptr2sacce  NDIEILEPME SLRSTTKY.. .......... .......... ..........
Ptr2canal  HPIEAIVSIK S......... .......... .......... ..........
Pet2homsa  LNVGEDYGVS AYRTVQRGEY PAVHCRTEDK NFS...LNLG LLDFGAAYLF
Pet2orycu  LNVGENYGVS AYRTVQRGEY PAVHCKTEDK DFS...LNLG LLDFGASYLF
Pet1homsa  ANIS.SYNAS TYQFFPSGIK GFTISSTEIP PQCQPNFNTF YLEFGSAYTY
Pet1ratno  ENVT.SHSAS NYQFFPSGQK DYTINTTEIA PNCSSDFKSS NLDFGSAYTY
Pet1orycu  EHIA.SYNAS EYQFFTSGVK GFTVSSAGIS EQCRRDFESP YLEFGSAYTY
Chl1arath  PSIPMGH... .......... .......... .......... ..........
Consensus  .......... .......... .......... .......... ..........
VITNNTNQG. VITNSTKQG. IVQ.RKNDSC VIRSRASDGC LITSQAT.GC
Pet2homsa Pet2orycu Petlhomsa Petlratno Petlorycu Consensus
SYSQAPSSMK SYSQAPSSMK SYSQAPSNMK SYSQAPSNMK SYSQAPSNMK
Pet2homsa Pet2orycu Petlhomsa Petlratno Petlorycu Consensus
ANKMSIRWQL ANKVSIAWQL ANTVNMALQI PNTVNMALQI PNTMNMAWQI
PQYALVTAGE VMFSVTGLEF PQYALVTAGE VMFSVTGLEF PQYFLLTCGE VVFSVTGLEF PQYFLLTCGEVVFSVTGLEF PQYFLITSGEVVFSITGLEF
SVLQAAWLLT SVLQAAWLLT SVLQAGWLLT SVLQAGWLLT SVLQAGWLLT
IAVGNIIVLV VAIGNIIVLV VAVGNIIVLI VAIGNIIVLI VAVGNIIVLI
VAQFSGL.VQ VAQFSGL.VQ VAGAGQFSKQ VAEAGHFDKQ VAGAGQINKQ
..................................................
801
800
LQAWKIEDIP LQAWKMEDIP PEVKVFEDIS LEVKEFEDIP PQVTEFEDIP
..................................................
850 WAEFILFSCL WAEFVLFSCL WAEYILFAAL WAEYVLFASL WAEYILFAAL
851 900 LLVICLIFSI MGYYYVPVKT EDMRGPADKH IPHIQGNMIK LETKKTKL.. LLVVCLIFSI MGYYYIPIKS EDIQGPEDKQ IPHMQGNMIN LETKKTKL.. LLVVCVIFAI MARFYTYINP AEIEAQFDED EKKNRLEKSN PYFMSGANSQ LLVVCIIFAI MARFYTYINP AEIEAQFDED EKKKGVGKEN PYSSLEPVSQ LLVVCVIFAI MARFYTYVNP AEIEAQFEED EKKKNPEKND LYPSLAPVSQ .................................................. 901
751
Pet2homsa Pet2orycu Petlhomsa Petlratno Petlorycu Consensus
Pet lhomsa KQM Petlratno TNM Petlorycu TQM Consensus . . .
"
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
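The 75% rule used for the Consensus rows can be sketched in a few lines. This is an illustration only: the function name `consensus_row` is hypothetical, and treating '.' as a gap that never appears in the consensus, with the fraction taken over all aligned sequences, is an assumption about how the rows were derived. The sample data are columns 241-250 of the alignment above.

```python
from collections import Counter


def consensus_row(rows, threshold=0.75, gap="."):
    """Return the consensus of equal-length aligned rows.

    A residue is reported for a column when it occurs in at least
    `threshold` of all rows; otherwise the gap symbol is emitted.
    """
    if not rows:
        raise ValueError("at least one aligned row is required")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")

    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            out.append(residue if n / len(rows) >= threshold else gap)
        else:
            out.append(gap)
    return "".join(out)


if __name__ == "__main__":
    # Columns 241-250, rows in the order Pt2aarath ... Dtptlacla.
    columns_241_250 = [
        "GGIKSNVSPL", "GMIKANLSVL", "GLMKPNLLPL", "GGIKPCVAAF",
        "GGIKPCVAAF", "GGIKPCVSAF", "GGIKPCVSAF", "GGIKPCVSAF",
        "GGVKASVSGF", "GGIKPCVSSF", "GMLKPNISNM",
    ]
    print(consensus_row(columns_241_250))
```

With 11 sequences, a residue must occur in at least 9 of them to reach 75%, which is why columns where 8 sequences agree are still shown as '.'.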
Database accession numbers
            SWISSPROT   EMBL/GENBANK
Chl1arath   Q05085      L10357
Dtptlacla   P36574      U05215
Pt2aarath   P4603       U01171
Pt2barath   P46032      L39082; X77503
Ptr2sacce   P32901      L11994; X73541
Ptr2canal   P46030      U09781
Pet1ratno               D50664
Pet1homsa   P46059      U13173
Pet1orycu   P36836      U06467; U13707
Pet2homsa               S78203
Pet2orycu   P46029      U32507
PIR: A53620, S38171
H+/fucose symporter family
Summary
Transporters of the H+/fucose symporter family, the example of which is the FUCP fucose-H+ symporter of Escherichia coli (Fucpescco), mediate symport (H+-coupled substrate uptake) of fucose, glucose and galactose. Known members of the family are found only in gram-negative bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the H+/fucose symporter family and any other family of transporters. Members of the family are predicted to contain 12 membrane-spanning helices by the hydropathy of their amino acid sequences and activities of reporter gene fusions 1. Several amino acid sequence motifs are highly conserved in the H+/fucose symporter family.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                  ORGANISM [COMMON NAMES]                            SUBSTRATE(S)
Fucpescco  Fucose-H+ symporter [FUCP]              Escherichia coli [gram-negative bacterium]         H+/fucose
Fucphaein  Fucose-H+ symporter [FUCP, HI0610]      Haemophilus influenzae [gram-negative bacterium]   H+/fucose
Glupbruab  Glucose-galactose transporter [GLUP]    Brucella abortus [gram-negative bacterium]         Glucose, galactose
Cotransported ions are listed for known symporters.
Phylogenetic tree
[Figure: phylogenetic tree. Leaf order: Fucpescco, Glupbruab, Fucphaein.]
Proposed orientation of FUCP in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is shown on the inside, and the polypeptide chain crosses the membrane 12 times 1. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in all three of the aligned transporters (see below) are shown.
[Figure: proposed orientation of FUCP in the membrane, drawn between OUTSIDE and INSIDE with the NH2 and COOH termini labelled.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT  Km               CHROMOSOMAL LOCUS
Fucpescco  438          47 544   Fucose: 24.7 µM  63.16 minutes
Fucphaein  428          46 987
Glupbruab  412          43 859

Multiple amino acid sequence alignments
1 50
Fucpescco  MGNTSIQTQS YRAVDKDAGQ SRSYIIPFAL LCSLFFLWAV ANNLNDILLP
Glupbruab  .MATSIPTNN ..PLHTETSS QKNYGFALTS LTLLFFMWGF ITCLNDILIP
Fucphaein  .......... ....MNAKVL EKKFIVPFVL ITSLFALWGF ANDITNPMVA
Consensus  .......... .......... .......... ...LF..W.. ..........

51 100
Fucpescco  QFQQAFTLTN FQAGLIQSAF YFGYFIIPIP AGILMKKLSY KAGIITGLFL
Glupbruab  HLKNVFQLNY TQSMLIQFCF FGAYFIVSLP AGQLVKRISY KRGIVVGLIV
Fucphaein  VFQTVMEIPA SEAALVQLAF YGGYGTMAIP AALFASRYSY KAGILLGLAL
Consensus  .......... ....L.Q..F ...Y.....P A.......SY K.GI..GL..

101 150
Fucpescco  YALGAALFWP AAEIMNYTLF LVGLFIIAAG LGCLETAANP FVTVLGPESS
Glupbruab  AAIGCALFIP AASYRVYALF LGALFVLASG VTILQVAANP YVTILGKPET
Fucphaein  YAIGAFLFWP AAQYEIFNFF LVSLYILTFG LAFLETTANP YILAMGDPQT
Consensus  .A.G..LF.P AA.......F L..L.....G ...L...ANP .....G....

151 200
Fucpescco  GHFRLNLAQT FNSFGAIIAV VFGQSLILSN VPHQSQDVLD KMSPEQLS..
Glupbruab  AASRLTLTQA FNSLGTTVAP VFGAVLILSA ATDATVNAEA D.........
Fucphaein  ATRRLNFAQS FNPLGSITGM FVASQLVLTN LESDKRDAAG NLIFHTLSEA
Consensus  ...RL...Q. FN..G..... .....L.L.. .......... ..........

201 250
Fucpescco  ...AYKHSLV LSVQTPYMII VAIVLLVALL IMLTKFPALQ SDNH..SDAK
Glupbruab  .......... .AVRFPYLLL ALAFTVLAII FAILKPPDVQ EDEPALSDKK
Fucphaein  EKMSIRTHDL AEIRDPYIAL GFVVVAVFII IGLKKMPAVK IE.....EAG
Consensus  .......... .....PY... .......... ....K.P... ..........

251 300
Fucpescco  QGSFSASLSR LARIRHWRWA VLAQFCYVGA QTACWSYLIR YAVE.EIPGM
Glupbruab  EGSAW..... ..QYRHLVLG AIGIFVYVGA EVSVGSFLVN FLSDPTVAGL
Fucphaein  QISFKTAVSR LAQKAKYREG VIAQAFYVGV QIMCWTFIVQ YA...ERLGF
Consensus  ..S....... .......... ......YVG. .......... ........G.

301 350
Fucpescco  TAGFAANYLT GTMVCFFIGR FTGTWLISRF APHKVLAAYA LIAMALCLIS
Glupbruab  SETDAAHHVA YFWGGDMVGR FIGSAAMRYI DDGKALAFDA FVAIILLFIT
Fucphaein  TKAEGQNFNI IAMAIFISSR FISTALMKYL KAEFMLMLFA IGGFLSILGV
Consensus  .......... .........R F......... .....L...A ..........

351 400
Fucpescco  AFAGGHVGLI ALTLCSAFMS IQYPTIFSLG IKNLGQDTKY GSSFIVMTII
Glupbruab  VATTGHIAMW SVLAIGLFNS IMFPTIFSLA LHGLGSHTSQ GSGILCLAIV
Fucphaein  IFIDGVWGLY CLILTSGFMP LMFPTIYGIA LYGLKEESTL GAAGLVMAIV
Consensus  ....G..... .......F.. ...PTI.... ...L...... G.......I.

401 450
Fucpescco  GGGIVTPVMG FVSD.....A AGNIPTAELI PALCFAVIFI FARFRSQTAT
Glupbruab  GGAIVPLIQG ALAD.....A IG.IHLAFLM PIICYAYIAF YGLIGTKS..
Fucphaein  GGALMPPLQG MIIDQGEVMG LPAVNFSFIL PLICFVVIAI YGFRAWKILK
Consensus  GG.......G ...D...... .......... P..C...I.. ..........
Residues listed in the consensus sequence are present in all three of the transporter sequences.
Database accession numbers
            SWISSPROT   EMBL/GENBANK
Fucphaein   P44776      L45251; U32743
Fucpescco   P11551      X15025; U29581
Glupbruab               U43785
PIR: JS0184
References
1 Gunn, F. (1993) PhD thesis, Cambridge University.
2 Muir, J. et al. (1993) Biochem. J. 290, 833-842.
H+/carboxylate symporter family

Summary
Transporters of the H+/carboxylate symporter family, the example of which is the KGTP α-ketoglutarate-H+ symporter of Escherichia coli (Kgtpescco), mediate uptake of carboxylated compounds 1,2. The mechanism of transport, where known, is symport (i.e. H+-coupled substrate uptake). The transport activity of one family member, the PROP proline-betaine protein of E. coli, is stretch-inactivated, allowing it to function as both an osmosensor and osmoregulator 3. Members of the family occur in both gram-negative and gram-positive bacteria. Statistical analysis of multiple amino acid sequence comparisons places the H+/carboxylate symporter family in the uniporter-symporter-antiporter (USA) superfamily, also known as the major facilitator superfamily (MFS) 1,2. Members of the H+/carboxylate symporter family are predicted to form 12 membrane-spanning helices by the hydropathy of their amino acid sequences 4 and activities of reporter gene fusions 5. There is considerable similarity between the sequences of the N- and C-terminal halves of these proteins, further implying that they arose through duplication of a gene encoding an ancestral six-helix protein 2. Several amino acid sequence motifs are highly conserved in the H+/carboxylate symporter family, including motifs unique to the family and signature motifs of the USA/MFS transporter superfamily.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                         ORGANISM [COMMON NAMES]                                 SUBSTRATE(S)
Bap3strhy  Bialaphos transporter                          Streptomyces hygroscopicus [gram-positive bacterium]    Bialaphos
Citklepn   Citrate-H+ symporter [CIT, CITH]               Klebsiella pneumoniae [gram-negative bacterium]         H+/citrate
Cit1escco  Citrate-H+ symporter [CIT1, CITA, CIT]         Escherichia coli [gram-negative bacterium]              H+/citrate
Cit2escco  Citrate-H+ symporter [CITA, CIT]               Escherichia coli [gram-negative bacterium]              H+/citrate
Citasalty  Citrate-H+ symporter [CITA]                    Salmonella typhimurium [gram-negative bacterium]        H+/citrate
Kgtpescco  α-Ketoglutarate-H+ symporter [KGTP, WITA]      Escherichia coli [gram-negative bacterium]              H+/α-ketoglutarate
Mopaburce  4-Methyl-o-phthalate permease [MPOA]           Burkholderia cepacia [purple bacterium]                 4-Methyl-o-phthalate
Nantescco  Putative sialic acid transporter [NANT]        Escherichia coli [gram-negative bacterium]              Sialic acid
Ousaerwch  Osmoprotectant uptake system A [OUSA]          Erwinia chrysanthemi [gram-negative bacterium]          Osmoprotectants
Pcatpsepu  Dicarboxylic acid transport protein [PCAT]     Pseudomonas putida [gram-negative bacterium]            Dicarboxylates
Propescco  Proline-betaine transporter [PROP, PPII]       Escherichia coli [gram-negative bacterium]              Proline, betaine
Cotransported ions are listed for known symporters.
Phylogenetic tree
The following proteins are at least 90% identical to the transporter listed in parentheses and are therefore not included in the phylogenetic tree: Cit2escco, Citasalty (Cit1escco).
[Figure: phylogenetic tree. Leaf order: Cit1escco, Citklepn, Ousaerwch, Propescco, Kgtpescco, Pcatpsepu, Bap3strhy, Nantescco, Mopaburce.]
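The 90% identity cut-off used to leave near-duplicate proteins out of the tree can be illustrated with a small pairwise calculation. This is a sketch under stated assumptions: the book does not define how identity was computed, so counting only columns where neither sequence has a gap is one common convention chosen here, and the function name `percent_identity` is hypothetical. The sample strings are columns 51-60 of Cit1escco and Citklepn from the alignment in this entry.

```python
def percent_identity(a, b, gap="."):
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    cit1escco = "ATYIAKTFFP"  # columns 51-60
    citklepn = "ATYIAHTFFP"   # columns 51-60
    print(percent_identity(cit1escco, citklepn))
```

A ten-residue window only demonstrates the arithmetic. A meaningful comparison would use the full-length aligned sequences.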
Proposed orientation of KGTP in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content 5. The N-terminus of the protein is shown on the inside, and the polypeptide chain crosses the membrane 12 times. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in KGTP.
[Figure: proposed orientation of KGTP in the membrane, drawn between OUTSIDE and INSIDE with the NH2 and COOH termini labelled.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Bap3strhy  448          47 233
Citklepn   444          48 142
Cit1escco  431          46 979   18.00-18.02 minutes
Cit2escco  431          47 077   16.00-16.02 minutes
Citasalty  434          47 188
Kgtpescco  432          47 052   58.51 minutes
Mopaburce  431          45 627
Nantescco  496          53 551   72.55-72.57 minutes
Ousaerwch  498          54 412
Pcatpsepu  429          46 861
Propescco  500          54 845   93.24 minutes
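The MOL. WT column can be related to the AMINO ACIDS column with a simple calculation from sequence. The sketch below is illustrative only: the book does not state how its molecular weights were obtained, the average residue masses are standard textbook values rather than figures from this source, and the function name `molecular_weight` is hypothetical.

```python
# Average masses of amino acid residues in a peptide chain (Da),
# i.e. the free amino acid minus one water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water for the free N- and C-termini


def molecular_weight(seq):
    """Approximate molecular weight (Da) of an unmodified polypeptide.

    Characters other than the 20 standard residues, such as the '.'
    gap symbol in the alignments, are ignored.
    """
    residues = [r for r in seq.upper() if r in RESIDUE_MASS]
    if not residues:
        return 0.0
    return sum(RESIDUE_MASS[r] for r in residues) + WATER


if __name__ == "__main__":
    # First 50 aligned positions of Kgtpescco, gaps included.
    print(round(molecular_weight("......MAESTVTADSKLTSSDTRRRIWAIVGASSGNLVEWFDFYVYSFC")))
```

Dividing a tabulated molecular weight by the residue count gives roughly 110 Da per residue for these transporters, for example 47 052 / 432 for Kgtpescco, which is a quick sanity check on a reconstructed table row.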
Multiple amino acid sequence alignments
1 50
Cit1escco  .......... ......MTQQ PSRAGTFGAI LRVTSGNFLE QFDFFLFGFY
Citklepn   MPTARCSMRA SSTAPVRMMA TAGGARIGAI LRVTSGNFLE QFDFFLFGFY
Ousaerwch  ..MKLKRKRV KPIALDDVTI IDDGRLRKAI TAAALGNAME WFDFGVYGFV
Propescco  ...MLKRKKV KPITLRDVTI IDDGKLRKAI TAASLGNAME WFDFGVYGFV
Kgtpescco  ......MAES TVTADSKLTS SDTRRRIWAI VGASSGNLVE WFDFYVYSFC
Pcatpsepu  .......... ..MTSTYYTG EERSKRIFAI VGASSGNLVE WFDFYVYAFC
Bap3strhy  .......... MTVRESDRTT VPRSRSLRQL VCGGIGHTVE SHDWYVYTFL
Nantescco  ........MS TTTQNIPWYR HLNRAQWRAF SAAWLGYLLD GFDFVLIALV
Mopaburce  .......... .......... .........M AYHSTNPLLS RWIVGAGQA
Consensus  .......... .......... ........A. .....G...E .FDF....F.

51 100
Cit1escco  ATYIAKTFFP AESE..FAAL ML......TF AVFGSGFLMR PIGAVVLGAY
Citklepn   ATYIAHTFFP ASSE..FASL MM......TF AVFGAGFLMR PIGAIVLGAY
Ousaerwch  AYALGQVFFP ..G.DPGVQS IA......AL ATFSVPFLM. PLGGVFFGAL
Propescco  AYALGKVFFP ..GADPSVQM VA......AL ATFSVPFLIR PLGGLFFGML
Kgtpescco  SLYFAHIFFP ..SGNTTTQL LQ......TA GVFAAGFLMR PIGGWLFGRI
Pcatpsepu  AIYFAPAFFP ..SDDPTVQL LN......TA GVFAAGFLMR PIGGWIFGRL
Bap3strhy  AVYFSDDIFP ESSGDPLVPL LN......TF AVFALAFAAR PVGATVMGWY
Nantescco  ........LT EVQGEFGLTT VQ......AA SLISAAFISR WFGGLMLGAM
Mopaburce  AAAVAKTLRA EGHRGEIMML GAERVGPYER PPLSKAWLSA ENVPEIAALL
Consensus  ........FP .......... .......... ..F...FL.R P.G....G..

101 150
Cit1escco  IDRIGRRK.. GLMITLAIMG CGTLLIALVP GYQTIGLLAP VLVLVGRLLQ
Citklepn   IDKVGRRK.. GLIVTLSIMA TGTFLIVLIP SYQTIGLWAP LLVLIGRLLQ
Ousaerwch  GDKYGRQK.. ILAITIIIMS ISTFCIGLIP SYERIGIWAP ILLLLAKMAQ
Propescco  GDKYGRQK.. ILAITIVIMS ISTFCIGLIP SYDTIGIWAP ILLLICKMAQ
Kgtpescco  ADKHGRKK.. SMLLSVCMMC FGSLVIACLP GYETIGTWAP ALLLLARLFQ
Pcatpsepu  ADRHGRKN.. SLMISVLMMC FGSLMIACLP TYGSIGTWAP ALLLLARLIQ
Bap3strhy  ADRYGRRS.. ALIVTILLMG LGSLMIGLTP SYATAGPVAP WLIAARLVQ
Nantescco  GDRYGRRL.. AMVTSIVLFS AGTLACGFAP GYIT...... ..MFIARLVI
Mopaburce  EKSEAGQSDV DLRTGVNVTR IDRPSCTVHV DDGSEICFDR LVIATGGRAR
Consensus  .D..GR.... .L......M. .....I...P .Y...G..AP .........Q
151 Citlescco GFSAGVELGG VSVYL..SEI ATPGNKGFYT Citklepn GFSAGAELGG VSVYL..AEI ATPGRKGFYT Ousaerwch GFSVGGEYTG ASIFV..AEY SPDRKRGFMG Propescco GFSVGGEYTG ASIFV..AEY SPDRKRGFMG Kgtpescco GLSVGGEYGT SATYM..SEV AVEGRKGFYA Pcatpsepu GLSVGGEYGT TATYM..SEV ALRGQRGFFA Bap3strhy GFSLGGEYGA ATTFL..VESAAPGRRALFS Nantescco GMGMAGEYGS SATYV..IES WPKHLRNKAS Mopaburce RLAVPGDASD QIAYLRTIDD ALHIRRGLKA C o n s e n s u s G . S . G G E . . . . . . . . . . . E . . . . . . . G...
SWQSASQQVA SWQSGSQQVA SWLDFGSIAG SWLDFGSIAG SFQYVTLIGG SFQYVTLIGG SFQYVASSVG GFLISGFSVG GKRLLLIGGG S ........ G
201 Citlescco LNVTLGHDEI SEWGWRIPFF IGCMIIPLIF Citklepn LNAVLEPSAI SDWGWRIPFL FGVLIVPFIF Ousaerwch ISTLIGEQAF LAWGWRLPFF LALPLGLIGL Propescco ISTIVGEANF LDWGWRIPFF IALPLGIIGL Kgtpescco LQHTMEDAAL REWGWRIPFA LGAVLAVVAL Pcatpsepu LQQLLTEDEL RAWGWRIPFV VGAIAALISL Bap3strhy ASQ.ISGDGM DRWGWRLPFI WGAVICLAGL NantesccoVVPV ........ WGWRALFF IGILPIIFAL Mopaburce RKLGVDVVLV EAAQRLCERT VPAIVGERLL Consensus ............ WGWR.PF .......... L
250 V L R R S L Q E T E A F L Q ...... I L R R K L E E T Q E F T A ...... Y L . A T L E E T P A F R Q ...... Y L R H A L E E T P A F Q Q ...... WLRRQLDET ........... MLRRSLHET ........... ALRSTAEET ........... WLRKNIPEAE DWKEKHAGKA G I Q R S L G V D V R L G A ...... .LR..L.ET ...........
Citlescco Citklepn Ousaerwch Propescco Kgtpescco Pcatpsepu Bap3strhy Nantescco Mopaburce Consensus
251 300 .......... R K H R P D . . . . . . . . . . . TRE I F T T I A K N W R I I T A G T L L V A .......... R R H H L A . . . . . . . . . . . MRQ V F A T L L A N W Q V V I A G M M M V A .......... H V E K L E Q N D R D G L K A G P G V S F R E I A T H H W K S L L V C I G L V I .......... H V D K L E Q G D R E G L Q D G P K V S F K E I A T K Y W R S L L T C I G L V I .............. SQQETR ALKEAGSLKG L..WRNR..R AFIMVLGFTA .............. SSAETR NDKDAGTIKG L..FRNHA.A AFITVLGYTA .............. LPTGTE GGRKKTRTGA FAALRSHP.R QTLLWGLTI PVRTMVDILY RGEHRIANIV MTLAAATALW FCFAGNLQNA AIVAVLGLLC ............... GIEQL SKLPGGRYAASLSGVREEFD LVVAGVGMVA .............................................. G...
301 Citlescco MTTTTFYF ........................... ITVYT Citklepn MTTTAFYL ........................... ITVYA Ousaerwch ATNVTYYM ........................... LLTYM Propescco ATNVTYYM ........................... LLTYM Kgtpescco AGSLCFYT ........................... FTTYM Pcatpsepu GGSLIFYT ........................... FTTYM Bap3strhy GGNVAFYT ........................... WTTYL NantesccoAAIFISFMVQ SAGKRWPTGV MLMVVVLFAF LYSWPIQALL Mopaburce NDELA .............................. AEAGL Consensus ...................................... Y.
Citlescco Citklepn Ousaerwch Propescco Kgtpescco
200 IVVAALIGYG IMVAAAMGFA FVLGAGVVVL FVLGAGVVVL QLLALLVVVV QLLAVLVVVI HILAGLSTLA AWAAQVYSL WIGLETACSA ..........
351 SARDSLVVTM LVGISNFIWL SASDSLLVTL LVAISNFFWL SENHGVLIII AIMIGMLFVQ SEDHGVLIII AIMIGMLFVQ HANVASGIMTAALFVFMLIQ
PIGGAISDRI PVGGALSDRF PVMGLLSDRF PVMGLLSDRF PLIGALSDKI
GRRPVLMGIT GRRSVLIAMT GRKPFVVIGS GRRPFVLLGS GRRTSMLCFG
350 PTYGRTVLNL PTFGKKVLML PSYLSHSLHY PSYLSHNLHY QKYLVNTAGM QKYLVNTAGM PTYATVSTGA PTYLKTDLAY PCAGGVLCDV P . Y ....... 400 LLALVTTLPV LLALATAWPA VAMFFLAVPS VALFVLAIPA SLAAIFTVPI
Pcatpsepu Bap3strhy Nantescco Mopaburce Consensus
TAKNASYVMT G A L F L F M V V Q DKDSAVLAGT V S L I F F G L I Q NPHTVANVLF FSGFGAAVGC EGRTVDPHVF A C G D V A S F E H ....................
PFFGMLSDRI PLGGLLCERI CVGGFLGDWL PSGPIGMRRL P..G.L.DR.
GRRNSMLLFG A L G T L C T V P L GGRAMMIGFG V A A A V L T V P L GTRKAYVCSL L A S Q L L I I P V ESWDNAQQQG A A C A R A I L G K G.R . . . . . . . . . . . . . . . P.
Citlescco Citklepn Ousaerwch Propescco Kgtpescco Pcatpsepu Bap3strhy Nantescco Mopaburce Consensus
401 MNWLTAAPDF TRMTLVLLWF SFFFGMYNGA MVAALTEVMP LTMLANAPSF L M M L S V L L W L SFIYGMYNGA MIPALTEIMP FMLINSDIIG L I F L G L ~ M L A V . I L N A F T G V MastLPALFP FILINSNVIG L I F A G L L M L A V . I L N C F T G V MastLPAMFP LSALQNVSSP Y A A F G L V M C A LLIVSFYTSI SGILKAEMFP LMALKTVTSP FMAFVLISLA LCIVSFYTSI SGLVKAEMFP LTAM..TGWF W S V L A V Q C A G M L V L T A Y T S V SGAINAELFP F A I . . G G A N V W V L G L L L F F Q QMLGQGIAGI LPKLIGGYFD RAAAHPLPWF W S D Q G D V N I Q ILGFPNATAT P V V R E G D G K A ....................................... P
Citlescco Citklepn Ousaerwch Propescco Kgtpescco Pcatpsepu Bap3strhy Nantescco Mopaburce Consensus
451 AFSLATAIFG AYSLATAVFG AFNISVLIAG AFNISVLVAG SYAVANAIFG AYAVANAAFG PYAASVALFG TYNVG.ALGG DEPAQIVAAV ......... G
Citlescco Citklepn Ousaerwch Propescco Kgtpescco Pcatpsepu Bap3strhy Nantescco Mopabur ce Consensus
501 550 A R L S S G Y Q T V ENKL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RRSAVALQTA R ....................................... E T A N K P L K G A T P A A S D L S E A KEILQEHHDN IEHKIEDITQ Q I A E L E A K R Q E T A N R P L K G A T P A A S D I Q E A KEILVEHYDN IEQKIDDIDH EIADLQAKRT RKGKGMRL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . KQAAYLHHDD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RPVPDTCDGP A S P A Q H L P P D FERHA . . . . . . . . . . . . . . . . . . . . . . . . . SRVQRWLRPE A L R T H D A I D G KPFSGAVPFG SAKNDLVKTK S ......... R S A G A A R N G G ANT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..................................................
450 VYVRTVGFSL AEVRVAGFSL THIRYSALAS THIRYSALAA AQVRALGVGL PQVRALGVGL QELRGRGIGL TDQRAAGLGF TLVWLEEHAA ...R ......
500 GLTPAISTAL VQLTGDKSSP GWW.LMCAAL CGLAATTMLF GFTPVISTAL IEYTGDKASP GYW MSFAAI CGLLATCYLY L.TPTVAAWL VESSQNLYMP A Y Y . L M V I A V IGLLTGLFMK L.TPTLAAWL VESSQNLMMP A Y Y . L M V V A V V G L I T G V T M K G SAEYVALS LKSIGMETAF FWY VTLMAV V A F L V S L M L H G . S A E Y V A L G LKTLGMENTF Y W Y . V T A M M A IAFLFSLRLP G . T A P Y V G T W LKSMGLNDFF PWY.VAVLCL LTALTAIGLP ALAPIIGALI A Q R L D L G T A L A S L S F S L T F V VILLIGLDMP CINAPADMPI L R R M W Q K G V . . R V D R K T L A V Q E V S L K S L L Q ........................................
Citlescco Citklepn Ousaerwch Propescco Kgtpescco Pcatpsepu Bap3strhy Nantescco Mopaburce Consensus
551 561 ........... ........... LLVAQHPRIN D RLVQQHPRID E ........... ........... ........... ........... ...........
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parenthesis and therefore are not included in the alignments: Cit2escco, Citasalty (Citlescco). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in at least one other family of the USA/MFS superfamily.
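The 75% consensus rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to prepare the alignments: the function name `consensus`, the use of `.` for gaps and non-conserved columns, and the choice to compute the fraction over all rows (gapped rows included) are all hypothetical. The four input rows are the WGWR fragments of Cit1escco, Citklepn, Ousaerwch and Nantescco from block 201-250 of the alignment, used here as a small subset.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Return a consensus string for equally long aligned rows.

    A residue is listed where it occurs in at least `threshold` of the
    rows (assumption: the denominator is the total number of rows);
    every other column is written as `gap`.
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        # Count residues only; gap characters never enter the consensus.
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            out.append(residue if n / len(rows) >= threshold else gap)
        else:
            out.append(gap)
    return "".join(out)


# Fragments taken from four rows of the alignment (illustrative subset).
rows = ["WGWRIPF", "WGWRIPF", "WGWRLPF", "WGWRALF"]
print(consensus(rows))  # prints WGWR.PF
```

With a threshold of 0.75, a residue shared by three of four rows is kept, while a column split 2:1:1 is written as a dot.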
Database accession numbers
           EMBL/GENBANK
Bap3strhy  M64783
Citklepn   X51479
Cit1escco  M11559
Cit2escco  M22041
Citasalty  S62772; D90203
Kgtpescco  X53027; X52363
Mopaburce  U29532
Nantescco  U19539; U18997
Ousaerwch  X82267
Pcatpsepu  U48776
Propescco  M83089; U14003
SWISSPROT: P16482, P07661, P07680, P24115, P17448, P41036, P30848
PIR: S09681, B23104, A23103, JQ0576, S10178; S12094, S32331
References
1 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
2 Griffith, J. et al. (1992) Curr. Opin. Cell Biol. 4, 684-695.
3 Culham, D.E. et al. (1992) J. Mol. Biol. 229, 268-276.
4 Henderson, P.J.F. (1993) Curr. Opin. Cell Biol. 5, 708-721.
5 Seol, W. and Shatkin, A.J. (1993) J. Bacteriol. 175, 565-567.
H+/nucleotide symporter family
Summary
Transporters of the H+/nucleotide symporter family, the example of which is the NUPC pyrimidine nucleoside-H+ symporter of Escherichia coli (Nupcescco), mediate symport (i.e. H+-coupled substrate uptake) of nucleosides, particularly pyrimidines. The two known members of the family are found in gram-positive and gram-negative bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the H+/nucleotide symporter family and any other family of transporters. Both transporters are predicted to form nine membrane-spanning helices by the hydropathy of their amino acid sequences.

Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                          ORGANISM [COMMON NAMES]                      SUBSTRATE(S)
Nupcbacsu  Nucleoside-H+ symporter [NUPC]                  Bacillus subtilis [gram-positive bacterium]  H+/pyrimidines
Nupcescco  Pyrimidine nucleoside-H+ symporter [NUPC, CRU]  Escherichia coli [gram-negative bacterium]   H+/pyrimidines
Cotransported ions are listed.
Proposed orientation of NUPC in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded nine times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed.
[Figure: predicted membrane topology of NUPC, with labels OUTSIDE, INSIDE, NH2 and COOH]
Physical and genetic characteristics
           AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Nupcbacsu  393          42 611   340°
Nupcescco  400          43 473   54.11 minutes

Multiple amino acid sequence alignments
           1                                                   50
Nupcbacsu  .MKYLIGIIG LIVFLGFAWI ASSGKKRIKI RPIVVMLILQ FILGYILLNT
Nupcescco  MDRVLHFVLA LAVVAILALL VSSDRKKIRI RYVIQLLVIE VLLAWFFLNS
Consensus  ....L..... L.V....A.. .SS..K.I.I R.....L... ..L.....N.

           51                                                 100
Nupcbacsu  GIGNFLVGGF AKGFGYLLEY AAEGINFVFG GLVNADQTTF FMNVLLPIVF
Nupcescco  DVGLGFVKGF SEMFEKLLGF ANEGTNFVFG SNDQGLAEFF FLKVLCPIVF
Consensus  ..G...V.GF ...F..LL.. A.EGINFVFG .........F F..VL.PIVF

           101                                                150
Nupcbacsu  ISALIGILQK WKVLPFIIRY IGLALSKVNG MGRLESYNAV ASAILGQSEV
Nupcescco  ISALIGILQH IRVLPVIIRA IGFLLSKVNG MGKLESFNAV SSLILGQSEN
Consensus  ISALIGILQ. ..VLP.IIR. IG..LSKVNG MG.LES.NAV .S.ILGQSE.

           151                                                200
Nupcbacsu  FISLKKELGL LNQQRLYTLC ASAMSTVSMS IVGAYMTMLK PEYVVTALVL
Nupcescco  FIAYKDILGK ISRNRMYTMA ATAMSTVSMS IVGAYMTMLE PKYVVAALVL
Consensus  FI..K..LG. ....R.YT.. A..MSTVSMS IVGAYMTML. P.YVV.ALVL

           201                                                250
Nupcbacsu  NLFGGFIIAS IINPYEV.AK EEDMLRVEEE EKQSFFEVLG EYILDGFKVA
Nupcescco  NMFSTFIVLS LINPYRVDAS EENIQMSNLH EGQSFFEMLG EYILAGFKVA
Consensus  N.F..FI..S .INPY...A. EE........ E.QSFFE.LG EYIL.GFKVA

           251                                                300
Nupcbacsu  VVVAAMLIGF VAIIALINGI FNAV.....F GISFQGILGY VFAPFAFLVG
Nupcescco  IIVAAMLIGF IALIAALNAL FATVTGWFGY SISFQGILGY IFYPIAWVMG
Consensus  ..VAAMLIGF .A.IA..N.. F..V...... .ISFQGILGY .F.P.A...G

           301                                                350
Nupcbacsu  IPWNEAVNAG RLMATKMVSN EFVAMTVLTQ NGFHFSGRTT AIVSVFLVSF
Nupcescco  VPSSEALQVG SIMATKLVSN EFVAMMDLQK IASTLSPRAE GIISVFLVSF
Consensus  .P..EA...G ..MATK.VSN EFVAM..L.. .....S.R.. .I.SVFLVSF

           351                                                400
Nupcbacsu  ANFSSIGIIA GAVKGLNEKQ GNVVARFGLK LLYGATLVSF LSAAIVGLIY
Nupcescco  ANFSSIGIIA GAVKGLNEEQ GNVVSRFGLK LVYGSTLVSV LSASIAALVL
Consensus  ANFSSIGIIA GAVKGLNE.Q GNVV.RFGLK L.YG.TLVS. LSA.I.....
Database accession numbers
SWISSPROT: Nupcbacsu P39141; Nupcescco P33031
PIR: S37076
EMBL/GENBANK: Nupcbacsu X82174; Nupcescco X74825
Sugar phosphate transporter family
Summary
Transporters of the sugar phosphate transporter family, the example of which is the UHPT hexose phosphate transporter of Escherichia coli (Uhptescco), mediate uptake of structurally dissimilar phosphoesters and dicarboxylates, including hexose phosphates, glycerol-3-phosphate, hexuronates and phthalates, by an exchange in which the accumulation of a substrate phosphate is coupled to the electroneutral release of inorganic phosphate or organophosphate 1-3. Members of the family are found in both gram-negative and gram-positive bacteria. Statistical analysis of multiple amino acid sequence comparisons suggests that the sugar phosphate transporter family may be distantly related to the uniporter-symporter-antiporter (USA) transporter superfamily, also known as the major facilitator superfamily (MFS) 4-6. They are predicted to contain 12 membrane-spanning helices by the hydropathy of their amino acid sequences and the activities of reporter gene fusions 7. Several amino acid sequence motifs are highly conserved in the sugar phosphate transporter family, including motifs unique to the family, signature motifs of the USA/MFS superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis 1,8.
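The hydropathy-based prediction of membrane-spanning helices mentioned in the summary can be illustrated with a sliding-window sketch. The text does not say which hydropathy scale or parameters were used, so everything specific here is an assumption: the Kyte-Doolittle scale, a 19-residue window, a cutoff of 1.6, and the hypothetical helper names `hydropathy_profile` and `count_candidate_helices`. The input is a synthetic toy sequence, not one of the transporters.

```python
# Kyte-Doolittle hydropathy index per residue (assumed scale; the source
# does not name the method used for its predictions).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window of `window` residues."""
    values = [KD[res] for res in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def count_candidate_helices(profile, threshold=1.6):
    """Count runs of consecutive windows with mean >= threshold."""
    runs, inside = 0, False
    for mean in profile:
        if mean >= threshold and not inside:
            runs += 1  # a new hydrophobic stretch begins here
        inside = mean >= threshold
    return runs


# Synthetic toy sequence: two hydrophobic stretches in a charged background.
toy = "K" * 20 + "L" * 25 + "D" * 20 + "I" * 25 + "K" * 20
print(count_candidate_helices(hydropathy_profile(toy)))
```

Each run of high-scoring windows marks one candidate membrane-spanning segment; real predictions usually combine such a profile with further evidence, such as the reporter gene fusions cited in the summary.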
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                               ORGANISM [COMMON NAMES]                            SUBSTRATE(S)
Exutescco  Hexuronate transporter [EXUT]                        Escherichia coli [gram-negative bacterium]         Hexuronate
Glptescco  Glycerol-3-phosphate transporter [GLPT]              Escherichia coli [gram-negative bacterium]         Glycerol-3-phosphate
Glptbacsu  Glycerol-3-phosphate transporter [GLPT]              Bacillus subtilis [gram-positive bacterium]        Glycerol-3-phosphate
Gudtpsepu  Glucarate transporter [GUDT]                         Pseudomonas putida [gram-negative bacterium]       Glucarate
Gudtbacsu  Glucarate transporter [GUDT, YCBE]                   Bacillus subtilis [gram-positive bacterium]        Glucarate
Pgtpsalty  Phosphoglycerate transporter [PGTP]                  Salmonella typhimurium [gram-negative bacterium]   Phosphoglycerate
Phlepsefl  2,4-Diacetylphloroglucinol synthesis protein [PHLE]  Pseudomonas fluorescens [gram-negative bacterium]  2,4-Diacetylphloroglucinol synthesis
Pht1psepu  Phthalate transporter [PHT1]                         Pseudomonas putida [gram-negative bacterium]       Phthalate
Uhpcsalty  Regulatory protein [UHPC]                            Salmonella typhimurium [gram-negative bacterium]   Hexose phosphates
Uhpcescco  Regulatory protein [UHPC]                            Escherichia coli [gram-negative bacterium]         Hexose phosphates
Uhptsalty  Hexose phosphate transporter [UHPT]                  Salmonella typhimurium [gram-negative bacterium]   Hexose phosphates
Uhptescco  Hexose phosphate transporter [UHPT]                  Escherichia coli [gram-negative bacterium]         Hexose phosphates
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Uhptsalty (Uhptescco); Uhpcsalty (Uhpcescco).
[Figure: phylogenetic tree; leaves in printed order: Glptbacsu, Glptescco, Uhpcescco, Pgtpsalty, Uhptescco, Gudtbacsu, Gudtpsepu, Exutescco, Phlepsefl, Pht1psepu]
Proposed orientation of UHPT in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane 1. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in UHPT.
[Figure: predicted membrane topology of UHPT, with labels OUTSIDE, INSIDE, NH2 and COOH]
Physical and genetic characteristics
           AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Exutescco  472          51 656   69.6-69.62 minutes
Glptescco  452          50 310   50.59 minutes
Glptbacsu  444          49 801   18°
Gudtpsepu  456          49 779
Gudtbacsu  455          49 254   999 ~
Pgtpsalty  406          44 853
Pht1psepu  451          49 306
Phlepsefl  423          45 235
Uhpcsalty  442          48 254
Uhpcescco  439          48 256   82.84 minutes
Uhptescco  463          50 606   82.8 minutes
Uhptsalty  463          50 752
Multiple amino acid sequence alignments
           1                                                   50
Glptbacsu  .......... ........ML NIFKPAPHIE R.LDDS.KM. DAAYKRLRLQ
Glptescco  .......... ........ML SIFKPAPHKA R.LPAA.EI. DPTYRRLRWQ
Uhpcescco  .......... ........ML PFLKAPADAP L.MTDKYEI. DARYRYWRRH
Pgtpsalty  .......... ........ML TILKTGQSAH K.VPPE.KV. QATYGRYRIQ
Uhptescco  .......... ........ML AFLNQVRKPT LDLPLEVRR. KMWFKPF.MQ
Gudtbacsu  .......... .......... ........MK KDFASVTP.. AGKKTSVRWF
Gudtpsepu  .......... .......... .......... ......MQ.. EPKQTRVRYL
Exutescco  MATFGACRFF FGYPVVTNIF SLWRDDGRAS CGYNKTMRFY MRKIKGLRWY
Phlepsefl  .......... .......... .......... ...MESTYLA TRPWGGYERR
Pht1psepu  .......... .........M TTTAATDAVP HLLQRSHERI EKVYRKVTLR
Consensus  .......... .......... .......... .......... ..........

           51                                                 100
Glptbacsu  VFIGIFIGYA GYYLLRKNFA FAIPYLQEQG .FSKTELGLV LAAVSIAYGF
Glptescco  IFLGIFFGYA AYYLVRKNFA LAMPYLVEQG .FSRGDLGFA LSGISIAYGF
Uhpcescco  ILLTIWLGYA LFYFTRKSFN AAVPEILANG VLSRSDIGLL ATLFYITYGV
Pgtpsalty  ALLSVFLGYL AYYIVRNNFT LSTPYLKEQL DLSATQIGLL SSCMLIAYGI
Uhptescco  SYLVVFIGYL TMYLIRKNFN IAQNDMISTY GLSMTQLGMI GLGFSITYGV
Gudtbacsu  IVFMLFLVTS INYADRATLS ITGDSVQHDL GLDSVAMGYV FSAFGWAYVI
Gudtpsepu  ILLMLFLVTT INYADRATIS IAGSSIQKDF GLDAVTLGYI FSAFGWAYVL
Exutescco  MIALVTLGTV LGYLTRNTVA ARAPTLMEEL NISTQQYSYI IAAYSAAYTV
Phlepsefl  MVVLLSLSFG LVGLDRFIIM PLFPVIMHDL ALDYQDLGLL SAILAFAWGG
Pht1psepu  LMTFIFVAWV LNYLDRVNIS FAQVYLKHDL GMSDADLRTR RKLVFHRLHR
Consensus  .......... ..Y..R.... .......... .......G.. .......Y..

           101                                                150
Glptbacsu  SKFIMGMVSD RCNPRYFLAT GLFLSAIVNI LF.VSMPWV. TSSVTIMFIF
Glptescco  SKFIMGSVSD RSNPRVFLPA GLILAAAVML FM.GFVPWA. TSSIAVMFVL
Uhpcescco  SKFVSGIVSD RSNARYFMGI GLIATGIINI LF.GF..... STSLWAFAVL
Pgtpsalty  SKGVMSSLAD KASPKVFMAC GLVLCAIVNV GL.GF..... SSAFWIFAAL
Uhptescco  GKTLVSYYAD GKNTKQFLPF MLILSAICML GFSASMGSG. SVSLFLMIAF
Gudtbacsu  GQLPGGWLLD RFGSKTIIAL SIFFWSFFTL LQGAIGFFSA GTAIILLFAL
Gudtpsepu  GQIPGGWLLD RFGSKKVYAG SIFTWSLFTL LQGYIGEFGI STAVVLLFLL
Exutescco  MQPVAGYVLD VLGTKIGYAM FAVLWAVFCG ATALAGSWGG ......LAVA
Phlepsefl  SALFMGVAIR RLGTKQLLVL SITLVSLLA. .....GASAL ISSLMGLVLL
Pht1psepu  IGNTQYAYLQ KIGARLTITR IMVLWGLI.. ....SASMAF MTTPTEFYIA
Consensus  .........D .......... .......... .......... ..........

           151                                                200
Glptbacsu  MFINGWFQGM GWPPCGRTMA HWFS....IS ERGTKMSIWN VAHNIGGGIL
Glptescco  LFLCGWFQGM GWPPCGRTMV HWWS....QK ERGGIVSVWN CAHNVGGGIP
Uhpcescco  WVLNAFFQGW GSPVCARLLT AWYS....RT ERGGWWALWN TAHNVGGALI
Pgtpsalty  VVFNGLFQGM RRP.....LV YYYCKLVPRR ERGRVGAFWN ISHNVGGGIV
Uhptescco  YALSGFFQST GGSCSYSTIT KW....TPRR KRGTFLGFWN ISHNLGGAGA
Gudtbacsu  RFLVGLSEAP SFPGNGRVVA SWF....PSS ERGTASAFFN SAQYFAIVIF
Gudtpsepu  RFMVGLAEAP SFPGNARIVA SWF....PTK ERGTASAIFN SAQYFATRAV
Exutescco  RGAVGAAEAA MIPAGLKASS EWF....PAK ERSIAVGYFN VGSSIGAMIA
Phlepsefl  RALMGICEGA FTPVSIIVTD E....VSQPC RRGLNLGIQQ ALFPIIGLCL
Pht1psepu  RALLGAAEAG FWPGIILYLT YWY....PGA RRARITSRFL LAIAAAGIIG
Consensus  ....G..... ..P....... .W........ .RG......N ..........

           201                                                250
Glptbacsu  APLVTLGIAM FVT...WK.. .SVFFFPAII AIIISFLIVL LVRDTPQSCG
Glptescco  PLLFLLGMAW FND...WH.. .AALYMPAFC AILVALFAFA MMRDTPQSCG
Uhpcescco  P.IVMAAAAL HYG...WR.. .AGMMIAGCM AIVVGIFLCW RLRDRPQALG
Pgtpsalty  APIVGAAFAI LGS...EHWQ SASYIVPACV AVIFALIVLV LGKGSPRKEG
Uhptescco  AGVALFGANY LFD...GH.V IGMFIFPSII ALIVGFIGLR YGSDSPESYG
Gudtbacsu  PPLMGWLTHS F....GW... HSVFVVMGIA GILLAVIWLK TVYEPKKHPK
Gudtpsepu  RALDGLDRLH L.......RL AARVHRHGRP GHCVLAHLVD GDLRAERSPA
Exutescco  PPLVVWAIVM H.......SW QMAFIISGAL SFIWAMAWLI FYKHPRDQKH
Phlepsefl  GPL..LAGVL FEM...FGSW RAVFAIISLP GLLVAWYLYR TYQPSQAPHP
Pht1psepu  GPLSGWILTH FVDVMGMKNW QWMFILEGLP AAVMGVMAYF YLVDKPEQAK
Consensus  .......... .......... .......... .......... ..........

           251                                                300
Glptbacsu  LPPIEEYRND ....YPKHAF KNQEKELTTK EILFQYVLNN KFLWYIAFAN
Glptescco  LPPIEEYKND ....YPDDYN EKAEQELTAK QIFMQYVLPN KLLWYIAIAN
Uhpcescco  LPAVGEWRHD ....ALEIAQ QQEGAGLTRK EILTKYVLLN PYIWLLSFCY
Pgtpsalty  LPSLEQMMPE EKVVLKTKNT AKAPENMSAW QIFCTYVVRN KNAWYISLVD
Uhptescco  LGKAEELFGE E...ISEEDK ETESTDMTKW QIFVEYVLKN KVIWLLCFAN
Gudtbacsu  VNEAELAYIE QGGGLISMDD SKSKQETESK WPYIKQLLTN RMLIGVYIAQ
Gudtpsepu  GYAAEVRSSP H.GGLVDLED SKDKKDGGPK WDYIRQLLTN RMMMGIYLGQ
Exutescco  LTDEE..... ..RDYIINGQ EAQHQVSTAK KMSVGQILRN RQFWGIALPR
Phlepsefl  RPLVEPSGSQ WRTALSSGNV R......... .......LNI ALMLCILTCQ
Pht1psepu  WLDDEEKSII LDAL...AAD RAGKKPVTDK RHAVLAALKD PRVYVLAAGW
Consensus  ....E..... .......... .......... .......L.N ..........

           301                                                350
Glptbacsu  VFVYFVRYGV VDWAPTYLTE AKGFSPEDSR WSYFLYEYAG IPGTILCGWI
Glptescco  VFVYLLRYGI LDWSPTYLKE VKHFALDKSS WAYFLYEYAG IPGTLLCGWM
Uhpcescco  VLVYVVRAAI NDWGNLYMSE TLGVDLVTAN TAVTMFELGG FIGALVAGWG
Pgtpsalty  VFVYMVRFGM ISWLPIYLLT VKHFSKEQMS VAFLFFEWAA IPSTLLAGWL
Uhptescco  IFLYVVRIGI DQWSTVYAFQ ELKLSKAVAI QGFTLFEAGA LVGTLLWGWL
Gudtbacsu  YCITTLTYFF LTWFPVYLVQ ARGMSILEAG FVASLPALCG FAGGVLGGIV
Gudtpsepu  FCINALTYFF LTWFPVYLVQ ERGMTILKAG IIASLPAICG FLGGVLGGVI
Exutescco  FLAEPAWGTF NAWIPLFMFK VYGFNLKEIA MFAWMPMLFA DLGCILGGYL
Phlepsefl  FVLCAL.... ...LPSYLTD VLHLSNFSMA MIISAIGLGG FFGQLVIPGL
Pht1psepu  ATVPLCGTIL NYWTPTIIRN TGIQDVLHVG LLSTVPYIVG AIAMILIARS
Consensus  .......... ..W.P.Y... .......... .......... ..G..L.G..

           351                                                400
Glptbacsu  SDRFFK.... ..SRRAPAGV LFMAGVFIAV LVYWLNP.AG NPLVDNIALI
Glptescco  SDKVFR.... ..GNRGATGV FFMTLVTIAT IVYWMNP.AG NPTVDMICMI
Uhpcescco  SDKLFN.... ..GNRGPMNL IFAAGILLSV GSLWLMP.FA SYVMQATCFF
Pgtpsalty  SDKLFK.... ..GRRMPLAM ICMALIFVCL IGYW..K.SE SLLMVTIFAA
Uhptescco  SD.LAN.... ..GRRGLVAC IALALIIATL GVY..QH.AS NEYIYLASLF
Gudtbacsu  SDILLKK.GR SLTFARKVPI IAGMLLSCSM IVCNYTD.SA WLVVVIMSLA
Gudtpsepu  SDTLLRR.GN SLSVARKTPI VCGMVLSMSM IICNYVD.AD WMVVCFMALA
Exutescco  PPLFQRWFGV NLIVSRKMVV TLGAVLMIGP GMIGLFT.NP YVAIMLLCIG
Phlepsefl  SDQLGRKPVV SICF..LIST LLVGLLIISP PLPWLL.... FLQLFFLSFI
Pht1psepu  SDIRLERRKH .......... FFFSIAFGAL GACLLPHVVD SAIISITCLA
Consensus  SD........ .......... .......... .......... ..........

           401                                                450
Glptbacsu  SIGFLIYGPV MLIGLQAIDL APKKAAGTAA GLTGFFGYIG GSAFANAIMG
Glptescco  VIGFLIYGPV MLIGLHALEL APKKAAGTAA GFTGLFGYLG GSVAASAIVG
Uhpcescco  TIGFFVFGPQ MLIGMAAAEC SHKEAAGAAT GFVGLFAYLG ASLAGWP.LA
Pgtpsalty  IVGCLIYVPQ FLASVQTMEI VPSFAVGSAV GLRGFMSYIF GASLGTSLFC
Uhptescco  ALGFLVFGPQ LLIGVAAVGF VPKKAIGAAD GIKGTFAYLI GDSFAKLGLG
Gudtbacsu  FFG.KGFGAL GWAVVS..DT SPKECAGLSG GLFNTFGNI. ASITTPIIIG
Gudtpsepu  FFG.KAIGAL GWAVVS..DT SPKQIAGLSG GLFNTFGNL. SSISTPIIIG
Exutescco  GFAHQALSGA LITLSS..DV FGRNEVATAN GLTGMSAWL. ASTLFALVVG
Phlepsefl  NFSLICITVG PLTS....ES VPPSLLATAT GLVVGCGEIL GGGVAPVVAG
Pht1psepu  MIAVSYFGAA AIIWSIPPAY LNDESAAGGI SAISSLGQI. GAFCAPIGLG
Consensus  .......... .......... .......... G......... .........G

           451                                                500
Glptbacsu  FVVD...... RFNWNGGFIM LISSCILAIV FLALTWNTGK RAEHV.....
Glptescco  YTVD...... FFGWDGGFMV MIGGSILAVI LLIVVMIGEK RRHEQLLQER
Uhpcescco  KVLD...... TWHWSGFFVV ISIAAGISAL LLLPFLNAQT PREA......
Pgtpsalty  .......... .......... .......... .......... ..........
Uhptescco  MIADGTPVFG LTGWAGTFAA LDIAAIGCIC LMAIVAVMEE RKIRREKKIQ
Gudtbacsu  YIVNATG... ..SFNGAL.V FVGANAIAAI LSYLLLVGPI KRVVLKKQEQ
Gudtpsepu  YIIAATG... ..VSKWRWSS WVPTHSFAAI .SYLFIVGEI NRIELKGVTD
Exutescco  .ALADTI... ..GFSPLFAV LAVFDLLGAL VIWTVLQNKP AIEVAQETHN
Phlepsefl  YIAVN..... .WGLTAILFL ALAGSLMGGL LSLRLKEASP VFNDRADYGP
Pht1psepu  WINTVTG... ..SLAIGLTI IGALVLAGGM AVLIAVPANA LSEKPLTDE.
Consensus  .......... .......... .......... .......... ..........

           501             518
Glptbacsu  .......... ........
Glptescco  NGG....... ........
Uhpcescco  .......... ........
Pgtpsalty  .......... ........
Uhptescco  QLTVA..... ........
Gudtbacsu  DPDQSLPV.. ........
Gudtpsepu  EPATTAHPGE LLPTTRKV
Exutescco  DPAPQH.... ........
Phlepsefl  LPARLTLEDK ........
Pht1psepu  .......... ........
Consensus  .......... ........

Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Uhptsalty (Uhptescco); Uhpcsalty (Uhpcescco). Residues listed in the consensus sequence are present in at least 75% of the transporter sequences shown. Residues indicated by boldface type are also conserved in at least one other family of the USA/MFS superfamily.
Database accession numbers
           SWISSPROT  EMBL/GENBANK
Exutescco  P42609     U18997
Glptescco  P08194     Y00536
Glptbacsu  P37948     Z26522
Gudtpsepu  P42205     M69160
Gudtbacsu  P42237     D30808
Pgtpsalty  P12681     M21278
Phlepsefl             U41818
Pht1psepu  Q05181     D13229
Uhptsalty  P27670     M89480
Uhpcescco  P09836     M17102; M89479
Uhpcsalty  P27669     M89480
Uhptescco  P13408     M17102; M89479
PIR: A31089, D41853, G41853, C41853, Q00500, S00868, S37250
References
1 Maloney, P. (1994) Curr. Opin. Cell Biol. 6, 571-582.
2 Kadner, R.J. et al. (1994) Res. Microbiol. 145, 381-387.
3 Kadner, R.J. et al. (1993) J. Bioenerg. Biomembr. 25, 637-645.
4 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
5 Griffith, J. et al. (1992) Curr. Opin. Cell Biol. 4, 684-695.
6 Henderson, P.J.F. (1993) Curr. Opin. Cell Biol. 5, 708-721.
7 Lloyd, A. and Kadner, R. (1990) J. Bacteriol. 172, 1688-1693.
8 Yan, R.T. and Maloney, P.C. (1995) Proc. Natl Acad. Sci. USA 92, 5973-5976.
H+-Dependent Antiporters
H+/vesicular amine antiporter family
Summary
Transporters of the H+/vesicular amine antiporter family, the example of which is the vesicular amine transporter 2 (VAT2) of humans (Vmt2homsa), mediate accumulation of biogenic monoamines, including catecholamines and indoleamines, within intracellular vesicles. The mechanism of transport is antiport (i.e. H+-coupled substrate efflux) 1-4. In addition, members of the H+/vesicular amine antiporter family mediate the accumulation of several toxic compounds into intracellular vesicles, mimicking multidrug resistance proteins 5. Members of the H+/vesicular amine antiporter family may be inhibited by vesamicol or reserpine 5. These transporters are widely distributed in nature, occurring in both invertebrates and vertebrates. Statistical analysis of multiple amino acid sequence comparisons places the H+/vesicular amine antiporter family in the uniporter-symporter-antiporter (USA) superfamily, also known as the major facilitator superfamily (MFS) 5-7. They are predicted to form 12 membrane-spanning helices by the hydropathy of their amino acid sequences 7,8. Members of the H+/vesicular amine antiporter family are glycosylated. Several amino acid sequence motifs are highly conserved in the H+/vesicular amine antiporter family, including motifs unique to the family and signature motifs of the USA/MFS superfamily 5,7,9. In C. elegans, defects in UN17 cause deficits in neuromuscular function and deletion of UN17 is lethal 1. The possibility that defective neurotransmitter transport is similarly a source of human neurodegenerative disease has been proposed 2.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Un17caeel | Vesicular acetylcholine transporter [UN17, UNC17] | Caenorhabditis elegans [nematode] | H+/acetylcholine
Vacthomsa | Vesicular acetylcholine transporter [VACT] | Homo sapiens [human] | H+/acetylcholine
Vacttorma | Vesicular acetylcholine transporter [VACT] | Torpedo marmorata [ray] | H+/acetylcholine
Vacttoroc | Vesicular acetylcholine transporter [Vesamicol binding protein] | Torpedo ocellata [ray] | H+/acetylcholine
Vactratno | Vesicular acetylcholine transporter [Vesamicol binding protein] | Rattus norvegicus [rat] | H+/acetylcholine
Vmt1ratno | Chromaffin granule amine transporter [VMT1, CGAT, VAT1] | Rattus norvegicus [rat] | H+/amine neurotransmitters
Vmt2bosta | Synaptic vesicle amine transporter [VMT2, VMAT2, VAT2] | Bos taurus [cow] | H+/amine neurotransmitters
Vmt2homsa | Synaptic vesicle amine transporter [VMT2, SVMT, VAT2, VMAT2] | Homo sapiens [human] | H+/amine neurotransmitters
Vmt2ratno | Synaptic vesicle amine transporter [VMT2, SVAT, VAT2, VMAT2] | Rattus norvegicus [rat] | H+/amine neurotransmitters
Cotransported ions are listed.
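The transporter codes in the table above appear to follow a fixed pattern: a protein abbreviation followed by the first three letters of the genus and the first two letters of the species (for example, VMT2 of Homo sapiens gives Vmt2homsa). The pattern is inferred from the table rather than stated in the text, and codes elsewhere in the book need not all follow it. A minimal Python sketch of the pattern, in which the function names `organism_suffix` and `make_code` are hypothetical:

```python
def organism_suffix(binomial: str) -> str:
    """Return the five-letter organism tag inferred from the table:
    first three letters of the genus plus first two of the species."""
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).lower()


def make_code(protein: str, binomial: str) -> str:
    """Join a protein abbreviation and the organism tag into one code.

    Assumption: only the first letter of the code is upper case,
    as in the codes printed in the table.
    """
    return protein.capitalize() + organism_suffix(binomial)


if __name__ == "__main__":
    for protein, organism in [
        ("VMT2", "Homo sapiens"),
        ("VACT", "Torpedo marmorata"),
        ("UN17", "Caenorhabditis elegans"),
    ]:
        print(make_code(protein, organism))
```

Reading a code backwards in the same way (last five letters for the organism, the remainder for the protein) gives a quick check on which species a row of an alignment belongs to.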
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parenthesis and therefore are not included in the phylogenetic tree: Vmt2ratno, Vmt2bosta (Vmt2homsa); Vactratno (Vacthomsa).
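The 90% identity rule above can be illustrated with a short Python sketch. The text does not say how percent identity was computed, so the definition used here (identical residues divided by the number of aligned positions where neither sequence has a gap) is an assumption, as are the function names `percent_identity` and `is_excluded` and the use of "." as the gap character.

```python
def percent_identity(a: str, b: str, gap: str = ".") -> float:
    """Percent identity of two rows of an alignment.

    Assumption: positions where either row has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def is_excluded(a: str, b: str, cutoff: float = 90.0) -> bool:
    """True when two sequences are at least `cutoff` percent identical,
    i.e. when one of them would be left out of the tree."""
    return percent_identity(a, b) >= cutoff


if __name__ == "__main__":
    # Residues 41-50 of Vacthomsa and Vacttoroc from the alignment.
    print(percent_identity("LVLVIVCVAL", "ILLVIVCIAM"))
    print(is_excluded("LVLVIVCVAL", "ILLVIVCIAM"))
```

A ten-residue window is far too short to decide anything; in practice the comparison would run over the full aligned sequences.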
[Figure: phylogenetic tree. Leaves: Vmt1ratno, Vmt2homsa, Vacthomsa, Vacttoroc, Un17caeel.]
Proposed orientation of VAT2 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside, and the protein is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residues of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in VAT2.
[Figure: proposed membrane topology of VAT2. Labels: OUTSIDE, INSIDE, NH2, COOH; conserved residues are marked along the chain.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | CHROMOSOMAL LOCUS | SITES
Un17caeel | 532 | 58 643 | | cholinergic neurons
Vacthomsa | 532 | 56 961 | | brain
Vacttorma | 568 | 61 641 | | brain
Vacttoroc | 511 | 55 690 | | brain
Vactratno | 530 | 56 536 | | brain
Vmt1ratno | 521 | 55 935 | | adrenal
Vmt2bosta | 517 | 55 755 | | brainstem, stomach
Vmt2homsa | 514 | 55 696 | 10q25 | brainstem, stomach
Vmt2ratno | 515 | 55 779 | | brainstem, stomach
Km: Acetylcholine: 300 µM 1
Multiple amino acid sequence alignments
            1                                                   50
Vmt1ratno   .......... .........M LQVVLGAPQR LLKEGRQSRK LVLVVVFVAL
Vmt2homsa   .......... .......... MALSELALVR WLQESRRSRK LILFIVFLAL
Vacthomsa   MESAEPAGQA RAAATKLSEA VG.......A ALQEPRRQRR LVLVIVCVAL
Vacttoroc   ....MVVGQA KAAMGKISSA IGERSKRISG AMNEPLRKRK ILLVIVCIAM
Un17caeel   ......MGFN VPVINRDSEI LKADAKKW.. .LEQQDNQKK CVLVIVSIAL
Consensus   .......... .......... .......... .L.E....RK ..LVIV..AL

            51                                                 100
Vmt1ratno   LLDNMLLTVV VPIVPTFLYA TEFKDSNSSL HRGPSVSSQQ ALTSPAFSTI
Vmt2homsa   LLDNMLLTVV VPIIPSYLYS IKHEKNATEI QTARPV..HT ASISDSFQSI
Vacthomsa   LLDNMLYMVI VPIVPDYI.. .......... AHMRGGGEGP TRTPEVWEPT
Vacttoroc   LLDNMLYMVI VPIVPNYL.. .......... ETIR...... .........T
Un17caeel   LLDNMLYMVI VPIIPKYL.. .......... RDIHN..... ......YQVT
Consensus   LLDNML..V. VPI.P.YL.. .......... .......... ..........

Vmtlratno Vmt2homsa Vacthomsa Vacttoroc Unl7caeel Consensus
i01 150 FSFFDNTTTT VEEHVPFRVT WTNGTIPPPV TEASSVPKNN CLQGIEFLEE FSYYDN.STM VTGNATRDLT LHQTATQHMV TNASAVPSDC PSEDKDLL.N LPLPTPANA ............ SAYTANTSA SPTAAWPAGS ALRPRYPTES Y K L V Y I T I P . . . . . . . . . . . . S N G T N G S L L N S T ..... QR A V L E R N P N A N F E . G Y H N E T . . . . . . . . . . . . S Q L A N G T Y L ........ VR E V G G R I N F L D ..................................................
Vmtlratno Vmt2homsa Vacthomsa Vacttoroc Unl7caeel Consensus
151 ENVRIGILFA SKALMQLLVNPFVGPLTNRI ENVQVGLLFA SKATVQLITN PFIGLLTNRI EDVKIGVLFA SKAILQLLVNPLSGPFIDRM EDIQIGVLFA SKAILQLLSN PFTGTFIDRV EELELGWLFA SKALLQIFVNPFSGYIIDRV E .... G . L F A S K A . . Q L . . N P F . G .... R.
. . . . . . . . . . . . . . . . . . . .
200 GYHIPMFVGF MIMFLSTLMF GYPIPIFAGF CIMFVSTIMF SYDVPLLIGL GVMFASTVLF GYDIPLLIGL TIMFFSTITF GYEIPMILGL CTMFFSTAIF GY.IP...G...MF.ST..F
201 250 Vmtlratno AFSGTYALLF VARTLQGIGS SFSSVAGLGM LASVYTDNYE RGRAMGIALG Vmt2homsa AFSSSYAFLL IARSLQGIGS SCSSVAGMGM LASVYTDDEE RGNVMGIALG
Vacthomsa Vacttoroc Unl7caeel Consensus
AFAEDYATLF AARSLQGLGS AFADTSGIAM AFGESYAILF AARSLQGLGS AFADTSGIAM ALGKSYGVLL FARSLQGFG$ AFADTSGLAM A F . . . Y A .... A R S L Q G . G S .F .... G..M
IADKYPEEPE IADKYTEESE IADRFTEENE .A..YT...E
Vmtlratno Vmt2homsa Vacthomsa Vacttoroc Unl7caeel Consensus
251 GLALGLLVGA PFGSVMYEFV GLAMGVLVGP PFGSVLYEFV FISFGSLVAP PFGGILYEFA FISFGSLVAP PFGGVLYQFA FISFGCLVAP PFGSVLYSLA .... G . L V . P P F G . V L Y . F .
Vmtlratno Vmt2homsa Vacthomsa Vactto[oc Unl7caeel Consensus
301 SKVSPESAMG TSLLTLLKDP YILVAAGSIC LANMGVAILE SRVQPESQKG TPLTTLLKDP YILIAAGSIS FANMGIAMLE ARARANLPVG TPIHRLMLDP YIAWAGALT TCNIPLAFLE SRTRGNTLQG TPIHKLMIDP YIAWAGALT TCNIPLAFLE TDSHGEKVQG TPMWRLFMDP FIACCSGALI MANVSLAFLE ......... G T P . . . L . . D P Y I . . . A G ..... N . . . A . L E
RSRALGVALA RTQALGIALA RSAALGIALA R..A.GIAL.
351 Vmtlratno MC.SPEWQLG Vmt2homsa MC.SRKWQLG Vacthomsa M.AASEWEMG Vacttoroc M.NASEWQMG Unl7caeel MPDTPGWLVG C o n s e n s u s M ..... W..G
GKSSPFLILA FLALLDGALQ GKTAPFLVLA ALVLLDGAIQ GKRVPFLVLA AVSLFDALLL GKWVPFLVLS FVCLLDGILL GKPVPFLILS FVCLADAIAV G K . . P F L . L .... L.D . . . .
LAFLPASVAY LIGTNLFGVL ANKMGR..WL VAFLPASISY LIGTNIFGIL AHTMGR..WL MAWLPAFVPH VLGVYLTVRLAARYPHLQWL ITWLPAFFPH ILGVYITVKLAAKYPNYQWL VIWLPPFFPH VLGVYVTVKM LRAFPHHTWA ...LPA ...... G ...... L A ....... WL
300 LCILWP . . . . LFVLQP . . . . LAVAKPFSAA LMVVTPF..A FMVINPHRRG L . V . . P
. . . .
350 PTLPIWMMQT PALPIWMMET PTIATWMKHT PTISNWMKKT PTITTWMSEM PT...WM..T 400 CSLVGMVAVG CALLGMIIVG YGALGLAVIG YGAFGLVIIG IAMVGLAMEG .... G .... G
Vmtlratno Vmt2homsa Vacthomsa Vacttoroc Unl7caeel Consensus
401 ISLLCVPLAH NIFGLIGPNA GLGFAIGMVD VSILCIPFPK NIYGLIAPNF GVGFANGMVD ASSCIVPACR SFAPLWSLC GLCFGIALVD VSSCTIPACR NFEELIIPLC ALCFGIALVD IACFAIPYTT SVMQLVIPLS FVCFGIALID .S .... P . . . . . . . L..P ..... F . I . . V D
450 SSLMPIMGYL VDLRHTSVYG SSMMPIMGYL VDLRHVSVYG TALLPTLAFL VDVRHVSVYG TALLPTLAFL VDIRYVSVYG TSLLPMLGHL VDTRHVSVYG ..L.P .... L V D . R H . S V Y G
Vmtlratno Vmt2homsa Vacthomsa Vacttoroc Unl7caeel Consensus
451 SVYAIADVAF CVGFAIGPST GGVIVQVIGF SVYAIADVAF CMGYAIGPSA GGAIAKAIGF SVYAIADISY SVAYALGPIV AGHIVHSLGF SVYAIADISY SVAYALGPIM AGQIVHDLGF SVYAIADISY SLAYAFGPII AGWIVTNWGF S V Y A I A D ...... Y A . G P . . . G . I V . . . G F
PWLMVIIGTI PWLMTIIGII EQLSLGMGLA VQLNLGMGLV TALNIIIFAT ..L .... G..
Vmtlratno Vmt2homsa Vacthomsa Vacttoroc Unl7caeel Consensus
LQN . . . . . . . . P P A K E E K R A LRS . . . . . . . . P P A K E E K M A L R N V G L L ..... T R S R S E R D L R N V C Q M ..... K P S L S E R N LRKVHSYDTL GAKGDTAEMT L ....................
500 NIIYAPLCCF DILFAPLCFF NLLYAPVLLL NILYAPALLF NVTYAPVLFL N . . Y A P ....
550 IL.SQECPTE TQMYTFQKPT KAFPLGENSD ILMDHNCPIK TKMYT.QNNI QSYPIGEDEE VLLDEPPQGL YDAVRLRERP VSGQDGEPRS ILLEDGPKGL YDTIIMEERKAAKEPHGTSS QLNSSAPAGG YNGKPEATTA ESYQGWEDQQ L . . . . . . . . . . . . . . . . . . . . . . . . E...
551
Vmtlratno
DPSSGE
Vacthomsa
PPGPFDECED
Vmt2homsa
585
.............................
SESD ............................... DYNYYYTRS DQEGYSE
................
Vacttoroc
GNHSVHAVLS
..................
Consensus
...................................
Unl7caeel SYQNQAQIPN HAVSFQDSRP QAEFPAGYDP LNPQW
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Vmt2ratno, Vmt2bosta (Vmt2homsa); Vactratno (Vacthomsa). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in at least one other family of the USA/MFS superfamily.
Database accession numbers
CODE: Un17caeel  Vacthomsa  Vacttorma  Vacttoroc  Vactratno  Vmt1ratno  Vmt2bosta  Vmt2homsa  Vmt2ratno
SWISSPROT: P34711  Q01818  Q05940  Q01827
PIR: S43685  S43686; S48219  A54965; I84492  A43319  S41081; S39440  S29810
EMBL/GENBANK: U09277; L19621  U09210  U05591  U05339  X80395  M97380  U02876  L23205; L09118  M97381; L00603
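The consensus rule stated with the alignments (a residue is listed when it is present in at least 75% of the aligned transporter sequences) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name `consensus` is hypothetical, gaps are written as "." and count as non-matching positions, and the threshold is applied column by column.

```python
from collections import Counter


def consensus(rows: list[str], threshold: float = 0.75, gap: str = ".") -> str:
    """Column-wise consensus of equal-length aligned sequences.

    A residue is reported when it occurs in at least `threshold`
    of the rows; otherwise the gap character is written.
    """
    if not rows:
        return ""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(rows) >= threshold:
            result.append(residue)
        else:
            result.append(gap)
    return "".join(result)


if __name__ == "__main__":
    # Residues 41-50 of the five aligned transporters.
    block = [
        "LVLVVVFVAL",  # Vmt1ratno
        "LILFIVFLAL",  # Vmt2homsa
        "LVLVIVCVAL",  # Vacthomsa
        "ILLVIVCIAM",  # Vacttoroc
        "CVLVIVSIAL",  # Un17caeel
    ]
    print(consensus(block))
```

With five sequences, a 75% threshold means a residue must appear in at least four of the five rows to be listed.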
References
1 Usdin, T. (1995) Trends Neurosci. 18, 218-224.
2 Edwards, R.H. (1993) Ann. Neurol. 34, 638-645.
3 Schuldiner, S. (1994) J. Neurochem. 62, 2067-2078.
4 Schuldiner, S. et al. (1995) Physiol. Rev. 75, 369-392.
5 Paulsen, I. et al. (1996) Microbiol. Rev. 60, 575-608.
6 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
7 Griffith, J. et al. (1992) Curr. Opin. Cell Biol. 4, 684-695.
8 Henderson, P.J.F. (1993) Curr. Opin. Cell Biol. 5, 708-721.
9 Varela, M. et al. (1995) Mol. Membr. Biol. 12, 271-277.
14-Helix H+/multidrug antiporter family
Summary
Transporters of the 14-helix H+/multidrug antiporter family, the example of which is the QACA multidrug resistance protein of Staphylococcus aureus (Qacastaau), mediate resistance to one or more structurally dissimilar antibiotics, antiseptics and disinfectants 1-4. The mechanism of transport, where known, is antiport (i.e. H+-coupled substrate efflux). In some members of the 14-helix H+/multidrug antiporter family (e.g. EMRB), a paired "linker" protein (e.g. EMRA) is believed to bring the transporter protein, which is located in the cytoplasmic membrane, into apposition with a corresponding outer-membrane channel (e.g. TOLC), allowing direct efflux through both the inner and outer membranes 5,6. Some members of the 14-helix H+/multidrug antiporter family also confer one or more collateral effects, including complementation of potassium uptake defects and Na+/H+ antiport 7,8. Members of the family are widely distributed in nature, occurring in various fungi and both gram-negative and gram-positive bacteria, including antibiotic-producing soil bacteria and antibiotic-resistant pathogens. Transporters may be encoded chromosomally or by transmissible plasmids, the latter occurring frequently in hospital-acquired, multidrug-resistant organisms. Statistical analysis of multiple amino acid sequence comparisons places the 14-helix H+/multidrug antiporter family in the uniporter-symporter-antiporter (USA) superfamily, also known as the major facilitator superfamily (MFS) 3,4. However, unlike all the other members of the superfamily, which are predicted to contain 12 membrane-spanning helices 3,9, members of the 14-helix H+/multidrug antiporter family are predicted to form 14 membrane-spanning helices by the hydropathy of their amino acid sequences and the activity of reporter gene fusions 1,2.
Several amino acid sequence motifs are highly conserved in the 14-helix H+/multidrug antiporter family, including motifs unique to the family, signature motifs of the USA/MFS superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis 1-3,10,11.
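The summary attributes the predicted number of membrane-spanning helices to the hydropathy of the amino acid sequences but does not name the method. As an illustration only, the sketch below uses the widely used Kyte-Doolittle hydropathy scale with a 19-residue sliding window and a cutoff of 1.6; the choice of scale, window and cutoff, and the function names `hydropathy_profile` and `candidate_helices`, are assumptions rather than the procedure used for this book.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    if window > len(seq):
        return []
    scores = [KD[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_helices(seq: str, window: int = 19, cutoff: float = 1.6):
    """Return (first, last) residue numbers, counted from 1, of every
    stretch covered by consecutive windows scoring at or above the cutoff."""
    profile = hydropathy_profile(seq, window)
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value >= cutoff and start is None:
            start = i
        elif value < cutoff and start is not None:
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(profile) - 1 + window))
    return segments


if __name__ == "__main__":
    # Artificial test sequence: a 25-residue hydrophobic core
    # flanked by charged residues.
    toy = "D" * 10 + "L" * 25 + "K" * 10
    print(candidate_helices(toy))
```

Because each reported span is the union of all windows that pass the cutoff, it overestimates the hydrophobic core by several residues on either side; counting the spans gives the number of candidate helices, which is the quantity the summary refers to.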
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S) a [RESISTANCE(S)] b
Ac22strco | Possible actinorhodin transporter [ACTII2] | Streptomyces coelicolor [gram-positive bacterium] | Actinorhodin
Actvastrco | Actinorhodin transporter [ACTVA] | Streptomyces coelicolor [gram-positive bacterium] | Actinorhodin
Atr1sacce | Aminotriazole resistance protein [ATR1, SNQ1] | Saccharomyces cerevisiae [yeast] | [Aminotriazole]
Bcrescco | Bicyclomycin resistance protein [BCR, BICA, BICR, SUR, SUXA] | Escherichia coli [gram-negative bacterium] | [Sulfonamide, bicyclomycin]
Bcrhaein | Bicyclomycin resistance protein [BCR, HI1242] | Haemophilus influenzae [gram-negative bacterium] | [Sulfonamide, bicyclomycin]
Cmctnocla | Cephamycin export protein [CMCT] | Nocardia lactamdurans [gram-positive bacterium] | Cephamycin
Cmlescco | Chloramphenicol resistance protein [CMLA] | Escherichia coli [gram-negative bacterium] | [Chloramphenicol]
Cmlapseae | Chloramphenicol resistance protein [CMLA] | Pseudomonas aeruginosa [gram-negative bacterium] | [Chloramphenicol]
Efprmyctu | Possible efflux protein [EFPR] | Mycobacterium tuberculosis [gram-positive bacterium] | Unknown
Emrbescco | Multidrug resistance protein B [EMRB] | Escherichia coli [gram-negative bacterium] | [Hydrophobic antibiotics]
Emrbhaein | Multidrug resistance protein B homolog [EMRB, HI0897] | Haemophilus influenzae [gram-negative bacterium] | [Hydrophobic antibiotics]
Emrdescco | Multidrug resistance protein D [EMRD] | Escherichia coli [gram-negative bacterium] | [Multiple drugs]
Lframycsm | Antiporter efflux pump [LFRA] | Mycobacterium smegmatis [gram-positive bacterium] | H+/multiple drugs
Lmrastrln | Lincomycin-H+ antiporter [LMRA] | Streptomyces lincolnensis [gram-positive bacterium] | H+/lincomycin
Mmrbacsu | Methylenomycin A resistance protein [MMR] | Bacillus subtilis [gram-positive bacterium] | [Methylenomycin]
Mmrstrco | Methylenomycin A resistance protein [MMR] | Streptomyces coelicolor [gram-positive bacterium] | [Methylenomycin]
Ppflpaspi | Florfenicol resistance protein [PPFL, PP-FLO] | Pasteurella piscicida [gram-negative bacterium] | [Florfenicol]
Pur8strlp | Puromycin-H+ antiporter [PUR8] | Streptomyces lipmanii [gram-positive bacterium] | H+/puromycin
Qacastaau | Multidrug-H+ antiporter [QACA] | Staphylococcus aureus [gram-positive bacterium] | H+/QUACs, EB, antiseptics
Sge1sacce | Crystal violet resistance protein [SGE1, SGE, NOR1, P9677.3] | Saccharomyces cerevisiae [yeast] | [Crystal violet, 10-N-nonylacridine]
Smvasalty | Methyl viologen resistance protein [SMVA] | Salmonella typhimurium [gram-negative bacterium] | [Methyl viologen]
Tcmastrga | Tetracenomycin-H+ antiporter [TCMA] | Streptomyces glaucescens [gram-positive bacterium] | H+/tetracenomycin
Tcr2bacsu | Tetracycline-H+ antiporter [TCR2, TET] | Bacillus subtilis [gram-positive bacterium] | H+/tetracycline metal chelate
Tcr3strau | Tetracycline-H+ antiporter [TCR3, TET] | Streptomyces aureofaciens [gram-positive bacterium] | H+/tetracycline metal chelate
Tcrbacst | Tetracycline-H+ antiporter [TCR, TET] | Bacillus stearothermophilus [gram-positive bacterium] | H+/tetracycline metal chelate
Tcrbbacsu | Tetracycline-H+ antiporter [TCR, TET, TETB] | Bacillus subtilis [gram-positive bacterium] | H+/tetracycline metal chelate
Tcrstaau | Tetracycline-H+ antiporter [TCR, TET] | Staphylococcus aureus [gram-positive bacterium] | H+/tetracycline metal chelate
Tcrstahy | Tetracycline-H+ antiporter [TCR, TET] | Staphylococcus hyicus [gram-positive bacterium] | H+/tetracycline metal chelate
Tcrstrag | Tetracycline-H+ antiporter [TCR, TET] | Streptococcus agalactiae [gram-positive bacterium] | H+/tetracycline metal chelate
Tcrstrpn | Tetracycline-H+ antiporter [TCR, TET] | Streptococcus pneumoniae [gram-positive bacterium] | H+/tetracycline metal chelate
Toxacocca | Toxin pump [TOXA] | Cochliobolus carbonum [fungus] | Unspecified toxins
a Cotransported ions are listed for known antiporters.
b Presumed substrates; protein confers resistance to specified compounds.
Abbreviations: QUAC, quaternary ammonium disinfectants; EB, ethidium bromide.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Tcrbacst, Tcrstahy, Tcrstrpn, Tcrstrag, Tcrstaau (Tcr2bacsu).
[Figure: phylogenetic tree. Leaves: Bcrescco, Bcrhaein, Emrdescco, Ppflpaspi, Cmlescco, Sge1sacce, Toxacocca, Tcr2bacsu, Tcrbbacsu, Emrbescco, Emrbhaein, Mmrbacsu, Mmrstrco, Tcmastrga, Lframycsm, Qacastaau, Smvasalty, Actvastrco, Cmctnocla, Pur8strlp, Efprmyctu, Lmrastrln, Ac22strco, Tcr3strau, Atr1sacce.]
Proposed orientation of QACA in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside, and the protein is folded 14 times through the membrane 1,2. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residues of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in QACA.
[Figure: proposed membrane topology of QACA. Labels: OUTSIDE, INSIDE, NH2, COOH; conserved residues are marked along the chain.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | CHROMOSOMAL LOCUS
Ac22strco | 578 | 59 771 |
Actvastrco | 533 | 54 546 |
Atr1sacce | 547 | 60 776 | Chromosome 13
Bcrescco | 396 | 43 366 | 49.02 minutes
Bcrhaein | 398 | 43 459 |
Cmctnocla | 486 | 49 325 |
Cmlescco | 302 | 33 390 | 19.1-19.12 minutes
Cmlapseae | 419 | 44 243 | Plasmid R1033
Efprmyctu | 529 | 55 706 |
Emrbhaein | 510 | 55 826 |
Emrbescco | 512 | 55 612 | 60.55 minutes
Emrdescco | 396 | 42 460 | 82.98 minutes
Lframycsm | 504 | 51 539 |
Lmrastrln | 481 | 50 421 |
Mmrbacsu | 466 | 48 845 | 325°
Mmrstrco | 475 | 49 238 | Plasmid SCP1
Ppflpaspi | 374 | 39 791 |
Pur8strlp | 503 | 51 852 |
Qacastaau | 514 | 55 015 |
Sge1sacce | 543 | 59 425 |
Smvasalty | 496 | 52 521 |
Tcmastrga | 538 | 54 846 |
Tcrstahy | 458 | 49 953 | Plasmid pSTE1
Tcrbacst | 458 | 50 119 |
Tcrstrpn | 458 | 50 092 | Plasmid pLS1 and others
Tcrstaau | 433 | 47 789 | Plasmid pT181
Tcrstrag | 458 | 50 056 | Plasmid pMV158
Tcr2bacsu | 459 | 50 695 | Plasmid pNS1
Tcr3strau | 512 | 53 021 |
Tcrbbacsu | 458 | 49 756 |
Toxacocca | 548 | 58 045 |
Multiple amino acid sequence alignments
1
Toxacocca Emrbhaein Mmrbacsu Mmrstrco Tcmastrga Lframycsm Qacastaau Actvstrco Cmctnocla Pur8strlp Efprmyctu Lmrastrln Ac22strco Tcr3strau Atrlsacce Consensus
50 ................. MDE QIVSASSNVK DGVEKQPVKD REDVDANVVP ................................................. M ................................................ MK ......................................... MTTVRTGGA ........................................ MSTETHDEPS ....... MST ................................................ MI .............................................. MTAN ....................................... M TSVRGASKTG ....................................... M ARKPDISAVP .............. MTALND TERAVRNWTA GRPHRPAPMR PPRSEETASE .................................... MSVF A R A T S L F S R A . . . . . . . . . . . . . . . . . . . . . . . . . . MSSV E A D E P D R A T A P P S A L L P E D G .................................... MGMA N A T S Q T G E A V MGNQSLVVLT ESKGEYENET ELPVKKSSRD NNIGESLTAT AFTQSEDEMV ..................................................
Bcrescco Bcrhaein Emrdescco Cmlapseae Pfplpaspi Sgelsacce Toxacocca Tcr2bacsu Tcrbbacsu Emrbescco Emrbhaein Mmrbacsu Mmrstrco Tcmastrga Lframycsm Qacastaau Smvasalty Actvstrco Cmctnocla Pur8strlp Efprmyctu Lmrastrln Ac22strco Tcr3strau Atrlsacce Consensus
51 i00 .......... M T T R Q H S S F A I V F I L G L L A M L M P L S I D M Y L P A L P V I S A Q F ........... MNQQKSTFI FILTLGILSM LPPFGVDMYL PSFLEIAKDL .......... V I M K R Q R N V N L L L M L V L L V A V G Q M A Q T I Y I P A I A D M A R D L ...... MSSK N F S W R Y S L A A T V L L L S P F D L L A S L G M D M Y L P A V P F M P N A L ...... MTTL H P A W A Y T L P A A L L L M A P F D I L A S L A M D I Y L P V V P A M P G I L ........ MK S T L S L T L C V I S L L L T L F L A A L D I V I V V T L Y D T I G I K F H D F PHSTPSLPKI SLISLFSIVM SLGAAAFLGA LDATVVAVLT PTLAQEFHSV ...MFSLYKK F K G L F Y S V L F W L C I L S F F S V L N E M V L N V S L P D I A . . . N H F ...MNTSYSQ S T L R H N Q V L I W L C V L S F F S V L N E M V L N V S L P D I A . . . N E F ...MQQQKPL E G A Q L V I M T I A L S L A T F M Q V L D S T I A N V A I P T I A . . . G N L GNSAKKFPPI QGGALILLTL ALSLATFMQV LDSTIANVAI PTIA...GDL NSGSIQESTS STGISVLIV..LALGFLMAT LDVTVVNVAM ADM...KNTL QTAEVPAGGR RDVPSGVKIT ALATGFVMAT LDVTVVNVAG ATI...QESL G V A H T P A S G L RGRPWPTLL. A V A V G V M M V A L D S T I V A I A N P A I . . . Q Q D L CIEGTPSTTR TPTRAWVALA VLALPVLLIA IDNTVLAFAL PLIA...EDF SFFTKTTDMM TSKKRWTALV VLAVSLFVVT MDMTILIMAL PELV...REL ........... MFRQWLTLV IIVLVYIPVA IDATVLHVAAPTLS...MTL PGRPGGPADQ GHPRRWAILG VLVLSLVGII LDNTVLNVTL RTLTDPEQGL R T S K T S T A T T ....... ALV L A C T A H F L V V F D T S V I T V A L P S V . . . R A D L VESAACQGPD PRRWW..GLV VILAAQLLVVLDGTVVNIAL PSV...QRDL RPSRYYPTWL PSRTFIAAVI AIGGMQLLAT MDSTVAIVAL PKI...QNEL ARTRAADEAARSRSRWVTLV FLAVLQLLIA VDVTVVNIAL PAI...RDSF PGDGTAAGPP PYARRWAALG VILGAEIMDL LDGTVMNVAAPAV...RADL ADEAGGPAGF THRQIITALS GLLLAVLLAALDQTIVSTAL RTIGDQLHG DSNQKWQNPN YFKYAWQEYL FIFTCMISQL LNQAGTTQTL SIMNILSDSF ..................................................
Bcrescco Bcrhaein Emrdescco Cmlapseae Pfplpaspi Sgelsacce Toxacocca Tcr2bacsu Tcrbbacsu Emrbescco Emrbhaein Mmrbacsu Mmrstrco Tcmastrga Lframycsm Qacastaau Smvasalty Actvstrco Cmctnocla Pur8strlp Efprmyctu Lmrastrln Ac22strco Tcr3strau Atrlsacce Consensus
i01 GVPAGSTQMT LSTYILGFAL GQLIYGPMAD DVSPEQVQHT LTSFAYGMAF GQLFWGPFGD NVREGAVQSV MGAYLLTYGV SQLFYGPISD GTTASTIQLT LTTYLVMIGA GQLLFGPLSD NTTPAMIQLT LSLYMVMLGV GQVIFGPLSD G .... NIGWL VTGYALSNAV FMLLWGRLAE D .... AVAWY GAIYLLMSGT TQPLFGKLYN NTTPGITNWVNTAYMLTFSI GTAVYGKLSD NKLPASANWVNTAFMLTFSI GTALYGKLSD GSSLSQGTWVITSFGVANAI SIPLTGWLAK GASFSQGTWVITSFGVANAI SIPITGWLAK SMSLSGVTWVVDGYILTFAS LLLAGGALAD DTTLTQLTWI VDGYVLTFAS LLMLAGGLAN HASLADVQWI TNGYLLALAV SLITAGKLGD RPSATTQLWI V D V Y S L V L A A L L V A M G S L G D EPSGTQQLWI VDIYSLVLAG FIIPLSAFAD GASGNELLWI IDIYSLVMAG MVLPMGALGD GASHSQVEWVLSAYTLAFAATLFTWGVLGD G F A P A S L Q W V V N S Y T L A F A G LLLFGGRLAD G M S D T S R Q W V I T A Y T L A F G G LLLLGGRVAD SLSDAGRSWVITAYVLTFGG LMLLGGRLGD H V D T R Q L T W V V T G Y T V V G G G LLMVGGRIAD GGSLSVIQWI TVGYTLAFAV LLVVGGRLGD ...QTVQAWVITGYLVSSTI AMPFYGKLSD GSEGNSKSWL MASFPLVSGS FILISGRLGD ........ W .... Y ........... G...D
150 SFGRKPVVLG GTLVFAAAAV SFGRKPIILL GVIVGALTAL RVGRRPVILV GMSIFMLATL RLGRRPVLLG GGLAYVVASM RIGRRPILLA GATAFVIASL ILGTKECLMI SVIVFEIGSL EFSPKWLFIT CLIVLQLGSL YINIKKLLII GISLSCLGSL QLGIKNLLLF GIMVNGLGSI RVGEVKLFLW STIAFAIASW RFGEVRLFLV STFLFVVSSW RFGSKTIYIL GLAVFVMASC RIGAKTVYLW GMGVFFLASL RFGHRQTFLV GVAGFAVTSA RLGRRRVLLI GGAGFAVVSA KWGRKKALLT GFALFGLVSL RIGFKRLLML GGTLFGLASL RLGRRRVLLL GLGLFGLSSL IHGHRRVFLG GLAVFTLTSL AFGRRRIFAV GILGFGLASL TIGRKRTFIV GVALFTISSV LFGRRRTLLF GAFLFGASSL IYGRKRMFVVGAVGFTAASV IYGRKPLYLA AIAVFIVGSA IYGLKKMLLV GYVLVIIWSL ..G ....... G ....... S.
151 200 Bcrescco ACALA..NTI .DQLIVMRFF H G L A A A A A S V V I N A L M R D I Y P . K E . E F S R ~ Bcrhaein VLTEI..NSV .GNFTALRFV QGFFGAAPVVLSGALLRDLF S.KD.QLSKV Emrdescco VAVTT..SSL .TVLIAASAM QGMGTGVGGV MARTLPRDLY E.RT.QLRHA Cmlapseae GLALT..SSA .EVFLGLRIL QACGASACLV STFATVRDIY AGRE.ESNVI Pfplpaspi GAAWS..STA .PAFVAFRLL QAVGASAMLV ATFATVRDVY ANRP.EGVVI Cmlescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MS i :~i:!'-!i!: !i-:i!r:- Sgelsacce ISALS..NSM .ATLISGRVVAGFGGSGIES LAF.VVGTSI VREN.HRGIM Toxacocca VCALA..RNS .PTFIVGRAV AGIGAGGILS GAL.NIVALI VPLH.HRAAF Tcr2bacsu IAFIG..HNH FFILIFGRLV QGVGSAAFPS LIMVVVARNI TRKK..QGKA Tcrbbacsu IGFVG..HSF FPILILARFI QGIGAAAFPA LVMVVVARYI PKEN..RGKA Emrbescco ACGVS..SSL .NMLIFFRVI QGIVAGPLIP LSQSLLLNNY PPAK..RSIA Emrbhaein LCGIA..DSL .EALIIFRVI QGAVAGPVIP LSQSLLLNNY PPEK..RGMA Mmrbacsu LCAAS..ING .QMLIAGRLI QGIGAALFMP SSLSLLAASY LDER.ARARM i~i :=~;ii~{!:ii: ! Mmrstrco ACALA..PTA .ETLIAARLV QGAGAALFMP SSLSLLVFSF PEKR.QRTRM ::i:i?%ir Tcmastrga AIGLS..GSV .AAIVVFRVL QGLFGALMQP SALGLLRVTF PPGK.L.NMA Lframycsm LAAFA..PST .ELLVGARAL LGVFGAMLMP STLSLIRNIF TDAS.ARRLA Qacastaau AIFFA..ESA .EFVIAIRFL LGIAGALIMP TTLSMIRVIF ENPK.ERATA S m v a s a l t y A A A F S . . H T A .SWLIATRVL LAIGAAMIVP ATLAGIRATF CEEK.HRNMA Actvstrco AGAYA..GSP .EQLIAARAC MGVSGAAVLP STLATIAAVF P.LR.ERPKA Cmctnocla IGGLA TSP ASLIAARAG QGAGAAVLAP LAVTMLTTSF AEGP RRTRA Pur8strlp LGGAA..PDP .GTLFLARAL QGVFAAALAPAALALINTLF TEPG.ERGKA Efprmyctu LCAVA..WDE .ATLVIARLS QGVGSAIASP TGLALVATTF PKGP.ARNAA L m r a s t r l n A A G L A . . P N L .ELLVLARFG QGAGEALSLPAAMSLIACSS RTAP.FQG.. Ac22strco LCSVA..AGP .EMLTAARFL QGGLGALMIP QGLGLIKQMF P.PK.ETAAA !~f?. Tcr3strau ACAMA..NSM .ETLAIARVL QGFGGAGLMS LPTAVIAD.L APVR.ERGRY
Atr isacce ICGITKYSGS DTFFIISRAF QGLGIAFVLP NVLGIIGNIY VGGTFRKNIV Consensus ................. R...G ............................
Bcrescco Bcrhaein Emrdescco Cmlapseae Ppflpaspi Cmlescco Sge1sacce Toxacocca Tcr2bacsu Tcrbbacsu Emrbescco Emrbhaein Mmrbacsu Mmrstrco Tcmastrga Lframycsm Qacastaau Smvasalty Actvastrco Cmctnocla Pur8strlp Efprmyctu Lmrastrln Ac22strco Tcr3strau Atr1sacce Consensus
201 250 MSFVMLVTTI APLMAPIVGGWVLVW ...... LSWHYIFWI LALAAILASA MSTITLVFML APLVAPIIGG YIVKF ...... FHWHAIFYV ISLVGLLAAA N S L L N M G I L V S P L L A P L I G G L L D T M . . . . . . W N W R A C Y L F L .... L V L C A YGILGSMLAM VPAVGPLLGA LVDMW ...... LGWRAIFAF LG.LGMIAAS YGLFSSVLAF VPALGPIAGA LIGEF ...... LGWQAIFIT LAILAMLALL FTAYSDP.CW PWSRGRPIAR SARRH ...... VAWVAGYLC VSRFGHDRCI ITALAISYVI AEGVGPFIGG AFNEH ...... LSWRWCFYI NLPIGAFAFI TGMIGALECV ALIIGPIIGG AIADN ...... IGWRWCFWI NLPIGAAVCA FGFIGSIVAL GEGLGPSIGG IIAHYIHWSY LLILPMITIV TIPFLIKVMV FGLIGSLVAM GEGVGPAIGG MVAHYIHWSY LLLIPTATII TVPFLIKLLK LALWSMTVIV APICGPILGG YISDN ...... YHWGWIFFI NVPIGVAWL LAFWSMTIVVAPIFGPILGG WISDN ...... IHWGWIFFI NVPIGLSWL FGLWAALVSA ASALGPFIGG VLVQL ...... AGWQSIFLI NVPIGAAALI LGLWSAIVAT SSGLGPTVGG LMVSA ...... FGWESIFLL NLPIGAIGMA IGIWSGVVGA STAAGPIIGG LLVQH ...... VGWEAVFFI NVPVGLAALV IAIWASCFTA GSALGPIVGG ALLEH FHWGAVFLV AVPILLPLLV LAVWSIASSI GAVFGPIIGG ALLEQ ...... FSWHSAFLI NVPFAIIAVV LGVWAAVGSG GRAFVPLIGG ILLEH ...... FYWGSVFLI NVPIVLVVMG LGIWAASVGF ALGIGPVTGG ILLAH ...... FWWGSVLLV NVPLMAGCLV LTISTAVALV GGASGNLLGG VFTEF ...... LSWRSVLLV NVPIGIPVLF FGVYGAVSGG GAAVGLLAGG LLTEY ...... LDWRWCLYV NAPVAL.LAL TAVFAAMTAI GSVMGLWGG RLTE ....... VSWRWAFLV NVPIGLVMIY VERLASVASV GLVLGFLLSG VITQL ...... FSWRWIFLI NIP..LVSLV FGAFGPAIGL GAVLGPIVAG FLVDAD..LF GTGWRSVFLI NLPIGVAVIV FSYLMMAWVA ASVLGPLVGG LFAGAGEILG VTGWRWAFLI NVPLGLVALL ISFVGAMAPI GATLGCLFAG LIGTEDPK .... QWPWAFYA YSIAAFINFV . . . . . . . . . . . . . . G P . . G G . . . . . . . . . . . . . W . . . . . . . . P .......
251 300 Bcrescco MIFFLIKETLPPERRQP..F HIRTTIGNFAALFRHKRVLS YMLASGFSFA Bcrhaein LVFFIIPETH KKENRIP..L RLNIIARNFL LLWKQKEVLG YMFAASFSFG Emrdescco GVTFSMARWM PETRPVD..A PRTRLLTSYK TLFGNSGFNC YLLMLIGGLA CmlapseaeAAAWRFWPET R V Q R V A G . . L Q W S Q L L L P V K CL .... N F W L Y T L C Y A A G M G P f p l p a s p i N A G F R W H E T R P L D Q V K T . . . . R R S V L P I F A SP .... A F W V Y T V G F S A G M G Cmlescco C R R G S F W P E T R V Q R V A G . . L Q W S Q L L L P V K CL .... N F W L Y T L C Y A A G M G Sgelsacce ILAFCNTSGE PHQKMWLPSK IKKIMNYDYG ELLKASFWKN TFEVLVFKLD T o x a c o c c a I L L F F F . . . H P P R S T Y S A S G V P R .... SYS E I L G . . . . . . . . . . . . . N L D T c r 2 b a c s u PG . . . . . . . . . . K S T K N T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . LD T c r b b a c s u KE . . . . . . . . . . E R I R G H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ID Emrbescco MTLQTL.RGR ETRTERRR .............................. ID Emrbhaein ISWKIL.GSR ESEIVHQP .............................. ID Mmrbacsu SAYRIL..SR .VPGKSSR .............................. VN Mmrstrco MTYRYI..AA.TESRATR .............................. LA Tcmastrga AGLVILTDAR .AERAPKS .............................. FD Lframycsm LGPRLVPESR ..DPNPGP .............................. FD Qacastaau AGLFLLPESK LSKEKSHS .............................. WD Smvasalty LTARY..DPR QAGRRDQP .............................. LN Actvstrco AVVLVVPETR ..GTAGRR .............................. VD Cmctnocla LAARVLAGPR KRPWGRVR .............................. LD Pur8strlp LGCRLLPRDR RTGRA.VR .............................. LD
Efprmyctu Lmrastrln Ac22strco Tcr3strau Atrlsacce Consensus
LARTAL...R ETNKERMK .............................. LD LVAVLLLVKK DETTARNP .............................. VD GAVLLLPEGK ..APVRPK .............................. FD SVRKALNLPH R..RVDHP .............................. ID LSIYAIPSTI PTNIHH.. FSMD ..................................................
301 350 Bcrescco GMFSFLSAGP FVYIEINHIAPENFGYYFALNIVFLFVMTI FNSRFVRRIG Bcrhaein GLFAFVTAGS IVYIGIYGVP VDQFGYFFMM NIVTMIFASF LNSRFVTKVG Emrdescco GIAAFEACSG VLMGAVLGLS SMTVSILFIL PIPAAFFGAW FAGRPNKRFS Cmlapseae SFFVFFSIAP GLMMGRQGVS QLGFSLLFAT VAIAMVFTAR FMGRVIPKWG Pfplpaspi TFFVFFSTAP RVLIGQAEYS EIGFSFAFAT VALVMIVTTR FAKSFVARWG Cmlescco SFFVFFSIAP GLMMGRPRCV SAWLQPAVRH SAIAMVFTAR FMGSVIPKWG Sgelsacce MVGIILSSAG FTLLMLGLSF GGNNFPWNSG IIICFFTVGP ILLLLFCAYD T o x a c o c c a Y I G A G M I I S S L V C L S L A L Q W G G T K Y K W G D G R V V A L L V V F G VL ........ T c r 2 b a c s u I V G I V L M S I S I I C F M L . . . . . . . . . . FTTN Y N W T F L I L F T I F F V ...... T c r b b a c s u M A G I I L M S A G I V F F M L . . . . . . . . . . FTTS Y R F S F L I I S I LAFF ...... E m r b e s c c o A V G L A L L V I G I G S L Q I M L D R G K E L D W F S S Q E I I I L T V V A V V A I C ...... E m r b h a e i n K V G L V L L V L G V G C L Q L M L D Q G R E Q D W F N S N E I I I L A V V A V VCLI ...... Mmrbacsu I I G H L L G M M A L G F L S Y A L I Q G P S A . G W R S P V I L V A F T A A V L A F V ...... Mmrstrco V P G H L L W I V A L A A V S F A L I E G P Q L . G W T A G P V L T A Y A V A V T A A A ...... T c m a s t r g a V S G I V L L S G A M F C L V W G L I K A P A W . G W G D L R T L G F L A A A V L A F A ...... Lframycsm PVSIVLSFTT MLPIVWAVKTAAH.DGLSAAAA.AAFAVGI V S G A ...... Q a c a s t a a u I P S T I L S I A G M I G L V W S I K E F S K . E G L A D I I P W V V I V L A I T M V ....... S m v a s a l t y L G H V V M L I I A I L L L V Y S A K T A L K . G H L S L W V I S Y T L L T G A L L L G ...... ActvstrcoAAGLLLSIAGVVPLVYAIIE A G R S G G V T R P A V W A A G L A G L G L L L ...... C m c t n o c l a L P G A V L A T A G L T L L T L G V S Q T H E . H G W G E A A V A V P L A G G L L A L L ...... P u r 8 s t r l p L P G T L L G C G G L V A I V Y A F A E A . E . S G W G D P L V V R L L V L G V L M L V ...... E f p r m y c t u A T G A I L A T L A C T A A V F A F S I G P E . 
K G W M S G I T I G S G L V A L A A A V ...... Lmrastrln LPGALLFTAAPLLLIFGVNE L G E . D E P R L P L A V G S L L A A A V C A A ...... Ac22strcoVVGMALVTSG L T L L I F P L V Q G R E . R G W P . A W A F V L M L A G A A V L V ...... T c r 3 s t r a u F R G A L T L A L C L V P L L I V A E E G L D W . G W G S A R S L T L F A V S L IGLV ...... A t r l s a c c e W I G S V L G V I G L I L L N F V W N Q A P . I S G W N Q A Y I I V I L I I S V I ......... Consensus .................................................. Bcrescco Bcrhaein Emrdescco Cmlapseae Pfplpaspi Cmlescco Sgelsacce Toxacocca Tcr2bacsu Tcrbbacsu Emrbescco Emrbhaein Mmrbacsu Mmrstrco Tcmastrga Lframycsm Qacastaau
351 ALNMFRSGLW IQFIMAAWMV ISALLG.LGF AETMLRIALA IQFLSGMWLI LTALLD.LGF ..TLMWQSVI CCLLAGLLMW IPDWFGVMNV SPSVLRMGMG CLIAGAVLLA ITEIWALQSV IAGCGRVGWR CLFA~ IGELYGSLNS SPSVLRMGMG CLIAGAVLLA ITEIWALHRV FHFLSLSGLH YDNKRIKPLL TWNIASNCGI ..FLSASGHQ Y.WKGEKALF PTRLLRQRGF ..IFIKH .... I S R V S N P F I N P K L G K N I P F ..IFVQH .... I R K A Q D P F V D P E L G K N V F F . . F L I V W E L .... T D D N P I V D L S L F K S R N F . . A L V I W E L .... T D D N P V V D I S L F H S R N F . . L F L L R E I S .... A K T P I L P A S L Y K N G R F . . L L A L R E H R .... V T N P V M P W Q L F R G P G F . . G F T L R E S R .... A T E P L M P L A M F R S V P L . . L F V R R Q N R .... S A T P M L D I G L F K V M P F . . I F V K R N L S .... S S D P M L D V R L F K K R S F
400 WSLVVGVAAF VGCVSMVSSN WPMAIGVAFF VGPNPVISSN WTLLVPAALF FFGAGMLFPL LGFIAPMWLV GIGVATAVSV SPSSYRCGLS RSVLSSRCPL RLYCSNVA.S GIGVATAVSV FTSSITGFLS CFAYELQSA. LLSLFNGL.. CFG.GVQYAA MLGLFSGGLI FSIV...AGF VIGTLCGGLI FGTV...AGF TIGCLCISLA YMLY...FGA SVGCLCTSLA FLIY...LGS SAAQFIGFLLNFAL...FGG TGANLVGFLF NFAL...FGS SAGTVLMVLM AFSF...IGG TSSILANFLS IIGL...IGF SAGTIAAFMT MFAM...ASV
Smvasalty Actvstrco Cmctnocla Pur8strlp Efprmyctu Lmrastrln Ac22strco Tcr3strau Atrlsacce Consensus
..LFIRTQLA .... TSRPMI DMRLFTHRII L S G V V M A M T A MITL...VGF ..VFLWHERR .... TPEPSL ELGFFRMKAF STAVAAVGFV SFAM...MGF . . A F V V V E . . . A R F A A S P L I PPRLFGLPGV G W G N L A M L L A G A S Q . . . V P V . . A F A L V E . . . R R V Q D . P L L PPGVVAHRVR GGSFLVVGLP QIGL...FGL ..AFVIVE .... R T A E N P V V P F H L F R D R N R L V T F S A I L L A G G V M . . . F S L . . A F V A V E . . . R R T A H . P L V PLTFFGNRVR LVANGATVLL SAAL...STS ..GFVAHELR QERRGGATLI E L S L L R R S R Y A A G L A V A L V F FTGV...SGM ..LFVLAERA ...RGLEAMV PLRLFRRGGI TMATAVNFTI G V G I . . . F G T ..FLVVFIIY EIRFAKTPLL PRAVIKDRHM QIMLALFFG. WGSFGI...F ..................................................
401 450 Bcrescco AMAVILDEFP HMAGTASSLA GTFRFGIGAI VGAL .... LS LATFNSAWPM Bcrhaein AMASALERCP QMAGTANSLI GSVRFAVGAI MGSL .... VA SMKMDTAAPM Emrdescco ATSGAMEPFP FLAGTAGALV G G L Q N I G S G V LASL .... SA M L P Q T G Q G S L Cmlapseae A P N G A L R G . . . F D H V A G T V T A V Y F C L G G V L LGSIGTLIIS L L P R N T A W P V i~/./i~.ii!@!:: Pfplpaspi P R T A L L A E . . . F D D I A G S A V AFYFCVQSLI V S I V G T L A V A L L N G D T A W P V Cmlescco PMALFEDSTM LLERSRQSTS A..WACTARK H R N V D H F A V A A Q R L G P C R V T Sgelsacce ..YLVQLYQL VFKKKPTLAS IHLWELSIPA M I A T M A I A Y L NSKY.GIIKP Toxacocca L Y Y L P T W F Q A IKGETRVGAG IQMLPI.VGA IIGVNIVAGI TISFTGRLAP @.,@:~i! Tcr2bacsu ISMVPYMMKT IYHVNVATIG NSVIFPGTMS VIVFGYFGGF L V D R K G S L F V Tcrbbacsu V S M V P Y M M K D VHHLSTAAIG SGIIFPGTMS V I I F G Y I G G L L V D R K G S L Y V Emrbescco IVLLPQLLQE V Y G Y T A T W A G LASAPVGIIP V I L S P I I . A R FAHKLDMRRL E m r b h a e i n V V L I P L L L Q Q V F H Y T A T W A G LAASPVGLFP ILLSPII.GR FGYKIDMRIL Mmrbacsu MFMLSLFLQE AGGASSFMAG VELLPMMAVF V I G N L . L F A R L A N R F E A G Q L Mmrstrco TFMLGLYFQH A R G A T P F Q A G LELLPMTIFF PVANI.VYAR ISARFSNGTL Tcmastrga LFFVTFYLQN VHGMSPVESG V H L L P L T G M M IVGAP.VSGI VISRFGPGGP ii~:i!ii ~i !iiii!!Sl L f r a m y c s m IFFISQHLQL VLGLSPLTAG LVTLPGAVVS M I A G L . A V V K A A K R F A P D T L i;i~@i:i::i:ii4i:i ii~ Qacastaau LLLASQWLQV V E E L S P F K A G LYLLPMAIGD M V F A P . I A P G L A A R F G P K I V Smvasalty ELLMAQELQF V H G L S P Y E A G VFMLPVMVAS G F S G P . I A G V L V S R L G L R L V Actvstrco LFFSAFYLQS VRGYTPLQAG G C T V A L A V A N V V C G P . L S T V L V R S I G P R N V Cmctnocla W F F L T L S M Q H V L G Y S A A Q A G LGFVPHALVM L V V G L R V V P W LMRHVQARVL @!i,i:@!~i Pur8strlp FLFLTYYLQG ILDYSPVLTG VAFLPLGLGI A V G S S L I A A R L L P R T R P R T L Efprmyctu TVCIGLYVQD ILGYSA.LRR V G F I P F V I A M G I . G L G V S S Q LVSRFSPRVL Lmrastrln FFLLTMHLQE ERDLSPIEAG LSFLPLGLSL ILACVLVRG. 
L I E R I G T T G A ~{iB@? Ac22strco SLLLALHLQI G L G F S P T R A A L T M T P W S V F L V V G A I L T G A V L G S K F G R K A L @iiii~!i!i~ Tcr3strau VSTLPLFLQL V Q G R S A T V A G L V I I P V M T G A IVSQTICAKI IKKWNRYKKP Atrlsacce TFYYFQFQLN IRQYTALWAG GTYFMFLIWG IIAALLVGFT IKNVSPSVFL Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
451 500 Bcrescco IWSIAFCATS SIL.FCLYAS RPKKR . . . . . . . . . . . . . . . . . . . . . . . . . Bcrhaein LFTMGACVVI SVLAYYFLTS RNLKSRG . . . . . . . . . . . . . . . . . . . . . . . Emrdescco GLLMTLMGLL IVLCWLPLAT RMSHQGQPV . . . . . . . . . . . . . . . . . . . . . C m l a p s e a e V V Y C L T L A T V V L G L S C V S R V K G S R G Q G E H D V V A L Q S A G S T SNPNR ..... Pfplpaspi IC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cmlescco V . . . W T L A T V V L G L S C V S R V KGSRGQGEHD A G R A T N V G K Y IKSQSLRECG @;ii!iil Sgelsacce AIVFGVLCGI V G S G L F T L I N GELSQS..IG YSILPGIAFG SIFQATLLSS B, iiii'~:";i Toxacocca FIVIATVLAS VGSGLLYTT. PTKSQARIIG YQLIYGAGSG A G V Q Q A F I G A ~di::iiii!@2::: Tcr2bacsu FILGSLSISI SFLTIAFFVE FSM..WLTTF MFIFVMGGLS FT..KTVISK ................ :<: Tcrbbacsu LTIGSALLSS G F L I A A F F I D A A P . . W I M T I IVIFVFGGLS FT..KTVIST Emrbescco V T F S F I M Y A V CFYWRAYTFE PGMD..FGAS A W P Q F I Q G F A V A C F F M P L T T Emrbhaein VTISFIVYAI TFYWRAVTFE PSMT..FVDV A L P Q L V Q G L A V S C F F M P L T T Mmrbacsu MFVSMAVSCI IALLLFVLIS PDFPYWQLAV L..MSVMNLC T G I T V P A M T T
Mmrstrco Tcmastrga Lframycsm Qacastaau Smvasalty Actvstrco Cmctnocla Pur8strlp Efprmyctu Lmrastrln Ac22strco Tcr3strau Atrlsacce Consensus
LTAFLLLAGA ASLSM.VTIT ASTPYWVVAV A..VGVANIG AGIISPGMTA L V V G M L L T . A ASLWGMSTLE A D S G M G I T S L W . . F V L L G L G LAPVMVGTTD M V T G L V F V . A V G F L M I L L F R H N L T V A A I I A S..FVVLELG V G V S Q T V S N D L P S G I G T A . A IGMFIMYFFG HPLSYSTMAL A . . L I L V G A G M A . S L A V A S A A T G G M A L S . A LSFYGLAMTD FSTQQWQAWG L . . M A L L G F S A A S A L L A S T S CAAGMLAV.T A S L C G V T F V T Q H A P V W L I L V L . . F A A L G A G V A C V M P T A A V IAAGAAIGAL GFWWQSLLTP DSA..YLGGI L G P A V L I S I G G G L V G T P L A R I V G A L L A A A A G M A L L T R L E P DTPQVYLTHL L P A Q I L I G L G IGCMMMPAMH TIGGGYLLFG A M L Y G S F F M H RGVP.YFPNL V M P I V V G G I G IGMAVVPLTL AVLGMALAGP RHRLFALLPS D N S . . L L T S V FPGMILL.LR M A T G L V A L Q N H G G L W L A L G V L I M L L T I G D QAGGLTSWEL V P G I A V A G L G M G I M I G L L F D A I V G L G S M A G A . . . L L S L S A A G A D T P L A V I V V I A A W L G F G IGLSQTVITL FFSMVAFNVG SIMASVTPVH ET...YFRTQ L G T M I I L S F G MDLSFPASSI ..................................................
501 550 Bcrescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bcrhaein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Emrdescco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cmlapseae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pfplpaspi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cmlescco K L S P N K C C S R P K T Y A F R L G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sgelsacce QVQITSDDPD FQNKFIEVTA FNSFAKSLGF AFGGNMGAMI FTASLKNQMR Toxacocca QAALDPADVT YAS..ASVLL MNSMSGVITL CVCQNL ........ FTNRIN Tcr2bacsu IVSSSLSEEE VASGMSLLNF TSFLSEGTGI A I V G G L L S L Q LINRKLVLEF Tcrbbacsu VVSSSLKEKE AGAGMSLLNF TSFLSEGTGI A I V G G L L S I G FLDHRLLPID Emrbescco ITLSGLPPER LAAASSLSNF TRTLAGSIGT SITTTMWTNR ESMHHAQLTE Emrbhaein ITLSGLPAHK MASASSLFNF L R T L A G S V G T SLTTFMWYNR EAVHHTQLTE Mmrbacsu V I M Q A A G Q R H T N I A G A A L N A N R Q I G A L V G V AITGVII ..... HLSATWYA Mmrstrco A L V D A A G P E N A N V A G S V L N A N R Q I G S L V G I A A M G W L HSTSDWDH Tcmastrga V I V S N A P A E L AGVAGGLQQS A M Q V G G S L G T A V L G V L M A S R V G D V F P D K W A L f r a m y c s m TIVASVPAAK SGAASAVSET A Y E L G A V V G T ATLGTIFTAF YRSNVDV..P Qacastaau LIMLETPTSK AGNAAAVEES M Y D L G N V F G V A V L G S L S S M L YRVFLDI..S Smvasalty A I M A A A P A E K A A A A G A I E T M AYELGAGLGI AIFGLLLSRS FSASIRL..P Actvstrco SIMNAIPREK A G V A S A M N N T V R Q L G G A L G V A V L G S L M G A A Y R R G I E D . . E Cmctnocla T V T S G V G P L D A G A A S G L M N T TRQFGGAFGL A V L L T V T G S G T...SG .... Pur8strlp TATARVAPHE AGAAAAVVNS A Q Q V G G A L G V A L L N T V S T G A T . . . 
A A Y L A D Efprmyctu SAIAGVGFDQ IGPVSAIALM LQSLGGPLVL A V I Q A V I T S R TLYLGGTTGP L m r a s t r l n A A L H A V T E A D AGVASGVQRC ADQLGGASGI AVYVSIGFSP .......... Ac22strco IALADVDKQE A G T A S G V L T A V Q Q L G F T V G V A V L G T L F F G L L G S Q A T A S V D Tcr3strau AIQSSAPKSE L G V A N A A S G L FRQLGGTSGA A V F M S V L F G V A A G R L D G A D P Atrlsacce IFSDNLPMEY Q G M A G S L V N T V V N Y S M S L C L G M G A T V E T Q V N S D G K H L L K G Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Bcrescco Bcrhaein Emrdescco Cmlapseae Pfplpaspi Cmlescco Sgelsacce Toxacocca Tcr2bacsu
551 600 .................................................. .................................................. .................................................. .................................................. .................................................. .................................................. SSQLNIPQFT ..SVETLLAY STEHYDGPQS SLS.KFINTA IHDVFYCALG ALTEVLPGVT KETLQSGFAF LRSTLTPAEF G V A I Q T F N S A IQDAFLVAIV INYSSGVYSN ILVAMAILII LCCLLTIIVF KRSEKQFE ............
Tcrbbacsu Emrbescco Emrbhaein Mmrbacsu Mmrstrco Tcmastrga Lframycsm Qacastaau Smvasalty Actvstrco Cmctnocla Pur8strlp Efprmyctu Lmrastrln Ac22strco Tcr3strau Atrlsacce Consensus
VDHSTYLYSN MLILFAGIIV ICWLVILNVY KRSRRHG ............. SVNPFNPNAQ AMYSQLEGLG MTQQQASGWI AQQITNQGLI ISANEIFWMS HINPYNPISQ SFYHQMNQFG LSDTQTSAYL AQQITSQGFI IGANEIFWLS GAGFAFL ...... MMGAAYS LAALLVWLFLAAHNGTAASE K M P S Q ..... G A A I S F L . . . . . . A V G L A Y L L G G L S A W R L I A R P E R R S A V T A A T ....... EANLPRVGPR EAAAIEDAAE VGAVPPAGTL PGRHAGTLSEVVHSSFISGM A .... G L T P E Q T G A A A E S I G G A A A V A A D L P A A . T A T Q L L D SARAAFDSGI SFSSKGIVGD LAHVAEESVVGAVEVA...K AT.GIKQLAN EAVTSFNDAF A .... G L E A Q E I A R A S S S M G E A V Q L A N S Y P P T Q G Q G K Y L T A A R H A F I W S H ...LAVLPPS ARHQAGESLD ATLLAATRL .... GESGLVG PARQAFLDAM .~ HYGDAFVGIA VFMLAIAVLT PVLPALARST PPGVIHVSPV HGTSPAATVD GTVHGYTVAI AFAVGVLLLT AVLAWVLIDS RTEAADETGS VKFMNDVQLA ALDHAYTYGL LWVAGAAIIV GGMALFIGYT PQQVAHAQEV ....... HLG G D W D P F T V A Y S L A . G I G L I A A V L A V L A L S P D R R L A A P R E Q DGASRARTELAAAGASTTEQ DRLLADLRVC LRESASQQDS ERTPDSCRNL DEAVRRALSD PGSTGGLSAS AVDAFTSGFD TMFLVGGLIL AVGFLLTFPL YRGAQYLGIG LASLACMISG LYMVESFIKG RRQELLQNTI ALWLSGKRY. ..................................................
Bcrescco Bcrhaein Emrdescco Cmlapseae Pfplpaspi Cmlescco Sgelsacce Toxacocca Tcr2bacsu Tcrbbacsu Emrbescco Emrbhaein Mmrbacsu Mmrstrco Tcmastrga Lframycsm Qacastaau Smvasalty Actvstrco Cmctnocla Pur8strlp Efprmyctu Lmrastrln Ac22strco Tcr3strau Atrlsacce Consensus
601 650 .................................................. .................................................. .................................................. .................................................. .................................................. .................................................. .CYALSFFFG IFTSSKKTTI SAKKQQ ........................ LSCASVLGWP FL..SWASVK GQKKMNK ....................... .................................................. .................................................. AGIFLVLLGLVWFAKPAFGA GGGGGGAH ...................... A M G F L G L L I V I W F A K P P F G T QH . . . . . . . . . . . . . . . . . . . . . . . . . . . . .................................................. .................................................. G L A F T V A G A V A L V A A A V A L F T R K A E P D E R A P E E F P V P A S T A G R G ...... APTAVIAAML VLAAAAVVGV AFRR .......................... V A T A L V G G I I M I I I S I V V Y L L I P K S L D I T K QK . . . . . . . . . . . . . . . . . . SVALSSAGSM LLLLAVGMWF SLAKAQRR ...................... HLAAGAAAAV ALVGALAVLR WLPSSVTTPT PPAGAVPGRE HSDHLKVQGS AR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ASVTPARPR ......................................... KEAIDAGEL ......................................... ED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . QQARPAVAEA TARAWRTAHT ENFSTAMVRT LWVVIALLAV SFALAFRLPP RELRDEE ........................................... .................................................. ..................................................
651 Ac22strco KPREEEGF C o n s e n s u s ........
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Tcrbacst, Tcrstahy, Tcrstrpn, Tcrstrag, Tcrstaau (Tcr2bacsu). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in at least one other family of the USA/MFS superfamily.
Database accession numbers
SWISSPROT
Ac22strco Actvstrco Atr1sacce Bcrescco Bcrhaein Cmctnocla Cmlescco Cmlapseae Efprmyctu Emrbhaein Emrbescco Emrdescco Lframycsm Lmrastrln Mmrbacsu Mmrstrco Ppflpaspi Pur8strlp Qacastaau Sgelsacce Smvasalty Tcmastrga Tcrstahy Tcrbacst Tcrstrpn Tcrstaau Tcrstrag Tcr2bacsu Tcr3strau Tcrbbacsu Toxacocca
PIR
P46105 P13090 P28246 P45123 Q04733 P12056 P32482 P44927 P27304 P31442 P46104 Q00538 P11545
S18539 A28124
A25854 A47033 S27558; JC1345
S22742 B29606
P42670 P23215 P33335 P37594 P39886 P36890 P07561 P11063 P02983 P13924 P14512
S27687 S23743 A23973 S09234; C25599 A03510; A04492 C25599 S42238
P23054
S03327
S42086; S40888
EMBL/GENBANK M64683 X58833 M20319 X63703; U00080 U32804 Z13973 M22614 M64556; U12338 L39922 U32771 M86657 L10328 U40487 X59926 X66121 M18263 D37826 X76855 X56628 L11640; U02077 D26057 M80674 X60828 M11036 M63891; X51366 J01764 X15669 M16217 D38215 D26185; X08034 L48797
References
1 Paulsen, I. et al. (1996) Proc. Natl Acad. Sci. USA 93, 3630-3635.
2 Paulsen, I. et al. (1996) Microbiol. Rev. 60, 575-608.
3 Griffith, J. et al. (1992) Curr. Opin. Cell Biol. 4, 684-695.
4 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
5 Lewis, K. (1994) Trends Biochem. Sci. 19, 119-123.
6 Nikaido, H. (1996) J. Bacteriol. 178, 5853-5859.
7 Guay, G. et al. (1993) J. Bacteriol. 175, 4927-4929.
8 Cheng, J. et al. (1994) J. Biol. Chem. 269, 27365-27371.
9 Henderson, P.J.F. (1993) Curr. Opin. Cell Biol. 5, 708-721.
10 Varela, M. et al. (1995) Mol. Membr. Biol. 11, 271-277.
11 Paulsen, I. and Skurray, R. (1993) Gene 124, 1-11.
4-Helix H+/multidrug antiporter family
Summary
Transporters of the 4-helix H+/multidrug antiporter family, the example of which is the EBR multidrug resistance protein of Staphylococcus aureus (Ebrstaau), mediate resistance to one or more structurally dissimilar antibiotics, antiseptics or disinfectants. The mechanism of transport, where known, is antiport (i.e. proton-coupled substrate efflux) 1-4. Curiously, the amino acid sequences of a subgroup of the family, the SUGE proteins, are similar to chaperones, suggesting a possible role in protein transport. Family members occur in both gram-negative and gram-positive bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the 4-helix H+/multidrug antiporter family and any other family of transporters.
Transporters may be encoded chromosomally or by transmissible plasmids. They are predicted to contain four membrane-spanning helices by the hydropathy of their amino acid sequences 2-4 and the activities of reporter gene fusions 4. Several amino acid sequence motifs that are necessary for function, by the criterion of site-directed mutagenesis, are highly conserved in the 4-helix H+/multidrug antiporter family 2,3,5.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                                              ORGANISM [COMMON NAMES]                            SUBSTRATE(S)a [RESISTANCE(S)]b
Ebrescco   Ethidium bromide resistance protein [EBR, E1]                                       Escherichia coli [gram-negative bacterium]         [EB, QUACs]
Ebrstaau   Ethidium bromide resistance protein [Multidrug resistance protein, EBR, QACC, SMR]  Staphylococcus aureus [gram-positive bacterium]    H+/EB, QUACs
Emreescco  Methyl viologen resistance protein [Ethidium resistance protein EMRE, EB, MVRC]     Escherichia coli [gram-negative bacterium]         [Methyl viologen, EBR]
Qaceklepn  Small multidrug export protein [QACE]                                               Klebsiella pneumoniae [gram-negative bacterium]    [QUACs, antiseptics]
Qacfstasp  Quaternary ammonium compound resistance protein [QACF]                              Staphylococcus species [gram-positive bacterium]   [QUACs]
Sugeescco  Possible chaperone protein [SUGE]                                                   Escherichia coli [gram-negative bacterium]         Proteins?
Sugeprovu  Possible chaperone protein [SUGE]                                                   Proteus vulgaris [gram-negative bacterium]         Proteins?
a Cotransported ions are listed for known antiporters.
b Presumed substrates; protein confers resistance to specified compounds.
Abbreviations: QUAC, quaternary ammonium disinfectants; EB, ethidium bromide.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Qacfstasp (Ebrstaau).
[Figure: phylogenetic tree of the family. Leaves in the order shown: Sugeescco, Sugeprovu, Ebrescco, Qaceklepn, Emreescco, Ebrstaau.]
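The 90% identity criterion used to exclude near-duplicate sequences can be made concrete with a short sketch. The function name `percent_identity` and the decision to score only columns in which both rows carry a residue are assumptions for illustration; the source does not state how identity was calculated. The two example rows are residues 51-80 of Ebrstaau and Emreescco as printed in the alignment for this family.

```python
def percent_identity(a: str, b: str, gap: str = ".") -> float:
    """Percent identity between two equal-length rows of an alignment."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    # Assumption: only columns where both rows have a residue are scored.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Residues 51-80 as printed in the alignment (10-residue blocks joined).
ebrstaau = "MPYIYLIIAISTEVIGSAFLKSSEGFSKFI"
emreescco = "NPYIYLGGAILAEVIGTTLMKFSEGFTRLW"

identity = percent_identity(ebrstaau, emreescco)
print(f"{identity:.1f}% identical over residues 51-80")
```

Over this 30-residue window the two rows agree at 16 positions, well below the 90% cut-off, which is consistent with both proteins appearing separately in the tree.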
Proposed orientation of EBR in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded four times through the membrane 4. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in EBR.
[Figure: predicted four-helix topology of EBR between the OUTSIDE and INSIDE faces of the membrane, with the NH2 and COOH termini on the inside and conserved residues marked.]
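The hydropathy-based prediction of membrane-spanning regions that underlies this model can be sketched in a few lines. The source does not say which hydropathy scale, window or cut-off was used, so the Kyte-Doolittle scale, the 19-residue window and the 1.6 threshold below are assumptions chosen because they are common defaults; the function names are illustrative only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]  # KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_spans(seq, window=19, threshold=1.6):
    """Merge overlapping windows at or above the threshold.

    Returns 1-based, inclusive (start, end) residue ranges.
    """
    spans = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], end)  # extend the previous span
        else:
            spans.append((start, end))
    return spans


# Sugeprovu (104 residues) as printed in the alignment for this family.
sugeprovu = (
    "MSWIILFVAGLLEIVWAVGLKYTHGFTRLTPSIITISAMIVSMGMLSYAM"
    "KGLPAGTAYAIWTGIGAVGTAIFGIIVFGESANIYRLLSLAMIVFGIIGL"
    "KLAS"
)
for start, end in candidate_spans(sugeprovu):
    print(f"hydrophobic stretch: residues {start}-{end}")
```

Merged spans are deliberately coarse: adjacent helices separated by a short loop can fuse into one stretch, which is one reason predictions of this kind are usually checked against reporter gene fusions, as the text notes.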
Physical and genetic characteristics
            AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Ebrescco    115          12 332   Plasmid pVS1 and others
Ebrstaau    107          11 673   Plasmid pTZ20 and others
Emreescco   110          11 958   12.22 minutes
Qaceklepn   110          11 472   Plasmid R751
Qacfstasp   107          11 689
Sugeescco   155          16 186   94.27 minutes
Sugeprovu   104          11 014
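The relation between the amino acid count and the molecular weight columns can be illustrated with a small calculation. This is a sketch under stated assumptions: it uses standard average residue masses and adds one water for the free termini, which may not be exactly how the tabulated values were derived, and the function names are hypothetical.

```python
# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain for the free N- and C-termini


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    if not seq:
        raise ValueError("sequence is empty")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mean_residue_mass(mol_wt: float, n_residues: int) -> float:
    """Average mass per residue implied by a tabulated MOL. WT value."""
    return (mol_wt - WATER) / n_residues


# Sugeprovu row of the table: 104 residues, MOL. WT 11 014.
print(round(mean_residue_mass(11014, 104), 1), "Da per residue")
```

A mean residue mass a little below the usual 110 Da rule of thumb is what one would expect for small membrane proteins rich in glycine, alanine and other light residues.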
Multiple amino acid sequence alignments
            1                                                   50
Sugeescco   MPFVFSAIVT KVIVEIPLPP GKISVQPSAL QDDLQTPLFT GDGPNSPEPD
Sugeprovu   .......... .......... .......... .......... ..........
Ebrstaau    .......... .......... .......... .......... ..........
Ebrescco    .......... .......... .......... .......... .........M
Qaceklepn   .......... .......... .......... .......... .........M
Emreescco   .......... .......... .......... .......... .........M
Consensus   .......... .......... .......... .......... ..........
            51                                                 100
Sugeescco   MSWIILVIAG LLEVVWAVGL KYTHGFSRLT PSVITVTAMI VSMALLAWAM
Sugeprovu   MSWIILFVAG LLEIVWAVGL KYTHGFTRLT PSIITISAMI VSMGMLSYAM
Ebrstaau    MPYIYLIIAI STEVIGSAFL KSSEGFSKFI PSLGTIISFG ICFYFLSKTM
Ebrescco    KGWLFLVIAI VGEVIATSAL KSSEGFTKLA PSAVVIIGYG IAFYFLSLVL
Qaceklepn   KGWLFLVIAI VGEVIATSAL KSSEGFTKLA PSAVVIIGYG IAFYFLSLVL
Emreescco   NPYIYLGGAI LAEVIGTTLM KFSEGFTRLW PSVGTIICYC ASFWLLAQTL
Consensus   .....L..A. ..EV.....L K...GF..L. PS...I.... .....L....
            101                                                150
Sugeescco   KSLPVGTAYA VWTGIGAVGA AITGIVLLGE SANPMRLASL ALIVLGIIGL
Sugeprovu   KGLPAGTAYA IWTGIGAVGT AIFGIIVFGE SANIYRLLSL AMIVFGIIGL
Ebrstaau    QHLPLNITYA TWAGLGLVLT TVVSIIIFKE QINLITIVSI VLIIVGVVSL
Ebrescco    KSIPVGVAYA VWSGLGVVII TAIAWLLHGQ KLDAWGFVGM GLIIAAFLLA
Qaceklepn   KSIPVGVAYA VWSGLGVVII TAIAWLLHGQ KLDAWGFVGM GLIVSGVVVL
Emreescco   AYIPTGIAYA IWSGVGIVLI SLLSWGFFGQ RLDLPAIIGM MLICAGVLII
Consensus   ...P.G.AYA .W.G.G.V.T ........G. .......... .LI..G....
            151         164
Sugeescco   KLSTH..... ....
Sugeprovu   KLAS...... ....
Ebrstaau    NIFGTSH... ....
Ebrescco    RSPSWKSLRR PTPW
Qaceklepn   NLLSKASAH. ....
Emreescco   NLLSRSTPH. ....
Consensus   .......... ....
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Qacfstasp (Ebrstaau). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
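The 75% rule for the consensus row can be expressed directly as code. The sketch below is an illustration, not the authors' procedure: the function name `consensus` and the handling of gap characters are assumptions. The input is residues 51-80 of the six aligned sequences as printed above, with the 10-residue blocks joined.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Consensus row: a residue is shown only if it occupies at least
    `threshold` of the rows in that column; otherwise a dot is written."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must be non-empty and of equal length")
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(rows) >= threshold:
            out.append(residue)
        else:
            out.append(gap)
    return "".join(out)


# Residues 51-80 of each aligned sequence.
rows = [
    "MSWIILVIAGLLEVVWAVGLKYTHGFSRLT",  # Sugeescco
    "MSWIILFVAGLLEIVWAVGLKYTHGFTRLT",  # Sugeprovu
    "MPYIYLIIAISTEVIGSAFLKSSEGFSKFI",  # Ebrstaau
    "KGWLFLVIAIVGEVIATSALKSSEGFTKLA",  # Ebrescco
    "KGWLFLVIAIVGEVIATSALKSSEGFTKLA",  # Qaceklepn
    "NPYIYLGGAILAEVIGTTLMKFSEGFTRLW",  # Emreescco
]
print(consensus(rows))  # .....L..A...EV.....LK...GF..L.
```

With six sequences the threshold means a residue must appear in at least five rows, so a column shared by four of six (67%) is left as a dot. The result reproduces the printed consensus for this window.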
Database accession numbers
            SWISSPROT  PIR              EMBL/GENBANK
Ebrstaau    P14319     S06924; JE0410   M37888; X15574
Ebrescco    P14502     S07656; S21846   U12416; X58425
Emreescco   P23895     JN0329; S24063   M62732; Z11877
Qaceklepn              A48905; S32181   X68232
Qacfstasp              S32181           Z37964
Sugeescco   P30743     S36340           U14003; X69949
Sugeprovu   P20928     S00120           X06151
References
1 Lewis, K. (1994) Trends Biochem. Sci. 19, 119-123.
2 Paulsen, I. et al. (1996) Microbiol. Rev. 60, 575-608.
3 Paulsen, I. and Skurray, R. (1996) Mol. Microbiol. 19, 1167-1175.
4 Paulsen, I. and Skurray, R. (1993) Gene 124, 1-11.
5 Paulsen, I. et al. (1995) J. Bacteriol. 177, 2827-2833.
12-Helix H+/multidrug antiporter family
Summary
Transporters of the 12-helix H+/multidrug antiporter family, the example of which is the TETA(C) tetracycline antiporter of Escherichia coli (Tcr2escco), mediate resistance to one or more structurally dissimilar antibiotics 1-3. One member (Arajescco) is involved in either the transport or processing of arabinose polymers. The mechanism of transport, where known, is antiport (i.e. H+-coupled substrate efflux). Some members of the 12-helix H+/multidrug antiporter family also confer one or more collateral effects, including reduced growth, particularly on non-fermentable carbon sources, complementation of potassium uptake defects and increased susceptibilities to heavy metals and cationic antibiotics 5. By analogy to other cation-linked transporters, the basis of these effects could be a proton leak 6,7. Two members of this family, NORA (Norastaau) and BMR1 (Bmr1bacsu), are inhibited by reserpine 8, while TETA(B) is inhibited by 13-(3-chloropropyl) derivatives of 5-hydroxytetracycline 9.
Members of the family have been found in both gram-negative and gram-positive bacteria. Homologous proteins of unknown function are also expressed in human brain 10. Transporters may be encoded chromosomally or by transmissible plasmids, the latter occurring frequently in antibiotic-resistant pathogens.
Statistical analysis of multiple amino acid sequence comparisons places the 12-helix H+/multidrug antiporter family in the uniporter-symporter-antiporter (USA) superfamily, also known as the major facilitator superfamily (MFS) 1,4. They are predicted to form 12 membrane-spanning helices by the hydropathy of their amino acid sequences 1,3,11, the activities of reporter gene fusions 12 and susceptibility to proteolysis 13. Based on complementation and reconstitution experiments, some members of the family may exist as homodimers 14,15. There is considerable similarity between the sequences of the N- and C-terminal halves of these proteins, further implying that they arose through gene duplication of an ancestral six-helix protein 1,16.
Several amino acid sequence motifs are highly conserved in the 12-helix H+/multidrug antiporter family, including motifs unique to the family, signature motifs of the USA/MFS superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis 1,4,17. Mutations affecting susceptibility to inhibitors 8 and substrate specificity 18 have been reported.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                      ORGANISM [COMMON NAMES]                            SUBSTRATE(S)a [RESISTANCE(S)]b
Arajescco  ARAJ precursor protein [ARAJ]               Escherichia coli [gram-negative bacterium]         Arabinose polymers
Bmr1bacsu  Multidrug-H+ antiporter 1 [BMR1]            Bacillus subtilis [gram-positive bacterium]        H+/multiple drugs
Bmr2bacsu  Multidrug-H+ antiporter 2 [BMR2, BLT, BMT]  Bacillus subtilis [gram-positive bacterium]        H+/multiple drugs
Cmlrstrli  Chloramphenicol resistance protein [CMLR]   Streptomyces lividans [gram-positive bacterium]    [Chloramphenicol]
Cmlvstrve  Chloramphenicol resistance protein [CMLV]   Streptomyces venezuelae [gram-positive bacterium]  [Chloramphenicol]
Cmrrhofa   Chloramphenicol resistance protein [CMRR]   Rhodococcus fascians [gram-negative bacterium]     [Chloramphenicol]
Norastaau  Quinolone-H+ antiporter [NORA]              Staphylococcus aureus [gram-positive bacterium]    H+/multiple drugs
Tcr1escco  Tetracycline-H+ antiporter [TCR1, TETA(B)]  Escherichia coli [gram-negative bacterium]         H+/tetracycline-metal chelate
Tcr2escco  Tetracycline-H+ antiporter [TCR2, TETA(C)]  Escherichia coli [gram-negative bacterium]         H+/tetracycline-metal chelate
Tcr3escco  Tetracycline-H+ antiporter [TCR3, TETA(A)]  Escherichia coli [gram-negative bacterium]         H+/tetracycline-metal chelate
Tcr1salor  Tetracycline-H+ antiporter [TCR1, TETA(D)]  Salmonella ordonez [gram-negative bacterium]       H+/tetracycline-metal chelate
Tcr4escco  Tetracycline-H+ antiporter [TCR4, TETA(E)]  Escherichia coli [gram-negative bacterium]         H+/tetracycline-metal chelate
Tetgviban  Tetracycline-H+ antiporter [TETG, TETA(G)]  Vibrio anguillarum [gram-negative bacterium]       H+/tetracycline-metal chelate
Tethpasmu  Tetracycline-H+ antiporter [TETH, TETA(H)]  Pasteurella multocida [gram-negative bacterium]    H+/tetracycline-metal chelate
a Cotransported ions are listed for known antiporters.
b Presumed substrates; protein confers resistance to specified compounds.
Phylogenetic tree 1
[Figure: phylogenetic tree of the family. Leaves in the order shown: Cmlrstrli, Cmlvstrve, Arajescco, Tcr2escco, Tcr3escco, Tetgviban, Tcr1salor, Tcr4escco, Tethpasmu, Bmr1bacsu, Bmr2bacsu, Norastaau.]
Proposed orientation of TETA(C) in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane 12. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in TETA(C).
[Figure: predicted 12-helix topology of TETA(C) between the OUTSIDE and INSIDE faces of the membrane, with the NH2 and COOH termini on the inside and conserved residues (G, L, A, V, P) marked.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT
Arajescco  394          41926
Bmr1bacsu  389          42258
Bmr2bacsu  400          43424
Cmlrstrli  392          38855
Cmlvstrve  436          43811
Cmrrhofa   391          40323
Norastaau  388          42265
Tcr1escco  401          43267
Tcr2escco  396          41510
Tcr3escco  399          42237
Tcr1salor  395          41035
Tcr4escco  405          43412
Tetgviban  393          40881
Tethpasmu  400          43672
CHROMOSOMAL LOCUS: 8.84 minutes; Transposon Tn10; Plasmid pSC101; Transposon Tn1721; Plasmid pRA1
Km: 230 µM; Tetracycline: 36 µM 19
Multiple amino acid sequence alignments
           1                                                   50
Cmlvstrve  .......... .MPFAIYVLG IAVFAQGTSE FMLSGLIPDM AQDLQVSVPT
Cmlrstrli  .......... .MPLPLYLLA VAVCAMGTSE FMLAGLVPDI ASDLGVTVGT
Cmrrhofa   .......... .MPFAIYVLG IAVFAQGTSE FMLSGLIPDM AQDLQVSVPT
Arajescco  .......... .MKKVILSLA LGTFGLGMAE FGIMGVLTEL AHNVGISIPA
Tcr2escco  .....MKSNN ALIVILGTVT LDAVGIGLVM PVLPGLLRDI VHSDSIA.SH
Tcr3escco  .....MKPNI PLIVILSTVA LDAVGIGLIM PVLPGLLRDL VHSNDVT.AH
Tetgviban  .......VRS SAIIALLIVG LDAMGLGLIM PVLPTLLREL VPAEQVA.GH
Tcr1escco  .......MNS STKIALVITL LDAMGIGLIM PVLPTLLREF IASEDIA.NH
Tcr1salor  .......MNK PAVIALVITL LDAMGIGLIM PVLPSLLREY LPEADVA.NH
Tcr4escco  .......MNR TVMMALVIIF LDAMGIGIIM PVLPALLREF VGKANVA.EN
Tethpasmu  .......MNK SIIIILLITV LDAIGIGLIM PVLPTLLNEF VSENSLA.TH
Bmr1bacsu  ....MEKKNI TLTILLTNLF IAFLGIGLVI PVTPTIMNEL HLS...G.TA
Bmr2bacsu  MKKSINEQKT IFIILLSNIF VAFLGIGLII PVMPSFMKIM HLS...G.ST
Norastaau  .......MNK QIFVLYFNIF LIFLGIGLVI PVLPVYLKDL GLT...G.SD
Consensus  .......... .....L.... ....G.G... ..L....... ..........
           51                                                 100
Cmlvstrve  AGLLTSAFAI GMIIGAPLMA IVSMRWQRRR ALLTFLITFM VVHVIGALTD
Cmlrstrli  AGTLTSAFAT GMIVGAPLVA ALARTWPRRS SLLGFILAFA AAHAVGAGTT
Cmrrhofa   AGLLTSAFAI GMIIGAPLMA IVSMRWQRRR ALLTFLITFM VVHVIGALTD
Arajescco  AGHMISYYAL GVVVGAPIIA LFSSRYSLKH ILLFLVALCV IGNAMFTLSS
Tcr2escco  YGVLLALYAL MQFLCAPVLG ALSDRFGRRP VLLASLLGAT IDYAIMATTP
Tcr3escco  YGILLALYAL VQFACAPVLG ALSDRFGRRP ILLVSLAGAT VDYAIMATAP
Tetgviban  YGALLSLYAL MQVVFAPMLG QLSDSYGRRP VLLASLAGAA VDYTIMASAP
Tcr1escco  FGVLLALYAL MQVIFAPWLG KMSDRFGRRP VLLLSLIGAS LDYLLLAFSS
Tcr1salor  YGILLALYAV MQVCFAPLLG RWSDKLGRRP VLLLSLAGAA FDYTLLALSN
Tcr4escco  YGVLLALYAM MQVIFAPLLG RWSDRIGRRP VLLLSLLGAT LDYALMATAS
Tethpasmu  YGVLLALYAT MQVIFAPILG RLSDKYGRKP ILLFSLLGAA LDYLLMAFST
Bmr1bacsu  VGYMVACFAI TQLIVSPIAG RWVDRFGRKI MIVIGLLFFS VSEFLFGIGK
Bmr2bacsu  MGYLVAAFAI SQLITSPFAG RWVDRFGRKK MIILGLLIFS LSELIFGLGT
Norastaau  LGLLVAAFAL SQMIISPFGG TLADKLGKKL IICIGLILFS VSEFMFAVGH
Consensus  .G.L....A. .....AP... .......R.. .LL..L.... ......A...
           101                                                150
Cmlvstrve  SFGVLLVTRI VGALANAGFL AVALGAAMSM VPADMKGRAT SVLLGGVTIA
Cmlrstrli  SFPVLVACRV VAALANAGFL AVALTTAAAL VPADKQGRAL AVLLSGTTVA
Cmrrhofa   SFGVLLVTRI VGALANAGFL AVALGAAMSM VPADMKGRAT SVLLGGVTIA
Arajescco  SYLMLAIGRL VSGFPHGAFF GVGAIVLSKI IKPGKVTAAV AGMVSGMTVA
Tcr2escco  VLWILYAGRI VAGITGA.TG AVAGAYIADI TDGEDRARHF GLMSACFGVG
Tcr3escco  FLWVLYIGRI VAGITGA.TG AVAGAYIADI TDGDERARHF GFMSACFGFG
Tetgviban  VLWVLYIGRL VSGVTGA.TG AVAASTIADS TGEGSRARWF GYMGACYGAG
Tcr1escco  ALWMLYLGRL LSGITGA.TG AVAASVIADT TSASQRVKWF GWLGASFGLG
Tcr1salor  VLWMLYLGRI ISGITGA.TG AVAASVVADS TAVSERTAWF GRLGAAFGAG
Tcr4escco  VVWVLYLGRL IAGITGA.TG AVAASTIADV TPEESRTHWF GMMGACFGGG
Tethpasmu  TLWMLYIGRI IAGITGA.TG AVCASAMSDV TPAKNRTRYF GFLGGVFGVG
Bmr1bacsu  TVEMLFITRM LGGISAPFIM PGVTAFIADI TTIKTRPKAL GYMSAAISTG
Bmr2bacsu  HVSIFYFSRI LGGVSAAFIM PAVTAYVADI TTLKERSKAM GYVSAAISTG
Norastaau  NFSVLMLSRV IGGMSAGMVM PGVTGLIADI SPSHQKAKNF GYMSAIINSG
Consensus  ....L...R. ......A... .V.....A.. .......... ..........
           151                                                200
Cmlvstrve  CVVGVPGGAL LGELWGWRAS FWEVVLISAP AVAAIMASTP ADSPTDSVPN
Cmlrstrli  TVAGVPGGSL LGTWLGWRAT FWAVAVCCLP AAFGVLKAIP AGRATAAATG
Cmrrhofa   CVVGVPGGAL LGELWGWRAS FWEVVLISAP AVAAIMASTP ADSPTDSVPN
Arajescco  NLLGIPLGTY LSQEFSWRYT FLLIAVFNIA VMASVYFWVP DIRDEAK...
Tcr2escco  MVAGPVAGGL LGAISLHAPF LAAAVLNGLN LLLGCFLMQE SHK.GERRPM
Tcr3escco  MVAGPVLGGL MGGFSPHAPF FAAAALNGLN FLTGCFLLPE SHK.GERRPL
Tetgviban  MIAGPALGGM LGGISAHAPF IAAALLNGFA FLLACIFLKE THH.SHGGTG
Tcr1escco  LIAGPIIGGF AGEISPHSPF FIAALLNIVT FLVVMFWFRE TKN.TRDNTD
Tcr1salor  LIAGPAIGGL AGDISPHLPF VIAAILNACT FLMVFFIFKP AVQ.TEEKPA
Tcr4escco  MIAGPVIGGF AGQLSVQAPF MFAAAINGLA FLVSLFILHE THN.ANQVSD
Tethpasmu  LIIGPMLGGL LGDISAHMPF IFAAISHSIL LILSLLFFRE TQK.REALVA
Bmr1bacsu  FIIGPGIGGF LAEVHSRLPF FFAAAFALLA AILSILTLRE PERNPENQEI
Bmr2bacsu  FIIGPGAGGF IAGFGIRMPF FFASAIALIA AVTSVFILKE SLSIEERHQL
Norastaau  FILGPGIGGF MAEVSHRMPF YFAGALGILA FIMSIVLIHD PKKSTTS...
Consensus  ...G...G.. .......... ..A....... .......... ..........
           201                                                250
Cmlvstrve  ATR...ELSS LRQRKLQLIL VLGALINGAT FCSFTYLAPT L.....TDVA
Cmlrstrli  GPPLRVELAA LKTPRLLLAM LLGALVNAAT FASFTFLAPV V.....TDTA
Cmrrhofa   ATR...ELSS LRQRKLQLIL VLGALINGAT FCSFTYLAPT L.....TDVA
Arajescco  .GNLREQFHF LRSPAPWLIF AATMFGNAGV FAWFSYVKPY M.....MFIS
Tcr2escco  PLRAFNPVSS FRWARGM.TI VAALMTVFFI MQLVGQVPAA LWVIFGEDRF
Tcr3escco  RREALNPLSF VRWARGM.TV VAALMAVFFI MQLVGQVPAA LWVIFGEDRF
Tetgviban  KPVRIKPFVL LRLDDAL.RG LGALFAVFFI IQLIGQVPAA LWVIYGEDRF
Tcr1escco  T.EVGVETQS NSVYITLFKT MPILLIIYFS AQLIGQIPAT VWVLFTENRF
Tcr1salor  E.Q..KQESA GISFITLLKP LALLLFVFFT AQLIGQIPAT VWVLFTESRF
Tcr4escco  ELKNETINET TSSIREMISP LSGLLVVFFI IQLIGQIPAT LWVLFGEERF
Tethpasmu  NRTPENQTAS NTVTVFFKKS LYFWLATYFI IQLIGQIPAT IWVLFTQYRF
Bmr1bacsu  KGQK...... TGFKRIFAPM YFIAFLIILI SSFGLASFES LFALFVDHKF
Bmr2bacsu  SSHTKESNFI KDLKRSIHPV YFIAFIIVFV MAFGLSAYET VFSLFSDHKF
Norastaau  GFQKLEPQLL TKINW...KV FITPVILTLV LSFGLSAFET LYSLYTADKV
Consensus  .......... .......... .......... .......... ..........
           251                                                300
Cmlvstrve  GFDSRWIPLL LGLFG.LGSF IGVSVGGRLA DT.RPFQLLV AGSAALLVGW
Cmlrstrli  GLGDLWISVA LVLFG.AGSF AGVTVAGRLS DR.RPAQVLA VAGPLLLVGW
Cmrrhofa   GFDSRWIPLL LGLFG.LGSF IGVSVGGRLA DT.RPFQLLV AGSAALLVGW
Arajescco  GFSETAMTFI MMLVG.LGMV LGNMLSGRIS GRYSPLRIAA VTDFIIVLAL
Tcr2escco  RWSATMIGLS LAVFGILHAL AQAFVTGPAT KRFGEKQAII AGMAADALGY
Tcr3escco  HWDATTIGIS LAAFGILHSL AQAMITGPVA ARLGERRALM LGMIADGTGY
Tetgviban  QWNTATVGLS LAAFGATHAI FQAFVTGPLS SRLGERRTLL FGMAADGTGF
Tcr1escco  GWNSMMVGFS LAGLGLLHSV FQAFVAGRIA TKWGEKTAVL LGFIADSSAF
Tcr1salor  AWDSAAVGFS LAGLGAMHAL FQAVVAGALA KRLSEKTIIF AGFIADATAF
Tcr4escco  AWDGVMVGVS LAVFGLTHAL FQGLAAGFIA KHLGERKAIA VGILADGCGL
Tethpasmu  DWNTTSIGMS LAVLGVLHIF FQAIVAGKLA QKWGEKTTIM ISMSIDMMGC
Bmr1bacsu  GFTASDIAIM ITGGAIVGAI TQVVLFDRFT RWFGEIHLIR YSLILSTSLV
Bmr2bacsu  GFTPKDIAAI ITISSIVAVV IQVLLFGKLV NKLGEKRMIQ LCLITGAILA
Norastaau  NYSPKDISIA ITGGGIFGAL FQIYFFDKFM KYFSELTFIA WSLLYSVVVL
Consensus  .......... ....G..... ......G... .......... ..........
           301                                                350
Cmlvstrve  IVFAITASHP VVTLVMLFVQ GTLSFAVGST LISRVLYVAD GAPTLGGSFA
Cmlrstrli  PALAMLADRP VALLTLVFVQ GALSFALGST LITRVLYEAA GAPTMAGSYA
Cmrrhofa   IVFAITASHP VVTLVMLFVQ GTLSFAVGST LISRVLYVAD GAPTLGGSFA
Arajescco  LMLFFCGGMK TTSLIFAFIC CAGLFALSAP LQILLLQNAK GGELLGAAGG
Tcr2escco  VLLAFATRGW MAFPI.MILL ASGGIGMPAL QAMLSRQVDD DHQGQLQGSL
Tcr3escco  ILLAFATRGW MAFPI.MVLL ASGGIGMPAL QAMLSRQVDE ERQGQLQGSL
Tetgviban  VLLAFATQGW MVFPI.LLLL AAGGVGMPAL QAMLSNNVSS NKQGALQGTL
Tcr1escco  AFLAFISEGW LVFPV.LILL AGGGIALPAL QGVMSIQTKS HQQGALQGLL
Tcr1salor  LLMSAITSGW MVYPV.LILL AGGGIALPAL QGIISAGASA ANQGKLQGVL
Tcr4escco  FLLAVITQSW MVWPV.LLLL ACGGITLPAL QGIISVRVGQ VAQGQLQGVL
Tethpasmu  LLLAWIGHVW VILPA.LICL AAGGMGQPAL QGYLSKSVDD NAQGKLQGTL
Bmr1bacsu  FLLTTVHSYV AILLVTVTVF VGFDLMRPAV TTYLS.KIAG NEQGFAGGMN
Bmr2bacsu  FVSTVMSGFL TVLLVTCFIF LAFDLLRPAL TAHLS.NMAG NQQGFVAGMN
Norastaau  ILLVFANGYW SIMLISFVVF IGFDMIRPAI TNYFS.NIAG ERQGFAGGLN
Consensus  .......... .......... ........A. .......... ..........
           351                                                400
Cmlvstrve  TAAFNVGAAL GPALGGVAIG IGMGYRAPLW TSAALVALAI VIGAATWTRW
Cmlrstrli  TAALNVGAAA GPLVAATTLG HTTGNLGPLW ASGLLVAVAL LVAFPFRTVI
Cmrrhofa   TAAFNVGAAL GPALGGVAIG IGMGYRAPLW TSAALVALAI VIGAATWTRW
Arajescco  QIAFNLGSAV GAYCGGMMLT LGLAYNYVAL .PAALLSFAA MSSLLLYGRY
Tcr2escco  AALTSLTSIT GPLIVTAIYA ASASTWNGLA WIVGAALYLV CLPALRRGAW
Tcr3escco  AALTSLTSIV GPLLFTAIYA ASITTWNGWA WIAGAALYLL CLPALRRGLW
Tetgviban  TSLTNLSSIA GPLGFTALYS ATAGAWNGWV WIVGAILYLI CLPILRRPFA
Tcr1escco  VSLTNATGVI GPLLFAVIYN HSLPIWDGWI WIIGLAFYCI IILLSMTFML
Tcr1salor  VSLTNLTGVA GPLLFAFIFS QTQQSADGTV WLIGTALYGL LLAICLLIRK
Tcr4escco  TSLTHLTAVI GPLVFAFLYS ATRETWNGWV WIIGCGLYVV ALIILRFFHP
Tethpasmu  VSLTNITGII GPLLFAFIYS YSVAYWDGLL WLMGAILYAM LLITAYFHQR
Bmr1bacsu  SMFTSIGNVF GPIIGGMLFD IDVNY..PFY FATVTLAIGI ALTIAWKAPA
Bmr2bacsu  STYTSLGNIF GPALGGILFD LNIHY..PFL FAGFVMIVGL GLTMVWKEKK
Norastaau  STFTSMGNFI GPLIAGALFD VHIEA..PIY MAIGVSLAGV VIVLIEKQHR
Consensus  .......... GP........ .......... .......... ..........

           401          416
Cmlvstrve  REPRPALDTV PP....
Cmlrstrli  TTAAPADATR ......
Cmrrhofa   REPRPALDTV PP....
Arajescco  KRQQAADTPV LAKPLG
Tcr2escco  SRATST.... ......
Tcr3escco  SGAGQRADR. ......
Tetgviban  TSLVIPSQ.. ......
Tcr1escco  TPQAQGSKQE TSA...
Tcr1salor  PAPVAATC.. ......
Tcr4escco  GRVIHPINKS DVQQRI
Tethpasmu  KTTPKAVIST P.....
Bmr1bacsu  HLKAST.... ......
Bmr2bacsu  NDAAALN... ......
Norastaau  AKLKEQNM.. ......
Consensus  .......... ......
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in at least one other family of the USA/MFS superfamily.
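The 75% rule stated above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure the authors used: the function name `consensus`, the use of `.` for gaps and non-conserved columns, and the choice to count a gap as a non-match are all assumptions. The rows are alignment positions 21-30 of the fourteen sequences listed above.

```python
from collections import Counter

def consensus(rows, threshold=0.75, gap="."):
    """Return a consensus string for equal-length aligned rows.

    A column reports a residue only if that residue occurs in at least
    `threshold` of ALL rows (gaps count against conservation, an assumption).
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
        else:
            residue, n = gap, 0
        out.append(residue if n / len(rows) >= threshold else gap)
    return "".join(out)

# Positions 21-30 of the alignment, one entry per transporter.
ROWS = [
    "IAVFAQGTSE", "VAVCAMGTSE", "IAVFAQGTSE", "LGTFGLGMAE",
    "LDAVGIGLVM", "LDAVGIGLIM", "LDAMGLGLIM", "LDAMGIGLIM",
    "LDAMGIGLIM", "LDAMGIGIIM", "LDAIGIGLIM", "IAFLGIGLVI",
    "VAFLGIGLII", "LIFLGIGLVI",
]

print(consensus(ROWS))  # prints "....G.G..."
```

With fourteen sequences, a residue must appear in at least eleven rows to clear 75%, which is why only the two glycine columns are reported here.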
Database accession numbers
Arajescco Bmr1bacsu Bmr2bacsu Cmlrstrli Cmlvstrve Cmrrhofa Norastaau Tcr1escco Tcr1salor Tcr2escco Tcr3escco Tcr4escco Tetgviban Tethpasmu
SWISSPROT: P23910 P33449 P39843 P31141 P21191 P02980 P33733 P02981 P02982 Q07282
PIR: S27549; B43750 A39705 S18593 S25183 A37838 A03507 S30286 A03508 A03509 A36896
EMBL/GENBANK: M64787 L25604; M33628 L32599 X59968 U09991 Z12001 D90119; M62960 V00611; J01830 X65876 V01119 X00006 L06940 S52437 U00792
References
1 Griffith, J. et al. (1992) Curr. Opin. Cell Biol. 4, 684-695.
2 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
3 Lewis, K. (1994) Trends Biochem. Sci. 19, 119-123.
4 Paulsen, I. et al. (1996) Microbiol. Rev. 60, 575-608.
5 Griffith, J. et al. (1994) Mol. Membr. Biol. 11, 271-277.
6 Wright, E.M. et al. (1996) Curr. Opin. Cell Biol. 8, 468-473.
7 Fischbarg, J. and Vera, J.C. (1995) Am. J. Physiol. 268, C1077-C1089.
8 Ahmed, M. et al. (1993) J. Biol. Chem. 268, 11086-11089.
9 Nelson, M.L. et al. (1994) J. Med. Chem. 37, 1355-1361.
10 Duyao, M. et al. (1993) Hum. Mol. Genet. 2, 673-676.
11 Henderson, P.J.F. (1993) Curr. Opin. Cell Biol. 5, 708-721.
12 Allard, J. and Bertrand, K. (1992) J. Biol. Chem. 267, 17809-17819.
13 Eckert, B. and Beck, C. (1989) J. Biol. Chem. 264, 11663-11670.
14 Yamaguchi, A. et al. (1993) FEBS Lett. 324, 131-135.
15 Rubin, R.A. and Levy, S.B. (1991) J. Bacteriol. 173, 4503-4509.
16 Rubin, R.A. et al. (1990) Gene 87, 7-13.
17 Varela, M. and Griffith, J. (1994) Mol. Membr. Biol. 11, 271-277.
18 Guay, G. et al. (1994) Antimicrob. Agents Chemother. 38, 857-860.
19 Yamaguchi, A. et al. (1993) J. Biol. Chem. 268, 26990-26995.
Acriflavin-cation resistance family Summary
Transporters of the acriflavin-cation resistance family, the example of which is the ACRB acriflavin resistance protein of Escherichia coli (Acrbescco), mediate resistance to one or more structurally dissimilar antibiotics, including β-lactams, fluoroquinolones and erythromycin, and to ions, including nickel, cadmium and cobalt. The mechanism of transport is proton motive force dependent 1. An interesting feature of the acriflavin-cation resistance family is that a paired "linker" protein (e.g. ACRA) is believed to bring the transporter protein into apposition with a corresponding outer-membrane channel (e.g. TOLC), allowing direct efflux through the inner and outer membranes 1-3. Known members of the family occur only in gram-negative bacteria. Statistical analysis reveals no significant relationship between the amino acid sequences of the acriflavin-cation resistance family and any other family of transporters. However, other membrane proteins, including members of the 14-helix H+/multidrug antiporter family, are similarly coupled to outer-membrane channels 1,2. Members of the acriflavin-cation resistance family are predicted to form 12 membrane-spanning helices by the hydropathy of their amino acid sequences. Several amino acid sequence motifs are highly conserved in the acriflavin-cation resistance family 3. There is considerable similarity between the sequences of the N- and C-terminal halves of these proteins, implying that they arose through gene duplication of an ancestral six-helix protein 3,4.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                      ORGANISM [COMMON NAMES]                          SUBSTRATE(S) [RESISTANCE] a
Acrbescco  Acriflavin resistance protein [ACRB, ACRE]  Escherichia coli [gram-negative bacterium]       [Acriflavin, multiple drugs]
Acrdescco  Acriflavin resistance protein [ACRD]        Escherichia coli [gram-negative bacterium]       [Acriflavin, multiple drugs]
Acrfescco  Acriflavin resistance protein [ACRF, ENVD]  Escherichia coli [gram-negative bacterium]       [Acriflavin]
Cnraalceu  Nickel-cobalt resistance protein [CNRA]     Alcaligenes eutrophus [gram-negative bacterium]  Nickel, cobalt
Czcaalceu  Cation efflux protein [CZCA]                Alcaligenes eutrophus [gram-negative bacterium]  Cobalt, zinc, cadmium
a Presumed substrates; protein confers resistance to specified compounds.
Phylogenetic tree
Acrbescco Acrfescco Acrdescco Cnraalceu Czcaalceu
Proposed orientation of ACRB in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and the protein is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in ACRB.
[Figure: membrane topology model of ACRB, drawn with OUTSIDE at the top and INSIDE at the bottom; the COOH terminus is on the inside.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Acrbescco  1049         113573   83.49 minutes
Acrdescco  1038         113070   55.58 minutes
Acrfescco  1034         111454   73.51 minutes
Cnraalceu  1075         115583
Czcaalceu  1063         115644
Multiple amino acid sequence alignments
           1                                                   50
Acrbescco  ....MPNFFI DRPIFAWVIA IIIMLAGGLA ILKLPVAQYP TIAPPAVTIS
Acrfescco  ....MANFFI RRPIFAWVLA IILMMAGALA ILQLPVAQYP TIAPPAVSVS
Acrdescco  ....MANFFI DRPIFAWVLA ILLCLTGTLA IFSLPVEQYP DLAPPNVRVT
Cnraalceu  MIESILSGSV RYRWLVLFLT AVVAVIGAWQ LNLLPIDVTP DITNKQVQIN
Czcaalceu  MFERIISFAI QQRWLVLLAV FGMAGLGIFS YNRLPIDAVP DITNVQVQVN
Consensus  ........FI .......... ......G... ...LP....P .I....V...

           51                                                 100
Acrbescco  ASYPGADAKT VQDTVTQVIE QNMNGIDNLM YMSSNSDSTG TVQITLTFES
Acrfescco  ANYPGADAQT VQDTVTQVIE QNMNGIDNLM YMSSTSDSAG SVTITLTFQS
Acrdescco  ANYPGASAQT LENTVTQVIE QNMTGLDNLM YMSSQSSGTG QASVTLSFKA
Cnraalceu  SVVPTMSPVE VEKRVTYPIE TAIAGLNGVE STRSMSR.NG FSQVTVIFKE
Czcaalceu  TSAPGYSPLE TEQRATYPIE VVMAGLPGLE QTRSLSR.YG LSQVTVIFKD
Consensus  ...PG..... ....VT..IE ..M.G...L. ...S.S...G ....T..F..
           101                                                150
Acrbescco  GTDADIAQVQ VQNKLQLAMP LLPQEVQQQG VSVEKSSSSF LM........
Acrfescco  GTDPDIAQVQ VQNKLQLATP LLPQEVQQQG ISVEKSSSSY LM........
Acrdescco  GTDPDEAVQQ VQNQLQSAMR KLPQAVQNQG VTVRKTGDTN IL........
Cnraalceu  SANLYFMRHE VSERLAQARP NLPENVEPQM GPVSTGLGEV FHYSVEYQYP
Czcaalceu  GTDVYFARQL VNQRIQEAKD NLPEGVVPAM GPISTGLGEI YLWTVE....
Consensus  GTD...A... V...LQ.A.. .LP..V..Q. ..V....... ..........

           151                                                200
Acrbescco  ..VVGVINTD GTMTQEDISD YVAAN..... .......... .....MKDAI
Acrfescco  ..VAGFVSDN PGTTQDDISD YVASN..... .......... .....VKDTL
Acrdescco  ..TIAFVSTD GSMDKQDIAD YVASN..... .......... .....IQDPL
Cnraalceu  DGTGASIKDG EPGWQSDGSF LTERGERLDD RVSRLAYLRT VQDWIIRPQL
Czcaalceu  ........AE EGARKADGTA YTPTD..... .......LRE IQDWVVRPQL
Consensus  .......... ......D... Y......... .......... .........L
           201                                                250
Acrbescco  SRTSGVGDVQ LFGSQYAMRI WMNP..NELN KFQLTPVDVI TAIKAQNAQV
Acrfescco  SRLNGVGDVQ LFGAQYAMRI WLDA..DLLN KYKLTPVDVI NQLKVQNDQI
Acrdescco  SRVNGVGDID AYGSQYSMRI WLDP..AKLN SFQMTAKDVT DAIESQNAQI
Cnraalceu  RTTPGVADVD SLGG.YVKQF VVEPDTGKMA AYGVSYADLA RALEDTNLSV
Czcaalceu  RNVPGVTEIN TIGG.FNKQY LVAPSLERLA SYGLTLTDVV NALNKNNDNV
Consensus  ....GV.D.. ..G..Y.... ...P....L. ....T..DV. ......N...
           251                                                300
Acrbescco  AAGQLGGTPP VKGQQLNASI IAQTRLTSTE EFGKILLKVN QDGSRVLLRD
Acrfescco  AAGQLGGTPA LPGQQLNASI IAQTRFKNPE EFGKVTLRVN SDGSVVRLKD
Acrdescco  AVGQLGGTPS VDKQALNATI NAQSLLQTPE QFRDITLRVN QDGSEVRLGD
Cnraalceu  GAN......F IRRSGESYLV RADARIKSAD EISRAVI.AH GK.MSHHVGQ
Czcaalceu  GAG......Y IERRGEQYLV RAPGQVASED DIRNIIV.GT AQGQPIRIRD
Consensus  .......... .......... .A........ .......... ..G......D

           301                                                350
Acrbescco  VAKIELGGEN YDIIAEFNGQ PASGLGIKLA TGANALDTAA AIRAELAKME
Acrfescco  VARVELGGEN YNVIARINGK PAAGLGIKLA TGANALDTAK AIKAKLAELQ
Acrdescco  VATVEMGAEK YDYLSRFNGK PASGLGVKLA SGANEMATGE LVLNRLDELA
Cnraalceu  VARVKIGGEL RSGAASRNGN ETVVGSALML VGANSRTVAQ AVGDKLEQIS
Czcaalceu  IGDVEIGKEL RTGAATENGK EVVLGTVFML IGENSRAVSK AVDEKVASIN
Consensus  VA.VE.G.E. ....A..NG. .......... .GAN...... A....L....
           351                                                400
Acrbescco  PFFPSGLKIV YPYDTTPFVK ISIHEVVKTL VEAIILVFLV MYLFLQNFRA
Acrfescco  PFFPQGMKVL YPYDTTPFVQ LSIHEVVKTL FEAIMLVFLV MYLFLQNMRA
Acrdescco  QYFPHGLEYK VAYETTSFVK ASIEDVVKTL LEAIALVFLV MYLFLQNFRA
Cnraalceu  KTLPPGVVIV PTLNRSQLVI ATIETVAKNL IEGALLVVAI LFALLGNWRA
Czcaalceu  RTMPEGVKIV TVYDRTRLVD KAIATVKKNL LEGAVLVIVI LFLFLGNIRA
Consensus  ...P.G.... ..Y..T..V. ..I..V.K.L .E...LV... ..LFL.N.RA

           401                                                450
Acrbescco  TLIPTIAVPV VLLGTFAVLA AFGFSINTLT MFGMVLAIGL LVDDAIVVVE
Acrfescco  TLIPTIAVPV VLLGTFAILA AFGYSINTLT MFGMVLAIGL LVDDAIVVVE
Acrdescco  TLIPTIAEPV VLMGTFSVLY AFGYSVNTLT MFAMVLAIGL LVDDAIVVVE
Cnraalceu  ATIAALVIPL SLLVSAIGMN QFHISGNLMS LGA..LDFGL IIDGAVIIVE
Czcaalceu  ALITATIIPL AMLFTFTGMV NYKISANLMS LGA..LDFGI IIDGAVVIVE
Consensus  .LI.....P. .LL.TF.... ....S.N... .....LA.GL ..D.A...VE

           451                                                500
Acrbescco  NVERVMAEEG ......LPPK EATRKSMGQI QGA...LVGI AMVLSAVFVP
Acrfescco  NVERVMMEDK ......LPPK EATEKSMSQI QGA...LVGI AMVLSAVFIP
Acrdescco  NVERIMSEEG ......LTPR EATRKSMGQI QGA...LVGI AMVLSAVFVP
Cnraalceu  NSLRRLAERQ HREGRLLTLD DRLQEVVQSS REMVRPTVYG QLVIFMVFLP
Czcaalceu  NCVRRLAHAQ EHHGRPLTRS ERFHEVFAAA KEARRPLIFG QLIIMIVYLP
Consensus  N..R...E.. ......L... E......... ..A...LV.. ..V...VF.P

           501                                                550
Acrbescco  MAFFGGSTGA IYRQFSITIV SAMALSVLVA LILTPALCAT MLKPIAKGDH
Acrfescco  MAFFGGSTGA IYRQFSITIV SAMALSVLVA LILTPALCAT LLKPVS.AEH
Acrdescco  MAFFGPTTGA IYRQFSITIV AAMVLSVLVA MILTPALCAT LLKPLKKGEH
Cnraalceu  SLTFQGVEGK MFSPMVITLM LALASAFVLS LTFVPAMVAV MLRKKVAETE
Czcaalceu  IFALTGVEGK MFHPMAFTVV LALLGAMILS VTFVPAAVAL FIGERVAEKE
Consensus  ...F.G..G. ......IT.V .A........ ....PA..A. .L........

           551                                                600
Acrbescco  GEGKKGFFGW FNRMFEKSTH HYTDSVGGIL RSTGRYLVLY LIIVVGMAYL
Acrfescco  HENKGGFFGW FNTTFDHSVN HYTNSVGKIL GSTGRYLLIY ALIVAGMVVL
Acrdescco  HGQK.GFFAW FNQMFNRNAE RYEKGVAKIL HRSLRWIVIY VLLLGGMVFL
Cnraalceu  .......... .VRVIVATKE SYRPWLEHAV ARPMPFIGAG IATVAVATVA
Czcaalceu  .......... .NRLMLWAKR RYEPLLEKSL ANTAVVLTFA AVSIVLCVAI
Consensus  .......... .......... .Y.......L .......... ..........
           601                                                650
Acrbescco  FVRLPSSFLP .DEDQGVFMT MVQLPAGATQ ERTQKVLNEV THYYLTKEKN
Acrfescco  FLRLPSSFLP .EEDQGVFLT MIQLPAGATQ ERTQKVLDQV TDYYLKNEKA
Acrdescco  FLRLPTSFLP .LEDRGMFTT SVQLPSGSTQ QQTLKVVEQI EKYYFTHEKD
Cnraalceu  FTFVGREFMP TLDELNLNLS SVRIPSTSID QSVA..IDLP LERAV.LSLP
Czcaalceu  AARLGSEFIP NLNEGDIAIQ ALRIPGTSLS QSVE..MQKT IETTLKAKFP
Consensus  F.RL...F.P .......... ....P..... .......... ..........

           651                                                700
Acrbescco  NVESVFAVNG ...FGFAGRG QNTGIAFVSL KDWADRPGEE NKVEAITMRA
Acrfescco  NVESVFTVNG ...FSFSGQA QNAGMAFVSL KPWEERNGDE NSAEAVIHRA
Acrdescco  NIMSVFATVG ...SGPGGNG QNVARMFIRL KDWSERDSKT GTSFAIIERA
Cnraalceu  EVQTVYSKAG TASLAADPMP PNASDNYIIL KPKSEWPEGV TTKEQVIERI
Czcaalceu  EIERVFARTG TAEIASDLMP PNISDGYIML KPEKDWPEPK KTHAELLSAI
Consensus  ....VF...G .......... .N.......L K......... ........R.

           701                                                750
Acrbescco  TRAFSQIKDA MVFAFN.LPA IVELGTATGF DFELIDQAGL GHEKLTQARN
Acrfescco  KMELGKIRDG FVIPFN.MPA IVELGTATGF DFELIDQAGL GHDALTQARN
Acrdescco  TKAFNQIKEA RVIAPDSPPA ISGLGSSAGF DMELQDHAGA GHDALMAARN
Cnraalceu  REKTAPMVGN N.YDVTQPIE MRFNELIGGV RSDVAVKVYG ENLDELAATA
Czcaalceu  QEEAGKIPGN N.YEFSQPIQ LRFNELISGV RSDVAVKIFG DDNNVLSETA
Consensus  ......I... .......... ........G. .......... .......A..

           751                                                800
Acrbescco  QLLAEAAKHP DMLTSVRPNG LEDTPQFKID IDQEKAQALG VSINDINTTL
Acrfescco  QLLGMAAQHP ASLVSVRPNG LEDTAQFKLE VDQEKAQALG VSLSDINQTI
Acrdescco  QLLALAAENP E.LTRVRHNG LDDSPQLQID IDQRKAQALG VAIDDINDTL
Cnraalceu  QRIAAVLKKT PGATDVRVPL TSGFPTFDIV FDRAAIARYG LTVKEVADTI
Czcaalceu  KKVSAVLQGI PGAQEVKVEQ TTGLPMLTVK IDREKAARYG LNMSDVQDAV
Consensus  Q......... .....VR... ....P..... .D..KA...G ....D...T.

           801                                                850
Acrbescco  GAAWGGSYVN DFIDRGRVKK VYVMSEAKYR MLPDDIGDWY VR.....AAD
Acrfescco  STALGGTYVN DFIDRGRVKK LYVQADAKFR MLPEDVDKLY VR.....SAN
Acrdescco  QTAWGSSYVN DFMDAGRVKK VYVQAAGPYP MLPDDINLWY VR.....NKD
Cnraalceu  STAMAGRPAG QIFDGDRRFD IVIRLPGEQR ENLDVLGALP VMLPLSEGQA
Czcaalceu  ATGVGGRDSG TFFQGDRRFD IVVRLPEAVR GEVEALRRLP IPLPKGVDAR
Consensus  .TA.GG.... .F.D..R... ..V......R .......... V.........

           851                                                900
Acrbescco  GQMVPFSAFS SSRWEYGSPR LERYNGLPSM EILGQAAPGK STGEAMELME
Acrfescco  GEMVPFSAFT TSHWVYGSPR LERYNGLPSM EIQGEAAPGT SSGDAMALME
Acrdescco  GGMVPFSAFA TSRWETGSPR LERYNGYSAV EIVGEAAPGV STGTAMDIME
Cnraalceu  RASVPLRQLV QFRFTQGLNE VSRDNGKRRV YVEANVGGRD LGSFVDDAAA
Czcaalceu  TTFIPLSEVA TLEMAPGPNQ ISRENGKRRI VISANVRGRD IGSFVPEAEA
Consensus  ...VP.S... ......G... ..R.NG.... .I........ ..........

           901                                                950
Acrbescco  QLAS..KLPT GVGYDWTGMS YQERLSGNQA PSLYAISLIV VFLCLAALYE
Acrfescco  NLAS..KLPA GIGYDWTGMS YQERLSGNQA PALVAISFVV VFLCLAALYE
Acrdescco  SLVK~      GFGLEWTAMS YQERLSGAQA PALYAISLLV VFLCLAALYE
Cnraalceu  RIAKEVKLPP GMYIEWGGQF QNLQAATKRL AIIVPLCFIL IAATLYMAIG
Czcaalceu  AIQSQVKIPA GYWMTWGGTF EQLQSATTRL QVVVPVALLL VFVLLFAMFN
Consensus  ......KLP. G....W.G.. .Q........ .......... VF..L.A...

           951                                               1000
Acrbescco  SWSIPFSVML VVPLGVIGAL LAATFRGLTN DVYFQVGLLT TIGLSAKNAI
Acrfescco  SWSIPVSVML VVPLGIVGVL LAATLFNQKN DVYFMVGLLT TIGLSAKNAI
Acrdescco  SWSVPFSVML VVPLGVIGAL LATWMRGLEN DVYFQVGLLT VIGLSAKNAI
Cnraalceu  SAALTATVLT ASPLALAGGV FALLLRGIPF SISAAVGFIA VSGVAVLNGL
Czcaalceu  NIKDGLLVFT GIPFALTGGI LALWIRGIPM SITAAVGFIA LCGVAVLNGL
Consensus  S......V.. ..PL...G.. LA...RG... .....VG... ..G....N..

           1001                                              1050
Acrbescco  LIVEFAKDLM DKEGKGLIEA TLDAVRMRLR PILMTSLAFI LGVMPLVIST
Acrfescco  LIVEFAKDLM EKEGKGVVEA TLMAVRMRLR PILMTSLAFI LGVLPLAISN
Acrdescco  LIVEFANEMN QK.GHDLFEA TLHACRQRLR PILMTSLAFI FGVLPMATST
Cnraalceu  VLISAIRKRL D.DGMAPDAA VIEGAMERVR PVLMTALVAS LGFVPMAIAT
Czcaalceu  VMLSFIRSLR E.EGHSLDSA VRVGALTRLR PVLMTALVAS LGFVPMAIAT
Consensus  ....F..... ...G.....A .......RLR P.LMT.L... LG..P.AI.T

           1051                                              1100
Acrbescco  GAGSGAQNAV GTGVMGGMVT ATVLAIFFVP VFFVVVRRRF SRKNEDIEHS
Acrfescco  GAGSGAQNAV GIGVMGGMVS ATLLAIFFVP VFFVVIRRCF KG........
Acrdescco  GAGSGGQHAV GTGVMGGMIS ATILAIYFVP LFFVLVRRRF PLKPRPE...
Cnraalceu  GTGAEVQKPL ATVVIGGLVT ATVLTLFVLP ALCGIVLKRR TAGRPEAQAA
Czcaalceu  GTGAEVQRPL ATVVIGGILS STALTLLVLP VLYRLAHRKD EDAEDTREPV
Consensus  G.G...Q... .T.V.GG... AT.L.....P .......... ..........

           1101      1113
Acrbescco  HTVDHH.... ...
Acrfescco  .......... ...
Acrdescco  .......... ...
Cnraalceu  LEA....... ...
Czcaalceu  TQTHQPDQGR QPA
Consensus  .......... ...
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
Database accession numbers
Acrbescco Acrdescco Acrfescco Cnraalceu Czcaalceu
SWISSPROT: P31224 P24177 P24181 P37972 P13511
PIR: B36938 C42959; S26997 S18537 A33830
EMBL/GENBANK: M94248; U00734 U12598; X57403 M96848; X57948 M91650 M26073
References
1 Nikaido, H. (1996) J. Bacteriol. 178, 5853-5859.
2 Lewis, K. (1994) Trends Biochem. Sci. 19, 119-123.
3 Paulsen, I. et al. (1996) Microbiol. Rev. 60, 575-608.
4 Saier, M.H. et al. (1994) Mol. Microbiol. 11, 841-847.
Yeast multidrug resistance family Summary
Transporters of the yeast multidrug resistance family, the example of which is the BMR benomyl-methotrexate resistance protein of Candida albicans (Bmrpcanal), mediate resistance to one or more structurally dissimilar antibiotics, including methotrexate and cycloheximide 1. They may contribute to both intrinsic insensitivity and the development of clinical resistance of some species to antifungal agents. Known members of the family occur only in yeasts. Statistical analysis of multiple amino acid sequence comparisons suggests that the yeast multidrug resistance family may be distantly related to the uniporter-symporter-antiporter (USA) transporter superfamily, also known as the major facilitator superfamily (MFS) 2,3. Members of the yeast multidrug resistance family are predicted to contain 12 membrane-spanning helices by the hydropathy of their amino acid sequences. Several amino acid sequence motifs are highly conserved in the yeast multidrug resistance family, including motifs unique to the family, elements of the signature motifs of the USA/MFS superfamily, and motifs necessary for function by the criterion of site-directed mutagenesis 1.
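The text does not say which hydropathy method was used to predict the 12 membrane-spanning helices. As a hedged illustration only, the Python sketch below uses the widely used Kyte-Doolittle scale with a 19-residue sliding window and a cutoff of 1.6; the scale, window, cutoff and the function names `hydropathy_profile` and `candidate_helices` are assumptions chosen for the example, not the authors' procedure. Gap characters from an alignment must be removed before a sequence is scored.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along an ungapped sequence."""
    scores = [KD[aa] for aa in seq]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile

def candidate_helices(seq, window=19, cutoff=1.6):
    """Return 1-based (start, end) spans of window centres scoring >= cutoff."""
    half = window // 2
    spans, start, end = [], None, None
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        centre = i + half + 1  # 1-based residue at the middle of the window
        if mean >= cutoff:
            if start is None:
                start = centre
            end = centre
        elif start is not None:
            spans.append((start, end))
            start = None
    if start is not None:
        spans.append((start, end))
    return spans

# Synthetic example: a 25-residue leucine stretch flanked by aspartates.
toy = "D" * 10 + "L" * 25 + "D" * 10
print(candidate_helices(toy))  # prints [(15, 31)]
```

Reporting window centres rather than whole windows keeps each predicted span close to the length of the hydrophobic stretch itself; counting the spans for a full protein sequence gives the kind of helix number quoted in the summaries.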
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                     ORGANISM [COMMON NAMES]            [RESISTANCE(S)] a
Bmrpcanal  Benomyl-methotrexate resistance protein [BMRP, BMR, MDR1]  Candida albicans [yeast]           [Benomyl, methotrexate]
Car1schpo  Amiloride resistance protein [CAR1, SOD1]                  Schizosaccharomyces pombe [yeast]  [Amiloride]
Cyhrcanma  Cycloheximide resistance protein [CYHR]                    Candida maltosa [yeast]            [Cycloheximide]
a Presumed substrates; protein confers resistance to specified compounds.
Phylogenetic tree
Bmrpcanal Cyhrcanma Car1schpo
Proposed orientation of BMR in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and the protein is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in all three of the aligned transporters (see below) are shown.
[Figure: membrane topology model of BMR, drawn with OUTSIDE at the top and INSIDE at the bottom; the NH2 and COOH termini are both on the inside.]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT
Bmrpcanal  564          62930
Car1schpo  526          58545
Cyhrcanma  552          61366
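The two columns above are linked by simple arithmetic: a molecular weight is the sum of the residue masses plus one water. The Python sketch below is illustrative only. The residue masses are standard average values rounded to two decimals, and the names `RESIDUE_MASS`, `molecular_weight` and `TABLE` are assumptions introduced for the example; the amino acid counts and molecular weights in `TABLE` are the values listed above.

```python
# Average residue masses in daltons (amino acid minus water), rounded.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02

def molecular_weight(seq):
    """Approximate molecular weight of an ungapped protein sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

# (amino acids, molecular weight) as listed in the table above.
TABLE = {
    "Bmrpcanal": (564, 62930),
    "Car1schpo": (526, 58545),
    "Cyhrcanma": (552, 61366),
}

for code, (n_residues, mol_wt) in TABLE.items():
    # Mean mass per residue implied by the tabulated values.
    print(code, round(mol_wt / n_residues, 1))
```

The implied mean of roughly 111 Da per residue is a quick plausibility check on any length and molecular weight pair in these tables.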
Multiple amino acid sequence alignments
           1                                                   50
Bmrpcanal  MHYRFLRDSF VGRVTYHLSK HKYFAHPEEA KNYIIPEKYL ADYKPTLADD
Cyhrcanma  .MAAFIKDSF WGQIIYRLSG RKLFRHNDEL PDYVVPEKYL LD........
Car1schpo  .......... .......... .......... ....MASKIA SLFSPSETAS
Consensus  .......... .......... .......... .......K.. ..........
           51                                                 100
Bmrpcanal  TSINFEKEEI DNQGEPNSSQ SSSSNNTIVD NNNNNNNDVD GDKIVVTWDG
Cyhrcanma  .....PKEEV LNSSD..KSQ SSENKEQTEG DQATIQNEPA SEHIIVTWDG
Car1schpo  KDQHENVAED LELGTAPLNQ IGIHETNSEY DEKKREESPE VIDI.SNLIS
Consensus  ........E. .........Q .......... .......... ...I......

           101                                                150
Bmrpcanal  DDDPENPQNW PTLQKAFFIF QISFLTTSVY MGSAVYTPGI EELMHDFGIG
Cyhrcanma  DDDPENPYNW PFAWKAIAAM QIGFLTVSVY MASAIYTPGV EEIMNQFNIN
Car1schpo  SDHPAHPQNW HWAKRWSIVF MFCLMQIYVI WTSNGFGSIE YSVMAQFNVS
Consensus  .D.P..P.NW .......... ........V. ..S....... ...M..F...
           151                                                200
Bmrpcanal  RVVATLPLTL FVIGYGVGPL VFSPMSENAI FGRTSIYIIT LFLFVILQIP
Cyhrcanma  STLATLPLTM FVIGYGIGPL FWSPLSENSR IGRTPLYIIT LFIFFILQIP
Car1schpo  AQVATLCLSM NILGSGLGPM FLGPLSDIG. .GRKPVYFCS IFVYTVFNIS
Consensus  ...ATL.L.. ...G.G.GP. ...P...... .GR...Y... .F......I.

           201                                                250
Bmrpcanal  TALVNNIAGL CILRFLGGFF ASPCLATGGA SVADVVKFWN LPVGLAAWSL
Cyhrcanma  TALSNHIAGL SVLRVIAGFF AAPALSTGGA SYGDFIAMHY YSIALGVWSI
Car1schpo  CALPRNIVQM IISHFIIGVA GSTALTNVAG GIPDLFPEDT AGVPMSLF.V
Consensus  .AL...I... .......G.. ....L..... ...D...... ..........
Bmrpcanal Cyhrcanma Car i s c h p o Consensus
251 G A V C G P S F G P F F G S I L T V K A .... S W R W T F W F M C I I S G F S FAVAGPSIGP LIGAAVINRS HDADGWRWSF WFMAILSGVC WACAGGAIGA PMATGVDINA KYG... WRWL YYINIIVGGF .... G . . . G . . . . . . . . . . . . . . . . . . . . . . I..G..
300 FVMLCFTLPE FIVLSFSLPE FLIVILIIPE F ....... PE
301 350 Bmrpcanal TFGKTLLYRK AKRLRAITGN DRITSEGEIE NSKMTSHELI IDTLWRPLEI Cyhrcanma TYGKTLLRRK AERLRKLTGN NRIISEGELE DGHKTTSQVVSSLLWRPLEI C a r l s c h p o T L P I K V I T R Y EN ...... A K G R I V . E G I P K N N L K E V L K K C K F V T T M G F R M C o n s e n s u s T ....... R . . . . . . . . . . . . R I . . E G . . . . . . . . . . . . . . . . . . . . . . .
Bmrpcanal Cyhrcanma Car i s c h p o Consensus
351 Bmrpcanal TVMEPVVLLI NIYIAMVYSI Cyhrcanma TMLEPVVFLI DIYIALVYSI Carlschpo MLTEPIILSM GLYNFYAYGI C o n s e n s u s ...EP . . . . . . . Y .... Y.I
400 LYLFFEVFPI YFVGVKHFTL VELGTTYMSI MYLIFESVPI VYAGIHHFTL VEMGATYVST SYFFLTAIWP VFYDTYKMSE MGASCTYLSG .Y . . . . . . . . . . . . . . . . . . . . . . . TY...
Bmrpcanal Cyhrcanma Car i s c h p o Consensus
401 VIGIVIAAFI IIGIIIGGAI FVASTL.LFL ..........
450 YIPVIRQKFT KPILRQEQVF .PEVFIPIAI VGGILLTSGL YLPTVYYKFT KKLLAGQNVT PEVFLPPAI FGAICMPIGV YQPIQDWIFR RDKAKNNGVA RPEARFTSAL FITLLFPAGM Y . P ..... F . . . . . . . . . . . . PE ..... A . . . . . . . . . . .
Bmrpcanal Cyhr canma Car l s c h p o Consensus
451 FIFGWSANRT THWVGPLFGA ATTASGAFLI FQTLFNFMGA FIFGWTSSPD INWFVPLIGM ALFAVGAFII FQTLFNYMAV FLFAFTCHPP FPWMSPIVGN SMVTVANGHN WMCILNYLTD F.F . . . . . . . . . W . . P . . G . . . . . . . . . . . . . . . . N ....
Bmrpcanal Cyhr canma Car l s c h p o Consensus
501 550 FASNDLFRSV IASVFPLFGA PLFDNLATPE YPVAWGSSVL GFITLVMIAI FSSNAFFRSV SAGAFPLFGR ALYNNLSIDK FPVGWGSSIL GFISLGMIAI VAAFTLPSFI GATVFAHVSQ IMFNNMS ..... VKWAVATM AFISISIPFI . . . . . . . . . . . A.. F . . . . . . . . . N . . . . . . . V . W . . . . . . FI ...... I
551 581 Bmrpcanal PVLFYLNGPK LRARSKYAN . . . . . . . . . . . . Cyhr canma PVFFYLNGPK LRARSKYAY . . . . . . . . . . . . Car i s c h p o I Y T F Y F F G Q R I R A L S S L T G N K A L K Y L P L E N N Consensus . . . F Y . . G . . . R A . S . . . . . . . . . . . . . . . .
500 SFKPHYIASV SFKVEYLASV SY.PLLSGSA S ....... S.
Residues listed in the consensus sequence are present in all transporter sequences. Residues indicated by boldface type are also conserved in at least one other family of the USA/MFS superfamily.
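The consensus rule stated above (a residue is listed only when it is present in every aligned sequence) can be illustrated with a minimal sketch. The function name `consensus` and its `min_fraction` parameter are hypothetical and are not taken from the book; ties and gap handling are assumptions made for illustration. The example columns are the residues 101-110 block of the alignment above.

```python
from collections import Counter


def consensus(aligned, min_fraction=1.0, gap="."):
    """Return a consensus string for equal-length aligned sequences.

    A column yields a residue only if the most common non-gap residue
    occurs in at least `min_fraction` of all sequences; otherwise the
    gap character is emitted. (Hypothetical helper, for illustration.)
    """
    rows = [s.upper() for s in aligned]
    if not rows or len({len(s) for s in rows}) != 1:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    out = []
    for column in zip(*rows):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            if n / len(column) >= min_fraction:
                out.append(residue)
                continue
        out.append(gap)
    return "".join(out)


# Residues 101-110 of Bmrpcanal, Cyhrcanma and Car1schpo from the alignment.
block = ["DDDPENPQNW", "DDDPENPYNW", "SDHPAHPQNW"]
print(consensus(block))  # all-sequence rule used for this family
```

Lowering `min_fraction` gives the looser rule used for larger families, where a residue need only be present in a stated fraction of the aligned sequences.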
Database accession numbers
            SWISSPROT   PIR      EMBL/GENBANK
Bmrpcanal   P28873      S16304   X53823
Car1schpo   P33532      S39919   Z14035
Cyhrcanma   P32071      JC1173   M64932
References
1 Paulsen, I. et al. (1996) Microbiol. Rev. 60, 575-608.
2 Marger, M.D. and Saier, M. (1993) Trends Biochem. Sci. 18, 13-20.
3 Griffith, J. et al. (1994) Mol. Membr. Biol. 11, 271-277.
Na+-Dependent Symporters
Na+/Ca2+ exchanger family
Summary Transporters of the Na+/Ca2+ exchanger family, the example of which is the human cardiac sodium/calcium exchanger precursor 1 (Nac1homsa) 1, mediate export of calcium from cardiac sarcolemma cells. The mechanism of action is antiport (i.e. Na+-coupled substrate efflux), with a stoichiometry of exchange of three Na+ ions to one Ca2+ ion 2. In humans, inhibition of sodium/calcium exchanger activity has been implicated in cardiac arrhythmia induced by digitalis toxicity 3. Members of this family have only been found in mammals, including cattle 4, dogs 5 and rodents. Statistical analysis of multiple amino acid sequence comparisons reveals no apparent relationship between these transporters and any other family of transporters. Members of the Na+/Ca2+ exchanger family are predicted to form 11 transmembrane helices by the hydropathy of their amino acid sequences 1, with the N-terminus predicted to lie outside the cell. This topology is not commonly found in transporter proteins. There is a very long cytoplasmic loop between the fifth and sixth helices, with a calmodulin-binding domain at its N-terminus. An N-terminal signal sequence, which contains a highly hydrophobic segment, is cleaved to form the mature protein. The proteins are glycosylated. The sequences of all known members of the Na+/Ca2+ exchanger family are very similar, with the N-terminal signal sequence the most variable.
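A stoichiometry of three Na+ ions to one Ca2+ ion means that one net positive charge crosses the membrane per transport cycle, so the direction of exchange depends on membrane potential. As a minimal sketch (not taken from the book), the standard thermodynamic relation for a 3:1 exchanger, E_rev = 3E_Na - 2E_Ca, can be evaluated from Nernst potentials. The function names and the ion concentrations below are hypothetical, illustrative values only.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst(z, c_out, c_in, temp_k=310.15):
    """Nernst equilibrium potential in volts for an ion of valence z."""
    return (R * temp_k) / (z * F) * math.log(c_out / c_in)


def ncx_reversal(na_out, na_in, ca_out, ca_in, temp_k=310.15):
    """Reversal potential (volts) of a 3 Na+ : 1 Ca2+ exchanger.

    Uses E_rev = 3*E_Na - 2*E_Ca, which follows from setting the free
    energy of one exchange cycle to zero for this stoichiometry.
    """
    e_na = nernst(1, na_out, na_in, temp_k)
    e_ca = nernst(2, ca_out, ca_in, temp_k)
    return 3 * e_na - 2 * e_ca


# Assumed (hypothetical) concentrations in mM; only the ratios matter.
e_rev = ncx_reversal(na_out=140.0, na_in=10.0, ca_out=2.0, ca_in=0.0001)
print(f"E_rev = {e_rev * 1000:.1f} mV")
```

Under this relation, membrane potentials more negative than E_rev favour Ca2+ export coupled to Na+ entry, which is the mode described in the summary.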
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                                                ORGANISM [COMMON NAMES]          SUBSTRATE(S)
Nac1bosta  Sodium/calcium exchanger precursor 1 [Na+/Ca2+ exchange protein 1, SLC8A1]            Bos taurus [cow]                 Na+/Ca2+
Nac1canfa  Sodium/calcium exchanger precursor 1 [Na+/Ca2+ exchange protein 1, SLC8A1]            Canis familiaris [dog]           Na+/Ca2+
Nac1cavpo  Sodium/calcium exchanger precursor 1 [Na+/Ca2+ exchange protein 1, SLC8A1]            Cavia porcellus [guinea pig]     Na+/Ca2+
Nac1felca  Sodium/calcium exchanger precursor 1 [Na+/Ca2+ exchange protein 1, SLC8A1]            Felis catus [cat]                Na+/Ca2+
Nac1homsa  Sodium/calcium exchanger precursor 1 [Na+/Ca2+ exchange protein 1, SLC8A1, NCX1, CNC] Homo sapiens [human]             Na+/Ca2+
Nac1ratno  Sodium/calcium exchanger [Na+/Ca2+ exchange protein 1, SLC8A1, NCX1]                  Rattus norvegicus [rat]          Na+/Ca2+
Cotransported ions are listed.
Proposed orientation of NAC1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the outside and is folded 11 times through the membrane. This is an unusual topology for a transporter protein. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed.
[Figure: membrane topology model of NAC1; labels recovered: NH2 (OUTSIDE), COOH (INSIDE).]
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT   EXPRESSION SITES   CHROMOSOMAL LOCUS
Nac1bosta   970           108 027   heart
Nac1canfa   970           108 004   heart
Nac1cavpo   970           108 071   heart
Nac1felca   970           108 004   heart
Nac1homsa   970           108 138   heart              2p23-p22
Nac1ratno   971           108 184   heart
Multiple amino acid sequence alignments
1 50
Nac1homsa MRRLSLSPTF SMGFHLLVTV SLLFSHVDHV IAETEMEGEG NETGECTGSY
51 100
Nac1homsa YCKKGVILPI WEPQDPSFGD KIARATVYFV AMVYMFLGVS IIADRFMSSI
101 150
Nac1homsa EVITSQEKEI TIKKPNGETT KTTVRIWNET VSNLTLMALG SSAPEILLSV
151 200
Nac1homsa IEVCGHNFTA GDLGPSTIVG SAAFNMFIII ALCVYVVPDG ETRKIKHLRV
201 250
Nac1homsa FFVTAAWSIF AYTWLYIILS VISPGVVEVW EGLLTFFFFP ICVVFAWVAD
251 300
Nac1homsa RRLLFYKYVY KRYRAGKQRG MIIEHEGDRP SSKTEIEMDG KVVNSHVENF
301 350
Nac1homsa LDGALVLEVD ERDQDDEEAR REMARILKEL KQKHPDKEIE QLIELANYQV
351 400
Nac1homsa LSQQQKSRAF YRIQATRLMT GAGNILKRHA ADQARKAVSM HEVNTEVTEN
401 450
Nac1homsa DPVSKIFFEQ GTYQCLENCG TVALTIIRRG GDLTNTVFVD FRTEDGTANA
451 500
Nac1homsa GSDYEFTEGT VVFKPGDTQK EIRVGIIDDD IFEEDENFLV HLSNVKVSSE
501 550
Nac1homsa ASEDGILEAN HVSTLACLGS PSTATVTIFD DDHAGIFTFE EPVTHVSESI
551 600
Nac1homsa GIMEVKVLRT SGARGNVIVP YKTIEGTARG GGEDFEDTCG ELEFQNDEIV
601 650
Nac1homsa KTISVKVIDD EEYEKNKTFF LEIGEPRLVE MSEKKALLLN ELGGFTITGK
651 700
Nac1homsa YLFGQPVFRK VHAREHPILS TVITIADEYD DKQPLTSKEE EERRIAEMGR
701 750
Nac1homsa PILGEHTKLE VIIEESYEFK STVDKLIKKT NLALVVGTNS WREQFIEAIT
751 800
Nac1homsa VSAGEDDDDD ECGEEKLPSC FDYVMHFLTV FWKVLFAFVP PTEYWNGWAC
801 850
Nac1homsa FIVSILMIGL LTAFIGDLAS HFGCTIGLKD SVTAVVFVAL GTSVPDTFAS
851 900
Nac1homsa KVAATQDQYA DASIGNVTGS NAVNVFLGIG VAWSIAAIYH AANGEQFKVS
901 950
Nac1homsa PGTLAFSVTL FTIFAFINVG VLLYRRRPEI GGELGGPRTA KLLTSCLFVL
951 970
Nac1homsa LWLLYIFFSS LEAYCHIKGF
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignment: Nac1bosta, Nac1canfa, Nac1cavpo, Nac1felca, Nac1ratno (Nac1homsa).
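The 90% identity criterion can be illustrated with a short sketch. The book does not state how identity was computed, so the convention used here (identical residues divided by aligned columns, skipping columns in which both sequences carry a gap) is an assumption, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(a, b, gap="."):
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns where both sequences have a gap are
    ignored; every other column counts toward the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # shared gap column: not informative
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def excluded_from_alignment(a, b, cutoff=90.0):
    """True if the pair meets the 'at least 90% identical' criterion."""
    return percent_identity(a, b) >= cutoff


# Ten-residue fragments used purely as an illustration.
print(percent_identity("DDDPENPQNW", "DDDPENPYNW"))
```

In practice the comparison would be made over the full-length aligned sequences rather than a short fragment.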
Database accession numbers
            SWISSPROT   EMBL/GENBANK
Nac1bosta   P48765      L06438; G163034
Nac1canfa   P23685      M57523; G164073
Nac1cavpo   P48766      U04955; G507350
Nac1felca   P48767      L35846; G604519
Nac1homsa   P32418      M91368; G180673
Nac1ratno   Q01728      X68191; G57209
PIR  A36417  S32815  S25552; S28833
References
1 Komuro, I. et al. (1992) Proc. Natl Acad. Sci. USA 89, 4769-4773.
2 Reeves, J.P. and Hale, C.C. (1984) J. Biol. Chem. 259, 7733-7739.
3 Smith, T.W. et al. (1988) In Heart Disease: A Textbook of Cardiovascular Medicine, Ed. E. Braunwald. Saunders, Philadelphia, pp. 489-507.
4 Aceto, J.F. et al. (1992) Arch. Biochem. Biophys. 298, 553-560.
5 Nicoll, D.A. et al. (1990) Science 250, 562-565.
Na+/proline symporter family
Summary Transporters of the Na+/proline symporter family, the example of which is the PUTP proline-Na+ symporter of Escherichia coli (Putpescco), mediate symport (Na+-coupled substrate uptake) of carboxylated compounds, including proline and pantothenate 1-3. Members of the family are found in both gram-negative and gram-positive bacteria. Statistical analysis of multiple amino acid sequence alignments indicates that the Na+/proline symporter family is closely related to the Na+/glucose symporter family of mammals 2. The similarity between the kinetics of several families of Na+-driven and H+-driven transporters further suggests a common mechanism of action, despite the lack of amino acid sequence homology 3. Members of the Na+/proline symporter family are predicted to contain 12 membrane-spanning helices by the hydropathy of their amino acid sequences 3. Several amino acid sequence motifs are very highly conserved in the Na+/proline symporter family, including motifs unique to the family, signature motifs common with the Na+/glucose symporter family, and motifs necessary for function by the criterion of site-directed mutagenesis 2.
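Predictions of membrane-spanning helices from hydropathy are commonly made with a sliding-window average over a residue hydropathy scale. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6; the book does not specify which scale or parameters were used, so these choices and the function names are assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not stated in the book).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along an ungapped sequence."""
    values = [KD[r] for r in seq.upper()]
    if window < 1 or window > len(values):
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        # Slide the window one residue to the right.
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_segments(seq, window=19, threshold=1.6):
    """Merge windows whose mean is at or above threshold into
    1-based, inclusive (start, end) residue ranges."""
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# A hydrophobic 19-residue stretch used as an illustrative input.
print(candidate_segments("FIVSILMIGLLTAFIGDLA"))
```

Candidate ranges from such a scan are only a first approximation; the topologies described in this book also draw on predicted helical content and, for some families, experimental data.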
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                       ORGANISM [COMMON NAMES]                               SUBSTRATE(S)
Panfescco  Pantothenate-Na+ symporter [Pantothenate permease, PANF]     Escherichia coli [gram-negative bacterium]            Na+/pantothenate
Panfhaein  Pantothenate-Na+ symporter [Pantothenate permease, HI0975]   Haemophilus influenzae [gram-negative bacterium]      Na+/pantothenate
Proppsefl  Proline-Na+ symporter [Proline permease, PROP]               Pseudomonas fluorescens [gram-negative bacterium]     Na+/proline
Putpescco  Proline-Na+ symporter [Proline permease, PUTP]               Escherichia coli [gram-negative bacterium]            Na+, Li+/proline
Putpstaau  Proline-Na+ symporter [Proline permease, PUTP]               Staphylococcus aureus [gram-positive bacterium]       Na+/proline
Putpsalty  Proline-Na+ symporter [Proline permease, PUTP]               Salmonella typhimurium [gram-negative bacterium]      Na+/proline
Putphaein  Proline-Na+ symporter [Proline permease, PUTP, HI1352]       Haemophilus influenzae [gram-negative bacterium]      Na+/proline
Cotransported ions are listed.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Putpsalty (Putpescco).
[Figure: phylogenetic tree; leaves in order: Panfescco, Panfhaein, Proppsefl, Putpescco, Putphaein, Putpklepn, Putpstaau.]
Proposed orientation of PUTP in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. The predicted locations of residues that are conserved in more than 75% of the subsequently aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in PUTP.
[Figure: membrane topology model of PUTP; labels recovered: OUTSIDE, INSIDE, NH2, COOH.]
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT   CHROMOSOMAL LOCUS
Panfescco   483           51 717    73.35 minutes
Panfhaein   484           52 500
Proppsefl   494           53 112
Putpescco   502           54 344    23.25 minutes
Putpstaau   497           54 314
Putpsalty   502           54 296
Putphaein   504           54 898
Multiple amino acid sequence alignments
1 50
Panfescco .........M QLEVILP.LV AYLVVVFGIS VYAMRKRSTG TFLNEYFLGS
Panfhaein .........M NLGIILP.LI IYLTFVFGAA IFAYVKRTKG DFLTEYYVGN
Proppsefl .......... .MSVSNPTLI TFVIYIAAMV LIGLMAYRST NNLSDYILGG
Putpescco .......... .MAISTPMLV TFCVYIFGMI LIGFIAWRST KNFDDYILGG
Putphaein ........... MFGFDPSLI TFTIYIFGML LIGVLAYYYT NNLSDYILGG Putpstaau MLTMGTALSQ QVDANWQTYI MIAVYFLILM LLAFTYKQAT GNLSEYMLGG C o n s e n s u s . . . . . . . . . . . . . . . . P.L . . . . . . . . . . . . . . . . . . . . . . . L . . Y . L G G 51 Panfescco RSMGGIVLAM TLTATYISAS SFIGGPGAAY Panfhaein RSMTGFVLAM TTASTYASAS SFVGGPGAAY Proppsefl RSLGSWTAL SAGASDMSGW LLMGLPGAIY Putpescco RSLGPFVTAL SAGASDMSGW LLMGLPGAVF Putphaein RRLGSFVTAM SAGASDMSGW LLMGLPGAVY Putpstaau RSIGPYITAL SAGASDMSGW MIMGLPGSVY C o n s e n s u s R S . G . . V . A .... A...S ..... G . P G A . Y
150 L N D M L F A R Y Q SRL .... L V W L A S L S L L V A F I N D L F F Y R Y K NKY .... L V W L S S L A L L L A F LPDYFSSRFE DKSGLLRIIS AVVILVFFTI LPDYFTGRFE DKSRILRIISALVILLFFTI LPEYFHNRFG SSHKLLKLVS ATIILVFLTI LPDFFKNRLN DKNNVLKIIS GLIIVVFFTL L.D.F..R ......................
Panfescco Panfhaein Proppsefl Putpescco Putphaein Putpstaau Consensus
151 VGAMTVQFIG FAAMTVQFIG YCASGIVA.. YCASGIVA.. YCASGVVA.. YTHSGFVS.. ..A .......
IPYETGLLIF GISIALYTAF ISYTQALLLF ALTVGIYTFI MSYETALWAGAAATIAYTFI MSYETALWAGAAATILYTFI VEYSTALWYGAAATIAYTFI LDYHFGLILV AFIVIFYTFF ..Y...L... A ..... YTF.
Panfescco Panfhaein Proppsefl Putpescco Putphaein Putpstaau Consensus
201 250 TMQGLVMLIG TWLLIGVVHAAGGLSNAVQ TLQTIDPQLV TPQGADDILS TIQGTVMIFG TIILLIGTIY ALGGVESAVNKLTEIDPDLV TPYGPNGMLD T V Q A T L M I F A L I L T P I I V L L A T G G V D T T F L A I E A K D P ..... T S F D M L K N TVQASLMIFA LILTPVIVII SVGGFGDSLE VIKQKSI ENVDMLKG TIQATLMIFA LILTPVFVLL SFADTAQFSA VLEQAEAA.V NKDFTDLFTS F F Q G V I M L I A M V M V P I V A M M N L N G W G T F H D V ..... AA.M K P T N L N L F K G T.Q...M ................ G ..........................
Panfescco Panfhaein Proppsefl Putpescco Putphaein Putpstaau Consensus
GARLLETAAG GARLLETTIG GARLFESTFG GARLFESTFG GAKLFQNIFS GGKLFESAFG GA.L.E...G
251 PAFMTSFWVL V.CFGVIGLP HTAVRCISYK DSKAVHRGII FQFMASFWIL V.CFGVVGLP HTAVRCMAFK DSKALHRGML TTFIGIISLM GWGLGYFGQP HILARFMAAD SVKSIAKARR LNFVAIISLM GWGLGYFGQP HILARFMAAD SHHSIVHARR TTPLGLLSLA AWGLGYFGQP HILARFMAAD SVKSLIKARR LSFIGIISLF SWGLGYFGQP HIIVRFMSIK SHKMLPKARR ..F . . . . . . . . . . . G . . G . P H . . . R . M . . . . . . . . . . . . .
301 FGMHLAGALG Panfhaein LGMHLAGALG Pr oppsefl G G T V A V G F F G Putpescco AGAVAVGFFG
Panfescco
I00 MIQLPAVWLS MIQVPVVWLA IGLIVGAYLN IGLTLGAWIN IGLTIGAYFN IGLTLGAYIN ..........
I01 LGILGKKFAI LAR.RYNAVT LGALGKKFAL LSR.ETNALT WLFVAGRLRV QTEHNGDALT WKLVAGRLRV HTEYNNNALT WLLVAGRLRV YTELNNNALT YFVVAPRLRV YTELAGDAIT . . . . . . . . . . . . . . . . . A.T
Panfescco Panfhaein Proppsefl Putpescco Putphaein Putpstaau Consensus
KYGLGWVLLA KYGLGWVLLA MSGLSESWIA LSGISESWIA LSGLVEGWIA STGLSAMWIT ..GL ..... A
R A V I P D L T V . . . . . . . . PDL R A V I P N L T V . . . . . . . . SDQ IAYFSAHPEV AGPVTENPER IAYFNDHPAL AGAVNQNAER
VIPTLMVKVL VIPTLMIKVL VFIELAKILF VFIELAQILF
200 GGFRASVLND GGFRAVVLTD GGFLAVSWTD GGFLAVSWTD GGFLAVSWTD GGYLAVSITD GGF.AV..TD
300 IGTIVVAILM IGTIVLSIIM ISMTWMILCL ISMTWMILCL ISMGWMVLCL LGISWMAVGL I ......... 350 PPFAAGIFLA PPIVAGIFLA NPWVAGVLLS NPWIAGILLS
Putphaein AGAIGIGLFA IPYFFANPAI AGTVNREPEQ VFIELAKLLF NPWIAGILLS P u t p s t a a u L G A V A V G L T G I A F V P A Y H I K L .... EDPET L F I V M S Q V L F H P L V G G F L L A C o n s e n s u s .G .... G..G .A . . . . . . . . . . . . . . . . . . V . . . L ...... P..AG..L.
351 400
Panfescco APMAAIMSTI NAQLLQSSAT IIKDLYLNIR PDQMQN..ET RLKRMSAVIT
Panfhaein APMSAIMSTI DAQLIQSSSI FVKDLYLSAK PEAAKN..EK KVSYFSSIIT
Proppsefl AILAAVMSTL SCQLLVCSSA LTEDFYKAF. ..LRKGASQR ELVWVGRLMV
Putpescco AILAAVMSTL SCQLLVCSSA ITEDLYKAF. ..LRKHASQK ELVWVGRVMV
Putphaein AILAAVMSTL SAQLLISSSS ITEDFYKGF. ..IRPNASEK ELVWLGRIMV
Putpstaau AILAAIMSTI SSQLLVTSSS LTEDFYKLIR GEEKAKTDQK EFVMIGRLSV
Consensus A..AA.MST. ..QLL..SS. ...D.Y.... .......... ..........
401 Panfescco LVLGALLLLA AWKPPEMIIW Panfhaein LILTALLIFA ALNPPDMIIW Proppsefl LVVALIAIAMAANPENRVLG Putpescco LVVALVAIALAANPENRVLG Putphaein LVIAALAIWI AQDENSKVLK Putpstaau LVVAIVAIAI AWNPNDTILN C o n s e n s u s L V ..... I.. A..P ...... Panfescco Panfhaein Proppsefl Putpescco Putphaein Putpstaau Consensus Panfescco Panfhaein Proppsefl Putpescco Putphaein Putpstaau Consensus
450 LNLLAFGGLE AVFLWPLVLG LYWERANAKG LNLFAFGGLEAAFLWVIVLG IYWDKANAYG LVSYAWAGFGAAFGPVVLIS VIWKHMTRNG LVSYAWAGFGAAFGPVVLFS VMWSRMTRNG LVEFAWAGFG SAFGPVVLFS LFWKRMTSSG LVGNAWAGFG ASFSPLVLFA LYWKGLTRAG L...A..G.. A.F . . . . . . . . . W ...... G
451 500 ALSAMIVGGV LYAVLATL ...... NIQYLG FHPIVPSLLL SLLAFLVGNR A L S S M I I G L G S Y I L L T Q L ...... G I K L F N F H Q I V P S L V F G L I A F L V G N K A L A G I L V G A I T V I V W K H F . . . . . . . . ELLG L Y E I I P G F I F A S L A I Y F V S K A L A G M I I G A L T V I V W K Q F . . . . . . . . GWLG L Y E I I P G F I F G S I G I V V F S L AMAGMLVGAV TVFAWKEVVP A...DTDWFK VYEMIPGFAF ASLAIIVISL AVSGMVSGALVVIVWIAWIK PLAHINEIFG LYEIIPGFIV SVIVTYVVSK A ...... G . . . . . . . . . . . . . . . . . . . . . . . . . I.P . . . . . . . . . . V... 501 FGTSVPQATV LGERRIEKTQ MG.APTLGMV LGKAPSAAMQ LSNKPEQDIL LTKKPWCIC
531 LTTDK . . . . . . . . . . . . . . . . LKVTAL E R F D A A E K D Y NLNK KRFAEADAHY HSAPPSRLQE S N T F D K A E K A Y KEAK .
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Putpsalty (Putpescco). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in the Na+/glucose symporter family.
Database accession numbers
            SWISSPROT   PIR               EMBL/GENBANK
Panfescco   P16256      JU0296            M30953; U18997
Panfhaein   P44963                        U32778
Proppsefl               JC2382
Putpstaau                                 U06451
Putpsalty   P10502      S03816; S10220    X52573; X12569
Putphaein   P45174                        U32814
Putpescco   P07117      A30258            X05653
References
1 Hediger, M. et al. (1995) J. Physiol. 482, S7-S17.
2 Reizer, J. et al. (1994) Biochim. Biophys. Acta 1197, 133-166.
3 Wright, E.M. et al. (1996) Curr. Opin. Cell Biol. 8, 468-473.
Na+/glucose symporter family
Summary Transporters of the Na+/glucose symporter family, the example of which is the SGLT1 glucose-Na+ symporter of humans (Nagchomsa), mediate symport (Na+-coupled uptake) of glucose, myo-inositol, neutral amino acids and uridine 1-3. Members of the Na+/glucose symporter family also serve as uniporters, ion and water channels and water transporters 4. In the absence of substrates, transport of ions occurs in an uncoupled "leak" mode that can be 5-10% of substrate-linked ion transport. In the small intestine, secondary active water transport by SGLT1 in the presence of glucose has been estimated to account for up to 5 liters per day 4. Known members of the family are found only in mammals. Statistical analysis of multiple amino acid sequence alignments indicates that the Na+/glucose symporter family is closely related to the Na+/proline symporter family of prokaryotes 3. The similarity between the kinetics of several families of Na+-driven and H+-driven transporters further suggests a common mechanism of action, despite the lack of amino acid sequence homology 4. Members of the Na+/glucose symporter family are predicted to contain 14 membrane-spanning helices by the hydropathy of their amino acid sequences and glycosylation scanning mutagenesis 5. However, unlike other symporters predicted to contain either 14 or 12 membrane-spanning elements, the N- and C-termini of at least some members of the family are predicted to be on the extracellular side of the membrane 5. Members of the Na+/glucose symporter family are glycosylated. Several amino acid sequence motifs are very highly conserved in the Na+/glucose symporter family, including motifs unique to the family, signature motifs common with the Na+/proline symporter family, and motifs necessary for function by the criterion of site-directed mutagenesis 3. In humans, mutations in SGLT1 are the cause of glucose-galactose malabsorption syndrome 6.
Nomenclature, biological sources and substrates
CODE       DESCRIPTION [SYNONYMS]                                      ORGANISM [COMMON NAMES]           SUBSTRATE(S)
Naaasussc  Neutral amino acid-Na+ cotransporter [NAAA, SAAT1]          Sus scrofa [pig]                  Na+/neutral amino acids
Nagchomsa  Glucose-Na+ cotransporter 1 [NAGC, SLC5A1, SGLT1]           Homo sapiens [human]              Na+/glucose
Nagcorycu  Glucose-Na+ cotransporter 1 [NAGC, SGLT1]                   Oryctolagus cuniculus [rabbit]    Na+/glucose
Nagcsussc  Glucose-Na+ cotransporter 1 [NAGC, SGLT1]                   Sus scrofa [pig]                  Na+/glucose
Nagoviar   Glucose-Na+ cotransporter [NAG]                             Ovis aries [sheep]                Na+/glucose
Naglhomsa  Low-affinity glucose-Na+ cotransporter [NA1, SGLT2]         Homo sapiens [human]              Na+/glucose
Namibosta  Myo-inositol-Na+ cotransporter 1 [NAMI, SLC5A3]             Bos taurus [cow]                  Na+/myo-inositol
Namicanfa  Myo-inositol-Na+ cotransporter 1 [NAMI, SMIT]               Canis familiaris [dog]            Na+/myo-inositol
Namihomsa  Myo-inositol-Na+ cotransporter 1 [NAMI, SLC5A3]             Homo sapiens [human]              Na+/myo-inositol
Nanuorycu  Nucleoside-Na+ cotransporter 1 [NANU, SNST1]                Oryctolagus cuniculus [rabbit]    Na+/uridine
Sgltratno  Low-affinity glucose-Na+ cotransporter [SGL, SGLT2]         Rattus norvegicus [rat]           Na+/glucose
Cotransported ions are listed.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Namibosta, Namicanfa (Namihomsa).
[Figure: phylogenetic tree; leaves in order: Naglhomsa, Nanuorycu, Sgltratno, Nagcsussc, Nagoviar, Nagchomsa, Nagcorycu, Naaasussc, Namihomsa.]
Proposed orientation of SGLT1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N- and C-termini of the protein are illustrated on the outside and are folded 14 times through the membrane 6. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. More than half of the residues are conserved in at least 75% of the members of the Na+/glucose symporter family and, therefore, are not mapped onto the model.
[Figure: membrane topology model of SGLT1; labels recovered: OUTSIDE, INSIDE, COOH.]
Physical and genetic characteristics AMINO ACIDS
MOL. WT
EXPRESSION SITES
Naaasussc
660
72 745
Nagchomsa
664
73 497
kidney, intestine, muscle, liver, spleen intestine, kidney
Nagcorycu Nagcsussc Nagoviar Naglhomsa
662 605 664 672
66 917 73 215 72 896
73 079
CHROMOSOMAL LOCUS
Km
Glucose: 350 #M 2
22q13.1
Glucose: 1.6 ~ t
16pl 1.2
intestine, kidney intestine, kidney kidney
2
Namibosta Namicanfa Namihomsa Nanuorycu Sgltratno
718 718 718 672 670
79 763 79 545 79 709 73 161 72 961
kidney brain, kidney brain, kidney heart, kidney
21q22
Multiple amino acid sequence alignments Naglhomsa Nanuorycu Sgltratno Nagcsussc
1
50
...MEEHTEA GSAPEMGAQKALIDNPADIL VIAAYFLLVI GVGLWSMCRT ...MEEHMEA GSRLGLGDHGALIDNPADIA VIAAYFLLVI GVGLWSMCRT ...MEGHVEE GS..ELGEQK VLIDNPADIL VIAAYFLLVI GVGLWSMFRT ..................................................
Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Naaasussc Nagcorycu Namihomsa Consensus
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
MDSSTWSPPA TATAEPLQAY ERIRNAADIS VIVIYFVVVM AVGLWAMFST MDSSTWSPKT TAVTRPVETH ELIRNAADIS IIVIYFVVVM AVGLWAMFST MDSSTLSPLT TSTAAPLESY ERIRNAADIS V I V I Y F L W M AVGLWAMFST MASTLSPSTV TKTPGPPEIS ERIQNAADIS VIVIYFVVVM AVGLWAMLRT ................... M RAVLDTADIA IVALYFILVM CIGFFAMWKC ...................... I . N . A D I . . I . . Y F . . V . . V G L W . M . . T 51 NRGTVGGYFL NRGTVGGYFL NRGTVGGYFL ....... FFL NRGTVGGFFL NRGTVGGFFL NRGTVGGFFL NRGTVGGFFL NRSTVSGYFL NRGTVGG.FL
i00 GAASGLAVAG GAANGLAVAG GAASGLAVAG GAAAGIATGG GAAAGIATGG GAASGIAIGG GAASGIAIAA GAASGIATGG GAASGFAVGA GAA.G.A..G
AGRSMVWWPV AGRSMVWWPV AGRSMVWWPV AGRSMVWWPV AGRSMVWWPI AGRSMVWWPI AGRDVTWWPM AGRSMVWWPI AGRSMTWVTI AGRSMVWWPM
GASLFASNIG GASLFASNIG GASLFASNIG GASLFASYIG GASLFASNIG GASLFASNIG GASLFASNIG GASLFASNIG GASLFVSNIG GASLFASNIG
SGHFVGLAGT SGHFVGLAGT SGHFVGLAGT SGHFVGLAGT SGHFVGLAGT SGHFVGLAGT SGHFVGLAGT SGHFVGLAGT SEHFIGLAGS SGHFVGLAGT
i01 FEWNALFVVL LLGWLFAPVY FEWNALFVVL LLGWLFAPVY FEWNALFVVL LLGWLFVPVY FEWNALIWVVVLGWLFVPIY FEWNALILVVLLGWVFVPIY FEWNALVLVVVLGWLFVPIY FEWNALIMVVVLGWVFVPIY FEWNALLLLL VLGWFFVPIY WEFNALLLLQ LLGWVFIPIY FEWNAL..V..LGW.F.P.Y
LTAGVITMPQ LTAGVITMPQ LTAGVITMPQ IKAGVVTMPE IKAGVVTMPE IKAGVVTMPE IRAGVVTMPE IKAGVMTMPE IRSGVYTMPE I.AGV.TMP.
150 YLRKRFGGRR IRLYLSVLSL YLRKRFGGHR IRLYLSVLSL YLRKRFGGRR IRLYLSVLSL YLRKRFGGKR IQVYLSILSL YLRKRFGGQR IQVYLSVLSL YLRKRFGGQR IQVYLSLLSL YLQKRFGGKR IQIYLSILSL YLRKRFGGKR LQIYLSILSL YLSKRFGGHR IQVYFAALSL YLRKRFGG.RI..YLS.LSL
151 FLYIFTKISV FLYIFTKISV FLYIFTKISV MLYIFTKISA VLYIFTKISA LLYIFTKISA LLYIFTKISA FICVALRISS ILYIFTKLSV .LYIFTKIS.
200 DMFSGAVFIQ QALGWNIYAS VIALLGITMI YTVTGGLAAL DMFSGAVFIQ QALGWNIYAS VIALLGITMV YTVTGGLAAL DMFSGAVFIQ QALGWNIYAS VIALLGITMI YTVTGGLAAL DIFSGAIFIT LALGLDLYLA IFLLLAITGL YTITGGLAAV DIFSGAIFIN LALGLDLYLA IFILLAITAL YTITGGLAAV DIFSGAIFIN LALGLNLYLA IFLLLAITALYTITGGLAAV DIFSGAIFIQ LTLGLDIYVA IIILLVITGL YTITGGLAAV DIFSGAIFIK LALGLDLYLA IFSLLAITAI YTITGGLASV DLYSGALFIQ ESLGWNLYVS V I L L I G M T A L L T V T G G L V A V D.FSGA.FI..ALG...Y ..... LL.IT.. YT.TGGLAA.
201 MYTDTVQTFV ILGGACILMG MYTDTVQTFV IIAGAFILTG MYTDTVQTFV ILAGAFILTG IYTDTLQTAI MLVGSFILTG IYTDTLQTVI MLLGSFILTG IYTDTLQTVI MLVGSLILTG IYTDTLQTAI MMVGSVILTG IYTDTLQTII MLIGSFILMG IYTDTLQALL MIIGALTLMI .YTDT.QT ..... G..ILTG
250 YAFHEVGGYS GLFDKYLGAATSLTVSEDP. YAFHEVGGYS GLFDKYMGAMTSLTVSEDP. YAFHEVGGYS GLFDKYLGAV TSLTVSKDP. FAFHEVGGYD AFIEKYMNAI PTVISDGNI. FAFHEVGGYS AFVTKYMNAI PTVTSYGNT. FAFHEVGGYD AFMEKYMKAI PTIVSDGNT. FAFHEVGGYE AFTEKYMRAI PSQISYGNT. FAFVEVGGYE SFTEKYMNAI PTIVEGDNL. ISIMEIGGFE EVKRRYMLAS PDVTSILLTY .AFHEVGGY ..... KYM.A ...........
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
251 AVGNISSFCY AVGNISSSCY AVGNISSTCY T...IKKECY T...VKKECY T...FQEKCY S...IPQKCY T...ISPKCY NLSNTNSCNV ........ CY
RPRPDSYHLL RPRPDSYHLL QPRPDSYHLL APRADSFHIF TPRADSFHIF TPRADSFHIF TPREDAFHIF TPQGDSFHIF SPKKEALKML .PR.DS.H..
300 RHPVTGDLPW PALLLGLTIV SGWYWCSDQV RDPVTGDLPW PALLLGLTIV SGWYWCSDQV RDPVTGGLPW PALLLGLTIV SGWHWCSDQV RDPLKGDLPW PGLTFGLSIL ALWYWCTDQV RDPLKGDLPW PGLIFGLTII SLWYWCTDQV RDPLTGDLPW PGFIFGMSIL TLWYWCTDQV RDAITGDIPW PGLVFGMSIL TLWYWCTDQV RDAVTGDIPW PGMIFGMTVVAAWYWCTDQV RNPTDEDVPW PGFILGQTPA SVWYWCADQV RDP..GDLPWP .... G..I...WYWC.DQV
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
301 IVQRCLAGKS IVQRCLAGRN IVQRCLAGKN IVQRCLSAKN IVQRCLSAKN IVQRCLSAKN IVQRCLSAKN IVQRCLSGKD IVQRVLAAKN IVQRCL..KN
LTHIKAGCIL LTHIKAGCIL LTHIKAGCIL MSHVKAGCVM MSHVKAGCIM MSHVKGGCIL LSHVKAGCIL MSHVKAACIM IAHAKGSTLM ..H.KAGCI.
CGYLKLTPMF CGYLKLTPMF CGYLKLMPMF CGYFKLLPMF CGYMKLLPMF CGYLKLMPMF CGYLKVMPMF CGYLKLLPMF AGFLKLLPMF CGYLKL.PMF
LMVMPGMISR LMVMPGMISR LMVMPGMISR VIVMPGMISR LMVMPGMISR IMVMPGMISR LIVMMGMVSR LMVMPGMISR IIVVPGMISR .MVMPGMISR
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
351 VPEVCRRVCG APEVCKRVCG VPEVCKRVCG VPSECEKYCG VPSECEKYCG VPSECEKYCG VPSECERYCG VPSECVKHCG NPEHCMLVCG VP..C...CG
TEVGCSNIAY TEVGCSNIAY TEVGCSNIAY TKVGCSNIAY TKVGCTNIAY TKVGCTNIAY TRVGCTNIAF TEVGCSNYAY SRAGCSNIAY T.VGCSNIAY
PRLVVKLMPN PRLVVKLMPN PRLVVKLMPN PTLVVELMPN PTLVVELMPN PTLVVELMPN PTLVVELMPN PLLVMELMPS PRLVMKLVPV P.LVV.LMPN
400 GLRGLMLAVM LAALMSSLAS GLRGLMLAVM LAALMSSLAS GLRGLMLAVM LAALMSSLAS GLRGLMLSVI LASLMSSLTS GLRGLMLSVM LASLMSSLTS GLRGLMLSVM LASLMSSLTS GLRGLMLSVM MASLMSSLTS GLRGLMLSVM LASLMSSLTS GLRGLMMAVM IAALMSDLDS GLRGLML.VMLA.LMSSL.S
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
401 IFNSSSTLFT MDIYTRLRPR IFNSSSTLFT MDIYT.LRPR IFNSSSTLFT MDIYTRLRPR IFNSATTLFT MDVYAKIRKR IFNSASTLFT MDIYTKIRKK IFNSASTLFT MDIYAKVRKR IFNSASTLFT MDIYTKIRKK IFNSASTLFT MDLYTKIRKQ IFNSASTIFT LDVYKLIRKS IFNS.STLFTMD.Y...R..
450 AGDRELLLVG RLWVVFIVVVSVAWLPVVQA AGEGELLLVG RLWVVFIVAV SVAWLPVVQA AGDRELLLVG RLWVVFIVAV SVAWLPVVQA ASEKELMIAG RLFILVLIGI SIAWVPIVQS ASEKELMIAG RLFMLVLIGV SIAWVPIVQS ASEKELMIAG RLFILVLIGI SIAWVPIVQS ASEKELMIAG RLFMLFLIGI SIAWVPIVQS ASEKELLIAG RLFIILLIVI SIVWVPLVQV ASSRELMIVG R I F V A F M W I SIAWVPIIVE A...EL...GRL ........ S.AW.P.VQ.
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar
451 AQGGQLFDYI AQGGQLFDYI AQGGQLFDYI AQSGQLFDYI AQSGQLFDYI
VSAVFVLALF VSAVFVVALF VSAVFVLALF IAAVFLLAIF IAAVFLLAIF
QAVSSYLAPP QSVSSYLAPP QSVSSYLAPP QSVTSYLGPP QSITSYLGPP
VPRVNEQGAF VPRVNEKGAF VPRVNEKGAF CKRVNEEGAF CKRVNEPGAF
350 ILYPDEVACV ILYPDEVACV ILYPDEVACV VLYTEKIACT ILFTEKVACT ILYTEKIACV ILYTDKVACV ILYTEKVACV ILFTDDIACI ILY .... ACV
500 WGLIGGLLMG WGLIGGLLMG WGLIGGLLMG WGLVIGCMIG WGLIIGFLIG
Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
AQSGQLFDYI AQSGQLFDYI AQNGQLFHYI MQGGQMYLYI AQ.GQLFDYI
QSITSYLGPP QSITSYLGPP ESISSYLGPP QEVADYLTPP QS..SYL.PP
IAAVFLLAIF IAAVFLLAIF IAAVFLLAIF VAALFLLAIF . .AVF.LA.F
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
501 LARLIPEFSF LARLIPEFSF LARLIPEFFF LARMITEFAY VSRMITEFAY ISRMITEFAY ISRMITEFAY LIRMIAEFVY AVRLILAFAY ..R.I.EF..
GSGSCVQPSA GTGSCVRPSA GTGSCVRPSA GTGSCVEPSN GTGSCMEPSN GTGSCMEPSN GTGSCMEPSN GTGSCLAASN RAPECDQPDN GTGSC..PS.
550 CPAFLCGVHY LYFAIVLFFC SGLLTLTVSL CPAFLCRVHY LYFAIVLFFC SGLLIIIVSL CPAIFCRVHY LYFAIILFFC SGFLTLAISR CPTIICGVHY LYFAIILFVI SIIIVLVVSL CPTIICGVHY LYFAIILFVI TIIVILAISL CPTIICGVHY LYFAIILFAI SFITIVVISL CPTIICGVHY LYFAIILFVI SIITVVVVSL CPQIICGVHY LYFALILFFV SILVVLAISL RPGFIKDIHY MYVATGLFWVTGLITVIVSL CP...C.VHYLYFAI.LF.. S ....... SL
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
551 CTAPIPRKHL CTAPIPRKHL CTAPIPQKHL FTKPIPDVHL FTKPIADVHL LTKPIPDVHL FTKPIPDVHL LTKPIPDVHL LTPPPTKEQI .T.PIP..HL
HRLVFSLRHS ............... KEERE HRLVFSLRHS ............... KEERE HRLVFSLRHS ............... KEERE YRLCWSLRNS ............... KEERI YRLCWSLRNS ............... KEERI YRLCWSLRNS ............... KEERI YRLCWSLRNS ............... KEERI YRLCWALRNS ............... TEERI RTTTFWSKKN LVVKENCSPK EEPYQMQEKS .RL..SLR.S ............... KEER.
Naglhomsa Nanuorycu Sgltratno Nagcsussc Nagoviar Nagchomsa Nagcorycu Naaasussc Namihomsa Consensus
601 SSLPVQNGCP ASPPVQNGRP APPPVQNGCQ ....IQEAPE ....IQDARE ....IQEGPK ....IQEAPE ....HEEAHD INHIIPNGKS .....Q....
650 ESAMEMNEPQ A ............................. EHAVEMEEPQ A ............................. ECAMGIEEVQ S ............................. ETIEI..EVP E ............................. DALEIDTEAS E ............................. ETIEIETQVP E ............................. EATD..TEVP K ............................. ...GVDEDNP E ............................. EDSIKGLQPE DVNLLVTCRE EGNPVASLGH SEAETPVDAY E .......................................
WKRVNEPGAF WKRVNEPGAF CKRVNEQGAF WKRCNEQGAF . .RVNE.GAF
WGLILGLLIG WGLVLGFLIG WGLIIGFVMG YGGMAGFVLG WGL..G...G
651 Naglhomsa .......... .......PAP Nanuorycu .......... .......PGP Sgltratno .......... .......PAP Nagcsussc .......... .......EKK Nagoviar .......... .......EKK Nagchomsa .......... .......KKK Nagcorycu .......... .......KKK Naaasussc ETR ................. ............................. Namihomsa SNGQAALMGE KERKKETDDG Consensus .......... ..........
SLFRQCLLWF GLFRQCLLWF GLLRQCLLWF GCFRRTYDLF GCLRQAYDMF GIFRRAYDLF GFFRRAYDLF GCLRKAYDLF GRYWKFIDWF G . . R ..... F
600 DLDADEQQG. DLDADELEAP DLDAEELEGP DLDAEEED.. DLDAEDED.. DLDAEEEN.. DLDAGEED.. DLDAEEKR.. ILRCSENNET D L D A . E ....
700 CGMSRGGVGS PPPLTQEEAA CGMNRGRAGG PAPPTQEEEA CGMSKSGSGS PPPTT.EEVA CGLDQQKG...PKMTKEEEA CGLDQQKG...PKMTKEEEA CGLEQHGA...PKMTEEEEK CGLDQDKG...PKMTKEEEA CGL.QRKG PKLSKEEEE CGFKSKSLSK RSLRDLMEEE CG . . . . . . . . . P . . T . E E . .
701 Naglhomsa AAARRLEDIS Nanuorycu AAARRLEDIN Sgltratno ATTRRLEDIS Nagcsussc AMKLKMTDTS Nagoviar AMKLKMTDTS Nagchomsa AMKMKMTDTS Nagcorycu AMKLKLTDTS Naaasussc AQKRKLTDTS Namihomsa AVCLQMLE.. Consensus A.....ED.S
739 EDPSWARVVN LNALLMMAVA VFLWGFYA. EDPRWSRVVN LNALLMMAVA MFFWGFYA. EDPSWARVVN LNALLMMTVA VFLWGFYA. EKPLWRTVVN INGIILLTVA VFCHAYFA. EKRLWRMVVN INGIILLAVA VFCHAYFA. EKPLWRTVLN VNGIILVTVA VFCHAYFA. EHPLWRTVVN INGVILLAVA VFCYAYFA. EKPLWKTIVN INAILLLAVA VFVHGYFA. ETRQVKVILN IGLFAVCSLG IFMFVYFSL E...W..VVN .N......VA VF.....A.
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parenthesis and therefore are not included in the alignments: Namibosta, Namicanfa (Namihomsa). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in the Na+ symporter family.
Database accession numbers SWISSPROT Naaasussc P31636 Nagchomsa P13866 Nagcorycu P11170 Nagcsussc P26429 P31639 Naglhomsa Nagoviar Namibosta Namicanfa P31637 Namihomsa Nanuorycu P26430 Sgltratno
PIR A44432 A33545 S00515; S15974 A36361 S48857 A42163 A56851 A42251
EMBL/GENBANK L02900 M24847; L29338 X55355; X06419 M34044 M95549 X82410 U41338 M85068 L38500 M84020 U29881
References
1 Hediger, M. et al. (1995) J. Physiol. 482, S7-S17.
2 Reinhart, A.F. and Reithman, R. (1994) Curr. Opin. Cell Biol. 6, 583-594.
3 Reizer, J. et al. (1994) Biochim. Biophys. Acta 1197, 133-166.
4 Wright, E.M. et al. (1996) Curr. Opin. Cell Biol. 8, 468-473.
5 Turk, E. et al. (1996) J. Biol. Chem. 271, 1925-1934.
6 Turk, E. et al. (1991) Nature 350, 354-356.
Na+/dicarboxylate symporter family Summary
Transporters of the Na+/dicarboxylate symporter family, examples of which are the DCTA dicarboxylate-Na+ symporter of Escherichia coli (Dctaescco) and the EAT1 excitatory amino acid-Na+ cotransporter of humans (Eat1homsa), mediate symport (Na+-coupled or H+-coupled substrate uptake) of neutral amino acids, glucose and dicarboxylated compounds, including excitatory amino acids from the post-synaptic cleft 1-3. Members of the Na+/dicarboxylate symporter family also serve as uniporters, ion and water channels. For example, neurotransmitter cotransporters, such as EAT1, contain ligand-gated ion channels that mediate Cl- conductance in parallel with Na+-glutamate transport and, in the absence of substrates, transport ions in an uncoupled "leak" mode 3. Members of the family have a broad biological distribution that ranges from bacteria to humans.
Statistical analysis reveals no apparent relationship between the amino acid sequences of the Na+/dicarboxylate symporter family and any other family of transporters. However, the similarity between the kinetics of several families of Na+-driven and H+-driven transporters suggests a common mechanism of action, despite the lack of amino acid sequence homology 3.
Bacterial members of the Na+/dicarboxylate symporter family are predicted to contain 12 membrane-spanning helices by the hydrophobicity of their amino acid sequences and the activity of reporter gene fusions 5, while the mammalian proteins are predicted to contain ten membrane-spanning elements by the hydrophobicity of their amino acid sequences 6. Eukaryotic transporters may be glycosylated. Several amino acid sequence motifs that are highly conserved in the Na+/dicarboxylate symporter family are necessary for function by the criterion of site-directed mutagenesis 1.
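The helix counts quoted above rest on hydrophobicity analysis of the amino acid sequence. A minimal sketch of that kind of analysis follows, assuming the Kyte-Doolittle hydropathy scale, a 19-residue sliding window and a cutoff of 1.6. The source does not state which scale, window or cutoff was used, and the function names are illustrative only.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue to the right.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_helices(seq, window=19, threshold=1.6):
    """Merge overlapping or adjacent windows scoring at or above the
    threshold into (start, end) ranges, 1-based and inclusive."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

The input must be an ungapped sequence of standard one-letter codes, so alignment padding such as "." has to be removed first. Merged ranges can be longer than a single helix, which makes them candidates for inspection rather than a final topology.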
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Dctaescco | C-4 Dicarboxylate transporter [DCTA] | Escherichia coli [gram-negative bacterium] | Na+/4-carbon dicarboxylates
Dctarhime | C-4 Dicarboxylate transporter [DCTA] | Rhizobium meliloti [gram-negative bacterium] | Na+/4-carbon dicarboxylates
Dctarhisp | C-4 Dicarboxylate transporter [DCTA] | Rhizobium species [gram-negative bacterium] | Na+/4-carbon dicarboxylates
Dctarhile | C-4 Dicarboxylate transporter [DCTA] | Rhizobium leguminosarum [gram-negative bacterium] | Na+/4-carbon dicarboxylates
Eaacratno | High-affinity glutamate transporter [EAAC, EAAC1] | Rattus norvegicus [rat] | Na+/glutamate
Eaacmusmu | High-affinity glutamate transporter [EAAC, EAAC1, MEAAC1] | Mus musculus [mouse] | Na+/glutamate
Eat1homsa | Na+-excitatory amino acid cotransporter 1 [EAT1, EAAT1, SLC1A3, GLAST1] | Homo sapiens [human] | Na+/glutamate, aspartate
Eat1bosta | Na+-excitatory amino acid cotransporter 1 [EAT1, EAAT1] | Bos taurus [cow] | Na+/glutamate, aspartate
Eat1ratno | Na+-excitatory amino acid cotransporter 1 [EAT1, GLAST] | Rattus norvegicus [rat] | Na+/glutamate, aspartate
Eat2ratno | Na+-excitatory amino acid cotransporter 2 [EAT2, GLT1] | Rattus norvegicus [rat] | Na+/glutamate, aspartate
Eat2homsa | Na+-excitatory amino acid cotransporter 2 [EAT2, EAAT2, SLC1A2] | Homo sapiens [human] | Na+/glutamate, aspartate
Eat2musmu | Na+-excitatory amino acid cotransporter 2 [EAT2, EAAT2] | Mus musculus [mouse] | Na+/glutamate, aspartate
Eat3orycu | Na+-excitatory amino acid cotransporter 3 [EAT3, EAAC1] | Oryctolagus cuniculus [rabbit] | Na+/glutamate, aspartate
Eat3homsa | Na+-excitatory amino acid cotransporter 3 [EAT3, EAAT3, EAAC1, SLC1A1] | Homo sapiens [human] | Na+/glutamate, aspartate
Glt1caeel | Putative glutamate transporter [GLT1, CEGLT1] | Caenorhabditis elegans [nematode] | Na+/glutamate
Glt1oncvo | Putative glutamate transporter [GLT1, OVGLT1] | Onchocerca volvulus [nematode] | Na+/glutamate
Gltpbacsu | Glutamate-Na+ symporter [GLTP] | Bacillus subtilis [gram-positive bacterium] | Na+/glutamate, aspartate
Gltpescco | Glutamate-Na+ symporter [GLTP] | Escherichia coli [gram-negative bacterium] | Na+/glutamate, aspartate
Glttbacst | Glutamate-Na+ symporter [GLTT] | Bacillus stearothermophilus [gram-positive bacterium] | Na+/glutamate, aspartate
Glttbacca | Glutamate-Na+ symporter [GLTT] | Bacillus caldotenax [gram-positive bacterium] | Na+/glutamate, aspartate
Iaatmusmu | Insulin-activated amino acid transporter [IAAT] | Mus musculus [mouse] | Na+/amino acids
Satthomsa | Neutral amino acid transporter [SATT, SLC1A4, ASCT1] | Homo sapiens [human] | Na+/amino acids, glucose
Cotransported ions are listed for symporters.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parenthesis and therefore are not included in the phylogenetic tree: Glttbacca (Glttbacst); Dctarhisp (Dctarhime); Eat2ratno, Eat2musmu (Eat2homsa); Eat3orycu (Eat3homsa); Eat1ratno, Eat1bosta (Eat1homsa); Eaacmusmu (Eaacratno); Glt1caeel (Glt1oncvo).
[Figure: phylogenetic tree. Leaves, in order: Dctarhile, Dctarhime, Dctaescco, Gltpescco, Glttbacst, Gltpbacsu, Iaatmusmu, Satthomsa, Eat2homsa, Glt1oncvo, Eat3homsa, Eaacratno, Eat1homsa.]
Proposed orientation of DCTA in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane 5. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in DCTA.
[Figure: proposed membrane topology of DCTA, with the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini labelled.]
Physical and genetic characteristics
AMINO ACIDS
Dctaescco 428 Dctarhime 441 Dctarhisp 449 Dctarhile 444 Eaacratno 523 Eaacmusmu 523 Eat1homsa 542
MOL. WT
EXPRESSION SITES
45 436 46 142 46 967 46 044 56 679 56 675 59 572
brain motor cortex
542 543
59 591 59 697
motor cortex motor cortex
Eat2ratno
573
62 106
motor cortex
CHROMOSOMAL LOCUS
79.29 minutes
Eat1bosta Eat1ratno
Km
Glutamate: 12 μM 6 Aspartate: 7 μM 6 Glutamate: 77 μM 7 Aspartate: 65 μM 7
5p13
Eat2homsa Eat2musmu Eat3orycu Eat3homsa Glt1caeel Glt1oncvo Gltpbacsu Gltpescco Glttbacst Glttbacca Iaatmusmu Satthomsa
AMINO ACIDS
MOL. WT
EXPRESSION SITES
574 572 524 525 503 492 414 437 421 421 553 532
62 060 62 030 56 938 57 155 54 732 53 391 44 614 47 159 45 469 45 345 58 384 55 723
motor cortex motor cortex brain brain
Km
CHROMOSOMAL LOCUS
11p13-p12
9p24
92.44 minutes
adipocytes brain, muscle, pancreas
2p15-p13
Multiple amino acid sequence alignments 1 ................................ Dctarhime ................................... Dctarhile
50 MI/L~PLDA VAGSKGKKPF
MIIEH SAEVRGKTPL Dctaescco ............................................. MKTSL Gltpescco .............................................. MKNI Glttbacst ................................................ MR Gltpbacsu ................................................. M Iaatmusmu MAVDPPKADP KGSSGGFHRN GGPALGSRED QSAKAGGCCG SRDRVRRCIR Satthomsa .......... MEKSNETNGY LDSAQAGPAA GPGAPGTAAG RARRCAGFLR Eat2homsa ........MA STEGANNMPK QVEVRMPDSH LGSEEPKHRH LGLRLCDKLG Glt1oncvo ............................................ MVSWIR Eat3homsa ................................. MGKPARK GCPSWKRFLK Eaacratno ................................. MGKPTSS GC.DWRRFLR Eat1homsa .MTKSNGEEP KMGGRMERFQ QGVRKRTLLA KKKVQNITKE DVKSY..LFR Consensus .................................................. Dctarhile Dctarhime Dctaescco Gltpescco Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Glt1oncvo Eat3homsa Eaacratno Eat1homsa Consensus
51 YSHLYVQVLV AIAAGILLGH FYPE...... ....LGTQLK YRHLYVQVLA AIAAGILLGH FYPD...... ....IGTELK FKSLYFQVLT AIAIGILLGH FYPE...... ....IGEQMK KFSLAWQILF AMVLGILLGS YLHYHSDSRD W..LVVNLLS KIGLAWQIFI GLILGIIVGA IFYGNPK... ....VATYLQ KKLIAFQILI ALAVGAVIGH FFPD...... ....FGMALR ANLLVLLTVA AVVAGVGLGL GVSAAGGADA LGPARLTRFA RQALVLLTVS GVLAGAGLG. ...AALRGLS LSRTQVTYLA KNLLLTLTVF GVILGAVCGG LLRLASP... IHPDVVMLIA KNLLLVLTVS SVVLGALCGF LLR.GLQ... LSPQNIMYIS NNWVLLSTVA AVVLGITTGV LVREHSN... LSTLEKFYFA NHWLLLSTVA AVVLGIVVGV LVRGHSE... LSNLDKFYFA NAFVLL.TVT AVIVGTILGF TLRPYR.... MSYREVKYFS ...L...... ....G...G. .......... ..........
100 PLGDAFIKLV PLGDAFIRLV PLGDGFVKLI PAGDIFIHLI PIGDIFLRLI PVGDGFIRLI FPGELLLRLL FPGEMLLRML FPGDILMRML FPGELLMHML FPGEILMRML FPGEILMRML FPGELLMRML G .......
Dctarhile Dctarhime Dctaescco Gltpescco
101 KMIIAPVIFL KMIIAPVIFL KMIIAPVIFC KMIVVPIVIS
150 TLALIIGLIV TLALVVGLVV TIALIIGLII TVAIILGITL
TVATGIAGMS TVATGIAGMT TVVTGIAGME TLVVGIAGVG
DLQKVGRVAG DLAKVGRVAG SMKAVGRTGA DAKQLGRIGA
KAMLYFLTFS KAMIYFLAFS VALLYFEIVS KTIIYFEVIT
Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Glt1oncvo Eat3homsa Eaacratno Eat1homsa Consensus
KMIVIPIVIS SLVVGVASVG KMIVVPIVFS TIVIGAAGSG KMIILPLVVC SLIGGAASL. RMIILPLVVC SLVSGAASL. KMLILPLIIS SLITGLSGL. KMMILPLIMS SLISGLAQL. KLIILPLIIS SMITGVAAL. KLVILPLIIS SMITGVAAL. QMLVLPLIIS SLVTGMAAL. KM...P.... ....G.A...
DLKKLGKLGG KTIIYFEIIT SMKKMGSLGI KTIIWFEVIT DPSALGRVGA WAALFPGHHT DASCLGRLGG IAVAYFGLTT DAKASGRLGT RAMVYYMSTT DARQSGKLGS LAVTYYMFTT DSNVSGKIGL RAVVYYFCTT DSNVSGKIGL RAVVYYFSTT DSKASGKMGM RAVVYYMTTT D .... G..G ..... Y .... T
TIAIWGLLA TLVLGLGLLL ARVGA.RRGF LSASALAVAL IIAAVLGVIL AVAVVTGIFL LIAVILGIVL VIAVILGIVL IIAVVIGIII ..A...G...
151 200 Dctarhile ANVVQPGAGM N...IDPASL DPAAVATFAAKAHEQSIVGF LTNIIPTTIV Dctarhime ANVVQPGAGM H...IDPASL DAKAVATYAE KAHEQSITGF LMNIIPTTLV DctaesccoVNVVQPGAGM N...VDPATL DAKAVAVYAD QAKDQGIVAF IMDVIPASVI Gltpescco ANVFQPGAGV DMSQLATVDI SKYQSTTEAV QSSSHGIMGT ILSLVPTNIV Glttbacst ANIFQPGTGV NMKSLEKTDI QSYVDTTNEV Q..HHSMVET FVNIVPKNIF G l t p b a c s u A N V L K P G V G L D L S H L A K K D I H E L S G Y T D K V V .... DFKQM I L D I I P T N I I Iaatmusmu GPGAEAGAAV TAITSINDSV VDPCARSAPT KEVLDSFLDL VRNIFPSNLV Satthomsa AFIIKPGSGA QTLQS.SDLG LEDSGPPPVP KETVDSFLDL ARNLFPSNLV Eat2homsa VLAIHPGNPK LKKQL ....... GPGKKNDE VSSLDAFLDL IRNLFPENLV G l t l o n c v o V L V I H P G D P T IKKEI . . . . . . . G T G T E G K T V S T V D T L L D L L R N M F P E N V V E a t 3 h o m s a V V S I K P G V T Q KVGEI . . . . . . . A R T G S T P E V S T V D A M L D L I R N M F P E N L V E a a c r a t n o V V S I K P G V T Q KVNEI . . . . . . . N R T G K T P E V S T V D A M L D L I R N I L G E N L V E a t l h o m s a V I I I H P G K G T K.ENM . . . . . . . H R E G K I V R V T A A D A F L D L I R N M F P P N L V C o n s e n s u s ..... PG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . P...
Dctarhile Dctarhime Dctaescco Gltpescco Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Gltloncvo Eat3homsa Eaacratno Eatlhomsa Consensus
201 250 GA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . GA GA AS ES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SAAFRSFATS YEPKDNSCKI PQSCIQREIN STMVQLLCE ........... V A A F R T Y A T D YKV ...... V T Q N S S S G N V T H E K I P I G T E . . . . . . . . . . . QACFQQIQTV TKKVLVAPPP DEEANATSAE VSLLNETVTE VPEETKMVIK Q A T F Q Q V Q T K YIKV .... RP K V V K N N D S A T L A A L N N G S L D ....... YVK QACFQQYKTK REE..VKPPS DPEMNMTEES FTAVMTTAIS KNK.TKEYKI QACFQQYKTK REE..VKPAS DPGGNQTEVS VTTAMTT.MS ENK.TKEYKI EACFKQFKTN YEKRSFKVPI QANETLVGAV INNVSEAMET LTRITEELVP ..................................................
Dctarhile Dctarhime Dctaescco Gltpescco Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Gltloncvo Eat3homsa
251 ..... FADGD I L Q V L F F S V L ..... FAEGD I L Q V L F I S V L ..... FASGN I L Q V L L F A V L MAKGE M L P I I F F S V L ..... L T K G D M L P I I F F S V M ..... M A R N D L L A V I F F A I L ..... V E G M N I L G L V V F A I V ..... IEGMN I L G L V L F A L V KGLEFKDGMN VLGLIGFFIA ASVEYTSGMN VLGVIVFCIA VGM.YSDGIN VLGLIVFCLV
FGIALAMVGE FGISLAIVGK FGFALHRLGS FGLGLSSLPA FGLGVAAIGE FGVAAAGIG. FGVALRKLGP LGVALKKLGS FGIAMGKMGD IGISLSQLGQ FGLVIGKMGE
KG.EQVVNFL KA.EPVVDFL KG.QLIFNVI THREPLVTVF KGK.PVLQFF KASEPVMKFF EG.ELLIRFF EG.EDLIRFF QA.KLMVDFF EA.HVMVQFF KG.QILVDFF
300 NSLTAPVFKL QALTLPIFRL ESFSQVIFGI RSISETMFKV QGTAEAMFYV ESTAQIMFKL NSFNDATMVL NSLNEATMVL NILNEIVMKL VIMDKVIMKL NALSDATMKI
Eaacr atno VGL. YSDGIN VLGLIIFCLV FGLVIGKMGE KG. QILVDFF NALSDATMKI Eatlhomsa VPG. SVNGVN A L G L W F S M C FGFVIGNMKE QG. QALREFF DSLNEAIMRL Consensus ........... L .... F... FG ...... G ......... FF .......... Dctarhile Dctarhime Dctaescco Gltpescco Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Gltloncvo Eat3homsa Eaacratno Eatlhomsa Consensus
301 VAILMKAAPI VAILMKAAPI INMIMRLAPI THMVMRYAPV TNQIMKFAPF TQIVMVTAPI VSWIMWYAPV VSWIMWYVPV VIMIMWYSPL VMTVMWYSPF VQIIMCYMPL VQIIMCYMPI VAVIMWYAPV ........ P.
Dctarhile Dctarhime Dctaescco Gltpescco Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Gltloncvo Eat3homsa Eaacratno Eatlhomsa Consensus
351 VLGAVARYNG F.SIVALLRY IKEELLLVLG VLGAVARYNG F.SILSLIRY IKEELLLVLG VLGSIAKATG F.SIFKFIRY IREELLIVLG VLGIVARLCG L.SVWILIRI LKDELILAYS VLGGVAKLFG I.NIFHIIKI LKDELILAYS LFPLVGLIFQ I.KYFEVLKM IWDLFLIAFS VLPLIYFLFT RKNPYRFLWG IMTPLATAFG VLPLIYFVFT RKNPFRFLLG LLAPFATAFA FLPLIYFVVT RKNPFSLFAG IFQAWITALG SLPLIFFVTT KKNPYVFMRG LFQAWITGLG I L P L I Y F I V V R K N P F R F A M G MAQALLTALM V L P L I Y F I V V R K N P F R F A L G MAQALLTALM VLPLLYFLVT RKNPWVFIGG LLQALITALG .L . . . . . . . . . . . . . . . . . . . . . . . . . . . .
400 TSSSEAALPG LMNKM.EKAG TSSSEAALPG LMNKM.EKAG TSSSESALPR MLDKM.EKLG TASSESVLPR IIEKM.EAYG TASSETVLPK IMEKM.ENFG TTSTETILPQ LMDRM.EKYG TSSSSATLPL MMKCVEEKNG TCSSSATLPS MMKCIEENNG TASSAGTLPV TFRCLEENLG TASSSDTLPI TYICLEENLG ISSSSATLPV TFRCAEENNQ ISSSSATLPV TFRCAEEKNH TSSSSATLPI TFKCLEENNG T.SS...LP ....... E..G
Dctarhile Dctarhime Dctaescco Gltpescco Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Gltloncvo Eat3homsa Eaacratno Eatlhomsa Consensus
401 CKRSVVGLVI PTGYSFNLDG TNIYMTLAAL CKRSVVGLVI PTGYSFNLDG TNIYMTLAAL CRKSVVGLVI PTGYSFNLDG TSIYLTMAAV A P V S I T S F V V P T G Y S F N L D G STLYQSIAAI CPKAITSFVI PTGYSFNLDG STLYQALAAI C P K R V V S F V V P S G L S L N C D G SSLYLSVSCI VAKHISRFIL P I G A T V N M D G A A L F Q C V A A V VDKRISRFIL P I G A T V N M D G A A I F Q C V A A V IDKRVTRFVL PVGATINMDG TALYEAVAAI VDRRVTRFVL PVGATINMDG TALYEAVAAI VDKRITRFVL PVGATINMDG TALYEAVAAV VDKRITRFVL PVGATINMDG TALYEAVAAV VDKRVTRFVL PVGATINMDG TALYEALAAI ....... FV. P.G...N.DG ...Y...AA.
FIAQATGIHL FIAQATDTPL FIAQATNSQM FIAQLYGIDL FIAQLYGIDM FLAQAFQVDM FIAQLNGVSL FIAQLNNVEL FIAQMNGVVL FIAQINGVHL FIAQLNDLDL FIAQLNGMDL FIAQVNNFEL FIAQ ..... L
GAFGAMAFTI GAFGAMAFTI GAFGAMAFTI GVFALIAVTV GVFALIGVTV GVLALMAASV GILFLVASKI GIMFLVGSKI GIACLICGKI GILCLIMGKI GILFLIAGKI GILFLIAGKI GILFLIAGKI G...L .... I
350 G K Y G V G S I A . . N L A M L I G T F YITSLLFVFI G K Y G I A S I A . . N L A M L I G T F YLTSFLFVFI G K Y G V G T L V . . Q L G Q L I I C F YITCILFVVL A N F G F S S L W . . P L A K L V L L V HFAILFFALV SKFGVESLI..PLSKLVIVVYATMVFFIFV G Q Y G I E L L L . . P M F K L V G T V FLGLFLILFV VEMKDVRQLF ISLGKYILCC LLGHAIHGLL VEMKDIIVLV TSLGKYIFAS ILGHVIHGGI IAIKDLEVVA RQLGMYMVTV IIGLIIHGGI LEIHDLADTA RMLAMYMVTV LSGLAIHSLI IEVEDWEIF. RKLGLYMATV LTGLAIHSIV IEVEDWEIF. RKLGLYMATV LSGLAIHSLV VEMEDMGVIG GQLAMYTVTV IVGLLIHAVI ............ L .................
451 Dctarhile V A M L S S K G A A G I T G A G F I T L A A T L S V V P S V Dctarhime V A M L S S K G A A G I T G A G F I T L A A T L S V V P S V
450 SWGDQILLLL SYGDQILLLL DIVHQITLLI SIWQEIILVL PISQQISLLL TLSQQLLMML DFVKIITILV NAGQIFTILV DGGQIVTVSL SFGQVVTVSL GIGQIITISI SIGQIITISI NFGQIITISI ..........
500 PVAGMALILG IDRFMSECRA PVAGMALILG IDRFMSECRA
Dctaescco Gltpescco Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Glt1oncvo Eat3homsa Eaacratno Eat1homsa Consensus
VLLLSSKGAAGVTGSGFIVLAATLSAVGHL TLMVTSKGIA GVPGVSFVVL LATLGSVG.I VLMVTSKGIA GVPGVSFVVL LATLGTVG.I VLVMTSKGIA AVPSGSLVVL LATANAVG.L TATASSVGAAGIPAGGVLTL AIILEAVS.L TATASSVGAAGVPAGGVLTI AIILEAIG.L TATLASVGAASIPSAGLVTM LLILTAVG.L TATLASIGAASVPSAGLVTM LLVLTAVG.L TATSASIGAAGVPQAGLVTM VIVLSAVG.L TATAASIGAAGVPQAGLVTM VIVLSAVG.L TATAASIGAAGIPQAGLVTM VIVLTSVG.L ..... S . G A A G . P . . G . . T .... L..VG..
PVAGLALILG IDRFMSEARA PLEGLAFIAG VDRILDMART PIEGLAFIAG IDRILDMART PAEGVAIIAG VDRVMDMART PVKDISLILA VDWLVDRSCT PTHDLPLILA VDWIVDRTTT PTEDISLLVA VDWLLDRMRT PVKDVSLIVA VDWLLDRIRT PAEDVTLIIA VDWLLDRFRT PAEDVTLIIA VDWLLDRFRT PTDDITLIIA VDWFLDRLRT P ..... L I . . . D . . . D . . R T
501 550 Dctarhile LTNLVGNAVA TIVVARWENE LDTVQLAAAL GGQTGEDTSA AGLQPAE... Dctarhime LTNFVGNAVA TIVVAKWEGE LDQAQLSAALGGEASVEAIP AVVQPAE... D c t a e s c c o L T N L V G N G V A T I W A K W V K E L D H K K L D D V L N N R A P D G K T H ELSS ...... G l t p e s c c o A L N V V G N A L A V L V I A K W E H K F D R K K A L A Y E R E V L G K F D K T A D Q ....... G l t t b a c s t A V N V I G N S L A A I I M S K W E G Q Y N E E K G .... K Q Y I A Q L Q Q S A ......... Gltpbacsu GVNVPGHAIA CIVVSKWEKA FRQKEWVSAN SQTESI .............. Iaatmusmu VLNVEGDAFG AGLLQSYVDR TKMPSSEPEL IQVKNEVSLN PLPLATEEGN SatthomsaVVNVEGDALG AGILH.HLNQ KATKKGEQEL AEVKVEAIPN ..CKSEEETS Eat2homsa SVNVVGDSFG AGIVY.HLSK SELDTIDSQH RVHEDIEMTK TQSIYDDMKN G l t l o n c v o S I N V L G D A M G A G I V Y . H Y S K A D L D A H D R L . . . . . . A A T T R SHSI ...... Eat3homsa MVNVLGDAFG TGIVE.KLSK KELEQMD .... VSSEVNIVNPFALESTI.L Eaacratno MVNVLGDAFG TGIVE.KLSK KELEQVD .... VSSEVNIVNPFALEPTI.L E a t l h o m s a T T N V L G D S L G A G I V E . H L S R H E L K N R D .... V E M G N S V I E E N E M K K P Y Q L C o n s e n s u s ..NV.G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Dctarhile Dctarhime Dctaescco Gltpescco Glttbacst Gltpbacsu Iaatmusmu Satthomsa Eat2homsa Gltloncvo Eat3homsa Eaacratno Eat l h o m s a Consensus
551 596 .............................................. .............................................. .............................................. .............................................. .............................................. .............................................. PLLK. Q Y Q G P T G D S S A T F E . K E S V M . . . . . . . . . . . . . . . . . . . . . PLVTHQNPAG PVASAPELES KESVL ..................... HRESNSNQCV YAAHNSVIVD ECKVTLAANG KSADCSVEEE PWKREK . .AMNDEKRQ LAVYNSLPTD DEKHTH .................... D N E D S D T K K S Y V N G G F A V D K S D T I S F T Q T S QF . . . . . . . . . . . . . . D N E D S D T K K S Y V N G G F S V D K S D T I S F T Q T S QF . . . . . . . . . . . . . . IAQDNETEKP IDSETKM ............................. ..............................................
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parenthesis and therefore are not included in the alignments: Glttbacca (Glttbacst); Dctarhisp (Dctarhime); Eat2ratno, Eat2musmu (Eat2homsa); Eat3orycu (Eat3homsa); Eat1ratno, Eat1bosta (Eat1homsa); Eaacmusmu (Eaacratno); Glt1caeel (Glt1oncvo). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
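The 75% consensus rule stated above can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name and the treatment of "." as the gap character are assumptions. The input is a ten-residue block (FIAQ...) taken from the alignment above, in the printed row order.

```python
from collections import Counter

# Ten-residue alignment block, rows in the printed order
# (Dctarhile ... Eat1homsa); the printed consensus row is left out.
BLOCK = [
    "FIAQATGIHL", "FIAQATDTPL", "FIAQATNSQM", "FIAQLYGIDL",
    "FIAQLYGIDM", "FLAQAFQVDM", "FIAQLNGVSL", "FIAQLNNVEL",
    "FIAQMNGVVL", "FIAQINGVHL", "FIAQLNDLDL", "FIAQLNGMDL",
    "FIAQVNNFEL",
]


def consensus(aligned, min_fraction=0.75, gap="."):
    """Return one character per column: the residue found in at least
    min_fraction of the rows, or the gap character otherwise."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(aligned) >= min_fraction:
            out.append(residue)
        else:
            out.append(gap)
    return "".join(out)


if __name__ == "__main__":
    print(consensus(BLOCK))
```

With 13 rows, a residue needs at least 10 occurrences to pass the 75% cutoff. For this block the function returns FIAQ.....L, which matches the consensus row printed in the alignment.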
Database accession numbers
SWISSPROT Iaatmusmu Dctaescco P37312 Dctarhime P20672 Dctarhisp P31601 Dctarhile Q01857 Eaacratno Eaacmusmu Eat1homsa P43003 Eat1ratno P24942 Eat1bosta P46411 Eat2musmu P43006 Eat2ratno P31596 Eat2homsa P43004 Eat3orycu P31597 Eat3homsa P43005 Glt1oncvo Glt1caeel Gltpbacsu P39817 Gltpescco P21345 Glttbacst P24943 Glttbacca P24944 Satthomsa P43007
PIR JC4149
A33597; S04816 S25701; S27384
EMBL/GENBANK
U00039 M26399; J03683 S38912 Z11529 U39555
S55677 S26609; A46370
JV0092; A42384 S26247 S26246
U03504 X63744 D29661 U11763 X67857 U03505 L12411; S49854 U03506 U35251 U35250 U15147 M32488; M84805 M86508 M86509 L14595; L19444
References
1 Reizer, J. et al. (1994) Biochim. Biophys. Acta 1197, 133-166.
2 Hediger, M. et al. (1995) J. Physiol. 482, S7-S17.
3 Malandro, M. and Kilberg, M. (1996) Annu. Rev. Biochem. 65, 305-336.
4 Wright, E. et al. (1996) Curr. Opin. Cell Biol. 8, 468-473.
5 Jording, D. and Puhler, A. (1993) Mol. Gen. Genet. 241, 106-114.
6 Kanai, Y. and Hediger, M. (1992) Nature 360, 462-471.
7 Storck, T. et al. (1992) Proc. Natl Acad. Sci. USA 89, 10955-10959.
Na+/PO4 symporter family Summary
Transporters of the Na+/PO4 symporter family, the example of which is the NPT1 phosphate-Na+ cotransporter of humans (Npt1homsa), mediate symport (Na+-linked substrate uptake) of phosphate. All known members of the family occur in mammals. Statistical analysis reveals no relationship between the amino acid sequences of the Na+/PO4 symporter family and any other family of transporters. They are predicted to form six or possibly eight membrane-spanning helices by the hydropathy of their amino acid sequences and contain potential sites for glycosylation and protein kinase C phosphorylation. Several amino acid sequence motifs are highly conserved in the Na+/PO4 symporter family.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Npt1homsa | Renal phosphate-Na+ cotransporter [NPT1] | Homo sapiens [human] | Na+/PO4
Npt1musmu | Renal phosphate-Na+ cotransporter [NPT1] | Mus musculus [mouse] | Na+/PO4
Npt1orycu | Renal phosphate-Na+ cotransporter [NPT1] | Oryctolagus cuniculus [rabbit] | Na+/PO4
Npt1ratno | Renal phosphate-Na+ cotransporter [NPT1] | Rattus norvegicus [rat] | Na+/PO4
Npt2ratno | Brain phosphate-Na+ cotransporter [NPT2] | Rattus norvegicus [rat] | Na+/PO4
Cotransported ions are listed.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parenthesis and therefore are not included in the phylogenetic tree: Npt1musmu (Npt1homsa).
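The 90% identity rule used to drop near-duplicate proteins can be sketched as below. The source does not say how identity was computed, so this assumes a simple definition: identical positions divided by the aligned positions where at least one sequence has a residue. The function names and the three toy sequences are hypothetical and are not taken from the alignments.

```python
def percent_identity(a, b, gap="."):
    """Percent identity between two rows of the same alignment.
    Columns where both rows carry a gap are ignored."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def representatives(aligned, cutoff=90.0):
    """Keep the first member of each group of near-identical sequences.
    Returns (kept codes, {kept code: [codes paired with it]})."""
    kept = {}
    paired = {}
    for code, seq in aligned.items():
        for rep, rep_seq in kept.items():
            if percent_identity(seq, rep_seq) >= cutoff:
                paired.setdefault(rep, []).append(code)
                break
        else:
            # No existing representative is close enough.
            kept[code] = seq
    return list(kept), paired


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = {
        "seqA": "MKTLLVAGAL",
        "seqB": "MKTLLVAGAV",
        "seqC": "MRSPPVQGEL",
    }
    print(representatives(toy))
```

The outcome depends on input order, because the first sequence seen in each group becomes its representative. In the toy data seqB is paired with seqA, while seqC is kept as a separate entry.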
[Figure: phylogenetic tree. Leaves, in order: Npt1homsa, Npt1orycu, Npt1ratno, Npt2ratno.]
Proposed orientation of NPT1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. More than half of the residues are conserved in at least 75% of the members of the Na+/PO4 symporter family and, therefore, are not mapped onto the model.
[Figure: proposed membrane topology of NPT1, with the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini labelled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | EXPRESSION SITES | CHROMOSOMAL LOCUS
Npt1homsa | 467 | 51 143 | kidney | 6p23-p21.3
Npt1musmu | 465 | 51 589 | kidney |
Npt1orycu | 465 | 51 798 | kidney |
Npt1ratno | 465 | 51 349 | kidney |
Npt2ratno | 560 | 61 665 | brain |
Multiple amino acid sequence alignments Nptlhomsa Nptlorycu Nptlratno Npt2ratno Consensus
1 50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M QMDNRLPPKK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MDNQFPSRK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MENRCLPKK MEFRQEEFRK LAGRALGRLH R L L E K R Q E G A E T L E L S A D G R PVTTHTRDPP ................................................. K
51 Nptlhomsa VPGFCSF .... RYGLSFLVH N p t l o r y c u GPCFCSF .... RYVLALFMH Nptlratno VPGFCSF .... RYGLAILLH N p t 2 r a t n o V V D C T C F G L P RRYIIAIMSG C o n s e n s u s V P . F C S F .... RY.LA...H
CCNVIITAQR FCNIVIIAQR FCNIVIMAQR LGFCISFGIR .CN..I.AQR
i00 ACLNLTMVVMVNSTDPHGLP MCLSLTMV~MVNNTNLHGSP VCLNLTMVAMVNKTEPPHLS CNLGVAIVSMVNNSTTHRGG .CL.LTMV.MVN.T..H...
i01 N p t l h o m s a N T S T K K L L D N IKNPMYNWSP DIQGIILSST N p t l o r y c u N T S A E K R L D N TKNPVYNWSP DVQGIIFSSI Nptlratno NKSVAEMLDN VKNPVHSWSL DIQGLVLSSV Npt2ratno HVVVQK ....... AQFNWDP ETVGLIHGSF ConsensusN.S..K.LDN .KNP..NWSPD.QG.I.SS.
150 SYGVIIIQVP V G Y F S G I Y S T FYGAFLIQIP VGYISGIYSI FLGMVVIQVP VGYLSGAYPM FWGYIVTQIP G G F I C Q K F A A F.G...I..PVGY.SG.Y..
Nptlhomsa Nptlorycu Nptlratno Npt2ratno Consensus
151 KKMIGFALCL KKLIGFALFL EKIIGSSLFL NRVFGFAIVA .K.IGFAL.L
201 Nptlhomsa YVKWAPPLER NptlorycuWVKWAPPLER NptlratnoWVKWAPPLER Npt2ratno WSKWAPPLER Consensus WVKWAPPLER
GRLTSMSTSG GRLTSMSLSG GRLTSMTLSG SRLATTAFCG GRLTSM..SG
FLLGPFIVLL FLLGPFIVLL FVMGPFIALL SYAGAVVAMP F..GPFI.LL
VTGVICESLG VTGIICESLG VSGFICDLLG LAGVLVQYSG V.G.IC..LG
250 WPMVFYIFGA WPMVFYIFGA WPMVFYIFGI WSSVFYVYGS WPMVFYIFG.
VLFYDDPKDH PCISISEKEY VLYYDDPKDH PCVSLHEKEY ILLFDDPNNH PYMSSSEKDY LVSYESPALH PSISEEERKY .L.YDDP..HP..S..EK.Y
300 ITSSLVQQV .... SSSRQSL ITSSLIQQG .... SSTRQSL ITSSLMQQV .... HSGRQSL IEDAIGESAK LMNPVTKFNT ITSSL.QQ ...... S.RQSL
301 Nptlhomsa PIKAILKSLPVWAISIGSFT FFWSHNIMTL Nptlorycu PIKAMIKSLP LWAISFCCFA YLWTYSRLIV Nptlratno PIKAMLKSLP LWAIILNSFA FIWSNNLLVT Npt2ratno PWRRFFTSMP VYAIIVANFC RSWTFYLLLI ConsensusPIKA..KSLP .WAI .... F . . . W .... L..
350 YTPMFINSML HVNIKENGFL YTPTLINSML HVDIRENGLL YTPTFISTTL HVNVRENGLL SQPAYFEEVF GFEISKVGLV YTP..I...LHV.I.RNGLL
Nptlhomsa Nptlorycu Nptlratno Npt2ratno Consensus
251 CGCAVCLLWF CGCAVCLLWF VGCVLSLFWF FGIFWYLFWL .GC...L.WF
200 S S V L S L L I P P A A G I G V A W V V V C R A V Q G A A Q GIVATAQFEI SSLVSIFIPQAAAVGETWII VCRVVQGITQ GTVTTAQHEI SSVLSLLIPPAAQVGAALVI VCRVLQGIAQ GAVSTGQHGI TSTLNMLIPSAARVHYGCVI FVRILQGLVE GVTYPACHGI SS.LS.LIP. A A . V G G . . V I V C R . . Q G . . Q G . V . T A Q H . I
351 SSLPYLFAWI SSLPYLFAWI SSLPYLLAYI SALPHLVMTI SSLPYL.A.I
400 CGNLAGQLSD FFLTRNILSV IAVRKLFTAAGFLLPAIFGV CGVIAGHTAD FLMSRNMLSL TAIRKLFTAI GLLLPIVFSM CGIVAGQMSD FLLSRKIFSV VAVRKLFTTL GIFCPVIFVV IVPIGGQIAD FLRSRHIMST TNVRKLMNCG GFGMEATLLL C G . . A G Q . . D F L . S R. I . S . . A V R K L F T . . G...P..F..
401 Nptlhomsa CLPYLSSTFY Nptlorycu CLLYLSSGFY Nptlratno CLLYLSYNFY Npt2ratnoVVGY.SHSKG Consensus CL.YLS..FY
450 SIVIFLILAG ATGSFCLGGV FINGLDIAPR YFGFIKACST STITFLILAN ASSSFCLGGA LINALDLAPR YYVFIKGVTT STVIFLTLANSTLSFSFCGQ LINALDIAPR YYGFLKAVTA VAISFLVLAV GFSGFAISGF NVNHLDIAPR YASILMGISN S...FL.LA .... S F . . . G . . I N . L D I A P R Y . . F . K ....
451 Nptlhomsa LTGMIGGLIA Nptlorycu LIGMTGGMTS Nptlratno LIGIFGGLIS Npt2ratno GVGTLSGMVC ConsensusL.G..GG...
STLTGLILKQ DPESAWFKTF STVAGLFLSQ DPESSWFKIF STLAGLILNQ DPEYAWHKNF PIIVGAMTKH KTREEWQYVF ST..GL.L.QDPE..W.K.F
Nptlhomsa Nptlorycu Nptlratno Npt2ratno Consensus
Nptlhomsa Nptlorycu Npt ir atno Npt2ratno Consensus
500 ILMAAINVTG LIFYLIVATA LLMSIINVIS VIFYLIFAKA FLMAGINVTC LAFYLLFAKG LIASLVHYGG VIFYGVFASG .LM..INV...IFYL.FA..
501 550 EIQDWAKEKQ HTRL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EIQDWAKEKQ HTRL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DIQDWAKETK TTRL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EKQPWAEPEE MSEEKCGFVG HDQLAGSDES EMEDEVEPPG APPAPPPSYG EIQDWAKE...TRL ....................................
551 Nptlhomsa . . . . . . . . . . . . . . . . . . Nptlorycu . . . . . . . . . . . . . . . . . .
568
Nptlratno ..................
..................
Npt2ratno ATHSTVQPPR PPPPVRDY Consensus ..................
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Npt1musmu (Npt1homsa). Residues listed in the consensus sequence are present in at least 75% of the transporter sequences shown.
Database accession numbers
SWISSPROT
PIR
Npt1homsa
A48916
Npt1musmu Npt1orycu Npt1ratno Npt2ratno
A56410; S27951
EMBL/GENBANK X71355
X77241 M76466 U28504 U07609
Na+/branched amino acid symporter family
Summary
Transporters of the Na+/branched amino acid symporter family, the example of which is the BRNQ branched chain amino acid transporter of Salmonella typhimurium (Brnqsalty), mediate symport (cation-coupled substrate uptake) of isoleucine, leucine and valine. The cotransported ion may be H+, Na+ or Li+, depending on the transporter. Known members of the family occur in gram-negative bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the Na+/branched amino acid symporter family and any other family of transporters. They are predicted to form 12 membrane-spanning helices by the hydropathy of their amino acid sequences. Several amino acid sequence motifs are highly conserved in the Na+/branched amino acid symporter family, including motifs necessary for function by the criterion of site-directed mutagenesis 1.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES]
Brabpseae | Branched chain amino acid transport system 2 [BRAB, LIVII] | Pseudomonas aeruginosa [gram-negative bacterium]
Brazpseae | Branched chain amino acid transport system 3 [BRAZ, LIVIII] | Pseudomonas aeruginosa [gram-negative bacterium]
Brnqclope | Branched chain amino acid transport system 2 [BRNQ] | Clostridium perfringens [gram-positive bacterium]
Brnqhaein | Branched chain amino acid transport system 2 [BRNQ] | Haemophilus influenzae [gram-negative bacterium]
Brnqlacde | Branched chain amino acid transport system 2 [BRNQ] | Lactobacillus delbrueckii [gram-positive bacterium]
Brnqsalty | Branched chain amino acid transport system 2 [BRNQ, LIVII] | Salmonella typhimurium [gram-negative bacterium]
SUBSTRATE(S)
Na+ Li+ isoleucine, leucine H+ leucine, valine Na+ Li+ isoleucine, leucine Na+ Li+/valine isoleucine, valine Na+, Li+/valine isoleucine, valine H+ leucine, valine
Cotransported ions are listed.
Phylogenetic tree
[Figure: phylogenetic tree. Leaves: Brazpseae, Brnqsalty, Brabpseae, Brnqhaein, Brnqclope, Brnqlacde.]
Proposed orientation of BRNQ in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in BRNQ.
[Figure: proposed membrane topology of BRNQ, with the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini labeled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT
Brabpseae | 437 | 45 282
Brazpseae | 437 | 45 274
Brnqclope | 338 | 35 864
Brnqhaein | 436 | 47 038
Brnqlacde | 446 | 47 869
Brnqsalty | 439 | 46 534
Km
Isoleucine: 12 µM
Multiple amino acid sequence alignments Brazpseae Brnqsalty Brabpseae Brnqhaein
1
.MNALKGRDI MTHQLKSRDI .MTHLKGFDL ...MFSRKDI
LALGFMTFAL FVGAGNIIFP IALAFMTFALFVGRGNIIFP LALGFMTFAL FLGAGNIIFP IVLGMMIFAL FLGAGNIIFP
50
PIVGLQSGPHVWLAALGFLI PMVGLQAGEHVWTARIGFLI PSAGMAAGEHVWSAAFGFLL PMEGFSSGQH WTSASLGFVL
Brnqclope ...MNKKKDI LVIGFALFSI FFGAGNLIFP PYIGLTSGSE WLISFLGFII Brnqlacde MKEKLTHAES LTISSMLFGL FFGAGNLIFP A Y L G E A S G A N LWISLLGFLI Consensus ........ D ...... M.F.L F . G A G N . I F P P . . G . . . G ........ GF..
Brazpseae Brnqsalty Brabpseae Brnqhaein Brnqclope Brnqlacde Consensus
51 T A V G L P V I T V IALAKV~ V D A L S H P I G R YAGGLLAAVC TAVGLPVLTV VALAKV~ V D S L S T P I G K VAGLLLATVC T G V G L P L L T V V A L A R V . G G G IGRLTQPIGR R A G V A F A I A V T G V L M P F I T L V V V A I L . G R G .EELTKDLPK W A G T G F L V I L SDVGIIFLSI VAVSK..AGS FQGVVGRAGK KFGITLEILM TGVGLPLLAI A S L G M T R S E G LLDLSGRVSH KYSYFFTCLL T.VG.P . . . . . . . . . . . . . . . . . L ........ G .......
Brazpseae Brnqsalty Brabpseae Brnqhaein Brnqclope Brnqlacde Consensus
i01 150 P R T A T V S F E V GVVP..LLG. ESGTALFVYS LAYFLLALAI SLYPGRLLDT P R T A T V S F E V GIAP..LTG. D S A M P L L I Y S V V Y F A I V I L V SLYPGKLLDT PRTAVVSFEM GVAP..FTG. D G G V P L L I Y T VAYFSVVLFL V L N P G R L V D R PRITNVAYEM AWLPLGLTE NNANVRFVFS LIFNLIAMGF MISPNTIISS PRTAATTFEM SISPLL ..... GNVNPYVFP V I F F L I V F V L TIKPNKVMDI PRSFTVPFET GISALLPSGM AKSTGLFIFS LIFFAIMLFF SLRPGQIMDW PR...V.FE .... P . . . . . . . . . . . . . . . . . . . F ......... P .... D.
Brazpseae Brnqsalty Brabpseae Brnqhaein Brnqclope Brnqlacde Consensus
151 200 VGRFLAPLKI L A L A I L G V A A F L W P A G P I G T A Q P E Y T Q A . A .FSQGFVNGY VGNFLAPLKI I A L V I L S V A A I V W P A G P I S N A L D A Y Q N A . A .FSNGFVNGY V G K V I T P V L L S A L L V L G G A A I F A P A G E I G S SSGEYQSA.P .LVQGFLQGY V G K F M T P A L L V L L I A V A I T V FISPLSEIQA PSNAYENSHS .LLIGLTSGY IGKVLTPLLL ISLAVLIIKG IINPIGDLEK V .... NSGKL .FMTGITQGY IGKFLTPAFL L F F F F I M I M A L L H P L G N Y H A V K P V G E Y A S A PLISGVLAGY .G .... P ..... L .......... P.G . . . . . . . . . . . . . . . . . . G...GY
i00 YLAVGPLFAI YLAVGPLFAT YLAIGPLFAT YLTIGSTFAM MLCLGPILVV YLTIGPFFAI YL..GP.FA.
Brazpseae Brnqsalty Brabpseae Brnqhaein Brnqclope Brnqlacde Consensus
201 LTMDTLAALV LTMDDWVAMV LTMDTLGALV QTMDVLAAIA QTMDALGTGG NTMDALAGLA .TMD.L ....
250 FGIVIVNAIR SRGVQSPRLI TRYAIVAGLI A G V G L V L V Y V FGIVIVNAAR SRGVTEARLL TRYTVWAGLM A G V G L T L L Y L FGIVIATAIR D R G I S D S R L V T R Y S M I A G V I A A T G L S L V Y L FGGIVARALS AKNVTKTKDI V K Y T I S A G F V SVILLAGLYF IVALVMASFA SKGYKDKKEN RMLTIKSALI A C I G L A I V Y G FGIIVISSIR T F G V T K P E K V A S A T L K T G V L T C L L M A V I Y A FG .......... G . . . . . . . . . . . . . . G ...... L...Y.
Brazpseae Brnqsalty Brabpseae Brnqhaein Brnqclope Brnqlacde Consensus
251 SLFRLGAGSH A I A A D A S N G A A V L H A Y V Q H T ALFRLGSDSA TLVDQSANGA AILHAYVQHT A L F Y L G A T S Q G I A G D A Q N G V QILTAYVQQT SLFYLGATSA A V A E G A T N G G Q I F S R Y V N V L GLTFLGATSS T L Y D S S I S Q T TLLMNITNAI ITALVGAQSR TALGLAANGG EALSQIARHY .L..LGA.S ........ N G . . . L .......
300 FGSLGSSFLA GLIALACLVT FGGAGSFLLA A L I F I A C L V T FGVSGSLLLAVVITLACLTT FGSAGTWIMA GIIVLASLTT L G S T G T I M L A IVIGLACLTT FPGLGAVIFA L M I F V A C L K T FG..G .... A ..I..ACL.T
Brazpseae Brnqsalty Brabpseae Brnqhaein
301 AVGLTCACAE AVGLTCACAE AVGLITACGE LVGVTSASAD
GFSFIVSNLG GFSMVVSNLG LFSLLVANQG AMTITVSQYG
YFCQR..LPL FFAQY.. IPL FFSDL..LPV YFSKFS.VRF
SYRSLVIILA SYRTLVFILG SYKTVVIVFS SYPFWAALFT
350 LTKLIQVSIP LSHLIQISIP LTQLISLSVP LTDLLRITIP
Brnqclope AVGLTSVTAK YFEDVSNKKL KYKYIVIAIC VFSALSSNLG VDKIIEIAVP Brnqlacde AIGLITACSE TFAEMFPKTL SYNMWAIIFS LLAFGIANVG LTTIISFSLP Consensus A V G L . . A .... F ........ SY . . . . . . . . . . . . . . . N.G L...I .... P 351 400 Brazpseae VLTAIYPPCI VLVALSFCIG LWHSAT...R ILAPVMLVSL AFGVLDALKA Brnqsalty VLTAIYPPCI ALVVLSFTRSWWHNST...R IIAPAMFISL LFGILDGIKA Brabpseae VLVGLYPLAI VLIALSLFDR LWVSAP...R VFVPVMIVAL LFGIVDGLGA Brnqhaein ALLLIYPVAI VLVLLQFLRK KLPSIK...F TYNSTLLVTV CFSLCDSLNN B r n q c l o p e VLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Brnqlacde VLMLLYPLAI SLILLALTSK LFDFKQVDYQ IMTAVTFLCA LGDFFKALPA C o n s e n s u s V L . . . Y P . . I .L..L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Brazpseae Brnqsalty Brabpseae Brnqhaein Brnqclope Brnqlacde Consensus Brazpseae Brnqsalty Brabpseae Brnqhaein Brnqclope Brnqlacde Consensus
401 450 AG.LGQDFPQ WLLHLPLAEQ GLAWLIPSVA TLAACSLVDR LLGKPAQVAA SA.FGDMLPA WSQRLPLAEQ GLAWLMPTVVMVILAIIWDRAAGRQVTSSA AK.LNGWVPD VFAKLPLADQ SLGWLLPVSIALVLAVVCDR LLGKPREAVA V K M L P E S I N S L L K H F P L S S E G M A W L V P T L V M L V A S I F I G K ALHKTHS... .................................................. G M Q V K A V T G L Y G H V L P L Y Q D G L G W L V P V T V I F A I L A I K G V ISKKRA .... . . . . . . . . . . . . . . . PL ...... WL.P . . . . . . . . . . . . . . . . . . . . . . . 451
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
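The 75% rule for the consensus line can be illustrated with a short sketch. It is not the authors' procedure: the function name `consensus`, the use of "." for both alignment gaps and non-consensus positions, and the choice to count the fraction over all aligned sequences (gaps included in the denominator) are assumptions made for illustration. The input is the first ten alignment columns of the six transporters as printed above.

```python
from collections import Counter

GAP = "."  # assumed symbol for gaps and for non-consensus positions


def consensus(aligned, threshold=0.75, gap=GAP):
    """Return a consensus string for equal-length aligned sequences.

    A residue is reported at a column only if it occurs in at least
    `threshold` of all sequences; otherwise the gap symbol is written.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        if not residues:
            out.append(gap)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        out.append(residue if count / len(aligned) >= threshold else gap)
    return "".join(out)


# Columns 1-10 of the alignment, in the order printed in the text.
block = [
    ".MNALKGRDI",  # Brazpseae
    "MTHQLKSRDI",  # Brnqsalty
    ".MTHLKGFDL",  # Brabpseae
    "...MFSRKDI",  # Brnqhaein
    "...MNKKKDI",  # Brnqclope
    "MKEKLTHAES",  # Brnqlacde
]

print(consensus(block))  # ........D.
```

Only column 9 passes the threshold here (aspartate in five of six sequences), which agrees with the single D in the printed consensus for this block.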
Database accession numbers
Brabpseae Brazpseae Brnqclope Brnqhaein Brnqlacde Brnqsalty
SWISSPROT
PIR
EMBL/GENBANK
P19072 P25185
S11497 A38534 D49784 D64056 Z48676 JQ0007
X51634 D90222
P14931
L42023 D00332
References
1 Reizer, J. et al. (1994) Biochim. Biophys. Acta 1197, 133-166.
2 Hoshino, K. et al. (1991) J. Bacteriol. 173, 1855-1861.
Na+/citrate symporter family
Summary
Transporters of the Na+/citrate symporter family, the example of which is the CITN citrate transporter of Klebsiella pneumoniae (Citnklepn), mediate symport (Na+-coupled substrate uptake) of citrate. Known members of the family occur in gram-negative bacteria. Statistical analysis reveals no relationship between the amino acid sequences of the Na+/citrate symporter family and any other family of transporters. They are predicted to form 12 membrane-spanning helices by the hydropathy of their amino acid sequences. Several amino acid sequence motifs are highly conserved in the Na+/citrate symporter family.
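Predicting membrane-spanning helices from hydropathy can be sketched as a sliding-window average over the sequence. The text does not state which scale, window or cutoff was used, so everything specific below is an assumption: the Kyte-Doolittle scale, a 19-residue window, a mean hydropathy cutoff of 1.6, and the function names `hydropathy_profile` and `candidate_helices`.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; raises KeyError on unknown letters."""
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_helices(seq, window=19, threshold=1.6):
    """Merge overlapping high-scoring windows into 1-based (start, end) regions."""
    regions = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i + 1, i + window  # 1-based, inclusive
        if regions and start <= regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


# Synthetic test case: a 19-residue leucine stretch flanked by lysines.
toy = "K" * 10 + "L" * 19 + "K" * 10
print(candidate_helices(toy))  # [(6, 34)]
```

A merged region is wider than the hydrophobic stretch itself because every window that overlaps it sufficiently passes the cutoff. Counting helices therefore needs a further step, such as taking local maxima of the profile, which is left out of this sketch.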
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Citnklepn | Citrate-Na+ symporter [CITS, CITN] | Klebsiella pneumoniae [gram-negative bacterium] | Na+/citrate
Citnlacla | Citrate-Na+ symporter [CITP, CITN] | Lactococcus lactis [gram-positive bacterium] | Na+/citrate
Citnsaldu | Citrate-Na+ symporter [CITC, CITN] | Salmonella dublin [gram-negative bacterium] | Na+/citrate
Citnsalpu | Citrate-Na+ symporter [CITC, CITN] | Salmonella pullorum [gram-negative bacterium] | Na+/citrate
Citpleula | Citrate-Na+ symporter [CITS, CITN] | Leuconostoc lactis [gram-positive bacterium] | Na+/citrate
Citpstrbo | Citrate-Na+ symporter [CITS, CITN] | Streptococcus bovis [gram-positive bacterium] | Na+/citrate
Cotransported ions are listed.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Citnsalpu, Citnsaldu (Citnklepn); Citpleula (Citnlacla).
[Figure: phylogenetic tree. Leaves: Citnklepn, Citpstrbo, Citnlacla.]
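The 90% identity cutoff used to drop near-duplicate proteins from the tree can be sketched as a pairwise comparison of two aligned sequences. The text does not say how identity was computed, so the definition below is an assumption: identical residues divided by the number of columns in which neither sequence has a gap. The names `percent_identity` and `is_redundant` are hypothetical.

```python
def percent_identity(a, b, gap="."):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def is_redundant(a, b, cutoff=90.0):
    """True if the pair meets the assumed 'at least 90% identical' rule."""
    return percent_identity(a, b) >= cutoff


# Alignment columns 401-410 of Citnklepn and Citpstrbo as printed in the text.
citnklepn = "AGLCMANRGG"
citpstrbo = "SA.CQSGMGG"
print(round(percent_identity(citnklepn, citpstrbo), 1))  # 33.3
print(is_redundant(citnklepn, citpstrbo))                # False
```

Ten columns are far too few to judge a protein pair. In practice the comparison would run over the full-length alignment.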
Proposed orientation of CITN in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: proposed membrane topology of CITN, with the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini labeled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT
Citnklepn | 446 | 47 557
Citnlacla | 442 | 46 629
Citnsaldu | 446 | 47 591
Citnsalpu | 446 | 47 621
Citpleula | 441 | 46 533
Citpstrbo | 441 | 47 221
Multiple amino acid sequence afignments 1 C i t n k l e p n .... M T N M S Q P P A T E K K G V S Citpstrbo ...MEKKLPA TAANETDWRN Citnlacla MMNHPHSSHI GTTNVKEEIG Consensus ..................... ::,......:.:::::::
Citnklepn Citpstrbo Citnlacla Consensus
50 DLLGFKIFGM PLPLYAFALI TLLLSHFYNA KLTKTRIGSV TLPVYLVTAS IILVTALLEQ KLDRIRISGI GLIAYAFMAV LLIIAISTKT L .... I . . . . L . . Y . . . . . . . . . . . . . . .
51 i00 LPTDIVGGFA IMFIIGAIFG EIGKRLPIFN KYIGGAPVMI FLVAAYFVYA LPVNMLGGFA VILTMGWLLG TIGGNIPIL. KHFGGPAILS LLVPSIMVFF LPNTMIGAIF ALVLMGHVFY YLGAHLPIFR SYLGGGSVFT ILLTAILVAT LP .... G . . . . . . . . G . . . . . . G . . . P I . . . . . G G . . . . . . L ..... V..
i01 Citnklepn GIFTQKEIDA ISNVMDKSNF LNLFIAVLIT GAILSVNRRL Citpstrbo NLLNQNVLDS TDILMKQANF LYFYIACLVC GSILGMNRKI Citnlacla NVIPKYVVTT ASGFINGMDF LGLYIVSLIA SSLFKMDRKM C o n s e n s u s . . . . . . . . . . . . . . . . . . . F L . . . I . . L . . . . . . . . . R..
150 LLKSLLGYIP LVQGLMRMIV LLKAAVRFLP L .........
Citnklepn C itpstrbo Citnlacla Consensus
151 TILMGIVGAS IFGIAIGLVF PMALGMILAM GVGTLVGTLL VAFISMALTA VVIGIVGVII . . . . . . . . . . . . . . . . G...
GIPVDRIMML YVLPIMGGGN GLGWKHSLFY IVTPVLAGGI GVGFNYAILY IAMPIMAGGV G . . . . . . . . . . . . P...GG.
200 GAGAVPLSEI GEGILPLSLG GAGIVPLSGI G.G..PLS..
Citnklepn Citpstrbo C itnlacla Consensus
201 YHSVTGRSRE EYYSTAIAIL TIANIFAIVF AAVLDIIGKK YSAITGLPSE QLVGQLIPAT IIGNFFAIMC SGLLSRLGEK Y A H A M G V G S A G I L S K L F P T V I L G N L L A I I S A G L I S R I F .K Y .... G . . . . . . . . . . . . . . . . . N..AI . . . . . . . . . . . K
250 HTWLSGEGEL RPELSGQGQL DSKGNGHGEI ..... G.G..
Citnklepn Citpstrbo Citnlacla Consensus
251 300 VRKASFKVEE DEKTGQITHR ETAVGLVLST TCFLLAY..V VAKKILPSIG IKIT .... NS D D L S D A L E E D K A P I D V K L M G A G V L I A C T L F I T G G L L Q H L T LR ...... GE R E K S A A A E E I K P . . D Y V Q L G V G L I I A V M F F M I G T M L N K V F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A . . . . . . . . . L ....
Citnklepn Citpstrbo Citnlacla Consensus
301 350 GVAIHYFAWM VLIVAALNAS GLCSPEIKAG AKRLSDFFSK QLLWVLMVGV GFPGPVL..M IVVAAFLKYLNVVPKETQRG SKQLYKFISG NFTFPLMVGL P.GINAYAFI ILSIVLTKAF GLLPKYYEDS VIMFGQVIVK NMTHALLAGV ............................................. L..G.
Citnklepn Citpstrbo Citnlacla Consensus
351 400 GVCYTDLQEI INAITFANVVIAAIIVIGAV LGAAIGGWLM GFFPIESAIT GMLYIPLKDV VGMLSWQYFVVVISVVFTVI ATGFFVSRFM NMNPVEAAIV GLSLLDMHVL LAALSWQFVVLCLVSIVAIS LISATLGKLF GLYPVEAAIT G . . . . . . . . . . . . . . . . . . V . . . . . . . . . . . . . . . . . . . . . . . P.E.A..
Citnklepn Citpstrbo Citnlacla Consensus
401 AGLCMANRGG SA.CQSGMGG AGLANNSMGG ........ GG
Citnklepn Citpstrbo Citnlacla Consensus
451 MI LF MK ..
SGDLEVLSAC NRMNLISYAQ TGDVAILSTA NRMTLMPFAQ TGNVAVLAAS ERMNLIAFAQ .G .... L .... R M . L . . . A Q
450 ISSRLGGGIV LVIASIVFGM VATRLGGAIT VITMTAIFRM MGNRIGGALI LVVAGILVTF ...R.GG . . . . . . . . . . . . .
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Citnsalpu, Citnsaldu (Citnklepn); Citpleula (Citnlacla). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
Database accession numbers
CODE | SWISSPROT | PIR | EMBL/GENBANK
Citnklepn | P31602 | A38244 | M83146
Citnlacla | P21608 | A36136 | M58694
Citnsaldu | P31603 | B42661 | D10258
Citnsalpu | P31604 | A42661 | D10257
Citpleula | | | U28212
Citpstrbo | | | U35658
Na+/alanine-glycine symporter family
Summary
Transporters of the Na+/alanine-glycine symporter family, the example of which is the ACP alanine transporter of the thermophilic bacterium PS-3 (Alcpthep3), mediate symport (H+- and Na+-coupled substrate uptake) of alanine and glycine. The two known members of the family occur only in gram-negative bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the Na+/alanine-glycine symporter family and any other family of transporters. They are predicted to contain eight membrane-spanning helices by the hydropathy of their amino acid sequences. Relatively few amino acid sequence motifs are conserved in the two proteins.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Dagaaltha | Glycine-alanine-Na+ symporter [DAGA] | Alteromonas haloplanktis [gram-negative bacterium] | Na+/glycine, alanine
Alcpthep3 | Alanine-Na+, H+ symporter [ACP] | PS-3: unclassified [thermophilic bacterium] | Na+, H+/alanine
Cotransported ions are listed.
Proposed orientation of ACP in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded eight times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed.
[Figure: proposed membrane topology of ACP, with the OUTSIDE and INSIDE faces of the membrane and the NH2 and COOH termini labeled.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT
Alcpthep3 | 445 | 47 804
Dagaaltha | 542 | 59 023
Multiple amino acid sequence alignments 1 glcpthep3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
50 MIRL VTMGKSSEAG
Dagaaltha MLGGAVWFPY VLLGVGLFFT IYLKFPQIRY FKHACQVVSG KFDKKDTEGD Consensus ............................................ K..E.. 51 I00 Alcpthep3 VSSFQALTMS LSGRIGVGNV AGTATGIAYG GPGAVFWMWV ITFIGAATAY Dagaaltha TTHFQALATA LSGTVGTGNI GGVALAISIG GPAALFWMWM TAFFGMTTKF Consensus ...FQAL... LSG..G.GN..G.A..I..GGP.A.FWMW...F.G..T.. i01 150 Alcpthep3 VESTWRKFIK RNKTDNTVAV RRSTLKKALA GNGLRCSRAA IILSMAVLMP Dagaaltha VEVTLSHKYR EKTEDGTM .... SGGPMYYM DKRLNMKWLA ILFAVATVIS C o n s e n s u s V E . T . . . . . . . . . . D . T . . . . . S . . . . . . . . . . L ..... A I .... A .... 151 200 A l c p t h e p 3 GI ...... QA N S I A D S F S N A F G I P K L V T G I F V I A V L G F T I F G G V K R I A K T Dagaaltha SFGTGSLPQI NNIAQGMEAT FGFAPMATGA VLSILLALVI LGGIKRIAAI C o n s e n s u s ........ Q. N . I A ...... FG ..... TG . . . . . . L . . . I . G G . K R I A . . 201 250 Alcpthep3 AEIVVPFMAV GYLFVAIAII AANIEKVPDV FGLIFKSAFG ADQVFGGILG Dagaaltha TSRVVPLMAA IYIIGALAVI FYNAENIGPS FSAVFMDAFS GSAAAGGFLG C o n s e n s u s . . . V V P . M A . . . . . . A . A . I ..N.E ..... F . . . F . . A F . . . . . . G G . L G 251 A l c p t h e p 3 S .... A V M W G V K R G L Y A N E A Dagaaltha ASFAYAFNRG VNRGLFSNEA C o n s e n s u s ..... A . . . G V . R G L . . N E A
300 GQGTGAHPAA AAEVSHPAKQ GLVQAFSIYL GQGSAPIAHA SAKADEPVSE GIVSILEPFI G Q G ...... A .A .... P... G . V .......
301 350 Alcpthep3 DVFLVVTATA LMIL .................................... Dagaaltha DTIIICTLTG LVILSSGVWN EKFQTHFERS AMSIIKGDYT EENQTQREDL C o n s e n s u s D ..... T . T . L . IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 400 Alcpthep3 ............. FTGQYNV INEKTGET ...................... Dagaaltha YKYLNGQKSN IETFTGNIEV VNGEALSTGF TVLHSRSIAE DVRFGITEKH C o n s e n s u s . . . . . . . . . . . . . F T G . . . V .N ..... T . . . . . . . . . . . . . . . . . . . . . .
401
450
Alcpthep3 Dagaaltha Consensus
.... I V E H L K G V E P G A G Y . . . . . . . . . . . . . . TQAAVDTL FPGFGSAFIA KYTGVVEVID GMPTDDSISL VGKSLVHSAE LTTKAFKRGY FGDSGQYIVS ..... V E . . . G . . . . . . . . . . . . . . . . . . . . . T . A ..... F . . . G .....
Alcpthep3 Dagaaltha Consensus
451 IALFFFAFTT IGLLLFAFST I.L..FAF.T
Alcpthep3 Dagaaltha Consensus
501 VKTATTAWAM GDIGLGIMVW LNLIAILLLF FADTTLVWKL AAVAIVVMTL PNLIGIMLLR .... T . . W . . . . . . . . . M . . . N L I . I . L L .
Alcpthep3 Dagaaltha Consensus
551 580 EFNASKYGIK NAKFWENGYK RWEEKKGKAL .............................. ..............................
500 MYAYYYIAET NLAYLVRSEK RGTAFFALKL VFLAATFYGT AIAWSYYGDR AMIYLLGHR .... SVMPYRV FYVAAFFWAS ..A..Y ....... YL .................. AA.F... 550 KPAYMALKDY EEQLKQGKDP KEMKESVDDY WVKFKKDNEK K . . . . . . . D Y .... K .....
Residues listed in the consensus sequence are present in both of the transporter sequences.
Database accession numbers
CODE | SWISSPROT | PIR | EMBL/GENBANK
Alcpthep3 | P30145 | S27733; A45111 | D12512
Dagaaltha | P30144 | S25276 | M59081
Na+/neurotransmitter symporter family
Summary
Transporters of the Na+/neurotransmitter symporter family, the example of which is the NET1 noradrenaline-Na+ symporter of humans (Ntnohomsa), mediate symport (Na+-coupled substrate uptake) of several structurally dissimilar neurotransmitters, including norepinephrine, 4-aminobutyrate (GABA), serotonin, creatine and dopamine 1-4. Members of the Na+/neurotransmitter symporter family also serve as uniporters and as ion and water channels 4. For example, neurotransmitter cotransporters such as NET1 and GAT1 contain ligand-gated ion channels that mediate Cl- conductance in parallel with Na+-coupled substrate uptake, while in the absence of substrates they transport ions in an uncoupled "leak" mode. Members of the family have a broad biological distribution that includes both invertebrates and vertebrates. Statistical analysis reveals no significant similarity between the amino acid sequences of the NET1 family and any other family of transporters. However, the similarity between the kinetics of several families of Na+- and H+-driven transporters suggests a common mechanism of action, despite the lack of amino acid sequence homology 4. Members of the Na+/neurotransmitter symporter family are predicted to contain 12 membrane-spanning helices by the hydropathy of their amino acid sequences and reactions with peptide-specific antibodies 5. Members of the Na+/neurotransmitter symporter family are glycosylated. Several amino acid sequence motifs are highly conserved in the Na+/neurotransmitter symporter family, including motifs necessary for function by the criterion of site-directed mutagenesis 2.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Gat1torca | Na+/Cl--dependent GABA transporter 1 [TGAT, GAT1] | Torpedo californica [ray] | Na+/GABA
Ntbecanfa | Na+/Cl--dependent betaine transporter [NTBE] | Canis familiaris [dog] | Na+/betaine
Ntchratno | Choline-Na+ symporter [CHOT1] | Rattus norvegicus [rat] | Na+/choline
Ntcrhomsa | Na+/Cl--dependent creatine transporter [SLC6a8, NTCR] | Homo sapiens [human] | Na+/creatine
Ntcrorycu | Na+/Cl--dependent creatine transporter [SLC6a8, NTCR] | Oryctolagus cuniculus [rabbit] | Na+/creatine
Ntcrtorma | Na+/Cl--dependent creatine transporter [NTCR] | Torpedo marmorata [ray] | Na+/creatine
Ntdobosta | Dopamine-Na+ symporter [DA, DAT, NTDO] | Bos taurus [cow] | Na+/dopamine
Ntdohomsa | Dopamine-Na+ symporter [DAT, DAT1, SLC6a3, NTDO, DA transporter] | Homo sapiens [human] | Na+/dopamine
Ntdoratno | Dopamine-Na+ symporter [DAT, DAT1, SLC6a3, NTDO, DA transporter] | Rattus norvegicus [rat] | Na+/dopamine
Ntg1homsa | Na+/Cl--dependent GABA transporter 1 [NTG1, GABT1, GAT1, SLC6a1] | Homo sapiens [human] | Na+/GABA
Ntg1musmu | Na+/Cl--dependent GABA transporter 1 [SLC6a1, GABT1, GAT1, NTG1] | Mus musculus [mouse] | Na+/GABA
Ntg1ratno | Na+/Cl--dependent GABA transporter 1 [SLC6a1, GABT1, GAT1, NTG1] | Rattus norvegicus [rat] | Na+/GABA
Ntg2musmu | Na+/Cl--dependent GABA transporter 2 [GABT2, GAT2, NTG2] | Mus musculus [mouse] | Na+/GABA, β-alanine, taurine
Ntg2ratno | Na+/Cl--dependent GABA transporter 2 [GABT2, GAT2, NTG2] | Rattus norvegicus [rat] | Na+/GABA, β-alanine, taurine
Ntg3musmu | Na+/Cl--dependent GABA transporter 3 [GATB, GABT3, GAT3, NTG3] | Mus musculus [mouse] | Na+/GABA, taurine, β-alanine
Ntg3ratno | Na+/Cl--dependent GABA transporter 3 [NTG3, GABT3, GAT3] | Rattus norvegicus [rat] | Na+/GABA
Ntg4musmu | Na+/Cl--dependent GABA transporter 4 [GABT4, GAT4, NTG4] | Mus musculus [mouse] | Na+/GABA
Ntgmanse | Na+/Cl--dependent GABA transporter 1 [NTG] | Manduca sexta [tobacco hornworm] | Na+/GABA
Ntgtorma | Na+/Cl--dependent GABA, β-alanine transporter 1 [NTGT] | Torpedo marmorata [ray] | Na+/GABA, β-alanine
Ntnobosta | Norepinephrine-Na+ symporter [NAT1, NET1, SLC6a2, NTNO] | Bos taurus [cow] | Na+/norepinephrine
Ntnohomsa | Norepinephrine-Na+ symporter [NAT1, NET1, SLC6a2, NTNO] | Homo sapiens [human] | Na+/norepinephrine
Ntprratno | Proline-Na+ symporter [SLC6a7, NTPR] | Rattus norvegicus [rat] | Na+/proline
Nts1ratno | Serotonin-Na+ symporter [5HTT, NTS1] | Rattus norvegicus [rat] | Na+/serotonin
Nts2ratno | Serotonin-Na+ symporter [5HT, NTS2] | Rattus norvegicus [rat] | Na+/serotonin
Ntsedrome | Serotonin-Na+ symporter [NTS] | Drosophila melanogaster [fruit fly] | Na+/serotonin
Ntsehomsa | Serotonin-Na+ symporter [HTT, SLC6a4, NTSE] | Homo sapiens [human] | Na+/serotonin
Ntt4ratno | Na+/Cl--dependent transporter [NTT4] | Rattus norvegicus [rat] | Unknown
Ntt7ratno | Na+/Cl--dependent transporter [NTT7] | Rattus norvegicus [rat] | Unknown
Nttacanfa | Na+/Cl--dependent taurine transporter [NTTA] | Canis familiaris [dog] | Na+/taurine
Nttahomsa | Na+/Cl--dependent taurine transporter [SLC6a6, NTTA] | Homo sapiens [human] | Na+/taurine
Nttamusco | Na+/Cl--dependent taurine transporter [SLC6a6, NTTA] | Mus cookii [mouse] | Na+/taurine
Nttaratno | Na+/Cl--dependent taurine transporter [SLC6a6, NTTA] | Rattus norvegicus [rat] | Na+/taurine
Rosiratno | Renal osmotic stress-induced Na+/Cl--dependent organic acid cotransporter [ROSIT] | Rattus norvegicus [rat] | Na+/organic acids
Cotransported ions are listed. Abbreviations: GABA, 4-aminobutyric acid.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Ntcrorycu, Ntcrhomsa (Ntchratno); Ntg1ratno, Ntg1musmu (Ntg1homsa); Ntg2ratno (Ntg2musmu); Ntg3ratno (Ntg3musmu); Ntnobosta (Ntnohomsa); Nts1ratno, Nts2ratno (Ntsehomsa); Nttacanfa, Nttamusco, Nttaratno (Nttahomsa); Ntdoratno (Ntdohomsa).
[Figure: phylogenetic tree. Leaves: Ntt4ratno, Ntt7ratno, Rosiratno, Ntdobosta, Ntdohomsa, Ntnohomsa, Ntsedrome, Ntsehomsa, Gat1torca, Ntg1homsa, Ntgmanse, Ntchratno, Ntcrtorma, Ntbecanfa, Ntg4musmu, Ntg3musmu, Ntgtorma, Ntg2musmu, Nttahomsa, Ntprratno.]
Proposed orientation of NET1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and the chain is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residues of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in NET1.
[Figure: predicted topology of NET1, showing 12 membrane-spanning helices between the OUTSIDE and INSIDE faces of the membrane, with the NH2 and COOH termini on the inside and the conserved residues marked.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | EXPRESSION SITES | Km | CHROMOSOMAL LOCUS
Gat1torca | 598 | 67 219 | - | - | -
Ntbecanfa | 614 | 69 291 | kidney | - | -
Ntchratno | 635 | 70 631 | CNS, heart | - | -
Ntcrhomsa | 635 | 70 676 | kidney, heart, muscle, brain | - | Xq28
Ntcrorycu | 635 | 70 483 | kidney, heart, muscle, brain | - | -
Ntcrtorma | 611 | 68 098 | - | - | -
Ntdobosta | 693 | 75 691 | brain | - | -
Ntdohomsa | 620 | 68 494 | brain | - | 5p15.3
Ntdoratno | 619 | 68 746 | brain | - | -
Ntg1homsa | 599 | 67 014 | brain | - | 3p24-p25
Ntg1musmu | 598 | 66 841 | brain | - | -
Ntg1ratno | 599 | 67 001 | brain | - | -
Ntg2musmu | 602 | 68 284 | brain, liver, kidney | GABA: 18 nM; β-Alanine: 28 nM; Taurine: 540 nM (ref. 2) | -
Ntg2ratno | 602 | 68 262 | brain, retina | GABA: 8 µM (ref. 6) | -
Ntg3musmu | 627 | 69 888 | brain | GABA: 0.8 nM; β-Alanine: 99 nM; Taurine: 1.4 µM (ref. 2) | -
Ntg3ratno | 627 | 69 946 | brain, retina | GABA: 12 µM (ref. 6) | -
Ntg4musmu | 614 | 69 613 | brain, liver, kidney | - | -
CODE | AMINO ACIDS | MOL. WT
Ntgmanse | 597 | 67 720
Ntgtorma | 622 | 70 248
Ntnobosta | 602 | 67 350
Ntnohomsa | 617 | 69 332
Ntprratno | 661 | 73 684
Nts1ratno | 630 | 70 171
Nts2ratno | 653 | 72 517
Ntsedrome | 622 | 69 325
EXPRESSION SITES: CNS; CNS; CNS; CNS
CHROMOSOMAL LOCUS: 16q12.2 (Ntnohomsa)
CODE | AMINO ACIDS | MOL. WT
Ntsehomsa | 630 | 70 324
Ntt4ratno | 727 | 81 055
Ntt7ratno | 729 | 81 596
Nttacanfa | 620 | 69 728
EXPRESSION SITES: CNS; CNS
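CODE | AMINO ACIDS | MOL. WT | EXPRESSION SITES | CHROMOSOMAL LOCUS
Nttahomsa | 620 | 69 829 | kidney, brain, liver, heart, ileum | 3p25-q24
Nttamusco | 590 | 65 868 | brain, kidney | -
Nttaratno | 621 | 69 868 | brain | -
Rosiratno | 615 | 69 556 | kidney | -
Km: Serotonin: 500 nM (ref. 7) (Ntsedrome)
CHROMOSOMAL LOCUS: 17q11.1-q12 (Ntsehomsa)
Abbreviations: GABA, 4-aminobutyric acid.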
Multiple amino acid sequence alignments
1 Ntt4ratno Ntt7ratno Rosiratno Ntdobosta Ntdohomsa Ntnohomsa Ntsedrome Ntsehomsa Gatltorca Ntglhomsa Ntgmanse Ntchratno Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
51 N t t 4 r a t n o PVD.. Y K Q S V L N V A G E T G G K Ntt7ratno SVEDVSKKSE LIVDVQEEKD Rosiratno ................ MAQA Ntdobosta AVELVLVKEQ NGVQLTNSTL
50
..................... MPKNSKVTQ REHSNEHVTE SVADLLALEE ..................... MPKNSKVVK RDL.DDDVIE SVKDLLSNED .................................................. ....................... MSEGRCS VAHMSSVVAP AKEANAMGPK ....................... MSKSKCS VGLMSSVVAP AKEPNAVGPK ..................... MLLARMNPQ VQPENNGADT GPEQPLRARK ......... M D R S G S S D F A G A A A T T G R S N P A P W S D D K E S P N N E D D S N E D D METTPLNSQK QLSACEDGED CQENGVLQKV VPTPGDK..V ESGQISNGYS ....................................... M ATNGAKTPDG ....................................... M ATNGSKVADG ...................................... ME T K N D S R S D D I .............................. MAKKSAENGI YSVSGDEKKG ......................................... MPSRAVRRC ............................................... MDR ............................................... MDR ...................................... MT A E Q A L P L G N G ...................................... MR A E K A I P I I N G .................................................. .......................................... MATKEKLQ ............................................... MKKL ...................................................
QKVAEEELDA TDAEDGSEVD SGMDPLVDIE LNPPQSPTEA
EDRPAWNSKL DERPAWNSKL DERPKWDNKL QDRETWSKKA
i00 QYILAQIGFS QYILAQVGFS QYLLSCIGFA DFLLSVIGFA
Ntdohomsa Ntnohomsa Ntsedrome Ntsehomsa Gatltorca Ntglhomsa Ntgmanse Ntchratno Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
EVELILVKEQ NGVQLTSSTL TNPRQSPVEA QDRETWGKKI DFLLSVIGFA TAELLVVKER NGVQ .... CL LAPRDG..DA QPRETWGKKI DFLLSVVGFA GDHTTPAKVT DPLAPKLANN ERILVVSVTE RTRETWGQKA EFLLAVIGFA AVPSPGA..G DDTRHSIPAT TTTLVAELHQ GERETWGKKV DFLLSVIGYA QISTELHDAP VSNDKPKTLV VKVQ.KTRKI PEREKWGGRY DFLLSCVGYA QISTEVSEAP VANDKPKTLV VKVQKKAADL PDRDTWKGRF DFLMSCVGYA ELS ..... AQ GSGNKPSDVA VK ..... SNL PERGSWASKL DFILSVIGLA PLIVSGPDGA PSKGDGPA.G LGAPSSRLAV PPRETWTRQM DFIMSCVGFA PGHLCKEMRA PRRAQPPDVP AGEPGSRV ...... TWSRQM DFIMSCVGFA KVAVPEDGPPVVSWLPEEGE KLDQEGEDQV KDRGQWTNKM EFVLSVAGEI KVAVHEDGYPVVSWVPEEGE MMDQKGKDQV KDRGQWTNKM EFVLSVAGEI KAAEEARGSE TLGGGGGGAAGTREARDKAV HERGHWNNKV EFVLSVAGEI KPE ...... D TMDIEASNVNLVR.TNDKRM SERGHWNNKI EFVLSVAGEI .MENRASGTT SNGETKPVCPAMEKVEEDGT LEREHWNNKM EFVLSVAGEI CLKDFHKDMV KPSPGKSPGT RPEDEAEGKP PQREKWSSKI DFVLSVAGGF QEAHLRKPVT PDLLMTPSDQ GDVDLDVDFA ADRGNWTGKL DFLLSCIGYC ................................ R..W..K..F.LS..G..
Ntt4ratno Ntt7ratno Rosiratno Ntdobosta Ntdohomsa Ntnohomsa Ntsedrome Ntsehomsa Gatltorca Ntglhomsa Ntgmanse Ntchratno Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
i01 VGLGNIWRFP VGLGNVWRFP VGLGNIWRFP VDLANVWRFP VDLANVWRFP VDLANVWRFP VDLGNVWRFP VDLGNVWRFP IGLGNVWRFP IGLGNVWRFP IGLGNVWRFP VGLGNVWRFP VGLGNVWRFP IGLGNVWRFP IGLGNVWRFP IGLGNVWRFP IGLGNVWRFP IGLGNVWRFP VGLGNVWRFP VGLGNVWRFP .GLGNVWRFP
YLVPYLVLLI IIGIPLFFLE YLLPYLILLL VIGIPLFFLE FLIPYFIALV FEGIPLFYIE FLVPYLFFMV VAGVPLFYME FLVPYLLFMV IAGMPLFYME FLIPYTLFLI IAGMPLFYME FLVPYCLFLI FGGLPLFYME FLLPYTIMAI FGGIPLFYME FLIPYFMTLI FAGMPIFLLE FLIPYFLTLI FAGVPLFLLE FLIPYFLTLF LAGIPMFFME FLIPYVLIALVGGIPIFFLE FLIPYLLVAV FGGIPIFFLE FFIPYFIFFF TCGIPVFFLE FFIPYFIFFF SCGIPVFFLE FLIPYVVFFI CCGIPVFFLE FLIPYVIFFI GCGIPVFFLE FFIPYLIFLF TCGIPVFFLE FLIPYFIFLF GSGLPVFFLE FLVPYFLMLA ICGIPLFFLE FL.PY ....... G.P.F..E
150 LAVGQRIRRG LSVGQRIRRG LAIGQRLRRG LALGQFNREG LALGQFNREG LALGQYNREG LALGQFHRCG LALGQYHRNG CSLGQYTSVG CSLGQYTSIG LAMGQMLTIG ISLGQFMKAG ISLGQFMKAG VALGQYTSQG VALGQYSSQG TALGQFTSEG TALGQYTSEG TALGQYTNQG IIIGQYTSEG LSLGQFSSLG ..LGQ .... G
151 Ntt4ratno SIGVWHYVCP RLGGIGFSSC IVCLFVGLYY NVIIGWSVFY Ntt7ratno SIGVWNYISP KLGGIGFASCVVCYFVALYY NVIIGWTLFY Rosiratno SIGVWKTISP YLGGVGLGCF SVSFLVSLYY NTILLWVLWF NtdobostaAAGVW.KICP ILRGVGYTAI LISLYIGFFY NVIIAWALHY NtdohomsaAAGVW.KICP ILKGVGFTVI LISLYVGFFY NVIIAWALHY NtnohomsaAATVW.KICP FFKGVGYAVI LIALYVGFYY NVIIAWSLYY Ntsedrome CLSIWKRICPALKGVGYAIC LIDIYMGMYY NTIIGWAVYY Ntsehomsa CISIWRKICP IFKGIGYAIC IIAFYIASYY NTIMAWALYY Gatltorca GLGIW.RLAP MFKGVGLAAAVLSFWLNIYYVVIIAWAIYY Ntglhomsa GLGVW.KLAP MFKGVGLAAAVLSFWLNIYY IVIISWAIYY Ntgmanse GLGVF.KIAP IFKGIGYAAAVMSCWMNVYY IVILAWAIFY Ntchratno SINVW.NICP LFKGLGYASM VIVFYCNTYY IMVLAWGFYY
200 FFKSFQYPLP FSQSFQQPLP FLNSFQHPLP LLSSFTTELP LFSSFTTELP LFSSFTLNLP LFASFTSKLP LISSFTDQLP LYNSFTSELP LYNSFTTTLP FFMSMRSDVP LVKSFTTTLP
YLCQKNGGGA YLCQKNGGGA YLCHTHGGGA YLCYKNGGGA YLCYKNGGGA YLCYKNGGGA YICYQNGGGA YICYQNGGGA YLCGKNGGGA YLCGKNGGGA YLCYKNGGGA YLCYKNGGGV YLCYKNGGGV YLCYKNGGGA YLCYKNGGGA YLCYKNGGGA YLCYKNGGGA YLCYKNGGGA YLCYKNGGGA YRAYTNGGGA YLCYKNGGGA
Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
GINAW.NIAP SVTAWRKICP SVTAWRKICP GITCWRRVCP GITCWRKICP GITAWRRICP GITCWEKICP PLAVW.KISP .... W . . I . P
Ntt4ratno Ntt7ratno Rosiratno Ntdobosta Ntdohomsa Ntnohomsa Ntsedrome Ntsehomsa Gatltorca Ntglhomsa Ntgmanse Ntchratno Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
201 250 WSECPVIRNG TVAV .......................... VEPECEKSSA W D Q C P L V K N A SHTY . . . . . . . . . . . . . . . . . . . . . . . . . . I E P E C E K S S A WSTCPLDLN..RTG .......................... FVQECQSSGT W T H C N H S W N S P R C S D . . . . . . . . . . . . . AR A P N A . . . S S G P N . G T S R T T P W I H C N N S W N S P N C S D . . . . . . . . . . . . . AH P G D S S G D S S G L N . D T F G T T P W T D C G H T W N S P N C T D . . . . . . . . . . . . . PK L L N G S V L G N H T K Y S K Y K F T P W T S C D N P W N T E N C M Q . . . . . . . . . . . . . VT SEN . . . . . . . . . F T E L A T S P W T S C K N S W N T G N C T N . . . . . . . . . . . . . YF SED ...... N I T W T L H S T S P W Q S C G N S W N T DRC . . . . . . . . . . . . . . . . . F S N Y S M T N S T N L S S P I V . . . W K Q C D N P W N T DRC . . . . . . . . . . . . . . . . . F S N Y S M V N T T NMTSAVV. __ WRNCDNYWNT ATCVNPYDRK NLTCWSSLGD MSTFCTLNGR NVSKAVLSDP W A T C G H T W N T P D C V E I F R H E D .... CANAS L A N L T C D Q L A DRRSPVI... W A S C N N T W N T A A C . . ._. Y E A .... GANAS T E . . I Y P P T A PAQSSIV... WTTCTNTWNT EHCMD F . . . . LN.H S G A R T A T S S E N F T S P V M WTTCTNSWNT EHCVD F .LN.H S S A R G V S S S E N F T S P V M W A T C G H E W N T E K C V E . F . . . . . . . . . QKLN F S N Y S H V S L Q N A T S P V M . . . W A T C G H Y W N T E N C L E . F . . . . . . . . . QKLN S T N C N H T A V P N A T S P V I . . . W G S C S H E W N T E N C V E . F . . . . . . . . . QKAN ..DSMNVTSE N A T S P V I . . . W A H C N H S W N T P H C M E D T . . . . . . . . . MRKN K S V W I T I S S T N F T S P V I . . . W E H C G N W W N T ERCLE . . . . . . . . . . . . HRG P K D G N G A L P L N L S S T V . . S P W..C...WN .........................................
251 Ntt4ratno TTYFWYREAL Ntt7ratno TTYYWYREAL ..... Rosiratno VSYFWYRQTL NtdobostaAAEYFERGVL !i!i!i!':i,i;,!i:,ili!:~!~:i;:;~N! t d o h o m s a A A E Y F E R G V L NtnohomsaAAEFYERGVL i~i:iifi~,;~fii,i~;! if~,~,ii!i~,:,i~:i:i~: N t s e d r o m e A K E F F E R K V L Ntsehomsa AEEFYTRHVL G a t l t o r c a ..EFWERNMH N t g l h o m s a ..EFWERNMH iii:ii?!iii!i:i ' Ntgmanse VKEFWERRAL N t c h r a t n o ..EFWENKVL N t c r t o r m a ..QFWERRVL N t b e c a n f a ..EFWERRVL N t g 4 m u s m u ..EFWERRVL !{iiii~'~iiii:i:.:i;; N t g 3 m u s m u ..EFWERRVL Ntgtorma ..EFWERRVL N t g 2 m u s m u ..EFWERRVL N t t a h o m s a ..EFWERNVL Ntprratno SEEYWSRYVL C o n s e n s u s ..EFWER..L
LFKGLGYASM VIVFFCNTYY LLQGIGLASV VIESYLNIYY LLQGIGMASV VIESYLNIYY LFEGIGYATQ VIEAHLNVYY LFEGIGYATQ VIEAHLNMFY IFEGIGYASQ MIVSLLNVYY LFSGIGYASV VIVSLLNVYY LFKGAGAAML LIVGLVAIYY ...G.G.A . . . . . . . . . . YY
DI..SNSISE SGGLNWKMTV AI..SSSISE SGGLNWKMTG NI..TSDISN TGTIQWKLFL HLHESQGIDD LGPPRWQLTS HLHQSHGIDD LGPPRWQLTA HLHESSGIHD IGLPQWQLLL ESYKGNGLDF MGPVKPTLAL QIHRSKGLQD LGGISWQLAL QL..TDGLDQ PGQIRAPLAI QM..TDGLDK PGQIRWPLAI QI..SSGIEH IGNIRWELAG RL..STGLEV PGALNWEVTL RL..SSGLGD VGEIGWELTL GI..TSGIHD LGALRWELAL GI..TSGIHD LGSLRWELAL AI..SDGIEH IGNLRWELAL GL..SRGIEH IGRVRWELAL KL..SDGIQH LGSLRWELVL SL..SPGIDH PGSLKWDLAL HIQGSQGIGR PGEIRWNLCL ...... G .... G...W.L..
ILVLTWSSFY LVQSFSSPLP IIILAWALFY LFSSFTSELP IIILAWALFY LFSSFTWELP IIILAWAIFY LSNCFTTELP IIVLAWAIFY LFNCFTSELP IVVLAWALFY LFSSFTTDLP IVILAWATYY LFQSFQKELP NMIIAYVLFY LFASLTSNLP ..I..W...YL..SF...LP
300 CLLVAWSIVG MAVVKGIQSS CLLAAWVMVC LAMIKGIQSS CLVACWTTVY LCVIRGIEST CLVLVIVLLY FSLWKGVKTS CLVLVIVLLY FSLWKGVKTS CLMVVVIVLY FSLWKGVKTS CVFGVFVLVY FSLWKGVRSA CIMLIFTVIY FSIWKGVKTS TLAIAWVLVY FCIWKGVSWT TLAIAWILVY FCIWKGVGWT TLLLVWVLCY FCIWKGVRWT CLLACWVLVY FCVWKGVKST CLTATWMLVY FCIWKGVKTS CLLLAWLICY FCIWKGVKTT CLLLAWIICY FCIWKGVKST CLLAGWTICY FCIWKGTKST CLLAAWIICY FCIWKGPKST CLLLAWIICY FCIWKGVKST CLLLVWLVCF FCICKGVRST CLLLAWVIVF LCILKGVKSS CL...W...YF..WKGV...
30i
Ntt4ratno GKVMYFSSLF Ntt7ratno GKIMYFSSLF Rosiratno GKVIYFTALF Ntdobosta GKVVWITATM Ntdohomsa GKVVWITATM Ntnohomsa GKVVWITATL Ntsedrome GKVVWVTALA Ntsehomsa GKVVWVTATF Gatltorca GKVVYFSAIY Ntglhomsa GKVVYFSATY Ntgmanse GKVVYFTALF Ntchratno GKIVYFTATF Ntcrtorma GKVVYVTATF Ntbecanfa GKVVYFTATF Ntg4musmu GKVVYFTATF Ntg3musmu GKVVYVTATF Ntgtorma GKVVYVTATF Ntg2musmu GKVVYFTATF Nttahomsa GKVVYFTATF Ntprratno GKVVYFTATF ConsensusGKVVY.TA..
350
RGLLLRGAVD RSLLLNGSID RGLTLPGATE RGITLPGAVD RGVTLPGAID HGVTLPGASN RGVSLPGADE RGATLPGAWR RGVTLPGARE RGVTLPGAKE RGITLPGAME RGVLLPGALD RGVTLHGAVQ RGITLPGAYQ RGVTLPGAYQ RGVTLPGASE RGVTLPGAAE RGVTLPGAAQ RGLTLPGAGR RGVTLPGAWK RG.TLPGA..
GILHMFTPKL DKMLDPQVWR GIRHMFTPKL EMMLEPKVWR GLTYLFTPNM KILQNSRVWL AIRAYLSVDF HRLCEASVWI GIRAYLSVDF YRLCEASVWI GINAYLHIDF YRLKEATVWI GIKYYLTPEW HKLKNSKVWI GVLFYLKPNW QKLLETGVWI GILFYITPDF RRLSDSEVWL GILFYITPNF RKLSDSEVWL GIKFYVMPNM SKLLESEVWI GIIYYLKPDW SKLGSPQVWI GIVYYLQPDW GKLGEAQVWI GVIYYLKPDL LRLKDPQVWM GIVFYLKPDL LRLKDPQVWM GIKFYLYPDL SRLSDPQVWV GIKFYLYPDV SRLSDPQVWL GIQFYLYPNI TRLWDPQVWM GIKFYLYPDI TRLEDPQVWI GIQFYLTPQF HHLLSSKVWI GI..Y..P .... L .... VW.
351 Ntt4ratno EAATQVFFAL GLGFGGVIAF SSYNKQDNNC Ntt7ratno EAATQVFFAL GLGFGGVIAF SSYNKRDNNC Rosiratno DAATQIFFSL SLAFGGHIAF ASYNQPRNNC Ntdobosta DAAIQICFSL GVGLGVLIAF SSYNKFTNNC Ntdohomsa DAATQVCFSL GVGFGVLIAF SSYNKFTNNC Ntnohomsa DAATQIFFSL GAGFGVLIAF ASYNKFDNNC Ntsedrome DAASQIFFSL GPGFGTLLAL SSYNKFNNNC Ntsehomsa DAAAQIFFSL GPGFGVLLAF ASYNKFNNNC Gatltorca DAATQIFFSY GLGLGSLVAL GSYNKFHNNV Ntglhomsa DAATQIFFSY GLGLGSLIAL GSYNSFHNNV Ntgmanse DAVTQIFFSY GLGLGTLVAL GSYNKFTNNV Ntchratno DAGTQIFFSY AIGLGALTAL GSYNRFNNNC Ntcrtorma DAGTQIFFSY AIGLGTLTAL GSYNQLHNDC Ntbecanfa DAGTQIFFSF AICQGCLTAL GSYNKYHNNC Ntg4musmu DAGTQIFFSF AICQGCLTAL GSYNKYHNNC Ntg3musmu DAGTQIFFSY AICLGCLTAL GSYNNYNNNC Ntgtorma DAGTQIFFSY AICLGCLTAL GSYNPYHNNC Ntg2musmu DAGTQIFFSF AICLGCLTAL GSYNKYHNNC Nttahomsa DAGTQIFFSY AICLGAMTSL GSYNKYKYNS Ntprratno EAALQIFYSL GVGFGGLLTF ASYNTFHQNI C o n s e n s u s D A . T Q I F F S ..... G . L . A . . S Y N . . . N N C
400 HFDAALVSFI NFFTSVLATL HFDAVLVSFI NFFTSVLATL EKDAVTIALV NSMTSLYASI YRDAIITTSV NSLTSFSSGF YRDAIVTTSI NSLTSFSSGF YRDALLTSSI NCITSFVSGF YRDALITSSI NCLTSFLAGF YQDALVTSVVNCMTSFVSGF YRDSIIVCCI NSTTSMFAGF YRDSIIVCCI NSCTSMFAGF YKDALIVCSV NSSTSMFAGF YKDAIILALI NSGTSFFAGF YKDAFILSLV NSATSFFAGL YRDSIALCFL NSATSFAAGF YRDSIALCFL NSATSFVAGF YRDCIMLCCL NSGTSFVAGF YRDCIMLCCL NSGTSFVAGF YRDCIALCIL NSSTSFMAGF YRDCMLLGCL NSGTSFVSGF YRDTFIVTLG NAITSILAGF Y.D ....... N..TS..AGF
401 Ntt4ratnoVVFAVLGFKA Ntt7ratnoVVFAVLGFKA Rosiratno TIFSIMGFKA NtdobostaVVFSFLGYMA NtdohomsaVVFSFLGYMA Ntnohomsa AIFSILGYMA Ntsedrome VIFSVLGYMA
PYVVLACFLV PYVVLICFLI PYLVLTIFLI PYVVLFALLL PYVVLTALLL PYFVLFVLLV PYVVLIILLV PYIILSVLLV PYIMLLTLFF PYIMLIILFF PYFLLTVLLI PYVVLVVLLV PYIILVILLV PYLMLVILLI PYLMLIILLI PYIMLLILLI PYLMLLVLLI PYLMLVVLLI PFAMLLVLLV PYLILLMLLV PY..L..LL.
450 NIMNEKCVVE NAEKILGYLN SNVLSRDLIP PHVNFSHLTT NIVNEKCISQ NSEMILKLLK TGNVSWDVIP RHINLSAVTA SNDYGRCLDR NILSLINEFD FPELS ............ ISR QKHS . . . . . . . . . . . . . . . . . . . . . . . . . . . . VPIGDVAK QKHS . . . . . . . . . . . . . . . . . . . . . . . . . . . . VPIGDVAK HEHK . . . . . . . . . . . . . . . . . . . . . . . . . . . . VNIEDVAT YVQK . . . . . . . . . . . . . . . . . . . . . . . . . . . . TSIDKVGL
Ntsehomsa VIFTVLGYMA Gatltorca VIFSIVGFMA Ntglhomsa VIFSIVGFMA Ntgmanse VIFSVVGFMA NtchratnoVVFSILGFMA NtcrtormaVVFSILGFMA NtbecanfaVVFSILGFMA Ntg4musmuVVFSILGFMS Ntg3musmu AIFSVLGFMA Ntgtorma AIFSVLGFMA Ntg2musmu AIFSILGFMS Nttahomsa AIFSILGFMA Ntprratno AIFSVLGYMS Consensus ..FS.LG.MA
EMRN ............................ EDVSEVAK HVTN ............................ RPIADVA. HVTK ............................ RSIADVA. HEQQ ............................ RPVAEVA. TEQG ............................ VHISKVA. VEEG ............................ VDISVVA. QEQG ............................ LPISEVA. QEQG ............................ IPISEVA. YEQG ............................ VPIAEVA. FEQG ............................ VPIAEVA. QEQG ............................ VPISEVA. QEQG ............................ VDIADVA. QELG ............................ VPVDQVA. ..................................... VA.
Ntt4ratno Ntt7ratno Rosiratno Ntdobosta Ntdohomsa Ntnohomsa Ntsedrome Ntsehomsa Gatltorca Ntglhomsa Ntgmanse Ntchratno Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
451 KDYSEMYNVI MTVKEKQFSA LGLDPCLLED ELDKSVQGTG EDYHVVYDII QKVKEEEFAV LHLKACQIED ELNKAVQGTG DEYPSVLMYL NATQPERVAR LPLKTCHLED FLDKSASGPG D .................................... GPG D .................................... GPG E .................................... GAG E .................................... GPG D ................................... AGPS A ................................... SGPG A ................................... SGPG A ................................... SGPG E ................................... SGPG E ................................... SGPG E ................................... SGPG E ................................... SGPG E ................................... SGPG E ................................... SGPG E ................................... SGPG E ................................... SGPG K ................................... AGPG ..................................... GPG
Ntt4ratno Ntt7ratno Rosiratno Ntdobosta Ntdohomsa Ntnohomsa Ntsedrome Ntsehomsa Gatltorca Ntglhomsa Ntgmanse Ntchratno Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu
501 THFPASPFWS VMFFLMLINL THFPASPFWS VMFFLMLINL LHMPGASVWS VLFFGMLFTL ATLPLSSVWAVVFFVMLLTL ATLPLSSAWAVVFFIMLLTL STLSGSTFWAVVFFVMLLAL ATMSGSVFWS IIFFLMLITL ANMPASTFFA IIFFLMLITL TQLPISPLWS ILFFSMLLML TQLPISPLWA ILFFSMLLML LQLPGAPLWS CLFFFMLLLI TLMPVAPLWA ALFFFMLLLL TLMPFPQVWA VLFFIMLLCL TMMPLSQLWS CLFFIMLIFL TMMPLSQLWS CLFFIMLLFL TMMPLSPLWA TLFFMMLIFL
500 LAFIAFTEAM LAFIAFTEAM LAFIVFTEAV LIFIIYPEAL LIFIIYPEAI LVFILYPEAI LVFIVYPEAI LLFITYAEAI LAFLAYPEAV LAFLAYPEAV LAFLAYPSAV LAFIAYPRAV LAFIAYPKAV LAFIAFPKAV LAFIAFPKAV LAFIAYPKAV LTFIAYPKAV LAFIAYPRAV LAFIAYPKAV LAFVIYPQAM L.F..YP.A.
550 GLGSMIGTMA GITTPIID ...... TFKVPK GLGSMFGTIE GIITPVVD ...... TFKVRK GLSSMFGNME GVITPLFDM..GILPKGVPK GIDSAMGGME SVITGLADEF .QLLHR..HR GIDSAMGGME SVITGLIDEF .QLLHR..HR GLDSSMGGME AVITGLADDF .QVLKR..HR GLDSTFGGLEAMITALCDEY PRVIGR..RR GLDSTFAGLE GVITAVLDEF PHVWAK..RR GIDSQFCTVE GFITALVDEF PKLLRG..RR GIDSQFCTVE GFITALVDEY PRLLRN..RR GLDSQFCTME GFITAVIDEW PKLLRR..RK GLDSQFVGVE GFITGLLDLL PASYYFRFQR GLGSQFVGVE GFVTAILDLW PSKFSFRYLR GLDSQFVCVE CLVTASMDMF PSQLRKSGRR GLDSQFVCME CLVTASMDMF PQQLRKSGRR GLDSQFVCVE SLVTAVVDMY PKVFRRGYRR
Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
TMMPLAPLWA FLFFLMLIFL GLDSQFVCME VMLPFSPLWA CCFFFMVVLL GLDSQFVCVE TMMPLPTFWS ILFFIMLLLL GLDSQFVEVE TMLPLSPFWS FLFFFMLLTL GLDSQFAFLE ...P .... W . . . F F . M L . . L G L D S . F . . . E
551 Ntt4ratno EMFTVGCCVF AFFVGLLFVQ Ntt7ratno EILTVICCLL AFCIGLMFVQ Rosiratno ETMTGVVCFI CFLSAICFTL Ntdobosta ELFTLLVVLA TFLLSLFCVT Ntdohomsa ELFTLFIVLA TFLLSLFCVT Ntnohomsa KLFTFGVTFS TFLLALFCIT Ntsedrome ELFVLLLLAF IFLCALPTMT Ntsehomsa ERFVLAVVIT CFFGSLVTLT Gatltorca EIFIAMVCIV SYLIGLSNIT Ntglhomsa ELFIAAVCII SYLIGLSNIT Ntgmanse EIFIAITCII SYLVGLSCIS Ntchratno EISVALCCAL CFVIDLSMVT Ntcrtorma EVVVAMVICL SFLIDLSMIT Ntbecanfa ELLILAIAVF CYLAGLFLVT Ntg4musmu DVLILAISVL CYLMGLLLVT Ntg3musmu ELLILALSII SYFLGLVMLT Ntgtorma EQLIFVIALA SYLMGLVMVT Ntg2musmu EVLILIVSVI SFFIGLIMLT Nttahomsa EIFIAFVCSI SYLLGLTMVT Ntprratno AVFSGLICVA MYLMGLILTT C o n s e n s u s E .............. L...T Ntt4ratno Ntt7ratno Rosiratno Ntdobosta Ntdohomsa Ntnohomsa Ntsedrome Ntsehomsa Gatltorca Ntglhomsa Ntgmanse Ntchratno Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
601 VAWIYGTKKF VSFVYGIDKF VIHVYGIKRF VAWFYGVWQF VAWFYGVGQF VSWFYGVDRF VFWFYGVDRF VSWFYGITQF ISWCYGVNRF ISWFYGVNRF ISWAFGVNRF VAWVYGADRF VAWVYGGDRY ISWVYGADRF IGWVYGADRF IGWVYGSNRF IGWVYGGNRF VAWVYGAGRF IAWIYGGDNL VTRVYGIQRF ..W.YG...F
SLVTAIIDMY PSIFRRGYRR SLVTALVDMY PRVFRKKNRR GQITSLVDLY PSFLRKGYRR TIVTAVTDEF PYYLRP..KK ...T...D ........... R
600 RSGNYFVTMF DDYSAT.LPL TVIVILENIA RSGNYFVTMF DDYSAT.LPL LIVVILENIA QSGSYWLEIF DSFAAS.LNL I I F A F M E W G NGGIYVFTLL DHFAA.GTSI LFGVLMEVIG NGGIYVFTLL DHFAA.GTSI LFGVLIEAIG KGGIYVLTLL DTFAA.GTSI LFAVLMEAIG YGGVVLVNFL NVYGP.GLAI LFVVFVEAAG FGGAYVVKLL EEYAT.GPAV LTVALIEAVA QGGLYVFKLF DYYSASGMSL LFLVFFETVS QGGIYVFKLF DYYSASGMSL LFLVFFECVS EGGMYVFQIL DSYAVSGFCL LFLIFFECVS DGGMYVFQLF DYYSASGTTL LWQAFWECVV EGGMYIFQIF DYYSASGTTL LWTAFWECVA EGGMYIFQLF DYYASSGICL LFLAMFEVIC EGGMYIFQLF DYYASSGICL LFLSLFEVIC EGGMYIFQLF GSYAASGMCL LFVAIFECVC EGGMYIFQLF DAYASSGMCL LFVAIFECIC EGGMYVFQLF DYYAASGMCL LFVAIFESLC EGGMYVFQLF DYYAASGVCL LWVAFFECFV DGGMYWLVLL DDYSAS.FGL MVVVITTCLA . G G . Y . . . L . D . Y . . . G . . L L ..... E...
MQELTEMLGF RPYRFYFYMW LEDLTDMLGF APSKYYYYMW CDDIEWMTGR RPSLYWQVTW SDDIKQMTGR RPSLYWRLCW SDDIQQMTGQ RPSLYWRLCW SNDIQQMMGF RPGLYWRLCW SSDVEQMLGS KPGLFWRICW CRDVKEMLGF SPGWFWRICW FVNIEEMVGH KPCLWWKLCW YDNIQEMVGS RPCIWWKLCW YDGIKEMIGY YPTIWWKFCW MDDIACMIGY RPCPWMKWCW LDDLAWMLGY RPWALVKWCW YDNIEDMIGY RPWPLVKISW YDNVEDMIGY RPWPLVKISW YDNIEDMIGY RPLSLIKWCW YDNIEDMIGY RPFVLIKWCW YDNIEDMIGY KPWPLIKYCW YDGIEDMIGY RPGPWMKYSW CRDIHMMLGF KPGLYFRACW ...... M . G . . P ....... W
650 KFVSPLCMAV LTTASIIQLG KYISPLMLVT LLIASIVNMG RVVSPMLLFG IFLSYIVLLA KFVSPCFLLFVVVVSIATF. KLVSPCFLLFVVVVSIVTF. KFVSPAFLLFVVVVSIINF. TYISPVFLLT IFIFSIMGY. VAISPLFLLF IICSFLMSP. SFFTPIIVGG VFLFSAIQM. SFFTPIIVAG VFIFSAVQM. VGFTPAICIS VFIFNLVQW. SFFTPLVCMG I F I F N W Y Y . SVITPLVCMG IFTFHLVNY. LFLTPGLCLA TFLFSLSQY. LFLTPGLCLA TFFFSLSKY. KVVTPGICAG IFIFFLVKY. IFITPGICAAIFIFFIVRY. LFFTPAVCLA TFLFSLIKY. V.ITPVLCVG CFIFSLVKY. LFLSPATLLA LLVYSIVKY. .... P ...............
651 700 Ntt4ratno VSPPGYSAWI KEEAAERYLY FPNWAMALLI TLIAVATLPI P W F I L R H F H Ntt7ratno LSPPGYNAWI KEKASEEFLS YPMWGMVVCF SLMVLAILPV PVVFVIRRCN
Rosiratno QSSPSYKAW ..... NPQYEH FPSREEKLYP N t d o b o s t a .......... R P P H Y G A . Y V F P E W A T A L G W N t d o h o m s a .......... R P P H Y G A . Y I F P D W A N A L G W N t n o h o m s a .......... K P L T Y D D . Y I F P P W A N W V G W N t s e d r o m e .......... K E M L G E E . Y Y Y P D W S Y Q V G W N t s e h o m s a .......... P Q L R L F Q . Y N Y P Y W S I I L G Y G a t l t o r c a .......... K P L K M G S . Y I F P K W G Q G V G W N t g l h o m s a .......... T P L T M G N . Y V F P K W G Q G V G W Ntgmanse .......... T P I K Y M N . Y E Y P W W S H A F G W N t c h r a t n o .......... K P L V Y N N T Y V Y P W W G E A M G W N t c r t o r m a .......... K P L T Y N K T Y T Y P W W G E A I G W N t b e c a n f a .......... T P L K Y N N I Y V Y P P W G Y S I G W N t g 4 m u s m u .......... T P L K Y N N V Y M Y P S W G Y S I G W N t g 3 m u s m u .......... K P L K Y N N V Y T Y P A W G Y G I G W Ntgtorma .......... Q P L K Y N N V Y V Y P D W G Y A L G W N t g 2 m u s m u .......... T P L T Y N K K Y T Y P W W G D A L G W N t t a h o m s a .......... V P L T Y N K T Y V S P T W A I G L G W N t p r r a t n o .......... Q P S E Y G S . Y R F P A W A E L L G I C o n s e n s u s . . . . . . . . . . . P ...... Y . . P . W .... GW
GWVQVTCVLL SFLPSLWVPG AIAASSMSVVPIYAAYKLCS VIATSSMAMV PIYAAYKFCS GIALSSMVLV PIYVIYKFLS AVTCSSVLCI PMYIIYKFFF CIGTSSFICI PTYIAYRLII FMALSSMMLI PGYMGYMFLT LMALSSMVLI PGYMAYMFLA FTALSSMLCI PGYMIYLWRV AFALSSMLCV PLHLLGCLLR CLALASMLCV PTTVLYSLSR FLALSSMICV PLFVIITLLK LLAFSSMACV PLFIIITFLK LMALSSMLCI PLWIFIKLWK ALALSSMICI PLGFIFKMWS LLALSSMICI PAWSIYKLRT SLALSSMLCV PLVIVIRLCQ LMGLLSCLMI PAGMLVAVLR .... SS .... P .........
Ntt4ratno Ntt7ratno Rosiratno Ntdobosta Ntdohomsa Ntnohomsa Ntsedrome Ntsehomsa Gatltorca Ntglhomsa Ntgmanse Ntchratno Ntcrtorma Ntbecanfa Ntg4musmu Ntg3musmu Ntgtorma Ntg2musmu Nttahomsa Ntprratno Consensus
701 750 LLSDGSNTL. SVSYKKGRMM KDISNLEEND ETRFILSKVP SEAPSPMPTH LIDDSSGNLA SVTYKRGRVL KEPVNLDG.D DASLIHGKIP SEMSSPNFGK IALAQLLFQY RQRWKNTHLE SALKPQESRG C ................... LP.GSSREKL AYAITPETEH GRVDSGGGAP VHAPPLARGV GRWRKRKSCW LP.GSFREKL AYAIAPEKDR ELVDRGEVRQ FTLRHWLKV ........... TQ.GSLWERL AYGITPENEH HLVAQRDIRQ FQLQHWLAI ........... A S K G G C R Q R L Q E S F Q P E D N C G S V V P G Q Q G T SV . . . . . . . . . . . . . . . . . . TP.GTFKERI IKSITPETPT EIPCGDIRLN AV .................. S K . G S L K Q R L R L M T Q P N E D M K C R E N G P E Q T E C G N T P S D E A YM ........ L K . G S L K Q R I Q V M V Q P S E D T V R P E N G P E H A Q A G S S T S K E A YI ........ TP.GTWQEKF HKIVRIPEDV PSLRTKM ....................... AK.GTMAERW QHLTQPIWGL HHLEYRAQDA DVRGLTTLTP VSESSKVVVV GR.GSLKERW RKLTTPVWAS HHLAYKMAGA KI.NQPCEGV VSCEEKVVIF T R . G S F K K R L R Q L T T P D P S L P Q P K Q H L Y L D G G T S Q D C G P S P T K E G ..... T Q . G S F K K R L R R L I T P D P S L P Q P G R R P P Q D G S S A Q N C S S S P A K Q E ..... TE.GTLPEKL QKLTVPSADL KMRGKLGASP RTV..TVNDC EAKVKGDGTI TE.GTFLEKI KKLTTPSADL RRKGMGMSNM DTCCSTISDC DGKLKGDECI LK.GPLRERL RQLVCPAEDL PQKNQPEPTA PATPMTSLLR LTELESNC.. TE.GPFLVRV KYLLTPREPN RWAVEREGAT PYNSRTVMNG ALVKPTHIIV EE.GSLWERL QQASRPAIDW GPSLEENRTG MYVATLAGSQ SPKPLMVHMR ...G . . . . . . . . . . . P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Ntt4ratno Ntt7ratno Ntdobosta Ntchratno Ntcrtorma Ntbecanfa Nt g 4 m u s m u Nt g 3 m u s m u Ntgtorma Nttahomsa
751 800 RSYLGPGSTS PLESSSHPNGRYGSGYLLA...STPESEL ........... NIYRKQSGSP TLDTA..PNG RYGIGYLMAD MPDMPESDL ........... VPSRGPGRGG PPTPSPRLAG HTRAFPWTGA PPVPRELTPP STCRCVPPLV ESVM .............................................. ESVL .............................................. LIVGEKETHL ........................................ LIAWE KETHL ........................................ SAITEKETHF ........................................ PAITEKETHF ........................................ ETMM ..............................................
Ntprratno KYGGITSFEN TAIEVDREIA EEEEESMMXD QTPPNRRAGR GLPVCPFLGH Consensus ..................................................
801
815
Ntdobosta CAHPAVESTG LCSVY Ntprratno RG . . . . . . . . . . . . . Consensus . . . . . . . . . . . . . . .
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the alignments: Ntcrorycu, Ntcrhomsa (Ntchratno); Ntg1ratno, Ntg1musmu (Ntg1homsa); Ntg2ratno (Ntg2musmu); Ntg3ratno (Ntg3musmu); Ntnobosta (Ntnohomsa); Nts1ratno, Nts2ratno (Ntsehomsa); Nttacanfa, Nttamusco, Nttaratno (Nttahomsa); Ntdoratno (Ntdohomsa). Residues listed in the consensus sequence are present in at least 75% of the transporter sequences shown.
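The 75% rule for the consensus line can be sketched as follows. This is a minimal illustration, not the method used to produce the alignments: the function name is hypothetical, and counting gaps against the threshold is an assumption, since the text only says a residue must be present in at least 75% of the sequences shown.

```python
from collections import Counter


def consensus(aligned, threshold=0.75, gap="."):
    """Consensus of equal-length aligned sequences.

    A column contributes its most common residue when that residue occurs
    in at least `threshold` of all sequences; otherwise a gap character is
    written. Assumption: gaps count against the threshold.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            out.append(gap)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        out.append(residue if count / len(aligned) >= threshold else gap)
    return "".join(out)


# Four ten-residue segments of the kind seen in the alignment above.
toy = ["VGLGNIWRFP", "VGLGNVWRFP", "IGLGNVWRFP", "VDLANVWRFP"]
print(consensus(toy))
```

With only four sequences a residue needs three matches to reach 75%, so the toy consensus is more permissive than one computed over the full set of twenty proteins.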
Database accession numbers
CODE
Gat1torca Ntbecanfa Ntchratno Ntcrhomsa Ntcrorycu Ntcrtorma Ntdobosta Ntdohomsa Ntdoratno Ntg1homsa Ntg1musmu Ntg1ratno Ntg2musmu Ntg2ratno Ntg3musmu Ntg3ratno Ntg4musmu Ntgmanse Ntgtorma Ntnobosta Ntnohomsa Ntprratno Nts1ratno Nts2ratno Ntsedrome Ntsehomsa Ntt4ratno Ntt7ratno Nttacanfa Nttahomsa Nttamusco Nttaratno Rosiratno
SWISSPROT
P27799 P28570 P31661
P27922 Q01959 P23977 P30531 P28571 P23978 P31649 P31646 P31650 P31647 P31651
P23975 P28573 P31652 P23976
P31645 P31662 Q08469 Q00589 P31641 P31642 P31643
PIR
EMBL/GENBANK
I51368; S42808 A41757 S23431 JC2386 X67252 S46260 A41617 A48980 S20346 S11073 F46027 A35918 A44409 A45708 B44409 JH0695; B45078 A43390 L40373 X87170 U09198 S14278 M88111 S30604 S19585 U04809 S37688; A47398 S27043 L22022 A46270 S29839 A47194 M96601 U12973
X77139 M80403 X66494
M80234 M96670; M95167 M80233; M80570 X54673 M92378 M59742 L04663 M95762 L04662 M95738; M95763 M97632
M65105 X63995 M79450 X70697; L05568 S52051; S68944 M95495 Z18956; U09220 L03292
References
1 Schloss, P. et al. (1994) Curr. Opin. Cell Biol. 6, 595-599.
2 Reizer, J. et al. (1994) Biochim. Biophys. Acta 1197, 133-166.
3 Edwards, R.H. (1993) Ann. Neurol. 34, 638-645.
4 Wright, E.M. et al. (1996) Curr. Opin. Cell Biol. 8, 468-473.
5 Bruss, M. et al. (1995) J. Biol. Chem. 270, 9197-9201.
6 Borden, L.A. et al. (1992) J. Biol. Chem. 267, 21098-21104.
7 Demchyshyn, L. et al. (1994) Proc. Natl Acad. Sci. USA 91, 5158-5162.
Na+-Dependent Antiporters
Na+/H+ antiporter family
Summary
Transporters of the Na+/H+ antiporter family, the example of which is the NHE1 Na+/H+ antiporter of humans (Nah1homsa), mediate Na+-linked extrusion of hydrogen ions from the cell. Members of the Na+/H+ antiporter family regulate intracellular pH and are involved in sodium reabsorption and signal transduction. They are fully active at acidic pH, inactive at neutral pH, and differ in their sensitivity to amiloride 1,2. Members of the family occur in both invertebrates and vertebrates. Statistical analysis reveals no apparent relationship between the amino acid sequences of the NHE1 family and those of other transporter families. From the hydropathy of their amino acid sequences, they are predicted to form 10 or 12 membrane-spanning helices. Members of the Na+/H+ antiporter family are glycosylated and may be phosphorylated. Several amino acid sequence motifs are highly conserved in the Na+/H+ antiporter family.
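Predicting membrane-spanning helices from hydropathy can be sketched with a sliding-window average. The text does not say which scale or parameters were used, so the Kyte-Doolittle scale, the 19-residue window and the 1.6 cutoff below are assumptions, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    scores = [KD[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_segments(seq, window=19, cutoff=1.6):
    """1-based residue ranges covered by runs of windows at or above cutoff."""
    profile = hydropathy_profile(seq, window)
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value >= cutoff:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(profile) - 1 + window))
    return segments


# Synthetic sequence: a 19-residue hydrophobic stretch flanked by Asp.
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_segments(toy))
```

Counting the candidate segments returned for a full-length sequence gives a rough helix number; real predictions of the kind summarized above also weigh helix length and flanking charges, which this sketch ignores.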
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Nah1homsa | Na+/H+ antiporter 1 [NAH1, NHE1, APNH1] | Homo sapiens [human] | Na+/H+
Nah1orycu | Na+/H+ antiporter 1 [NAH1, NHE1] | Oryctolagus cuniculus [rabbit] | Na+/H+
Nah1ratno | Na+/H+ antiporter 1 [NAH1, NHE1] | Rattus norvegicus [rat] | Na+/H+
Nah3orycu | Na+/H+ antiporter 3 [NAH3, NHE3] | Oryctolagus cuniculus [rabbit] | Na+/H+
Nah3ratno | Na+/H+ antiporter 3 [NAH3, NHE3] | Rattus norvegicus [rat] | Na+/H+
Nah4ratno | Na+/H+ antiporter 4 [NAH4, NHE4] | Rattus norvegicus [rat] | Na+/H+
Nahboncmy | Na+/H+ antiporter β [NAH, β NHE] | Oncorhynchus mykiss [trout] | Na+/H+
Nhecarma | Na+/H+ antiporter [NHE] | Carcinus maenas [crab] | Na+/H+
Nhe1musmu | Na+/H+ antiporter isoform 1 [NHE1] | Mus musculus [mouse] | Na+/H+
Nhe2orycu | Na+/H+ antiporter isoform 2 [NHE2] | Oryctolagus cuniculus [rabbit] | Na+/H+
Nhe2ratno | Na+/H+ antiporter isoform 2 [NHE2] | Rattus norvegicus [rat] | Na+/H+
Nhe3didvi | Na+/H+ antiporter isoform 3 [NHE3] | Didelphis virginiana [opossum] | Na+/H+
Cotransported ions are listed.
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and are therefore not included in the phylogenetic tree: Nah1orycu, Nah1ratno, Nhe1musmu (Nah1homsa); Nhe2orycu (Nhe2ratno).
[Figure: phylogenetic tree of the family. Proteins shown: Nah1homsa, Nahboncmy, Nah4ratno, Nhe2ratno, Nah3ratno, Nhe3didvi, Nah3orycu, Nhecarma, Nahcaeel.]
Proposed orientation of NHE1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and the chain is folded ten times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residues of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in NHE1.
[Figure: predicted topology of NHE1, showing ten membrane-spanning helices between the OUTSIDE and INSIDE faces of the membrane, with the NH2 and COOH termini on the inside and the conserved residues marked.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | EXPRESSION SITES
Nah1homsa | 815 | 90 763 | kidney, intestine
Nah1orycu | 816 | 90 717 | kidney, intestine
Nah1ratno | 820 | 91 612 | kidney, intestine
Nah3orycu | 832 | 92 748 | intestine, kidney
Nah3ratno | 831 | 93 105 | intestine, kidney
Nah4ratno | 717 | 81 522 | stomach, intestine, colon
Nahboncmy | 759 | 85 173 | nucleated erythrocytes
Nhecarma | 672 | 75 981 | intestine
Nhe1musmu | 820 | 91 467 | ubiquitous
Nhe2orycu | 809 | 90 744 | kidney, intestine
Nhe2ratno | 813 | 91 402 | kidney, intestine
Nhe3didvi | 839 | 94 765 |
Km: Na+: 15 mM; Na+: 17 mM; Na+: 18 mM (ref. 3)
CHROMOSOMAL LOCUS: 1p36.1-p35
Multiple amino acid sequence alignments
1 50
Nahlhomsa MVLRSGICGL SPHRIFPSLLVVVALVGLLP VLRSHGLQLS PTASTIRSSE N a h b o n c m y . . M P A F S C A F P G C R . . R D L L V I V ..... LV V F V G I G L P I E A S A P A Y Q S . . N a h 4 r a t n o ......... M G P A .... M L R A F S S W K W L L L L M V L T C L E A S SYVN ...... N h e 2 r a t n o ......... M G P S G T A H R M R A P L S W L L L L L L S L Q V A V P A G A L A E T L L D A P Nah3ratno .......................................... MWHPALGP Nhe3didvi ...................................... MP L G V R G T R R E F Nah3orycu ........................................ MSGRGGC.GP Nhecarma ....................... MKNRVIL MVCVAWCVLG LAAANTSAKQ Consensus ..................................................
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Consensus
51 i00 PPRERSIGDV TTAPPEVTPE SRPVNHSVTD HGMKPRKAFP VLGIDYTHVR ............... HGTEG SHLTNITNT ...... KKAFP VLAVNYEHVR ESS . . . . . . . . S P T G Q Q T P D A R F A A S S S D P . . D E R . . . I S V F E L D Y D Y V Q GAR ........ GASSNPPSP ASVVAPGTTP FEESR...LP VFTLDYPHVQ G W K P L L A L A V A .... V T S L R G V R G I E E E P N SGG .... SFQ I V T F K W H H V Q R F P V W G L L L L A .... L W M L P R A L G V E E I P G P D S H E K Q G F Q I V T F K W H H V Q C W G L L L A L V L A .... L G A L P W T Q G A E Q E H H . . . D E I Q G F Q I V T F K W H H V Q HHTATNTTTT ADNETLQRVR IDGSEAHNEA EGEHRLERYPVVVLDFERVQ ................................................ V.
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
i01 150 TPFEISLWIL LACLMKIGFH VIPTISSIVP ESCLLIWGL LVGGLIKGVG KPFEIALWIL LALLMKLGFH LIPRLSAVVP ESCLLIWGL LVGGLIKVIG IPYEVTLWIL LASLAKIGFH LYHRLPHLMP ESCLLIIVGA LVGSIIFGTH IPFEITLWIL LASLAKIGFH LYHKLPTIVP ESCLLIMVGL LLGGIIFGVD DPYIIALWIL VASLAKIVFH LSHKVTSVVP ESALLIVLGL VLGGIVWAAD DPYIIALWIL VASLAKIVFH LSHKVTSVVP ESALLIVLGL ILGGIVWAAD DPYIIALWVL VASLAKIVFH LSHKVTSVVP ESALLIVLGL VLGGIVLAAD TPF.IGLWIF LACLGKIGFH MTPKISHVFP ESCMLIVLGV LIGLLLIYTQ . . . . . . . . . . . . . . . . . . FN L M K P I S K W C P D S S L L I I V G L A L G W I L H Q T S . P . . I . L W I L . A . L . K I . F H L ........ P E S . L L I . . G L ..GG ......
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno
151 E. T P P F L Q S D E.EPPVLDSQ HKSPPVMDSS EKSPPAMKTD
VFFLFLLPPI LFFLCLLPPI IYFLYLLPPI VFFLYLLPPI
ILD.AGYFLP ILD.AGYFLP VLE. S G Y F M P VLD.AGYFMP
LRQFTENLGT IRPFTENVGT TRPFFENIGS TRPFFENLGT
200 ILIFAVVGTL ILVFAVIGTL ILWWAGLGAL IFWYAVVGTL
Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
HIASFTLTPT LFFFYLLPPI HIASFTLTPT VFFFYLLPPI HIASFTLTPT VFFFYLLPPI AATVSPLTAD VFFLYLLPPI .LSGATLDSH TFFLYLLPPI ...... L .... FFLYLLPPI
201 N a h l h o m s a W N A F F L G G L M YAVCLVGGEQ N a h b o n c m y W N A F F M G G L L YALCQIESVG Nah4ratno INAFGIGLSL YFICQIKAFG N h e 2 r a t n o W N S I G I G L S L FGICQIEAFG N a h 3 r a t n o W N A A T T G L S L YGVFLSGLMG N h e 3 d i d v i W N A A T T G L S L YGVYLSGIMG N a h 3 o r y c u W N A A T T G L S L YGVFLSGIMG Nhecarma WNALTIGITM YAISLTGLFG Nahcaeel WNTFAIGGSL LLMAQYDLFT Consensus W N A . . . G . . L Y ........ G
VLD.AGYFMP VLD.AGYFMP VLD.AGYFMP ILD.AVYFMP IFGSSGYFMP .LD.AGYFMP
NRLFFGNLGT NRLFFGNLGT NRLFFSNLGS NRLFFDNLFT NRALFENFDS .R.FF.NLG.
250 INNIGLLDNL LFGSIISAVD PVAVLAVFEE LSGVDLLACL LFGSIVSAVD PVAVLAVFEE LGDINLLQNL LFGSLISAVD PVAVLAVFEE LSDITLLQNL LFGSLISAVD PVAVLAVFEN ELKIGLLDFL LFGSLIAAVD PVAVLAVFEE DLSIGLLDFL LFGSLIAAVD PVAVLAVFEE ELKIGLLDFL LFGSLIAAVD PVAVLAVFEE .LDIPMLHMF LFSSLISAVD PVAVLAVFEE .MSFTTFEIL VFSALISAVD PVAVIAVFEE ...I.LL..LLFGSLI.AVDPVAVLAVFEE
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
251 IHINELLHIL IHINELVHIL ARVNEQLYMM IHVNEQLYIL VHVNEVLFII VHVNDVLFII VHVNEVLFII MQVEEVLFIL IHVNEFLFIN ..VNE.L.I.
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
301 LSFFVVALGG V L V G W Y G V I A A F T S R F T S H IRVIEPLFVF VCFFVVSLGG V L V G A I Y G F L A A F T S R F T S H TRVIEPLFVF ARFVIVGCGG VFFGIIFGFI SAFITRFTQN ISAIEPLIVF ANFFVVGIGG V L I G I L L G F I A A F T T R F T H N IRVIEPLFVF VSFFVVSLGG TLVGVIFAFL LSLVTRFTKH VRIIEPGFVF VSFFVVSLGG TLIGIIFAFL LSLVTRFTKH VRIIEPGFVF VSFFVVSLGG T L V G W F A F L LSLVTRFTKH VRVIEPGFVF ASFLLVALGG TAIGIIWGFL TAFVTRLTSG VRVIEPVFVF L S F F V V A L G G A A V G I I F A I A ASLTTKYTYD VRILAPVFIF ..FFVV.LGG ...G .... F ..... T R F T . . . R . I E P . F V F
VFGESLLNDA VFGESLLNDA IFGEALLNDG VFGESLLNDA VFGESLLNDA VFGESLLNDA VFGESLLNDA VFGESLLNDG VFGEALFNDG VFGESLLND.
351 Nahlhomsa ELFHLSGIMA LIASGVVMRP Nahboncmy EMFHLSGIMA LIACGVVMRP Nah4ratno ETLYLSGILA ITACAVTMKK Nhe2ratno EMFHLSGIMA ITACAMTMNK Nah3ratno EMLSLSAILA ITFCGICCQK Nhe3didvi EMLSLSAILA ITFCGICCQK Nah3orycu EMLSLSSILA ITFCGICCQK Nhecarma EIFHLSGILS ITFCGITMKN Nahcaeel EMVSLSSIIA IAICGMLMKQ C o n s e n s u s E . . . L S . I . A I . . C C .....
ILLYAVIGTI ILLYAVIGTV ILLYAVVGTV ILVFAVIGTI VLVFSVFGTI IL..AV.GT.
300 VTVVLYHLFE EFAN...YEH VGIVDIFLGF VTVVLYNLFE EFSK...VGT VTVLDVFLGV ISVVLYNILI AFTKMHKFED IEAVDILAGC VTVVLYNLFK SFCQM...KT IQTVDVFAGI VTVVLYNVFE SFVTLGG.DA VTGVDCVKGI VTVVLYNVFD SFVSLGA.DK VTGVDCVKGI VTVVLYNVFQ SFVTLGG.DK VTGVDCVKGI VTVVLYHLFE G F S E L G E . A N I M A V D I A S G V VTVVLYQC.S KFALIGS.EN LSVLDYATGG V T V V L Y . . F . . F ........... VD...G.
YVEANISHKS HTTIKYFLKM YVEANISHKS YTTIKYFLKM YVEENVSQTS YTTIKYFMKM YVEENVSQKS YTTIKYFMKM YVKANISEQS ATTVRYTMKM YVKANISEQS ATTVRYTMKM YVKANISEQS ATTVRYTMKM Y.WNRTSPPS PHDHQIRHED YIKGNVTQAAANSVKYFTKM YV..N.S..S .TT..Y..KM
350 LYSYMAYLSA LYSYMAYLSS MFSYLSYLAA LYSYLSYITA VISYLSYLTS IISYLSYLTS IISYLSYLTS VMAYLAYLNA VLPYMAYLTA ..SY..YL.. 400 WSSVSETLIF WSSVSETLIF LSSVSETLIF LSSVSETLIF LASGAETIIF LASGAETIIF LASGAETIIF VSLVFETIIF LAQSSETVIF ..S..ET.IF
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
401 IFLGVSTVAG S.HHWNWTFV IFLGVSTVAG P . H A W N W T F V IFMGVSTVGK N . H E W N W A F V IFMGVSTVGK N . H E W N W A F V MFLGISAVDP V I W T W N T A F V MFLGISAVDP AIWTWNTAFI MFLGISAVDP L I W T W N T A F V MFLGVSTIQS D . H Q W N T W F V MFLGLSTISS Q.HHFDLYFI .FLG.S.V ...... WN..FV
ISTLLFCLIA ITTVILCLVS CFTLAFCQIW CFTLAFCLIW LLTLVFISVY LLTLVFISVY LLTLLFVSVF ILTILFCSIY CATLFFCLIY ..TL.F ....
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
451 TPKDQFIIAY TKKDQFIVAY SIKDQLIIFY TFKDQFIIAY ETIDQVVMSY EIIDQVVMSY ELIDQVVMSY GFVDKFVMSY EMVDQFIMSY ...DQ .... Y
500 LGYLLDKKHF PMCDLFLTAI ITVIFFTVFV LGYLLSNSH. QMRNLFLTAI ITVIFFTVFV LAFLLPLTLF PRKKLFVTAT LVVTYFTVFF LVFLLPATVF P R K K L F I T A A I V V I F F T V F I L V V L L D E K K V KEKNLFVSTT LIVVFFTVIF L V V L L D E K K V KEKNLFVSTT I I W F F T V I F L V A L L D G N K V KEKNLFVSTT IIVVFFTVIF LVITINPIHI PLQPMFLTAT IAMVYFTVFV LVVSIPAS.I T R K P M F I T A T IAWIYFTVFL LG.LL ......... LF ...... V.FFTV..
GGLRGAIAFS GGLRGAIAFS SGVRGAGSFS GGLRGAICFA GGLRGAVAYA GGLRGAVAYA GGLRGAVAFA GGLRGAVAFA GGLRGAIAYG GGLRGA.A..
450 R V L G V L G L T W FINKFRIVKL RVLGVIGLTF IINKFRIVKL RAISVFTLFY VSNQFRTFPF RALGVFVLTQ V I N W F R T I P L R A I G V V L Q T W ILNRYRMVQL R A I G V V L Q T W LLNKYRMVQL R A I G W L Q T W LLNRYRMVQL R I L G V L I F S A VCNRFRVKKI RAIGIVVQCY ILNRFRAKKF R..GV ....... N..R ....
501 N a h l h o m s a Q G M T I R P L V D L L A V K K K Q E T KRSINEEIHT QFLDHLLTGI N a h b o n c m y QGMTIRPLVE LLAVKKKKES KPSINEEIHT E F L D H L L T G V Nah4ratno Q G I T I G P L V R YLDVRKTNKK .ESINEELHI RLMDHLKAGI Nhe2ratno LGITIRPLVE FLDVKRSNKK QQAVSEEIHC RFFDHVKTGI Nah3ratno QGLTIKPLVQ W L K V K R S E Q R EPKLNEKLHG RAFDHILSAI Nhe3didvi QGLTIKPLVQ W L K V K K S E H R E P K L N E K L H G RAFDHILSAI N a h 3 o r y c u Q G L T I K P L V Q W L K V K R S E H R E P K L N E K L H G RAFDHILSAI N h e c a r m a Q G I T I K P L V Q L L G V K K S E K R SLTMNERLHE R V M D Y V M S G V Nahcaeel Q G I T I R P L V N F L K I K K K E E R DPTMVESVYN KYLDYMMSGV C o n s e n s u s Q G . T I . P L V . . L . V K ......... N E . . H . R . . D H ..... Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
551 HWKDKLNRFN KKYVKKCLIA HWKEKLNRFN KTYVKRWLIA QVRDKFKKFD HRYLRKILIR FWRDKFKKFD D K Y L R K L L I R YLRDKWSNFD R K F L S K V L M R YLRDKWSNFD R K V L S K L L M R YLRDKWANFD R R F L S K L L M R HIRSKFKRFN N K Y L T P F L V R TFIENFERFN A K V I K P V L M R .... K...F ........ L.R
601 Nahlhomsa GGMGKIPSAV N a h b o n c m y GQLPSVLP.. Nah4ratno GLLSSVA... Nhe2ratnoGMISTVP... Nah3ratno SYVAEGERRG
550 EDICGHYGHH EGVCGHYGHY EDVCGQWSHY EDVCGHWGHN EDISGQIGHN EDISGQIGHN EDISGQIGHN EEMIGKQGNL EDIAGQKGHY ED..G..GH.
600 G E R S K E P Q . . . L I A F Y H K M E MKQAIELVES G E N F K E P E . . . L I A F Y R K M E LKQAIMMVES RNQPKSS .... IVSLYKKLE M K Q A I E M A E T ENQPKSS .... IVSLYKKLE IKHAIEMAET RSAQKSRD.. R I L N V F H E L N LKDAI ..... RSAQKSRD.. RILNVFHELN LKDAI ..... QSAQKSRD.. R I L N V F H E L N LKDAI ..... EKNVIEP... KLIETYSNIK KHEAMQQMHN HQKRESFDAS SIVRAYEKIT L E D A I K L A K V .... K . . . . . . . . . . . . . . . . K.AI .....
650 S T V S M Q N I H . . P K S L P S E R I LPALSKDKEE EIRKILRNNL S T I S M Q N I Q . . P R A I P R ...... VSKKREE EIRRILRANL S P T P Y Q S E R . . I Q G I K R ...... LSPEDVE SMRDILTRNM S F A S L N D C R . . E E K I R K ...... L T P G E M D E I R E I L S R N L SLAFIRSPS. TDNMVNVDFS T P R P S T V . E A SVSYFLRENV
Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
SYVAEGERRG SLAFIRSPS. TDNIVNVDFS TPRPSTV.EA SVSYLLRENV SYVTEGERRG SLAFIRSPS. TDNMVNVDFS TPRPSTV.EA SVSYLLRESA SYNASTNIES FSNLIRNDA. THPHVQMNNQ GEWN ................ K N N I Q N K R . . . L E R I K S K G R VAPILPDKIS NQKTMTPKDL QLKRFMESGE .......... S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . L ....
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
651 700 QKTRQRL..R SYNRHTLVAD PY .... EEAW NQMLLRRQ ........ K... QNNKQKMRSR SYSRHTLFDA DE .... EDNV SEVRLRKT ........ KMEM YQVRQR..TL SYNKYNLKPQ TS .... EKQA KEILIRRQNT LRESLRKGQS YQIRQR..TL SYNRHNLTAD TS .... ERQA KEILIRRRHS LRESLRKDNS SAVCLDMQSL EQRRRSIRDT EDMVTHHTLQ QYLYKPRQEY KHLYSRHELT STVCLDMQAL EQRRKSIRDT EDTVTHHTLQ QYLYKPRQEY KHLYSRHELT SAVCLDMQSL EQRRRSVRDA EDVITHHTLQ QYLYKPRQEY KHLYSRHVLS .... LDVAEL EYNP.TLRDL NDAKFHHLLS NDYKPVKKNR ASTYKRHAVK NIDSLYTLFS DLLDRKLHEM NRPSVQITDV DGQDDIQDDY MAEVSRSNLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R ....
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Nhecarma Nahcaeel Consensus
701 750 ...ARQLEQK INNYLTVP.A HKLDS.PTMS RARIGSDPLA YEPKEDLPVI ERRVSVMERR MSHYLTVP.A NRESPRPGVR RVRFESDNQV FSA.DSFPTV LPWVKPAGTK NFRYLSFPYS N P Q P A R R G A R A A E S T G N P C C WLLHFLLCRA LNRERRASTS TSRYLSLPKN TKLPEKLQKK NKVSNADGNS ....... SDS P...NEDEKQ DKEIFHRTMR KRLESFKSAK L G I N Q N K K A A K L Y K . R E R A Q S...NEDEKQ DKEIFHRTMR KRLESFKSTK LGINQTKKTA KLYK.RERGQ P...SEDEKQ DKEIFHRTMR KRLESFKSAK LGLGQSKKAT KHKRERERAQ .... DDDMQT Q S D I G H H N M . . Y L H T H T H T Q LL .................. AMFRSTEQLP SETPFHSGRR QSTGDLNATR RADFNV .............. ..................................................
Nahlhomsa Nahboncmy Nah4ratno Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Consensus
751 800 TIDPASPQS. P E S V D . . L V N E E L K G K V L G L S.RDPAKVAE EDEDDDGGIM HFEQPSPPST PDAVS..L . . . . . . . . . . . . . . . . . . . . . . . EEEEEEVPK MVEKIWGPGG QETQPRLLCR NLN . . . . . . . . . . . . . . . . . . . . . . . . . . . DMDGTTVLNL QPRARRFLPD QFSKKASPAY K.MEWKNEVD VGSARAPPSV KRRNSSIPNG KLPMENLAHN FTIKEKDLEL SEPEEATN.Y ..EEISGGIE KRRNSSIPNG KIPMESPTRD FTFKEKELEF SDPEETNE.Y EAEEMSGGIE KRRNSSVPNG KLPLDSPAYG LTLKERELEL SDPEEAPDYY EAEKMSGGIE ..................................................
Nahlhomsa Nahboncmy Nhe2ratno Nah3ratno Nhe3didvi Nah3orycu Consensus
801 850 MRSKETSSPG TDDVFTPAPS DSPSSQRIQR CLSDPGPHPE PGEGEPFFPK RPSLKADIEG PRGNASDNHQ GELDYQRLAR CLSDPGPNKD KEDDDPFMSC TPAPRSKEGG TQTPGVLRQP LLSKDQRFGR GREDSLTEDV PPKPPPRLVR FLASVTKDVA SDSGAGIDNP VFSPDEDLDP SILSRVPPWL SPGETVVPSQ FLANVTQDTA TDSTTGIDNP VFSPEE..DQ SIFTKVPPWL SPEETVVPSQ FLASVTKDTT SDSPAGIDNP VFSPDEDLAP SLLARVPPWL S P G E A W P S Q ..................................................
Nahlhomsa Nhe2ratno Nah3ratno Nhe3didvi
851 899 GQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RASEPGNRKG RLGNEKP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RARVQIPNSP SNFRRLTPFR LSNKSVDSFL QADGPEEQLQ PASPESTHM RARVQIPYSP SNFRRLTPIR LSTKSVDSFL LADSPEERPR SFLPESTHM
Nah3orycu RARVQIPYSP GNFRRLAPFR LSNKSVDSFL LAEDGAEH . . . . . PESTHM Consensus .................................................
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Nah1orycu, Nah1ratno, Nhe1musmu (Nah1homsa); Nhe2orycu (Nhe2ratno). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
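The 75% consensus rule can be illustrated with a short sketch. It assumes that the fraction is taken over all aligned sequences (gaps count against a residue) and that `.` marks both gaps and non-consensus positions; the function name and these conventions are illustrative assumptions rather than the documented procedure.

```python
from collections import Counter


def consensus(aligned, threshold=0.75, gap="."):
    """Return a consensus string for equal-length aligned sequences.

    A residue is reported only if it occupies at least `threshold` of the
    sequences at that column; otherwise `gap` is emitted.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(aligned) >= threshold:
                out.append(residue)
                continue
        out.append(gap)
    return "".join(out)


# Five-residue fragment transcribed from the nine sequences in the 101-150
# block above (the ESCLL region).
fragment = ["ESCLL"] * 4 + ["ESALL"] * 3 + ["ESCML", "DSSLL"]
print(consensus(fragment))  # ES.LL
```

In this fragment the third column is left blank because its most common residue, C, occurs in only five of the nine sequences, which agrees with the printed consensus line.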
Database accession numbers
SWISSPROT Nah1homsa Nah1orycu Nah1ratno Nah3orycu Nah3ratno Nah4ratno Nahboncmy Nhe1musmu Nhe2orycu Nhe2ratno Nhe3didvi Nhecarma
P19634 P23791 P26431 P26432 P26433 P26434 Q01345
PIR
EMBL/GENBANK
A31311 S13926; S16328 A40204 A40205 B40204 C40204
M81768 X59935; X56536 M85299 M87007 M85300 M85301 M94581 U51112
A46747 A46748; A47449
L11236 L42522 U09274
References
1 Reinhart, A.E. and Reithman, R. (1994) Curr. Opin. Cell Biol. 6, 583-594.
2 Tse, C.M. et al. (1991) EMBO J. 10, 1957-1967.
3 Levine, S. et al. (1993) J. Biol. Chem. 268, 25527-25535.
PEP-Dependent Phosphotransferase Family
Phosphoenolpyruvate-dependent sugar phosphotransferase family
Summary
Transporters of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) family, an example of which is the PTAA N-acetylglucosamine permease IIABC protein of Escherichia coli (Ptaaescco), mediate the phosphoenolpyruvate (PEP)-dependent uptake of a variety of sugars. Unphosphorylated PTGA serves an integrative regulatory function by binding, and thereby inhibiting, the activities of several other enzymes and transporters 1,2. Members of the PTS transporter family are found in both gram-positive and gram-negative bacteria.
Statistical analysis reveals no apparent relationship between the amino acid sequences of the PTS family and any other family of transporters. However, a subgroup of the galactose-pentose-hexuronide (GPH) family of cation-coupled symporters contains a C-terminal hydrophilic extension of approximately 160 residues that is homologous to the IIA domain of PTGA. Phosphorylation of this IIA-like domain by PEP, heat-stable protein HPR and enzyme I inhibits the transport activities of these GPH proteins 3.
Members of the PTS family are predicted to contain from 6 to 14 membrane-spanning helices by the hydropathy of their amino acid sequences. The structure of the PTGA protein of Escherichia coli has been resolved by NMR and X-ray techniques to 2.5 Å 4,5. Although several amino acid sequence motifs are highly conserved in the family, they are not conserved between the most distantly related members of the family. Similarly, while all members of the PTS family contain sites of histidine phosphorylation, the locations of these sites are not conserved between distantly related members of the family.
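Predicting membrane-spanning helices from hydropathy is commonly done with a sliding-window average over a hydropathy scale. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6; the source does not say which scale, window or cutoff was used for this family, so all three, along with the function names, are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[r] for r in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_helices(seq, window=19, cutoff=1.6):
    """Return 1-based (start, end) ranges covered by windows at or above cutoff.

    Overlapping qualifying windows are merged, so a range can be longer
    than a single helix; this is a coarse screen, not a topology model.
    """
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean < cutoff:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Hypothetical test sequence: a 19-residue leucine stretch flanked by aspartate.
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_helices(toy))  # [(6, 34)]
```

The merged range is wider than the hydrophobic stretch itself because every window containing at least 14 leucines passes the cutoff; tighter boundaries would need a refinement step that is not shown here.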
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Ptaaescco | N-Acetylglucosamine permease IIABC component [PTAA, NAGE, PSTN] | Escherichia coli [gram-negative bacterium] | N-Acetylglucosamine
Ptaaklepn | N-Acetylglucosamine permease IIABC component [PTAA, NAGE] | Klebsiella pneumoniae [gram-negative bacterium] | N-Acetylglucosamine
Ptbabacsu | β-Glucoside permease IIABC component [PTBA, BGLP, N17C] | Bacillus subtilis [gram-positive bacterium] | β-Glucosides
Ptbaescco | β-Glucoside permease IIABC component [PTBA, BGLF, BGLC] | Escherichia coli [gram-negative bacterium] | β-Glucosides
Ptbaerwch | β-Glucoside permease IIABC component [PTBA, ARBF] | Erwinia chrysanthemi [gram-negative bacterium] | β-Glucosides
Ptgahaein | Glucose permease IIABC component [PTGA, CRR, HI1711] | Haemophilus influenzae [gram-negative bacterium] | Glucose
Ptgamycca | Glucose permease IIABC component [PTGA, CRR] | Mycoplasma capricolum [gram-positive bacterium] | Glucose
Ptgasalty | Glucose permease IIABC component [PTGA, CRR] | Salmonella typhimurium [gram-negative bacterium] | Glucose
Ptgaescco | Glucose permease IIABC component [PTGA, CRR, GSR, IEX] | Escherichia coli [gram-negative bacterium] | Glucose
Ptgabacsu | Glucose permease IIABC component [PTGA, PTSG, CRR] | Bacillus subtilis [gram-positive bacterium] | Glucose
Ptgabacst | Glucose permease IIABC component [PTGA, PTSG] | Bacillus stearothermophilus [gram-positive bacterium] | Glucose
Ptgbsalty | Glucose permease IIBC component [PTGB, PTSG] | Salmonella typhimurium [gram-negative bacterium] | Glucose
Ptgbescco | Glucose permease IIBC component [PTGB, PTSG, GLCA] | Escherichia coli [gram-negative bacterium] | Glucose
Pticescco | PTS system, arbutin-like IIC component [PTIC, GLVC] | Escherichia coli [gram-negative bacterium] | Glucose
Ptoaescco | Maltose-glucose permease IIABC component [PTOA, MALX] | Escherichia coli [gram-negative bacterium] | Maltose, glucose
Ptsapedpe | Sucrose permease IIABC component [PTSA, SCRA] | Pediococcus pentosaceus [gram-positive bacterium] | Sucrose
Ptsastrmu | Sucrose permease IIABC component [PTSA, SCRA] | Streptococcus mutans [gram-positive bacterium] | Sucrose
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Ptgasalty (Ptgaescco); Ptgbsalty (Ptgbescco).
Ptsapedpe Ptsastrmu Ptbaerwch Ptbaescco Ptbabacsu Ptaaescco Ptaaklepn Ptgahaein Ptgabacst Ptgabacsu Ptgamycca Ptgaescco Ptgbescco Ptoaescco Pticescco
Proposed orientation of PTGAB in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: predicted membrane topology, labelled OUTSIDE, INSIDE, NH2 and COOH.]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT
Ptaaescco | 648 | 68 346
Ptaaklepn | 651 | 68 179
Ptbabacsu | 609 | 64 550
Ptbaescco | 625 | 66 482
Ptbaerwch | 631 | 66 984
Ptgahaein | 165 | 17 779
Ptgamycca | 154 | 16 703
Ptgasalty | 168 | 18 116
Ptgaescco | 168 | 18 120
Ptgabacsu | 699 | 75 525
Ptgabacst | 324 | 34 674
Ptgbsalty | 477 | 50 521
Ptgbescco | 477 | 50 676
Pticescco | 368 | 39 692
Ptoaescco | 530 | 56 721
Ptsapedpe | 651 | 68 454
Ptsastrmu | 664 | 69 988
CHROMOSOMAL LOCUS: 72.02 minutes; 335°; 84.05 minutes; 54.6 minutes; 118°; 24.96 minutes; 83.16 minutes; 36.53 minutes
Multiple amino acid sequence alignments
Ptsapedpe Ptsastrmu
1 ............................................
............................................
50 MNHQEV MDYSKV
Ptbaerwch
............................................
MNYETL
Ptbabacsu
............................................
MDYDKL
Ptbaescco
..............................................
MTEL
Ptaaescco Ptaaklepn Ptgahaein Ptgabacst Ptgabacsu Ptgamycca Ptgaescco Ptgbescco Ptoaescco Pticescco Consensus
Ptsapedpe Ptsastrmu Ptbaerwch Ptbaescco Ptbabacsu Ptaaescco Ptaaklepn Ptgahaein Ptgabacst Ptgabacsu Ptgamycca Ptgaescco Ptgbescco Ptoaescco Pticescco Consensus
51 i00 ADRVLNAIG. K N N I Q A A A H C A T R L R L V I K D E S K I D Q Q A L D D D A D V K G T F E ASEVITAVG. K D N L V A A A H C A T R L R L V L K D D S K V D Q K A L D K N A D V K G T F K ASEIRDGVGG QENIISVIHC ATRLRFKLRD NTNANADALK NNPGIIMVVE ARKIVAGVGG ADNIVSLMHC ATRLRFKLKD ESKAQAEVLK KTPGIIMVVE SKDILQLVGG EENVQRVIHC MTRLRFNLHD NAKADRSQLE QLPGVMGTNI .... MNILGF F Q R L G R A L Q L P I A V L P V A A L L L R F G Q P D L L N V A F I A Q A G G .... MNILGF F Q R L G R A L Q L P I A V L P V A A L L L R F G Q P D L L N V P F I A Q A G G .................................................. .................................................. LHFLSNDNVQ LVAGVMESAG QIVFDNLPLL FAVGVAIGLA NGDGVAGIAA .................................................. L P A V V S H V . . . . . . . MAEAG G S V F A N M P L I F A I G V A L G F T .NNDGVSALA L P A V V S H V . . . . . . . MAEAG G S V F A N M P L I F A I G V A L G F T .NNDGVSALA IPVLGNPVLQ AIFTWMSKIG SFAFSFLPVM FCIAIPLGLA RENKGVAAFA SLTDPNSLFA QIVHIIEEGG WTVFRNMPLI FAVGLPIGLA KQAQGRACLA ..................................................
Ptsapedpe Ptsastrmu Ptbaerwch Ptbaescco Ptbabacsu Ptaaescco Ptaaklepn Ptgahaein Ptgabacst Ptgabacsu Ptgamycca Ptgaescco Ptgbescco Ptoaescco Pticescco Consensus
i01 150 TNGQYQIIIG PGDVDKVYDA LIVKTGL..K EVTPDDIKAVAAAGQNKNPL TDGQYQVIIG PGDVNFVYDE IIKQTGL..T EVSTDDLKKIAASGKKFNPI S G G Q F Q V V V G .NQVADVYQA L L S L D G M . . A R F S D . . . S A A P E E E K K N S L F SGGQFQVVIG .NHVADVFLAVNSVAGL..D E.KA...QQA PENDDKGNLL S G E Q F Q I I I G .NDVPKVYQA I V R H S N L . . S D . . E . . . K S A G S S S Q K K N V L AIFDNLALIF AIGVASSWSK DSAGAAALAG AVGYFVLTKA MVTINPEINM AIFDNLALIF AIGVASSWSK DNAGSAALAG AVGYFVMTKA MVTINPEINM .................................................. .................................................. IIGYLVMNVS MSAVLLANGT IPSDSVERAK FFTENHPAYV NMLGIPTLAT .................................................. A V V A Y G I M . . . . . . . V K T M A V V ...... AP LVLHLPAE.. E I A S K H L A D T A V V A Y G I M . . . . . . . V K T M A V V ...... AP LVLHLPAE.. E I A S K H L A D T GFIGYAVMNL AVNFWLTNKG ILPTTD..AAVLKANNIQ.. SILGIQSYDT V M V S F L T W N Y F I N A M G M T W G S Y F G V D F T Q D AVAGSGLT.. M M A G I K T L D T ..................................................
Ptsapedpe Ptsastrmu Ptbaerwch Ptbaescco Ptbabacsu
151 MDFLKVLSDI MALIKLLSDI SGFIDIISSI NRFVYVISGI SAVFDVISGV
.................................................. .................................................. .................................................. .................................................. ...... MFKA L F G V L Q K I G R A L M L P V A I L P A A G I L L A I G N A M Q N K D M I Q V .................................................. ...... MFKN A F A N L Q K V G K S L M L P V S V L P I A G I L L G V G S ANFS ..... W ...... MFKN A F A N L Q K V G K S L M L P V S V L P I A G I L L G V G S ANFS ..... W MTAKTAPKVT LWEFFQQLGK TFMLPVALLS FCGIMLGIGS SLSSHDVITL .......... M L S Q I Q R F G G A M F T P V L L F P F A G I V V G L A I L L Q N P M F V G E ..................................................
FIPIVPALVA GGLLMALNNV FVPIIPALVA GGLLMALNNF FTPFVGVMAATGILKGFLAL FTPLIGLMAATGILKGMLAL FTPILPAIAG AGMIKGLVAL
LTAEHLFMAK LTSEGLFGTK GVATHVISES ALTFQWTTEQ AVTFGWMAEK
200 SVVEVYPGLK SLVQQFPIIK S ......... S ......... S .........
Ptaaescco Ptaaklepn Ptgahaein Ptgabacst Ptgabacsu Ptgamycca Ptgaescco Ptgbescco Ptoaescco Pticescco Consensus
GVLAGIITGL VGGAAYNRWS DIKLPDFLSF FGGKRFVPIA TGFFCLVLAA GVLAGIITGL VAGAVYNRWA GIKLPDFLSF FGGKRFVPIA TGFFCLILAA .................................................. .................................................. GVFGGIIVGV LAALLFNRFY TIELPQYLGF FAGKRFVPIV TSISALILGL .................................................. GVLGGIISGA IAAYMFNRFY RIKLPEYLGF FAGKRFVPII SGLAAIFTGV GVLGGIISGA IAAYMFNRFY RIKLPEYLGF FAGKRFVPII SGLAAIFTGV GILGAVIAGI IVWMLHERFH NIRLPDALAF FGGTRFVPII SSLVMGLVGL SIIGAIIISG IVTALHNRLF DKKLPVFLGI FQGTSYVVII AFLVMIPCAW ..................................................
201 250 GIAEMINAMA SAPFTFLPIL LGFSATKRFG GNPYLGATMG MIMVLPSLVN G S S D M I Q L M S A A P F W F L P I L V G I S A A K R F G A N Q F L G A S I G MIMVAPGAAN GTYKLLFAAS DALFYFFPIV LGYTAGKKFG GNPFTTLVIG ATLVHPSMIA GTYLILFSAS DALFWFFPII LGYTAGKRFG GNPFTAMVIG GALVHPLILT QVHVILTAVG DGAFYFLPLL LAMSAARKFG S N P Y V A A A I A A A I L H P D L T A IFGYVWPPVQ HAIHA.GGEW IVSAGALGSG IFGFINRLLI PTGLHQVLNT IFGYVWPPVQ HAIHS.GGEW IVSAGALGSG IFGFINRLLI PTGLHQVLNT .................................................. .................................................. IMLVIWPPIQ HGLNAFSTGL VEANPTLAAF IFGVIERSLI PFGLHHIFYS .................................................. VLSFIWPPIG SAIQTFSQWA AYQNPVVAFG IYGFIERCLV PFGLHHIWNV VLSFIWPPIG SAIQTFSQWA AYQNPVVAFG IYGFIERCLV PFGLHHIWNV VIPLVWPIFA MGISGLGHMI NSAGDFGPM. LFGTGERLLL PFGLHHILVA LTLLGWPKVQ MGIESLQAFL RSAGALGVW. VYTFLERILI PTGLHHFIYG ..................................................
Ptsapedpe Ptsastrmu Ptbaerwch Ptbaescco Ptbabacsu Ptaaescco Ptaaklepn Ptgahaein Ptgabacst Ptgabacsu Ptgamycca Ptgaescco Ptgbescco Ptoaescco Pticescco Consensus
251 300 Ptsapedpe G Y S V A T T M A A G K ....... M VYWNVFGLHV AQAGYQGQVL PVLGVAFILA Ptsastrmu IIGLAANAPI SKAATIGAYT GFWNIFGLHV TQASYTYQVI PVLVAVWLLS Ptbaerwch AFNAMQAPDH ST .......... LHFLGIPI TFINYSSSVI PILFASWVSC Ptbaescco AFENGQKADA LG .......... LDFLGIPV TLLNYSSSVI PIIFSAWLCS i!;ii@ii~i! Ptbabacsu LLGAGK ..... P .......... ISFIGLPV TAATYSSTVI PILLSIWIAS ii i!i!8 Ptaaescco IAWF.QIGEF TNAAGTVFHG DINRFYA .......... GDG TAGMFMSGFF Ptaaklepn IAWF.QIGEF TNAAGTVFHG DINRFYA .......... GDG TAGMFMSGFF Ptgahaein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i![iii:::;!if2i::iiii@ Ptgabacst . . . . . . . . . . . . . . . . . PtgabacsuPFWY EFFSYKS~GEIIRGDQRIFMAQIKDG VQs Ptgamycca . . . . . . . . . . . . . . . . . . . Ptgaescco PFQM.QIGEY TNAAGQVFHG DIPRYMA..G DPTAGKLSGG ...FLFK... ~iiii!ii~i Ptgbescco PFQM.QIGEY TNAAGQVFHG DIPRYMA..G DPTAGKLSGG ...FLFK... Ptoaescco LIRFTDAGGT QEVCGQTVSG ALTIFQAQLS CPTTHGFSES ATRFLSQGKM Pticescco QFIFGPAA.. VEGGIQMYWA QHLQEFSLSA EPLKSLFPEG GFALHGNSK. Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Ptsapedpe Ptsastrmu Ptbaerwch Ptbaescco Ptbabacsu
301 TLEKFFHKHI ILEKFFHKRL KLEKPLNRWL ILERRLNAWL YVEKWIDRFT
KGAFDFTFTP PSAVDFTFTP HANIRNFFTP PSAIKNFFTP HASLKLIVVP
MFAIVITGFL LLSVIITGFL LLCIVISVPL LLCLMVITPV TFTLLIVVPL
TFTIVGPVLR TFIVIGPVMK TFLLIGPSAT TFLLVGPLST TLITVGPLGA
350 TVSDALTNGL EVSDWLTNGI WLSQMLAGGY WISELIAAGY ILGEYLSSGV
Ptaaescco Ptaaklepn Ptgahaein Ptgabacst Ptgabacsu Ptgamycca Ptgaescco Ptgbescco Ptoaescco Pticescco Consensus
P I M M F G L P G A A L A M Y F A A P K ERRPMVGGML L S V A V T A F L T GVTEPLEFLF P I M M F G L P G A A L A M Y L A A P K A R R P M V G G M L L S V A I T A F L T GVTEPLEFLF .................................................. .................................................. P F M M F G L P A A A L A I Y H E A K P Q N K K L V A G I M G S A A L T S F L T GITEPLEFSF .................................................. . . . M Y G L P A A A I A I W H S A K P ENRAKVGGIM ISAALTSFLT GITEPIEFSF . . . M Y G L P A A A I A I W H S A K P ENRAKVGGIM ISAALTSFLT GITEPIEFSF N A F L G G L P G A ALAMYHCARP ENRHKIKGLL ISGLIACVVG GTTEPLEFLF ...IFGAVGI SLAMYFTAAP ENRVKVAGLL IPATLTAMLV GITEPLEFTF ..................................................
351 400 Ptsapedpe VGLYNSTGWI GMGIFGLLYS A I V I T G L H Q T FP.AIETQLL A N V A K T G . . G P t s a s t r m u V W L Y D T T G F L GMGVFGALYS PVVMTGLHQS FP.AIETQLI SAFQNGTGHG Ptbaerwch QWLYGLNSLL A G A V M G A L W Q V C V I F G L H W G FV.PLMLNNF SVIGH ..... Ptbaescco LWLYQAVPAF A G A V M G G F W Q IFVMFGLHWG LV.PLCINNF TVLGY ..... Ptbabacsu N Y L F D H A G L V AMILLAGTFS L I I M T G M H Y A FV.PIMINNI AQNGH ..... Ptaaescco MFLAPLLYLL HALLTGISLF V A T L L G I H A G F S F S A G A I D Y A L M Y N L P A A S Ptaaklepn LFLAPLLYLL HAVLTGISLF IATALGIHAG FSFSAGAIDY VLMYSLPAAS Ptgahaein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ptgabacst . . . . . . . . . . . . . . . . . . . . . . HLLNVKIG MTFSGGVIDF .LLFGVLPNR Ptgabacsu LFVAPVLFAI HCLFAGLSFM V M Q L L N V K I G M T F S G G L I D Y .FLFGILPNR Ptgamycca .................................................. Ptgaescco MFVAPILYII HAILAGLAFP ICILLGMRDG TSFSHGLIDF IVLS...GNS Ptgbescco MFVAPILYII HAILAGLAFP ICILLGMRDG TSFSHGLIDF IVLS...GNS Ptoaescco LFVAPVLYVI HALLTGLGFT V M S V L G V T I G N T . D G N I I D F W F G I L H G L S Pticescco L F I S P L L F A V HAVLAASMST V M Y L F G V . . V GNMGGGLID ........... Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ptsapedpe Ptsastrmu Ptbaerwch Ptbaescco Ptbabacsu Ptaaescco Ptaaklepn Ptgahaein Ptgabacst Ptgabacsu Ptgamycca Ptgaescco Ptgbescco Ptoaescco Pticescco Consensus
401 450 SFIFPVASMA N I G Q G A A T L A IFFATKSQKQ KALTSSAGVS ALLGITEPAI D F I F V T A S M A N V A Q G A A T F A IYFLTKDKKM K G L S S S S G V S A L L G I T E P A L D T L L P L L V P A V L G Q A G A T L G V L L R T Q D L K R KGIAGSAFSA A I F G I T E P A V D T M I P L L M P A IMAQVGAALG V F L C E R D A Q K KVVAGSAALT SLFGITEPAV D Y L L P A M F L A N M G Q A G A S F A VFLRSRNKKF KSLALTTSIT A L M G I T E P A M Q N V W M L L V M G V I F F A I Y F V V F S L V I R M F N L KTPGREDKED E I V T E E A N S N K N V W M L L V M G V V F F F V Y F L L FSAVIRMFNL K T P G R E D K A A D V V T E E A N S N .................................................. T A W W L V I P V G L V F A V I Y Y F G FRFAIRKWDL ATPGREK .... T V E E . A P K A TAWWLVIPVG L G L A V I Y Y F G FRFAIRKFNL KTPGRED .... A A E E T A A P G .................................................. SKLWLFPIVG IGYAIVYYTI FRVLIKALDL KTPGRE .... D A T E D A K A T G SKLWLFPIVG IGYAIVYYTI FRVLIKALDL KTPGRE .... DATEDAKATG T K W Y M V P V V A AIWFVVYYVI FRFAITRFNL KTPGRDSRVA SSIEKAVAGA .................................................. ..................................................
          451                                                500
Ptsapedpe FGVNLKMKFP FVFAAIASGI ASAFLGLFHV LSVAMGPASV IGFIS.IASK
Ptsastrmu FGVNLKYRFP FFCALIGSAS AAAIAGLLQV VAVSLGSAGF LGFLS.IKAS
Ptbaerwch YGVTLPLRRP FIFGCIGGAL GAAVMGYAHT TMYSFGFPSI FSFTQVIPPT
Ptbaescco YGVNLPRKYP FVIACISGAL GATIIGYAQT KVYSFGLPSI FTFMQTIPST
Ptbabacsu YGVNMRLKKP FAAALIGGAA GGAFYGMTGV ASYIVGGNAG LPSIPVF...
Ptaaescco TEEGLTQLAT NYIAAVGGTD NLKAIDACIT RLRLTVADSA RVNDTMCKRL
Ptaaklepn TEEGLTQLAT SYIAAVGGTD NLKAIDACIT RLRLTVGDSA KVNDAACKRL
Ptgahaein .......... .......... .......... .......... ..........
Ptgabacst EAAAAGDLPY EVLAALGGKE NIEHLDACIT RLRVSVHDIG RVDKDRLKAL
Ptgabacsu KTGEAGDLPY EILQAMGDQE NIKHLDACIT RLRVTVNDQK KVDKDRLKQL
Ptgamycca .......... .......... .......... .......... ..........
Ptgaescco TSEMA...P. ALVAAFGGKE NITNLDACIT RLRVSVADVS KVDQAGLKKL
Ptgbescco TSEMA...P. ALVAAFGGKE NITNLDACIT RLRVSVADVS KVDQAGLKKL
Ptoaescco PGKSGYNVP. AILEALGGAD NIVSLDNCIT RLRLSVKDMS LVNVQALKDN
Pticescco .......... .......... .......... .......... ..........
Consensus .......... .......... .......... .......... ..........
          501                                                550
Ptsapedpe SIPAFMLSAV ISFVVAFIPT FIYAK..... .........R TLGDDRDQVK
Ptsastrmu SIPFYWCEL ISFAIAFAVT YGYGKTKAVD VFAAEAAVEE AIEEVQEIPE
Ptbaerwch GVDSSVWAAV IGTLLAFAFA ALTS...... WSFGVPKDET QPAAADSPAV
Ptbaescco GIDFTVWASV IGGVIAIGCA FVGT...... VMLHFITAKR QPAQG...AP
Ptbabacsu .IGPTFIYAM IGLVIAFAAE TAAA...... YLLGF..EDV PSDGSQQPAV
Ptaaescco GASGVVKLNK QTIQVIVGAK AESIGDAMKK VVARGPVAAA SAEATP..AT
Ptaaklepn GASGVVKLNK QTIQVIVGAK AESIGDEMKK VVTRGPVAAA AAAPAGNVAT
Ptgahaein .......... .......... .......... .......... ..........
Ptgabacst GAAGVLEVGN N.VQAIFGPK SDMLKGQIQD IMQGKAP... ..........
Ptgabacsu GASGVLEVGN N.IQAIFGPR SDGLKTQMQD IIAGRKPRPE PKTSAQEEVG
Ptgamycca .......... .......... .......... .......... ..........
Ptgaescco GAAGVV.VAG SGVQAIFGTK SDNLKTEMDE YIRNH..... ..........
Ptgbescco GAAGVV.VAG SGVQAIFGTK SDNLKTEMDE YIRNH..... ..........
Ptoaescco RAIGVVQLNQ HNLQVVIGPQ VQSVKDEMAG LMHTVQA... ..........
Pticescco .......... .......... .......... .......... ..........
Consensus .......... .......... .......... .......... ..........

          551                                                600
Ptsapedpe SPAPTSTVIN VN....DEII SAPVTGASES LKQVNDQVFS AEIMGKGAAI
Ptsastrmu EAASAANKAQ VT....DEVL AAPLAGEAVE LTSVNDPVFS SEAMGKGIAI
Ptbaerwch LAETQANAGA VR....DETL FSPLAGEVLL LEQVADRTFA SGVMGKGIAI
Ptbaescco QEKTPEVITP PE....QGGI CSPMTGEIVP LIHVADTTFA SGLLGKGIAI
Ptbabacsu HEGSREI... .........I HSPIKGEVKA LSEVKDGVFS AGVMGKGFAI
Ptaaescco AAPVAKPQAV PNAVSIA.EL VSPITGDVVA LDQVPDEAFA SKAVGDGVAV
Ptaaklepn AAPAAKPQAV ANAKTVE.SL VSPITGDWA LEQVPDEAFA SKAVGDGIAV
Ptgahaein .GLFDKLFGS KENKSVEVEI YARISGEIVN IEDVPDVVFS EKIVGDGVAV
Ptgabacst ARAEEKPKTA ASEAAESETI ASPMSGEIVP LAEVPDQVFS QKMMGDGFAV
Ptgabacsu QQVEEVIAEP LQNEIGEEVF VSPITGEIHP ITDVPDQVFS GKMMGDGFAI
Ptgamycca .......... MWFFNKNLKV LAPCDGTIIT LDEVEDEVFK ERMLGDGFAI
Ptgaescco .......... .......... .......... .......... ..........
Ptgbescco .......... .......... .......... .......... ..........
Ptoaescco .......... .......... .......... .......... ..........
          601                                                650
Ptsapedpe VPSSDQVVAP ADGVITVTYD SHHAYGIKTT AGAEILIHLG LDTVNLNGEH
Ptsastrmu KPSGNTVYAP VDGTVQIAFD TGHAYGIKSD NGAEILIHIG IDTVSMEGKG
Ptbaerwch RPTQGRLYAP VDGTVASLFK THHAIGLASR GGAEVLIHVG IDTVRLDGRY
Ptbaescco LPSVGEVRSP VAGRIASLFA TLHAIGIESD DGVEILIHVG IDTVKLDGKF
Ptbabacsu EPEEGEVVSP VRGSVTTIFK TKHAIGITSD QGAEILIHIG LDTVKLEGQW
Ptaaescco KPTDKIVVSP AAGTIVKIFN TNHAFCLETE KGAEIVVHMG IDTVALEGKG
Ptaaklepn KPTDNIVVAP AAGTVVKIFN TNHAFCLETN NGAEIVVHMG IDTVALEGKG
Ptgahaein RPIGNKIVAP VDGVIGKIFE TNHAFSMESK EGVELFVHFG IDTVELKGEG
Ptgabacst MPTDGTVVSP VDGKIINVFP TKHAIGIQSA GGHEILIHVG IDTVKLNGQG
Ptgabacsu LPSEGIVVSP VRGKILNVFP TKHAIGLQSD GGREILIHFG IDTVSLKGEG
Ptgamycca NPKSNDFHAP VSGKLVTAFP TKHAFGIQTK SGVEILLHIG LDTVSLDGNG

          651                                                700
Ptsapedpe FTTNVQKGDT VHQGDLLGTF DIAALKAANY DPTVMLIVTN TANYANVERL
Ptsastrmu FEQKVQADQK IKKGDVLGTF DSDKIAEAGL DNTTMFIVTN TADYASVETL
Ptbaerwch FTPHVRVGDV VRQGDLLLEF DGPAIEAAGY DLTTPIVITN SEDYRGVEPV
Ptbaescco FSAHVNVGDK VNTGDRLISF DIPAIREAGF DLTTPVLISN SDDFTDVLPH
Ptbabacsu FTAHIKEGDK VAPGDPLVSF DLEQIKAAGY DVITPVIVTN TDQYSFSPVK
Ptaaescco FKRLVEEGAQ VSAGQPILEM DLDYLNANAR SMISPVVCSN IDDFSGLIIK
Ptaaklepn FKRLVEEGTD VKAGEPILEM DLDFLNANAR SMISPVVCSN SDDYSALVIL
Ptgahaein FTRIAQEGQS VKRGDTVIEF DLALLESKAK SVLTPIVISN MDEISC.IVK
Ptgabacst FEALVKEGDE VKKGQPILRV DLDYVKQNAP SIVTPVIFTN LQAGETVHVN
Ptgabacsu FTSFVSEGDR VEPGQKLLEV DLDAVKPNVP SLMTPIVFTN LAEGETVSIK
Ptgamycca FESFVTQDQE VNAGDKLVTV DLKSVAKKVP SIKSPIIFTN .NGGKTLEIV

          701                                                728
Ptsapedpe .KVTNVQAGE QLVALTAPAA SSVAATTV
Ptsastrmu ASSGTVAVGD SLLEVKK
Ptbaerwch A.SGKVDANA PLTQLVC
Ptbaescco G.TAQISAGE PLLSIIR
Ptbabacsu E.IGKVQPKE ALLALS
Ptaaescco AQGHIVAGQT PLYEIKK
Ptaaklepn ASGKWAGQT PLYEIKGK
Ptgahaein KSGEWAGES VVLALKK
Ptgabacst KQGPVARGED AVVTIR
Ptgabacsu ASGSVNREQE DIVKIEK
Ptgamycca KMGEVK..QG DWAILK
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Ptgasalty (Ptgaescco); Ptgbsalty (Ptgbescco). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
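The 75% rule for the consensus line can be illustrated with a short Python sketch. It is not the procedure used to prepare the alignments in this book. The function name `consensus`, the decision to count gapped sequences in the denominator, and the use of a five-row excerpt (columns 631-640 of the Ptsa and Ptba rows only) are assumptions made for illustration, so the result is not the consensus of the full alignment.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Return a consensus string for equal-length aligned sequences.

    A residue is reported for a column only when it occurs in at least
    `threshold` of all rows (gapped rows count in the denominator, which
    is an assumption); otherwise the gap symbol is written.
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    result = []
    for column in zip(*rows):
        counts = Counter(res for res in column if res != gap)
        if counts:
            residue, hits = counts.most_common(1)[0]
            result.append(residue if hits / len(rows) >= threshold else gap)
        else:
            result.append(gap)
    return "".join(result)


# Columns 631-640 of five rows, used here as a stand-alone toy alignment.
excerpt = [
    "AGAEILIHLG",  # Ptsapedpe
    "NGAEILIHIG",  # Ptsastrmu
    "GGAEVLIHVG",  # Ptbaerwch
    "DGVEILIHVG",  # Ptbaescco
    "QGAEILIHIG",  # Ptbabacsu
]
print(consensus(excerpt))  # .GAEILIH.G
```

With five rows, a residue must appear in at least four of them to reach the 75% threshold, which is why the first and ninth columns are left as gaps.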
Database accession numbers
CODE       SWISSPROT  EMBL/GENBANK
Ptaaescco  P09323     M19284
Ptaaklepn  P45604     X63289
Ptbabacsu  P40739     Z34526; D31856
Ptbaescco  P08722     M15746; M16487
Ptbaerwch  P26207     M81772
Ptgahaein  P45338     U32844
Ptgamycca  P45618     U15110
Ptgasalty  P02908     X05210
Ptgaescco  P08837     J02796; M21994
Ptgabacsu  P20166     M60344; Z11744
Ptgabacst  P42015     U12340
Ptgbsalty  P37439     X74629
Ptgbescco  P05053     J02618
Pticescco  P31452     L10328
Ptoaescco  P19642     M60722; M28539
Ptsapedpe  P43470     Z32771; L32093
Ptsastrmu  P12655     M22771; D13175
PIR: B29895; A28896 S47174 A47616; C25977 B42603 A03405 C29785 S22752 S36620 A25336 PV0011; B42477 B32243
References
1 Meadows, N.D. et al. (1990) Annu. Rev. Biochem. 59, 497-542.
2 Postma, P.W. et al. (1993) Microbiol. Rev. 57, 543-594.
3 Poolman, B. et al. (1996) Mol. Microbiol. 19, 911-922.
4 Worthylake, D. et al. (1991) Proc. Natl Acad. Sci. USA 88, 10382-10386.
5 Hurley, J.H. et al. (1993) Science 259, 673-677.
Other Transporters
Anion exchanger family
Summary
Transporters of the anion exchanger family, the example of which is the AE1 anion exchange protein 3 of humans (B3athomsa), mediate a 1:1 exchange of inorganic anions across the plasma membrane and regulate intracellular pH and volume. AE1 also has Cl- channel activity, transports taurine, and provides binding sites for cytoskeletal proteins 1,2. Members of the family occur in vertebrates, including birds, fish and primates. Statistical analysis reveals no apparent relationship between the amino acid sequences of the anion exchanger family and any other family of transporters. Members of the anion exchanger family are predicted, from the hydropathy of their amino acid sequences, to form 10 or 12 membrane-spanning helices. Transporters may be glycosylated, phosphorylated or palmitylated, and exist as tetramers in the membrane. Members of the anion exchanger family contain several highly conserved amino acid sequence motifs that are necessary for function by the criterion of mutation: defects in AE1 are a cause of hemolytic anemia and hereditary ovalocytosis, and are associated with hereditary spherocytosis and elliptocytosis 1,2.
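The idea of predicting membrane-spanning helices from hydropathy can be sketched as a sliding-window average over the sequence. The text does not say which method was used for this family; the Kyte-Doolittle scale, the 19-residue window, the 1.6 cutoff and the function names below are all assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along an ungapped sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_windows(sequence, window=19, cutoff=1.6):
    """Start indices (0-based) of windows whose mean reaches the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [start for start, mean in enumerate(profile) if mean >= cutoff]


# B3athomsa, alignment columns 821-850, taken from the alignment in this entry.
segment = "WIGFWLILLVVLVVAFEGSFLVRFISRYTQ"
print(hydrophobic_windows(segment))
```

Runs of consecutive window starts above the cutoff mark candidate membrane-spanning segments; counting such runs over a whole sequence gives the kind of helix estimate quoted in the summary.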
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Ae2galga | Anion exchange protein 2 [AE2] | Gallus gallus [chicken] | Cl-/HCO3-
B3a2homsa | Anion exchange protein 2 [B3a, AE2, HKK3, EPK3L1, NBND3] | Homo sapiens [human] | Cl-/HCO3-
B3a2ratno | Anion exchange protein 2 [B3a, AE2, B3rp2] | Rattus norvegicus [rat] | Cl-/HCO3-
B3a2musmu | Anion exchange protein 2 [B3a, AE2, B3Rp] | Mus musculus [mouse] | Cl-/HCO3-
B3a3musmu | Anion exchange protein 3 [B3a, AE3] | Mus musculus [mouse] | Cl-/HCO3-
B3a3ratno | Anion exchange protein 3 [B3a, AE3, B3rp3] | Rattus norvegicus [rat] | Cl-/HCO3-
B3atmusmu | Anion exchange protein 3 [B3aT, AE1, MEB3] | Mus musculus [mouse] | Cl-/HCO3-
B3atgalga | Anion exchange protein 3 [B3aT] | Gallus gallus [chicken] | Cl-/HCO3-
B3atratno | Anion exchange protein 3 [B3aT, AE1] | Rattus norvegicus [rat] | Cl-/HCO3-
B3atoncmy | Anion exchange protein 3 [AE1, B3aT] | Oncorhynchus mykiss [trout] | Cl-/HCO3-
B3athomsa | Anion exchange protein 3 [B3aT, AE1, EPB3, EMPB3] | Homo sapiens [human] | Cl-/HCO3-
Cotransported ions are indicated.
Phylogenetic tree
[Figure: phylogenetic tree with branches labelled B3athomsa, B3atratno, B3atgalga, Ae2galga, B3a2homsa, B3a3ratno and B3atoncmy]
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: B3a2musmu, B3a2ratno (B3a2homsa); B3a3musmu (B3a3ratno); B3atmusmu (B3atratno).
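The 90% identity criterion used to omit near-duplicate proteins can be illustrated with a small Python sketch. The exact definition of identity used for the book is not stated; counting identical residues over the columns where neither sequence has a gap is one common convention and is an assumption here, as is the function name `percent_identity`. The example compares only a 20-column excerpt (alignment columns 701-720 of B3athomsa and B3atratno), so the figure says nothing about the full-length proteins.

```python
def percent_identity(seq_a, seq_b, gap="."):
    """Percent of identical residues over columns where neither row is gapped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Alignment columns 701-720 of two AE1 rows from this entry.
human = "LQQTGQLFGGLVRDIRRRYP"  # B3athomsa
rat = "LRRTGRIFGGLIRDIRRRYP"    # B3atratno
print(percent_identity(human, rat))  # 75.0
```

A pair would be treated as redundant under the stated rule only if the value computed over the whole alignment were at least 90.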
Proposed orientation of AE1 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. More than half of the residues are conserved in at least 75% of the members of the anion exchanger family and, therefore, are not mapped onto the model.
[Figure: membrane topology model of AE1; labels: OUTSIDE, INSIDE, NH2, COOH]
Physical and genetic characteristics
CODE       AMINO ACIDS  MOL. WT  CHROMOSOMAL LOCUS
Ae2galga   1219         135 288
B3a2homsa  1240         136 814  7q35-q36
B3a2ratno  1234         136 635
B3a2musmu  1237         136 813
B3a3musmu  1227         135 164
B3a3ratno  1227         135 406
B3atmusmu  929          103 135
B3atgalga  922          102 223
B3atratno  927          103 222
B3atoncmy  918          101 923
B3athomsa  911          101 792  17q21-q22
EXPRESSION SITES: ubiquitous; ubiquitous; ubiquitous; neurons, heart; neurons, heart; erythrocyte; erythrocyte; erythrocyte; erythrocyte; erythrocyte
Multiple amino acid sequence alignments
          1                                                   50
B3athomsa .......... .......... .......... .......... ..........
B3atratno .......... .......... .......... .......... ..........
B3atgalga .......... .......... .......... .......... ..........
Ae2galga  MSRSPVSSEL HHIVSSAIES PEPPAPGPAS PPLAEEEEKD LNKALGVERF
B3a2homsa MSSAP...RL PAKGADSFCT PEPESLGPGT PGFPEQEEDE LHRTLGVERF
B3a3ratno MANGVIP... PPGGPSPLPQ VRVPLEEPPL GPDVEEEDDD LGKTLAVSRF
B3atoncmy .......... .......... .......... .......... ..........
Consensus .......... .......... .......... .......... ..........

          51                                                 100
B3athomsa .......... .......... .......... .......... ..........
B3atratno .......... .......... .......... .......... ..........
B3atgalga .......... .......... .......... .......... ..........
Ae2galga  EEILSDAHPR SVEEPGRIYG EEDFEYHRQS SLHIHHPLSA HLPPDARRKK
B3a2homsa EEILQEAGSR GGEEPGRSYG MEDFEYRRQS SHHIHHPLST HLPPGARRRK
B3a3ratno GDLISKTPAW DPEKPSRSYS ERDFEFHRHT SHHTHHPLSA RLPPPHKLRR
B3atoncmy .......... .......... .......... .......... ..........
Consensus .......... .......... .......... .......... ..........

          101                                                150
B3athomsa .......... .......... .......... .......... ..........
B3atratno .......... .......... .......... .......... ..........
B3atgalga .......... .......... .......... .......... ..........
Ae2galga  GVPKKGR... ..KKRGRAAA P..GENPPI. ......EEGE EDEEEACDTE
B3a2homsa TPQGPGR... ..KPRRRPGA SPTGETPTI. ......VEGE EDEDEASEAE
B3a3ratno LPPTSARHAR RKRKKEKTSA PPSEGTPPIQ EEGGAGAEEE EEEEEEEEGE
B3atoncmy .......... .......... .......... .......... ..........
Consensus .......... .......... .......... .......... ..........

          151                                                200
B3athomsa .......... .......... .......... .......... ..........
B3atratno .......... .......... .......... .......... ..........
B3atgalga .......... .......... .......... .......... ..........
Ae2galga  TERS.AEELR GGPAEGVQFF LQEDEVTERR AE.EPPAPPA PPGPTAEPHG
B3a2homsa GARALTQPSP VSTPSSVQFF LREDDSADRK AERTSPSSPA PLPHQEATPR
B3a3ratno SEAEPVEPPP PGPPQKAKFS IGSD...... .EDDSPGLSI KAPCAKALPS
B3atoncmy .......... .......... .......... .......... ..........
Consensus .......... .......... .......... .......... ..........

          201                                                250
B3athomsa .......... .......... .......... .......... ..........
B3atratno .......... .......... .......... .......... ..........
B3atgalga .......... .......... .......... .......... ..........
Ae2galga  ATAPAAASPG AEEGRA.... .ADGGAVPED GGSPGRPAAR A.TEHRSYNL
B3a2homsa ASKGAQAGTQ VEEAEAEAVA VASGTAGGDD GGASGRPLPK AQPGHRSYNL
B3a3ratno VGLPSDQSPQ RSGSSPSPRA RASRISTEK. ........SR PWSPSASYDL
B3atoncmy .......... .......... .......... .......... ..........
Consensus .......... .......... .......... .......... ..........

          251                                                300
B3athomsa .......... .......... .........M EELQDDYEDM MEENLEQEEY
B3atratno .......... .........M GDMQDHEKVL EIPDRDSEEE LEHVIEQIAY
B3atgalga .......... .......... .........M EGPGQDTEDA LRRSLDPEGY
Ae2galga  HERRRIGSMT GADEAQYQKV PTDESEAQTL ASADLDYMKS HRFEDVPGVR
B3a2homsa QERRRIGSMT GARQALLPRV PTDEIEAQTL ATADLDLMKS HRFEDVPGVR
B3a3ratno RERLCPGSAL GNPGPE.QRV PTDEAEAQML GSADLDDMKS HRLEDNPGVR
B3atoncmy ........ME NDLSFGEDVM SYEEESDSAF PSPIRPTPPG HSGNYDLEQS
Consensus .......... .......... .......... .....D.... ..........
          301                                                350
B3athomsa EDPDIPESQM EEPA....AH DTEATATDYH TTSHPGTHKV YVELQELVMD
B3atratno RDLDIPVTEM QESEXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXMD
B3atgalga EDTKGSRTSL GTMSNPLVSD VDLEAAGSRQ PTAHRDTYEG YVELHELVLD
Ae2galga  RHLVRKSAKA QVVHVGKEHR EQSARPR... RS.DRQPHEV FVELNELVLD
B3a2homsa RHLVRKNAKG .STQSGREGR EPGPTPRARP RA.PHKPHEV FVELNELLLD
B3a3ratno RHLVKKPSRI QGGRGSPSGL APILRRKKKK KKLDRRPHEV FVELNELMLD
B3atoncmy RQEEDSN... QAIQSIVVHT DPEAYLNLNT NANTRGDAQA YVELNELMGN
Consensus .......... .......... .......... .......... .VEL.EL..D

          351                                                400
B3athomsa EKNQELRWME AARWVQLEEN LGENGA.WGR PHLSHLTFWS LLELRRVFTK
B3atratno QRNQELQWVE AAHWIGLEEN LREDGV.WGR PHLSYLTFWS LLELQKVFSK
B3atgalga SRKDP.CWME AGRWLHLEES MEPGGA.WG. SHLPLLTYHS LLELHRAFAK
Ae2galga  .KNQELQWKE TARWIKFEED VEEETDRWGK PHVASLSFRS LLELRKTLSH
B3a2homsa .KNQEPQWRE TARWIKFEED VEEETERWGK PHVASLSFRS LLELRRTLAH
B3a3ratno .RSQEPHWRE TARWIKFEED VEEETERWGK PHVASLSFRS LLELRRTIAQ
B3atoncmy S......WQE TGRWVGYEEN FNPGTGKWGP SHVSYLTFKS LIQLRKIMST
Consensus .......W.E ..RW...EE. .......WG. .HL..L...S LLEL......
          401                                                450
B3athomsa GTVLLDLQET SLAGVANQLL DRFIFEDQIR PQDREELLRA LLLKHSHAGE
B3atratno GTFLLDLAET SLAGVANKLL DSFIYEDQIR PQDRDELLRA LLLKRSHAED
B3atgalga GVVLLDVAAN SLAAVAHVLL DQLIYEGQLK PQHRDDVLRA LLLRHKHPSE
Ae2galga  GAVLLDLDQK TLPGVAHQVV EQMVITDQIR AEDRANVLRA LLLKHSHPSD
B3a2homsa GAVLLDLDQQ TLPGVAHQVV EQMVISDQIK AEDRANVLRA LLLKHSHPSD
B3a3ratno GAALLDLEQT TLPGIAHLVV ETMIVSDQIR PEDRASVLRT LLLKHSHPND
B3atoncmy GAIILDLQAS SLSAVAEKVV DELRTKGEIR AADRDGLLRA LLQRRSQSEG
Consensus G..LLD.... .L.GVA.... .......QI. ..DR...LRA LLL..SH...
          451                                                500
B3athomsa LEALGGVKPA VLTRS..... G......... .....DPSQP LLPQHSSLET
B3atratno LKDLEGVKPA VLTRS..... G......... .....APSEP LLPHQPSLET
B3atgalga AESVWTLPAA QLQCSDGEQK D......... .....ADERA LLRDQRAVEM
Ae2galga  EKEFS.FPRN ISAGSLGSLL VHHHSTNHVG EGGEPAVTEP LIAGHGAEHD
B3a2homsa EKDFS.FPRN ISAGSVGSLL GHHHGQ...G AESDPHVTEP LM...GGVPE
B3a3ratno DKDSGFFPRN PSSSSVNSVL GNHHPTPSHG PDGAVP...T MADDLGEPAP
B3atoncmy .......... .......... .......... .....AVAQP L...GGDIEM
Consensus .......... ....S..... .......... .......... L.........

          501                                                550
B3athomsa QLFCEQGDGG TEGHSPSG.. .......... ILEKIPPDSE ATLVLVGRAD
B3atratno KLYCAQAEGG SEEPSPSG.. .......... IL.KIPPNSE TTLVLVGRAS
B3atgalga RELHGAGQSP SRAQLGPQ.. .......... LHQQLPEDTE ATLVLVACAA
Ae2galga  ARVDVERERE VPTPAPPAGI TRSKSKHELK LLEKIPDNAE ATVVLVGCVE
B3a2homsa TRLEVERERD VPPPAPPAGI TRSKSKHELK LLEKIPENAE ATVVLVGCVE
B3a3ratno LWPHDPDAKE KPLHMPGGDG HRGKS...LK LLEKIPEDAE ATVVLVGSVP
B3atoncmy QTFSVTKQRD T......... .......... .....TDSVE ASIVLSGVMD
Consensus .......... .......... .......... .....P...E AT.VLVG...
          551                                                600
B3athomsa FLEQPVLGFV RLQEAAELE. AVELPVPIRF LFVLLGPEAP HIDYTQLGRA
B3atratno FLVKPVLGFV RLKEAVPLE. DLVLPEPVSF LLVLLGPEAP HIDYTQLGRA
B3atgalga FLEQPLLALV RLGAPCPDA. VLAVPLPVRF VLTVLGPDSP RLSYHEIRRA
Ae2galga  FLDQPTMAFV RLQEAVELDS VLEVPVPVRF LFLLLGPSST HMDYHEIGRS
B3a2homsa FLSRPTMAFV RLREAVELDA VLEVPVPVRF LFLLLGPSSA NMDYHEIGRS
B3a3ratno FLEQPAAAFV RLSEAVLLES VLEVPVPVRF LFVMLGPSHT STDYHELGRS
B3atoncmy SLEKPAVAFV RLGDSVVIEG ALEAPVPVRF VFVLVGPSQG GVDYHESGRA
Consensus FL..P...FV RL........ .L..P.PVRF ....LGP... ..DY...GR.

          601                                                650
B3athomsa AATLMSERVF RIDAYMAQSR GELLHSLEGF LDCSLVLPPT DAPSEQALLS
B3atratno AATLMTERVF RVTASLAQSR GELLSSLDSF LDCSLVLPPT EAPSEKALLN
B3atgalga AATVMADRVF RRDAYLCGGR AELLGGLQGF LEASIVLPPQ EVPSEQHLHA
Ae2galga  ISTLMSDKQF HEAAYLADDR HDLLTAINEF LDCSVVVPPS EVQGEE.LRS
B3a2homsa ISTLMSDKQF HEAAYLADER EDLLTAINAF LDCSVVLPPS EVQGEELLRS
B3a3ratno IATLMSDKLF HEAAYQADDR QDLLGAISEF LDGSIVIPPS EVEGRDLLRS
B3atoncmy MAALMADWVF SLEAYLAPTN KELTNAIADF MDCSIVIPPT EIQDEGMLQP
Consensus ..TLM....F ...AY.A..R ..LL.....F LDCS.V.PP. E...E..L..
          651                                                700
B3athomsa LVPVQRELLR RRYQSSPA.. .......KPD SSFYKGLDLN ....GGPDDP
B3atratno LVPVQKELLR KRYLPRPA.. .......KPD PNLYEALDGG KEGPGDEDDP
B3atgalga LIPLQRHAVR RRYQHPDTV. ......RSPG GPTAPKDTGD KGQAPQDDDP
Ae2galga  VAHFQREMLK KREEQEKRML ....LEPKSP EEKAL.LKLK VAEDEDEDDP
B3a2homsa VAHFQRQMLK KREEQGRLLP TGAGLEPKSA QDKAL.LQM. VERQGQLKMI
B3a3ratno VAAFQRELLR KRREREQTKV EMTTRGGYVA PGKELSLEMG GSEATSEDDP
B3atoncmy IIDFQKKMLK DRLRPSDTRI IFGG...... .....GAKAD EADEEPREDP
Consensus ....Q...L. .R........ .......... .......... ........DP
          701                                                750
B3athomsa LQQTGQLFGG LVRDIRRRYP YYLSDITDAF SPQVLAAVIF IYFAALSPAI
B3atratno LRRTGRIFGG LIRDIRRRYP YYLSDITDAL SPQVLAAVIF IYFAALSPAV
B3atgalga LLRTRRPFGG LVRDIRRRYP KYLSDIRDAL NPQCLAAVIF IYFAALSPAI
Ae2galga  LRRTGRPFGG LIRDVRRRYP HYLSDFRDAL NPQCIAAVIF IYFAALSPAI
B3a2homsa PSADGAAFGG LIRDVRRRYP HYLSDFRDAL DPQCLAAVIF IYFAALSPAI
B3a3ratno LQRTGSVFGG LVRDVKRRYP HYPSDLRDAL HSQCVAAVLF IYFAALSPAI
B3atoncmy LARTGIPFGG MIKDMKRRYR HYISDFTDAL DPQVLAAVIF IYFAALSPAI
Consensus L..TG..FGG L.RD..RRYP .Y.SD..DAL .PQ..AAVIF IYFAALSPAI

          751                                                800
B3athomsa TFGGLLGEKT RNQMGVSELL ISTAVQGILF ALLGAQPLLV VGFSGPLLVF
B3atratno TFGGLLGEKT RNLMGVSELL ISTAVQGILF ALLGAQPLLV LGFSGPLLVF
B3atgalga TFGGLLGEKT RGMMGVSELL LSTSVQCLLF SLLSAQPLLV VGFSGPLLVF
Ae2galga  TFGGLLGEKT QDLIGVSELI ISTSLQGVLF CLLGAQPLLV IGFSGPLLVF
B3a2homsa TFGGLLGEKT QDLIGVSELI MSTALQGVVF CLLGAQPLLV IGFSGPLLVF
B3a3ratno TFGGLLGEKT EGLMGVSELI VSTAVLGVLF SLLGAQPLLV VGFSGPLLVF
B3atoncmy TFGGLLADKT EHMMGVSELM ISTCVQGIIF AFIAAQPTLV IGFSGPLLVF
Consensus TFGGLLGEKT ....GVSEL. .ST..QG..F .LL.AQPLLV .GFSGPLLVF
          801                                                850
B3athomsa EEAFFSFCET NGLEYIVGRV WIGFWLILLV VLVVAFEGSF LVRFISRYTQ
B3atratno EEAFYSFCES NNLEYIVGRA WIGFWLILLV VLVVAFEGSF LVQYISRYTQ
B3atgalga EEAFFRFCED HGLEYIVGRV WIGFWLILLV LLVVACEGTV LVRYLSRYTQ
Ae2galga  EEAFFTFCTS NGLEYLVGRV WIGFWLILIV LLMVACEGSF LVRFVSRFTQ
B3a2homsa EEAFFSFCSS NHLEYLVGRV WIGFWLVFLA LLMVALEGSF LVRFVSRFTR
B3a3ratno EEAFFKFCRA QDLEYLTGRV WVGLWLVVFV LALVAAEGSF LVRYISPFTQ
B3atoncmy EEAFFAFCKS QEIEYIVGRI WVGLWLVIIV VVIVAVEGSF LVKFISRFTQ
Consensus EEAFF.FC.. ..LEY.VGR. W.G.WL...V ...VA.EGSF LV...SR.TQ

          851                                                900
B3athomsa EIFSFLISLI FIYETFSKLI KIFQDHPLQK TYNYN..... ..........
B3atratno EIFSFLISLI FIYETFSKLI KIFQDYPLQE SYA.P..... ..........
B3atgalga EIFSFLISLI FIYETFAKLV TIFEAHPLQQ SYDTD..... ..........
Ae2galga  EIFAFLISLI FIYETFSKLG KIFQEHPLHG CAQPN..... .......GTA
B3a2homsa EIFAFLISLI FIYETFYKLV KIFQEHPLHG CSASNSSEVD GGENMTWAGA
B3a3ratno EIFAFLISLI FIYETFHKLY KVFTEHPLLP FYPPE..... .......EAL
B3atoncmy EIFSILISLI FIYETFSKLG KIFKAHPLVL NYEH...LND SLDNPFHPVV
Consensus EIF.FLISLI FIYETF.KL. KIF..HPL.. .......... ..........
          901                                                950
B3athomsa .......... ..VLMVPKPQ GPLPNTALLS LVLMAGTFFF AMMLRKFKNS
B3atratno .......... ..VVMKPKPQ GPVPNTALLS LVLMVGTFLL AMMLRKFKNS
B3atgalga .......... ..V..STEPS VPKPNTALLS LVLMAGTFFL ALFLRQFKNS
Ae2galga  WSN.GTAAPN GTAQRGAAKV TGQPNTALLS LVLMAGTFFI AFFLRKFKNS
B3a2homsa RPTLGPGNRS LAGQSGQGKP RGQPNTAPLS LVLMAGTFFI AFFLRKFKNS
B3a3ratno EPGLELNSSA LPPTEGPPGP RNQPNTALLS LILMLGTFLI AFFLRKFRNS
B3atoncmy KEHIEYHEDG NKTVHEVIHE RAYPNTALLS MCLMFGCFFI AYFLRQFKNG
Consensus .......... .......... ...PNTALLS L.LM.GTF.. A..LR.FKNS
          951                                               1000
B3athomsa SYFPGKLRRV IGDFGVPISI LIMVLVDFFI QDTYTQKLSV PDGFKVSNSS
B3atratno TYFPGKLRRV IGDFGVPISI LIMVLVDTFI KNTYTQKLSV PDGLKVSNSS
B3atgalga VFLPGKVRRL IGDFGVPISI FVMALADFFI KDTYTQKLKV PRGLEVTNGT
Ae2galga  RFFPGRIRRL IGDFGVPIAI LVMVLVDYSI RDTYTQKLSV PSGFSVTAPD
B3a2homsa RFFPGRIRRV IGDFGVPIAI LIMVLVDYSI EDTYTQKLSV PSGFSVTAPE
B3a3ratno RFLGGKARRV IGDFGIPISI LVMVLVDYSI TDTYTQKLTV PTGLSVTSPH
B3atoncmy HFLPGPIRRM IGDFGVPIAI FFMIAVDITI EDAYTQKLVV PKGLMVSNPN
Consensus ...PG..RR. IGDFGVPI.I ..M.LVD..I .DTYTQKL.V .P.G..V...
          1001                                              1050
B3athomsa ARGWVIHPLG LRSEFPIWMM FASALPALLV FILIFLESQI TTLIVSKPER
B3atratno ARGWVIHPLG LYNHFPKWMM FASVLPALLV FILIFLESQI TTLIVSKPER
B3atgalga ARGWFIHPMG SATPFPIWMM FASPVPALLV FILIFLETQI TTLIVSKPER
Ae2galga  KRGWVINPLG ERSDFPVWMM VASGLPAVLV FILIFMETQI TTLIISKKER
B3a2homsa KRGWVINPLG EKSPFPVWMM VASLLPAILV FILIFMETQI TTLIISKKER
B3a3ratno KRTWFIPPLG SARPFPPWMM VAAAVPALLV LILIFMETQI TALIVSQKAR
B3atoncmy ARGWFINPLG EKKPFPAWMM GACCVPALLV FILIFLESQI TTLIVSKPER
Consensus .RGW.I.PLG ....FP.WMM .A...PALLV FILIF.E.QI TTLI.SK.ER

          1051                                              1100
B3athomsa KMVKGSGFHL DLLLVVGMGG VAALFGMPWL SATTVRSVTH ANALTVMGKA
B3atratno KMIKGSGFHL DLLLVVGMGG VAALFGMPWL SATTVRSVTH ANALTVMGKA
B3atgalga KLVKGSGFHL DLLLIVAMGG LAALFGMPWL SATTVRTITH ANALTVVGKS
Ae2galga  MLQKGSGFHL DLLLIVAMGG FFALFGLPWL AAATVRSVTH ANALTVMSKA
B3a2homsa MLQKGSGFHL DLLLIVAMGG ICALFGLPWL AAATVRSVTH ANALTVMSKA
B3a3ratno RLLKGSGFHL DLLLIGSLGG LCGLFGLPWL TAATVRSVTH VNALTVMRTA
B3atoncmy KMVKGSGFHL DLLILVTMGG IASLFGVPWL SAATVRSVTH ANALTVMSK.
Consensus ...KGSGFHL DLLL.V.MGG ...LFG.PWL .A.TVRSVTH ANALTVM.K.
          1101                                              1150
B3athomsa STPGAAAQIQ EVKEQRISGL LVAVLVGLSI LMEPILSRIP LAVLFGIFLY
B3atratno SGPGAAAQIQ EVKEQRISGL LVSVLVGLSI LMEPILSRIP LAVLFGIFLY
B3atgalga AVPGERAHIV EVKEQRLSGL LVAVLIGVSI LMEPILKYIP LAVLFGIFLY
Ae2galga  VAPGDKPKIQ EVKEQRVTGL LVAVLVGLSI VIGELLRQIP LAVLFGIFLY
B3a2homsa VAPGDKPKIQ EVKEQRVTGL LVALLVGLSI VIGDLLRQIP LAVLFGIFLY
B3a3ratno IAPGDKPQIQ EVREQRVTGV LIASLVGLSI VMGAVLRRIP LAVLFGIFLY
B3atoncmy ...GPKPEIE KVLEQRISGM LVAAMVGVSI LLEPILKMIP MTALFGIFLY
Consensus ..PG....I. EV.EQR..G. LVA.LVG.SI .....L..IP LAVLFGIFLY

          1151                                              1200
B3athomsa MGVTSLSGIQ LFDRILLLFK PPKYHPDVPY VKRVKTWRMH LFTGIQIICL
B3atratno MGITSLSGIQ LFDRILLLFK PPKYHPDVPF VKRVKTWRMH LFTGIQIICL
B3atgalga MGVTSLFGIQ LFDRILLLLM PPKYHPKEPY VTRVKTWRIT SSPLTQILVV
Ae2galga  MGVTSLNGIQ FYERLQLLLM PPKHHPDVPY VKKVRT.RMH LFTGLQLACL
B3a2homsa MGVTSLNGIQ FYERLHLLLM PPKHHPDVTY VKKVRTLRMH LFTALQLLCL
B3a3ratno MGVTSLSGIQ LSQRLLLIFM PAKHHPEQPY VTKVKTWRMH LFTFIQLGCI
B3atoncmy MGITSLSGIQ MWDRMLLLIV PRKYYPADAY AQRVTTMKMH LFTLIQMVCL
Consensus MG.TSL.GIQ ...R..LL.. P.K.HP...Y V..V.T.RMH LF...Q..C.

          1201                                              1250
B3athomsa AVLWVVKSTP ASLALPFVLI LTVPLRRVLL PLIFRNVELQ CLDADDAKAT
B3atratno AVLWVVKSTP ASLALPFVLI LTVPLRRLLL PLIFRELELQ CLDGDDAKVT
B3atgalga ALLWGVKVSP ASLRCPFVLV LTVPLRRLLL PRIFSEIELK CLDTDDAVVT
Ae2galga  AVLWAVMSTV ASLAFPFILI LTVPVRMCLL SRIFTDREMK CLDADEAEPI
B3a2homsa ALLWAVMSTA ASLAFPFILI LTVPLRMVVL TRIFTDREMK CLDANEAEPV
B3a3ratno ALLWVVKSTV ASLAFPFLLL LTVPLRRCLL PRLFQDRELQ ALDSEDAEPN
B3atoncmy GALWMVKMSA FSLALPFVLI LTIPLRMAIT GTLFTDKEMK CLDASDGKVK
Consensus A.LW.V.... ASLA.PF.L. LTVPLR...L ...F...E.. CLD...A...

          1251                                              1267
B3athomsa FDEEEGRDEY DEVAMPV
B3atratno FDEAEGLDEY DEVPMPV
B3atgalga FEEAEGQDVY NEVQMPS
Ae2galga  LDEREGVDEY NEMPMPV
B3a2homsa FDEREGVDEY NEMPMPV
B3a3ratno FDE.DGQDEY NELHMPV
B3atoncmy FEEEPGEDMY .ESPLP.
Consensus F.E..G.D.Y .E..MP.
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: B3a2musmu, B3a2ratno (B3a2homsa); B3a3musmu (B3a3ratno); B3atmusmu (B3atratno). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
Database accession numbers
CODE       SWISSPROT  PIR               EMBL/GENBANK
Ae2galga                                U48889
B3a2homsa  P04920     A25104; S21086    X62137; X03918
B3a2ratno  P23347     A34911            J05166
B3a2musmu  P13808     A31789            J04036
B3a3musmu  P16283     A33638            M28383
B3a3ratno  P23348     B34911            J05167
B3atmusmu  P04919     A26086; A25314    X02677; M29379
B3atgalga  P15575     A30816            M23404
B3atratno  P23562     A33810            J04793; L02943
B3atoncmy  P32847     S22173; S24318    X61699
B3athomsa  P02730     A03189; A28079    M27819
References
1 Reithman, R. (1994) Curr. Opin. Cell Biol. 6, 583-594.
2 Rybicki, A.C. et al. (1993) Blood 81, 2155-2165.
Mitochondrial adenine nucleotide translocator family
Summary
Transporters of the mitochondrial adenine nucleotide translocator family, an example of which is the ANT1 ADP/ATP carrier protein of humans (Adt1homsa), mediate the exchange of substrate pairs across the inner mitochondrial membrane, including ADP and ATP, and 2-oxoglutarate and malate. Mitochondrial brown fat uncoupling proteins generate heat by dissipating the mitochondrial transmembrane proton gradient, thereby driving compensatory electron transport. Members of the family are ubiquitous in eukaryotes.

Statistical analysis of multiple amino acid sequence comparisons indicates that the mitochondrial adenine nucleotide translocator family is most closely related to the mitochondrial phosphate carrier family¹. Members of the mitochondrial adenine nucleotide translocator family are predicted to exist as homodimers, with each subunit containing six membrane-spanning helices arranged as three homologous domains. Several amino acid sequence motifs are highly conserved in the family, including motifs that are unique to the family, motifs common to the mitochondrial phosphate carrier family and motifs necessary for function. Defects in members of the family have been implicated in Graves' disease².

Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Acr1sacce | Regulator of acetyl CoA synthetase [ACR1, YJR096W] | Saccharomyces cerevisiae [yeast] | Unknown
Adt1arath | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Arabidopsis thaliana [mouse-ear cress] | ADP/ATP
Adt1bosta | ADP/ATP carrier heart/skeletal muscle isoform T1 [ANT1, ADT1] | Bos taurus [cow] | ADP/ATP
Adt1homsa | ADP/ATP carrier heart/skeletal muscle isoform T1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Homo sapiens [human] | ADP/ATP
Adt1musmu | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Mus musculus [mouse] | ADP/ATP
Adt1ratno | ADP/ATP carrier heart/skeletal muscle isoform T1 [ANT1, ADT1] | Rattus norvegicus [rat] | ADP/ATP
Adt1sacce | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1, AAC1, YM9796.09c] | Saccharomyces cerevisiae [yeast] | ADP/ATP
Adt1soltu | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1, AAC1] | Solanum tuberosum [potato] | ADP/ATP
Adt1zeama | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1, ANTG1] | Zea mays [corn] | ADP/ATP
Adt2arath | ADP/ATP carrier protein 2 [ADP/ATP translocase 2, adenine nucleotide translocator 2, ANT2, ADT2] | Arabidopsis thaliana [mouse-ear cress] | ADP/ATP
Adt2homsa | ADP/ATP carrier isoform T2 [ANT2, ADT2] | Homo sapiens [human] | ADP/ATP
Adt2ratno | ADP/ATP carrier isoform T2 [ANT2, ADT2] | Rattus norvegicus [rat] | ADP/ATP
Adt2sacce | ADP/ATP carrier protein 2 [ADP/ATP translocase 2, adenine nucleotide translocator 2, ANT2, ADT2, AAC2, PET9, YBL030C, YBL0421] | Saccharomyces cerevisiae [yeast] | ADP/ATP
Adt2soltu | ADP/ATP carrier protein 2 [ADP/ATP translocase 2, adenine nucleotide translocator 2, ANT2, ADT2, AAC2] | Solanum tuberosum [potato] | ADP/ATP
Adt2zeama | ADP/ATP carrier protein 2 [ADP/ATP translocase 2, adenine nucleotide translocator 2, ANT2, ADT2, ANTG2] | Zea mays [corn] | ADP/ATP
Adt3bosta | ADP/ATP carrier isoform T3 [ANT3, ADT3] | Bos taurus [cow] | ADP/ATP
Adt3homsa | ADP/ATP carrier isoform T3 [ANT3, ADT3] | Homo sapiens [human] | ADP/ATP
Adt3sacce | ADP/ATP carrier protein 3 [ADP/ATP translocase 3, adenine nucleotide translocator 3, ANT3, ADT3, AAC3, YBR085W, YBR0753] | Saccharomyces cerevisiae [yeast] | ADP/ATP
Adtanoga | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Anopheles gambiae [mosquito] | ADP/ATP
Adtcaeel | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Caenorhabditis elegans [nematode] | ADP/ATP
Adtchlke | ADP/ATP carrier protein [ADP/ATP translocase, Ant1, adenine nucleotide translocator, ADT] | Chlorella kessleri [alga] | ADP/ATP
Adtchlre | ADP/ATP carrier protein [ADP/ATP translocase, adenine nucleotide translocator, ANT1, ADT, ABT] | Chlamydomonas reinhardtii [alga] | ADP/ATP
Adtdrome | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Drosophila melanogaster [fruit fly] | ADP/ATP
Adtneucr | ADP/ATP carrier protein [ADP/ATP translocase, adenine nucleotide translocator, ANT1, ADT, ACP] | Neurospora crassa [mold] | ADP/ATP
Adtorysa | ADP/ATP carrier protein [ADP/ATP translocase, adenine nucleotide translocator, ANT1, ADT] | Oryza sativa [rice] | ADP/ATP
Adtplafa | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Plasmodium falciparum [malaria parasite] | ADP/ATP
Adtransy | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Rana sylvatica [frog] | ADP/ATP
Adttritu | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Triticum turgidum [wheat] | ADP/ATP
Adttrybr | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Trypanosoma brucei [trypanosome] | ADP/ATP
Adt1halro | ADP/ATP carrier protein 1 [ADP/ATP translocase 1, adenine nucleotide translocator 1, ANT1, ADT1] | Halocynthia roretzi [sea squirt] | ADP/ATP
Cithomsa | Mitochondrial citrate transport protein [CIT] | Homo sapiens [human] | Citrate
Citsacce | Mitochondrial citrate transport protein [CIT, YBR2039, YBR29K] | Saccharomyces cerevisiae [yeast] | Citrate
Dif1caeel | Mitochondrial carrier protein [DIF1] | Caenorhabditis elegans [nematode] | Unknown
Flx1sacce | FAD carrier protein [FLX1, YIL134W] | Saccharomyces cerevisiae [yeast] | FAD
Gdcbosta | Graves disease carrier protein [GDC] | Bos taurus [cow] | Unknown
Gdchomsa | Graves disease carrier protein [GDC] | Homo sapiens [human] | Unknown
Gdcratno | Graves disease carrier protein [GDC] | Rattus norvegicus [rat] | Unknown
M2omcaeel | 2-Oxoglutarate/malate carrier protein [OGCP] | Caenorhabditis elegans [nematode] | 2-Oxoglutarate/malate
M2ombosta | 2-Oxoglutarate/malate carrier protein [OGCP] | Bos taurus [cow] | 2-Oxoglutarate/malate
M2omhomsa | 2-Oxoglutarate/malate carrier protein [OGCP] | Homo sapiens [human] | 2-Oxoglutarate/malate
Pet8sacce | Putative mitochondrial carrier protein [PET8, N2012] | Saccharomyces cerevisiae [yeast] | Unknown
Pmtsacce | Putative mitochondrial carrier protein [PMT, PMT1, YKL120W, YKL522] | Saccharomyces cerevisiae [yeast] | Unknown
Rim2sacce | Mitochondrial carrier protein [MCP, RIM2, YBR192W, YBR1402] | Saccharomyces cerevisiae [yeast] | Unknown
Txtpratno | Tricarboxylate carrier [CTP, citrate transport protein] | Rattus norvegicus [rat] | Citrate
Ucpbosta | Mitochondrial brown fat uncoupling protein [UCP] | Bos taurus [cow] | H+
Ucphomsa | Mitochondrial brown fat uncoupling protein [UCP] | Homo sapiens [human] | H+
Ucpmesau | Mitochondrial brown fat uncoupling protein [UCP] | Mesocricetus auratus [golden hamster] | H+
Ucpmusmu | Mitochondrial brown fat uncoupling protein [UCP] | Mus musculus [mouse] | H+
Ucporycu | Mitochondrial brown fat uncoupling protein [UCP] | Oryctolagus cuniculus [rabbit] | H+
Ucpratno | Mitochondrial brown fat uncoupling protein [UCP] | Rattus norvegicus [rat] | H+
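The entries in the CODE column appear to follow a regular pattern: a gene or protein abbreviation followed by a five-letter organism tag made of the first three letters of the genus and the first two letters of the species (for example, Adt1homsa for ADT1 of Homo sapiens). This rule is inferred from the table rather than stated in it, so the Python sketch below is only an illustration; the function names organism_tag and transporter_code are hypothetical.

```python
def organism_tag(binomial: str) -> str:
    """Five-letter tag: first 3 letters of the genus + first 2 of the species."""
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).lower()


def transporter_code(gene: str, binomial: str) -> str:
    """Join a capitalized gene abbreviation to the organism tag.

    Assumption: the gene part is written with an initial capital only,
    as in the CODE column (Adt1, Ucp, M2om).
    """
    return gene.capitalize() + organism_tag(binomial)


if __name__ == "__main__":
    for gene, organism in [
        ("ADT1", "Homo sapiens"),
        ("UCP", "Oryctolagus cuniculus"),
        ("M2OM", "Bos taurus"),
    ]:
        print(transporter_code(gene, organism))
```

Going the other way (from a code back to an organism) needs a lookup table, because the five-letter tag is not guaranteed to be unique across all species.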
Phylogenetic tree
The following proteins are at least 90% identical to the transporter given in parentheses and are therefore not included in the phylogenetic tree: Adt1ratno, Adt1bosta (Adt1homsa); Adt2ratno, Adt3bosta, Adt3homsa (Adt2homsa); Adt3sacce (Adt2sacce); Gdcbosta, Gdcratno (Gdchomsa); M2ombosta (M2omhomsa); Ucpmesau, Ucpmusmu (Ucphomsa).
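The 90% identity criterion can be illustrated with a short Python sketch. The source does not say how identity was computed, so the definition used here (matching residues divided by the number of columns in which neither row has a gap) is an assumption, as are the names percent_identity and is_redundant. The two test strings are columns 101 to 131 of the Adt1homsa and Adt2homsa rows of the alignment in this entry; a fragment this short says nothing about the full-length comparison.

```python
GAP = "."  # gap symbol used in the alignment blocks of this entry


def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of ungapped aligned columns in which the two rows agree."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == GAP or b == GAP:
            continue  # assumption: skip columns with a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_redundant(row_a: str, row_b: str, threshold: float = 90.0) -> bool:
    """True if the pair meets the identity threshold (90% by default)."""
    return percent_identity(row_a, row_b) >= threshold


# Alignment columns 101-131, copied from the Adt1homsa and Adt2homsa rows.
ADT1HOMSA_FRAGMENT = "AAVSKTAVAPIERVKLLLQVQHASKQISAEK"
ADT2HOMSA_FRAGMENT = "AAISKTAVAPIERVKLLLQVQHASKQITADK"

if __name__ == "__main__":
    value = percent_identity(ADT1HOMSA_FRAGMENT, ADT2HOMSA_FRAGMENT)
    print(f"{value:.1f}% identical over this fragment")
```

Other conventions (for example, dividing by the length of the shorter sequence) give slightly different percentages, so borderline pairs may fall on either side of the threshold.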
[Figure: phylogenetic tree of the family. Leaves, in the order shown: Flx1sacce, Rim2sacce, Adt1sacce, Adt2sacce, Adt1soltu, Adt2soltu, Adt1arath, Adt2arath, Adt1zeama, Adt2zeama, Adtorysa, Adttritu, Adtchlre, Adtneucr, Adttrybr, Adtchlke, Adtplafa, Adt1homsa, Adt2homsa, Adtdrome, Adtanoga, Adt1halro, Adtcaeel, Adtransy, Gdchomsa, Dif1caeel, Pet8sacce, Cithomsa, Txtpratno, Citsacce, Acr1sacce, Ucpmesau, Ucpratno, Ucporycu, Ucphomsa, Ucpbosta, M2omcaeel, M2omhomsa, Pmtsacce]
[Figure: membrane topology model, with the OUTSIDE and INSIDE faces of the membrane labeled]

Physical and genetic characteristics

CODE | AMINO ACIDS | MOL. WT
Acr1sacce | 322 | 35 340
Adt1arath | 379 | 41 297
Adt1bosta | 297 | 32 836
Adt1homsa | 298 | 33 064
Adt1musmu | 298 | 32 870
Adt1ratno | 298 | 32 989
Adt1sacce | 309 | 34 120
Adt1soltu | 386 | 42 058
Adt1zeama | 387 | 42 391
Adt2arath | 385 | 41 845
Adt2homsa | 298 | 32 895
Adt2ratno | 298 | 32 901
Adt2sacce | 318 | 34 426
Adt2soltu | 386 | 41 829
Adt2zeama | 387 | 42 332
Adt3bosta | 298 | 32 877
Adt3homsa | 298 | 32 866
Adt3sacce | 307 | 33 313
Adtanoga | 301 | 32 681
Adtcaeel | 300 | 33 211
Adtchlke | 339 | 36 686
Adtchlre | 308 | 33 258
Adtdrome | 297 | 32 914
Adtneucr | 313 | 33 888
Adtorysa | 382 | 41 510
Adtplafa | 301 | 33 756
Adtransy | 263 | 29 351
Adttritu | 331 | 35 921
Adttrybr | 307 | 33 975
Adt1halro | 304 | 33 307
Cithomsa | 311 | 34 085
Citsacce | 299 | 32 173
Dif1caeel | 312 | 33 134
Flx1sacce | 311 | 34 409
Gdcbosta | 330 | 36 085
Gdchomsa | 332 | 36 235
Gdcratno | 322 | 35 056
M2ombosta | 313 | 34 040
M2omcaeel | 290 | 32 022
M2omhomsa | 313 | 33 948
Pet8sacce | 284 | 31 027
Pmtsacce | 324 | 35 153
Rim2sacce | 377 | 42 101
Txtpratno | 311 | 33 835
Ucpbosta | 286 | 30 934
Ucphomsa | 307 | 33 044
Ucpmesau | 306 | 33 215
Ucpmusmu | 306 | 33 116
Ucporycu | 306 | 33 083
Ucpratno | 306 | 33 080

EXPRESSION SITES
heart, skeletal muscle; heart, skeletal muscle; heart, skeletal muscle; heart, skeletal muscle; fibroblasts; fibroblasts; liver; liver; thyroid; thyroid; thyroid; heart, liver, brain; heart, liver, brain; liver; brown fat; brown fat; brown fat; brown fat; brown fat

CHROMOSOMAL LOCUS
Chromosome 10; 4q35; Chromosome 7; Xq24-26; Chromosome 2; Xp22.32; Chromosome 2; Chromosome 9; Chromosome 14; Chromosome 4; Chromosome 2; 4q31
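A quick consistency check on the AMINO ACIDS and MOL. WT columns is to divide one by the other: for typical proteins the mean residue mass is close to 110 Da, a general rule of thumb rather than anything stated in this entry. The Python sketch below applies that check to four rows copied from the table; the helper name mean_residue_mass and the 100 to 120 Da acceptance window are assumptions made for illustration.

```python
# (amino acids, molecular weight in Da) copied from the table in this entry
ENTRIES = {
    "Acr1sacce": (322, 35340),
    "Adt1homsa": (298, 33064),
    "Adt1soltu": (386, 42058),
    "Ucphomsa": (307, 33044),
}


def mean_residue_mass(n_residues: int, mol_wt: float) -> float:
    """Average mass per residue in daltons."""
    if n_residues <= 0:
        raise ValueError("number of residues must be positive")
    return mol_wt / n_residues


def looks_consistent(n_residues: int, mol_wt: float,
                     low: float = 100.0, high: float = 120.0) -> bool:
    """Assumed sanity window around the ~110 Da rule of thumb."""
    return low <= mean_residue_mass(n_residues, mol_wt) <= high


if __name__ == "__main__":
    for code, (length, weight) in ENTRIES.items():
        mass = mean_residue_mass(length, weight)
        flag = "ok" if looks_consistent(length, weight) else "check"
        print(f"{code:<10} {length:>4} aa  {mass:6.1f} Da/residue  {flag}")
```

A row that falls well outside the window would most likely point to a transcription error in one of the two columns rather than to an unusual protein.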
Multiple amino acid sequence alignments
           1                                                   50
Rim2sacce  .......... .......... .......... ........MP KKSIEEWEED
Adt1soltu  MADMNQHPTV FQKAANQLDL RSSLSQDVHA RYGGVQ.PAI YQRHFAYGNY
Adt2soltu  ..ADNQHPTV YQKVASQMHL SSSLSQDVHA RYGGIQRPAL SQRRFPYGNY
Adt1arath  ..HQVQHPTI AQKAAGQF.M RSSVSKDVQV ...GYQRPSM YQRHATYGNY
Adt2arath  MVEQTQHPTI LQKVSGQL.L SSSVSQDIRG YASASKRPAT YQKHAAYGNY
Adt1zeama  MADQANQPTV LHKLGGQFHL RSIISEGVRA R.NICPSVSS YERRFATRNY
Adt2zeama  MADQANQPTV LHKLGGQFHL SSSFSEGVRA R.NICPSFSP YERRFATRNY
Adtorysa   MAEQANQPTV LQKFGGQFHL GSSFSEGVRA R.NICPSVSS YDRRFTTRSY
Adtchlke   .......... .......... .......... .......... .........M

           51                                                100
Flx1sacce  .......... .......... .......... .MVDHQWTPL QKEVISGLSA
Rim2sacce  AIESVPYLAS DEKGSNYKEA TQIPLNLKQS EIENHPTVKP WVHFVAGGIG
Adt1sacce  .......... .......... .......MSH TETQTQQSHF GVDFLMGGVS
Adt2sacce  .......... .......MSS NAQVKTPLPP APAPKKESNF LIDFLMGGVS
Adt1soltu  SNAGLQRG.. QATQDLSLIT SNASPVFVQA PQE.KGFAAF ATDFLMGGVS
Adt2soltu  SNAGLQTC.. QATQDLSLIA ANASPVFVQA PQE.KGLAAF ATDFLMGGVS
Adt1arath  SNAAFQFP.. .PTS..RMLA TTASPVFVQT PGE.KGFTNF ALDFLMGGVS
Adt2arath  SNAAFQYP.. LVAA..SQIA TTTSPVFVQA PGE.KGFTNF AIDFMMGGVS
Adt1zeama  MTQSLWGPSM SVSGGINVPV M.QTPLCANA PAE.KGGKNF MIDFMMGGVS
Adt2zeama  MTQSLWGPSM SVSGGINVPV M.PTPLFANA PAE.KGGKNF MIDFMMGGVS
Adtorysa   MTQGL..... .VNGGINVPM MSSSPIFANA PAE.KGGKNF MIDFLMGGVS
Adttritu   MTQNL..... ....GISVPI MSSSPMFANA PPEKKGVKNF AIDFLMGGVS
Adtchlre   .......... .......... .......... ..MAKEEKNF MVDFLAGGLS
Adtneucr   .......... .......... .......MAE QQKVLGMPPF VADFLMGGVS
Adttrybr   .......... .......... ......MTDK KREPAPKLGF LEEFMIGGVA
Adtchlke   LSSALYQQAG LSGLLRASAM GPQTPFIASP KETQADPMAF VKDLLAGGTA
Adtplafa   .......... .......... ......MSSD IKTN.....F AADFLMGGIS
Adt1homsa  .......... .......... .......... ..MGDHAWSF LKDFLAGGVA
           101                                               150
Flx1sacce  GSVTTLVVHP LDLLKVRLQ. ...LSATSAQ KAHYG....P FMVIKEIIRS
Rim2sacce  GMAGAVVTCP FDLVKTRLQS DIFLKAYKSQ AVNISKGSTR PKSINYVIQA
Adt1sacce  AAIAKTGAAP IERVKLLMQN QEEMLK.QGS L......... ..........
Adt2sacce  AAVAKTAASP IERVKLLIQN QDEMLK.QGT L......... ..........
Adt1soltu  AAVSKTAAAP IERVKLLIQN QDEMLK.AGR L......... ..........
Adt2soltu  AAVSKTAAAP IERVKLLIQN QDEMIK.AGR L......... ..........
Adt1arath  AAVSKTAAAP IERVKLLIQN QDEMIK.AGR L......... ..........
Adt2arath  AAVSKTAAAP IERVKLLIQN QDEMLK.AGR L......... ..........
Adt1zeama  AAVSKTAAAP IERVKLLIQN QDEMIK.SGR L......... ..........
Adt2zeama  AAVSKTAAAP IERVKLLIQN QDEMIK.SGR L......... ..........
Adtorysa   AAVSKTAAAP IERVKLLIQN QDEMIK.AGR L......... ..........
Adttritu   AAVSKTAAAP IERVKLLIQN QDEMIK.AGR L......... ..........
Adtchlre   AAVSKTAAAP IERVKLLIQN QDEMIK.QGR L......... ..........
Adtneucr   AAVSKTAAAP IERIKLLVQN QDEMIR.AGR L......... ..........
Adttrybr   AGLSKTAAAP IERVKLLVQN QGEMMK.QGR L......... ..........
Adtchlke   GAISKTAVAP IERVKLLLQT QDSNPMIKSG Q......... ..........
Adtplafa   AAISKTVVTP IERVKMLIQT QDSIPEIKSG Q......... ..........
Adt1homsa  AAVSKTAVAP IERVKLLLQV QHASKQISAE K......... ..........
Adt2homsa  AAISKTAVAP IERVKLLLQV QHASKQITAD K......... ..........
Adtdrome   AAVSKTAVAP IERVKLLLQV QHISKQISPD K......... ..........
Adtanoga   AAVSKTAVAP IERVKLLLQV QAASKQIAVD K......... ..........
Adt1halro  AAISKTIVAP IERVKLLLQV QAVSTQMKAG T......... ..........
Adtcaeel   AAVSKTAVAP IERVKLLLQV QDASKAIAVD K......... ..........
Adtransy   .......... .......... .......... .......... ..........
Gdchomsa   GCCAKTTVAP LDRVKVLLQA HNHHYK.... .......... ..........
Dif1caeel  GSCTVIVGHP FDTVKVRIQT MPMPKPGEKP .......... ..........
Pet8sacce  GTSTDLVFFP IDTIKTRLQA .......... .......... ..........
Cithomsa   GGIEICITFP TEYVKTQLQL DERS...HPP .......... ..........
Txtpratno  GGIEICITFP TEYVKTQLQL DERA...NPP .......... ..........
Citsacce   GAAEACITYP FEFAKTRLQL IDKA...SKA .......... ..........
Acr1sacce  GLFEALCCHP LDTIKVRMQI YRRVAGIEHV .......... ..........
Ucpmesau   ACLADIITFP LDTAKVRLQI QGEGQISSTI .......... ..........
Ucpratno   ACLADIITFP LDTAKVRLQI QGEGQASSTI .......... ..........
Ucporycu   ACLADVITFP LDTAKVRQQI QGEFPITSGI .......... ..........
Ucphomsa   ACLADVITFP LDTAKVRLQV QGECPTSSVI .......... ..........
Ucpbosta   ACVADIITFP LDTAKVRLQI QGECLISSAI .......... ..........
M2omcaeel  GAMAACCTHP LDLLKVQLQT QQQGKL..TI .......... ..........
M2omhomsa  GMGATVFVQP LDLVKNRMQL SGEGAK..TR .......... ..........
Pmtsacce   ACIAVTVTNP IELIKIRMQL QGEMSASAAK .......... ..........
Consensus  .........P ....K...Q. .......... .......... ..........

           151                                               200
Flx1sacce  SANSGRSVT. .......... NELYRGLSIN LFGNAIAWGV YFGLYGVTKE
Rim2sacce  GTHFKETLGI IGNVYKQEGF RSLFKGLGPN LVGVIPARSI NFFTYGTTKD
Adt1sacce  DTRYKGILDC FKRTATHEGI VSFWRGNTAN VLRYFPTQAL NFAFKDKIKS
Adt2sacce  DRKYAGILDC FKRTATQEGV ISFWRGNTAN VIRYFPTQAL NFAFKDKIKA
Adt1soltu  SEPYKGIGEC FGRTIKEEGF GSLWRGNTAN VIRYFPTQAL NFAFKDYFKR
Adt2soltu  SEPYKGIGDC FSRTIKDEGF AALWRGNTAN VIRYFPTQAL NFAFKDYFKR
Adt1arath  SEPYKGIGDC FGRTIKDEGF GSLWRGNTAN VIRYFPTQAL NFAFKDYFKR
Adt2arath  TEPYKGIRDC FGRTIRDEGI GSLWRGNTAN VIRYFPTQAL NFAFKDYFKR
Adt1zeama  SEPYKGIVDC FKRTIKDEGF SSLWRGNTAN VIRYFPTQAL NFAFKDYFKR
Adt2zeama  SEPYKGIADC FKRTIKDEGF SSLWRGNTAN VIRYFPTQAL NFAFKDYFKR
Adtorysa   SEPYKGIGDC FGRTIKDEGF ASLWRGNTAN VIRYFPTQAL NFAFKDYFKR
Adttritu   SEPYKGIGDC FGRTIKDEGF GSLWRGNTAN VIRYFPTQAL NFAFKDYFKR
Adtchlre   ASPYKGIGEC FVRTVREEGF GSLWRGNTAN VIRYFPTQAL NFAFKDKFKR
Adtneucr   DRRYNGIIDC FKRTTADEGV MALWRGNTAN VIRYFPTQAL NFAFRDKFKK
Adttrybr   DKPYNGVVDC FRRTISTEGV YPLWRGNLSN VLRYFPTQAL NFAFKDKFKR
Adtchlke   VPRYTGIVNC FVRVSSEQGV ASFWRGNLAN VVRYFPTQAF NFAFKDTIKG
Adtplafa   VERYSGLINC FKRVSKEQGV LSLWRGNVAN VIRYFPTQAF NFAFKDYFKN
Adt1homsa  ..QYKGIIDC VVRIPKEQGF LSFWRGNLAN VIRYFPTQAL NFAFKDKYKQ
Adt2homsa  ..QYKGIIDC VVRIPKEQGV LSFWRGNLAN VIRYFPTQAL NFAFKDKYKQ
Adtdrome   ..QYKGMVDC FIRIPKEQGF SSFWRGNLAN VYRYFPTQAL NFAFKDKYKQ
Adtanoga   ..QYKGIVDC FVRIPKEQGI GAFCGGNLAN VIRYFPTQAL NFAFKDVYKQ
Adt1halro  ..EYKGIIDA FVRIPKEQGF FSLWRGNLAN VIRYFPTQAL NFAFKDTYKK
Adtcaeel   ..RYKGIMDV LIRVPKEQGV AALWRGNLAN VIRYFPTQAM NFAFKDTYKA
Adtransy   .......MDC VVRIPKEQGF ISFWRGNLAN VIRYFPTQAL NFGFKDKYKK
Gdchomsa   ...HLGVFSA LRAVPQKEGF LGLYKGNGAM MIRIFPYGAI QFMAFEHYKT
Dif1caeel  ..QFTGALDC VKRTVSKEGF FALYKGMAAP LVGVSPLFAV FFGGCA....
Pet8sacce  ....KGGF.. ....FANGGY KGIYRGLGSA VVASAPGASL FFISYDYMKV
Cithomsa   ..RYRGIGDC VRQTVRSHGV LGLYRGLSSL LYGSIPKAAV RFGMFEFLSN
Txtpratno  ..RYRGIGDC VRQTVRSHGV LGLYRGLSSL LYGSIPKAAV RFGMFEFLSN
Citsacce   ..SRNPLVL. IYKTAKTQGI GSIYVGCPAF IIGNTAKAGI RFLGFDTIKD
Acr1sacce  ..KPPGFIKT GRTIYQKEGF LALYKGLGAV VIGIIPKMAI RFSSYEFYRT
Ucpmesau   ..RYKGVLGT ITTLAKTEGL PKLYSGLPAG IQRQISFASL RIGLYDTVQE
Ucpratno   ..RYKGVLGT ITTLAKTEGL PKLYSGLPAG IQRQISFASL RIGLYDTVQE
Ucporycu   ..RYKGVLGT ITTLAKTEGP LKLYSGLPAG LQRQISFASL RIGLYDTVQE
Ucphomsa   ..RYKGVLGT ITAVVKTEGR MKLYSGLPAG LQRQISSASL RIGLYDTVQE
Ucpbosta   ..RYKGVLGT IITLAKTEGP VKLYSGLPAG LQRQISLASL RIGLYDTVQE
M2omcaeel  ..G.....QL SLKIYKNDGI LAFYNGVSAS VLRQLTYSTT RFGIYETVKK
M2omhomsa  ..EYKTSFHA LTSILKAEGL RGIYTGLSAG LLRQATYTTT RLGIYTVLFE
Pmtsacce   ..VYKNPIQG MAVIFKNEGI KGLQKGLNAA YIYQIGLNGS RLGFYEPIRS
Consensus  ...Y.G.... ........G. .....G..A. ..R....... .F...D....
Flxlsacce Rim2sacce Adtlsacce Adt2sacce Adtlsoltu Adt2soltu Adtlarath Adt2arath Adtlzeama Adt2zeama Adtorysa Adttritu Adtchlre Adtneucr Adttrybr Adtchlke Adtplafa Adtlhomsa Adt2homsa Adtdrome Adtanoga Adtlhalro Adtcaeel Adtransy Gdchomsa Diflcaeel Pet8sacce Cithomsa Txtpratno ::~:iii:,ii~!il C i t s a c c e Acrlsacce iiiiii;iii:;~!!iii;~'~ U c p m e s a u Ucpratno Ucporycu Ucphomsa Ucpbosta N:i:!!| M2omcaeel M2omhomsa Pmtsacce Consensus
201 250 LIYKSVAKPG ETQLKGVGND HKMNSLIYLS AGASSGLMTA ILTNPIWVIK M Y A K A F N N G Q ET . . . . . . . . . . . . P M I H L M A A A T A G W A T A TATNPIWLIK LLS ...... Y D R E R D . G Y A K W F A G N L F S G G A A G G L S L L F V YSLDYARTRL M F G ...... F K K E . E . G Y A K W F A G N L A S G G A A G A L S L L F V YSLDYARTRL LFN ...... F K K D R D . G Y W K W F A G N L A S G G A A G A S S L F F V YSLDYARTRL LFN ...... F K K D R D . G Y W K W F A G N L A S G G G A G A S S L L F V Y S L D Y A R T R L LFN ...... F K K D R D . G Y W K W F A G N L A S G G A A G A S S L L F V YSLDYARTRL LFN ...... F K K D K D . G Y W K W F A G N L A S G G A A G A S S L L F V YSLDYARTRL LFN ...... F K K D R D . G Y W K W F A G N L A S G G A A G A S S L F F V YSLDYARTRL LFN ...... F K K D R D . G Y W K W F A G N L A S G G A A G A S S L F F V YSLDYARTRL LFN ...... F K K D K D . G Y W K W F G G N L A S G G A A G A S S L F F V YSLDYARTRL M F N ...... F K K D K D . G Y W K W F G G N L A S G G A A G A S S L F F V YSLDYARTRL M F G ...... F N K D K E . . Y W K W F A G N M A S G G A A G A V S L S F V YSLDYARTRL MFG ...... Y K K D V D . G Y W K W M A G N L A S G G A A G A T S L L F V YSLDYARTRL MFN ...... Y K K E K D . G Y G K W F M G N M A S G G L A G A A S L C F V Y S L D Y V R T R L LF ....... P K Y S P K T D F W R F F V V N L A S G G L A G A G S L L I V Y P L D F A R T R L IF ....... P R Y D Q N T D F S K F F C V N I L S G A T A G A I S L L I V Y P L D F A R T R L LFL ...... G G V D R H K Q F W R Y F A G N L A S G G A A G A T S L C F V YPLDFARTRL IFL ...... G G V D K R T Q F W R Y F A G N L A S G G A A G A T S L C F V YPLDFARTRL VFL ...... G G V D K N T Q F W R Y F A G N L A S G G A A G A T S L C F V YPLDFARTRL V F L ...... G G V D K N T Q F W R Y F L G N L G S G G A A G A T S L C F V YPLDFARTRL IFL ...... A G V D K R K Q F W R Y F H G N L A S G G A A G A T G L C F V YPLDFARTRL IFL ...... E G L D K K K D F W K F F A G N L A S G G A A G A T S L C F V YPLDFARTRL IFL ...... 
D N V D K R T Q F W R Y F A G N L A S G G A A G A T S L C F V YPLDFARTRL LIT ...... T K L G I S G H V H R L M A G S M . . . . . A G M T A V I C T D P V D M V R V R L ..... V G K W L Q Q T D P S Q E M T F I Q N A N A . G A L A G V F T T I V M V P G E R I K C L L KSRPYISKLY SQ.GSEQLID TTTHMLS.SS IGEICACLVR VPAEVVKQRT HMR ...... D AQ G R L D S T R G L L C G L G A . . G V A E A V V V V C P M E T I K V K F HMR ...... D A Q . G R L D S R R G L L C G L G . A . . G V A E A V V V V C P M E T V K V K F L L R ...... D R E T G E L S G T R G V I A G L G . A . . G L L E S V A A V TPFEAIKTAL L L V ...... N K E S G I V S T G N T F V A G V G . A . . G I T E A V L V V N P M E V V K I R L YFS ...... S G K E T P P T L G N R I S A G L M . T . . G G V . A V L I G QPTEVVKVRL YFS ...... S G R E T P A S L G S K I S A G L M . T . . G G V . A V F I G Q P T E V V K V R M FFT ...... S G E E T P . S L G S K I S A G L T . T . . G G V . A V F I G Q P T E V V K V R L FLT ...... A G K E S K P . L G S K I L A G L T . T . . G G V . A V F I G QPSEVVKVRL FFT ...... T G K E ~ KISAGLM.T..GGV.AVFIGQPTEVVKVRL QL ........ P Q D Q P L P F Y Q K A L L A G F . A . ~ TPGDLVNVRM RLT ...... G A D G T P P G F L L K A V I G M T . A . . G A T . G A F V G TPAEVALIRM SLNQLFFPDQ EPHKVQSVGV NVFSGAA.S..GII.GAVIG SPLFLVKTRL ................................................ RL
251 Flxlsacce TRIM..STSK Rim2sacce TRVQLDKAGK AdtlsacceAADARGSKST Adt2sacceAADSKSSKKG Adtlsoltu ANDRKASKK. Adt2soltu ANDAKAAKKG Adtlarath ANDAKAAKKG Adt2arath ANDSKSAKKG Adtlzeama ANDAKAA.KG Adt2zeama ANDAKAA.KG Adtorysa ANDAKAA.KG
GAQGAYTSMY TSVRQYKNSW .SQRQFNGLL .GARQFNGLI GGERQFNGLV GGGRQFDGLV GGGRQFDGLV RGERQFNGLV GGERQFNGLV GGDRQFNGLV GGERQFNGLV
NGVQQLL.RT DCLKSVI.RN DVYKKTL.KT DVYKKTL.KS DVYKKTL.KS DVYRKTL.KS DVYRKTL.KT DVYKKTL.KS DVYRKTL.KS DVYRKTL.KS DVYRKTL~
DGFQGLWKGL EGFTGLYKGL DGLLGLYRGF DGVAGLYRGF DGIAGLYRGF DGVAGLYRGF DGIAGLYRGF DGIAGLYRGF DGIAGLYRGF DGIAGLYRGF DGIAGLYRGF
300 VPALFG.VSQ SASYLG.SVE VPSVLGIIVY LPSVVGIVVY NISCVGIIVY NISCVGIIVY NISCVGIIVY NISCAGIIVY NISCVGIIVY NISCVGIIVY NISCVGIIVY
Adttritu ANDAKAS.KG GGDRQFNGLV DVYRKTL.KS DGIAGLYRGF NISCVGIIVY Adtchlre ANDAKSAKKG GGDRQFNGLV DVYRKTI.AS DGIAGLYRGF NISCVGIWY Adtneucr ANDAKSAKKG .GERQFNGLV DVYRKTI.AS DGIAGLYRGF GPSVAGIVVY Adttrybr ANDTKSV.KG GGERQFNGIV DCYVKTW.KS DGIAGLYRGFVVSCIGIVVY Adtchlke AAD...VGS. GKSREFTGLV DCLSKVV.KR GGPMALYQGF GVSVQGIIVY Adtplafa ASD...IGK. GKDRQFTGLFDCLAKIY.KQTGLLSLYSGFGVSVTGIIVY AdtlhomsaAAD...VGKGAAQREFHGLG DCIIKIF.KS DGLRGLYQGF NVSVQGIIIY Adt2homsaAAD...VGKA GAEREFRGLG DCLVKIY.KS DGIKGLYQGF NVSVQGIIIY Adtdrome AAD...TGKGG.QREFTGLGNCLTKIF.KS DGIVGLYRGFGVSVQGIIIY Adtanoga GAD...VGPG AGEREFNGLL DCLKKTV.KS DGIIGLYRGF NVSVQGIIIY AdtlhalroAAD...IGSG GS.RQFTGLG NCLATIV.KK DGPRGLYQGFWSIQGIIVY Adtcaeel AAD IGK. A N D R E F K G L A D C L I K I V KS D G P I G L Y R G F F V S V Q G I I I Y Adtransy AAD...VGKA GAGREFNGLG DCLAKIF.KS DGLKGLYQGF NVSVQGIIIY G d c h o m s a A F Q V K ..... G E H R Y T G I I H A F K T I Y A K E G G F F G F Y R G L M P T I L G M A P Y Diflcaeel QVQQAGSAGS GVH..YDGPL DVV.KKLYKQ GGISSIYRGT GATLLRDIPA Pet8sacce QVHSTNSS.. WQTLQ SIL.RNDNKE GLRKNLYRGW STTIMREIPF Cithomsa . I H D Q T S P N P KYI .RGFF H G V . R E I V R E Q G L K G T Y Q G L T A T V L K Q G S N T x t p r a t n o . I H D Q T S S N P KY. .RGFF H G V . R E I V R E Q G L K G T Y Q G L T A T V L K Q G S N Citsacce .IDDKQSATP KYHNNGRGVVRNY.SSLVRD KGFSGLYRGV LPVSMRQAAN Acrlsacce QAQHLTPSEP NAGPKYNNAI HAA.YTIVKE EGVSALYRGV SLTAARQATN Ucpmesau QAQSHLHGI. K.PR.YTGTY NAY.RIIATT ESFSTLWKGT TPNLLRNVII Ucpratno QAQSHLHGI. K.PR.YTGTY NAY.RVIATT ESLSTLWKGT TPNLMRNVII Ucporycu QAQSHLHGL. K.PR.YTGTY NAY.RIIATT ESLTSLWKGT TPNLLRNVII Ucphomsa QAQSHLHGI. K.PR.YTGTY NAY.RIIATT EGLTGLWKGT TPNLMRSVII Ucpbosta QAQSHLHGP. K.PR.YTGTY NAY.RIIATT EGLTGLWKGT SPNLTTNVII M2omcaeel QNDSKLPLE. Q.RRNYKHAL DGL.VRITRE EGFMKMFNGA TMATSRAILM M2omhomsa TADGRLPAD. Q.RRGYKNVF NAL.IRITRE EGVLTLWRGC IPTMARAVVV Pmtsacce QSYSEFIKIG E.QTHYTGVWNGL.VTIFKT EGVKGLFRGI DAAILRTGAG Consensus ............. R...G ............. G...LY.G ...........
Flxlsacce Rim2sacce Adtlsacce Adt2sacce Adtlsoltu Adt2soltu Adtlarath Adt2arath Adtlzeama Adt2zeama Adtorysa Adttritu Adtchlre Adtneucr Adttrybr Adtchlke Adtplafa Adtlhomsa Adt2homsa Adtdrome Adtanoga Adtlhalro Adtcaeel
301 GALYFAVYDT GILQWLLYEQ RGLYFGLYDS RGLYFGMYDS RGLYFGMYDS RGLYFGMYDS RGLYFGLYDS RGLYFGLYDS RGLYFGLYDS RGLYFGLYDS RGLYFGMYDS RGLYFGLYDS RGLYFGMYDS RGLYFGLYDS RGFYFGLYDT RGAYFGLYDT RGSYFGLYDS RAAYFGVYDT RAAYFGIYDT RAAYFGFYDT RAAYFGCFDT RAAYFGTYDT RAAYFGMFDT
LKQRKLRRKR EN.GLDIHLT MKRLIKERSI EKFGYQAEGT FKPV ......... LLTGALE LKPL ......... LLTGSLE LKPV ......... LLTGNLQ LKPV ......... LLTGKME VKPV ......... LLTGDLQ VKPV ......... LLTGDLQ IKPV ......... VLTGNLQ IKPV . . . . . . . . . V L T G S L Q LKPV ......... VLTGSLQ LKPV ......... LLTGTLQ LKPV ......... VLVGPLA IKPV ......... LLVGDLK L Q P M . . . . . . . . . L P V .... AKGV ......... LFKDERT AKAL ......... LFTNDKN AKGM ......... LP.DPKN AKGM ......... LP.DPKN AR.M ......... LP.DPKN AKGM ......... LP.DPKN VKGM ......... LP.DPQN AKMV ......... FASDGQK
350 NLETIEI ......... TSLG KSTSEKVKEW CQRSGSAGLA GS ....... F V A S F L L G W V I GS ....... F L A S F L L G W V V DS ....... F F A S F G L G W L I DS ....... F F A S F A L G W L I DS ....... F F A S F A L G W V I DS ....... F F A S F A L G W L I DN ....... F F A S F A L G W L I DN ....... F F A S F A L G W L I DN ....... F F A S F A L G W L I DN ....... F F A S F A L G W L I NN ....... F L A A F L L G W G I NN ....... F L A S F A L G W C V DT ....... F I V N F F L G W A V AN ....... F F A K W A V A Q A V TN ....... I V L K W A V A Q S V VH ....... I F V S W M I A Q S V TH ....... I V I S W M I A Q T V TP ....... I Y I S W A I A Q V V TS ....... I F V S W A I A Q V V TP ....... I I V S W A I A Q V V LN ....... F F A A W G I A Q V V
Adtransy i~iiii ~!i~iiii~i!iiil,iiii
RAAYFGIYDT AGVSFFTFGT Diflcaeel SAAYLSVYEY :ili~iiii~ili:ii!ii!ii:i Pet8sacce TCIQFPLYEY Cithomsa QAIRFFVMTS Txtpratno QAIRFFVMTS Citsacce QAVRLGCYNK
AKGM . . . . . . . . . L P . D P K N TH ....... I F V S W M I A Q S V LKSVGL..SH APTLLGSPSS DNPNVLVLKT HVNLLCGGVA L K K K F S . . G E ...GAQRTLS P ......... G A T L M A G G L A L K K T W A . . K A ...NGQSQVE P ......... W K G A I C G S I A LRNW.. .Y R G D N P N K P M N P LITGVFGAIA L R N W ..... Y Q G D N P N K P M N P ......... L I T G V F G A V A I K T L I Q . . D Y T D S P K D K P L S S ......... G L T F L V G A F S AcrlsacceQGANFTVYSKLKEFLQ N Y H Q M D . . . V L P S WETSCIGLIS Ucpmesau NCVELVTYDL MKGALV..NN QILADDVP ............. CHLLSAFVA Ucpratno NCTELVTYDL MKGALV..NH HILADDVP ............. CHLLSALVA Ucporycu NCTELVTYDL MKGALV..RN EILADDVP ............. CHFVSALIA Ucphomsa NCTELVTYDL IKEAFV..KN NILADDVP ............. CHLVSALIA Ucpbosta NCTELVTYDL MKEALV..KN KLLADDVP ............. ATVRCC..A M2omcaeel TIGQLSFYDQ IKQTLI..SS GVAEDNLQ ............. THFASSISA M2omhomsa NAAQLASYSQ SKQFLL..DS GYFSDNIL ............. CHFCASMIS Pmtsacce SSVQLPIYNT AKNILV..KN DLMKDGPA ............. LHLTASTIS C o n s e n s u s ....... Y . . . K . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
li!i!i!i!~i!~!!~Gi!idiic h o m s a
Flxlsacce Rim2sacce Adtlsacce Adt2sacce Adtlsoltu Adt2soltu Adtlarath Adt2arath Adtlzeama Adt2zeama Adtorysa Adttritu Adtchlre Adtneucr Adttrybr Adtchlke Adtplafa Adtlhomsa Adt2homsa Adtdrome Adtanoga Adtlhalro Adtcaeel Adtransy Gdchomsa Diflcaeel Pet8sacce Cithomsa Txtpratno Citsacce Acrlsacce Ucpmesau Ucpratno Ucporycu A!~!~I!~!!!! U c p h o m s a . . . . . . . . . . . . . . . .
351 KMVSVTLVYP KFVASIATYP TMGASTASYP TTGASTCSYP TNGAGLASYP TNGAGLASYP TNGAGLASYP TNGAGLASYP TNGAGLASYP TNGAGLASYP TNGAGLASYP TNGAGLASYP TIGAGLASYP TTAAGIASYP TIVAGLLSYP TAGAGVLSYP TILAGLISYP TAVAGLVSYP TAVAGLTSYP TTVAGIVSYP TTASGIISYP TTGAGIISYP TVGSGILSYP TAVAGFGSYP RAIAQTISYP GIANWGVCIP GGIAAATTTP GAASVFGNTP GAASVFGNTP GIVTVYSTMP GAIGPFSNAP GFCTTFLASP GFCTTLLASP GFCTTLLSSP GFCATAMSSP
400 FQLLKSNL.. Q S F R A N E Q K F R L F P L I . . . K L I I A N D G F V . HEWRTRL.. RQTPKENGKR KYTGLVQSFK VIIKEEGLF. L D T V R R R M M M TSG .... QTI K Y D G A L D C L R K I V Q K E G A . Y L D T V R R R M M M TSG .... QAV K Y D G A F D C L R K I V A A E G V . G I D T V R R R M M M TSG .... EAV K Y K S S L D A F S Q I V K N E G P . K I D T V R R R M M M TSG .... EAV K Y K S S F D A F N Q I L K N E G P . K I D T V R R R M M M TSN .... EAV K Y K S S L D A F K Q I L K N E G A . K I D T V R R R M M M TSG .... EAV K Y K S S F D A F S Q I V K K E G A . K I D T V R R R M M M TSG .... EAV K Y K S S L D A F Q Q I L K K E G P . K I D T V R R R M M M TSG .... EAV K Y K S S L D A F Q Q I L K K E G P . K I D T V R R R M M M TSG .... EAV K Y K S S M D A F S Q I L K N E G A . K I D T V R R R M M M TSG .... EAV K Y K S S L D A F Q Q I L A K E G A . K I D T I R R R M M M TSG .... SAV K Y N S S F H C F Q E I V K N E G M . K L D T I R R R M M M TSG .... EAV K Y K S S F D A A S Q I V A K E G V . K L D T V R G R M M M TSG .... AAV K Y K N S M D C M L Q V I K Q E G A . A F D T V R R R L M M QSG .... GER Q Y N G T I D C W R K V A Q Q E G M . K FDTVRRRMMM MSGRKGKEEI QYKNTIDCWI KILRNEGF.K FDTVRRRMMM QSGRKG.ADI MYTGTVDCWR KIAKDEGA.K FDTVRRRMMM QSGRKG.TDI MYTGTLDCWR KIARDEGG.K FDTVRRRMMM QSGRKA.TEV IYKNTLHCWA TIAKQEGP.. FDTVRRRMMM QSWPCK.SEV MYKNTLDCWVKIGKQEGS.G FDTVRRRMMM QSGRNK.EDR MYKGTVDCWG KIYKNEGG.K WDTVRRRMMM QSGRK...DI LYKKHPRLRK EDHPNEGM.S FDTVRRRMMM QSGRKGAEEI MYSGTIDCWK KIARDEGG.R FDVTRRRMQL GTVLPEFE.. KCLTMRDTMK YDYGHHGIRK A D V L K S R L Q T A P E G K Y P D ..... G I R G V L R E V L R E E G P . R L D F L K T R L M L ...NKTTA ..... SLGSVII R I Y R E E G P . A L D V I K T R M Q ..... G L E A H K .YRNTWDCGL Q I L K K E G L . K LDVIKTRMQ GLEAHK YRNTLDCGV QILKNEGP K L D T V K T R M Q ..... SLDSTK . Y S S T M N C F A T I F K E E G L . K LDTIKTRLQK DKSISLEKQS GMKKIITIGA QLLKEEGF.R A D V V K T R F I N .... S L P G Q . . Y P S V P S C A M T M L T K E G P . T VDWKTRFIN .... S L P G Q . . 
Y P S V P S C A M T M Y T K E G P . A V D V V K T R F I N .... S P P G Q . . Y A S V P N C A M T M F T K E G P . T V D V V K T R F I N .... S P P G Q . . Y K S V P N C A M K V F T N E G P . T
Mitochondrial
adenine
Ucpbosta M2omcaeel
M2omhomsa
Pmtsacce Consensus
nucleotide
GFCTTVLSSP ASVATVMTQP GLVTTAASMP GLGVAVVMNP ......... P
translocator
family
V D V V K T R F V N .... S S P G Q . . N T S V P N C A M MMLTREGP.S L D V M K T R M M N .... A A P G E . . F K G I L D C F M F T A K L . G P .M VDIAKTRIQN MRMIDGKPE.. YKNGLDVLF KVVRYEGF. F W D V I L T R I Y N Q K .... G D L . . Y K G P I D C L V KTVRIEGV.T .D .... R . . . . . . . . . . . . . . Y ............. EG...
Pmtsacce Consensus
401 450 G L Y K G L S A N L V R A I P S T C I T F ...... C V Y E N L K H R L . . . . . . . . . . . . . S M Y S G L T P H L M R T V P N S I I M F ...... G T W E I V I R L L S . . . . . . . . . . . . SLFKGCGANI FRGVAAAGVI S ....... LY DQLQLIMFGK KFK ....... SLFKGCGANI LRGVAGAGVI S ....... MY DQLQMILFGK KFK ....... SLFKGAGANI LRAVAGAGVL A ....... GY DKLQVLVLGK KFGSGGA... SLFKGAGANV LRAVAGAGVL A ....... GY DKLQVIVFGK KYGSGGG... SLFKGAGANI LRAVAGAGVL S ....... GY DKLTLIVFGK KYGSGGA... SLFKGAGANI LRAVAGAGVL A ....... GY DKLQLIVFGK KYGSGGA... SLFKGAGANI LRAIAGAGVL S ....... GY DQLQILFFGK KYGSGGA... SLFKGAGANI LRAIAGAGVL S ....... GY DQLQILFFGK KYGSGGA... SLFKGAGANI LRAIAGAGVL S ....... GY DQLQILFFGK KYGSGGA... SLFKGAGAKL LRAIAGAGVL S ....... GY DQLQILFFGK KYGSGGA... SLFKGAGANI LRAVAGAGVL A ....... GY DQLQVILLGK KYGSGEA... S L F K G A G A N I L R G V A G A G V L S . . . . . . . IY D Q L Q V L L F G K A F K G G S G . . . S L M R G A G A N I L R G I A G A G V L S . . . . . . . G V D A L K P I Y V E W R R S N ...... AFFKGAWSNV LRGAGGAFVL VL ....... Y DEIKKFINPN AVSSASE... GFFKGAWANV IRGAGGALVL VF ....... Y DELQKLI ............. AFFKGAWSNV LRGMGGAFVL VL ....... Y DEIKKYV ............. AFFKGAWSNV LRGMGGAFVL VL ....... Y DEIKKYT ............. SFFKGAFSNI LRGTGGAFVL VL ....... Y DEIKKVL ............. AFFKGAFSNV LRGTGGALVL VF ....... Y DEVKALLG ............ A F F K G A L S N V I R G T G G A L V L V L . . . . . . . Y D E L K K L V F G T S V H N ...... A M F K G A L S N V F R G T G G A L V L AI . . . . . . . Y D E I Q K F L . . . . . . . . . . . . . A F F R V P .... G P T C S E A W V V L L S W S C T M S S RKSSKFILVQ MSVTWHAVLC GLYRGLSLNY IRCIPSQAVA FTTYELMKQF FHLN ................ A L F K G F W P V M L R A F P A N A A C FF . . . . . . G L E L T L A A F . . R Y F G I G G H P T P V F F S G V G P R T M W . I S A G G A I FL ...... G M Y E T V H S L L S K S F P T A G E M R A A F Y K G T V P R L G R V C L D V A I V F V ...... IY D E V V K L L N K . . V W K T D .... 
A F Y K G T V P R L G R V C L D V A I V F V ...... IY D E V V K L L N K . . V W K T D .... T F W K G A T P R L G R L V L S G G I V F T . . . . . . IY E K V L V M L A . . . . . . . . . . . . A L Y K G I T P R V M R V A P G Q A V T F T ...... V Y E Y V R E H L E N L G I F K K N D T P K AFFKGFVPSF LRLASWNVIM FV CF E Q . L K K E L S K S R Q T V D C T T . A F F K G F A P S F L R L G S W N V I M F V . . . . . . CF E Q . L K K E L M K S R Q T V D C T T . A F F K G F V P S F L R L G S W N V I M F V . . . . . . CF E K . L K G E L M R S R Q T V D C A T . A F F K G L V P S F L R L G S W N V I M F V ...... CF E Q . L K R E L S K S R Q T M D C A T . AFFKGFVPSFLRLGSWN.IMFV ...... C F E R . L K Q E L M K C R H T M D C A T . G F F K G F I P A W A R L A P H T V L T FI . . . . . . FF E Q . L R L K F G . . Y A P P V K A . . S L W K G F T P Y Y A R L G P H T V L T FI FL E Q M N K A Y K R L F L S G . . . A L Y K G F A A Q V F R I A P H T I M C L T ...... FM E Q T M K L V Y S I E S R V L G H N . . ..FKG ...... R ......................................
Adtransy Diflcaeel Acrlsacce
451 461 NIP ........ STEWPLPHD E PKPLK ......
Flx1sacce Rim2sacce Adt1sacce Adt2sacce Adt1soltu Adt2soltu Adt1arath Adt2arath Adt1zeama Adt2zeama Adtorysa Adttritu Adtchlre Adtneucr Adttrybr Adtchlke Adtplafa Adt1homsa Adt2homsa Adtdrome Adtanoga Adt1halro Adtcaeel Adtransy Gdchomsa Dif1caeel Pet8sacce Cithomsa Txtpratno Citsacce Acr1sacce Ucpmesau Ucpratno Ucporycu Ucphomsa Ucpbosta M2omcaeel
M2omhomsa
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Adt1ratno, Adt1bosta (Adt1homsa); Adt2ratno, Adt3bosta, Adt3homsa (Adt2homsa); Adt3sacce (Adt2sacce); Gdcbosta, Gdcratno (Gdchomsa); M2ombosta (M2omhomsa); Ucpmesau, Ucpmusmu (Ucphomsa). Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences. Residues indicated by boldface type are also conserved in the mitochondrial phosphate carrier family.
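The 90% identity criterion above can be illustrated with a small sketch. The book does not state how identity was computed, so the definition used here (identical residues divided by the number of columns in which neither sequence has a gap) is an assumption, and `percent_identity` is a hypothetical helper name. The two fragments are taken from two adjacent rows of the alignment block for positions 401-450 above.

```python
def percent_identity(a: str, b: str, gaps: str = ".-") -> float:
    """Percent identity between two aligned rows of equal length.

    Assumption: columns where either row has a gap are skipped, and the
    denominator is the number of remaining (compared) columns.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x in gaps or y in gaps:
            continue  # ignore gap columns
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Two adjacent rows from the 401-450 block (first 20 columns only).
row_a = "SLFKGCGANIFRGVAAAGVI"
row_b = "SLFKGCGANILRGVAGAGVI"
print(percent_identity(row_a, row_b))  # 18 of 20 columns match -> 90.0
```

Over a full-length alignment the same function would decide whether a sequence falls under the "at least 90% identical" rule and is therefore left out of the alignment and tree.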
Database accession numbers
Acr1sacce Adt1arath Adt1bosta Adt1homsa Adt1musmu Adt1ratno Adt1sacce Adt1soltu Adt1zeama Adt2arath Adt2homsa Adt2ratno Adt2sacce Adt2soltu Adt2zeama Adt3bosta Adt3homsa Adt3sacce Adtanoga Adtcaeel Adtchlke Adtchlre Adtdrome Adtneucr Adtorysa Adtplafa Adtransy Adttritu Adttrybr Adt1halro Cithomsa Citsacce Dif1caeel Flx1sacce Gdcbosta Gdchomsa Gdcratno M2ombosta M2omcaeel M2omhomsa Pet8sacce Pmtsacce Rim2sacce Txtpratno Ucpbosta Ucphomsa
SWISSPROT
PIR
EMBL/GENBANK
P33303 P31167 P02722 P12235 P48962 Q05962 P04710 P25083 P04709 P40941 P05141 Q09073 P18239 P27081 P12857 P32007 P12236 P18238
S36407; S43280 S21313 A03181; A24822 A28116; A39891 U27315 X61667; D12770 A24849 S17917; S21974 A24072; S05199 S29618; S29852 A29132; C28116 D12771 A31978; S36419 S14874 S05200; S16568 B43646 S03894; B28116 A36582 S31935 X76112 A41677 S30259 S43651 A03182 JS0711 X83551 U44832 X80023 U32987 D83069 U25147 S44554; S46173 S55056; S44090 S48400 Q01888; S26595 A40141 M32973 A36305; S29597 S44091 S29598 S45120; S45458 S25357 S36081 A46595 S03603 A45763
Z25485; Z49595 X65549 M13783; M24102 J02966; J03593
P31692 P27080 P02723 P31691 S51132
P40464 P16260 P16261 P22292 Q02978 P38921 P32332 P38127 P32089 P10861 P25874
M12514; Z49703 X62123 X57556; X15711 X68592 M57424; J02683 X77291; J04021 X57557 X59086; X15712 M24103 J03592 M34076; Z35954 Z21814; Z21815 M76669 X65194 X00363 D12637
U17503; X76053 X76115; Z48240 L41168; Z38059 X66035 M31659 X66115; M58703 X76114 X66114 U02536; X77114 S44213; Z28120 Z36061 L12016 X14064 X51952; X51953
            SWISSPROT   PIR               EMBL/GENBANK
Ucpmesau    P04575      A24363; S34268    X73138
Ucpmusmu    P12242      A31106            M21247; M21222
Ucporycu    P14271      A32446            X14696
Ucpratno    P04633      A26294; A29278    M11814; X03894
References 1 Kuan, J. and Saier, M. (1993) CRC Crit. Rev. Biochem. Mol. Biol. 28, 209-233. 2 Zarrilli, R. et al. (1989) Mol. Endocrinol. 3, 1498-1508.
Mitochondrial phosphate carrier family
Summary
Transporters of the mitochondrial phosphate carrier family, the example of which is the PHC phosphate carrier protein of humans (Mpcphomsa), mediate the uptake of phosphate from the cytosol into the mitochondria. Members of the family are ubiquitous in eukaryotes. Statistical analysis of multiple amino acid sequence comparisons indicates that the mitochondrial phosphate carrier family is most closely related to the mitochondrial adenine nucleotide translocator family 1. Members of the mitochondrial phosphate carrier family comprise three homologous domains and are predicted to contain six membrane-spanning helices on the basis of the hydropathy of their amino acid sequences, reaction with peptide-specific antibodies and susceptibility to proteolysis 2. Several amino acid sequence motifs are highly conserved in the mitochondrial phosphate carrier family, including motifs that are unique to the family and motifs common to the mitochondrial adenine nucleotide translocator family.
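The summary refers to helix prediction from hydropathy without naming a method. As a rough illustration only, the sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue sliding window and a cutoff of 1.6, a commonly used convention. Whether the authors used this scale, window or cutoff is not stated in the text, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along an ungapped sequence."""
    scores = [KD[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """Start indices (0-based) of windows whose mean reaches the cutoff."""
    return [i for i, m in enumerate(window_means(seq, window)) if m >= cutoff]


# Synthetic example: a hydrophobic stretch followed by a charged stretch.
print(hydrophobic_windows("L" * 19 + "K" * 19))
```

Alignment rows must have their gap characters removed before scoring, since only the 20 standard residues are in the table. Runs of consecutive qualifying windows would be merged into one candidate membrane-spanning segment.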
Nomenclature, biological sources and substrates
CODE        DESCRIPTION [SYNONYMS]                                       ORGANISM [COMMON NAMES]            SUBSTRATE(S)
Mpcpbosta   Mitochondrial phosphate carrier [MPCP, PHC]                  Bos taurus [cow]                   Phosphate
Mpcpcaeel   Mitochondrial phosphate carrier [MPCP, PHC]                  Caenorhabditis elegans [nematode]  Phosphate
Mpcphomsa   Mitochondrial phosphate carrier [MPCP, PHC]                  Homo sapiens [human]               Phosphate
Mpcpratno   Mitochondrial phosphate carrier [MPCP, PHC]                  Rattus norvegicus [rat]            Phosphate
Mpcpsacce   Mitochondrial phosphate carrier [MPCP, MIR1, YJR077C]        Saccharomyces cerevisiae [yeast]   Phosphate
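The transporter codes in these tables appear to follow a regular pattern: a short protein prefix followed by the first three letters of the genus and the first two letters of the species (for example, Homo sapiens gives "homsa" and Bacillus subtilis gives "bacsu"). This pattern is inferred from the entries in this section rather than stated in it, and the helper names below are hypothetical. A minimal sketch:

```python
def organism_suffix(binomial: str) -> str:
    """First three letters of the genus plus first two of the species."""
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).lower()


def transporter_code(prefix: str, binomial: str) -> str:
    """Join a protein prefix and an organism suffix, e.g. 'Mpcphomsa'."""
    return prefix.capitalize() + organism_suffix(binomial)


for name in ("Homo sapiens", "Rattus norvegicus", "Bos taurus",
             "Caenorhabditis elegans", "Saccharomyces cerevisiae"):
    print(transporter_code("MPCP", name))
```

The prefix is not always one of the listed synonyms, so the prefix argument has to be supplied by the caller; only the organism suffix is derived mechanically.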
Phylogenetic tree
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the phylogenetic tree: Mpcpratno, Mpcpbosta (Mpcphomsa).
[Figure: phylogenetic tree with leaves Mpcpcaeel, Mpcphomsa and Mpcpsacce]
Proposed orientation of PHC in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded six times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: proposed membrane topology of PHC, with six membrane-spanning helices drawn as rectangles and conserved residues marked. Labels in the original: OUTSIDE, INSIDE, NH2, COOH.]
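The orientation described above follows a simple parity rule: each pass through the membrane moves the chain to the opposite side, so an even number of membrane-spanning helices leaves both termini on the same side. The sketch below is only an illustration of that rule; `c_terminus_side` is a hypothetical helper, not something defined in this book.

```python
def c_terminus_side(n_terminus_side: str, crossings: int) -> str:
    """Side of the membrane on which the C-terminus ends up.

    n_terminus_side: "inside" or "outside".
    crossings: number of membrane-spanning helices.
    """
    if n_terminus_side not in ("inside", "outside"):
        raise ValueError("side must be 'inside' or 'outside'")
    if crossings < 0:
        raise ValueError("crossings must be non-negative")
    opposite = "outside" if n_terminus_side == "inside" else "inside"
    # An even number of crossings returns the chain to its starting side.
    return n_terminus_side if crossings % 2 == 0 else opposite


# PHC: N-terminus inside, six membrane-spanning helices.
print(c_terminus_side("inside", 6))  # inside
```

The same rule applies to the other models in this section, for example a protein drawn with its N-terminus outside and 11 helices has its C-terminus inside.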
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT   CHROMOSOMAL LOCUS
Mpcpbosta   362           40 139
Mpcpcaeel   340           36 674
Mpcphomsa   362           40 095    12q23
Mpcpratno   356           39 445
Mpcpsacce   311           32 812    Chromosome 10
Multiple amino acid sequence alignments

            1                                                   50
Mpcpcaeel   .......... .........M SVFSQLAE.. SSKQNPFSLP VRSGN.CASA
Mpcphomsa   MFSSVAHLAR ANPFNTPHLQ LVHDGLGDLR SSSPGPTGQP RRPRNLAAAA
Mpcpsacce   .......... .......... .......... .......... .......MSV
Consensus   .......... .......... .......... .......... ..........

            51                                                 100
Mpcpcaeel   VSAPGQVEFG SGKYYAYCAL GGVLSCGITH TAIVPLDLVK CRIQVNPEKY
Mpcphomsa   VEEQYSCDYG SGRFFILCGL GGIISCGTTH TALVPLDLVK CRMQVDPQKY
Mpcpsacce   SAAPAIPQYS VSDYMKF.AL AGAIGCGSTH SSMVPIDVVK TRIQLEPTVY
Consensus   .......... .........L .G...CG.TH ...VP.D.VK .R.Q..P..Y

            101                                                150
Mpcpcaeel   .TGIATGFRT TIAEEGARAL VKGWAPTLLG YSAQGLGKFG FYEIFKNVYA
Mpcphomsa   .KGIFNGFSV TLKEDGVRGL AKGWAPTFLG YSMQGLCKFG FYEVFKVLYS
Mpcpsacce   NKGMVGSFKQ IIAGEGAGAL LTGFGPTLLG YSIQGAFKFG GYEVFKKFFI
Consensus   ..G....F.. .....G...L ..G..PT.LG YS.QG..KFG .YE.FK....

            151                                                200
Mpcpcaeel   DMLGEENAYL YRTSLYLAAS ASAEFFADIL LAPMEATKVR IQTSPGAPPT
Mpcphomsa   NMLGEENTYL WRTSLYLAAS ASAEFFADIA LAPMEAAKVR IQTQPGYANT
Mpcpsacce   DNLGYDTASR YKNSVYMGSA AMAEFLADIA LCPLEATRIR LVSQPQFANG
Consensus   ..LG...... ...S.Y.... A.AEF.ADI. L.P.EA...R ....P.....

            201                                                250
Mpcpcaeel   LRGCAPMIYK AEGLTGFYKG LPPLWMRQIP YTMMKFACFE KTVEALYQYV
Mpcphomsa   LRDAAPKMYK EEGLKAFYKG VAPLWMRQIP YTMMKFACFE RTVEALYKFV
Mpcpsacce   LVGGFSRILK EEGIGSFYSG FTPILFKQIP YNIAKFLVFE RASEFYYGFA
Consensus   L........K .EG...FY.G ..P....QIP Y...KF..FE ...E..Y...

            251                                                300
Mpcpcaeel   VPKPRAECSK AEQLVVTFVA GYIAGVFCAI VSHPADTVVS KLNQDSQATA
Mpcphomsa   VPKPRSECSK PEQLVVTFVA GYIAGVFCAI VSHPADSVVS VLNKEKGSSA
Mpcpsacce   GPKEKL..SS TSTTLLNLLS GLTAGLAAAI VSQPADTLLS KVNKTKKAPG
Consensus   .PK.....S. .......... G..AG...AI VS.PAD...S ..N.......

            301                                                350
Mpcpcaeel   .......GGI LKKLGFAGVW KGLVPRIIMI GTLTALQWFI YDSVKVALNL
Mpcphomsa   .......SLV LKRLGFKGVW KGLFARIIMI GTLTALQWFI YDSVKVYFRL
Mpcpsacce   QSTVGLLAQL AKQLGFFGSF AGLPTRLVMV GTLTSLQFGI YGSLKS..TL
Consensus   .......... .K.LGF.G.. .GL..R..M. GTLT.LQ..I Y.S.K....L

            351                370
Mpcpcaeel   PRPPPPEMPA SLKAKLAAQQ
Mpcphomsa   PRPPPPEMPE SLKKKLGLTQ
Mpcpsacce   GCPPTIEIGG GGH.......
Consensus   ..PP..E... ..........
Proteins listed subsequently in italics are at least 90% identical to the paired transporters listed in parentheses and therefore are not included in the alignments: Mpcpratno, Mpcpbosta (Mpcphomsa). Residues listed in the consensus sequence are present in at least 75% of the transporter sequences shown. Residues indicated by boldface type are also conserved in the mitochondrial adenine nucleotide translocator family.
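The 75% consensus rule can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' procedure: `consensus` is a hypothetical helper, gap characters ("." or "-") are not counted as residues, and the fraction is taken over all aligned rows. The three input rows are columns 71-80 of the phosphate carrier alignment above.

```python
from collections import Counter

GAP_CHARS = {".", "-"}


def consensus(rows: list[str], threshold: float = 0.75) -> str:
    """Consensus of equal-length aligned rows; '.' where no residue qualifies."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned rows must have equal length")
    out = []
    for column in zip(*rows):
        residues = [c for c in column if c not in GAP_CHARS]
        if not residues:
            out.append(".")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Keep the residue only if it occurs in enough of the rows.
        out.append(residue if count / len(rows) >= threshold else ".")
    return "".join(out)


rows = [
    "GGVLSCGITH",  # Mpcpcaeel, columns 71-80
    "GGIISCGTTH",  # Mpcphomsa
    "AGAIGCGSTH",  # Mpcpsacce
]
print(consensus(rows))  # .G...CG.TH
```

With only three sequences, a residue has to appear in all three rows to reach 75%, which is why positions shared by two of the three sequences are shown as dots.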
Database accession numbers
            SWISSPROT   PIR               EMBL/GENBANK
Mpcpbosta   P12234      A24265; A29453    X77338; X05340
Mpcpcaeel   P40614      S44093            X76113
Mpcphomsa   Q00325      S30487; A53737    X77377; X60036
Mpcpratno   P16036      A34350            M23984
Mpcpsacce   P23641      S12318; A37138    X57478; M54879
References
1 Kuan, J. and Saier, M. (1993) CRC Crit. Rev. Biochem. Mol. Biol. 28, 209-233.
2 Ferreira, G.C. et al. (1990) J. Biol. Chem. 265, 21202-21206.
Nitrate transporter I family
Summary
Transporters of the nitrate transporter I family, the example of which is the NARK nitrate-nitrite facilitator protein of Escherichia coli (Narkescco), mediate uptake of nitrate and nitrite. Members of the NARK family are found in both gram-positive and gram-negative bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the NARK family and any other family of transporters. Members of the nitrate transporter I family are predicted to form 11 or 12 membrane-spanning helices by the hydropathy of their amino acid sequences. Several amino acid sequence motifs are highly conserved in the nitrate transporter I family.
Nomenclature, biological sources and substrates
CODE        DESCRIPTION [SYNONYMS]                ORGANISM [COMMON NAMES]                         SUBSTRATE(S)
Nar2escco   Nitrite extrusion protein [NAR2]      Escherichia coli [gram-negative bacterium]      Nitrite
Narkbacsu   Nitrate-nitrite facilitator [NARK]    Bacillus subtilis [gram-positive bacterium]     Nitrite, nitrate
Narkescco   Nitrate-nitrite facilitator [NARK]    Escherichia coli [gram-negative bacterium]      Nitrite, nitrate
Nasabacsu   Nitrate transporter [NASA]            Bacillus subtilis [gram-positive bacterium]     Nitrate
Phylogenetic tree
[Figure: phylogenetic tree with leaves Nar2escco, Narkescco, Narkbacsu and Nasabacsu]
Proposed orientation of NARK in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown. Consensus residues indicated by an asterisk are not conserved in NARK.
[Figure: proposed membrane topology of NARK, with 12 membrane-spanning helices drawn as rectangles and conserved residues marked. Labels in the original: OUTSIDE, INSIDE, NH2, COOH.]
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT   CHROMOSOMAL LOCUS
Nar2escco   462           49 890
Narkbacsu   395           42 956    324°
Narkescco   463           49 693    27.49 minutes
Nasabacsu   421           46 067    28°
Multiple amino acid sequence alignments

            1                                                   50
Nar2escco   ..MALQNEKN SRYLLRDWKP ENPAFWENKG KHIARRNLWI SVSCLLLAFC
Narkescco   MSHSSAPERA TGAVITDWRP EDPAFWQQRG QRIASRNLWI SVPCLLLAFC
Narkbacsu   .......... .......... ......MINR QHI...QLSL QSLSLVAGFM
Nasabacsu   .......... .......... MKLSELKTSG HPL...TLLC SFLYFDVSFM
Consensus   .......... .......... .........G .......L.. S...L...F.

            51                                                 100
Nar2escco   VWMLFSAVTV NLNKIGFNFT TDQLFLLTAL PSVSGALLRV PYSFMVPIFG
Narkescco   VWMLFSAVAV NLPKVGFNFT TDQLFMLTAL PSVSGALLRV PYSFMVPIFG
Narkbacsu   VWVLISSLIS QIT.LDIHLS KGEISLVTAI PVILGSLLRI PLGYLTNRFG
Nasabacsu   IWVMLGALGV YIS.QDFGLS PFEKGLVVAV PILSGSVFRI ILGILTDRIG
Consensus   VW.L.SA..V ......F... .....L.TA. P..SG.LLR. P.......FG

            101                                                150
Nar2escco   GRRWTVFSTA ILIIPCVWLG IAVQNPNTPF GIFIVIALLC GFAGANFASS
Narkescco   GRRWTAFSTG ILIIPCVWLG FAVQDTSTPY SVFIIISLLC GFAGANFASS
Narkbacsu   ARLMFMVSFI LLLFPVFWIS IA....D.SL FDLIAGGFFL GIGGAVFSIG
Nasabacsu   PKKTAVIGML VTMIPLLWGT FG....GRSL TELYAIGILL GVAGASFAVA
Consensus   .R.....S.. .L.IP..W.. .A........ ...I.I..L. G.AGA.FA..

            151                                                200
Nar2escco   MGNISFFFPK AKQGSALGIN GGLGNLGVSV MQLVAPLVIF VPVFAFLGVN
Narkescco   MANISFFFPK QKQGGALGLN GGLGNMGVSV MQLVAPLVVS LSIFAVFGSQ
Narkbacsu   VTSLPKYYPK EKHGVVNGIY GA.GNIGTAV TTFAAPVI.. ........AQ
Nasabacsu   LPMASRWYPP HLQGLAMGIA GA.GNSGTLF ATLFGPRL.. ........AE
Consensus   ....S...PK .KQG.A.GI. G..GN.G..V ..L.AP.... ..........

            201                                                250
Nar2escco   GVPQADGSVM SLANAAWIWV PLLAIATI.A AWSGMNDIAS SRASIADQL.
Narkescco   GVKQPDGTEL YLANASWIWV PFLAIFTI.A AWFGMNDLAT SKASIKEQL.
Narkbacsu   AVGWKSTVQM YL.......I .LLAVFALLH VLFGDRHEKK VKVSVKTQIK
Nasabacsu   QFGWHIVMGI AL.......I PLLIVFILFV SMAKDSPAQP SPQPLKSYLH
Consensus   .V........ .L........ PLLA.F.... ...G...... S..S.K.QL.

            251                                                300
Nar2escco   PVLQRLHLWL LSLLYLATFG SFIGFSAGFA MLAKTQFPDV NILLAFRFGP
Narkescco   PVLKRGHLWI MSLLYLATFG SFIGFSAGFA MLSKTQFPDV QILQYAFFGP
Narkbacsu   AVYRNHVLWF LSLFYFITFG AFVAFTIYLP NFLVEHFGLN PADAGLRTAG
Nasabacsu   .VFGQKETWF FCLLYSVTFG GFVGLSSFLS IFFVDQYQLS KIHAGDFVTL
Consensus   .V.....LW. .SLLY..TFG .F.GFS.... .....QF... .I........

            301                                                350
Nar2escco   FIGA..IARS VGGAISDKFG GVRVTLINFI FMAIFSALLF LTLPGTG.SG
Narkescco   FIGA..LARS AGGALSDRLG GTRVTLVNFI LMAIFSGLLF LTLPTDGQGG
Narkbacsu   FIAVSTLLRP AGGFLADKMS PLRIL..MFV FTG....... LTLSGIILSF
Nasabacsu   CVAAGSFFRP VGGLISDRVG GTKVLSVLFV IVA....... LCMAGVSSLP
Consensus   FI.A....R. .GG..SD..G G.RV....F. .......... LTL.G.....

            351                                                400
Nar2escco   NFIAFYAVFM GLFLTAGLGS GSTFQMIAVI FRQITIYRVK MKGGSDEQAH
Narkescco   SFMAFFAVFL ALFLTAGLGS GSTFQMISVI FRKLTMDRVK AEGGSDERAM
Narkbacsu   SPTIGLYTFG SLTVAVCSGI GNGTVFKLVP F....YFSKQ AGIANGIVSA
Nasabacsu   SLSMGYGPIV CRNDGARNGK RRSIPARAAAL ....PQRNRH GDGNRRCGRF
Consensus   S.......F. .L...A..G. G.......V. ......R... ..G.......

            401                                                450
Nar2escco   KEAVTETAAA LGFISAIGAV GGFFIPQAFG MSLNMTGSPV GAMKVFLIFY
Narkescco   REAATDTAAA LGFISAIGAI GGFFIPKAFG SSLALTGSPV GAMKVFLIFY
Narkbacsu   MGGLGGFFPP LILASVFQAT GQYAIGFMAL SEVALASFVL VIWMYWQERM
Nasabacsu   RN......RR VFLAEHLR.. ....ISQTDD RHICYRFITF PVSRCWRLHL
Consensus   .......... L...S...A. G...I..... .......... ..........

            451                                       491
Nar2escco   IVCVLLTWLV YGRRKFSQK. .......... .......... .
Narkescco   IACVVITWAV YGRH..SKK. .......... .......... .
Narkbacsu   KTHTERNSQS IN........ .......... .......... .
Nasabacsu   CLPQAITGGK AGARKAARRM FRILTDTRLC VSFLFSLDFQ Q
Consensus   ......T... .......... .......... .......... .
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
Database accession numbers Nar2escco Narkbacsu Narkescco Nasabacsu
SWISSPROT
PIR
P46907 P10903 P42432
S05239
EMBL/GENBANK X94992 Z49884 X15996; X13360 D30689
Nitrate transporter II family
Summary
Transporters of the nitrate transporter II family, the example of which is the CRNA nitrate transporter of Emericella nidulans (Crnaemeni), mediate uptake of nitrate and nitrite in molds and algae. Statistical analysis reveals no significant relationship between the amino acid sequences of the nitrate transporter II family and any other family of transporters. Both transporters of the nitrate transporter II family are predicted to form ten membrane-spanning helices by the hydropathy of their amino acid sequences.
Nomenclature, biological sources and substrates
CODE        DESCRIPTION [SYNONYMS]        ORGANISM [COMMON NAMES]             SUBSTRATE(S)
Crnaemeni   Nitrate transporter [CRNA]    Emericella nidulans [mold]          Nitrate, nitrite
Nitrchlre   Nitrate transporter           Chlamydomonas reinhardtii [alga]    Nitrate
Proposed orientation of CRNA in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded ten times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed.
[Figure: proposed membrane topology of CRNA, with ten membrane-spanning helices drawn as rectangles. Labels in the original: OUTSIDE, INSIDE, NH2, COOH.]
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT
Crnaemeni   507           54 925
Nitrchlre   547           59 313
Multiple amino acid sequence alignments

            1                                                   50
Crnaemeni   .........M DFAKLLVASP EVNP....NN RKALTIPVLN PFNTYGRVFF
Nitrchlre   MAEKPATVNA ELVKEMDAAP KKYPYSLDSE GKANYCPVWR FTQPHMMAFH
Consensus   .......... ...K...A.P ...P...... .KA...PV.. ........F.

            51                                                 100
Crnaemeni   FSWFGFMLAF LSWYAFPPLL TVTIRDDLDM SQTQIANSNI IALLATLLVR
Nitrchlre   LSWICFFMSF VATFA.PASL APIIRDDLFL TKSQLGNAGV AAVCGAIAAR
Consensus   .SW..F...F ....A.P..L ...IRDDL.. ...Q..N... .A.......R

            101                                                150
Crnaemeni   LICGPLCDRF GPRLVFIGLL LVGSIPTAMA GLVTSPQGLI ALRFFIGILG
Nitrchlre   IFMGIVVDSI GPRYGAAATM LMTAPAVFCM ALVTDFSTFA CVRFFIGLSL
Consensus   ...G...D.. GPR....... L......... .LVT...... ..RFFIG...

            151                                                200
Crnaemeni   GTFVPCQVWC TGFFDKSIVG TANSLAAGLG NAGGGITYFV MPAIFDSLIR
Nitrchlre   CMFVCCQFWC GTMFNVKIVG TANAIAAGWG NMGGGACHFI MPLIYQG.IK
Consensus   ..FV.CQ.WC ...F...IVG TAN..AAG.G N.GGG..... MP.I....I.

            201                                                250
Crnaemeni   DQGLPAHKAW RVAYIVP.FI LIVAAALGML FTCDDTPTGK WSERHIWMKE
Nitrchlre   DGGVPGYQAW RWAFFVPGGI YILTATLTLL LGIDHPSGKD Y.........
Consensus   D.G.P...AW R.A..VP..I ....A.L..L ...D...... ..........

            251                                                300
Crnaemeni   DTQTASKGNI VDLSSGAQSS RPSGPPSIIA YAIPDVEKKG TETPLEPQSQ
Nitrchlre   .......... .......... .......... ...RDLKKEG T.........
Consensus   .......... .......... .......... ....D..K.G T.........

            301                                                350
Crnaemeni   AIGQFDAFRA NAVASPSRKE AFNVIFSLAT MAVAVPYACS FGSELAINSI
Nitrchlre   .......LKA KGAMWPVVKC GLGNYRSW.. .ILALTYGYS FGVELTVDNV
Consensus   .........A ........K. ......S... ...A..Y..S FG.EL.....

            351                                                400
Crnaemeni   LGDYYDKNFP YMGQTQTGKW AAMFGFLNIV CRPAGGFLAD FLYRKTNTPW
Nitrchlre   IVEYLFDQFG .LNLAVAGAL GAIFGLMNLF TRATGGMISD LV.AKPFGMR
Consensus   ...Y....F. .......G.. .A.FG..N.. .R..GG...D ....K.....

            401                                                450
Crnaemeni   AKKLLLSFLG VVMGAFMIAM GFSDPKSEAT MFGLTAGLAF FLESCNGAIF
Nitrchlre   GRIWALWIIQ TLGGIFCIVL G.KVSNSLSS TIVIMIVFSI FCQQACGLHF
Consensus   .....L.... ...G.F.I.. G.....S... .......... F.....G..F

            451                                                500
Crnaemeni   SLVPHVHPYA NGGSSPAWWV DSGTSAVSSS PSSSAIVIT. ..TTRAASGF
Nitrchlre   GITPFVSRRA YGVVSGLVGA GGNTGAAITQ AIWFAGTAPW QLTLSKADGF
Consensus   ...P.V...A .G..S..... ...T.A.... ....A..... ..T...A.GF

            501                                                550
Crnaemeni   .......... .......... .......... .......... ..........
Nitrchlre   VYMGIMTIGL TLPLFFIWFP MWGSMLTGPR EGAEEEDYYM REWSAEEVAS
Consensus   .......... .......... .......... .......... ..........

            551                                                600
Crnaemeni   .......... .......... .......... .......... ..........
Nitrchlre   GLHQGSMRFA MESKSQRGTR DKRAAGPARV PQQLRLGARC CQARGGLIRC
Consensus   .......... .......... .......... .......... ..........

            601        613
Crnaemeni   .......... ...
Nitrchlre   VLAQSLGATG CSD
Consensus   .......... ...
Residues listed in the consensus sequence are present in both transporter sequences.
Database accession numbers
            SWISSPROT   PIR       EMBL/GENBANK
Crnaemeni   P22152      A38560    M61125
Nitrchlre               S40142    Z25438
Spore germination transporter family
Summary
Transporters of the spore germination family, the example of which is the spore germination protein GRAII from Bacillus subtilis 1 (Gra2bacsu), mediate amino acid transport by an unknown mechanism. They are involved in the stimulation of spore germination in response to chemical triggers, and each functions as part of a complex of three proteins. These transporters have only been found in gram-positive bacteria. Statistical analysis of multiple amino acid sequence comparisons suggests that the spore germination transporter family may be distantly related to the "APC" family of uni-, sym- and antiporters 2. Both GRAII and GRBII 3 are predicted to contain 11 transmembrane domains by the hydropathy of their amino acid sequences. This would be a very unusual topology; it is quite likely that these proteins actually contain 12 such helices, but that one helix is not well predicted.
Nomenclature, biological sources and substrates
CODE        DESCRIPTION [SYNONYMS]                                     ORGANISM [COMMON NAMES]                        SUBSTRATE(S)
Gra2bacsu   Spore germination protein A2 [GERAB, GERA2, GRAII]         Bacillus subtilis [gram-positive bacterium]    Amino acids
Grb2bacsu   Spore germination protein B2 [GERBB, GRBII]                Bacillus subtilis [gram-positive bacterium]    Amino acids
Proposed orientation of GRAII in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the outside and is folded 11 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed.
[Figure: proposed membrane topology of GRAII, with 11 membrane-spanning helices drawn as rectangles. Labels in the original: OUTSIDE, INSIDE, NH2, COOH.]
Physical and genetic characteristics
            AMINO ACIDS   MOL. WT   EXPRESSION SITES   CHROMOSOMAL LOCUS
Gra2bacsu   364           41 259    sporangium         gerA
Grb2bacsu   368           41 709    sporangium         gerB
Multiple amino acid sequence alignments

            1                                                   50
Gra2bacsu   MSQKQTPLKL NTFQGISIVA NTMLGAGLLT LPRALTTKAN TPDGWITLIL
Grb2bacsu   MRKSEH--KL TFMQTLIMIS STLIGAGVLT LPRS-AAETG SPSGWLMILL
Consensus   M........L ...Q...... .T..GAG.LT LPR....... .P.GW...LL

            51                                                 100
Gra2bacsu   EGFIFIFFIY LNTLIQKKHQ YPSLFEYLKE GLGKWIGSII GLLICGYFLG
Grb2bacsu   QGVIFIIIVL LFLPFLQKNS GKTLFKLNSI VAGKFIGFLL NLYICLYFIG
Consensus   .G.IFI.... L......K.. ...LF..... ..GK.IG... .L.IC.YF.G

            101                                                150
Gra2bacsu   VASFETRAMA EMVKFFLLER TPIQVIILTF ICCGIYLMVG GLSDVSRLFP
Grb2bacsu   IVCFQARILG EVVGFFLLKN TRMAVVVFIF LAVAIYHVGG GVYSIAKVYA
Consensus   ...F..R... E.V.FFLL.. T...V....F ....IY...G G.........

            151                                                200
Gra2bacsu   FYLTVTIIIL LIVFGISFKI FDINNLRPVL GEGLGPIANS LTVVSISFLG
Grb2bacsu   YIFPITLIIF MMLLMFSFRL FQLDFIRPVF EGGYQSFFSL FPKTLLYFSG
Consensus   .....T.II. ......SF.. F.....RPV. ..G....... .......F.G

            201                                                250
Gra2bacsu   MEVMLFLPEH MKKKKYTFRY ASLGFLIPII LYILTYIIVV GALTAPEVKT
Grb2bacsu   FEIIFYLVPF MRDPKQVKKA VALGIATSTL FYSITLLIVI GCMTVAEAKT
Consensus   .E....L... M...K..... ..LG...... .Y..T..IV. G..T..E.KT

            251                                                300
Gra2bacsu   LIWPTISLFQ SFELKGIFIE RFESFLLVVW IIQFFTTFVI YGYFAAN-GL
Grb2bacsu   VTWPTISLIH ALEVPGIFIE RFDLFLQLTW TAQQFACMLG -SFKGAHIGL
Consensus   ..WPTISL.. ..E..GIFIE RF..FL...W ..Q.F..... .....A..GL

            301                                                350
Gra2bacsu   KKTFGLSTKT SM-VIIGI-- TVFYFSLWPD DANQVMMYSD YLGYIFVSLF
Grb2bacsu   TEIFHLKNKN NAWLLTAMLA ATFFITMYPK DLNDVFYYGT LLGYAFLIVI
Consensus   ...F.L.... .......... ..F.....P. D.N.V..Y.. .LGY.F....

            351                    373
Gra2bacsu   LLPFILFFIV ALKRRITTK- --
Grb2bacsu   TIPFFVWFLS WIQKKIGRGQ LQ
Consensus   ..PF...F.. .....I.... ..
Residues listed in the consensus sequence are present in both transporter sequences.
Database accession numbers
            SWISSPROT   PIR       EMBL/GENBANK
Gra2bacsu   P07869      A26470    M16189; G142961
Grb2bacsu   P39570                L16960; G289276
References
1 Zuberi, A.R. et al. (1987) Gene 51, 1-11.
2 Reizer, J. et al. (1993) Protein Sci. 2, 20-30.
3 Corfe, B.M. et al. (1994) Microbiology 140 (Pt 3), 471-478.
Vacuolar membrane pyrophosphatase family
Summary
Transporters of the vacuolar membrane pyrophosphatase family 1,2, the example of which is the pyrophosphate-energized vacuolar membrane proton pump from Arabidopsis thaliana 1 (Avp3arath), mediate proton transport and control of the proton gradient across the vacuolar membrane (tonoplast) by inorganic pyrophosphatase (H+-PPase; EC 3.6.1.1) activity. These transporters have only been found in plants. Statistical analysis of multiple amino acid sequence comparisons reveals no apparent relationship between these two transporters of the vacuolar membrane pyrophosphatase family and any other family of transporters. The Arabidopsis protein 1 is predicted to contain at least 13 transmembrane helices by the hydropathy of its amino acid sequence; the protein from Hordeum vulgare 2 is predicted to contain twelve such helices. The N-terminus is predicted to lie within the cytoplasm. There is a characteristic cluster of charged amino acids in the most N-terminal intravacuolar domain.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Avp3arath | Pyrophosphate-energized vacuolar membrane proton pump [Pyrophosphate-energized inorganic pyrophosphatase, AVP3] | Arabidopsis thaliana [mouse-ear cress] | H+
Avp3betvu | Pyrophosphate-energized vacuolar membrane proton pump [Pyrophosphate-energized inorganic pyrophosphatase] | Beta vulgaris [sugar beet] | H+
Avp3horvu | Pyrophosphate-energized vacuolar membrane proton pump [Pyrophosphate-energized inorganic pyrophosphatase] | Hordeum vulgare [barley] | H+
Avp3vigra | Pyrophosphate-energized vacuolar membrane proton pump [Pyrophosphate-energized inorganic pyrophosphatase] | Vigna radiata [bean] | H+
Proposed orientation of AVP3 in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 12 times through the membrane. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed.
[Figure: membrane topology diagram of AVP3; labels OUTSIDE, INSIDE, NH2, COOH]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | EXPRESSION SITES
Avp3arath | 770 | 80 819 | vacuolar membrane
Avp3betvu | 761 | 79 970 | vacuolar membrane
Avp3horvu | 761 | 79 841 | vacuolar membrane
Avp3vigra | 765 | 79 979 | vacuolar membrane
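As a quick plausibility check on the physical characteristics listed for this family, each molecular weight can be divided by its residue count. The sketch below does only that arithmetic. The figures are copied from this entry's table, while the function names and the 100-120 Da acceptance band are assumptions chosen for illustration, not values stated in the book.

```python
# (code, amino acids, molecular weight in Da) as listed in this entry's table.
AVP3_TABLE = [
    ("Avp3arath", 770, 80819),
    ("Avp3betvu", 761, 79970),
    ("Avp3horvu", 761, 79841),
    ("Avp3vigra", 765, 79979),
]


def mean_residue_mass(n_residues, mol_wt):
    """Average mass per residue in Da."""
    if n_residues <= 0:
        raise ValueError("residue count must be positive")
    return mol_wt / n_residues


def flag_outliers(rows, low=100.0, high=120.0):
    """Return codes whose mean residue mass falls outside the assumed band."""
    return [code for code, n, mw in rows
            if not low <= mean_residue_mass(n, mw) <= high]


if __name__ == "__main__":
    for code, n, mw in AVP3_TABLE:
        print(code, round(mean_residue_mass(n, mw), 1))
    print("outliers:", flag_outliers(AVP3_TABLE))
```

A check of this kind mainly catches transcription slips, such as a dropped digit in either column. It says nothing about whether the listed values are biologically correct.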
Multiple amino acid sequence alignments

1-50
Avp3arath MVAPALLPEL WTEILVPICA VIGIAFSLFQ WYVVSRVKLT SDLGASSSGG
Avp3horvu M---AILGEL GTEILIPVCG VIGIVFAVAQ WFIVSKVKVT P--GALRR-C
Consensus M...A.L.EL .TEIL.P.C. VIGI.F...Q W..VS.VK.T ...GA.....

51-100
Avp3arath ANNGKNGYGD YLIEEEEGVN DQSVVAKCAE IQTAISEGAT SFLFTEYKYV
Avp3horvu -RRAKNGYGD YLIEEEEGLN DHNVVVKCAE IQTAISEGAT SFLFTMYQYV
Consensus ....KNGYGD YLIEEEEG.N D..VV.KCAE IQTAISEGAT SFLFT.Y.YV

101-150
Avp3arath GVFMIFFAAV IFVFLGSVEG FSTDNKPCTY DTTRTCKPAL ATAAFSTIAF
Avp3horvu GMFMVVFAAI IFLFLGSIEG FSTKGQPCTY SKG-TCKPAL YTALFSTASF
Consensus G.FM..FAA. IF.FLGS.EG FST...PCTY ....TCKPAL .TA.FST..F

151-200
Avp3arath VLGAVTSVLS GFLGMKIATY ANARTTLEAR KGVGKAFIVA FRSGAVMGFL
Avp3horvu LLGAITSLVS GFLGMKIATY ANARTTLEAR KGVGKAFITA FRSGAVMGFL
Consensus .LGA.TS..S GFLGMKIATY ANARTTLEAR KGVGKAFI.A FRSGAVMGFL

201-250
Avp3arath LAASGLLVLY ITINVFKIYY GDDWEGLFEA ITGYGLGGSS MALFGRVGGG
Avp3horvu LSSSGLVVLY ITINVFKMYY GDDWEGLFES ITGYGLGGSS MALFGRVGGG
Consensus L..SGL.VLY ITINVFK.YY GDDWEGLFE. ITGYGLGGSS MALFGRVGGG

251-300
Avp3arath IYTKAADVGA DLVGKIERNI PEDDPRNPAV IADNVGDNVG DIAGMGSDLF
Avp3horvu IYTKAADVGA DLVGKVERNI PEDDPRNPAV IADNVGDNVG DIAGMGSDLF
Consensus IYTKAADVGA DLVGK.ERNI PEDDPRNPAV IADNVGDNVG DIAGMGSDLF

301-350
Avp3arath GSYAEASCAA LVVASISSFG INHDFTAMCY PLLISSMGIL VCLITTLFAT
Avp3horvu GSYAESSCAA LVVASISSFG INHDFTAMCY PLLVSSVGII VCLLTTLFAT
Consensus GSYAE.SCAA LVVASISSFG INHDFTAMCY PLL.SS.GI. VCL.TTLFAT

351-400
Avp3arath DFFEIKLVKE IEPALKNQLI ISTVIMTVGI AIVSWVGLPT SFTIFNFGTQ
Avp3horvu DFFEIKAANE IEPALKKQLI ISTALMTVGV AVISWLALPA KFTIFNFGAQ
Consensus DFFEIK...E IEPALK.QLI IST..MTVG. A..SW..LP. .FTIFNFG.Q

401-450
Avp3arath KVVKNWQLFL CVCVGLWAGL IIGFVTEYYT SNAYSPVQDV ADSCRTGAAT
Avp3horvu KEVSNWGLFF CVAVGLWAGL IIGFVTEYYT SNAYSPVQDV ADSCRTGAAT
Consensus K.V.NW.LF. CV.VGLWAGL IIGFVTEYYT SNAYSPVQDV ADSCRTGAAT

451-500
Avp3arath NVIFGLALGY KSVIIPIFAI AISIFVSFSF AAMYGVAVAA LGMLSTIATG
Avp3horvu NVIFGLALGY KSVIIPIFAI AVSIYVSFSI AAMYGIAMAA LGMLSTMATG
Consensus NVIFGLALGY KSVIIPIFAI A.SI.VSFS. AAMYG.A.AA LGMLST..TG

501-550
Avp3arath LAIDAYGPIS DNAGGIAEMA GMSHRIRERT DALDAAGNTT AAIGKGFAIG
Avp3horvu LAIDAYGPIS DNAGGIAEMA GMSHRIRERT DALDAAGNTT AAIGKGFAIG
Consensus LAIDAYGPIS DNAGGIAEMA GMSHRIRERT DALDAAGNTT AAIGKGFAIG

551-600
Avp3arath SAALVSLALF GAFVSRAGIH TVDVLTPKVI IGLLVGAMLP YWFSAMTMKS
Avp3horvu SAALVSLALF GAFVSRAGVK VVDVLSPKVF IGLIVGAMLP YWFSAMTMKS
Consensus SAALVSLALF GAFVSRAG.. .VDVL.PKV. IGL.VGAMLP YWFSAMTMKS

601-650
Avp3arath VGSAALKMVE EVRRQFNTIP GLMEGTAKPD YATCVKISTD ASIKEMIPPG
Avp3horvu VGSAALKMVE EVRRQFNTIP GLMEGTAKPD YATCVKISTD ASIKEMIPPG
Consensus VGSAALKMVE EVRRQFNTIP GLMEGTAKPD YATCVKISTD ASIKEMIPPG

651-700
Avp3arath CLVMLTPLIV GFFFGVETLS GVLAGSLVSG VQIAISASNT GGAWDNAKKY
Avp3horvu ALVMLTPLIV GTLFGVETLS GVLAGALVSG VQIAISASNT GGAWDNAKKY
Consensus .LVMLTPLIV G..FGVETLS GVLAG.LVSG VQIAISASNT GGAWDNAKKY

701-750
Avp3arath IEAGVSEHAK SLGPKGSEPH KAAVIGDTIG DPLKDTSGPS LNILIKLMAV
Avp3horvu IEAGNSEHAR SLGPKGSDCH KAAVIGDTIG DPLKDTSGPS LNILIKLMAV
Consensus IEAG.SEHA. SLGPKGS..H KAAVIGDTIG DPLKDTSGPS LNILIKLMAV

751-770
Avp3arath ESLVFAPFFA THGGILFKYF
Avp3horvu ESLVFAPFFA TYGGLLFKYI
Consensus ESLVFAPFFA T.GG.LFKY.
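For a two-sequence alignment such as the Avp3arath/Avp3horvu one, the consensus row keeps a residue only where both sequences agree, and this entry also applies a 90% identity cut-off when deciding which related proteins to leave out of the alignment. The sketch below shows both calculations on positions 1-50. The function names are hypothetical, and the choice to compute identity over columns with no gap in either sequence is an assumption, since the book does not define the denominator.

```python
def pair_consensus(a, b, gap="-"):
    """Residue where both aligned rows agree, '.' everywhere else."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    return "".join(x if x == y and x != gap else "." for x, y in zip(a, b))


def percent_identity(a, b, gap="-"):
    """Identical residues as a percentage of columns without a gap in either row.

    The denominator is an assumed convention, not one stated in the book.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Positions 1-50 of the alignment in this entry, with column spacing removed.
ARATH_1_50 = "MVAPALLPELWTEILVPICAVIGIAFSLFQWYVVSRVKLTSDLGASSSGG"
HORVU_1_50 = "M---AILGELGTEILIPVCGVIGIVFAVAQWFIVSKVKVTP--GALRR-C"

if __name__ == "__main__":
    print(pair_consensus(ARATH_1_50, HORVU_1_50))
    print(round(percent_identity(ARATH_1_50, HORVU_1_50), 1))
```

The N-terminal block is the most divergent part of the alignment, so its identity is well below the figure for the full-length sequences. A cut-off like the 90% rule would be applied to the complete alignment, not to a single block.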
Proteins listed subsequently in italics are at least 90% identical to the paired transporter listed in parentheses and therefore are not listed in the alignment: Avp3betvu, Avp3vigra (Avp3arath). Residues listed in the consensus sequence are present in both aligned transporter sequences.

Database accession numbers
CODE | SWISSPROT | PIR | EMBL/GENBANK
Avp3arath | P31414 | A38230 | M81892; G166634
Avp3betvu | | | L32791
Avp3horvu | Q06572 | JC1466 | D13472; G285638
Avp3vigra | | | U31467

References
1 Sarafian, V. et al. (1992) Proc. Natl Acad. Sci. USA 89, 1775-1779.
2 Tanaka, Y. et al. (1993) Biochem. Biophys. Res. Commun. 190, 1110-1114.
Gluconate transporter family
Summary
Transporters of the gluconate transporter family, the example of which is the GNTP gluconate transporter of Bacillus subtilis (Gntpbacsu), mediate the uptake of gluconate. Members of the family are found in both gram-negative and gram-positive bacteria. Statistical analysis reveals no apparent relationship between the amino acid sequences of the gluconate transporter family and any other family of transporters. Members of the gluconate transporter family are predicted to contain 12, 13 or 14 membrane-spanning helices by the hydropathy of their amino acid sequences. Several amino acid sequence motifs are highly conserved in the gluconate transporter family.
Nomenclature, biological sources and substrates
CODE | DESCRIPTION [SYNONYMS] | ORGANISM [COMMON NAMES] | SUBSTRATE(S)
Dsdxescco | DsdX permease [DSDX] | Escherichia coli [gram-negative bacterium] | Unknown
Gntpescco | Gluconate permease [GNTP] | Escherichia coli [gram-negative bacterium] | Gluconate
Gntpbacsu | Gluconate permease [GNTP] | Bacillus subtilis [gram-positive bacterium] | Gluconate
Gntpbacli | Gluconate permease [GNTP] | Bacillus licheniformis [gram-positive bacterium] | Gluconate
Gnttescco | High-affinity gluconate transporter [GNTT, USGA, GNTM] | Escherichia coli [gram-negative bacterium] | Gluconate
Gntuescco | Low-affinity gluconate transporter [GNTU] | Escherichia coli [gram-negative bacterium] | Gluconate
Phylogenetic tree
[Figure: phylogenetic tree; leaves in the order shown: Gntpbacli, Gntpbacsu, Gnttescco, Gntpescco, Gntuescco, Dsdxescco]
Proposed orientation of GNTP in the membrane
The model is based on predictions of membrane-spanning regions and α-helical content. The N-terminus of the protein is illustrated on the inside and is folded 14 times through the membrane 1. The predicted membrane-spanning helices are portrayed as rectangles. The numbers corresponding to the first and last residue of each membrane-spanning helix are boxed. Residues that are conserved in more than 75% of the aligned transporters (see below) are shown.
[Figure: membrane topology diagram of GNTP with conserved residues marked; labels OUTSIDE, INSIDE, NH2]
Physical and genetic characteristics
CODE | AMINO ACIDS | MOL. WT | CHROMOSOMAL LOCUS
Dsdxescco | 445 | 47 163 | 53.38 minutes
Gntpescco | 447 | 47 079 | 98 minutes
Gntpbacsu | 448 | 46 655 | 351°
Gntpbacli | 448 | 46 725 |
Gnttescco | 437 | 45 923 | 76.39 minutes
Gntuescco | 446 | 46 416 | 77.0 minutes
Km: Gluconate: 25 µM
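The Km of 25 µM listed for gluconate can be read through the Michaelis-Menten relationship, v = Vmax * S / (Km + S). Because this entry gives no Vmax, the sketch below reports the rate only as a fraction of Vmax. The function name is hypothetical, and applying simple Michaelis-Menten kinetics to this transporter is an assumption made for illustration.

```python
KM_GLUCONATE_UM = 25.0  # Km for gluconate as listed in this entry, in micromolar


def fractional_rate(substrate_um, km_um=KM_GLUCONATE_UM):
    """Uptake rate as a fraction of Vmax at a given substrate concentration."""
    if substrate_um < 0:
        raise ValueError("substrate concentration cannot be negative")
    if km_um <= 0:
        raise ValueError("Km must be positive")
    return substrate_um / (km_um + substrate_um)


if __name__ == "__main__":
    for s in (5.0, 25.0, 75.0, 250.0):
        print(f"{s:6.1f} uM gluconate -> {fractional_rate(s):.2f} of Vmax")
```

At a substrate concentration equal to Km the rate is half of Vmax by definition, which is the usual way to interpret a tabulated Km.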
Multiple amino acid sequence alignments

1-50
Gntpbacli ....MPLLIV AIG.IVALLL LIMGLKLNTF VSLIIVSFGV ALALGMPLDD
Gntpbacsu ....MPLIIV ALG.ILALLF LIMGLKLNTF ISLLVVSFGV ALALGMPFDK
Gnttescco ....MPLVIV AIG.VILLLL LMIRFKMNGF IALVLVALAV GLMQGMPLDK
Gntpescco .MHVLNILWV VFG.IGLMLV LNLKFKINSM VALLVAALSV GMLAGMDLMS
Gntuescco .MTTLTLVLT AVGSVLLLLF LVMKARMHAF LALMVVSMGA GLFSGMPLDK
Dsdxescco MHSQIWVVST LLISIVLIVL TIVKFKFHPF LALLLASFFV GTMMGMGPLD
Consensus .......... ..G.....L. L....K...F ..L......V ....GM....

51-100
Gntpbacli IVKTIEEGLG GTLGHIALIF GLGAMLGRLI ADSGGAQRIA MTLVNKFGEE
Gntpbacsu VVSSIEAGIG GTLGHIALIF GLGAMLGKLI ADSGGAQRIA MTLVNKFGEK
Gnttescco VIGSIKAGVA D.VGSLALIM GFGAMLGKML ADCGGAQRIA TTLIAKFGKK
Gntpescco LLHTMKAGFG NTLGALAIIV VFGAVIGKLM VDSGAAHQIA HTLLARLGLR
Gntuescco IAATMEKGMG GTLGFLAVVV ALGAMFGKIL HETGAVDQIA VKMLKSFGHS
Dsdxescco MVNAIESGIG GTLGFLAAVI GLGTILGKMM EVSGAAERIG LTL.QRCRWL
Consensus .......G.G .TLG..A... ..GA..GK.. ...G.A..IA .TL....G..

101-150
Gntpbacli NIQWAVVIAS FIIGVALFFE VALVLLIPIV FAISKELEIS ISYLGIPMTA
Gntpbacsu NIQWAVVIAS FIIGIALFFE VGLVLLIPIV FAISRELKIS ILFLGIPMVA
Gnttescco HIQWAVVLTG FTVGFALFYE VGFVLMLPLV FTIAASANIP LLYVGVPMAA
Gntpescco YVQLSVIIIG LIFGLAMFYE VAFIMLAPLV IVIAAEAKIP FLKLAIPAVA
Gntuescco RAHYAIGLAG LVCALPLFFE VAIVLLISVA FSMARHTGTN LVKLVIPLFA
Dsdxescco SVDVIMVLVG LICGITLFVE VGVVLLIPLA FSIAKKTNTS LLKLAIPLCT
Consensus .......... ...G..LF.E V..VLL.P.V F.I....... ...L.IP..A

151-200
Gntpbacli ALSVTHGFLP PHPGPTAIAG ELGANIGEVL LYGIIVAIPT VLLAGPLFTK
Gntpbacsu ALSVTHGFLP PHPGPTAIAG EYGANIGEVL LYGFIVAVPT VLIAGPLFTK
Gnttescco ALSVTHGFLP PHPGPTAIAT IFNADMGKTL LYGTILAIPT VILAGPVYAR
Gntpescco AATTAHSLFP PQPGPVALVN AYGADMGMVY IYGVLVTIPS VICAGLILPK
Gntuescco GVAAAAAFLV PGPAPMLLAS QMNADFGWMI LIGLCAAIPG MIIAGPLWGN
Dsdxescco ALMAVHCVVP PHPAALYVAN KLGADIGSVI VYGLLVGLMA SLIGGPLFLK
Consensus A....H...P P.P.P...A. ...A..G... .YG.....P. ...AGP....

201-250
Gntpbacli LAKKIVPQSF EKMGSIASLG EQKTFKLEET PGFGISVFTA MLPVIIMSIS
Gntpbacsu FAKKIVPASF AKNGNIASLG TQKTFNLEET PGFGISVFTA MLPIIIMSVA
Gnttescco VLKGI..... .DKPIPEGLY SAKTFSEEEM PSFGVSVWTS LVPVVLMAMR
Gntpescco FLGNL..... .ERPTPSFLK ADQPVDMNNL PSFGVSILVP LIPAIIMIST
Gntuescco FISRYVELHI PDDISEPHLG EGK......M PSFGFSLSLI LLPLVLVGLK
Dsdxescco FLGQRLPF.. ..KPVPTEFA DLKVRDEKTL PSLGATLFTI LLPIALMLVK
Consensus .......... ........L. .......... P.FG.S.... ..P...M...

251-300
Gntpbacli TVITLIQETM GLADNSLLAA VRLIGNASTS MVISLLVAIY TMGIARKIPI
Gntpbacsu TIIDLLQETI GFADNGVLAF IRLIGNASTA MIISLLVAVY TMGIKRNIPV
Gnttescco AIAEMILPK. ...GHAFLPV AEFLGDPVMA TLIAVLIAMF TFGLNRGRSM
Gntpescco TIANIWLVK. ...DTPAWEV VNFIGSSPIA MFIAMVVAFV LFGTARGHDM
Gntuescco TIAARFVPE. ...GSTAYEW FEFIGHPFTA ILVACLVAIY GLAMRQGMPK
Dsdxescco TIAELNMAR. ...ESGLYIL VEFIGNPITA MFIAVFVAYY VLGIRQHMSM
Consensus .I........ .......... ...IG....A ..I...VA.. ..G.......

301-350
Gntpbacli KQVMDSCSTA ITQIGMMLLI IGGGGAFKQV LINGGVGDYV AELFKGTAMS
Gntpbacsu KTVMDSCSTA ISQIGMMLLI IGGGGAFKQV LINGGVGDYV ADLFKGTALS
Gnttescco DQINDTLVSS IKIIAMMLLI IGGGGAFKQV LVDSGVDKYI ASMMHETNIS
Gntpescco QWVMNAFESA VKSIAMVILI IGAGGVLKQT IIDTGIGDTI GMLMSHGNIS
Gntuescco DKVMEICGHA LQPAGIILLV IGAGGVFKQV LVDSGVGPAL GEALTGMGLP
Dsdxescco GTMLTHTENG FGSIANILLI IGAGGAFNAI LKSSSLADTL AVILSNMHMH
Consensus .......... ...I...LLI IG.GG.FKQ. L...G..... ..........

351-400
Gntpbacli PILLAWVIAA ILRISLGSAT VAALSTTGLV LPMLGQS... ...DVNLALV
Gntpbacsu PIILAWLIAA ILRISLGSAT VAALSTTGLV IPLLGHS... ...DVNLALV
Gnttescco PLLMAWSIAA VLRIALGSAT VAAITAGGIA APLIATT... ...GVSPELM
Gntpescco PYIMAWLITV LIRLATGQGV VSAMTAAGII SAAILDPATG QLVGVNPALL
Gntuescco IAITCFVLAA AVRIIQGSAT VACLTAVGLV MPVIEQ...L NYSGAQMAAL
Dsdxescco PILLAWLVAL ILHAAVGSAT VAMMGATAIV APMLP..... LYPDISPEII
Consensus P...AW..A. ..R...GSAT VA.....G.. .P........ ..........

401-450
Gntpbacli VLATGAGSVI ASHVNDAGFW MFKEYFGLSM KETFATWTLL ETIIAVAGLG
Gntpbacsu VLATGAGSVI ASHVNDAGFW MFKEYFGLSM KETFATWTLL ETIISVAGLG
Gnttescco VIAVGSGSVI FSHVNDPGFW LFKEYFNLTI GETIKSWSML ETIISVCGLV
Gntpescco VLATAAGSNT LTHINDASFW LFKGYFDLSV KDTLKTWGLL ELVNSVVGLI
Gntuescco SICIAGGSIV VSHVNDAGFW LFGKFTGATE AETLKTWTMM ETILGTVGAI
Dsdxescco AIAIGSGAIG CTIVTDSLFW LVKQYCGATL NETFKYYTTA TFIASVVALA
Consensus ..A...GS.. ..HVND..FW .FK.Y..... .ET...W... E.I..V.GL.

451-460
Gntpbacli FTLLLSLFV.
Gntpbacsu FILLLSLVV.
Gnttescco GCLLLNMVI.
Gntpescco IVLIISMVA.
Gntuescco VGMIAFQLLS
Dsdxescco GTFLLSFII.
Consensus ..........
Residues listed in the consensus sequence are present in at least 75% of the aligned transporter sequences.
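The 75% rule for the consensus row can be written out as a column-by-column count. The sketch below is one possible reading of that rule: a residue is reported when it occurs in at least 75% of all aligned rows, and a dot is written otherwise. The function name, the use of "." as the gap symbol, and the decision to count gapped rows in the denominator are assumptions. The test data are positions 91-100 of the gluconate transporter alignment as printed in this entry.

```python
from collections import Counter


def consensus(rows, threshold=0.75, gap="."):
    """Consensus string for equal-length aligned rows.

    A column yields its most common residue when that residue is present in
    at least `threshold` of all rows (gapped rows included in the total);
    otherwise it yields '.'.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        symbol = "."
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(rows) >= threshold:
                symbol = residue
        out.append(symbol)
    return "".join(out)


# Positions 91-100 for the six aligned gluconate family transporters.
COLUMNS_91_100 = [
    "MTLVNKFGEE",  # Gntpbacli
    "MTLVNKFGEK",  # Gntpbacsu
    "TTLIAKFGKK",  # Gnttescco
    "HTLLARLGLR",  # Gntpescco
    "VKMLKSFGHS",  # Gntuescco
    "LTL.QRCRWL",  # Dsdxescco
]

if __name__ == "__main__":
    print(consensus(COLUMNS_91_100))
```

With six sequences, 75% means a residue must appear in at least five rows. That is why a column such as position 97, where F occurs in four of the six rows, is left as a dot.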
Database accession numbers
CODE | SWISSPROT | EMBL/GENBANK
Dsdxescco | P08555 | X91821; X86379
Gntpescco | P39373 | X91735; U14003
Gnttescco | P39835 | M32793; U18997
Gntpbacsu | P12012 | D45242; J02584
Gntpbacli | P46832 | D31631
Gntuescco | P46858 | U18997
PIR: A26949; A28674; A26190; JC2305
References
1 Klemm, P. et al. (1996) J. Bacteriol. 178, 61-67.
A
ABC-associated binding proteindependent iron transporter family, 214-19 database accession numbers, 219 multiple amino acid alignments, 216-19 nomenclature, biological sources and substrates, 214 phylogenetic tree, 215 physical and genetic characteristics, 215-16 proposed orientation of BtuC in the membrane, 215 summary, 214 supplemental references and reviews, 219 ABC-associated binding proteindependent maltose transporter family, 204- 7 database accession numbers, 207 multiple amino acid alignments, 206-7 nomenclature, biological sources and substrates, 204 phylogenetic tree, 205 physical and genetic characteristics, 206 proposed orientation of MALG in the membrane, 205 summary, 204 supplemental references and reviews, 207 ABC-associated binding proteindependent peptide transporter family, 208-13 database accession numbers, 213 multiple amino acid alignments, 210-12 nomenclature, biological sources and substrates, 208-9 phylogenetic tree, 209 physical and genetic characteristics, 210 proposed orientation of DPPC in the membrane, 209-10 summary, 208 supplemental references and reviews, 213
ABC binding protein-dependent transporters: cytoplasmic elements, 221 ABC multidrug resistance proteins, 113 ABC transporter superfamily, 36 ABC-1 and -2 transporter family, 121-5 database accession numbers, 125 multiple amino acid alignments, 122-25 nomenclature, biological sources and substrates, 121 physical and genetic characteristics, 122 proposed orientation of ABC 1 in the membrane, 121-2 summary, 121 supplemental references and reviews, 125 ABC-2 associated {cytoplasmic} protein family, 194-201 database accession numbers, 201 multiple amino acid alignments, 197-200 nomenclature, biological sources and substrates, 194-5 phylogenetic tree, 196 physical and genetic characteristics, 196-7 summary, 194 supplemental references and reviews, 201 ABC-2 nodulation protein family, 186-9 database accession numbers, 189 multiple amino acid alignments, 188-9 nomenclature, biological sources and substrates, 186 phylogenetic tree, 187 physical and genetic characteristics, 188 proposed orientation of NOD] in the membrane, 187-8 summary, 186 supplemental references and reviews, 189 ABC-2 polysaccharide exporter family, 190-3 database accession numbers, 193 multiple amino acid alignments, 192-3
191
ABC-2 polysaccharide exporter family continued
nomenclature, biological sources and substrates, 190 phylogenetic tree, 191 physical and genetic characteristics, 192 proposed orientation of KPM1 in the membrane, 191-2 summary, 190 supplemental references and reviews, 193 ABC-2 transporters, 185 Acriflavin-cation resistance family, 364-9 database accession numbers, 369 multiple amino acid alignments, 366-9 nomenclature, biological sources and substrates, 364 phylogenetic tree, 364 physical and genetic characteristics, 365 proposed orientation of ACRB in the membrane, 365 summary, 364 supplemental references and reviews, 369 Active transport, 6- 7 ALIGN, 34, 35 Alphanumeric code, 35 Amino acid sequences, 26, 30-3 databases, 30 of membrane transport proteins, 10, 30 Anion exchanger family, 446-53 database accession numbers, 453 multiple amino acid alignments, 448-53 nomenclature, biological sources and substrates, 446 phylogenetic tree, 447 physical and genetic characteristics, 448 proposed orientation of AE 1 in the membrane, 447 summary, 446 supplemental references and reviews, 453 Antiporter motif, 32
192
ATP binding cassette (ABC)transporter superfamily, 32 B
Bacillus subtilis 3 7
Bacterial cell membrane, transport mechanisms, 16 Bacterial transport proteins, 11 Bacterial transport systems, not ionlinked, 9-10 Binding protein-dependent monosaccharide transporter family, 222-6 database accession numbers, 226 multiple amino acid alignments, 223-6 nomenclature, biological sources and substrates, 222 phylogenetic tree, 223 physical and genetic characteristics, 223 summary, 222 supplemental references and reviews, 226 Binding protein-dependent peptide transporter family, 227-49 database accession numbers, 248-9 multiple amino acid alignments, 234-47 nomenclature, biological sources and substrates, 227-31 phylogenetic tree, 23 1-2 physical and genetic characteristics, 233 -4 summary, 227 supplemental references and reviews, 249 Binding protein-dependent transporters: transmembrane elements, 203 BLASTP, 30, 34, 35 C Calcium-transporting ATPase family, 42-7 database accession numbers, 47 multiple amino acid alignments, 44-7 nomenclature, biological sources and substrates, 42 phylogenetic tree, 42 physical and genetic characteristics, 43
Index proposed orientation of ATCX in the membrane, 43 summary, 42 supplemental references and reviews, 47 Cell wall penetration by solutes, 10-14 Chemiosmotic Hypothesis, 9 Chemiosmotic Theory, 8-9, 14, 16 Cystic fibrosis transmembrane conductance regulator family, 135-41 database accession numbers, 141 multiple amino acid alignments, 137-41 nomenclature, biological sources and substrates, 135 phylogenetic tree, 136 physical and genetic characteristics, 137 proposed orientation of CFTR in the membrane, 136- 7 summary, 135 supplemental references and reviews, 141 D
Database accession numbers, 3 7 DNA sequence databases, 26 E
Electron diffraction techniques, 26 Escherichia coli 17, 18, 20, 35, 3 7 Eukaryotic cell membranes, 17 F
Facilitated diffusion, 5-6 Families, grouping, 35 Family definition, 34-5 FASTA, 30, 34, 35 L-fucose-H § symport protein, 20, 22, 23 G Genetic characteristics, 3 7 Gluconate transporter family, 486-9 database accession numbers, 489 multiple amino acid alignments, 487-9
nomenclature, biological sources and substrates, 486
phylogenetic tree, 486 physical and genetic characteristics, 487 proposed orientation of GNTP in the membrane, 486- 7 summary, 486 supplemental references and reviews, 489 GLUT1 glucose transport protein, 6, 10 Gram-negative bacterial cell envelope, 15 Group translocation, 7 H
H+/amino acid symporter family, 290-300 database accession numbers, 300 multiple amino acid alignments, 293-300 nomenclature, biological sources and substrates, 290-1 phylogenetic tree, 291 physical and genetic characteristics, 292 proposed orientation of PHEP in the membrane, 291-2 summary, 290 supplemental references and reviews, 300 H+/carboxylate symporter family, 320-5 database accession numbers, 325 multiple amino acid alignments, 322-5 nomenclature, biological sources and substrates, 320 phylogenetic tree, 321 physical and genetic characteristics, 322 proposed orientation of KGTP in the membrane, 321 summary, 320 supplemental references and reviews, 325 H+-dependent antiporters, 335 H§ symporters, 261 H§ symporter family, 317-19 database accession numbers, 319 multiple amino acid alignments, 318-9 nomenclature, biological sources and substrates, 317
193
H§ symporter family continued phylogenetic tree, 317 physical and genetic characteristics, 317 proposed orientation of FUCP in the membrane, 317-8 summary, 317 supplemental references and reviews, 319 H§ symporter family, 305-9 database accession numbers, 309 multiple amino acid alignments, 307-9 nomenclature, biological sources and substrates, 305 phylogenetic tree, 306 physical and genetic characteristics, 306 proposed orientation of MELB in the membrane, 306 summary, 305 supplemental references and reviews, 309 H§ symporter family, 301-4 database accession numbers, 304 multiple amino acid alignments, 302-4 nomenclature, biological sources and substrates, 301 phylogenetic tree, 301 physical and genetic characteristics, 302 proposed orientation of LACY in the membrane, 302 summary, 301 supplemental references and reviews, 304 H+/nucleotide symporter family, 326-8 database accession numbers, 328 multiple amino acid alignments, 327 nomenclature, biological sources and substrates, 326 physical and genetic characteristics, 327 proposed orientation of NUPC in the membrane, 326-7 summary, 326 H§ symporter family, 310-16
194
database accession numbers, 316 multiple amino acid alignments, 312-6 nomenclature, biological sources and substrates, 310 phylogenetic tree, 311 physical and genetic characteristics, 311-12 proposed orientation of PET1 in the membrane, 311 summary, 310 H+/rhamnose symporter family, 288-9 database accession numbers, 289 multiple amino acid alignments, 289 nomenclature, biological sources and substrates, 288 physical and genetic characteristics, 289 proposed orientation of RHAT in the membrane, 288 summary, 288 supplemental references and reviews, 289 H§ family, 262-87 database accession numbers, 286-7 multiple amino acid alignments, 269-86 nomenclature, biological sources and substrates, 262-5 phylogenetic tree, 266 physical and genetic characteristics, 267-9 proposed orientation of human GLUT1 in the membrane, 267 summary, 262 supplemental references and reviews, 287 H§ amine antiporter family, 336-40 database accession numbers, 340 multiple amino acid alignments, 338-40 nomenclature, biological sources and substrates, 336 phylogenetic tree, 337 physical and genetic characteristics, 338 proposed orientation of VAT2 in the membrane, 33 7
summary, 336 supplemental references and reviews, 340 Haemophilus influenzae, 3 7 Heavy metal-transporting ATPase family, 88-102 database accession numbers, 102 multiple amino acid alignments, 92-101 nomenclature, biological sources and substrates, 88-9 phylogenetic tree, 90 physical and genetic characteristics, 91 proposed orientation of AT7A in the membrane, 90-1 summary, 88 supplemental references and reviews, 102 4-Helix H+/multidrug antiporter family, 353-6 database accession numbers, 355 multiple amino acid alignments, 355 nomenclature, biological sources and substrates, 353 phylogenetic tree, 354 physical and genetic characteristics, 354 proposed orientation of EBR in the membrane, 354 summary, 353 supplemental references and reviews, 356 12-Helix H+/multidrug antiporter family, 357-63 database accession numbers, 362 multiple amino acid alignments, 359-62 nomenclature, biological sources and substrates, 357-8 phylogenetic tree, 358 physical and genetic characteristics, 359 proposed orientation of TETA{C} in the membrane, 358-9 summary, 357 supplemental references and reviews, 362-3 14-Helix H+/multidrug antiporter family, 341-52
database accession numbers, 352 multiple amino acid alignments, 345-52 nomenclature, biological sources and substrates, 341-3 phylogenetic tree, 343 physical and genetic characteristics, 344-5 proposed orientation of QACA in the membrane, 343-4 summary, 341 supplemental references and reviews, 352 Heme exporter family, 252-4 database accession numbers, 254 multiple amino acid alignments, 253-4 nomenclature, biological sources and substrates, 252 phylogenetic tree, 253 physical and genetic characteristics, 253 summary, 252 supplemental references and reviews, 254 Hydrophobic amino acids, 19 I
Inner cell membrane, penetration by solutes, 14 Inner membranes, proton transport across, 14-17 Intramolecular amino acid sequence comparisons, 32 L LACY, 32-3 M
Macrolide-streptogramin-tylosin resistance family, 255-9 database accession numbers, 258-9 multiple amino acid alignments, 256-8 nomenclature, biological sources and substrates, 255 phylogenetic tree, 256 physical and genetic characteristics, 256 summary, 255 supplemental references and reviews, 259
Major facilitator (MFS) superfamily, 31 Membrane protein components and/or domains involved in transport system, 17-18 Membrane spanning peptides, 26 Membrane transport kinetics, 24 Membrane transport proteins amino acid sequences, 10, 30 families, 12-14, 34-35 function and structure, 4-29 hydropathic profiles, 20-21 predicted to contain, 12 transmembrane domains, 19-20 three-dimensional structures, 26 Membrane transport systems classification, 7-10 classification according to amino acid sequences of their proteins, 10 Michaelis-Menten relationship, 5 Mitochondrial adenine nucleotide translocator family, 454-68 database accession numbers, 467-8 multiple amino acid alignments, 460-7 nomenclature, biological sources and substrates, 454- 7 phylogenetic tree, 458 physical and genetic characteristics, 459-60 proposed orientation of ANTI in the membrane, 458-9 summary, 454 supplemental references and reviews, 468 Mitochondrial phosphate carrier family, 469-71 database accession numbers, 471 multiple amino acid alignments, 470-1 nomenclature, biological sources and substrates, 469 phylogenetic tree, 469 physical and genetic characteristics, 470 proposed orientation of PHC in the membrane, 469-70 summary, 469 supplemental references and reviews, 471 Multicomponent transport system proteins, 18
19(
Multiple amino acid sequence alignments, 31, 3 7 Multiple sequence alignments, 31 N
N+-dependent symporters, 375 Na§ symporter family, 411-13 database accession numbers, 413 multiple amino acid alignments, 412-3 nomenclature, biological sources and substrates, 411 physical and genetic characteristics, 412 proposed orientation of ACP in the membrane, 411-2 summary, 411 Na+/branched amino acid symporter family, 404-7 database accession numbers, 407 multiple amino acid alignments, 405-7 nomenclature, biological sources and substrates, 404 phylogenetic tree, 404 physical and genetic characteristics, 405 proposed orientation of BRNQ in the membrane, 405 summary, 404 supplemental references and reviews, 407 Na+/Ca 2§ exchanger family, 376-9 database accession numbers, 379 multiple amino acid alignments, 377-9 nomenclature, biological sources and substrates, 3 76 physical and genetic characteristics, 377 proposed orientation of NAC 1 in the membrane, 377 summary, 3 76 supplemental references and reviews, 379 Na+/citrate symporter family, 408-10 database accession numbers, 410 multiple amino acid alignments, 409-10
nomenclature, biological sources and substrates, 408 phylogenetic tree, 408 physical and genetic characteristics, 409 proposed orientation of CITN in the membrane, 408-9 summary, 408 Na§ symporter family, 392-9
database accession numbers, 399 multiple amino acid alignments, 395-8 nomenclature, biological sources and substrates, 392-3 phylogenetic tree, 393 physical and genetic characteristics, 394-5 proposed orientation of DCTA in the membrane, 394 summary, 392 supplemental references and reviews, 399
Na§ symporter family, 385-91 database accession numbers, 391 multiple amino acid alignments, 387-91 nomenclature, biological sources and substrates, 385-6 phylogenetic tree, 386 physical and genetic characteristics, 387 proposed orientation of SGLT1 in the membrane, 386-7 summary, 385 supplemental references and reviews, 391 Na§ § antiporter family, 428-34 database accession numbers, 434 multiple amino acid alignments, 430-4 nomenclature, biological sources and substrates, 428 phylogenetic tree, 428-9 physical and genetic characteristics, 429-3O proposed orientation of NHE 1 in the membrane, 429 summary, 428 supplemental references and reviews, 434
Na§
symporter family, 414-26 database accession numbers, 425 multiple amino acid alignments, 418-25 nomenclature, biological sources and substrates, 414-6 phylogenetic tree, 416 physical and genetic characteristics, 417-8 proposed orientation of NET1 in the membrane, 416-7 summary, 414 supplemental references and reviews, 425-6 Na+/PO4 symporter family, 400-3 database accession numbers, 403 multiple amino acid alignments, 401-3 nomenclature, biological sources and substrates, 400 phylogenetic tree, 400 physical and genetic characteristics, 401 proposed orientation of NPT1 in the membrane, 400-1 summary, 400 Na§ symporter family, 380-4 database accession numbers, 383 multiple amino acid alignments, 381-3 nomenclature, biological sources and substrates, 380 phylogenetic tree, 380 physical and genetic characteristics, 381 proposed orientation of PUTP in the membrane, 381 summary, 380 supplemental references and reviews, 384 Na+-dependent antiporters, 427 Nitrate transporter I family, 472-5 multiple amino acid alignments, 473-4 nomenclature, biological sources and substrates, 472 phylogenetic tree, 472 physical and genetic characteristics, 473
197
Nitrate transporter I family continued proposed orientation of NARK in the membrane, 472-3 summary, 472 Nitrate transporter II family, 476-8 database accession numbers, 478 multiple amino acid alignments, 477-8 nomenclature, biological sources and substrates, 476 physical and genetic characteristics, 477 proposed orientation of CRNA in the membrane, 476 summary, 476 Nucleotide binding proteins, 32 O Oxidative phosphorylation, 8-9 P
P-Glycoprotein transporter family, 142-78 database accession numbers, 177-78 multiple amino acid alignments, 149-77 nomenclature, biological sources and substrates, 142-45 phylogenetic tree, 146 physical and genetic characteristics, 147-48 proposed orientation of MDR1 in the membrane, 147 summary, 142 supplemental references and reviews, 178 Passive diffusion, 5 PEP-dependent phosphotransferase family, 435 Peroxisomal membrane transporter family, 179-84 database accession numbers, 184 multiple amino acid alignments, 181-84 nomenclature, biological sources and substrates, 179-80 phylogenetic tree, 180 physical and genetic characteristics, 181
  proposed orientation of ALD in the membrane, 180-81
  summary, 179
  supplemental references and reviews, 184
Phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) family, 436-44
  database accession numbers, 444
  multiple amino acid alignments, 438-43
  nomenclature, biological sources and substrates, 436-7
  phylogenetic tree, 437
  physical and genetic characteristics, 438
  proposed orientation of PTGAB in the membrane, 437-8
  summary, 436
  supplemental references and reviews, 444
Phospholipid bilayer membrane, 10
Phosphoryl transfer, 32
Phylogenetic trees, 36
Physical characteristics, 37
PILEUP, 30, 36, 37
Plasma membrane cation-transporting ATPase family, 48-87
  database accession numbers, 86-7
  multiple amino acid alignments, 56-86
  nomenclature, biological sources and substrates, 48-52
  phylogenetic tree, 52
  physical and genetic characteristics, 54-5
  proposed orientation of ATC1 in the membrane, 53-4
  summary, 48
Primary active transport, 6
Prokaryotic cell membranes, 10-17
Proton transport across inner membranes, 14-17
Proton-coupled mechanisms, 16
Proton-motive force, 15, 16
P-type ATPases, 41
R
L-Rhamnose-H+ symport protein, 21
S
Saccharomyces cerevisiae, 37
Salmonella typhimurium, 17
Secondary active transport, 6
Signature motifs, 31
Sodium-linked transport systems, 8
Solute translocating proteins, 25
Solute translocation catalysis, 24-6
Solute translocation mechanisms, 24-6
Spore germination transporter family, 479-81
  database accession numbers, 480
  multiple amino acid alignments, 480
  nomenclature, biological sources and substrates, 479
  physical and genetic characteristics, 480
  proposed orientation of the GRAII in the membrane, 479
  summary, 479
  supplemental references and reviews, 481
Substrate transport mechanism, chemiosmotic view, 9
Sugar phosphate transporter family, 329-34
  database accession numbers, 334
  multiple amino acid alignments, 331-4
  nomenclature, biological sources and substrates, 329
  phylogenetic tree, 330
  physical and genetic characteristics, 331
  proposed orientation of UHPT in the membrane, 330
  summary, 329
  supplemental references and reviews, 334
Supplemental references and reviews, 37
T
Tetracycline, 32
Topology plots, 36
Transporters, C-terminal halves, 33
Transporters, N-terminal halves, 33
U
Uniporter-symporter-antiporter (USA), 31
USA/MFS superfamily, 31, 33
V
Vacuolar ATPase family, 104-11
  database accession numbers, 111
  multiple amino acid alignments, 107-11
  nomenclature, biological sources and substrates, 104
  phylogenetic tree, 105
  physical and genetic characteristics, 106
  proposed orientation of VPH1 in the membrane, 105-6
  summary, 104
  supplemental references and reviews, 111
Vacuolar membrane pyrophosphatase family, 482-5
  database accession numbers, 485
  multiple amino acid alignments, 483-5
  nomenclature, biological sources and substrates, 482
  physical and genetic characteristics, 483
  proposed orientation of AVP3 in the membrane, 482-3
  summary, 482
  supplemental references and reviews, 485
W
Walker A and B motifs, 32
White transporter family, 114-20
  database accession numbers, 120
  multiple amino acid alignments, 116-120
  nomenclature, biological sources and substrates, 114
  phylogenetic tree, 115
  physical and genetic characteristics, 116
  proposed orientation of white protein in the membrane, 115-6
  summary, 114
  supplemental references and reviews, 120
X
X-ray crystallography, 26
Y
Yeast multidrug resistance family, 126-34, 370-3
  database accession numbers, 134, 373
  multiple amino acid alignments, 129-33, 371-3
  nomenclature, biological sources and substrates, 126, 370
  phylogenetic tree, 126-7, 370
  physical and genetic characteristics, 128, 371
  proposed orientation of BMR in the membrane, 370-1
  proposed orientation of CDR1 in the membrane, 127-8
  summary, 126, 370
  supplemental references and reviews, 134, 373