NON-COVALENT INTERACTIONS IN PROTEINS
Andrey Karshikoff
Imperial College Press
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by Imperial College Press, 57 Shelton Street, Covent Garden, London WC2H 9HE
Distributed by World Scientific Publishing Co. Pte. Ltd., 5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
NON-COVALENT INTERACTIONS IN PROTEINS Copyright © 2006 by Imperial College Press All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 1-86094-707-7
Printed in Singapore by Mainland Press
To my wife Danuta
Contents
Preface
1. Introduction
1.1 Some Historical Notes
1.2 Overview of Protein Structural Elements and Basic Definitions
1.2.1 The amino acids
1.2.2 The polypeptide chain
1.3 Non-covalent Interactions and Structure-Function Relationships in Proteins
1.3.1 Some comments on Anfinsen's dogma
1.3.2 Experimental measurements of non-covalent interactions in proteins
References
2. Van der Waals Interactions
2.1 Observation of van der Waals Interactions
2.2 Nature of van der Waals Interactions
2.2.1 Dispersion forces
2.2.2 Dipole-dipole interactions
2.2.3 Dipole-induced dipole interactions
2.2.4 Repulsive interactions
2.3 Potential Functions for Application in Proteins
2.4 Approximation for Polyatomic Systems
References
3. Hydrogen Bonds
3.1 Nature of Hydrogen Bonds
3.1.1 Proton donors, electronegativity
3.1.2 Proton acceptors
3.2 Geometry and Strength of Hydrogen Bonds
3.2.1 Directionality
3.2.2 Hydrogen bond length
3.2.3 Hydrogen bond strength
3.2.4 Hydrogen bond potential functions
3.3 Hydrogen Bonds in Proteins
3.3.1 Secondary structure elements
3.3.2 Hydrogen bonds involving side chains
3.3.3 Salt bridges
3.3.4 Hydrogen bond networks
3.4 Hydrogen Bonds and Protein Stability
3.4.1 Hydrogen bonds within the polypeptide chain, role in folding
3.4.2 Hydrogen bonds involving side chain, role in stability
References
4. Hydrophobic Interactions
4.1 Nature of Hydrophobic Interactions, Pseudo Forces
4.2 Water
4.2.1 Flickering clusters model of water
4.2.2 Hydrocarbons in water, iceberg model
4.3 Hydrophobic Effect
4.3.1 Oil drop in water
4.3.2 Experimental assessment of hydrophobic interaction
4.4 Hydrophobic Interactions in Proteins
4.4.1 Additivity of hydrophobic interactions
4.4.2 Solvent accessibility
4.4.3 Evaluation of hydrophobic interactions
4.4.4 Size of the hydrophobic core
4.4.5 Hydrophobic packing and packing defects
References
5. Electrostatic Interactions
5.1 Debye-Hückel Theory
5.1.1 Poisson-Boltzmann equation
5.1.2 Parameter of Debye
5.1.3 The electrostatic potential of an ion in solution
5.1.4 Extension for proteins
5.2 Ion-Solvent Interactions
5.2.1 Born model
5.2.2 Application of the Born model for proteins: why do charges tend to be on the protein surface?
5.2.3 Generalised Born theory for proteins
5.3 Calculation of Electrostatic Interactions in Proteins
5.3.1 The protein molecule as a dielectric material
5.3.2 Dielectric model for calculation of electrostatic interactions in proteins
5.3.3 Numerical solution of the Poisson-Boltzmann equation, finite difference method
5.3.4 Boundary conditions
5.3.5 Electrostatic potential calculated by means of the finite difference method
References
6. Ionisation Equilibria in Proteins
6.1 Why Does One Need to Know Ionisation Equilibria?
6.2 Basic Definitions
6.2.1 Protonation/deprotonation equilibria
6.2.2 Henderson-Hasselbalch equation
6.2.3 Degree of deprotonation and degree of protonation
6.2.4 Ionisation equilibrium constants of model compounds
6.3 Factors Determining Ionisation Equilibria in Proteins
6.3.1 Desolvation
6.3.1.1 Born energy
6.3.1.2 Calculation of the Born energy
6.3.2 Interactions with the protein permanent charges
6.3.3 Definition of intrinsic pK
6.3.4 Charge-charge interactions
6.4 Combinatorial Problem
6.4.1 Solution based on the Boltzmann weighted sum
6.4.2 Solution based on the Monte Carlo simulation
6.5 Cooperative Ionisation
References
7. Conformational Flexibility
7.1 Allocation Variation of Polar Hydrogen Atoms
7.1.1 Titratable and pH-sensitive sites
7.1.2 Microscopic pK
7.1.3 Population of the microscopic states
7.2 Examples for pH-Dependent Hydrogen Bonding
7.2.1 Ionisation properties of Asp76 in ribonuclease T1
7.2.2 Hydrogen bond rearrangement related to protein function
7.3 Conformational Flexibility Involving Non-hydrogen Atoms
7.3.1 Conformations generated by means of molecular dynamics simulation
7.3.2 Average pK values
7.3.3 Desolvation and charge-dipole energy compensation
7.3.4 Dynamics of salt bridges
References
8. Electrostatic Interactions and Stability of Proteins
8.1 Definitions
8.2 Unfolding Induced by pH
8.3 Modelling of Unfolded Proteins
8.3.1 Spherical model of unfolded proteins
8.3.2 Size of the dielectric sphere
8.3.3 Average distance between charges
8.3.4 Ionisation equilibria in unfolded proteins
8.4 Thermal Stability of Proteins
References
Appendix A Basic Definitions of Thermodynamics and Statistical Thermodynamics
Appendix B Electric Dipoles
Appendix C Solution of Laplace and Poisson-Boltzmann Equation
Index
Preface
This book represents the essential part of the course "Non-covalent Interactions in Proteins: Structure, Stability, Function", given as part of the "Postgraduate Program in Nanobiology and Biological Physics" at Karolinska Institutet, Stockholm. As Karolinska Institutet is a medical university, one could expect the course to be adapted for students with a background in the biological sciences. This is only partially true. Because the course is regularly attended by students from other universities in Stockholm, as well as from Uppsala University and Linköping University, its content is adapted to students of different backgrounds and interests. Textbooks on condensed matter physics consider non-covalent interactions in detail; however, their application to the analysis of protein properties is often poorly presented or missing. On the other hand, books on biochemistry, molecular modelling or molecular simulation introduce these interactions in the context of the corresponding topic, which sometimes means that explanations of their nature are left out. The aim of the present book is to unite the treatment of non-covalent interactions with the specifics of their application in protein science in a single volume. This includes comments on the nature of the different interactions and their manifestation in protein properties, derivations of the formulae most frequently used for the analysis of non-covalent interactions in proteins, and the methods for their calculation. Although the derivations of the various formulae can be found in specialised textbooks, here they are presented step by step, sometimes to a level that might look trivial. The purpose of this is to diminish the unnecessary fear of mathematics that some students have inherited
from their previous education. In this way, the book can be a useful aid for students of biology, biochemistry or biomedicine who want to extend their knowledge of how protein properties are described on a molecular level. At the same time, the present book can help students of physics or chemistry who are interested in biology and biophysics. Attention is paid to terminology, which is sometimes used differently in different scientific disciplines, leading to ambiguity and misunderstanding. To bring the material closer to the everyday language of the biological sciences, and hence to the intuition of the reader, some terms do not strictly meet the requirements of the rigorous canons of physics. For instance, temperature is given in degrees Celsius, although in thermodynamics the absolute temperature must be used. Hopefully, this can help the inexperienced reader to sharpen his or her attention when reading the scientific literature, where the two temperature scales are used with comparable frequency. For the same reason, energies are given in calories (cal/mol or kcal/mol) instead of joules (J/mol or kJ/mol). The literature cited refers to works which, to the best of the author's knowledge, are pioneering in the corresponding field. Last, but not least, the author would like to acknowledge the encouragement and sincere support of Prof. Rudolf Ladenstein during the preparation of the material. The author especially thanks Associate Professor Vladimir Pericliev for his valuable help in the preparation of the manuscript.
Andrey Karshikoff
Chapter 1
Introduction
Non-covalent interactions are weak interactions between atoms or molecules in which no chemical reaction takes place. Because no chemical bonds are formed or broken, non-covalent interactions are often called non-bonded interactions. Formally, we distinguish three types of non-covalent interactions. The most common are the van der Waals interactions. They are short-range interactions and always occur when two atoms or molecules come close to each other. We define short-range interactions as those which become relevant at distances comparable with the size of the interacting atoms. In this way, practically only neighbouring atoms are involved in these interactions. Hydrogen bonds are interactions at the boundary between chemical bonds and non-covalent interactions. They take place between a pair of atoms only if one of them is a proton donor and the other is a proton acceptor. Electrostatic interactions are the third type of non-covalent interactions. In contrast to the other two types, electrostatic interactions are long-range ones. This means that they remain relevant beyond the limits of the closest neighbours, which makes their description somewhat more complicated. Therefore, special attention will be paid to these interactions.
Proteins became a subject of intensive investigation as part of colloid chemistry, since a number of their physical properties, such as sedimentation, diffusion, viscosity, light scattering and many others, are similar to those of colloid particles. Colloid particles are molecular aggregates held together by a delicate balance of attractive and repulsive forces, all resulting from the non-covalent interactions between the molecules comprising the colloid system. Let us set aside for
the moment all we know about proteins and glance at the molecule presented in Fig. 1.1. This is ribonuclease T1, a small protein which binds and cleaves ribonucleic acids. The similarity of the molecule to a typical colloid particle is manifested in at least two aspects. First, the molecule looks like an aggregate of atoms. Second, the surface of the molecule is rich in charges: depending on the physical conditions of the solution, the oxygen (red spheres) and nitrogen (blue spheres) atoms may be negatively or positively charged, respectively, or may carry partial charges due to delocalisation of their electron clouds. As will be shown below, the formation of the compact body seen in the figure, as well as the exposure of charges on the surface of the molecule, are governed by the same forces responsible for the formation of colloid particles, namely the complex action of non-covalent interactions of different types. Proteins are not colloid particles, however, but molecules whose properties underlie all known forms of living matter. Proteins bind and transport organic and inorganic compounds, thereby regulating physiological processes, or catalyse chemical reactions. These properties of proteins are referred to as functional properties. In the lower panel of Fig. 1.1, the complex of ribonuclease T1 with the inhibitor guanylyl-2',5'-guanosine is shown. As seen there, the inhibitor molecule is situated in a cleft formed by the protein. This cleft is the active site, i.e. the site where the substrate ribonucleic acid binds and the catalytic reaction takes place. Its shape matches the size and conformation of the substrate or the inhibitor. In this way, the active site facilitates binding and at the same time makes it specific: compounds with a different chemical composition or in an "inappropriate" conformation do not bind.
Another important feature, not illustrated in the figure, is that the atoms forming the cleft create a microenvironment that facilitates the catalytic reaction when the substrate binds. Thus, the active sites, as well as the rest of the molecule, are not just aggregates of atoms, as a first glance at the molecule might suggest, but organised structures. Even small changes of this organisation may diminish or abolish the function of the protein molecule.
Figure 1.1 Three-dimensional structure of ribonuclease T1 obtained by X-ray crystallography and deposited in the Protein Data Bank1. The atoms are represented by spheres corresponding to their van der Waals radii and coloured according to their type: grey (carbon), red (oxygen), blue (nitrogen) and yellow (sulphur, partially seen at the right-hand side of the molecule). These colours will be used in all other figures, unless otherwise stated. The hydrogen atoms are omitted. Upper panel: inhibitor-free form of ribonuclease T1. Lower panel: complex of ribonuclease T1 with the inhibitor guanylyl-2',5'-guanosine. The inhibitor molecule is shown as green sticks so that the active-site cleft of the protein is clearly visible. All colour molecular images were produced using The PyMOL Executable Build (2005), DeLano Scientific LLC, South San Francisco, CA, USA, unless otherwise stated.
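The distinction drawn at the start of this chapter between short-range and long-range interactions can be made concrete numerically. The sketch below is only an illustration, not a protein energy function. It assumes an attractive dispersion term that falls off as r^-6 (the distance dependence of van der Waals attraction) and a Coulomb term that falls off as r^-1, both normalised to 1 at an assumed contact distance r0 of 4 Å. The function names and the value of r0 are hypothetical choices made for illustration. In real proteins the magnitudes also depend on the dielectric environment and on repulsive terms.

```python
"""Compare how fast a short-range (r**-6) and a long-range (r**-1) pair
interaction decay with distance. Both are normalised to 1 at contact.
Illustrative sketch only; r0 and the function names are assumptions."""


def relative_dispersion(r: float, r0: float) -> float:
    """Dispersion-type attraction ~ r**-6, relative to its value at r0."""
    return (r0 / r) ** 6


def relative_coulomb(r: float, r0: float) -> float:
    """Coulomb-type interaction ~ r**-1, relative to its value at r0."""
    return r0 / r


if __name__ == "__main__":
    r0 = 4.0  # assumed contact distance in angstroms (illustrative only)
    for multiple in (1, 2, 3, 5):
        r = multiple * r0
        print(f"r = {r:4.1f} A  dispersion: {relative_dispersion(r, r0):.5f}"
              f"  coulomb: {relative_coulomb(r, r0):.3f}")
```

At twice the contact distance the r^-6 term has fallen to 1/64 (about 1.6%) of its contact value, while the Coulomb term still retains half of it. This is why practically only neighbouring atoms contribute to van der Waals interactions, whereas electrostatic interactions reach beyond the closest neighbours.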
1.1 Some Historical Notes
The first idea about the structure of proteins was put forward by Gerardus Johannes Mulder. In his famous paper2 "Über die Zusammensetzung einiger thierischen Substanzen" ("On the Composition of Some Animal Substances"), Mulder investigated the atomic composition of three "albuminous substances", as proteins were then called, and noticed that sulphur and phosphorus are bound to an organic body with the composition C₄₀₀H₆₂₀N₁₀₀O₁₂₀. He named it protein, from the Greek πρωτεῖος (primary), and proposed the formulae protein + SP for fibrin and egg albumin, and protein + 2SP for serum albumin. In this way, Mulder separated the organic part, the protein, from the inorganic atoms sulphur and phosphorus. At that time, only two amino acids, glycine and leucine, had been identified in proteins. The sulphur-containing amino acids cysteine and methionine were not yet known, so separating the sulphur from the protein is entirely understandable. By the end of the 19th century the list of amino acids constituting proteins was almost complete. Still, the structure of the protein bodies was unclear. The breakthrough in the understanding of protein structure came with the hypothesis, proposed independently by Franz Hofmeister3 and by Emil Fischer4, that amino acids in proteins are linked through repeating peptide bonds. Remarkably, Hofmeister and Fischer reported their ideas on the same day, at the "74th Annual Meeting of the Gesellschaft Deutscher Naturforscher und Ärzte" on September 22, 1902 in Karlsbad (today Karlovy Vary, Czech Republic). The hypothesis of the polypeptide nature of proteins became dominant during the following decades. The recognition of all 20 amino acids as protein building blocks was completed with the isolation of threonine in 1938. Still, little was known about the spatial organisation of the amino acids in protein molecules.
The fact that the proteins most commonly used for experimental studies, the globular proteins, exhibit some properties typical of hydrophilic colloid particles (they are water-soluble, charged particles with a compact structure) stimulated investigations of electrostatic interactions. In 1924 Linderstrøm-Lang5 proposed a theory for predicting the hydrogen ion titration curves
of proteins. In this theory, the protein molecule is represented as an impenetrable sphere on whose surface the charges of the titratable amino acid groups are uniformly distributed. From today's point of view this is a rather rough approximation. Nevertheless, the theory of Linderstrøm-Lang was successfully applied to predict titration curves of proteins, and it became the basis of all subsequent models of electrostatic interactions in proteins, known as continuum dielectric models. A valuable contribution to the understanding of the ionisation behaviour of amino acids and proteins was made by Kirkwood through his work on the dissociation constants of organic acids and zwitterions. Based on the same assumption that the protein is an impenetrable sphere, the model was extended by representing the charges of the titratable groups as point charges6-8. Because protein structure was not known, the positions of the point charges could not be defined, which largely limited the direct application of the theory to proteins. Although electrostatic interactions were the first non-covalent interactions to attract the attention of scientists, and comprehensive theoretical approaches were subsequently developed with success, the theoretical description of electrostatic interactions and the evaluation of their role in the physico-chemical and functional properties of proteins still face difficulties today.
Experimental observations convincingly showed that proteins maintain their compactness under conditions close to physiological ones, typical of the cells or tissues from which they are isolated. Under these conditions proteins are in their native state. From the concept of the polypeptide nature of protein molecules, it follows that in native proteins the polypeptide chain is folded and in this way forms a compact body.
When the conditions are changed, for instance by reducing the pH or increasing the temperature, proteins lose their compactness and solubility, as well as their activity. They adopt the denatured state or, simply, they denature. In the denatured state the polypeptide chain is no longer folded but adopts the features of a random coil, so one can also speak of unfolded proteins or the unfolded state, a term which is more adequate when the state of the polypeptide chain is of interest. The unfolded state can be achieved by adding denaturing agents, such as guanidinium chloride or urea
(chemical denaturation), by changing the pH of the solution (pH-induced denaturation), or by changing the temperature (thermal denaturation). The question arises as to which forces keep the polypeptide chain folded and thereby maintain the protein molecule in the native state. The simultaneous presence of positive and negative charges in the protein could give a partial answer: the attraction between opposite charges keeps native proteins compact. This concept was strongly supported by pH-induced denaturation experiments. When the pH is reduced, for instance, the negatively charged amino acids become neutral, which increases the contribution of the repulsive interactions between the positive charges. As a result of this repulsion the protein molecule denatures. Although the picture of pH-induced denaturation described above is qualitatively correct, it was clear that electrostatic interactions by themselves could not give a comprehensive explanation of the stability of the compact structure of native proteins.
It is fashionable nowadays to talk about, and to promote, interdisciplinary science. In fact, science becomes interdisciplinary spontaneously when needed, and this is not a privilege of the present day. The "young" protein science, housed within colloid and organic chemistry, is an example of interdisciplinary research that started at least a century ago. The rapid advance of quantum physics brought a new understanding of the structure of matter and of the interactions within and between molecules. It was found that, apart from the chemical bond (sharing of electrons), there are other attractive interactions based on sharing of a hydrogen nucleus: the hydrogen bond. This idea was first proposed by Latimer and Rodebush, who called the electrostatic attraction between lone electron pairs and polar hydrogen atoms a "weak bond"9.
Linus Pauling further developed this idea, introduced the term hydrogen bond and applied it to explain the forces responsible for the compactness of native proteins10,11. In the fundamental work of Pauling and Mirsky11 the compact structure of proteins was explained in terms of a network of hydrogen bonds between the peptide nitrogen and oxygen atoms, which keep the polypeptide chain in a "uniquely defined configuration". Further stabilisation of the native protein structure comes
from the hydrogen bonds between the side chains of the amino acids*. Moreover, Pauling and Mirsky explained protein denaturation as the breaking of hydrogen bonds. Thus, in addition to electrostatic interactions, a new type of non-covalent interaction was brought into the description of protein properties.
* Terms such as side chain and main chain are described in the next section.
A decisive step towards revealing protein structure was made shortly after the Second World War; more precisely, this step had been delayed by the war. Two important discoveries should be mentioned. Frederick Sanger developed an experimental methodology for determining the amino acid sequence of protein molecules. He and his collaborator Hans Tuppy were the first to determine the sequence of a protein12-14, namely that of insulin (Fig. 1.2).
Figure 1.2 The amino acid sequence of insulin determined by Sanger and Tuppy. The protein consists of two polypeptide chains linked by two disulphide bonds. The amino acid names are given in the three-letter code (see Table 1.1).
The finding of Sanger and Tuppy can be considered not only as final proof that the Hofmeister-Fischer hypothesis is correct; it also confirmed
that the fundamental element of protein structure is the polypeptide chain. Thus, Mulder's "primary" part of the "albuminous bodies" is a sequence of amino acids linked in a polypeptide chain, and it is naturally called the primary structure. The primary structure, i.e. the arrangement of amino acids along the polypeptide chain, is unique for a given protein from a given species. This allows the formulation of a new hypothesis, namely that it is the sequence that determines the structural organisation of the protein molecule, and hence its functional properties. This is the so-called Anfinsen's dogma15, which will be discussed in Section 1.3. Nowadays this hypothesis is beyond any doubt, being continually confirmed, for example, by mutagenesis experiments. Mutations of the sequence, i.e. changes caused by adding, removing or substituting amino acids in the polypeptide chain, change the structure and the properties of the protein molecule. Of course, not all changes have the same impact on the structure and functions of proteins. Changes in the parts of the sequence involved in the formation of the active site are, as a rule, crucial for the functions of the molecule, while other changes may have negligible influence. One can compare the protein sequence with a text written in an alphabet of 20 letters, containing the information needed to build a molecule with a defined structure and defined functions. The "letters" in the sequence are the amino acids, which differ from each other in the chemical composition of their side chains, and hence in their physico-chemical properties. The information coded in the sequence is transformed into a real structure with corresponding properties by means of a mechanism based on the non-covalent interactions between the amino acid side chains, as well as between the atoms of the protein molecule and the surrounding solvent.
The second important discovery resulted from Pauling's purposeful work on the factors stabilising native proteins. In 1951 he published the first model of helical conformations of a polypeptide chain16. Based on X-ray data of crystalline amino acids and short peptides, he modelled the well-known α-helix with amazing precision. Later, he proposed another structural organisation, the β-sheet, in which the polypeptide chains are in an extended conformation. The polypeptide chains in a β-sheet can be mutually oriented in two ways: parallel, when the
directions from the N- to the C-terminus of adjacent chains are the same, and antiparallel, when these directions are opposite. Examples of an α-helix and a β-sheet combining parallel and antiparallel orientations of the polypeptide chains are given in Fig. 1.3.
Figure 1.3 The most common secondary structural elements. The hydrogen bonds (>C=O···H-N<) are shown as dashed lines. The hydrogen atoms are not shown. Left: α-helical segment from Helicobacter pylori cysteine-rich protein B. Right: fragment of a β-sheet from antithrombin. The first and second polypeptide chains from the left-hand side of the β-sheet are in parallel orientation, whereas the second and third are in antiparallel orientation. The arrows indicate the parallel and antiparallel orientations.
The α-helices and β-sheets predicted by Pauling proved to be the most common conformational patterns found in proteins. Other structural organisations of the polypeptide chain were found in proteins later, such as 3₁₀-helices and two other types of bends, as well as the rarely observed π-helices. All these structural elements are collectively referred to as secondary structure.
Electrostatic interactions and hydrogen bonds were considered the main factors responsible for protein properties, including structural organisation, stability of the native state and functional properties. It was noticed, however, that the structure of native proteins is related to the surrounding solvent. The pioneering X-ray diffraction studies of protein crystals showed, for instance, that the diffraction loses its sharpness as water evaporates. This phenomenon could be interpreted in terms of the breaking of hydrogen bonds between water molecules and the polar groups of the amino acid side chains of the protein molecule. The example is important because it shows that not only electrostatic interactions and hydrogen bonds within the molecule, but also the interactions between the solvent molecules and the protein, play a role in stabilising the protein structure. It was also known that some organic compounds, such as hydrocarbons, are hydrophobic, i.e. they do not dissolve in water, but rather tend to form aggregates or to stay at the water/air interface. This effect suggests that there is some kind of attractive interaction between hydrophobic compounds when they are surrounded by water. Proteins contain amino acids with hydrophobic side chains, so this type of interaction should also be present in proteins. Kauzmann was not the first to pay attention to this fact, but he was the first to relate it to the stability of native proteins. In an article17 published in 1959, he emphasised that about half of the amino acids found in proteins have non-polar, hydrophobic side chains. He introduced the term hydrophobic bond to describe the attractive interactions in which the non-polar amino acid side chains are involved.
The hydrophobic interactions cause an apparent attraction between the non-polar amino acid side chains, accompanied by a reduction of their contact area with water molecules. Ultimately, this leads to the formation of a hydrophobic core in the protein molecule. Energetic evaluations of hydrophobic core formation have shown that hydrophobic interactions are the dominant factor in the stabilisation of the native protein structure. We speak of hydrophobic interactions, thereby semantically emphasising the appearance of the phenomenon: the attraction between non-polar
compounds in an aqueous medium. As we shall see in Chapter 4, hydrophobic interactions do not exist as a separate type of interaction. Rather, they are an effect of the behaviour of water molecules surrounding a non-polar compound. The determination of the three-dimensional structure of proteins fully confirmed the concept of the role of non-covalent interactions in the spatial arrangement of the protein molecule. The α-helical organisation of the polypeptide chain predicted by Pauling was found in the first high-resolution three-dimensional protein structure, that of myoglobin18. It was also found that the titratable groups, i.e. the titratable amino acid side chains, are predominantly on the protein surface, as presumed by Linderstrøm-Lang. The hydrophobic side chains are buried in the protein interior, in agreement with the concept of Kauzmann. The three-dimensional structure of a protein is often called its tertiary structure. Shortly after the first three-dimensional protein structure had been solved, the structure of haemoglobin was determined. This protein molecule is an assembly of more than one polypeptide chain. If a protein is constituted by an assembly of polypeptide chains, it is characterised by a quaternary structure. The individual polypeptide chains in this case are called subunits. The subunits can be identical, similar, or completely different, in which case each is characterised by its own fold. The hierarchy of protein structures (primary, secondary, tertiary, quaternary) was introduced by Linderstrøm-Lang before their unambiguous experimental determination by X-ray crystallography. One can say that ideas about the organisation of protein molecules developed ahead of their experimental confirmation. It seems that the situation now is the opposite. At present, more than 31,000 protein structures are deposited in the Protein Data Bank* and this number continuously grows.
* http://www.rcsb.org/pdb/Welcome.do
Still, there is a wide spectrum of questions related to the stability and functionality of proteins to which the answers are unknown. Of course, these questions differ in character. We would like to know the molecular mechanisms of enzymatic catalysis, or we would like to predict changes in specific functional properties of a protein molecule
12
Introduction to Non-covalent Interactions in Proteins
caused by mutations. Moreover, we would like to manipulate protein properties; to reduce or increase stability at given conditions, to design proteins, which are active at external conditions (temperature, pH, etc.) distant from those typical for their natural environment. Without knowledge of protein structure and a deeper understanding of noncovalent interactions, these ambitions are not realistic. Therefore, structural biologists, chemists, physicists and theoreticians work together making the modern protein science a very interesting research area. 1.2 Overview of Protein Structural Elements and Basic Definitions In the previous sections, a number of terms were introduced without being precisely defined. This mainly concerns the structural elements of proteins. We will confine ourselves only to the terms relevant for the matter of interest, the non-covalent interactions. A detailed survey on protein structure with typical examples and appropriate illustrations can be found in the books of Lesk20'21, as well as in any biochemistry textbook. 1.2.1 The amino acids In Fig. 1.4 the naming of the atoms in the amino acids is illustrated using lysine as an example. It is convenient to start with the atom named Ca and to consider it as a central one. In the free amino acids, i.e. amino acids which are not linked in a polypeptide chain, Ca binds the a-amino and a-carboxyl groups (-NH 2 and -COOH in the left panel of Fig. 1.4, respectively). If the amino acid is part of a polypeptide chain these two groups are transformed to amide (-NH-CO-), and form the peptide bonds linking the adjacent amino acids (see also Fig. 1.6). Therefore, in a polypeptide chain the a-amino and the a-carboxyl groups can exist only at its ends (see Fig. 1.2). The atoms constituting the polypeptide chain of the protein are denoted as N, Ca, C, and O (the right hand side panel of Fig. 1.4). These atoms form the main chain of the protein molecule, or in other words, the backbone. 
The third non-hydrogen atom bound to Cα is Cβ. This atom is the first in the side chain of the amino acid. The atoms of the side chain are designated by their chemical symbols and successive Greek letters. In the example given in Fig. 1.4, the most distant atom along the side chain is Nζ, from the ε-amino group (this group is bound to the carbon at position ε, Cε).
Figure 1.4 Naming of the amino acid residues illustrated with lysine. Note the differences between the left and the right panels. Left: The structural formula of the amino acid lysine. The α- and ε-amino groups are in their deprotonated forms, whereas the α-carboxyl group is protonated; all these groups are neutral in this state. Right: Hard-sphere model of the amino acid residue lysine. The hydrogen atoms are presented as small grey spheres; the colour scheme for the other atoms is as in Fig. 1.1. The ε-amino group is protonated, i.e. positively charged (-NH3+). Note also that, of the α-amino and α-carboxyl groups, only the atoms involved in the peptide bonds are shown.
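The Greek-letter scheme is mirrored in the atom names of Protein Data Bank coordinate files, where Latin letters stand in for the Greek ones (CA for Cα, CB for Cβ, and so on up to NZ for Nζ in lysine). The minimal Python sketch below assumes only that naming convention; the helper `pdb_to_greek` and the `LYSINE` list are hypothetical names chosen for illustration, not part of any library.

```python
# Latin letters used in PDB atom names (second character) and the Greek
# remoteness indicators they stand for in the text.
GREEK = {"A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "Z": "ζ", "H": "η"}


def pdb_to_greek(atom_name: str) -> str:
    """Convert a PDB atom name such as 'CA' or 'NZ' into text notation (Cα, Nζ).

    Backbone atoms N, C and O carry no Greek letter and are returned unchanged.
    A trailing branch index (e.g. the '1' in valine's 'CG1') is kept.
    """
    name = atom_name.strip()
    if len(name) < 2 or name[1] not in GREEK:
        return name  # backbone N, C, O, or a name outside the scheme
    element, letter, branch = name[0], name[1], name[2:]
    return element + GREEK[letter] + branch


# Heavy atoms of a lysine residue as named in PDB files.
LYSINE = ["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"]
print([pdb_to_greek(atom) for atom in LYSINE])
```

Branch indices such as the 1 in CG1 are passed through unchanged, so the same helper also covers branched side chains.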
As we have already discussed, the amino acids differ in their side chains. In Table 1.1 the amino acids composing protein molecules, the natural amino acids, are grouped according to the most commonly used criteria. To make the differences in the chemical composition of the side chains easy to see, the side chains are drawn vertically, whereas the atoms participating in the peptide bonds, the α-amino and the α-carboxyl groups, are drawn horizontally. The aliphatic side chains are entirely hydrocarbon chains, a feature determining their low solubility in water. The shortest aliphatic side chain, that of alanine, contains only one methyl group at position β (see also Fig. 1.6). The other aliphatic side chains are branched. Valine and isoleucine are branched at Cβ (valine carries two methyl groups at position γ, isoleucine one methyl group at position γ and another at position δ), whereas leucine has two methyl groups at position δ. Non-polar groups are also poorly soluble in water. Because of their low solubility in water, non-polar and aliphatic side chains are also called hydrophobic side chains or hydrophobic groups. Glycine does not have a side chain, the position β being occupied by a hydrogen atom. It is classified as non-polar because this hydrogen atom is practically not polarised (Chapter 3). The side chain of proline is peculiar in that Cδ is linked to the backbone N. The -SH group of a cysteine side chain is often linked to the -SH group of another cysteine in the protein molecule, thus forming a disulphide bridge (schematically illustrated in Fig. 1.2). The polarity of the S-S cross-link is low. However, if the cysteine side chain is not involved in a disulphide bridge, it is no longer non-polar. Moreover, the -SH group can be deprotonated at alkaline pH, making the cysteine side chain negatively charged. Among the aromatic side chains, those of histidine and tyrosine can also be charged, depending on the protonation state of the imidazole ring and the phenolic hydroxyl group, respectively. The polar side chains are readily soluble in water; they are hydrophilic. They contain amide (asparagine and glutamine) and hydroxyl groups (serine and threonine), which can form hydrogen bonds with neighbouring polar groups of the protein or with compounds from the solvent, including water molecules. The charged amino acid side chains have groups that are usually charged at physiological pH values. In aspartic and glutamic acid these are the β- and γ-carboxyl groups, respectively, in arginine it is the guanidino group, and in lysine the ε-amino group.
These side chains, as well as the histidines, the tyrosines and reduced (not involved in disulphide bridges) cysteines create a charge constellation which changes with the physical conditions, for instance pH, and thus changes the properties of the protein molecule.
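To a first approximation, the pH dependence of such a charge constellation can be sketched with the Henderson-Hasselbalch equation, which treats every titratable group as isolated. The interactions discussed in this book are precisely what make real groups deviate from this picture, so the Python sketch below is only a reference model. The function names are illustrative, and the pKa of 8.3 for a free cysteine thiol is an assumed, commonly quoted model value rather than a number taken from this text.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch: fraction of an isolated group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def mean_charge(pH: float, pKa: float, acidic: bool) -> float:
    """Average charge (in units of e) of one isolated titratable group.

    Acidic groups (e.g. Asp, Glu, Cys, Tyr, C-terminus) go from 0 to -1 on deprotonation;
    basic groups (e.g. Lys, Arg, His, N-terminus) go from +1 to 0.
    """
    f = fraction_deprotonated(pH, pKa)
    return -f if acidic else 1.0 - f


CYS_PKA = 8.3  # assumed model value for a free cysteine thiol; values in proteins differ

for pH in (7.0, 8.3, 10.0):
    charge = mean_charge(pH, CYS_PKA, acidic=True)
    print(f"pH {pH:4.1f}: mean charge of the Cys side chain = {charge:+.2f}")
```

When the pH equals the pKa, the group carries half a charge on average; this is the simplest sense in which the charge constellation of a protein changes with pH.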
Table 1.1 Structural formulae of the amino acids constituting protein molecules. Three- and one-letter codes of the amino acid names are given in parentheses.
[Structural formulae not reproduced]
Aliphatic: Alanine (Ala, A); Valine (Val, V); Leucine (Leu, L); Isoleucine (Ile, I)
Non-polar: Glycine (Gly, G); Proline (Pro, P); Methionine (Met, M); Cysteine (Cys, C)
Aromatic: Histidine (His, H); Phenylalanine (Phe, F); Tryptophan (Trp, W); Tyrosine (Tyr, Y)
Polar: Asparagine (Asn, N); Glutamine (Gln, Q); Serine (Ser, S); Threonine (Thr, T)
Chargeable: Lysine (Lys, K); Arginine (Arg, R); Aspartic acid (Asp, D); Glutamic acid (Glu, E)
As seen from this brief survey, the individual amino acid side chains interact with their surroundings predominantly through one or another type of non-covalent interaction, depending on their chemical composition. This results in a complex interplay between the different non-covalent interactions, which in turn underlies the mechanism by which the information coded in the protein sequence is transformed into a structure with given properties.
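The grouping of Table 1.1 translates directly into a lookup table keyed by the one-letter codes. The Python sketch below, with hypothetical function names, computes the fraction of residues of a sequence that falls into each group; the example sequence is made up for illustration. Following the text, the aliphatic and non-polar groups together are counted as hydrophobic, although some side chains (cysteine, histidine, tyrosine) change character with their chemical state.

```python
# Grouping of the 20 amino acids as in Table 1.1 (one-letter codes).
GROUPS = {
    "aliphatic": set("AVLI"),
    "non-polar": set("GPMC"),
    "aromatic": set("HFWY"),
    "polar": set("NQST"),
    "chargeable": set("KRDE"),
}


def group_of(code: str) -> str:
    """Return the Table 1.1 group of a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if code.upper() in members:
            return name
    raise ValueError(f"unknown amino acid code: {code!r}")


def composition(sequence: str) -> dict:
    """Fraction of the residues of `sequence` that fall into each Table 1.1 group."""
    if not sequence:
        raise ValueError("empty sequence")
    counts = {name: 0 for name in GROUPS}
    for code in sequence:
        counts[group_of(code)] += 1
    return {name: n / len(sequence) for name, n in counts.items()}


def hydrophobic_fraction(sequence: str) -> float:
    """Aliphatic plus non-polar residues, the 'hydrophobic groups' of the text."""
    c = composition(sequence)
    return c["aliphatic"] + c["non-polar"]


example = "MAVLKDEGSTFW"  # hypothetical sequence
print(composition(example))
print(round(hydrophobic_fraction(example), 2))
```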
Figure 1.5 Orientation of the atoms in an L-amino acid (left-hand side) and a D-amino acid (right-hand side). The amino acids are viewed from the hydrogen atom towards the Cα atom, which is at the centre of the amino acid.

Except for glycine, the amino acids are chiral, i.e. they cannot be superimposed on their mirror images. Accordingly, one distinguishes two types of amino acid: the L-amino acids and their mirror images, the D-amino acids. The usual way of representing the spatial organisation of the atoms connected to Cα is shown in Fig. 1.5. Both L- and D-amino acids exist in Nature. However, only the L form is found in proteins. This fact is one of the mysteries of Nature still awaiting a plausible explanation.

1.2.2 The polypeptide chain

The principles of formation of the protein structure cannot be understood without taking into account the specific properties of the polypeptide chain. Fig. 1.6 shows a segment of a polypeptide chain containing an alanine residue. The peptide bond connecting two amino acids contains the atoms Cᵢ₋₁ and Oᵢ₋₁ from the previous amino acid, i − 1, and the atoms Nᵢ and Hᵢ from amino acid i (the alanine). If residue i is the first one in the polypeptide chain (i = 1), its α-amino group (-NH2) remains free. It is usually called the N-terminal amino group, or simply the N-terminus. As already mentioned, at physiological pH the α-amino groups are protonated, i.e. have the form -NH3+. In the same way, Cᵢ, Oᵢ, Nᵢ₊₁ and Hᵢ₊₁ form the next peptide bond. If residue i is the last in the polypeptide chain, its α-carboxyl group remains free (C-terminus) and is negatively charged at physiological pH values. Thus, the ends of a polypeptide chain, the N- and the C-termini, are usually charged.
Figure 1.6 The amino acid alanine linked in a polypeptide chain.
The peptide bond is planar. By convention, the torsion angle between the C=O and N-H groups about the peptide bond, ω, equals 180° when the oxygen, Oᵢ, and the hydrogen, Hᵢ₊₁, atoms are farthest apart (trans conformation). This is the conformation illustrated in Fig. 1.6. The other conformation compatible with a planar peptide bond is the cis conformation, for which ω = 0°. This conformation is energetically less favourable than the trans conformation and is rarely observed in proteins.
Figure 1.7 Dipole moment of the peptide bond. Values in parentheses are the partial charges of the individual atoms.
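The size of the peptide dipole can be estimated by summing charge times position over the four atoms of the peptide group. The Python sketch below takes O = −0.42 e as in Fig. 1.7 and assumes the remaining charges (C +0.42, N −0.20, H +0.20), the simple point-charge model used, for example, in the DSSP hydrogen-bond energy; the planar trans coordinates (C=O 1.23 Å, C-N 1.33 Å, N-H 1.0 Å) are idealised values assumed for illustration. Because partial charges are model-dependent, the result is an order-of-magnitude check only.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.803  # 1 e·Å is about 4.803 D

# (charge in e, (x, y, z) in Å) for an idealised planar trans peptide group.
# Only the O charge is taken from Fig. 1.7; the other charges and all coordinates are assumed.
PEPTIDE = {
    "C": (+0.42, (0.000, 0.000, 0.0)),
    "O": (-0.42, (0.000, 1.230, 0.0)),
    "N": (-0.20, (1.115, -0.724, 0.0)),
    "H": (+0.20, (1.045, -1.722, 0.0)),
}


def dipole_moment(atoms: dict):
    """Return the dipole vector (e·Å) and its magnitude (D) for a neutral set of point charges."""
    total_charge = sum(q for q, _ in atoms.values())
    if abs(total_charge) > 1e-9:
        raise ValueError("the dipole moment of a charged group depends on the origin")
    vector = [sum(q * r[k] for q, r in atoms.values()) for k in range(3)]
    magnitude = math.sqrt(sum(c * c for c in vector)) * E_ANGSTROM_TO_DEBYE
    return vector, magnitude


vector, debye = dipole_moment(PEPTIDE)
print(f"|mu| = {debye:.2f} D")
```

With these assumed charges and coordinates the magnitude comes out at roughly 3.4 D. Because the group is neutral, the result does not depend on the choice of origin.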
The peptide bond is also highly polar. The distribution of the partial charges within the peptide bond is given in Fig. 1.7. It should be noted that the values of the partial charges can vary depending on the model and the method used for their calculation. In all cases, however, the charge distribution creates a substantial dipole moment. An important property of the polypeptide chain is its flexibility. This flexibility is ensured by rotation around the N-Cα and Cα-C bonds connecting the amino acid residue with the adjacent peptide groups. The rotation around N-Cα is given by the value of the torsion angle φ, whereas the rotation around Cα-C is described by the torsion angle ψ (see Fig. 1.6). The angles φ and ψ of the residues are conveniently displayed in a two-dimensional map, the Ramachandran plot (Fig. 1.8).
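The torsion angles φ and ψ, as well as the peptide-bond angle ω, are dihedral angles defined by four consecutive backbone atoms. A minimal Python sketch of the standard dihedral formula follows; it assumes the usual definitions φ = C(i−1)-N(i)-Cα(i)-C(i), ψ = N(i)-Cα(i)-C(i)-N(i+1) and ω = Cα(i)-C(i)-N(i+1)-Cα(i+1). For a planar peptide group this Cα-based ω gives the same 180° (trans) and 0° (cis) values as the O/H convention used in the text. The ±30° tolerance for calling a peptide bond cis or trans is an arbitrary, assumed threshold, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(t / norm for t in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * t for t in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * t for t in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def backbone_torsions(c_prev, n, ca, c, n_next, ca_next):
    """Return (phi, psi, omega) in degrees for residue i.

    omega refers to the peptide bond between residue i and residue i + 1.
    """
    phi = dihedral(c_prev, n, ca, c)
    psi = dihedral(n, ca, c, n_next)
    omega = dihedral(ca, c, n_next, ca_next)
    return phi, psi, omega


def peptide_conformation(omega: float, tolerance: float = 30.0) -> str:
    """Classify a peptide bond from omega; the tolerance is an assumed threshold."""
    if abs(omega) <= tolerance:
        return "cis"
    if abs(omega) >= 180.0 - tolerance:
        return "trans"
    return "non-planar"
```

Given backbone coordinates, for instance read from a PDB file, `backbone_torsions` returns the point of residue i in the Ramachandran plot together with the ω of the following peptide bond.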
Figure 1.8 Ramachandran plot of the backbone dihedral angles φ (horizontal axis) and ψ (vertical axis), each ranging from −180° to 180°; the αR and αL regions are marked.
As seen from the figure, the values that φ and ψ can take are restricted to certain regions. Nevertheless, the number of conformations that the polypeptide chain can adopt is enormously large.

1.3 Non-covalent Interactions and Structure-Function Relationships in Proteins

The problem of structure-function relationships is one of the fundamental problems of protein science. That is, what are the driving forces that make protein molecules adopt structures with certain biologically relevant properties? Also, can we predict the functional properties of a protein molecule if we know its structure? Further questions can be posed as well. For instance, how do the functional properties change when the protein structure is changed? To a certain extent, these questions have already been addressed in Section 1.1. We have pointed out that the mechanism of decoding the information stored in the protein sequence (Anfinsen's dogma) is based on the interplay between the non-covalent interactions. It follows that the problem of structure-function relationships can be approached only through a quantitative analysis of non-covalent interactions.

1.3.1 Some comments on Anfinsen's dogma

Anfinsen's dogma states that the three-dimensional structure of a protein is determined solely by its amino acid sequence. This conclusion is deduced from the fact that denatured proteins lose their functional properties when they adopt the unfolded state, a state in which no secondary or higher levels of structure exist permanently. After removal of the denaturing agents, proteins spontaneously fold back into the native conformation, restoring their biological functions. There are two aspects of this observation. The first one is the simple scheme: losing the structure causes loss of function, and restoring the structure restores the function. This demonstrates the connection between protein structure and protein function. We have to note that this scheme should not be considered a strict rule. There are proteins that are unstructured, yet have biological functions, for instance αs-casein and histone H3. Other proteins, such as a number of transcription factors, manifest their biological functions upon the transition from the unfolded to the folded state. There are also proteins that are only partially structured. Human γ-interferon can be taken as an example: its polypeptide chain is well structured except for the last 21 amino acids in the sequence, which are disordered. The biological functions of the unstructured part of γ-interferon are still unclear²⁴. These few examples are not violations of Anfinsen's dogma.
They should rather be considered as an extension, which makes the problem of structure-function relationships in proteins more complicated and hence the task more interesting. The other aspect concerns the spontaneous refolding after removal of the denaturing agent. The interpretation of this observation is that the native conformation of the protein corresponds to the global minimum of the free energy. This means that protein folding obeys the principles of thermodynamics. Although this conclusion is not surprising and, to a certain extent, sounds trivial, it is useful to have it as a starting point. Protein functions are not the result of forces inherent only to living matter; they are the result of interactions that follow the known physical laws. Combining structural information, say X-ray data, with phenomenological data obtained by calorimetric or other indirect experimental methods, one can build a correct understanding of a wide variety of problems united by the term structure-function relationships. Only one relevant detail needs to be added: we need to know more about non-covalent interactions, especially about their specific manifestation in proteins.

1.3.2 Experimental measurements of non-covalent interactions in proteins

Non-covalent interactions in proteins cannot be measured directly. Therefore, they are quantified indirectly, by evaluating their effect on measurable quantities. Thus, for instance, the magnitude of the electrostatic interactions between two titratable groups is deduced from the shift of the ionisation equilibrium constants of these groups with respect to some reference values. The contribution of hydrogen bonds to structural stability is evaluated on the basis of model compounds or by mutagenesis experiments. In all these cases the assessments are approximate. The shift of the equilibrium constants depends on electrostatic interactions, but other effects, such as desolvation, have a significant influence as well. The use of model compounds to estimate the contribution of hydrogen bond strength to the stabilisation of secondary structural elements in proteins is a priori an approximation, because the measurements are not made on proteins.
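The conversion from a measured pKa shift to a free energy follows from ΔG = RT ln(10) ΔpKa, about 5.7 kJ/mol per pH unit at 25 °C. The Python sketch below is illustrative only: the function name and the example shift are hypothetical, and, as stressed above, the free energy obtained in this way contains desolvation and other contributions besides the electrostatic interaction itself.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def pka_shift_to_free_energy(delta_pka: float, temperature: float = 298.15) -> float:
    """Change in the free energy of deprotonation (kJ/mol) implied by a pKa shift.

    delta_pka = pKa(in protein) - pKa(reference). A negative result means that
    deprotonation is more favourable than in the reference compound. The value
    lumps together electrostatic, desolvation and any other contributions.
    """
    return R * temperature * math.log(10.0) * delta_pka / 1000.0


# Hypothetical example: a carboxyl group whose pKa is 1.5 units below its reference value.
print(f"{pka_shift_to_free_energy(-1.5):.1f} kJ/mol")
```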
Measurements based on mutagenesis experiments also introduce uncertainty and need careful analysis of the experimental data, because the chemical composition of the protein molecule is changed. One can argue that direct experimental measurements can still be made. Indeed, X-ray crystallography, as well as Nuclear Magnetic Resonance (NMR) spectroscopy and electron microscopy, provides direct structural information. On the basis of X-ray data the spatial coordinates of the protein atoms can be determined, which allows the estimation of a variety of characteristics, such as the geometry of hydrogen bonds, the distribution of the charged groups, the packing and, most importantly, the identification of the active site. One can hardly imagine any progress in the understanding of structure-function relationships without this information. Structural information is essential, but it is not sufficient. In order to understand and hence predict protein properties, both physico-chemical and biological, one needs to know the magnitude of the non-covalent interactions in detail. In other words, one needs to know the interaction energy between the protein atoms. Fortunately, methods for direct measurement of the "energetics" of proteins exist: Differential Scanning Calorimetry (DSC) and Isothermal Titration Calorimetry (ITC). By means of DSC one can measure the change of the heat capacity and of the enthalpy upon protein denaturation. The change of enthalpy due to ligand binding or protein assembly, for instance, can be measured directly by ITC. These quantities are, however, thermodynamic ones and characterise the investigated system (the protein molecule and the surrounding solvent) as a whole. They do not provide information about the nature of the interactions that take place in the system. Therefore, in order to understand the connection between the observed phenomena and their molecular nature, we need appropriate theoretical tools. One can say that theory is the bridge between observation and understanding. In the following chapters, we will consider the basic tools that can help us understand the extraordinary features of the molecules of life, the proteins.

References
1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN and Bourne PE, (2000) The protein data bank. Nucleic Acids Res., 28: 235-242.
2. Mulder GJ, (1839) Ueber die zusammensetzung einiger thierischen substanzen. J. Prakt. Chem., 16: 129-152.
3. Hofmeister F, (1902) Über den bau des eiweissmoleküls. Ergeb. Physiol., 1: 759-802.
4. Fischer E, (1902) Ueber die hydrolyse der proteinstoffe. Chem. Ztg., 26: 939.
5. Linderstrøm-Lang K, (1924) On the ionisation of proteins. C. R. Trav. Lab. Carlsberg, 15: 1-29.
6. Kirkwood JG, (1934) Theory of solutions of molecules containing widely separated charges with special application to zwitterions. J. Chem. Phys., 2: 351-361.
7. Westheimer FH and Kirkwood JG, (1938) The electrostatic influence of substituents on the dissociation constants of organic acids. II. J. Chem. Phys., 6: 513-517.
8. Kirkwood JG and Westheimer FH, (1938) The electrostatic influence of substituents on the dissociation constants of organic acids. I. J. Chem. Phys., 6: 506-512.
9. Latimer WM and Rodebush WH, (1920) Polarity and ionization from the standpoint of the Lewis theory of valence. J. Am. Chem. Soc., 42: 1419-1433.
10. Pauling L, (1928) The shared-electron chemical bond. Proc. Natl. Acad. Sci. U. S. A., 14: 359-362.
11. Pauling L and Mirsky AE, (1936) On the structure of native, denatured, and coagulated proteins. Proc. Natl. Acad. Sci. U. S. A., 22: 439-447.
12. Sanger F and Thompson EOP, (1953) The amino-acid identification in the glycyl chain of insulin. 1. The identification of the lower peptides from partial hydrolysates. Biochem. J., 53: 353-374.
13. Sanger F and Tuppy H, (1951) The amino-acid sequence in the phenylalanyl chain of insulin. 1. The identification of lower peptides from partial hydrolysates. Biochem. J., 49: 463-481.
14. Sanger F and Tuppy H, (1951) The amino-acid sequence in the phenylalanyl chain of insulin. 2. The investigation of peptides from enzymatic hydrolysates. Biochem. J., 49: 481-490.
15. Anfinsen CB, (1973) Principles that govern the folding of protein chains. Science, 181: 223-230.
16. Pauling L, Corey RB and Branson HR, (1951) The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. U. S. A., 37: 205-211.
17. Kauzmann W, (1959) Some factors in the interpretation of protein denaturation. Adv. Protein Chem., 14: 1-63.
18. Kendrew JC, Dickerson RE, Strandberg BE, Hart RG, Davies DR, Phillips DC and Shore VC, (1960) Structure of myoglobin: a three-dimensional Fourier synthesis at 2 Å resolution. Nature, 185: 422-427.
19. Linderstrøm-Lang KU, (1952) Proteins and Enzymes. The Lane Medical Lectures. Stanford, CA: Stanford University Press.
20. Lesk A, (2004) Introduction to Protein Science. Architecture, Function, and Genomics. New York: Oxford University Press.
21. Lesk A, (2001) Introduction to Protein Architecture. New York: Oxford University Press.
22. Hol WGJ, van Duijnen PT and Berendsen HJC, (1978) The α-helix dipole and the properties of proteins. Nature, 273: 443-446.
23. Ramachandran GN and Sasisekharan V, (1968) Adv. Protein Chem., 23: 283-437.
24. Tsanev R and Ivanov I, (2001) Immune Interferon: Properties and Clinical Applications. Boca Raton, USA: CRC Press.
Chapter 2
Van der Waals Interactions
Van der Waals interactions are universal: they always appear when two molecules come close to each other. They are considered weak attractive interactions, which become the dominant cohesive force only when the interacting molecules are neutral and non-polar. Van der Waals interactions do take place between charged and polar molecules as well; however, their effect as cohesive forces is masked by that of the stronger electrostatic interactions. In the previous chapter we pointed out that about one half of the amino acids in proteins are characterised by non-polar side chains. One might presume that van der Waals interactions, being dominant when the interacting groups are non-polar, make a considerable contribution to the stabilisation of the native protein structure. Instead, we pointed to hydrophobic interactions, rather than van der Waals interactions, as the driving force for the collapse of the hydrophobic groups into a compact body and as a main contributor to protein stability. Hydrophobic interactions and the attraction between two non-polar molecules due to van der Waals interactions are completely different phenomena. As will be discussed in detail in Chapter 4, hydrophobic interactions are an effect arising from the reorganisation of the protein-solvent system towards an energetically favourable state. In contrast to van der Waals interactions, there are no hydrophobic interactions between two isolated molecules. Once the hydrophobic core of the protein molecule is formed, van der Waals interactions become the main interactions between the non-polar groups. Although van der Waals interactions are weak, they should not be neglected when the functional properties of proteins are of interest. Going back to Fig. 1.1, we can notice that the inhibitor molecule fits well into the active site cleft of the protein molecule. So does the substrate. In many cases substrate (or inhibitor) binding is driven by electrostatic or hydrophobic interactions. However, the steric compatibility of the interacting molecules, i.e. the steric recognition of the substrate (or the inhibitor), is also governed by van der Waals interactions.

2.1 Observation of van der Waals Interactions

The ubiquitous presence of cohesive forces was noticed in investigations of the behaviour of real gases. It was found that in gases of non-polar molecules there are attractive forces holding the gas molecules together. When a gas consisting of non-polar molecules, for instance a noble gas, is cooled, a temperature is eventually reached at which the gas condenses, i.e. a transition from the gas phase to the liquid phase occurs. There would be no apparent reason for the molecules to be held together in the liquid phase if there were no attractive interactions between them. These interactions are obviously weak, because they become perceptible only at temperatures at which the kinetic energy of the gas molecules is sufficiently low. In fact, this observation fully reflects the characteristics we have already ascribed to van der Waals interactions: weak attractive interactions between non-polar molecules. Let us now compare a gas and a liquid consisting of the same non-polar molecules. If we try to reduce the volume of these two systems by applying external pressure, we will notice that for the same magnitude of pressure the gas reduces its volume significantly, whereas the reduction of the liquid volume is negligible. In other words, the compressibility of the liquid is much smaller than that of the gas.
One can conclude that in the liquid phase, apart from the attractive interactions keeping the molecules together, there are repulsive forces that do not allow the molecules to come closer to each other. The nature of these forces is completely different from that of the attractive interactions. Nevertheless, we will consider the repulsive forces in this chapter, formally as part of the van der Waals interactions, although the latter are usually identified with the attractive interactions only. As we shall see, in calculations of non-covalent interactions in proteins the attractive (van der Waals) and the repulsive interactions are described by a single potential function. Let us now consider the behaviour of a real gas quantitatively. The equation of state (see Appendix A for definitions) of an ideal gas is

$$pV = nRT, \qquad (2.1)$$
where p is the gas pressure, V is the gas volume, n is the number of moles of gas, R is the gas constant and T* is the absolute temperature. Eq. (2.1) is empirical, i.e. it is derived from experimental observations. It is also phenomenological: it describes the behaviour of the gas, but does not reveal any of the interactions or mechanisms responsible for this behaviour. That is why the equation alone does not show that it is valid only at low pressures. Real gases deviate from Eq. (2.1). In 1873, in his doctoral thesis, van der Waals proposed a modification of this equation to account for the behaviour of real gases:

$$\left(p + a\frac{n^2}{V^2}\right)(V - nb) = nRT, \qquad (2.2)$$
where a and b are empirical constants that depend on the kind of gas. Because of the attractive interactions between the gas molecules, the experimentally measured pressure is somewhat lower than the pressure expected for an ideal gas. To compensate for this difference, the correction term $a n^2/V^2$ is added to the pressure. The effect of the attractive interactions is proportional to the number of molecules: the larger the number of molecules, the higher the probability that two molecules come close to each other. Similarly, the effect of the attractive interactions is inversely proportional to the volume: the probability of two molecules coming close to each other decreases as the volume increases. Since each molecule can both attract and be attracted, the net effect is proportional to $(n/V)^2$, with proportionality factor a.

* This temperature is, and has to be, used if physical quantities are to be expressed or calculated (see also Appendix A).

The concept of an ideal gas ignores the fact that individual molecules have a finite size. In reality, each individual molecule occupies some room. Because no two molecules can be in the same place at the same time, the room occupied by a molecule is unavailable to the rest of the gas molecules. Thus, the volume in which the individual molecules can move in a real gas is less than the volume V given in Eq. (2.1). This is taken into account by the term nb, also called the excluded volume correction. The volume available to the gas molecules is (V − nb), as given in Eq. (2.2). The constant b is then the excluded volume per mole of gas. This volume differs from the volume of a single molecule. Let us approximate the molecules as hard spheres with radius r. The volume of such a molecule is $v = 4\pi r^3/3$. Since the molecules are hard spheres, the closest distance between the centres of two molecules is the sum of their radii, d = 2r. The volume of the sphere with radius d is the excluded volume of a pair of molecules:

$$\frac{4\pi d^3}{3} = 8 \times \frac{4\pi r^3}{3} = 8v.$$

The excluded volume per molecule is then 8v/2 = 4v, so that per mole $b = 4N_A v$, where $N_A$ is the Avogadro constant. It is worth noting again that none of the corrections considered above reveals the mechanism of the interactions between gas molecules. The first correction reflects the van der Waals attractive interactions, whilst the excluded volume correction, nb, is related to the repulsive forces we have already mentioned. Without it, Eq. (2.2) would be incomplete. This is another reason for considering the attractive and repulsive interactions together.

2.2 Nature of van der Waals Interactions

The physical nature of van der Waals interactions can be explained in terms of quantum mechanics. A comprehensive presentation of the theory is beyond the scope of this book. It is also not strictly needed, because rigorous quantum mechanical calculations of van der Waals interactions for protein molecules as a whole cannot be done.
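Returning briefly to Eq. (2.2), the two corrections can be made concrete numerically. The Python sketch below compares the ideal-gas pressure of Eq. (2.1) with the van der Waals pressure, p = nRT/(V − nb) − an²/V², and inverts b = 4N_A v to obtain the hard-sphere radius implied by b. The constants for argon (a ≈ 1.355 L² bar mol⁻², b ≈ 0.0320 L mol⁻¹) are approximate tabulated values assumed for illustration, and the function names are hypothetical.

```python
import math

R = 0.083144626  # gas constant, L bar mol^-1 K^-1
N_A = 6.02214076e23  # Avogadro constant, mol^-1


def ideal_gas_pressure(n: float, V: float, T: float) -> float:
    """Eq. (2.1) solved for p, in bar (n in mol, V in L, T in K)."""
    return n * R * T / V


def van_der_waals_pressure(n: float, V: float, T: float, a: float, b: float) -> float:
    """Eq. (2.2) solved for p, in bar (a in L^2 bar mol^-2, b in L mol^-1)."""
    if V <= n * b:
        raise ValueError("V must exceed the excluded volume n*b")
    return n * R * T / (V - n * b) - a * n**2 / V**2


def hard_sphere_radius(b: float) -> float:
    """Radius (in Å) of hard spheres with b = 4 * N_A * (4/3) * pi * r**3, b in L mol^-1."""
    v = b * 1e-3 / (4.0 * N_A)  # volume of one molecule, m^3
    r = (3.0 * v / (4.0 * math.pi)) ** (1.0 / 3.0)
    return r * 1e10


A_ARGON, B_ARGON = 1.355, 0.0320  # approximate literature values (assumed)

for V in (1.0, 0.05):  # 1 mol of argon at 300 K
    p_ideal = ideal_gas_pressure(1.0, V, 300.0)
    p_vdw = van_der_waals_pressure(1.0, V, 300.0, A_ARGON, B_ARGON)
    print(f"V = {V:5.2f} L: ideal {p_ideal:7.1f} bar, van der Waals {p_vdw:7.1f} bar")

print(f"hard-sphere radius from b: {hard_sphere_radius(B_ARGON):.2f} Å")
```

For 1 mol of argon at 300 K in 1 L the attractive term dominates and the van der Waals pressure falls slightly below the ideal value; compressed to 0.05 L, the excluded-volume term dominates and the pressure exceeds the ideal value. The radius obtained from b (about 1.5 Å) is only a crude estimate of molecular size.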
In practice, van der Waals interactions are described by appropriate potential functions, which can be successfully employed for the analysis of a wide spectrum of phenomena. Therefore, only the main points of the theory will be presented, in order to give an idea of how these interactions arise.

2.2.1 Dispersion forces

For a long time after van der Waals proposed Eq. (2.2), the mechanism of the attractive forces accounted for by this equation was not understood. Attraction between two molecules may occur if they have opposite charges: electrostatic attraction. Observations showed, however, that attraction also takes place between uncharged molecules. Uncharged molecules may attract each other by dipole-dipole or dipole-induced dipole interactions. The latter were used to give the initial explanation of the van der Waals attractive term, but this explanation is inadequate, because molecules without a permanent dipole moment fall outside this model. Fritz London was the first to propose, in 1930, a plausible explanation of the attraction between non-polar molecules. Since then, these attractive forces have been called London dispersion forces, or simply dispersion forces. Let us consider two non-polar molecules. Even if they have no permanent dipole moment, instantaneous dipole moments can arise due to fluctuations of their electron clouds. This means that at any instant each of the molecules can have a non-zero dipole moment μ(t)*. When the molecules are separated by a large distance, the time average of μ(t) is zero. As the distance is reduced, the molecules begin to interact through their instantaneous dipole moments. Since the instantaneous dipole moments are a result of fluctuations, their orientation is arbitrary. This means that the interaction between the two molecules may be attractive or repulsive, depending on the momentary orientation of their dipole moments. For a permanent attractive force to arise, the fluctuations of the electron clouds of the two interacting molecules must be synchronised. The forces arising from such synchronised fluctuations are precisely the dispersion forces.
*Hereafter, vector quantities such as the dipole moment are denoted by bold symbols unless otherwise stated.
Introduction to Non-covalent Interactions in Proteins
The interactions arising from synchronised fluctuations of the electron clouds of two atoms or molecules can be expressed as instantaneous dipole-induced dipole interactions. This classical approach, however, failed to explain a number of observations. For instance, dispersion forces were found to decrease with temperature more slowly than the classical approach predicts.
Figure 2.1 A simplified geometrical representation of the dispersion forces. (A) An individual non-polar atom: the electron cloud has spherical symmetry and completely compensates the positive charge of the nucleus. (B) Two identical atoms or molecules with collinear dipole moments. (C) Configuration of two interacting linear oscillators separated by a distance r. The positive poles, +q, correspond to the nuclei, whereas the negative poles, −q, arise from the displacements x1 and x2 of the electron clouds.
We are not going to analyse in detail the differences between the classical and the quantum mechanical approaches to the problem. Rather, our goal is to get an idea of the origin of dispersion forces and of how their main features are described. We would also like to have an expression for the interaction energy between two non-polar atoms or molecules. For this purpose we will consider a simple model that describes the interactions between two atoms in terms of the interactions between two linear oscillators (Fig. 2.1C).
Assume that the positive charges (the nuclei) are fixed at a distance r, whereas the negative charges oscillate along the axis connecting the nuclei. Let us first investigate the behaviour of one of the oscillators in isolation. The magnitude of its instantaneous dipole moment is μ = xq, where x is the displacement of the negative charge. As long as we describe a single oscillator, indices on x are not needed (in Fig. 2.1C, x could be either x1 or x2). Recall also that the average displacement is zero, ⟨x⟩ = 0, giving a zero dipole moment (Fig. 2.1A). For any non-zero displacement, a quasi-elastic force acts on the charge and tends to reduce the displacement. This force is F = −kx, i.e. the larger the displacement x, the larger the restoring force F. The multiplier k is the coefficient of quasi-elasticity. Under the action of F the charge oscillates with angular frequency

$$\omega = \sqrt{\frac{k}{m}}, \qquad (2.3)$$

where m is the mass of the oscillator. From F = −dU/dx, the potential energy, U, of such an oscillator is

$$U = \frac{1}{2}kx^2. \qquad (2.4)$$

The oscillator obeys the principles of quantum mechanics, and its motion is described by the Schrödinger equation:

$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{\hbar^2}{2m}\Delta\Psi + U\Psi,$$

where ħ = h/2π, h is the Planck constant, and Ψ is the wave function. The symbol Δ = ∇² is the Laplace operator (see Appendix C). If U is independent of time, the wave function can be written as Ψ(x, t) = ψ(x)φ(t). For our purposes the equation for ψ(x) is of interest:

$$\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + (E - U)\psi = 0. \qquad (2.5)$$
This is the amplitude (time-independent) Schrödinger equation, and E is the total energy of the oscillator. The potential energy of the oscillator is defined by Eq. (2.4). Substituting it into Eq. (2.5) and rearranging, one obtains

$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + \frac{1}{2}kx^2\psi = E\psi. \qquad (2.6)$$

This equation has solutions only for discrete values E_0, E_1, E_2, ..., which in our case are E_n = ħω(n + 1/2), where n = 0, 1, 2, ... is the quantum number. The minimum energy of the oscillator is then

$$E_0 = \frac{1}{2}\hbar\omega. \qquad (2.7)$$
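As a quick numerical sanity check of Eqs. (2.6) and (2.7), the sketch below works in dimensionless units (ħ = m = ω = 1, an assumption made only for this example), in which Eq. (2.6) becomes −ψ''/2 + x²ψ/2 = Eψ. It evaluates the "local energy" (Hψ)/ψ of the Gaussian ground state and of the first excited state by finite differences; for a true eigenfunction the local energy is the same at every x and equals n + 1/2. The function names are illustrative and do not come from the text.

```python
import math

# Dimensionless units assumed for this sketch: hbar = m = omega = 1,
# so Eq. (2.6) reads  -psi''/2 + x**2 * psi/2 = E * psi  and E_n = n + 1/2.


def psi0(x: float) -> float:
    """Unnormalised ground state (n = 0) of the harmonic oscillator."""
    return math.exp(-x * x / 2.0)


def psi1(x: float) -> float:
    """Unnormalised first excited state (n = 1)."""
    return x * math.exp(-x * x / 2.0)


def local_energy(psi, x: float, h: float = 1e-4) -> float:
    """Return (H psi)(x) / psi(x), with psi'' from a central finite difference."""
    second = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (h * h)
    return (-0.5 * second + 0.5 * x * x * psi(x)) / psi(x)


def level(n: int) -> float:
    """E_n = hbar*omega*(n + 1/2), expressed in units of hbar*omega."""
    return n + 0.5


if __name__ == "__main__":
    for x in (0.3, 0.8, 1.5):
        print(f"x = {x}: E(psi0) = {local_energy(psi0, x):.6f}, "
              f"E(psi1) = {local_energy(psi1, x):.6f}")
```

The local energies stay at 0.5 and 1.5 (up to finite-difference error) whatever x is chosen, as required by E_0 = ħω/2 and E_1 = 3ħω/2.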
The solution of Eq. (2.6) refers to a single oscillator, and E_0 corresponds to its ground energy level (n = 0). Describing two interacting oscillators is somewhat more complicated. We will only outline the main points of the derivation of the interaction energy between the two oscillators. Let us assume that the oscillators have the potential energies

$$U_1 = \frac{1}{2}kx_1^2 \quad \text{and} \quad U_2 = \frac{1}{2}kx_2^2,$$

respectively. Let us also assume that the oscillators interact only through the electric dipoles caused by the displacements x1 and x2 of the charge q (Fig. 2.1C). The electrostatic interactions between the charges comprising these dipoles can be calculated by means of Coulomb's law:

$$U_{1,2} = \frac{q^2}{4\pi\varepsilon_0 r} + \frac{q^2}{4\pi\varepsilon_0(r + x_2 - x_1)} - \frac{q^2}{4\pi\varepsilon_0(r - x_1)} - \frac{q^2}{4\pi\varepsilon_0(r + x_2)},$$

or

$$U_{1,2} = \frac{q^2}{4\pi\varepsilon_0 r}\left(1 + \frac{1}{1 + \frac{x_2 - x_1}{r}} - \frac{1}{1 - \frac{x_1}{r}} - \frac{1}{1 + \frac{x_2}{r}}\right). \qquad (2.8)$$
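Before the series expansion that follows, it can help to see Eq. (2.8) at work numerically. The sketch below (SI units; taking q as the elementary charge and the particular distances are illustrative assumptions) evaluates the exact four-term Coulomb sum for the geometry of Fig. 2.1C, with nucleus 1 at 0, electron 1 at x1, nucleus 2 at r and electron 2 at r + x2, and compares it with the leading dipole-dipole term −x1x2q²/(2πε0r³) that remains when x1, x2 ≪ r.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
Q = 1.602176634e-19      # elementary charge, C (illustrative value for q)


def coulomb(q1: float, q2: float, d: float) -> float:
    """Coulomb energy (J) of two point charges separated by d (m)."""
    return q1 * q2 / (4.0 * math.pi * EPS0 * d)


def u12_exact(x1: float, x2: float, r: float, q: float = Q) -> float:
    """Eq. (2.8): all four charge-charge terms between the two oscillators."""
    return (coulomb(q, q, r)                 # nucleus 1 - nucleus 2
            + coulomb(-q, -q, r + x2 - x1)   # electron 1 - electron 2
            + coulomb(q, -q, r + x2)         # nucleus 1 - electron 2
            + coulomb(-q, q, r - x1))        # electron 1 - nucleus 2


def u12_dipole(x1: float, x2: float, r: float, q: float = Q) -> float:
    """Leading (second-order) term: -x1*x2*q**2 / (2*pi*eps0*r**3)."""
    return -x1 * x2 * q * q / (2.0 * math.pi * EPS0 * r ** 3)


if __name__ == "__main__":
    r = 4.0e-10  # 4 angstrom between the nuclei (illustrative)
    for x1, x2 in [(1e-13, 2e-13), (1e-12, 1e-12), (2e-11, 2e-11)]:
        exact, approx = u12_exact(x1, x2, r), u12_dipole(x1, x2, r)
        print(f"x1={x1:.0e} x2={x2:.0e}: exact={exact:.4e} J, "
              f"dipole term={approx:.4e} J, ratio={exact / approx:.6f}")
```

The ratio approaches 1 as the displacements shrink relative to r, and displacements in the same direction give a negative (attractive) energy.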
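Constructing the formula for U_{1,2}, we have introduced the synchronous oscillation of the dipoles. Making use of the series expansions

$$\frac{1}{1 - \delta} = 1 + \delta + \delta^2 + \dots, \qquad \frac{1}{1 + \delta} = 1 - \delta + \delta^2 - \dots,$$

we can substitute the fractions in the expression for U_{1,2}. Taking into account that x1, x2 ≪ r, we can truncate the series after the δ² term. The constant and linear terms then cancel, and for the final expression of the potential energy one obtains

$$U_{1,2} = -\frac{x_1x_2q^2}{2\pi\varepsilon_0 r^3},$$

where ε0 is the permittivity of vacuum. Note that U_{1,2} is not the interaction energy between the oscillators. It is part of the potential energy of the pair:

$$U = U_1 + U_2 + U_{1,2} = \frac{kx_1^2}{2} + \frac{kx_2^2}{2} - \frac{x_1x_2q^2}{2\pi\varepsilon_0 r^3}.$$

The energy of the pair of oscillators, E′, is given by the Schrödinger equation [see Eq. (2.6)]:

$$-\frac{\hbar^2}{2m}\frac{\partial^2\Psi}{\partial x_1^2} - \frac{\hbar^2}{2m}\frac{\partial^2\Psi}{\partial x_2^2} + \left(\frac{1}{2}kx_1^2 + \frac{1}{2}kx_2^2 - \frac{x_1x_2q^2}{2\pi\varepsilon_0 r^3}\right)\Psi = E'\Psi. \qquad (2.9)$$

After an appropriate change of variables,

$$x_s = \frac{1}{\sqrt{2}}(x_1 + x_2) \quad \text{and} \quad x_a = \frac{1}{\sqrt{2}}(x_1 - x_2),$$

from which it also follows that

$$k_s = k\left(1 - \frac{q^2}{2\pi\varepsilon_0 k r^3}\right) \quad \text{and} \quad k_a = k\left(1 + \frac{q^2}{2\pi\varepsilon_0 k r^3}\right), \qquad (2.10)$$

Eq. (2.9) becomes

$$-\frac{\hbar^2}{2m}\left(\frac{\partial^2\Psi}{\partial x_s^2} + \frac{\partial^2\Psi}{\partial x_a^2}\right) + \left(\frac{1}{2}k_s x_s^2 + \frac{1}{2}k_a x_a^2\right)\Psi = E'\Psi. \qquad (2.11)$$

The symmetric mode s, in which both electron clouds are displaced in the same direction, is softened by the attraction (k_s < k), whereas the antisymmetric mode a is stiffened (k_a > k).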
This is an equation for two non-interacting oscillators, s and a, with potential energies

$$U_s = \frac{1}{2}k_s x_s^2 \quad \text{and} \quad U_a = \frac{1}{2}k_a x_a^2$$

and angular frequencies ω_s and ω_a, respectively. This allows Eq. (2.11) to be split into two equations of the type of Eq. (2.6). It now becomes clear why the above change of variables was called appropriate: it reduces a complicated differential equation to equations whose solutions are known. Thus, instead of solving the equation, we can use the available solution, which for the minimum energies (quantum number n = 0) of the oscillators s and a is given by Eq. (2.7):

$$E_{0,s} = \frac{1}{2}\hbar\omega_s \quad \text{and} \quad E_{0,a} = \frac{1}{2}\hbar\omega_a.$$

The total energy is then

$$E'_0 = \frac{1}{2}\hbar\omega_s + \frac{1}{2}\hbar\omega_a. \qquad (2.12)$$
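The change of variables that decouples Eq. (2.9) can be checked numerically. The sketch below uses arbitrary units and writes c = q²/(2πε0r³) for the coupling constant (our shorthand), so that k_s = k − c and k_a = k + c as in Eq. (2.10). It confirms that the potential energy of the pair is the same whether written in (x1, x2) or in the normal coordinates (x_s, x_a), and it evaluates the ground-state energy of Eq. (2.12) with ω_s = √(k_s/m) and ω_a = √(k_a/m). The helper names are hypothetical.

```python
import math


def pair_potential(x1: float, x2: float, k: float, c: float) -> float:
    """U = k*x1**2/2 + k*x2**2/2 - c*x1*x2, with c = q**2/(2*pi*eps0*r**3)."""
    return 0.5 * k * x1 ** 2 + 0.5 * k * x2 ** 2 - c * x1 * x2


def normal_mode_potential(x1: float, x2: float, k: float, c: float) -> float:
    """The same energy written as U_s + U_a in the normal coordinates."""
    xs = (x1 + x2) / math.sqrt(2.0)   # symmetric coordinate
    xa = (x1 - x2) / math.sqrt(2.0)   # antisymmetric coordinate
    ks, ka = k - c, k + c             # Eq. (2.10): k(1 -/+ c/k)
    return 0.5 * ks * xs ** 2 + 0.5 * ka * xa ** 2


def pair_ground_energy(k: float, c: float, m: float, hbar: float = 1.0) -> float:
    """Eq. (2.12): E'_0 = hbar*(omega_s + omega_a)/2 (requires c < k)."""
    omega_s = math.sqrt((k - c) / m)
    omega_a = math.sqrt((k + c) / m)
    return 0.5 * hbar * (omega_s + omega_a)


if __name__ == "__main__":
    k, c, m = 1.0, 0.2, 1.0
    print(pair_potential(0.3, -0.7, k, c), normal_mode_potential(0.3, -0.7, k, c))
    # Two isolated oscillators would have 2 * (hbar*omega/2) = 1.0 in these units.
    print(pair_ground_energy(k, c, m))
```

For any c > 0 the coupled pair ends up below ħω, the ground energy of the two isolated oscillators; that lowering is the dispersion energy.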
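From Eqs. (2.3) and (2.10), the angular frequencies are

$$\omega_s = \sqrt{\frac{k_s}{m}} = \omega\left(1 - \frac{q^2}{2\pi\varepsilon_0 k r^3}\right)^{1/2} \quad \text{and} \quad \omega_a = \sqrt{\frac{k_a}{m}} = \omega\left(1 + \frac{q^2}{2\pi\varepsilon_0 k r^3}\right)^{1/2}.$$

Substituting these expressions for ω_s and ω_a into Eq. (2.12), for the ground-level energy of the pair of interacting oscillators one obtains

$$E'_0 = \frac{1}{2}\hbar(\omega_s + \omega_a) = \frac{1}{2}\hbar\omega\left[\left(1 - \frac{q^2}{2\pi\varepsilon_0 k r^3}\right)^{1/2} + \left(1 + \frac{q^2}{2\pi\varepsilon_0 k r^3}\right)^{1/2}\right].$$

We will again use a series expansion, by means of which the terms in the brackets can be simplified:

$$(1 + y)^t = 1 + \frac{t}{1!}y + \frac{t(t - 1)}{2!}y^2 + \dots$$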
Substituting t = 1/2 and y = ±q²/(2πε0kr³) (the ratios under the square roots), the first-order terms cancel and, after some algebra, one can finally write

$$E'_0 = \hbar\omega - \frac{\hbar\omega}{2(4\pi\varepsilon_0)^2}\frac{q^4}{k^2r^6}. \qquad (2.13)$$

Our task is to evaluate the interaction energy between the two oscillators. This energy equals the energy of the interacting pair minus the energy of the two oscillators taken separately: U_d = E′_0 − 2E_0, where E_0 = ħω/2 is the ground-state energy of each isolated oscillator, Eq. (2.7). From Eqs. (2.13) and (2.7), one obtains

$$U_d = -\frac{\hbar\omega}{2(4\pi\varepsilon_0)^2}\left(\frac{q^2}{k}\right)^2\frac{1}{r^6}. \qquad (2.14)$$

In an external electric field, non-polar molecules become polarised owing to the displacement of their electron clouds (see Chapter 5 and the text around Fig. 5.8). The stronger the field, the larger the displacement and hence the induced dipole moment. The magnitude of the resulting dipole moment is then a function of the magnitude of the electric field:

$$\mu = \alpha E_{el}. \qquad (2.15)$$

The quantity α is a characteristic of the material and is called polarisability. Polarisability measures the extent of the charge displacement in the atoms or molecules of a given material caused by an external electric field. Large values of α indicate that the forces resisting the displacement are weak; one then speaks of high polarisability. It should be noted that α was introduced here in a somewhat simplified way. In general, α is a tensor quantity, which means that for some materials it is not isotropic.

Let us return to the interacting oscillators. The force tending to reduce the charge displacement is F = −kx. This force opposes the external electric force that induces the polarisation (the charge displacement). When the two forces balance, F = −qE_el. Taking into account Eq. (2.15) and substituting μ = xq, one obtains
$$-kx = -\frac{q^2x}{\alpha},$$

or, for the polarisability,

$$\alpha = \frac{q^2}{k}. \qquad (2.16)$$

The square of this quantity is the middle factor on the right-hand side of Eq. (2.14). Substituting it there, the expression for the interaction energy between two oscillators becomes

$$U_d = -\frac{\hbar\omega}{2(4\pi\varepsilon_0)^2}\frac{\alpha^2}{r^6}. \qquad (2.17)$$
Equation (2.17) gives the sought energy of interaction between two identical non-polar atoms or molecules, i.e. the van der Waals interaction energy, U_d, arising from the dispersion forces. In deriving Eq. (2.17) we did not discuss the form of the wave function Ψ. We also skipped the solution of the amplitude Schrödinger equation, as well as the algebraic manipulations leading to Eq. (2.13). These lie outside the scope of our considerations. Instead, we have focused on the basic ideas and on how they lead to a quantitative description of the dispersion forces.

Let us now briefly inspect Eq. (2.17). All quantities on the right-hand side are positive by definition. This means that U_d is always negative, i.e. the dispersion forces are always attractive. The magnitude of U_d falls off rapidly with distance, as r^{-6}, which makes the dispersion forces short-range forces. The strength of the dispersion forces depends on the type of the interacting atoms or molecules via the polarisability, α: as Eq. (2.17) shows, the larger α, the larger |U_d|. According to Eq. (2.16), a larger α can be achieved by increasing the displaced charge, q, or by reducing the quasi-elasticity coefficient, k. Both changes, individually or together, increase the induced dipole moment. This suggests that atoms and molecules with a larger nuclear charge, q, and larger radii should have higher polarisability, and hence stronger mutual attraction due to dispersion forces. We have already mentioned that the only appreciable interactions between non-polar atoms or molecules in the liquid state are van der Waals interactions. Differences in the boiling temperatures of liquids of non-polar molecules are therefore expected to reflect differences in the kinetic energy needed to overcome the intermolecular attraction. Indeed, the boiling points of the rare gases clearly correlate with their masses (see Table 2.1).

Table 2.1 Boiling points of rare gases¹.

Element    Boiling temperature (°C)
Helium     −268.6
Neon       −245.9
Argon      −185.7
Krypton    −152.3
Xenon      −107.7
Radon      −61.8
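The binomial step that leads from Eq. (2.12) to Eqs. (2.13) and (2.14) can be checked with a few lines of code. In the sketch below, β = q²/(2πε0kr³) is a dimensionless coupling (our shorthand, not the book's), so the exact energy shift is (E′_0 − ħω)/ħω = [(1 − β)^{1/2} + (1 + β)^{1/2}]/2 − 1, and the leading term of its expansion is −β²/8, which is Eq. (2.14) in units of ħω. The last function evaluates Eq. (2.17) in SI units; the input values in the example are illustrative only and are not taken from the text.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def exact_shift(beta: float) -> float:
    """(E'_0 - hbar*omega) / (hbar*omega) from Eq. (2.12), no expansion."""
    return 0.5 * (math.sqrt(1.0 - beta) + math.sqrt(1.0 + beta)) - 1.0


def leading_shift(beta: float) -> float:
    """Leading term of the binomial expansion, -beta**2/8 (Eq. 2.14)."""
    return -beta * beta / 8.0


def dispersion_energy(alpha: float, hbar_omega: float, r: float) -> float:
    """Eq. (2.17): U_d = -hbar*omega*alpha**2 / (2*(4*pi*eps0)**2 * r**6).

    alpha in C*m**2/V (mu = alpha*E), hbar_omega in J, r in m; returns J.
    """
    return -hbar_omega * alpha ** 2 / (2.0 * (4.0 * math.pi * EPS0) ** 2 * r ** 6)


if __name__ == "__main__":
    for beta in (0.3, 0.1, 0.01):
        print(beta, exact_shift(beta), leading_shift(beta))
    # Doubling the distance weakens U_d by 2**6 = 64 (the r**-6 law).
    u1 = dispersion_energy(1.8e-40, 2.5e-18, 4e-10)
    u2 = dispersion_energy(1.8e-40, 2.5e-18, 8e-10)
    print(u1, u2, u1 / u2)
```

The exact and truncated shifts agree better and better as β decreases, which is the regime x1, x2 ≪ r assumed in the derivation.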
Helium atoms have fewer electrons and a smaller charge displacement than the atoms of neon, argon and the other rare gases. That is to say, the polarisability of helium is lower than that of all the other elements listed in Table 2.1; hence helium is characterised by the weakest dispersion forces (the weakest van der Waals interactions). Our considerations were made for interactions between two identical molecules. If more than two molecules are sufficiently close to each other, synchronous displacements of the electron clouds also occur, and dispersion forces arise by the same mechanism. The approximation used to describe a heterogeneous polyatomic system is given in the last section of this chapter.

2.2.2 Dipole-dipole interactions

Van der Waals interactions occur not only between non-polar molecules. If the interacting molecules have permanent dipoles, the effect of dipole-dipole interactions is added to that of the dispersion forces. In many textbooks dipole-dipole interactions are included as a part of van der Waals interactions (see, for instance, the extensive textbook by Bergethon²). Let us consider two molecules with dipole moments μ1 and μ2. Assume that the two molecules are at a distance r and are free to rotate, i.e. to change the mutual orientation of their dipoles. Without loss of generality, we will consider μ1 fixed and μ2 free to rotate. It can be shown that the potential energy of a dipole (dipole 2) in an external electric field (the field of dipole 1) is

$$U_{1,2} = -\mathbf{E}_1\cdot\boldsymbol{\mu}_2 = -E_1\mu_2\cos\theta, \qquad (2.18)$$
where θ is the angle between the dipole axis and the direction of the electrostatic field at the position of μ2 (see Appendix B). According to Eq. (2.18), the potential energy of dipole 2 depends on its orientation via the angle θ. To determine the interaction energy between the two dipoles, we need the preferred orientation of dipole 2 in the field of dipole 1, i.e. the average component of dipole 2 in the direction of E1, ⟨μ2 cos θ⟩. This means that we have to find the probability of the dipole being oriented between θ and θ + dθ and average over all possible values of θ.
Figure 2.2 Orientation of the dipole μ2 with respect to the electrostatic field E1.
As illustrated in Fig. 2.2, the dipole can rotate around the axis of E1 without changing the angle θ. Thus, the probability of finding the dipole in an orientation between θ and θ + dθ is proportional to the area of the annular ring with radius R sin θ and width R dθ. Since dθ is very small, this area can be approximated by the lateral area of a cylinder: dA = 2πR sin θ × R dθ. If all orientations were equally probable, for the orientation with angle θ one would obtain

$$p_\theta = \frac{dA}{4\pi R^2} = \frac{1}{2}\sin\theta\, d\theta.$$

The denominator 4πR² is the area of the sphere with radius R (Fig. 2.2). The orientations are, however, not equally probable. As we have already pointed out when discussing Eq. (2.18), different orientations of dipole 2 in the field of dipole 1 have different potential energies. This results in different weights of the orientations, given by the Boltzmann factor (see Appendix A)

$$\exp\left(\frac{\mu_2E_1\cos\theta}{k_BT}\right),$$

where k_B is the Boltzmann constant and T is the temperature*. The sought average is given by the Boltzmann-weighted sum [Eq. (A.26), Appendix A]:
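$$\bar{\mu}_2 = \langle\mu_2\cos\theta\rangle = \frac{\int_0^{\pi}(\mu_2\cos\theta)\,e^{\mu_2E_1\cos\theta/k_BT}\,\frac{1}{2}\sin\theta\,d\theta}{\int_0^{\pi}e^{\mu_2E_1\cos\theta/k_BT}\,\frac{1}{2}\sin\theta\,d\theta}.$$

The integration runs over θ from 0 to π only: the rotation around the axis E1 has already been accounted for by the area of the annular ring, so that this range covers the sphere exactly once. The constant factor 1/2 cancels between the numerator and the denominator, so that one can write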
*This is the thermodynamic temperature (see Appendix A).
$$\bar{\mu}_2 = \frac{\mu_2\int_0^{\pi}\cos\theta\,e^{\mu_2E_1\cos\theta/k_BT}\sin\theta\,d\theta}{\int_0^{\pi}e^{\mu_2E_1\cos\theta/k_BT}\sin\theta\,d\theta}. \qquad (2.19)$$

Equation (2.19) gives the sought average value of the component of dipole 2 in the direction of the field E1. This expression is not convenient for direct use. To reduce it to a simpler form, we will use the strategy already employed in the previous section, namely an appropriate substitution: x = cos θ, from which dx = −sin θ dθ. The substitution has to be applied to the integration limits as well: for θ = 0, x = cos 0 = 1, and for θ = π, x = cos π = −1. The minus sign of dx is absorbed by reversing the limits in both integrals, and Eq. (2.19) can be written more compactly:

$$\bar{\mu}_2 = \mu_2\frac{\int_{-1}^{1}x\,e^{\mu_2E_1x/k_BT}\,dx}{\int_{-1}^{1}e^{\mu_2E_1x/k_BT}\,dx}. \qquad (2.20)$$
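For brevity, we write a = μ2E1/(k_BT), which gives the integrals a familiar form:

$$\bar{\mu}_2 = \mu_2\frac{\int_{-1}^{1}x\,e^{ax}\,dx}{\int_{-1}^{1}e^{ax}\,dx}.$$

The integral in the numerator is

$$\int_{-1}^{1}x\,e^{ax}\,dx = \frac{e^{a}}{a^2}(a - 1) - \frac{e^{-a}}{a^2}(-a - 1) = \frac{1}{a}\left(e^{a} + e^{-a}\right) - \frac{1}{a^2}\left(e^{a} - e^{-a}\right),$$

whereas the integral in the denominator is

$$\int_{-1}^{1}e^{ax}\,dx = \frac{e^{a}}{a} - \frac{e^{-a}}{a} = \frac{1}{a}\left(e^{a} - e^{-a}\right).$$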
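The two closed-form integrals can be checked against direct numerical quadrature. The sketch below uses a composite Simpson rule (a choice made for this example, not part of the text) and also evaluates the numerator in its original angular form from Eq. (2.19), so that the substitution x = cos θ is checked as well; a = μ2E1/(k_BT) is dimensionless, and a = 0.7 is an arbitrary test value.

```python
import math


def simpson(f, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson rule on [lo, hi] with n (even) sub-intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def numerator_closed(a: float) -> float:
    """Closed form of the integral of x*exp(a*x) over [-1, 1]."""
    return (math.exp(a) + math.exp(-a)) / a - (math.exp(a) - math.exp(-a)) / a ** 2


def denominator_closed(a: float) -> float:
    """Closed form of the integral of exp(a*x) over [-1, 1]."""
    return (math.exp(a) - math.exp(-a)) / a


if __name__ == "__main__":
    a = 0.7
    num_x = simpson(lambda x: x * math.exp(a * x), -1.0, 1.0)
    num_theta = simpson(lambda t: math.cos(t) * math.exp(a * math.cos(t)) * math.sin(t),
                        0.0, math.pi)
    den_x = simpson(lambda x: math.exp(a * x), -1.0, 1.0)
    print(num_x, num_theta, numerator_closed(a))
    print(den_x, denominator_closed(a))
```

The angular and the substituted forms of the numerator agree with each other and with the closed form, which confirms that the limits and the sign of dx were handled consistently.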
Substituting the two integrals into Eq. (2.20), one obtains

$$\bar{\mu}_2 = \mu_2\left(\frac{e^{a} + e^{-a}}{e^{a} - e^{-a}} - \frac{1}{a}\right).$$

The term containing the exponentials can be expanded in a series:

$$\frac{e^{a} + e^{-a}}{e^{a} - e^{-a}} = \frac{1}{a} + \frac{a}{3} - \frac{a^3}{45} + \dots,$$

which allows us to write

$$\bar{\mu}_2 = \mu_2\left(\frac{a}{3} - \frac{a^3}{45} + \dots\right).$$

If we assume that the interaction between the dipole and the electrostatic field is weak compared with the thermal energy, E1μ2 ≪ k_BT (i.e. a ≪ 1), we can ignore the term a³/45 because it is much smaller than the term a/3. Thus, after substituting back a = μ2E1/(k_BT), for the average value of μ2 in the direction of the electrostatic field one obtains

$$\bar{\mu}_2 = \frac{\mu_2^2E_1}{3k_BT}.$$

Equation (2.18) can now be written as follows:

$$U_{1,2} = -E_1\bar{\mu}_2 = -\frac{\mu_2^2E_1^2}{3k_BT}.$$

The electric field of dipole 1 at a point in the direction θ (see Appendix B), where θ is now the angle between the axis of dipole 1 and the line joining the two dipoles, is

$$E_1 = \frac{\mu_1}{4\pi\varepsilon_0 r^3}\left(3\cos^2\theta + 1\right)^{1/2}.$$

Substituting this into the expression for U_{1,2}, one obtains

$$U_{1,2} = -\frac{\mu_1^2\mu_2^2}{3(4\pi\varepsilon_0)^2k_BT\,r^6}\left(3\cos^2\theta + 1\right).$$

The potential energy U_{1,2} given by the above equation refers to a fixed direction of the field of dipole 1. If this dipole is also free to rotate, we have to average U_{1,2} over all possible values of cos²θ. Here we do not need to weight the different orientations, because the weighting has already been taken into account in the calculation of μ̄2. Substituting the average value ⟨cos²θ⟩ = 1/3 into the equation for U_{1,2}, we finally obtain

$$U_{d-d} = \langle U_{1,2}\rangle = -\frac{2\mu_1^2\mu_2^2}{3(4\pi\varepsilon_0)^2k_BT\,r^6}. \qquad (2.21)$$
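The two approximations behind Eq. (2.21), the weak-field limit of the orientational average and the uniform average ⟨cos²θ⟩ = 1/3, can be checked numerically. In the sketch below, `langevin(a)` returns coth a − 1/a, the exact ratio μ̄2/μ2 obtained above (commonly called the Langevin function), and the orientational average uses the weight ½ sin θ dθ from the annular-ring argument. The dipole moments, distance and temperature in the example are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23       # Boltzmann constant, J/K


def langevin(a: float) -> float:
    """Exact mean of cos(theta) in the field: coth(a) - 1/a, a = mu2*E1/(kB*T)."""
    return 1.0 / math.tanh(a) - 1.0 / a


def sphere_average(f, n: int = 20000) -> float:
    """Average of f(theta) over all orientations, weight sin(theta)/2 (midpoint rule)."""
    h = math.pi / n
    return sum(f((i + 0.5) * h) * 0.5 * math.sin((i + 0.5) * h) * h for i in range(n))


def keesom_energy(mu1: float, mu2: float, r: float, temperature: float) -> float:
    """Eq. (2.21): -2 mu1^2 mu2^2 / (3 (4 pi eps0)^2 kB T r^6), SI units, in J."""
    return -2.0 * mu1 ** 2 * mu2 ** 2 / (
        3.0 * (4.0 * math.pi * EPS0) ** 2 * K_B * temperature * r ** 6)


if __name__ == "__main__":
    for a in (2.0, 0.5, 0.05):
        print(f"a = {a}: exact = {langevin(a):.6f}, a/3 = {a / 3:.6f}")
    print(sphere_average(lambda t: math.cos(t) ** 2))          # about 1/3
    print(sphere_average(lambda t: 3 * math.cos(t) ** 2 + 1))  # about 2
    mu = 6.2e-30  # C*m, roughly 1.85 debye (illustrative)
    print(keesom_energy(mu, mu, 5e-10, 300.0))
```

For a ≪ 1 the exact average approaches a/3, while for a of order 1 or larger it does not, which marks where Eq. (2.21) stops being reliable. The 1/T factor also means that the orientation-averaged dipole-dipole attraction weakens on heating.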
Equation (2.21) is the expression for the dipole-dipole interactions. Comparing U_{d-d} with Eq. (2.17), we notice some similarities. The most important one is that both potential energies decrease with distance as r^{-6}: both the dispersion forces and the dipole-dipole interactions are short-range interactions. Equation (2.21) has a peculiarity that is not obvious and becomes clear only if we know how it was derived: it describes the case in which the dipoles are free to rotate. This makes Eq. (2.21) most appropriate for liquids, where the molecules are free to move and rotate. In proteins the polar atoms and groups are often fixed, for example by hydrogen bonds, so that the freedom to change the direction of their dipoles is restricted. In such cases, the interaction between two polar groups is better calculated by Coulomb's law, taking the partial charges that form the dipoles explicitly into account.

2.2.3 Dipole-induced dipole interactions

When a molecule is in an electric field, a dipole moment can be induced in it. The magnitude of the induced dipole moment is given by Eq. (2.15): μ_ind = αE. The interaction between the permanent and the induced dipole can be calculated by means of Eq. (2.18):

$$U_{d-ind} = -E\mu_{ind},$$

where E is the magnitude of the electric field created by the permanent dipole. The factor cos θ in Eq. (2.18) is equal to 1, because μ_ind is induced along the field, so that θ = 0. Combining the above two expressions, one readily obtains U_{d-ind} = −αE².
However, this formula is incomplete. The induced dipole results from a separation of positive and negative charges. This separation does not occur spontaneously, so some work has to be done by the electric field. The energy corresponding to this work is the induction energy, U_ind. The correct expression is then

$$U_{d-ind} = -\alpha E^2 + U_{ind}. \qquad (2.22)$$
The work needed to move a charge q through an infinitely small distance dx is dW = Eq dx = E dμ, where dμ = q dx is an infinitely small dipole. The work, and hence the energy, needed to induce a dipole of magnitude μ_ind is obtained by integrating this expression:

$$U_{ind} = W = \int_0^{\mu_{ind}}E\,d\mu = \int_0^{\mu_{ind}}\frac{\mu}{\alpha}\,d\mu = \frac{\mu_{ind}^2}{2\alpha} = \frac{\alpha E^2}{2}.$$

To perform the integration, we first replaced E by μ/α using Eq. (2.15); in the last step we expressed μ_ind back through E using the same equation. Thus, Eq. (2.22) gives

$$U_{d-ind} = -\alpha E^2 + \frac{\alpha E^2}{2} = -\frac{1}{2}\alpha E^2.$$
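The integration for the induction energy can be mirrored numerically. The sketch below (arbitrary consistent units; the values of α and E are hypothetical) sums E dμ with E = μ/α from Eq. (2.15) using the trapezoid rule, which is exact for this linear integrand, and then forms U_d-ind = −αE² + U_ind as in Eq. (2.22).

```python
def induction_work(alpha: float, field: float, n: int = 1000) -> float:
    """U_ind = integral of E dmu from 0 to mu_ind, with E = mu/alpha (Eq. 2.15)."""
    mu_ind = alpha * field
    h = mu_ind / n
    total = 0.0
    for i in range(n):
        mu_a, mu_b = i * h, (i + 1) * h
        # Trapezoid on each step; exact here because E(mu) is linear in mu.
        total += 0.5 * (mu_a / alpha + mu_b / alpha) * h
    return total


def dipole_induced_dipole_energy(alpha: float, field: float) -> float:
    """Eq. (2.22): U_d-ind = -alpha*E**2 + U_ind, expected to equal -alpha*E**2/2."""
    return -alpha * field ** 2 + induction_work(alpha, field)


if __name__ == "__main__":
    alpha, field = 2.0, 3.0
    print(induction_work(alpha, field), 0.5 * alpha * field ** 2)
    print(dipole_induced_dipole_energy(alpha, field), -0.5 * alpha * field ** 2)
```

Half of the energy gained by the induced dipole in the field is spent on creating it, which is where the factor 1/2 comes from.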
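It remains only to substitute E², which can be done by means of Eq. (B.10) derived in Appendix B; with μ the magnitude of the permanent dipole,

$$U_{d-ind} = -\frac{\alpha\mu^2}{2(4\pi\varepsilon_0)^2r^6}\left(3\cos^2\theta + 1\right).$$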
If the dipoles are free to rotate, averaging over all orientations (⟨cos²θ⟩ = 1/3) gives

$$U_{d-ind} = -\frac{\alpha\mu^2}{16\pi^2\varepsilon_0^2r^6}. \qquad (2.23)$$

Equation (2.23) gives an expression for dipole-induced dipole interactions. We notice that, like dispersion forces and dipole-dipole interactions, the dipole-induced dipole interactions decrease sharply with distance. The fact that all of these interactions (dispersion forces, dipole-dipole interactions and dipole-induced dipole interactions) decrease with the same power of the distance, r^{-6}, is a good reason to combine them into a single term, namely the van der Waals interactions. We should repeat here that such a merger is appropriate for liquids. For proteins it is inconvenient because of the restricted freedom of reorientation of the dipoles.

2.2.4 Repulsive interactions

The parameter b in Eq. (2.2) reflects the fact that gas atoms or molecules have a finite size. We have already related b to the volume belonging to a single atom or molecule, which is not available to the others. This was done assuming that gas molecules are hard spheres. The assumption seems obvious, since two objects cannot occupy the same space at the same time. However, this is strictly true only for macroscopic objects. For atoms the principle is violated: the formation of a chemical bond is such a violation, because the electron orbitals overlap. From quantum chemistry we know that this overlap occurs if the electrons have opposite spins, i.e. different quantum numbers. It can be shown that the probability of overlap of electron orbitals with identical quantum numbers is zero. This is known as the Pauli exclusion principle: no two electrons in a system can have identical sets of quantum numbers. In other words, two electrons with identical quantum numbers cannot occupy the same place at the same time. For instance, a helium atom has two electrons in its orbital, with spins +1/2 and −1/2. According to the Pauli principle no third electron can be introduced, because its quantum numbers would coincide with those of one of the electrons already present. As a result, a repulsive force arises when the electron clouds of two such atoms are pushed into each other.
Figure 2.3 Radial distribution function of the probability of finding the electron at a distance r (in Å) from the nucleus of the hydrogen atom. This function corresponds to n = 1 and l = 0.
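The curve in Fig. 2.3 can be reproduced from the standard textbook expression for the ground state of the hydrogen atom (n = 1, l = 0), P(r) = (4r²/a0³) exp(−2r/a0), which is assumed here; a0 = 0.529 Å as quoted in the text. The sketch checks that P(r) is normalised, that it peaks at r = a0, and how quickly the probability in the tail falls off with distance. The midpoint-rule integrator is a choice made for this example.

```python
import math

A0 = 0.529  # Bohr radius in angstrom, as quoted in the text


def radial_probability(r: float, a0: float = A0) -> float:
    """P(r) for n = 1, l = 0, in 1/angstrom: 4 r^2 / a0^3 * exp(-2 r / a0)."""
    return 4.0 * r * r / a0 ** 3 * math.exp(-2.0 * r / a0)


def integrate(f, lo: float, hi: float, n: int = 20000) -> float:
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h


if __name__ == "__main__":
    print("normalisation:", integrate(radial_probability, 0.0, 20.0))
    print("P(a0) =", radial_probability(A0))
    for cutoff in (1.0, 2.0, 3.0, 4.0):
        tail = integrate(radial_probability, cutoff, 20.0)
        print(f"probability beyond {cutoff} A: {tail:.2e}")
```

The tail probability drops by more than an order of magnitude per ångström, which is the exponential behaviour that underlies the steep short-range repulsion.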
The description of repulsive forces is a subject of quantum mechanics and is beyond the scope of this book. For our purposes, we need only a brief overview of the origin of the distance dependence of these forces. The possible states that an electron can occupy are found as solutions of the amplitude Schrödinger equation [Eq. (2.5)]. The solution is usually presented as the product of three functions in spherical coordinates, R(r)Θ(θ)Φ(φ). For the ground state of the hydrogen atom (n = 1, l = 0), the probability density of finding the electron at a distance r from the nucleus is

$$P(r) = \frac{4r^2}{a_0^3}e^{-2r/a_0},$$

where a0 = 0.529 Å is the Bohr radius. It follows that beyond a certain distance the probability decreases exponentially as r increases (Fig. 2.3). Conversely, as two atoms approach each other, the probability of finding two electrons with the same quantum numbers in the same region increases exponentially. Following the Pauli principle, the repulsive overlap forces also increase exponentially.

2.3 Potential Functions for Application in Proteins

At the beginning of Section 2.2 we pointed out that van der Waals interactions are usually considered to be short-range attractive interactions. We have departed from this convention by including the repulsive interactions as a part of van der Waals interactions. This is not only because the repulsive forces are already taken into account in the van der Waals equation for real gases [Eq. (2.2)], but also because the potential functions used to calculate inter-atomic van der Waals interactions in proteins include both attractive and repulsive terms. Thus, van der Waals interactions can be summarised as follows:

$$U_{vdW} = U_d + U_{d-d} + U_{d-ind} + U_{repulsive}.$$
The attractive terms decrease with distance as r^{-6}, which is a reason to combine them into one attractive term. The general expression for van der Waals interactions then includes the dispersion (attractive) and the repulsive overlap forces:

$$U_{vdW} = -\frac{A}{r^6} + Be^{-f(r)}, \qquad (2.24)$$

where A and B are coefficients depending on the type of the interacting atoms. A number of potential functions follow the form of Eq. (2.24). One of them is the Buckingham potential:

$$U_{Bu} = \frac{\varepsilon}{\alpha - 6}\left[6e^{\alpha(1 - r/r_m)} - \alpha\left(\frac{r_m}{r}\right)^6\right], \qquad (2.25)$$

where α is a hardness parameter (usually α = 12), r_m is the inter-atomic distance corresponding to the minimum interaction energy, and ε is the magnitude of this energy (Fig. 2.4). Potentials based on Eq. (2.24) have certain inconveniences. The exponential repulsion remains finite as r → 0, whereas the r^{-6} attraction diverges. During a molecular dynamics simulation, for instance, r may become very small, so that the interacting atoms may collapse into non-physical attractive interactions (see the short overview of this method in Chapter 7, Section 7.3.1). Another inconvenience is the time-consuming calculation of the exponential term.
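The inconvenience just described is easy to see numerically. The sketch below implements Eq. (2.25) with α = 12 and hypothetical values ε = 0.1 and r_m = 4.0 (arbitrary energy unit, distances in Å). It confirms that the minimum lies at r_m with depth −ε, that there is a repulsive barrier at intermediate distances, and that at very short distances the r^{-6} term wins and the energy becomes large and negative.

```python
import math


def buckingham(r: float, eps: float, r_m: float, alpha: float = 12.0) -> float:
    """Eq. (2.25): eps/(alpha-6) * [6 exp(alpha (1 - r/r_m)) - alpha (r_m/r)**6]."""
    return eps / (alpha - 6.0) * (
        6.0 * math.exp(alpha * (1.0 - r / r_m)) - alpha * (r_m / r) ** 6)


if __name__ == "__main__":
    eps, r_m = 0.1, 4.0  # hypothetical parameters
    for r in (0.4, 1.0, 2.4, 3.6, 4.0, 5.0, 7.0):
        print(f"r = {r:4.1f}  U = {buckingham(r, eps, r_m): .4e}")
```

A simulation step that carries two atoms past the top of the barrier would drop them into this spurious well, which is why exp-6 potentials need extra care at short range.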
Figure 2.4 Potential functions for inter-atomic interactions, energy U versus distance r (Å). Left: Buckingham potential. Right: Lennard-Jones potential, with curves U_C-C and U_N-C.
An alternative approach to calculating van der Waals interactions is the use of the so-called (6-12) potentials. The most popular among them is the Lennard-Jones potential:

$$U_{LJ} = \varepsilon\left[\left(\frac{r_m}{r}\right)^{12} - 2\left(\frac{r_m}{r}\right)^6\right]. \qquad (2.26)$$
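A minimal implementation of Eq. (2.26) shows why it is cheap to evaluate: the twelfth power is obtained by squaring the sixth. The sketch (hypothetical ε = 0.1 and r_m = 4.0 Å) also locates the zero of the potential, which lies where (r_m/r)⁶ = 2, i.e. at r_0 = r_m/2^{1/6}; this is the distance used in the first of the two definitions of the van der Waals radius.

```python
def lennard_jones(r: float, eps: float, r_m: float) -> float:
    """Eq. (2.26): eps * [(r_m/r)**12 - 2*(r_m/r)**6], eps = well depth (> 0)."""
    s6 = (r_m / r) ** 6
    return eps * (s6 * s6 - 2.0 * s6)  # (r_m/r)**12 is just s6 squared


def zero_crossing(r_m: float) -> float:
    """Distance where U = 0: (r_m/r)**6 = 2, so r_0 = r_m / 2**(1/6)."""
    return r_m / 2.0 ** (1.0 / 6.0)


if __name__ == "__main__":
    eps, r_m = 0.1, 4.0  # hypothetical parameters
    r0 = zero_crossing(r_m)
    print(f"r_0 = {r0:.4f}, r_0/r_m = {r0 / r_m:.4f}")
    print(lennard_jones(r0, eps, r_m), lennard_jones(r_m, eps, r_m))
```

Since r_0/r_m ≈ 0.89 for any Lennard-Jones pair, radii defined from the zero crossing come out about 11% smaller than radii defined from r_m.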
The advantages of this potential are obvious. On the right-hand side of the Lennard-Jones potential, the factor (r_m/r)^{12} is just the square of the factor (r_m/r)^6, which makes the calculations considerably faster than with Eqs. (2.24) and (2.25), which contain exponential terms. As illustrated in Fig. 2.4, the shapes of the two potentials are very similar. The inter-atomic distance at which U_vdW = 0 determines the sum of the van der Waals radii of the interacting atoms. Obviously, if the atoms are identical (dashed curve in the right-hand panel of Fig. 2.4), the van der Waals radius is half of this distance. There is another definition of the van der Waals radius, according to which the sum of the van der Waals radii of two interacting atoms is given by r_m. The difference between these definitions is that the former defines the van der Waals radii through the closest distance at which the interaction is not repulsive, whereas the latter uses the distance at which the interaction is most attractive.

2.4 Approximation for Polyatomic Systems

The parameters ε and r_m used in the potential functions given by Eqs. (2.25) and (2.26) are determined empirically for the interactions between two identical atoms. Consider a system of three atoms A, B, and C (Fig. 2.5). Let us temporarily assume that the three atoms are identical. The presence of atom C induces dipoles in both atoms A and B, which influences their interaction. The magnitude of this influence depends on the position of atom C with respect to A and B. The same is valid for the A-C and B-C pairs. In principle, the interactions between the three atoms can be calculated, and, with certain approximations, so can the interactions between a larger number of atoms. This, however, is not necessarily needed. The values of ε and r_m are extracted from experimental data, which already contain the influence of the environment, i.e. the average influence arising from the presence of other atoms in the vicinity of the interacting pair is included. Therefore, it is a satisfactory approximation to assume that van der Waals interactions in a polyatomic system are additive.
Figure 2.5 System of three different interacting atoms.
In general the atoms A, B, and C differ, and three sets of parameters ε and r_m are needed to describe this system. The number of parameters increases rapidly: for N different interacting atoms it is N(N − 1)/2. Given the variety of atom pairs occurring in proteins, the number of these parameters becomes very large and computationally inconvenient. A reasonable alternative is to use only the parameters determined for interactions between atoms of the same type. Such a set of parameters is used, for instance, in CHARMM, one of the most widely used programmes for molecular mechanics and molecular dynamics simulations (Table 2.2).

Table 2.2 Parameters for a few atom types used in CHARMM³. As in the CHARMM parameter files, ε is listed as a negative number; the well depth is |ε|.

Atom                                                      ε (kcal/mol)   r_m/2 (Å)
Carbon, part of carbonyl groups of the protein backbone   −0.1100        2.0000
Carbon, part of aromatic groups                           −0.0700        1.9924
Aliphatic carbon (−CH3)                                   −0.0800        2.0600
Hydrogen, part of −CH3                                    −0.0220        1.3200
Oxygen, part of carbonyl groups                           −0.1200        1.7000
Ammonium nitrogen                                         −0.2000        1.8500
The problem of evaluating the van der Waals interactions between atoms of different types is approached with the assumption that the parameters of the potential function for the interaction between atoms i and j have values between those determined for the interactions between two atoms of type i and between two atoms of type j. Based on this assumption we can employ the so-called Lorentz-Berthelot mixing rules:

$$r_{m,ij} = \frac{r_{m,i} + r_{m,j}}{2}, \qquad \varepsilon_{ij} = \sqrt{\varepsilon_i\varepsilon_j},$$

so that r_{m,ij} is simply the sum of the r_m/2 values of the two atom types listed in Table 2.2. In this case Eq. (2.26) has to be written as follows:

$$U_{ij} = \varepsilon_{ij}\left[\left(\frac{r_{m,ij}}{r}\right)^{12} - 2\left(\frac{r_{m,ij}}{r}\right)^6\right]. \qquad (2.27)$$
Two potential functions based on the parameters given in Table 2.2 are illustrated in the right hand side panel of Fig. 2.4. These are the potential functions for interactions between two carbon atoms and between nitrogen and carbon atoms.
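The mixing rules and Eq. (2.27) can be combined with Table 2.2 in a few lines. The sketch below is not CHARMM code; the dictionary keys are hypothetical labels for the rows of Table 2.2. Following the CHARMM convention, the tabulated ε values are negative, so the well depth is taken from their magnitudes, and r_m,ij is the sum of the two r_m/2 entries. The text does not say which carbon type is used in Fig. 2.4, so the aliphatic carbon below is an assumption.

```python
import math

# Rows of Table 2.2: (epsilon in kcal/mol, r_m/2 in angstrom).
TABLE_2_2 = {
    "C_carbonyl": (-0.1100, 2.0000),
    "C_aromatic": (-0.0700, 1.9924),
    "C_aliphatic": (-0.0800, 2.0600),
    "H_methyl": (-0.0220, 1.3200),
    "O_carbonyl": (-0.1200, 1.7000),
    "N_ammonium": (-0.2000, 1.8500),
}


def mixed_parameters(type_i: str, type_j: str) -> tuple[float, float]:
    """Lorentz-Berthelot: geometric mean of well depths, arithmetic mean of r_m."""
    eps_i, half_i = TABLE_2_2[type_i]
    eps_j, half_j = TABLE_2_2[type_j]
    eps_ij = math.sqrt(abs(eps_i) * abs(eps_j))
    r_m_ij = half_i + half_j  # (r_m,i + r_m,j)/2 = sum of the r_m/2 entries
    return eps_ij, r_m_ij


def pair_energy(r: float, type_i: str, type_j: str) -> float:
    """Eq. (2.27) in kcal/mol for a distance r in angstrom."""
    eps_ij, r_m_ij = mixed_parameters(type_i, type_j)
    s6 = (r_m_ij / r) ** 6
    return eps_ij * (s6 * s6 - 2.0 * s6)


if __name__ == "__main__":
    for pair in (("C_aliphatic", "C_aliphatic"), ("N_ammonium", "C_aliphatic")):
        eps_ij, r_m_ij = mixed_parameters(*pair)
        print(pair, f"eps_ij = {eps_ij:.4f}, r_m,ij = {r_m_ij:.2f}")
        for r in (3.5, r_m_ij, 6.0):
            print(f"  r = {r:.2f}  U = {pair_energy(r, *pair): .4f}")
```

Only six parameter pairs are stored, yet any of the 21 possible pairs of these atom types can be evaluated, which is the practical point of the mixing rules.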
References

1. Perrin DD (1970) Dissociation constants of organic bases in aqueous solution. In: Weast RC (ed.), Handbook of Chemistry and Physics, 51st edn. Cleveland, Ohio: Chemical Rubber Co.
2. Bergethon PR (1998) The Physical Basis of Biochemistry. New York: Springer-Verlag.
3. MacKerell AD, Bashford D, Bellott M, et al. (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B, 102: 3586-3616.
Chapter 3
Hydrogen Bonds
The hydrogen atom is the simplest atom, consisting of one proton and one electron, and can therefore form only one covalent bond. It is known, however, that a hydrogen atom can participate in the formation of a second bond, to an atom that has at least one lone electron pair. This bond is called a hydrogen bond. Hydrogen bonds are usually denoted as D-H···A, which indicates that the hydrogen atom is covalently bound to atom D and interacts favourably with atom A. Atom D is called the proton donor, whereas atom A is called the proton acceptor. The donor and the acceptor must be electronegative, i.e. hydrogen bonds occur when two electronegative atoms compete for the same hydrogen atom.

3.1 Nature of Hydrogen Bonds

Not all bonds of type X-H (X is an unspecified atom) can be considered as D-H. For instance, methane molecules, CH4, practically do not form hydrogen bonds, whereas water molecules do. The fact that some compounds containing an X-H group can form hydrogen bonds whilst others cannot suggests that hydrogen bonding is determined by the properties of the proton donor and the proton acceptor. Another feature of hydrogen bonds that also depends on the properties of the proton donor and the proton acceptor is that the distances between the atoms forming the triad D-H···A are shorter than expected for van der Waals interactions, and in some cases are even close to the corresponding covalent bond length.
3.1.1 Proton donors, electronegativity

Some atoms can attract one additional electron and in this way form negative ions. The energy related to this process has two components. The first component is the electron affinity, $\mathcal{E}$, defined as the difference between the ground-state energies of the neutral atom and the corresponding negative ion. The second component is the ionisation energy, $J$, of the atom donating the electron. By means of these two components one can introduce a quantity characterising the ability of the atoms in a given molecule to attract electrons. This quantity is the electronegativity. Let us consider two atoms, A and B. The energy needed to transfer an electron from atom B to atom A is $J_B - \mathcal{E}_A$. The energy needed to transfer an electron from atom A to atom B is $J_A - \mathcal{E}_B$. If $J_B - \mathcal{E}_A = J_A - \mathcal{E}_B$, the two atoms have equal electronegativity. If $J_B - \mathcal{E}_A < J_A - \mathcal{E}_B$, the transfer of the electron from atom B to atom A is energetically more favourable than the reverse transfer, and hence the electron shifts spontaneously from B towards A. In order to put the properties of atom A on one side of this inequality, we rewrite it as $J_A + \mathcal{E}_A > J_B + \mathcal{E}_B$. For atoms with equal electronegativity one can also write $J_A + \mathcal{E}_A = J_B + \mathcal{E}_B$. We see that atom A is more electronegative than atom B if the sum of the energies $J_A$ and $\mathcal{E}_A$ is greater than the corresponding sum for atom B. Thus, the sum $J_A + \mathcal{E}_A$ can be used as a measure of electronegativity. For this purpose it is customary to use the semi-sum introduced by Mulliken:

$$\chi = \frac{J + \mathcal{E}}{2}.$$

In Fig. 3.1 the electronegativities of some atoms are schematically compared. For two chemically bound atoms, the difference between their electronegativities shows not only the direction of the spontaneous electron shift, but also its magnitude. If two atoms with similar electronegativities form a chemical bond, the electron shift will be small. This is, for instance, the case for the chemical bond between the H and C atoms in the methyl groups of the aliphatic side chains in proteins.
If the two atoms differ substantially in their electronegativity, the large electron shift gives the chemical bond an ionic character (ionic bond). This is the case for the bonds between alkali metals and halogens, NaCl or KCl for instance.

Figure 3.1 A comparative scheme for the electronegativity, χ (in eV), of the atoms H, C, N, Na, O, F and Cl. The direction of the spontaneous electron transfer is indicated by an arrow. Electronegativity is commonly expressed in electron-volts: 1 eV = $1.602 \times 10^{-19}$ J, which corresponds to 23.06 kcal/mol.
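The electron-transfer argument above can be checked numerically. The sketch below computes the Mulliken semi-sum χ = (J + 𝓔)/2 and compares the two transfer energies $J_B - \mathcal{E}_A$ and $J_A - \mathcal{E}_B$ directly. The ionisation energies and electron affinities (in eV) are approximate literature values supplied here as assumptions; they are not taken from Fig. 3.1, and the absolute numbers need not match its scale, so only the ordering should be read from the output.

```python
from typing import Dict, Tuple

# Approximate first ionisation energies J and electron affinities E (eV).
# Illustrative literature values (assumption), not data from this text.
ATOMS: Dict[str, Tuple[float, float]] = {
    "Na": (5.139, 0.548),
    "C": (11.260, 1.262),
    "H": (13.598, 0.754),
    "O": (13.618, 1.461),
    "Cl": (12.968, 3.613),
    "F": (17.423, 3.401),
}


def mulliken(atom: str) -> float:
    """Mulliken electronegativity: the semi-sum (J + E) / 2, in eV."""
    j, e = ATOMS[atom]
    return (j + e) / 2


def transfer_cost(donor: str, acceptor: str) -> float:
    """Energy (eV) to move one electron from `donor` to `acceptor`: J_donor - E_acceptor."""
    return ATOMS[donor][0] - ATOMS[acceptor][1]


def electron_flows_to(a: str, b: str) -> str:
    """Return the atom that gains the electron, i.e. the cheaper transfer direction."""
    # J_B - E_A < J_A - E_B  is equivalent to  J_A + E_A > J_B + E_B
    return a if transfer_cost(b, a) < transfer_cost(a, b) else b


if __name__ == "__main__":
    for atom in sorted(ATOMS, key=mulliken):
        print(f"{atom:>2}: chi = {mulliken(atom):5.2f} eV")
    print("Na/Cl: electron moves to", electron_flows_to("Na", "Cl"))
    print("C/H:   electron moves to", electron_flows_to("C", "H"))
```

With these assumed values the ordering comes out as Na < C < H < O < Cl < F, and the Na/Cl pair shows the electron moving to chlorine, as expected for an ionic bond.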
Figure 3.2 Partial ionic character of covalent bonds as a function of the electronegativity difference, Δχ, of the participating atoms. Bonds involving hydrogen atoms that are most common in proteins are shown as solid circles for comparison.
The difference between the electronegativities of the interacting atoms, Δχ, turns out to be a convenient parameter for analysing the ionic character of chemical bonds. Let us use it to evaluate formally the ionic character of a few X-H bonds. For this purpose we can employ the relation proposed by Pauling:

$$P(\%) = 100\left(1 - e^{-(\Delta\chi/2)^2}\right),$$
where P(%) is the partial ionic character of the bond. As seen in Fig. 3.2, P(%) for the C-H bond is close to zero, meaning that the abstraction of the electron by the carbon atom is negligible (see also Fig. 3.1 for a comparison of the electronegativities). The partial ionic character of the N-H and O-H bonds, however, is significant: between 20 and 30%. Owing to the large difference in electronegativity between the partners in these bonds, electron density is shifted towards the atom with the higher electronegativity, i.e. towards the nitrogen or oxygen atom. In this way the charge of the nucleus of the hydrogen atom becomes partially uncompensated, and the bond becomes polarised. Because of this property, the hydrogen atoms participating in such bonds are often called polar hydrogen atoms. Correspondingly, in such a polarised bond the atom D carries a negative partial charge. It is worth noting that the partial charges on D and H do not necessarily compensate each other. For instance, the partial charge distribution in the hydroxyl group of threonine side chains in proteins is δ+(H) ≈ 0.4 p.u., whereas δ-(O) is between -0.6 and -0.7 p.u.* The variation of the partial charge values arises from the different approximations used for their determination. This problem is beyond the scope of our considerations. It is important for us to realise that the polarity of the D-H bond is not a result of the electronegativity difference between the atoms D and H alone, but depends also on the electronegativity of the atoms chemically bound to D. In the example above, the oxygen atom is bound to a carbon atom, which has a lower electronegativity (see Fig. 3.1). As we are interested in bonds involving hydrogen atoms, we stress the conclusion that polar hydrogen atoms carry a substantial positive partial charge, which promotes the interactions of the D-H group with negatively charged atoms.
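Pauling's relation is easy to evaluate. The minimal sketch below implements it directly, together with its inverse, which gives the electronegativity difference required for a given ionic character. No electronegativity values are assumed here; which Δχ applies to a particular bond depends on the electronegativity scale adopted.

```python
import math


def ionic_character(delta_chi: float) -> float:
    """Pauling's partial ionic character P(%) = 100 * (1 - exp(-(delta_chi / 2)**2))."""
    return 100.0 * (1.0 - math.exp(-((delta_chi / 2.0) ** 2)))


def delta_chi_for(percent: float) -> float:
    """Inverse relation: electronegativity difference giving `percent` ionic character."""
    if not 0.0 <= percent < 100.0:
        raise ValueError("percent must be in [0, 100)")
    return 2.0 * math.sqrt(-math.log(1.0 - percent / 100.0))


if __name__ == "__main__":
    for dchi in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
        print(f"delta_chi = {dchi:3.1f} -> P = {ionic_character(dchi):5.1f} %")
    for p in (20.0, 30.0):
        print(f"P = {p:4.1f} % -> delta_chi = {delta_chi_for(p):4.2f}")
```

Inverting the relation, an ionic character of 20 to 30% corresponds to Δχ of roughly 0.9 to 1.2.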
Because the nucleus of the hydrogen atom consists of one proton, the atoms D are called proton donors. For the same reason, hydrogen atoms participating in hydrogen bonds are often simply called protons. We will use the terms proton donor and proton acceptor throughout the text; however, we shall avoid using the term proton for the hydrogen atom.

*p.u. (proton unit) is equal to the electric charge of one proton: 1 p.u. = $1.602 \times 10^{-19}$ C.

3.1.2 Proton acceptors

Within a period of the periodic table, electronegativity increases with the number of electrons in the outer electron shell, a relationship that is explained by quantum chemistry. Another correlation is important for our considerations, namely that atoms such as N, O and S have lone electron pairs in their outer electron shells. For instance, the oxygen atom has six electrons in its outer L-shell. Two of them occupy the 2s orbital, whereas the other four occupy the 2p orbitals. In a water molecule these orbitals are hybridised so that all six electrons occupy sp3-orbitals.

Figure 3.3 (A) Electronic structure of the oxygen atom with hybridised outer shell electron orbitals. (B) Oxygen sp3-orbitals and the hydrogen atoms (1s-orbitals) in the water molecule. The quantity -δ is the partial (negative) charge caused by the lone electron pair. The positive partial charge, +δ, on the hydrogen atoms results from the abstraction of the hydrogen electron by the electronegative oxygen atom. The orientations of the D-H groups from other compounds are marked by arrows.
As seen in panel A of Fig. 3.3, the outer electron shell in the ground state has two lone electron pairs. According to the Pauli principle, the lone electron pairs do not form covalent bonds. They can, however, act as negatively charged loci which attract positively charged counterparts. Such counterparts can be molecules containing a D-H group, for instance a hydroxyl group (O-H), or just another water molecule. Owing to the electrostatic attraction, the deshielded hydrogen nucleus of the D-H group approaches the acceptor atom to a distance smaller than expected from van der Waals interactions, and a hydrogen bond is formed. In Fig. 3.3B the lone electron pairs attracting D-H groups to form hydrogen bonds are shown schematically. Aromatic compounds can also serve as proton acceptors. The principle of formation of such hydrogen bonds is in general the same: instead of a lone electron pair, in this case the hydrogen atom is attracted by the negative partial charge arising from the π-orbitals of the aromatic ring.

3.2 Geometry and Strength of Hydrogen Bonds

Usually the proton donors and the proton acceptors are chemically bound to other atoms. When a group of covalently bound atoms exhibits distinct properties, we will refer to it as a functional group. These properties can be of different character. For instance, a functional group can be a group of covalently bound atoms involved in a chemical reaction. In the context of our considerations, a functional group is also a group that can participate in hydrogen bonding. One such functional group is the carbonyl group, C=O, where the proton acceptor is the oxygen atom. Another is the ε-amino group of lysine, -NH3+, where each of the N-H bonds can be considered as D-H. The term functional group is needed because the geometry and the strength of hydrogen bonds are not defined solely by the atoms in the triad D-H···A. Since the atoms and the nature of the covalent bonds within the functional group influence the properties of the proton donor and acceptor, they also influence the characteristics of the hydrogen bonds.
3.2.1 Directionality

A fundamental feature of hydrogen bonds is that they cannot be formed for an arbitrary configuration of the participating atoms and molecules. The basic constraints determining the configuration of a hydrogen bond can easily be identified in the example given in Fig. 3.3. First, a hydrogen bond can be formed only if the proton donor group, D-H, approaches the acceptor from the side of a lone electron pair. Second, for a hydrogen bond to be formed, the deshielded hydrogen atom of the proton donor group should face the lone electron pair. The configuration of the triad D-H···A forming a hydrogen bond is illustrated in Fig. 3.4. The angle θ, determined by the atoms D, H and A, is called the hydrogen bond angle and is one of the parameters reflecting the orientation of the proton donor group towards the proton acceptor. Simple geometrical considerations suggest that the ideal configuration of a hydrogen bond is achieved when θ = 180°. This configuration of the three atoms D, H and A ensures the alignment of their partial charges corresponding to the most favourable electrostatic interactions.
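Given Cartesian coordinates of D, H and A (for example from a structure in which hydrogen positions are known), θ is simply the angle at the hydrogen atom between the vectors H→D and H→A. The helper below is a minimal sketch; the coordinates in its usage example are made up for illustration.

```python
import math
from typing import Sequence

Vec = Sequence[float]  # (x, y, z) in angstrom


def hbond_angle(d: Vec, h: Vec, a: Vec) -> float:
    """Angle D-H...A at the hydrogen atom, in degrees (180 means linear)."""
    hd = [d[i] - h[i] for i in range(3)]  # vector H -> D
    ha = [a[i] - h[i] for i in range(3)]  # vector H -> A
    dot = sum(p * q for p, q in zip(hd, ha))
    norm = math.hypot(*hd) * math.hypot(*ha)
    cos_t = max(-1.0, min(1.0, dot / norm))  # clamp rounding errors into [-1, 1]
    return math.degrees(math.acos(cos_t))


if __name__ == "__main__":
    # Hypothetical coordinates: D at the origin, D-H = 1.0 A, H...A = 1.9 A.
    d, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    bend = math.radians(40.0)  # A displaced 40 degrees off the D-H axis
    a = (1.0 + 1.9 * math.cos(bend), 1.9 * math.sin(bend), 0.0)
    print(f"theta = {hbond_angle(d, h, a):.1f} deg")
```

For the bent arrangement in the example the function returns θ = 140°; placing A on the extension of the D-H axis gives 180°.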
Figure 3.4 Hydrogen bond geometry.
The value of θ may, however, differ significantly from 180°. This is illustrated in Fig. 3.5A, where data from crystal structures of various organic compounds are summarised. The most frequently observed values of θ are between 160° and 170°. However, the number of hydrogen bonds with θ < 150° is not at all negligible. This observation suggests that the electrostatic interactions tending to keep the atoms D, H and A co-linear are balanced by other forces. Such forces arise from the interactions of the atoms constituting the hydrogen bond with the molecules in their immediate surroundings.
Figure 3.5 Distribution of the hydrogen bond angle, θ (from 180° to 100°), in crystal structures¹. (A) Inter- and intramolecular hydrogen bonds. (B) Intramolecular hydrogen bonds. (Data reproduced with permission from the International Union of Crystallography.)
Deviation from the ideal geometry is most often observed when both the proton donor and the proton acceptor belong to the same molecule. These hydrogen bonds are called intramolecular hydrogen bonds. The distribution of the angle θ in 152 different types of intramolecular hydrogen bonds is given in Fig. 3.5B. As seen, the maximum of this distribution is between 120° and 140°. This significant deviation from the ideal geometry is mainly caused by restrictions on the conformations the different molecules can adopt. Since the majority of hydrogen bonds in proteins are intramolecular, we shall consider an example of a hydrogen bond between a carbonyl and an imide group in a tripeptide (Fig. 3.6). The tripeptide is bent so that the carbonyl group of the first peptide forms a hydrogen bond with the imide group of the third peptide. The angle θ in this structure is 149°. In principle, the ideal geometry of the hydrogen bond can be approached by rotation around the chemical bonds connecting the peptide groups. However, the ideal geometry is not always the energetically most favourable one. Imagine a rotation around the dihedral angles
Figure 3.6 Hydrogen bond between carbonyl and imide groups in a tripeptide ((5-turn).
The comparison of the two plots in Fig. 3.5 shows that intermolecular hydrogen bonds tend to have a linear configuration, whereas no such tendency is observed for intramolecular hydrogen bonds. However, even at the large deviations of the angle θ typical of intramolecular hydrogen bonds, the distance of the hydrogen atom from the line connecting the atoms A and D remains relatively small. At θ = 140° this distance is less than 0.5 Å, which on average is about 15% of the distance between A and D. This estimate allows us, as a first approximation, to consider the geometry of hydrogen bonds to be close to linear.

Let us return to the example given in Fig. 3.6. We would like to see whether there is a direction towards the proton acceptor along which the hydrogen atom is preferentially located. We already know that, for a hydrogen bond to be formed, the hydrogen atom has to approach a lone electron pair of the proton acceptor. The location of a lone electron pair is determined by the corresponding electron orbital. Hence, the preferred direction of the hydrogen bond is determined by the orientation of the lone electron pair orbital of the proton acceptor. Two parameters are needed to describe this orientation. In order to define them, we introduce a coordinate system as shown in Fig. 3.7.
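The estimate that the hydrogen atom stays within about 0.5 Å of the D-A line follows from elementary trigonometry: the law of cosines gives the D···A distance from the D-H and H···A lengths and θ, and twice the triangle area divided by that distance gives the height of H above the D-A line. The bond lengths used here (D-H = 1.0 Å, H···A = 1.9 Å) are typical values assumed for illustration, not data from this chapter.

```python
import math
from typing import Tuple


def offset_from_da_line(r_dh: float, r_ha: float, theta_deg: float) -> Tuple[float, float]:
    """Return (r_DA, h): the D...A distance and the distance of H from the D-A line.

    r_dh is the covalent D-H length, r_ha the H...A length and theta_deg the
    D-H...A angle at the hydrogen atom (degrees).
    """
    t = math.radians(theta_deg)
    # Law of cosines in the triangle D-H-A (angle theta at H).
    r_da = math.sqrt(r_dh ** 2 + r_ha ** 2 - 2.0 * r_dh * r_ha * math.cos(t))
    # Twice the triangle area, divided by the base D-A, gives the height of H.
    height = r_dh * r_ha * math.sin(t) / r_da
    return r_da, height


if __name__ == "__main__":
    for theta in (180.0, 160.0, 140.0, 120.0):
        r_da, h = offset_from_da_line(1.0, 1.9, theta)  # assumed bond lengths
        print(f"theta = {theta:5.1f}  D...A = {r_da:.2f} A  "
              f"offset = {h:.2f} A  ({100 * h / r_da:.0f} %)")
```

With these assumptions the offset at θ = 140° is about 0.45 Å, roughly 16% of a D···A distance of about 2.74 Å, consistent with the estimate in the text.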
Figure 3.7 (A) Schematic drawing of the carbonyl oxygen sp2 orbitals holding lone electron pairs. The orbitals are in the xy-plane (the z-axis is perpendicular to the page). (B) Perspective projection of a carbonyl-imide hydrogen bond and the angles $\varphi_H$ and $\chi_H$. In comparison with panel A, the coordinate system is rotated around the y-axis by 90°.
When we consider a C=O···H-N bond, it is appropriate to choose the coordinate system according to the orientation of the sp2 orbitals of the carbonyl oxygen. We place the origin of the coordinate system on the carbonyl oxygen atom and orient it so that the sp2 orbitals lie in the xy-plane. We also choose the y-axis to be co-linear with the C=O bond. In this way the y-axis is the bisector of the angle between the lone electron pair orbital axes. Because the sp2 orbitals holding the lone pairs lie in the xy-plane and are symmetric with respect to the y-axis, it is enough to consider only the octant x, y, z > 0, i.e. the octant where 0 < $\chi_H$, $\varphi_H$ < 90°. The position of the hydrogen atom in this coordinate system is defined by the angles $\chi_H$ and $\varphi_H$ (and by the distance O···H, which for the moment is not of interest). These angles are the parameters describing the directionality of the hydrogen bond.
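The angles $\varphi_H$ and $\chi_H$ can be computed from atomic coordinates by building the coordinate system just described. The sketch below makes two assumptions that go beyond the text. First, it reads $\chi_H$ as the polar angle measured from the z-axis and $\varphi_H$ as the azimuth in the xy-plane measured from the x-axis; this is the reading that gives $\varphi_H$ = 30°, $\chi_H$ = 90° for a hydrogen atom on a lone-pair axis at 120° to the C=O bond. Second, it fixes the sp2 plane with a third atom bonded to the carbonyl carbon, here called `ref` (for example the Cα atom). The y-axis points from C to O, i.e. away from the carbon.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _unit(v: Vec) -> List[float]:
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v]


def carbonyl_angles(c: Vec, o: Vec, ref: Vec, h: Vec) -> Tuple[float, float]:
    """Return (phi_H, chi_H) in degrees for a hydrogen atom near a carbonyl oxygen.

    c, o : carbonyl carbon and oxygen
    ref  : a third atom bonded to c (assumption); with c and o it fixes the sp2 plane
    h    : the hydrogen atom of the donor group
    """
    y = _unit(_sub(o, c))                      # along C=O, pointing away from C
    v_ref = _sub(ref, c)
    # Gram-Schmidt: remove the y-component so that x lies in the sp2 plane.
    x = _unit([v_ref[i] - _dot(v_ref, y) * y[i] for i in range(3)])
    z = _cross(x, y)                           # normal to the sp2 (xy) plane
    v = _sub(h, o)                             # oxygen -> hydrogen
    # The lone pairs are symmetric under x -> -x and z -> -z, so fold those signs.
    hx, hy, hz = abs(_dot(v, x)), _dot(v, y), abs(_dot(v, z))
    r = math.sqrt(hx * hx + hy * hy + hz * hz)
    phi_h = math.degrees(math.atan2(hy, hx))   # azimuth in the xy-plane, from +x
    chi_h = math.degrees(math.acos(hz / r))    # polar angle from +z
    return phi_h, chi_h


if __name__ == "__main__":
    # Hypothetical geometry: O at the origin, C on the -y axis, H on a lone-pair axis.
    c, o, ref = (0.0, -1.23, 0.0), (0.0, 0.0, 0.0), (1.3, -2.0, 0.0)
    t = math.radians(30.0)
    h = (1.9 * math.cos(t), 1.9 * math.sin(t), 0.0)
    print("phi_H = %.1f, chi_H = %.1f" % carbonyl_angles(c, o, ref, h))
```

Folding the signs of x and z implements the symmetry argument of the text; y is left signed because a hydrogen atom on the carbon side of the oxygen would not interact with the lone pairs in the way described.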
Figure 3.8 Examples of different geometries of the C=O···H-N hydrogen bond. (A) Ideal geometry ($\varphi_H$ = 30°, $\chi_H$ = 90°, θ = 180°). (B) The hydrogen atom interacts equally with both lone electron pairs ($\varphi_H$ = 90°). (C) Each of the lone electron pairs is involved in a hydrogen bond ($\varphi_H$ < 30°). The atom X is not necessarily nitrogen. (D) The N-H group appears to be a third hydrogen bond partner of the carbonyl oxygen ($\varphi_H$ = 0°).
If the geometry of the sp2 orbitals is unperturbed and the hydrogen atom lies on the axis of a lone pair orbital, $\varphi_H$ = 30° and $\chi_H$ = 90° (note that these values depend on the choice of the coordinate system). If, in addition, N-H is co-axial with the orbital axis, the three parameters $\varphi_H$ = 30°, $\chi_H$ = 90° and θ = 180° define the ideal geometry of the hydrogen bond. Some typical configurations, including the ideal one, are shown in Fig. 3.8. Panel B of this figure shows a hydrogen bond in which the hydrogen atom interacts with both electron pairs of the oxygen atom. This configuration differs from the one we called ideal because the N-H bond is not co-linear with either of the lone electron pair orbital axes. However, it is easy to see that this arrangement of the charges also corresponds to the most favourable electrostatic interactions. Hence, hydrogen bonds with $\varphi_H$ = 90° are as optimal as those with $\varphi_H$ = 30°. A reduction of
position of the hydrogen atom in this interval is mainly determined by the environment.
Figure 3.9 Distribution of the angles $\chi_H$ and $\varphi_H$ (0° to 90°) in C=O···H-N hydrogen bonds observed in crystal structures. (Data reproduced with permission from the International Union of Crystallography.)
In principle, the influence of the environment on a particular hydrogen bond can be evaluated and the angles $\chi_H$ and $\varphi_H$ predicted. This is a subject for quantum chemical computations. Instead, we have restricted our considerations to a statistical analysis of a large body of experimental data, which allowed us to draw an important conclusion, namely that there is a preferred geometry of hydrogen bonds and a preferred direction of the axis defined by the atoms A and D, i.e. hydrogen bonds are characterised by directionality. It follows that hydrogen bonds are preferably formed at a limited number of mutual orientations of the functional groups. If the proton donor and the proton acceptor belong to the same molecule, this preference translates into a limited number of conformations of the compound.
3.2.2 Hydrogen bond length

Another parameter characterising a given hydrogen bond is the distance between the hydrogen atom and the proton acceptor, r(A···H). We will refer to this distance as the hydrogen bond length. The correlation of r
Figure 3.10 Correlation between θ and the distance r between the hydrogen and oxygen atoms in inter- and intramolecular hydrogen bonds¹ (θ from 140° to 170° plotted against r(H···O) from 1.6 to 2.4 Å). The rectangles indicate the average value of θ within intervals of 0.1 Å. (Data reproduced with permission from the International Union of Crystallography.)
Table 3.1 Mean hydrogen bond length, r(A···H), in Å. For some hydrogen bonds, r(A···H) values are given as intervals illustrating the variability of this parameter. The middle column, "Row", is auxiliary and is used in the text to refer to particular hydrogen bonds.
Acceptor /c=o
Donor H— N^
ffA-m
Row 1
Acceptor
1.96-2.00
Donor H—N^
ffA-m
1.93
O 2 /c=o
H—N—H H
•r
\
T
1.99
H—N—H H
1.89
3
R
1.87-1.94
R
1.79-1.97
1.84
o 4
\ ^=0
•r
T
1.80
5
HON
\
fr-N—R
1.64-1.70
1.40
6 1.64-1.66 (double bond)
HC
\/° c 1
H20
H,0
HQx/o
c 1
H(\
1
s-
H-OH 2
1.64
H20
H—N^
HzO
H-OH
1.77-2.00
1.40-1.52
H-OH
1.76-1.96
1.30
H-F
1.13-1.70
7 1.64-1.70 (single bond) 8
The mean values of r(A···H) for the hydrogen bonds most commonly found in proteins, as well as for some involving atoms of higher electronegativity, are listed in Table 3.1. The hydrogen bond length is mainly determined by the electronegativity of the proton donor and proton acceptor atoms. Let us take fluorine as an example. We already know (see Fig. 3.1) that fluorine has a higher electronegativity than oxygen and nitrogen. Accordingly, the distance between a fluorine atom acting as proton acceptor and the hydrogen atom (row 9) is the shortest of all the hydrogen bond lengths listed in the table. The correlation between electronegativity and hydrogen bond length is also seen in the difference between hydrogen bonds formed by oxygen atoms and those in which one of the partners is a nitrogen atom: the latter are characterised by a larger average hydrogen bond length. As a rule, the higher the electronegativity of the atoms, the shorter the hydrogen bond. The value of r(A···H) also depends on the nature of the functional groups themselves. If, for instance, the functional group containing the proton donor has a deficiency of electron density, the hydrogen bond becomes shorter. The hydrogen bond is also shortened if the proton acceptor functional group has an excess of electron density. Compare the hydrogen bonds in row 5 of Table 3.1. The hydrogen bond between carbonyl and carboxyl oxygen atoms has a length between 1.64 and 1.70 Å. If the proton acceptor group is replaced by a carboxylate (the deprotonated form of the carboxyl group), the hydrogen bond length is reduced to 1.40 Å. There is, however, an important exception worth mentioning.
The hydrogen bond between water as acceptor and a carboxyl group as donor (Table 3.1, row 8) has a shorter distance r(O···H) than the bond in which the water molecule binds as proton donor to a carboxylate, despite the fact that the carboxylate group is negatively charged. One can say that water is a "good" proton acceptor and a "bad" proton donor. Hydrogen bonds in which the hydrogen atom interacts with one proton acceptor are called two-centre hydrogen bonds. Examples are the hydrogen bonds given on the left-hand side of Table 3.1, rows 1 to 5. The C=O···H-N hydrogen bonds shown in Figs. 3.8A and 3.8B are also two-centre hydrogen bonds. The carboxylate group can serve as an example of another type of hydrogen bond, namely the three-centre (bifurcated) hydrogen bond (Fig. 3.11). In this case the hydrogen atom interacts with two proton acceptors. The opposite configuration, in which one proton acceptor binds two hydrogen atoms, is also often observed. An illustration of such a hydrogen bond is given in Fig. 7.12 (Chapter 7). In fact, we have already considered the physical origin of bifurcated hydrogen bonds when discussing the geometry of the C=O···H-N bond (Fig. 3.8). Bifurcated hydrogen bonds are described by two distances r(A···H), which are not necessarily equal. They are somewhat longer than the distance in a two-centre hydrogen bond.

Figure 3.11 Hydrogen bond between carboxylate and amino groups. (A) Three-centre (bifurcated) hydrogen bond. (B) Two-centre hydrogen bond.
3.2.3 Hydrogen bond strength

It is common practice to regard hydrogen bonds with a formation energy exceeding 15 kcal/mol in magnitude as strong bonds. Moderate, or normal, hydrogen bonds are those with energies between -15 and -4 kcal/mol, whereas hydrogen bonds with a formation energy smaller than 4 kcal/mol in magnitude are treated as weak bonds. Strong hydrogen bonds are formed by atoms of extreme electronegativity in which the proton donor group is characterised by a deficiency of electron density and/or the proton acceptor partner has an excess of electron density. The deficiency of electron density of the proton donor leads to additional deshielding of the hydrogen nucleus and hence to an increase in the polarity of the D-H bond. The excess of electron density of the proton acceptor increases its negative partial charge and in this way facilitates electrostatic interactions with the hydrogen atom. Examples of strong hydrogen bonds are the bonds formed by fluorine atoms. The bond energy of F-···H-F is
about -39 kcal/mol. The hydrogen bond angle, θ, observed in different crystal structures, is between 170° and 180°, whereas the hydrogen bond length r(F···H) is between 1.13 and 1.70 Å. If we compare these values with the relation given in Fig. 3.10, we notice that they fall at the extreme left-hand side of the graph. As a rule, strong hydrogen bonds are characterised by a geometry close to the ideal one and a hydrogen bond length of less than 1.7 Å. In the literature, hydrogen bonds with r(A···H) < 1.4 Å are called very strong hydrogen bonds. Water molecules form strong (or very strong) hydrogen bonds when hydrated protons are involved. One of the characteristics of aqueous solutions is pH, the negative logarithm of the hydrogen ion concentration. In fact, the relevant species is the oxonium (also called hydroxonium or hydronium) ion, H3O+. For instance, the dissolution of hydrochloric acid in water lowers the pH, in other words increases the concentration of oxonium ions: HCl + H2O → Cl- + H3O+. The oxonium ion is formed by direct binding of the hydrogen ion to one of the lone electron pairs of the oxygen atom of the water molecule. This bond, called a coordinate bond, is strongly electrostatic. Oxonium ions form water clusters, which can be expressed as H3O+·nH2O, where n is the number of water molecules participating in the cluster. The smallest one, H5O2+, is illustrated in Fig. 3.12.

Figure 3.12 Hydrogen bond between an oxonium ion and water (the distances r(O···H) are indicated).
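The energy classes defined at the start of this section translate directly into a small helper. The thresholds are the ones quoted in the text (15 and 4 kcal/mol in magnitude); how values falling exactly on a boundary are assigned is not specified there, so the choice made below (boundary values count as moderate) is an assumption. The conversion to electron-volts uses 1 eV ≙ 23.06 kcal/mol, as in the caption of Fig. 3.1.

```python
KCAL_PER_EV = 23.06  # kcal/mol corresponding to 1 eV per particle


def hbond_class(energy_kcal_mol: float) -> str:
    """Classify a hydrogen bond by the magnitude of its formation energy (kcal/mol)."""
    magnitude = abs(energy_kcal_mol)
    if magnitude > 15.0:
        return "strong"
    if magnitude >= 4.0:  # boundary values treated as moderate (assumption)
        return "moderate"
    return "weak"


def kcal_to_ev(energy_kcal_mol: float) -> float:
    return energy_kcal_mol / KCAL_PER_EV


if __name__ == "__main__":
    examples = {
        "F-...H-F": -39.0,      # quoted in the text
        "H5O2+": 36.0,          # quoted in the text
        "HC#C-H...OH2": -2.19,  # acetylene-water, quoted later in this section
        "hypothetical": -6.0,   # made-up value for the moderate class
    }
    for name, e in examples.items():
        print(f"{name:14s} {e:7.2f} kcal/mol = {kcal_to_ev(e):6.3f} eV -> {hbond_class(e)}")
```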
The energy of the hydrogen bond between the oxonium ion and the water molecule in H5O2+ is about 36 kcal/mol. This energy gradually increases with increasing n. The hydrogen bond length, r(O···H), varies between 1.22 and 1.34 Å and is comparable with the length of the covalent bond, r(O-H), which is between 1.10 and 1.22 Å. The similar sizes of the two
bonds suggest that the acceptor and the donor can exchange their roles or, in other words, that the proton (the hydrogen ion) can migrate. Moderate hydrogen bonds are formed between neutral proton donor and acceptor groups. An exception is the ammonium ion, NH4+. As seen from Table 3.1, the hydrogen bonds of NH4+, RNH3+ and R2NH2+ with carboxylate have lengths very similar to those formed with carbonyl groups. The >N+-H group also forms moderate hydrogen bonds. Moderate, also called normal, hydrogen bonds are characterised by a more varied geometry than strong hydrogen bonds. The angle θ in normal hydrogen bonds usually adopts values between 140° and 180°, and in some cases even below this range. Deviation from the linear configuration is most pronounced in intramolecular hydrogen bonds, as illustrated in Fig. 3.5B. The hydrogen bond length varies between 1.70 and 2.00 Å. The importance of moderate hydrogen bonds stems from the fact that they are typical of proteins and water. Nowadays there are experimental methods, such as neutron diffraction, by means of which hydrogen atoms can be located in protein crystal structures. However, most of the available structural data still do not contain information about the hydrogen atoms, which makes the determination of hydrogen bond geometry in proteins more complicated. If no information about the positions of the hydrogen atoms is available, another parameter is used to describe hydrogen bonds, namely the distance between the proton donor and the proton acceptor. Some values of this parameter are listed in Table 3.2.

Table 3.2 Average distances between proton donor and proton acceptor in hydrogen bonds most relevant for proteins¹.

Bond        Distance, Å
O-H···O     2.70
O-H···O-    2.63
O-H···N     2.88
N+-H···O    2.93
N-H···O     3.04
N-H···N     3.10
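When hydrogen positions are unknown, a first screen for possible hydrogen bonds is a cutoff on the donor-acceptor distance. The sketch below uses a cutoff of 3.5 Å, an assumed value chosen because it lies above all the averages in Table 3.2; it is not a criterion given in this text. Because the D-H···A angle cannot be checked without hydrogen atoms, the pairs found this way are only candidates. The atom labels and coordinates in the example are hypothetical.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (label, (x, y, z) in angstrom)

# Average donor...acceptor distances (angstrom) from Table 3.2.
TABLE_3_2 = {
    "O-H...O": 2.70, "O-H...O-": 2.63, "O-H...N": 2.88,
    "N+-H...O": 2.93, "N-H...O": 3.04, "N-H...N": 3.10,
}


def candidate_hbonds(donors: List[Atom], acceptors: List[Atom],
                     cutoff: float = 3.5) -> List[Tuple[str, str, float]]:
    """Return (donor, acceptor, distance) for all pairs not farther apart than `cutoff`.

    This is a distance screen only: without hydrogen atoms the D-H...A angle
    cannot be checked, so some reported pairs may not be hydrogen bonded.
    """
    hits = []
    for d_label, d_xyz in donors:
        for a_label, a_xyz in acceptors:
            if d_label == a_label:  # same atom listed as both donor and acceptor
                continue
            r = math.dist(d_xyz, a_xyz)
            if r <= cutoff:
                hits.append((d_label, a_label, round(r, 2)))
    return hits


if __name__ == "__main__":
    print("largest Table 3.2 distance:", max(TABLE_3_2.values()), "A")
    donors = [("N1", (0.0, 0.0, 0.0))]
    acceptors = [("O2", (2.9, 0.0, 0.0)), ("O3", (0.0, 4.2, 0.0))]
    print(candidate_hbonds(donors, acceptors))
```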
The data given in Table 3.2 demonstrate one of the basic features of hydrogen bonds, namely that the distances between the atoms linked by hydrogen bonds are shorter than those expected for van der Waals interactions. If we compare the distances given in Table 3.2 with the parameters given in Table 2.2, we see that the values of $r_m$, the distance between two atoms corresponding to the most favourable van der Waals interaction energy, are considerably larger than the distances between two hydrogen-bonded atoms. If we solve Eq. (2.27) for $r_{ij}$ at $U_{ij} = 0$ using the parameters in Table 2.2, for two oxygen atoms we obtain $r_{U=0}$ = 3.03 Å. Between two nitrogen atoms this distance is 3.30 Å. These values are also larger than those given in Table 3.2, meaning that the distances between the proton donor and the proton acceptor are even shorter than those tolerated by van der Waals interactions. The shortening of the proton donor-acceptor distance is more pronounced in strong hydrogen bonds. The origin of this basic feature of hydrogen bonds has been considered in Section 3.1.

Weak hydrogen bonds occur if the proton donor has an electronegativity comparable to, yet lower than, that of the hydrogen atom. This is the case for the C-H bond, where the carbon atom has a slightly lower electronegativity than the hydrogen atom. From the analysis of the data illustrated in Figs. 3.1 and 3.2 we have concluded that the deshielding of the hydrogen nucleus is negligible in the C-H bond. This is why groups such as -CH3 and -CH2- are considered non-polar. On the other hand, we have mentioned that the magnitude of electron abstraction depends also on the chemical nature of the compound containing the X-H bond. Indeed, there is experimental evidence that the proton donor ability of R3C-H can be increased by appropriate substitution of R. Deshielding of the hydrogen atom in the C-H bond also takes place if the carbon atom has a multiple covalent bond. For instance, it has been shown that acetylene forms a hydrogen bond with water. According to experimental measurements and theoretical calculations, the hydrogen bond length in the pair HC≡C-H···OH2 is between 2.19 and 2.23 Å and the formation energy is -2.19 kcal/mol.² Even methane forms weak hydrogen bonds. These bonds are characterised by a marginal energy of between 0.2 and 0.8 kcal/mol and a distance between the donor and the acceptor of about 3.5 Å.
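For a Lennard-Jones function with its minimum at $r_m$, the energy crosses zero at $r = 2^{-1/6} r_m \approx 0.891\,r_m$, whatever parameterisation is used. The sketch below uses this to compare the van der Waals zero-crossing distances with the hydrogen bond distances of Table 3.2. The $r_m$ values (3.40 Å for O···O and 3.70 Å for N···N) are assumptions: Table 2.2 is not reproduced here, and these numbers were chosen because they reproduce the 3.03 Å and 3.30 Å quoted in the text.

```python
def lennard_jones(r: float, eps: float, r_m: float) -> float:
    """LJ energy written with well depth eps and minimum position r_m."""
    s = (r_m / r) ** 6
    return eps * (s * s - 2.0 * s)


def zero_crossing(r_m: float) -> float:
    """Distance at which the LJ energy is zero: r_m * 2**(-1/6)."""
    return r_m * 2.0 ** (-1.0 / 6.0)


R_M = {"O...O": 3.40, "N...N": 3.70}    # assumed r_m values (angstrom)
HBOND = {"O...O": 2.70, "N...N": 3.10}  # O-H...O and N-H...N from Table 3.2

if __name__ == "__main__":
    for pair, r_m in R_M.items():
        r0 = zero_crossing(r_m)
        print(f"{pair}: U = 0 at {r0:.2f} A; hydrogen bond distance {HBOND[pair]:.2f} A")
```

For both pairs the hydrogen bond distance lies inside the zero crossing, i.e. in the repulsive region of the van der Waals function.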
As seen, weak hydrogen bonds are characterised by a low energy, which is comparable with that of van der Waals interactions. Accordingly, the distances between the atoms involved in weak hydrogen bonds do not differ from those typical of van der Waals contacts. The main difference between weak hydrogen bonds and van der Waals interactions is the directionality of the former.

3.2.4 Hydrogen bond potential functions

The geometry and the energy of formation of hydrogen bonds can in principle be calculated by means of quantum mechanics. This rigorous approach can be applied to calculations in the gas phase. Using some approximations, quantum mechanical calculations can be carried out for more complicated systems, including proteins. Such calculations, however, cannot cover the whole protein molecule. As for van der Waals interactions, the energetics of hydrogen bonds in proteins is most often evaluated by means of empirical potential functions. The approach to formulating hydrogen bond potential functions does not differ from that for van der Waals interactions. This is not surprising, taking into account the fact that the dispersion forces and the overlap repulsion are the dominating forces. The essential difference between the two types of potential functions is that hydrogen bond potential functions should account for the rearrangement of the electron clouds of the atoms participating in the hydrogen bond. This effect can be reflected simply by a shortening of the interatomic distances. A function that gives a minimum at an interatomic distance shorter than that of the Lennard-Jones potential [see Eq. (2.26)] is

$$U_{HB} = C r^{-12} + D r^{-10},$$

where C and D are parameters depending on the atom pair (D < 0, so that the second term is attractive) and r is the hydrogen bond length. In order to meet the above requirement, the term corresponding to the dispersion forces is modified to have an exponent of -10. This function is spherically symmetric, i.e. it depends on r only, and does not take into account the geometry of the hydrogen bonds.
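The 12-10 function can be parameterised by the depth ε and position $r_0$ of its minimum. Writing it here as $U = C r^{-12} - D r^{-10}$ with C, D > 0 (the sign of the attractive term made explicit), setting dU/dr = 0 gives $r_0^2 = 6C/(5D)$, and the choice $C = 5\varepsilon r_0^{12}$, $D = 6\varepsilon r_0^{10}$ yields $U(r_0) = -\varepsilon$. The values ε = 5 kcal/mol and $r_0$ = 1.9 Å in the example are hypothetical, picked only to lie within the moderate hydrogen bond range described in Section 3.2.3.

```python
import math


def coeffs_12_10(eps: float, r0: float):
    """C and D of U = C r^-12 - D r^-10 with a minimum U(r0) = -eps."""
    return 5.0 * eps * r0 ** 12, 6.0 * eps * r0 ** 10


def u_12_10(r: float, c: float, d: float) -> float:
    return c / r ** 12 - d / r ** 10


def minimum_position(c: float, d: float) -> float:
    # dU/dr = -12 C r^-13 + 10 D r^-11 = 0  ->  r^2 = 6 C / (5 D)
    return math.sqrt(6.0 * c / (5.0 * d))


if __name__ == "__main__":
    c, d = coeffs_12_10(eps=5.0, r0=1.9)  # hypothetical parameters
    r_min = minimum_position(c, d)
    print(f"minimum at {r_min:.3f} A, U = {u_12_10(r_min, c, d):.3f} kcal/mol")
```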
One possible way to take into account the geometry of the triad D-H···A is to introduce the hydrogen bond angle, θ, into the potential function. Such a function is
$$U_{HB} = \cos\theta\,(A r^{-12} - B r^{-6}) + (1 - \cos\theta)\,(A' r^{-12} - B' r^{-6}), \tag{3.1}$$
where the parameters A and B correspond to the parameters describing the interactions between atoms i and j in the Lennard-Jones potential:
$$A = \varepsilon r_0^{12}, \qquad B = 2\varepsilon r_0^{6}.$$
The parameters A′ and B′ are defined so as to reflect the shortening of the interatomic distances and the corresponding energy changes between atoms that can form a hydrogen bond. This function requires four parameters for each atom. For atoms that do not belong to functional groups able to form hydrogen bonds, one can set A = A′ and B = B′; it is easy to see that in this case Eq. (3.1) reduces to Eq. (2.26). The dependence of Eq. (3.1) on θ is illustrated in Fig. 3.13. At θ = 180°, i.e. at the ideal geometry of the hydrogen bond, cos θ = −1 and the potential function is determined by the second term on the right-hand side of Eq. (3.1), taken with a weight of 2 and reduced by the Lennard-Jones term. As θ decreases, the contribution of the hydrogen bond term diminishes, and at θ = 0° only the Lennard-Jones term remains.

[Figure: $U_{HB}$ as a function of θ from 0° to 180°; the curve runs from $U_{LJ}$ at θ = 0° to $U_{HB}(\max)$ at θ = 180°.]
Figure 3.13 Dependence on θ of the function defined by Eq. (3.1). The interval of θ for the most frequently observed hydrogen bond geometry is marked by arrows.
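The limiting cases of Eq. (3.1) described above can be checked numerically. The Python sketch below implements Eq. (3.1) directly; the parameter values, written in the well-depth form $A = \varepsilon r_0^{12}$, $B = 2\varepsilon r_0^{6}$ for both the ordinary and the hydrogen-bond pair, are hypothetical and serve only to exercise the formula.

```python
import math

def lj(r, A, B):
    """12-6 term A*r**-12 - B*r**-6, as used inside Eq. (3.1)."""
    return A * r**-12 - B * r**-6

def u_hb(r, theta_deg, A, B, A_hb, B_hb):
    """Eq. (3.1): cos(theta)*(A r^-12 - B r^-6) + (1 - cos(theta))*(A' r^-12 - B' r^-6)."""
    c = math.cos(math.radians(theta_deg))
    return c * lj(r, A, B) + (1.0 - c) * lj(r, A_hb, B_hb)

# Hypothetical parameters (kcal/mol, Angstrom): ordinary contact with
# eps = 0.1, r0 = 3.4; hydrogen-bond pair with eps' = 2.0, r0' = 2.9.
A, B = 0.1 * 3.4**12, 2 * 0.1 * 3.4**6
A_hb, B_hb = 2.0 * 2.9**12, 2 * 2.0 * 2.9**6

r = 2.9
for theta in (180, 150, 120, 90, 0):
    weight = 1.0 - math.cos(math.radians(theta))  # weight of the hydrogen-bond term
    print(f"theta = {theta:3d} deg  weight = {weight:.2f}  U_HB = {u_hb(r, theta, A, B, A_hb, B_hb):6.2f}")
```

The weight (1 − cos θ) of the hydrogen-bond term falls from 2 at 180° to 1 at 90° and to 0 at 0°, which is the diminishing contribution described in the text.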
Functions of the type of Eq. (3.1) have some disadvantages. The variable θ is a geometrical parameter which is a result of the interactions between the atoms; in this context, its introduction is to a certain extent artificial. Modern computational approaches based on empirical functions do not use θ as a parameter. Instead, the parameterisation of the proton donors, proton acceptors and polar hydrogen atoms is improved, allowing hydrogen bonds to be described by functions of the type of Eq. (2.26). After a term describing electrostatic interactions is added (together with other terms, which will be briefly considered in Chapter 7), the empirical potential functions give a sufficiently accurate account of the factors responsible for the variation of hydrogen bond geometry.

3.3 Hydrogen Bonds in Proteins

The conclusion of Pauling and Mirsky about the role of hydrogen bonds in the structural organisation of proteins is worth quoting, because it was made more than two decades before the first three-dimensional protein structure became available: "The molecule consists of one polypeptide chain which continues without interruption throughout the molecule (or, in certain cases, of two or more such chains); this chain is folded into a uniquely defined configuration, in which it is held by hydrogen bonds between the peptide nitrogen and oxygen atoms and also between the free amino and carboxyl groups of the diamino and dicarboxyl amino acid residues".³

3.3.1 Secondary structure elements

The hydrogen bonds participating in the stabilisation of secondary structure elements are those between the peptide N-H and O=C groups. We have already mentioned in Chapter 1 that the atoms in the peptide group are co-planar. Hence, in the trans conformation, the most common conformation in proteins, the proton donor and the proton acceptor point in diametrically opposite directions (Fig. 3.14).
[Figure: peptide group with the donor N-H and the acceptor O=C marked.]
Figure 3.14 Peptide group: the proton donor and proton acceptor are indicated by arrows.
This peculiarity of the peptide groups, together with the directionality of hydrogen bonds, leads to a certain organisation when several peptide groups are linked by hydrogen bonds. In proteins, the intramolecular hydrogen bonds between peptide groups result in a limited number of backbone conformations, which we call secondary structure elements. The average values of the parameters of the hydrogen bonds forming the secondary structure elements are given in Table 3.3. Among them we recognise the α-helix and the β-sheets, which have already been considered in Chapter 1 (see Fig. 1.3). Another secondary structure element is the 3₁₀ helix, which is observed in short segments of protein backbones. Often, 3₁₀ helices act as turns at which the polypeptide chain changes its direction; these fragments are also known as β-turns. An example of a β-turn is given in Fig. 3.6.

Table 3.3 Average geometry parameters of the N-H···O=C hydrogen bonds forming secondary structure elements in proteins. θ is the N-H···O angle.

N-H···O=C                  H···O (Å)   θ (°)   N···O (Å)
α-helix, main chain        2.06        155     2.99
α-helix, N-terminus        2.25        140     -
α-helix, C-terminus        2.26        152     -
β-sheet, parallel          1.97        161     2.92
β-sheet, antiparallel      1.96        160     2.91
3₁₀ helix                  2.17        153     3.09
β-turn                     2.13        154     3.06
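Because the columns of Table 3.3 describe the same N-H···O triangle, the N···O distance can be estimated from the H···O length and the angle θ with the law of cosines, $d_{N\cdots O}^2 = d_{NH}^2 + d_{H\cdots O}^2 - 2\,d_{NH}\,d_{H\cdots O}\cos\theta$. The Python sketch below assumes an N-H bond length of 1.0 Å (a value not given in the table) and treats θ as the N-H···O angle; since the table entries are averages over many structures, only approximate agreement should be expected.

```python
import math

# Assumption (not from Table 3.3): N-H bond length of 1.0 Angstrom
N_H = 1.0

def n_o_distance(h_o, theta_deg, n_h=N_H):
    """N...O distance from the H...O length and the N-H...O angle (law of cosines)."""
    theta = math.radians(theta_deg)
    return math.sqrt(n_h**2 + h_o**2 - 2.0 * n_h * h_o * math.cos(theta))

# (H...O in Angstrom, angle in degrees) as listed in Table 3.3
table_3_3 = {
    "alpha-helix, main chain": (2.06, 155),
    "alpha-helix, N-terminus": (2.25, 140),
    "alpha-helix, C-terminus": (2.26, 152),
    "beta-sheet, parallel": (1.97, 161),
    "beta-sheet, antiparallel": (1.96, 160),
    "3-10 helix": (2.17, 153),
    "beta-turn": (2.13, 154),
}

for name, (h_o, theta) in table_3_3.items():
    print(f"{name:26s} N...O = {n_o_distance(h_o, theta):.2f} A")
```

With this assumption the estimates reproduce the tabulated N···O distances to within about 0.02 Å.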
In a 3₁₀ helix the O=C group of residue i is hydrogen bonded to the N-H group of residue i + 3, whereas in the α-helix the hydrogen bonds link residues i and i + 4. The geometrical parameters of these hydrogen bonds do not differ essentially. The hydrogen bonds formed at the C-termini of α-helices have a geometry very similar to that of 3₁₀ helices. Accordingly, the conformation of the C-terminal segments of α-helices often corresponds to a 3₁₀ helix.
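The i, i + 3 and i, i + 4 patterns can be made concrete by enumerating the backbone hydrogen bonds of an ideal helical segment. The Python sketch below is a hypothetical helper that ignores end effects, proline and other irregularities; it pairs the O=C group of residue i with the N-H group of residue i + step.

```python
def helix_hbonds(n_residues, step):
    """List (acceptor O=C residue, donor N-H residue) pairs for an ideal helix.

    step = 3 gives the 3-10 helix pattern, step = 4 the alpha-helix pattern.
    Residues are numbered from 1.
    """
    return [(i, i + step) for i in range(1, n_residues - step + 1)]

n = 12
for name, step in (("3-10 helix", 3), ("alpha-helix", 4)):
    pairs = helix_hbonds(n, step)
    free_nh = list(range(1, step + 1))          # N-terminal N-H groups without a helical partner
    free_co = list(range(n - step + 1, n + 1))  # C-terminal O=C groups without a helical partner
    print(f"{name}: {len(pairs)} bonds, first {pairs[0]}, unpaired N-H {free_nh}, unpaired O=C {free_co}")
```

A 12-residue segment gives 9 backbone hydrogen bonds as a 3₁₀ helix and 8 as an α-helix.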
Figure 3.15 Cartoon presentation of the membrane protein porin from Rhodobacter capsulatus. The polypeptide chain fragments forming a β-barrel are shown as arrows.
The hydrogen bonds forming β-sheets have a geometry somewhat closer to the ideal one. If a β-sheet consists of more than two polypeptide strands, the resulting structure is called a β-pleated sheet (see Fig. 1.3) or a β-barrel (Fig. 3.15). β-pleated sheets and β-barrels are characterised by a continuous chain of hydrogen bonds running laterally across the polypeptide strands, as shown in Fig. 3.16. The hydrogen bonds forming such structures are strengthened by a cooperative effect known as resonance-assisted bonding. Owing to the high polarisability of the lone pair electron density, the formation of a hydrogen bond between two peptide groups (Fig. 3.16, bond 1) is accompanied by a certain increase of the O=C bond length. Consequently, the electron density of the nitrogen atom shifts so that the C-N bond shortens and acquires more double-bond character. This makes the N-H group in bond 2 a better donor.

[Figure: chain of peptide groups linked by N-H···O=C hydrogen bonds, with bonds 1 and 2 marked.]
Figure 3.16 Hydrogen bond crosslink in β-pleated sheets and β-barrels.
It is interesting to note that in β-barrels some of these hydrogen bond chains may close to form cycles, a factor stabilising the protein structure. The protein shown in Fig. 3.15 is an example of a β-barrel with a number of hydrogen bond chains closed in cycles. The hydrogen bonds in helical structures do not show this cooperative effect, because the hydrogen bond chain is interrupted by the N-Cα and Cα-C bonds.

3.3.2 Hydrogen bonds involving side chains

A number of amino acid side chains have functional groups which can serve as proton donors or proton acceptors. These are the polar and charged amino acids, which can easily be recognised in Table 1.1. In Table 3.4 these amino acids are sorted according to their function as proton acceptor or proton donor. This separation is, however, rather formal: the majority of the functional groups can act as both proton donors and proton acceptors. Pure proton donors are the lysine ε-amino groups and the arginine guanidinium groups. The deprotonated form of the carboxyl group (the carboxylate) can be considered a pure proton acceptor. Each of the oxygen atoms in a carboxylate can form two hydrogen bonds, at the syn and anti positions (Fig. 3.17). Occupation of the anti positions is relatively rarely observed in hydrogen bonds between side chains; however, all possible positions can participate in hydrogen bonding with water molecules.
[Figure: carboxylate group with the syn and anti acceptor positions on each oxygen marked.]
Figure 3.17 Positions of the proton acceptor sites in carboxylates.

Table 3.4 Proton acceptors and proton donors in protein side chains.

Acceptor                                       Donor
Aspartate, glutamate, C-terminus (-COO⁻)       Lysine, N-terminus (-NH₃⁺)
Asparagine, glutamine (C=O)                    Aspartic acid, glutamic acid (-COOH)
Threonine, serine (-OH)                        Asparagine, glutamine (-NH₂)
Tyrosine, deprotonated (-O⁻)                   Arginine (guanidinium)
Histidine, deprotonated (ring N)               Threonine, serine (-OH)
Aromatic rings: tyrosine, tryptophan           Tyrosine, protonated (-OH)
                                               Histidine, protonated (ring N-H)
                                               Tryptophan (ring N-H)
If the carboxyl group is protonated, it can also act as a proton donor. Its protonation state changes with pH, so its proton donor properties are pH dependent. This switch between proton donor and proton acceptor function applies to all ionisable side chains. The possible impact of this property of the titratable groups on protein function will be considered in Chapter 7. The amino acids classified as aromatic in Table 1.3 can participate in hydrogen bonding via the π-electron clouds of their aromatic rings; in such a hydrogen bond these groups are proton acceptors. This type of bond is not often observed in proteins. However, such bonds can act as additional stabilisers of the spatial structure of proteins. For instance, the hydrogen bond between a tyrosine and an asparagine side chain illustrated in Fig. 3.18 stabilises one turn of an α-helix in the protein transcription enhancer factor 2.
Figure 3.18 Hydrogen bond between asparagine and the aromatic ring of tyrosine in transcription enhancer factor 2.
3.3.3 Salt bridges

Among the variety of hydrogen bonds, those between charged functional groups are of particular interest. This type of hydrogen bond is called a salt bridge. Most often, salt bridges occur between the deprotonated carboxyl groups of aspartic and glutamic acids, including the C-termini (proton acceptors), and the α-amino groups of the N-termini, the ε-amino groups of lysines, the guanidinium groups of arginines and the imidazole groups of histidines in their protonated form (proton donors). As we have pointed out, the protonation states of these groups depend on pH, so the formation of salt bridges also depends on pH. The prediction and the effect of this dependence will be considered in Chapters 6 and 7. The deprotonated carboxyl groups have uncompensated electron density, giving a net electric charge of −1 p.u. The proton donors mentioned above have one hydrogen atom whose electric charge is not compensated, i.e. they are characterised by a deficiency of electron density. Following the definition of a strong hydrogen bond, one might suppose that salt bridges are strong hydrogen bonds. On the other hand, comparing the geometry of strong hydrogen bonds with that of bonds formed by carboxylates and ammonium ions (Table 3.1), we see that the latter rather belong to the moderate (normal) hydrogen bonds. Thus, it is important to note that salt bridges are not strong hydrogen bonds. It is useful here to introduce some terminology often used in the literature. Figure 3.18 shows an example of side chain-side chain hydrogen bonding that links amino acid residues separated along the polypeptide backbone by three peptide groups. If the proton donor and the proton acceptor of a hydrogen bond belong to amino acid residues separated by only a few peptide bonds along the polypeptide chain, the bond is considered a local hydrogen bond. Obviously, nothing prevents the formation of hydrogen bonds between side chains that are distant along the polypeptide chain, and hydrogen bonds connecting different secondary structure elements in proteins are very common. Such bonds, between amino acid residues separated along the backbone by a large number of peptide units, are often called long-range hydrogen bonds. An example of a long-range hydrogen bond is given in Fig. 7.12. This term should not be confused with long-range interactions.
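The distinction between local and long-range hydrogen bonds is a matter of separation along the sequence, not of spatial distance. The Python sketch below labels a side chain hydrogen bond accordingly; the cut-off of four residues standing for "a few peptide bonds" is a hypothetical choice (the text gives no number), and the class, field names and residue numbers are illustrative.

```python
from dataclasses import dataclass

# Hypothetical cut-off for "a few peptide bonds"; the text gives no number.
LOCAL_MAX_SEPARATION = 4

@dataclass
class SideChainHBond:
    donor_residue: int     # sequence position of the proton donor
    acceptor_residue: int  # sequence position of the proton acceptor
    both_charged: bool     # True if donor and acceptor both carry a net charge

    def sequence_separation(self) -> int:
        """Separation along the chain, in residues (not a spatial distance)."""
        return abs(self.donor_residue - self.acceptor_residue)

    def label(self) -> str:
        reach = "local" if self.sequence_separation() <= LOCAL_MAX_SEPARATION else "long-range"
        kind = "salt bridge" if self.both_charged else "hydrogen bond"
        return f"{reach} {kind}"

# Invented residue numbers, for illustration only
print(SideChainHBond(20, 23, both_charged=False).label())  # local hydrogen bond
print(SideChainHBond(12, 87, both_charged=True).label())   # long-range salt bridge
```

Under this convention the tyrosine-asparagine pair of Fig. 3.18, three residues apart, would be labelled local.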
It clearly follows from the discussion in the previous sections of this chapter that hydrogen bonds are short-range interactions. Thus, it is worth remembering that the term "long-range hydrogen bond" refers to the "distance" along the polypeptide chain. Salt bridges are hydrogen bonds which link charged partners, and in this respect they differ from the other hydrogen bonds in proteins. Owing to the electrostatic interactions between the net charges of the proton donor and the proton acceptor, salt bridges include long-range electrostatic interactions. Therefore, salt bridges are often regarded as ion pairs and are believed to play a role in the electrostatic stabilisation of the native structure of proteins. Again, it is worth noting that salt bridges are ion pairs, but not all ion pairs are salt bridges. Ion pairs in which the distance between the proton donor and the proton acceptor exceeds 4 Å lose directionality; hence they cannot be considered hydrogen bonds.

3.3.4 Hydrogen bond networks

The analysis of a large number of three-dimensional protein structures has shown that functional groups tend to saturate the hydrogen bonding capacity of their atoms fully. Together with the fact that the majority of these groups can act as proton donors and proton acceptors simultaneously, this trend implies that the formation of hydrogen bond networks by side chains must be a common feature of proteins. We have already analysed one type of hydrogen bond network, namely the chains of hydrogen bonds linking the polypeptide main chains in β-barrels. Among the variety of hydrogen bond networks, those involving salt bridges are of particular interest. Because the proton donor and proton acceptor groups have net charges of +1 and −1 p.u., respectively, these networks are often called ion clusters. This name emphasises only the ionic character of salt bridge networks; we should keep in mind that hydrogen bond properties, such as directionality, have a dominating role. Salt bridges often link secondary structure elements, fixing in this way their mutual orientation, i.e. salt bridges are expected to stabilise the three-dimensional structure of proteins. The same is valid for salt bridge networks. Figure 3.19 shows a salt bridge network connecting the two subunits of the protein disulfide oxidoreductase from Pyrococcus furiosus. The network consists of ten functional groups. The total number of bonds is 14, six of which connect the two subunits.
These hydrogen bonds are indicated by green dotted lines in the figure. The remaining hydrogen bonds are also relevant, because they keep the interacting groups in the right positions and orientations, thus ensuring an energetically favourable, hence stabilising, configuration of the partners. In this context, salt bridge networks are highly cooperative. Indeed, a number of experimental observations based on site-directed mutagenesis of the protein considered in our example show that the removal of one of the supporting side chains (those that do not form hydrogen bonds between the subunits) as a rule disturbs the balance within the network and leads to its disintegration.

Figure 3.19 (A) Salt bridge network connecting the two subunits of disulphide oxidoreductase from Pyrococcus furiosus.⁵ The hydrogen bonds connecting the subunits are shown in green. (B) Cartoon presentation of the two subunits of disulphide oxidoreductase. The individual subunits are shown in different colours. The region of the salt bridge network is indicated.
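The cooperativity argument can be phrased in graph terms: functional groups are nodes, hydrogen bonds are edges, and removing a supporting group may split the network. The Python sketch below builds such a graph from a list of donor-acceptor distances, keeping only pairs within the 4 Å limit given above for salt bridges, counts the bonds between subunits and tests connectivity after a deletion. The five contacts are an invented toy example, not the Pyrococcus furiosus network of Fig. 3.19, and graph connectivity is only a crude structural proxy for the energetic balance discussed in the text.

```python
from collections import defaultdict

SALT_BRIDGE_MAX = 4.0  # Angstrom; beyond this an ion pair is not treated as a hydrogen bond

def build_network(contacts):
    """contacts: iterable of (group_a, group_b, distance). Keep only hydrogen-bond-range pairs."""
    graph = defaultdict(set)
    for a, b, d in contacts:
        if d <= SALT_BRIDGE_MAX:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def inter_subunit_bonds(graph):
    """Count edges joining groups of different subunits (labels look like 'A:Glu1')."""
    return sum(1 for a in graph for b in graph[a]
               if a < b and a.split(":")[0] != b.split(":")[0])

def components(graph, removed=()):
    """Connected components (sets of groups) after deleting the groups in `removed`."""
    seen, comps = set(removed), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

# Invented toy network: group labels and distances are hypothetical
contacts = [
    ("A:Glu1", "B:Arg1", 2.9),  # inter-subunit salt bridge
    ("A:Glu1", "A:Arg5", 3.0),  # supporting intra-subunit bond
    ("A:Arg5", "A:Asp9", 2.8),  # supporting intra-subunit bond
    ("A:Asp9", "B:Lys2", 3.1),  # inter-subunit salt bridge
    ("A:Lys3", "B:Asp3", 5.2),  # ion pair beyond 4 Angstrom: not counted
]
net = build_network(contacts)
print("inter-subunit bonds:", inter_subunit_bonds(net))
print("intact network:", len(components(net)), "component(s)")
print("without A:Arg5:", len(components(net, removed={"A:Arg5"})), "component(s)")
```

Removing the single supporting group A:Arg5 splits the toy network in two, even though both inter-subunit salt bridges are still geometrically possible.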
Large salt bridge networks are most often observed in proteins from hyperthermophilic organisms. The natural environment of these organisms is characterised by temperatures between 80 and about 100°C. Obviously, proteins from hyperthermophilic organisms must maintain their biologically active three-dimensional structures at these extremely high temperatures, whereas their counterparts from mesophilic organisms (plants and animals, for instance) as a rule denature at temperatures around 60°C. The observed increase in the number and size of salt bridge networks in proteins from hyperthermophilic organisms becomes even more remarkable when we take into account that the three-dimensional structures of their mesophilic counterparts do not differ essentially. One can conclude that salt bridge networks are a factor stabilising protein structure at high temperatures. We will consider this hypothesis in Chapter 8.
Figure 3.20 Water channel in alcohol dehydrogenase from Drosophila lebanonensis.⁶ The carbon atoms of the co-factor NAD⁺ are shown in light green. The electron densities of the water molecules (contoured at 1σ) are shown in blue. The drawing was kindly provided by Dr. J. Benach, Columbia University, Dept. of Biological Sciences.
Hydrogen bond networks occur not only between the functional groups of proteins. Water molecules compete successfully for hydrogen bonding partners and often participate in hydrogen bond networks, linking two or more side chain functional groups. In concave regions of the protein surface, clusters involving hydrogen-bonded water molecules and protein polar groups can be formed. The fact that such clusters are observed in protein crystal structures suggests that the positions of the water molecules are energetically favourable and that they are occupied not only in the crystalline state but also when proteins are in solution. This is most likely true for water clusters that form channels penetrating deep into the protein. Such a water channel is illustrated in Fig. 3.20. It connects the active site of alcohol dehydrogenase (upper part of the figure, including the residues Tyr151 and Lys155 and the co-factor NAD⁺) with the bulk solution (bottom part). The hydrogen bond network contains nine water molecules which occupy a cleft and form hydrogen bonds with each other and with the polar groups lining the cleft. From a structural point of view, this network connects different structural elements of the molecule and in this context plays a stabilising role. It has also been speculated that it may be involved in the catalytic function of the protein by ensuring access of the active site to the solvent.

3.4 Hydrogen Bonds and Protein Stability

In the previous section we mentioned several times that the formation of hydrogen bonds and hydrogen bond networks has a stabilising impact on protein structure. The direct evaluation of the energetics of an individual hydrogen bond in a protein molecule is not a straightforward task, because forming or breaking a hydrogen bond is accompanied by changes in the interactions between other atoms that do not participate in the hydrogen bond under consideration.
3.4.1 Hydrogen bonds within the polypeptide chain, role in folding

We have already seen that the formation of secondary structure elements is correlated with the formation of well-defined patterns of hydrogen bonds. The question arises: is the formation of hydrogen bonds between peptide groups a driving force for the folding of the polypeptide chain and its stabilisation? To answer this question, one needs to show that the energy of the hydrogen bond between N-H and O=C groups is more favourable in a folded than in an unfolded protein. This difficult problem can be approached by investigating the hydrogen bonding of a model compound which is maximally similar to the peptide group. We should also find appropriate environments (solvents) which are maximally similar to the environments of the peptide groups in folded and unfolded proteins. This set-up represents a primitive model of these two states of a protein molecule. A very instructive example of the evaluation of the contribution of hydrogen bonds to the stabilisation of proteins is given in the book by Kozo Hamaguchi, "The Protein Molecule".⁷ It is based on experimental measurements on methylacetamide, a compound very similar to the peptide group (Fig. 3.21).

[Structure: CH₃-NH-CO-CH₃]
Figure 3.21 Structural formula of methylacetamide.
Assume that in the unfolded state the peptide groups of the protein are fully hydrated. This means that the N-H and O=C groups are free to form hydrogen bonds with each other as well as with the surrounding water molecules. This situation is simulated by an aqueous solution of methylacetamide. The free energy of formation of a hydrogen bond between two methylacetamide molecules in water is $\Delta G_{HB,w} = +3.1$ kcal/mol. The positive value of $\Delta G_{HB,w}$ shows that the formation of a hydrogen bond between two methylacetamide molecules is unfavourable. It also shows that water molecules compete successfully for hydrogen bonding with the methylacetamide molecules. In the folded state of the protein molecule, the peptide groups are usually inaccessible to the solvent. In this way, the competition of water molecules for hydrogen bonding to the N-H and O=C groups is eliminated or essentially reduced. Such a situation can be simulated by replacing water with a non-polar solvent, for instance tetrachloromethane (carbon tetrachloride, CCl₄). Experimental measurements of methylacetamide hydrogen bonding in such a solvent give $\Delta G_{HB,\text{non-polar}} = -2.4$ kcal/mol. This result shows that a non-polar environment favours the formation of a hydrogen bond between the N-H and O=C groups. In terms of our model, the folding of a protein molecule can be regarded as a change of the environment of the peptide groups from fully hydrated to anhydrous. Hence, we are interested in the behaviour of the hydrogen bond N-H···O=C upon this change. The free energy change of formation of the hydrogen bond due to the change of its environment, $\Delta G_{fold}$, can be estimated by means of the thermodynamic cycle shown in Fig. 3.22 and the relation (see Appendix A and Fig. A.1)
$$\Delta G_{HB,w} + \Delta G_{fold} + (-\Delta G_{HB,\text{non-polar}}) + \Delta G_{tr} = 0,$$
where $\Delta G_{tr}$ is the free energy of transfer of methylacetamide from tetrachloromethane to water. In the above expression the term $\Delta G_{HB,\text{non-polar}}$ is taken with a minus sign because we have defined it as the free energy of formation of the hydrogen bond, whereas the free energy of dissociation participates in the thermodynamic cycle. The value of $\Delta G_{fold}$ calculated in this way is 0.62 kcal/mol. This value does not change essentially when tetrachloromethane is replaced by another non-polar solvent. It follows that the stability of the hydrogen bond N-H···O=C does not depend on the polarity of the solvent. Thus, according to this model, the stability of a hydrogen bond between two peptide groups is not essentially influenced by the change of environment. This result suggests a negative answer to the question posed at the beginning of this section, namely that these hydrogen bonds do not contribute to the spontaneous folding of the protein molecule.

In the light of the observation that secondary structure always arises with the formation of hydrogen bonds between peptide groups, such a result might seem, to a certain extent, surprising. It might be a consequence of certain shortcomings in the concept of the model. Indeed, the model used to evaluate $\Delta G_{fold}$ is not an exact match of the environment of the peptide groups in the protein molecule. The experimental data used to calculate $\Delta G_{fold}$ refer to dilute solutions, whereas in the polypeptide chain the peptide groups are forced to be close to each other. This situation resembles a concentrated solution rather than a dilute one. Also, possible cooperative effects upon formation of hydrogen bonds in proteins are excluded.
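[Figure: thermodynamic cycle. Free N-H and O=C in water are linked to N-H···O=C in water by $\Delta G_{HB,w}$; N-H···O=C in water is linked to N-H···O=C in CCl₄ by $\Delta G_{fold}$; free N-H and O=C in CCl₄ are linked to N-H···O=C in CCl₄ by $\Delta G_{HB,\text{non-polar}}$; the free molecules in CCl₄ and in water are linked by $\Delta G_{tr}$.]
Figure 3.22 Thermodynamic cycle for calculation of the free energy change, $\Delta G_{fold}$, of N-H···O=C due to the change of the environment.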
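Because the free energy changes around the closed cycle of Fig. 3.22 sum to zero, any one term can be obtained from the other three. The text does not quote $\Delta G_{tr}$; the Python sketch below (a hypothetical helper, energies in kcal/mol) back-calculates the transfer term implied by the quoted values $\Delta G_{HB,w} = +3.1$, $\Delta G_{HB,\text{non-polar}} = -2.4$ and $\Delta G_{fold} = 0.62$ kcal/mol, and then recovers $\Delta G_{fold}$ from it as a consistency check.

```python
def close_cycle(dg_hb_w, dg_hb_nonpolar, dg_fold=None, dg_tr=None):
    """Solve dG_HB,w + dG_fold - dG_HB,non-polar + dG_tr = 0 for the one missing term.

    Exactly one of dg_fold and dg_tr must be None. All values in kcal/mol.
    """
    if (dg_fold is None) == (dg_tr is None):
        raise ValueError("leave exactly one of dg_fold and dg_tr unspecified")
    if dg_fold is None:
        return -(dg_hb_w - dg_hb_nonpolar + dg_tr)
    return -(dg_hb_w + dg_fold - dg_hb_nonpolar)

dg_tr_implied = close_cycle(3.1, -2.4, dg_fold=0.62)
print(f"implied transfer term: {dg_tr_implied:.2f} kcal/mol")
print(f"recovered dG_fold: {close_cycle(3.1, -2.4, dg_tr=dg_tr_implied):.2f} kcal/mol")
```

The implied transfer term is about −6.12 kcal/mol; its negative sign means that, within this cycle, moving the free molecules from CCl₄ into water is favourable.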
3.4.2 Hydrogen bonds involving side chains, role in stability

In the previous section we considered an approach based on a simplified model of the native and denatured states of a protein molecule, and noticed that this model may not match the characteristics of the protein molecule precisely enough. Here we will approach the problem in another way. A hydrogen bond stabilises the protein structure if its removal reduces the absolute value of the free energy of stabilisation of the protein. Thus, if we remove one of the functional groups, and with it the hydrogen bond in which this group participates, we will be able to evaluate the contribution of that hydrogen bond to the stability of the protein molecule. For the purposes of this approach, site-directed mutagenesis is the most appropriate method. Since this method is far beyond the scope of our considerations, we will only be interested in its end result: one amino acid residue is replaced by another at a predetermined position in the polypeptide chain. In this way, the target amino acid can be replaced by an appropriate one that avoids misleading side effects. Such an effect could be, for instance, a conformational change of the polypeptide chain due to the different volumes of the original amino acid residue and its substitute. Substitutions within the pairs tyrosine/phenylalanine and threonine/valine are the most appropriate for analysing the effect of hydrogen bond removal. In both pairs, formally only the functional groups are substituted: the tyrosine -OH group becomes a hydrogen atom in phenylalanine, while the threonine -OH group is replaced by a -CH₃ group in valine. Based on the above approach, one can estimate the contribution of a hydrogen bond by the formula
$$\Delta G_{HB} = \Delta G_{Mutant} - \Delta G_{wild\ type}. \tag{3.2}$$
The free energies of the original protein (wild type), $\Delta G_{wild\ type}$, and of the modified one (the mutant), $\Delta G_{Mutant}$, are obtained experimentally and define $\Delta G_{HB}$ for given conditions, such as temperature and pH. As will be discussed in Chapter 8, the sign of $\Delta G_{wild\ type}$ or $\Delta G_{Mutant}$ depends on the choice of the reference state. If this is the native structure, positive values of these free energies mean that the native structure is stable. As written, Formula (3.2) gives the change of the free energy due to the removal of a hydrogen bond (if it is removed in the mutant). The same formula can be applied to evaluate the free energy change due to any mutation. To distinguish formally the general case of a mutation from one eliminating a hydrogen bond, we will temporarily use the symbol $\Delta G_{\Delta M}$, which designates the change of free energy caused by an arbitrary mutation. Some values of $\Delta G_{HB}$ and $\Delta G_{\Delta M}$ obtained by Nick Pace and his co-workers in a series of comprehensive studies⁸⁻¹⁰ are listed in Table 3.5. These values were obtained by mutating tyrosine residues to phenylalanine and threonine residues to valine at different positions in two proteins, ribonuclease Sa and ribonuclease Sa3. The data given in Table 3.5 are obtained with the native state as the reference state, i.e. according to Eq. (3.2). Therefore, negative values of $\Delta G_{HB}$ mean a stabilising contribution of the corresponding hydrogen bond. Based on the results listed in this table, we can answer the question posed at the beginning of this section: hydrogen bonds stabilise the native structure of proteins. On average, a hydrogen bond formed by a tyrosine residue stabilises the native structure by about 2 kcal/mol. Smaller, but also significant, is the stabilisation contributed by the hydrogen-bonded functional group of threonine.
Table 3.5 Change of stability, $\Delta G_{HB}$ and $\Delta G_{\Delta M} = \Delta G_{Mutant} - \Delta G_{wild\ type}$ (kcal/mol), of ribonuclease Sa and ribonuclease Sa3 due to mutation of tyrosine to phenylalanine and threonine to valine. Notation such as Tyr51Phe means mutation of the tyrosine at position 51 in the polypeptide chain to phenylalanine.

Hydrogen bond          Mutation   ΔG_HB  |  Mutation   ΔG_ΔM
Ribonuclease Sa
-OH···⁻OOC-Glu         Tyr51Phe   -2.3   |  Tyr30Phe    0.4
-OH···O-Pro            Tyr52Phe   -3.6   |  Tyr49Phe   -0.2
-OH···⁻OOC-Glu         Tyr80Phe   -1.5   |  Tyr55Phe   -0.6
-OH···⁻OOC-Glu,        Tyr86Phe   -0.3   |  Tyr81Phe   -1.2
-HO···⁺HNε-Arg
-HO···OH-Thr           Thr18Val   -1.4   |  Thr5Val     0.0
-OH···⁻OOC-Asp         Thr56Val   -1.9   |  Thr16Val    0.3
-OH···O=C-Gly          Thr67Val   -0.0   |  Thr59Val   -1.7
-HO···OH-Thr           Thr82Val   -1.7   |  Thr72Val   -0.2
Ribonuclease Sa3
-OH···⁻OOC-Glu         Tyr54Phe   -2.6   |  Tyr11Phe   -0.6
-OH···O-Pro            Tyr55Phe   -2.1   |  Tyr33Phe    0.5
-OH···⁻OOC-Glu         Tyr83Phe   -1.5   |  Tyr58Phe   -0.7
                                         |  Tyr84Phe   -1.0
                                         |  Tyr89Phe    0.0
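Eq. (3.2) and the averages quoted in the text can be reproduced from Table 3.5 with a few lines of Python. The sketch below is illustrative: `delta_g_hb` is a hypothetical helper implementing Eq. (3.2), the two stabilities passed to it in the example are invented, and the Tyr89Phe entry of Table 3.5 is read as 0.0 kcal/mol.

```python
from statistics import mean

def delta_g_hb(dg_mutant, dg_wild_type):
    """Eq. (3.2). With the native state as reference, a negative result means
    that the hydrogen bond removed by the mutation was stabilising."""
    return dg_mutant - dg_wild_type

# Invented example: stability drops from 5.0 to 3.0 kcal/mol on mutation
print(delta_g_hb(3.0, 5.0))

# Values in kcal/mol transcribed from Table 3.5
hb_tyr = [-2.3, -3.6, -1.5, -0.3, -2.6, -2.1, -1.5]     # Tyr -> Phe, hydrogen-bonded tyrosines
hb_thr = [-1.4, -1.9, -0.0, -1.7]                      # Thr -> Val, hydrogen-bonded threonines
other = [0.4, -0.2, -0.6, -1.2, 0.0, 0.3, -1.7, -0.2,  # ribonuclease Sa, other positions
         -0.6, 0.5, -0.7, -1.0, 0.0]                   # ribonuclease Sa3, other positions

print(f"mean dG_HB, tyrosine:  {mean(hb_tyr):.2f} kcal/mol")
print(f"mean dG_HB, threonine: {mean(hb_thr):.2f} kcal/mol")
print(f"mean dG_dM, other:     {mean(other):.2f} kcal/mol")
```

The three means come out at about −2.0, −1.25 and −0.4 kcal/mol, matching the averages stated in the text.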
The data given in Table 3.5 also illustrate that an a priori quantitative prediction of the contribution of individual hydrogen bonds to protein stability is not a straightforward task. As seen from the table, their contributions differ essentially. For instance, the mutation of Thr67 to valine (Thr67Val) in ribonuclease Sa has no effect on the stability of the protein. The effect of the mutation Tyr86Phe is also negligible, although this tyrosine residue forms hydrogen bonds with two partners. We have already mentioned that mutations may cause structural changes which are not necessarily related to the elimination of a hydrogen bond. Thus, one can suppose that $\Delta G_{HB}$ contains contributions other than that of the hydrogen bond. The magnitude of the non-hydrogen bond contribution depends on the environment of the mutated site. This can be seen by inspecting the values of $\Delta G_{\Delta M}$ resulting from mutations of amino acid residues not participating in internal hydrogen bonds. On average, they are smaller in magnitude than those of $\Delta G_{HB}$ ($\Delta G_{\Delta M} \approx -0.4$ kcal/mol). However, some individual values can be substantial and comparable with $\Delta G_{HB}$, for example the change in the stability of ribonuclease Sa due to the mutation Tyr81Phe, $\Delta G_{\Delta M} = -1.2$ kcal/mol, or that due to Tyr84Phe in ribonuclease Sa3, $\Delta G_{\Delta M} = -1.0$ kcal/mol. These results illustrate once again that hydrogen bond properties, including their energetics and contribution to protein stability, result from the balance between the interactions within the hydrogen bond and the interactions of the atoms composing the hydrogen bond with their environment.

References
1. Bergethon PR, (1998) The Physical Basis of Biochemistry. New York: Springer-Verlag.
2. Turi L and Dannenberg JJ, (1993) Molecular-orbital studies of C-H···O H-bonded complexes. J. Phys. Chem., 97: 7899-7909.
3. Pauling L and Mirsky AE, (1936) On the structure of native, denatured, and coagulated proteins. Proc. Natl. Acad. Sci. U. S. A., 22: 439-447.
4. Jeffrey GA, (1997) An Introduction to Hydrogen Bonding. New York: Oxford University Press.
5. Ren B, Tibbelin G, Pascale D, Rossi M, Bartolucci S and Ladenstein R, (1998) A protein disulfide oxidoreductase from the archaeon Pyrococcus furiosus contains two thioredoxin fold units. Nat. Struct. Biol., 5: 602-611.
6. Koumanov A, Benach J, Atrian S, Gonzalez-Duarte R, Karshikoff A and Ladenstein R, (2003) The catalytic mechanism of Drosophila alcohol dehydrogenase: evidence for a proton relay modulated by the coupled ionization of the active site lysine/tyrosine pair and a NAD+ ribose OH switch. Proteins, 51: 289-298.
7. Hamaguchi K, (1992) The Protein Molecule. Conformation, Stability and Folding. Tokyo: Japan Scientific Societies Press.
8. Myers JK and Pace CN, (1996) Hydrogen bonding stabilises globular proteins. Biophys. J., 71: 2033-2039.
9. Takano K, Scholtz JM, Sacchettini JC and Pace CN, (2003) The contribution of polar group burial to protein stability is strongly context-dependent. J. Biol. Chem., 278: 31790-31795.
10. Pace CN, Horn G, Hebert EJ, Bechert J, Shaw K, Urbanikova L, Scholtz JM and Sevcik J, (2001) Tyrosine hydrogen bonds make a large contribution to protein stability. J. Mol. Biol., 312: 393-404.
Chapter 4
Hydrophobic Interactions
Non-polar compounds, such as hydrocarbons, are poorly soluble in water. This property is the reason why non-polar compounds are called hydrophobic. Hydrophobic compounds in water tend to form aggregates or films at the water/air interface. The forces responsible for this effect are known as hydrophobic interactions. Kauzmann¹ stressed that almost half of the amino acid residues in proteins are hydrophobic and concluded that the interactions between these residues, which he named hydrophobic bonds, must be strong enough to force the polypeptide chain to fold into a compact structure. In the literature, the term hydrophobic force can also be seen. Hydrophobic interactions, hydrophobic bonds and hydrophobic forces all refer to the same phenomenon, namely the collapse of non-polar compounds into aggregates when surrounded by water. Because it is commonly accepted, we will use the term hydrophobic interactions.

4.1 Nature of Hydrophobic Interactions, Pseudo Forces

The fact that two non-polar molecules in water, initially distant from each other, come close together and form a stable aggregate appears to be the manifestation of an attractive force. The question arises as to the nature of this attractive force between non-polar molecules. To answer this question, we have to consider two types of forces: real and pseudo (fictitious) forces. Real forces result from definite interactions between objects (molecules, particles, etc.) and are properties of these objects. Electric forces, for instance, are real forces; they always arise between charged objects.
By contrast, pseudo forces are not properties of individual objects; they are properties of the system that these objects form or belong to. Thus, pseudo forces do not occur between individual entities considered outside a system. Because real and pseudo forces look alike, they cannot be distinguished unless their nature is known. This sometimes leads to confusion, and hydrophobic interactions are a possible source of such confusion.
Figure 4.1 Hydrophobic compounds in water. (A) Unfavourable state of dissolved hydrophobic compound. Two-way arrows indicate pseudo forces. (B) Favourable state of aggregates. The arrow indicates the direction of spontaneous transition.
Let us consider a system containing water and non-polar (hydrophobic) molecules. In state A of the system, the hydrophobic molecules are dissolved in water and separated from each other (Fig. 4.1A). In state B, the molecules form an aggregate (Fig. 4.1B). Experimental observations show that the system undergoes a spontaneous transition from state A to state B, which suggests an attractive force that brings the molecules together into an aggregate. We call the force causing this apparent attraction the hydrophobic force. This observation is phenomenological and does not reveal the nature of hydrophobic forces. We have seen in Chapter 2 that the attraction between non-polar molecules results from dispersion forces. These forces alone cannot explain the observed phenomenon, and no other forces specific to non-polar molecules can be identified. This leads to the conclusion that the apparent attractive forces driving the assembly of non-polar molecules in water are a result of the behaviour of the system, i.e. they are pseudo forces.
4.2 Water
The conclusion above is, in a sense, premature, because it rests on general considerations rather than on a detailed analysis of why non-polar molecules assemble. Since the phenomenon behind hydrophobic interactions occurs in water, it is reasonable to begin our analysis with the interactions of water with non-polar compounds. For this purpose we need to become familiar with some properties of water.
4.2.1 Flickering clusters model of water
Water is characterised by a number of properties which make it different from all other liquids. The most prominent of them, compared with those of some other liquids, are given in Table 4.1. Water is among the fluids with the highest dielectric constant. It is also characterised by high melting and boiling temperatures. Some of the properties of water make it a unique medium for life, the only one known so far. For instance, liquid water has a higher density than ice (995 kg/m³ for water at 30°C versus about 917 kg/m³ for ice). Because of this property, ice floats and thermally insulates the living organisms in lakes and rivers. Water has a very high heat capacity, which makes it act as a thermal buffer that reduces temperature variations of the environment. The examples given above relate water properties to some global features of the biosphere. Among the other properties of water, one is of special interest: the density of water does not decrease steadily with increasing temperature, as it does for other liquids, but shows a maximum at 4°C (Fig. 4.2). This means that water expands when it is cooled below 4°C. This phenomenon, called the volumetric anomaly, is to a certain extent related to our problem of elucidating the nature of hydrophobic interactions. There are two general concepts on which the different theoretical models of water are based. The first one considers water as a continuum.
At the molecular level, water is represented as a continuous network of hydrogen bonds. Hydrogen bonds can bend, or they can break, allowing water molecules to rotate and move. Models based
on the second concept consider water as a system of hydrogen-bonded water clusters surrounded by non-bonded molecules. Models based on this concept are known as cluster or mixture models. Here, we will give only a basic view of this concept in order to gain a qualitative understanding of the water properties relevant to the phenomenon of interest, namely hydrophobic interactions.
Table 4.1 Some physical constants of water and other compounds. Temperatures are given in °C. The values of heat capacity are for 25°C.
Compound                 Dielectric constant      Melting temp.   Boiling temp.   Heat capacity cp, cal/(g·K)
Water                    78 (25°C); 88.3 (0°C)    0               100             1.00
Hydrofluoric acid, HF    83.6 (0°C)               -93             20              gas, 0.35
Methanol, CH3OH          34                       -94             65              0.61
Ethanol, C2H5OH          24                       -115            78              0.58
Ammonia, NH3             17                       -78             -33             gas, 1.13
Chloroform, CHCl3        5                        –               59              0.23
Methane, CH4             2                        -182            -162            gas, 0.53
Benzene, C6H6            2                        5               80              0.42
Figure 4.2 Temperature dependence of the density of water (temperature axis 0 to 20°C; density axis 998.0 to 1000.0 kg/m³; the maximum at 4°C is marked).
One of the earliest mixture models is the flickering clusters model proposed by Frank and Wen2. An illustration of this model is given in Fig. 4.3. It reflects the basic idea of mixture models, namely that water consists of two kinds of molecular organisation, ice-like and liquid-like. The ice-like organisation consists of clusters of hydrogen-bonded water molecules surrounded by molecules which do not participate in any hydrogen bonding; the latter represent the liquid-like organisation. The clusters exchange water molecules with their surroundings: hydrogen-bonded molecules are released from a cluster, whereas molecules from the surroundings form hydrogen bonds and join a cluster. In this way, the ice-like clusters change their size, shape and position.
Figure 4.3 The flickering cluster model. The arrows indicate a molecule leaving the cluster (left hand side) and a molecule joining the cluster (right hand side).
The flickering cluster model gives a qualitative explanation of the peculiar temperature dependence of the density of water (Fig. 4.2). At a certain temperature, for instance at 4°C, the exchange of molecules between the clusters and the surrounding unbound water is in equilibrium. On cooling below 4°C, the molecules lose kinetic energy. For unbound molecules this is manifested in a decrease of their translational and rotational energy, which favours their mutual orientation and the formation of hydrogen bonds. In this way the clusters grow or new ice-like clusters are formed. In both cases, the fraction of ice-like clusters increases. Because the clusters have a lower density (they are ice-like), the volume of water increases. The opposite process occurs on heating. At temperatures above 4°C, the fraction of ice-like clusters decreases, whereas the fraction of unbound, liquid-like, molecules increases. Heating also increases the thermal motion of the molecules, which leads to an increase in volume, as in all other liquids. Although this model cannot explain some other experimental observations, it is interesting for us because it gives an insight into the complexity of water. The important feature postulated in the model is that water molecules form hydrogen bond networks. There are other, more sophisticated models, in which clusters are considered as ordered structures, such as networks of water rings and different polyhedral formations. The number of water molecules bound in clusters is estimated to range from about 100 (at low temperature) to about 20 (at high temperature). It should be noted that the clusters, regardless of their organisation, continuously change size, break and re-form. This explains why no stable clusters have ever been experimentally identified in pure water. Therefore we should consider the clusters as unstable, temporarily organised arrangements of water molecules.
4.2.2 Hydrocarbons in water, iceberg model
The ability of water molecules to form clusters and networks of hydrogen bonds brings forth another interesting phenomenon: water molecules can form inclusion compounds. Inclusion compounds are molecular complexes in which a set of molecules envelops other molecules without forming covalent bonds with them. The former are referred to as hosts, whilst the latter are guests. If water is the host, the inclusion compounds are called hydrates. Water molecules can create a variety of hydrogen bond network structures in which cavities are formed. In the right hand side of Fig. 4.3 a quasi-pentagonal arrangement of hydrogen-bonded water molecules can be recognised.
This could be a face of a polyhedron whose vertices are defined by the oxygen atoms of the water molecules. Such a polyhedron is shown in Fig. 4.4. This cage-like structure forms a relatively large cavity able to accommodate a molecule the size of methane. In pure water these
formations are unstable, as we have pointed out. Under certain conditions, however, mixtures of water and hydrocarbons form relatively stable crystalline inclusion compounds. The cavities formed by the cage-like structures of the hydrogen-bonded water molecules are occupied by the hydrocarbon molecules. These formations are called clathrate hydrates.
Figure 4.4 Pentagonal dodecahedron of hydrogen bound water molecules. This structure contains 20 water molecules.
Clathrate hydrates vary greatly in the size and geometry of their cages; in other words, different compounds can occupy the voids of the ice-like hosts. The variety of inclusion compounds is not restricted to non-polar guests. Some cations containing a hydrocarbon moiety, such as tetramethylammonium, (CH3)4N+, or tetrabutylammonium, (CH3(CH2)3)4N+, can be enveloped in clathrate structures as well. In these structures, the anions (which must be present to keep the system neutral) are incorporated in the hydrate cage by hydrogen bonding. One then speaks of ionic clathrate hydrates. A detailed description and analysis of different clathrate structures can be found in the book by Jeffrey and Saenger, "Hydrogen Bonding in Biological Structures"3. For the purposes of our considerations, however, the most important observation is that in clathrate hydrates water is organised in a cage-like network around the hydrocarbon molecules.
Based on this observation, we can presume that water molecules form ice-like hydrogen-bonded frameworks around dissolved hydrocarbons. This concept was proposed by Frank and Evans4 and is known as the iceberg model. The iceberg model provides a plausible picture of the phenomena related to the dissolution of non-polar compounds in water. One of the most prominent effects of the transfer of non-polar compounds to water is an increase of the heat capacity. According to the iceberg model, heating the solution melts the "iceberg" (the clathrate shell around the dissolved non-polar molecule). This melting involves absorption of energy (breaking of hydrogen bonds) and an increase of entropy (increased disorder of the released molecules), and hence an increase of the heat capacity, $C_p = (\partial H/\partial T)_p$. Other phenomena that the iceberg model helps to understand will be considered in the next section.
4.3 Hydrophobic Effect
4.3.1 Oil drop in water
Everyday experience shows that when oil is mixed with water, oil drops are formed. It is also easy to observe that these drops, or oil spots at the water/air interface, tend to merge rather than split. The immediate conclusion is that an oil drop tends to minimise its contact surface with water, an effect which we will consider in Section 4.4.2. In fact, this is just a rough observation of the effect of hydrophobic interactions. We are interested, however, in the molecular mechanism of hydrophobic interactions, whose macroscopic expression is the oil drop behaviour. Moreover, our goal is to understand hydrophobic interactions in proteins. Therefore we will focus our analysis on a distinct class of non-polar molecules, namely the hydrocarbons. The formation of hydrogen bonds is accompanied by the release of heat, i.e. it is an exothermic process. The water molecules lose kinetic energy and fall into a potential energy minimum.
At the same time, hydrogen bonding immobilises the water molecules, which leads to a reduction of entropy. Thus, the formation of the clathrate shell is an entropically
unfavourable process. Let us employ the iceberg model to analyse the outcome of the aggregation of non-polar compounds in water (which, macroscopically, corresponds to the fusion of oil drops).
Figure 4.5 Hydrophobic interactions and the iceberg model (see also Fig. 4.1).
The aggregation of non-polar molecules is illustrated in Fig. 4.5, which is Fig. 4.1 worked out in detail by showing the clathrate water shells around the non-polar molecules. In the left hand side (panel A) of the figure, the separated non-polar molecules are enveloped by clathrate shells. As we have seen in the previous chapter, the formation of hydrogen bonds is a favourable process; therefore water tends to maximise its hydrogen bond interactions. On the other hand, the water molecules immobilised in the clathrates lose entropy, which is unfavourable. If the clathrates fuse and form a cage large enough to accommodate an aggregate of non-polar molecules, the water molecules that were located at the interface between the non-polar molecules are liberated from the individual clathrates (see the right hand side of Fig. 4.5). In this way the entropy of the system increases. In the next section we shall see that, in spite of the unfavourable effect of breaking hydrogen bonds, the entropic gain is larger. The dominance of this favourable entropic change upon aggregation is the driving force for the system to adopt state B spontaneously, i.e. for the collapse of the non-polar molecules into aggregates. It becomes clear that the low solubility and
the aggregation of non-polar molecules are two manifestations of one and the same phenomenon.
4.3.2 Experimental assessment of hydrophobic interaction
For a system to adopt a certain state spontaneously, this state must be characterised by minimum free energy. According to the above considerations, state B must have a lower free energy than state A, $\Delta G_{A\to B} = G_B - G_A < 0$. The assembly of hydrocarbon molecules in state B can be considered as a separate phase immersed in the water medium. We call this phase the non-polar liquid phase; the surrounding water defines the aqueous phase. The chemical potential of the hydrocarbon solute in the aqueous phase is given by
$$\mu_w = \mu_w^0 + RT\ln X_w + RT\ln f_w, \tag{4.1}$$
where $\mu_w^0$ is the standard chemical potential of the solute, $X_w$ is its mole fraction and $f_w$ is its activity coefficient. The quantity $f_w$ accounts for the interactions between solute molecules. Because the solubility of hydrocarbons in water is very low, we can set $\ln f_w = 0$. In the same way, for the chemical potential of the hydrocarbon in the non-polar liquid phase one can write
$$\mu_o = \mu_o^0 + RT\ln X_o + RT\ln f_o, \tag{4.2}$$
where the notation is the same as in Eq. (4.1), with the subscript "o" indicating the non-polar liquid phase. If the non-polar liquid consists of only one type of hydrocarbon, $X_o = 1$ and $\ln f_o = 0$. In equilibrium the chemical potentials of the two phases are equal: $\mu_w^0 + RT\ln X_w = \mu_o^0$, or
$$\mu_w^0 - \mu_o^0 = -RT\ln X_w.$$
In general, Eqs. (4.1) and (4.2) in equilibrium give
$$\mu_{wi}^0 - \mu_{oi}^0 = RT\ln\frac{X_{oi}}{X_{wi}} + RT\ln\frac{f_{oi}}{f_{wi}},$$
where the index $i$ indicates that the non-polar liquid may contain different types, $i$, of hydrocarbons. The term $(\mu_{wi}^0 - \mu_{oi}^0)$ is the free energy change, $\Delta G_t$, when a hydrocarbon is transferred from the non-polar liquid to water:
$$\Delta G_{t,i} = RT\ln\frac{X_{oi}}{X_{wi}} + RT\ln\frac{f_{oi}}{f_{wi}} \tag{4.3}$$
or
$$\Delta G_t = -RT\ln X_w \tag{4.4}$$
if the non-polar liquid is homogeneous. The free energy of association of a hydrocarbon molecule with the oil drop is then $-\Delta G_t$. Thus, the task of evaluating $\Delta G_{A\to B}$ reduces to obtaining $-\Delta G_t$ from solubility measurements. Some experimental values of $\Delta G_t$ at 25°C for the transfer of several hydrocarbons from different non-polar liquids to water are given in Table 4.2. Note that for transfer from CCl4 to water Eq. (4.3) is valid, instead of Eq. (4.4). As seen, the free energy of transfer of hydrocarbons from non-polar liquids to water is positive. In the context of the considerations made above, these results are expected: dissolving hydrocarbons in water is an unfavourable process. Hence, the opposite process, the association of hydrocarbons (formation of an oil drop), is favourable. The experiments also show that the entropy of transfer from a non-polar liquid to water is negative. This too agrees with the picture provided by the iceberg model: dissolving hydrocarbons immobilises water molecules in the hydrogen bond networks of clathrate structures. The values of $\Delta H_t$ are negative or zero, so the enthalpy change does not oppose the transfer; however, it cannot compensate for the large decrease in entropy.
Table 4.2 Thermodynamic data for the transfer of hydrocarbons from different organic solvents to water at 25°C. Note that the absolute temperature (T = 25 + 273.15 K) must be used to relate the thermodynamic quantities in the table (see Appendix A).
Transfer of   From → To          ΔH_t (kcal/mol)   ΔS_t (cal/(mol·K))   ΔG_t (kcal/mol)
CH4           CCl4 → water       -2.4              -19                  3.3
C2H6          CCl4 → water       -1.7              -18                  3.7
C2H6          C2H6 → water       0                 -14                  4.1
C5H12         C5H12 → water      -0.5              -25                  6.8
C6H14         C6H14 → water      0                 -26                  7.7
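The entries of Table 4.2 can be checked against $\Delta G_t = \Delta H_t - T\Delta S_t$ at T = 298.15 K, and Eq. (4.4) can be inverted to turn a transfer free energy into a mole-fraction solubility. The short Python sketch below does both. The gas constant value and the function names are our own illustrative choices, not part of the text, and the inversion is applied only to the rows in which the hydrocarbon is transferred from its own pure liquid, because the CCl4 rows require Eq. (4.3).

```python
import math

R = 1.987e-3          # gas constant, kcal/(mol K)
T = 25.0 + 273.15     # absolute temperature, K

# Table 4.2: (solute, route, dH_t kcal/mol, dS_t cal/(mol K), dG_t kcal/mol)
TABLE_4_2 = [
    ("CH4", "CCl4 -> water", -2.4, -19.0, 3.3),
    ("C2H6", "CCl4 -> water", -1.7, -18.0, 3.7),
    ("C2H6", "C2H6 -> water", 0.0, -14.0, 4.1),
    ("C5H12", "C5H12 -> water", -0.5, -25.0, 6.8),
    ("C6H14", "C6H14 -> water", 0.0, -26.0, 7.7),
]


def gibbs(dh_kcal, ds_cal, temp=T):
    """dG = dH - T*dS, with dS converted from cal to kcal."""
    return dh_kcal - temp * ds_cal / 1000.0


def mole_fraction_solubility(dg_t, temp=T):
    """Invert Eq. (4.4), dG_t = -RT ln X_w (pure hydrocarbon liquid only)."""
    return math.exp(-dg_t / (R * temp))


for solute, route, dh, ds, dg_table in TABLE_4_2:
    line = f"{solute:6s} {route:15s} dG = {gibbs(dh, ds):5.2f} (table {dg_table})"
    if not route.startswith("CCl4"):   # Eq. (4.4) needs a homogeneous solvent
        line += f"  X_w = {mole_fraction_solubility(dg_table):.1e}"
    print(line)
```

The recomputed values agree with the tabulated $\Delta G_t$ to within about 0.15 kcal/mol (the largest deviation is for pentane). The hexane entry corresponds to a mole-fraction solubility of roughly 2 × 10⁻⁶, which illustrates how a few kcal/mol of transfer free energy translate into very low solubility.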
Based on these data we can draw some conclusions about the origin of hydrophobic interactions. The interactions characteristic of the molecules in the system are electrostatic interactions (including those in hydrogen bonds) and dispersion forces (in van der Waals interactions and in hydrogen bonding). The association of the non-polar molecules, which is the essence of hydrophobic interactions, results from the tendency of the system to increase its entropy. Thus, hydrophobic interactions are an effect arising from the behaviour of the system; hence they are pseudo forces. There are no hydrophobic interactions between two molecules taken out of the context of a system. It is therefore more correct to speak about the hydrophobic effect than about hydrophobic interactions. Since the latter term is commonly accepted, we will define it as follows: the term "hydrophobic interactions" refers to the tendency of non-polar molecules to associate in an aqueous medium. Our conclusions were drawn from data obtained under standard conditions. Both $\Delta H_t$ and $\Delta S_t$ increase with temperature. However, within the temperature interval we are interested in, namely the interval in which biological processes occur, the resulting change in the free energy of transfer is small and positive. This means that in this interval the solubility of hydrocarbons decreases slightly with temperature, and the hydrophobic effect accordingly increases slightly.
4.4 Hydrophobic Interactions in Proteins
Due to the hydrophobic effect, the non-polar side chains avoid contact with water and tend to assemble close to each other. As a result, the polypeptide chain collapses so that the hydrophobic residues form the hydrophobic core of the protein molecule. It should be noted that this process occurs only under certain conditions, such as appropriate temperature and pH, absence of denaturing co-solvents, etc. We assume that these conditions are fulfilled.
A simplified representation of a protein built from two types of amino acids, polar and non-polar (hydrophobic), is shown in Fig. 4.6. In the unfolded state of the protein molecule all amino acids are accessible to
the solvent, which, as usual, is water. Because of their low solubility in water, the non-polar groups tend to collapse. Thus, the unfolded protein chain spontaneously folds so that the hydrophobic amino acid residues have minimum contact with water. The polar residues, on the contrary, are soluble in water, so they tend to stay on the protein surface, forming hydrogen bonds among themselves and with the surrounding water molecules. This organisation of the polar and hydrophobic amino acid residues in a folded protein is illustrated in the right hand side of Fig. 4.6. It resembles a clathrate structure around dissolved hydrocarbons. Here, however, the elements of the "protein clathrate shell" are themselves parts of the molecule, ensuring favourable interactions with the solvent, solubility and stability of the protein molecule. This is where the hydrogen bond networks involving amino acid side chains and water molecules, considered in Section 3.3.4, play an important role. One of the functions of the hydrogen bond networks on the protein surface is to stabilise the polar shell insulating the hydrophobic core of the protein molecule. It should be noted that this picture is to a certain extent idealised. It would be incorrect to regard the hydrogen bond networks on the protein surface as a nutshell protecting a hydrophobic "kernel": a significant part of the protein/solvent interface is hydrophobic, as we shall see below.
Figure 4.6 Simplified illustration of unfolded (A) and folded (B) protein molecule. Hydrophobic and polar amino acids are presented as circles and ellipses, respectively.
The picture would be incomplete without mentioning membrane proteins. The hydrophobic effect is clearly manifested in this class of proteins. The X-ray structures of membrane proteins show that the amino acid side chains in contact with the aliphatic moiety of the
membrane are hydrophobic. The parts of the protein molecule protruding from the membrane have the characteristics of water-soluble proteins: a hydrophobic core surrounded by a shell of polar amino acid side chains.
4.4.1 Additivity of hydrophobic interactions
It is desirable to have an expression that gives a quantitative measure of the magnitude of hydrophobic interactions in proteins. Because hydrophobic interactions appear as a result of the behaviour of the system, we need to investigate how the free energy of the system changes upon the formation of the hydrophobic core. That is, we need to evaluate the contribution of the hydrophobic interactions, $\Delta G_h$, to the free energy of transition of the system from state A to state B (panels A and B of Fig. 4.6, respectively). $\Delta G_h$ cannot be measured directly. The transition from state A (unfolded protein) to state B (folded protein), or vice versa, can be measured; however, the energy obtained in such experiments is the free energy of folding, $\Delta G^{U\to F}$, or the free energy of unfolding, $\Delta G^{F\to U}$, respectively. These quantities are not of interest at the moment. To estimate $\Delta G_h$ we will use a model to which Eqs. (4.3) and (4.4) are applicable. Let us employ the approximation used in evaluating the role of the hydrogen bonds between peptide groups in protein stability (Section 3.4.1). We assume that in the unfolded state the amino acid side chains are fully hydrated, whereas in the folded state they are buried in the protein interior and completely inaccessible to the solvent (water). As before, the protein interior is represented as a non-polar material. This approximation is very rough; however, it allows us to use experimental data on the solubility of amino acids in water and in non-polar solvents. The connection between solubility and the energy of transfer from a non-polar solvent to water is given by Eq. (4.4).
The energies of transfer, $\Delta G_t$, of several amino acids are listed in Table 4.3. The transfer energies of the individual amino acids are negative, reflecting the fact that they are soluble in water. Nozaki and Tanford5 assumed that the free energy of transfer can be split into two additive parts: the free energy of transfer of glycine and that of the
side chain. The latter is denoted by $\Delta g_t$ and is called the hydrophobicity of the side chain. The hydrophobic side chains of the amino acids listed in the table have positive values of $\Delta g_t$, which corresponds to their expected low solubility in water. If the hypothesis of additivity of the free energy of transfer is valid, additivity should apply to any constituents of the amino acid, not only to the main chain and side chain parts. Indeed, the difference between the transfer energies of methane and ethane is equal to the difference between glycine and alanine: 0.73 kcal/mol. In both cases the difference in chemical composition is just a CH3 group, indicating that the hypothesis of additivity holds. There are also other experimental observations supporting the additivity hypothesis. On this basis we can partition the free energy of transfer of the individual amino acids and consider only those components which are involved in hydrophobic interactions.
Table 4.3 Free energy of transfer, ΔG_t, and hydrophobicity, Δg_t, of several amino acids.
Amino acid       ΔG_t (kcal/mol)   Δg_t (kcal/mol)
Glycine          -4.63             0
Alanine          -3.90             0.73
Valine           -2.94             1.69
Leucine          -2.21             2.42
Isoleucine       -1.69             2.97
Phenylalanine    -1.98             2.65
Proline          -2.09             2.60
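Under the additivity assumption, the side-chain hydrophobicity $\Delta g_t$ is the transfer free energy of the amino acid minus that of glycine. The minimal Python sketch below recomputes $\Delta g_t$ from the $\Delta G_t$ column of Table 4.3; the dictionary and function names are illustrative only. Differences of a few hundredths of a kcal/mol from the tabulated $\Delta g_t$ (for example for isoleucine and proline) presumably reflect rounding in the source data.

```python
# Table 4.3: free energy of transfer dG_t (kcal/mol), non-polar solvent -> water
DG_T = {
    "Glycine": -4.63,
    "Alanine": -3.90,
    "Valine": -2.94,
    "Leucine": -2.21,
    "Isoleucine": -1.69,
    "Phenylalanine": -1.98,
    "Proline": -2.09,
}


def side_chain_hydrophobicity(dg_t, reference="Glycine"):
    """Additivity: dg_t(side chain) = dG_t(amino acid) - dG_t(glycine)."""
    base = dg_t[reference]
    return {aa: round(value - base, 2) for aa, value in dg_t.items()}


for aa, dg in side_chain_hydrophobicity(DG_T).items():
    print(f"{aa:14s} {dg:5.2f}")
```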
4.4.2 Solvent accessibility
The approach above is appropriate for estimating hydrophobic interactions if the amino acids are fully buried in the protein interior (folded state) or fully accessible to water (unfolded state). We know, however, that some hydrophobic side chains in proteins are not completely buried. For these cases the model we are using is inadequate. We will refine it using another important observation. As we have already noted, the formation of oil drops in water, or the assembly of hydrocarbons into aggregates, is accompanied by a reduction of the interface area between the solute and water. It is interesting to see whether there is a correlation between the observed tendency of
hydrocarbon aggregates to reduce their solvent accessibility and the magnitude of the hydrophobic interactions. A large number of measurements of the solubility and transfer energy of hydrocarbons of different lengths have convincingly shown that there is such a correlation. It turns out that hydrophobicity depends linearly on the solvent accessible surface of the hydrocarbon molecules. This allows us to introduce a specific quantity, $\Delta f_h$, corresponding to the transfer energy per unit solvent accessible area. The value of $\Delta f_h$ is between 19 and 28 cal/mol/Å², depending on how the solvent accessible surface is estimated.
Figure 4.7 Hydrophobicity versus solvent accessible area of the hydrophobic amino acids (x axis: solvent accessibility surface, Å², 100 to 200; y axis: hydrophobicity, 0.5 to 3.0 kcal/mol). Solid circles: data from Table 4.3; open circles: data from Fauchère and Pliska6.
This linearity is also observed for the hydrophobicity of amino acid side chains. Fig. 4.7 shows the relation between $\Delta g_t$ of the hydrophobic amino acids and their solvent accessible surface. The slope, $\Delta f_h$ = 22 cal/mol/Å², falls within the range of 19–28 cal/mol/Å² found for other hydrocarbons. Figure 4.7 also shows that data obtained by different experimental approaches can differ. This is clearly seen for proline and for the pair leucine and isoleucine. The difference can be explained by the influence of the α-amino and carboxyl groups, which are charged under the experimental conditions. The experiments (open circles in Fig. 4.7) performed with amino acids having these groups
blocked (acetyl-X-amide, where X is the amino acid side chain) eliminate this influence (see also Table 4.4 for a comparison of the hydrophobicities obtained by the two methods). Leucine and isoleucine differ in the branching position of their side chains, and hence in the distance of the branch from the charged groups. After this influence is eliminated, the divergence in $\Delta g_t$ is reduced. A linear dependence of $\Delta g_t$ on the solvent accessible surface area is also observed for side chains containing a hydroxyl group. The slope in this case is $\Delta f_h$ = 26 cal/mol/Å². Thus, one can conclude that the linear relation between $\Delta g_t$ and the solvent accessible surface holds. Based on the results of the above analysis, we are now able to construct an expression relating $\Delta g_t$ and the solvent accessible area.
Table 4.4 Solvent accessibility7, in Å², and hydrophobicity5,6, Δg, in kcal/mol, of the amino acids. The non-polar and polar columns give the corresponding parts of the side-chain area.
Residue   Total   Side chain   Non-polar   Polar   Δg (ref. 5)   Δg (ref. 6)
ala       113     67           67          –       0.5           0.4
arg       241     196          89          107     –             -1.4
asn       158     113          44          69      0.0           -0.8
asp       151     106          48          58      0.5           -1.1
cys       140     104          35          69      –             2.1
gln       189     144          53          91      -0.1          -0.3
glu       183     138          61          77      0.5           -0.9
gly       85      –            –           –       –             –
his       194     151          102         49      0.5           0.2
ile       182     140          140         –       3.0           2.5
leu       180     137          137         –       1.8           2.3
lys       211     167          119         48      –             -1.4
met       204     160          117         43      1.3           1.7
phe       218     175          175         –       2.5           2.4
pro       143     105          105         –       2.6           1.0
ser       122     80           44          36      -0.3          -0.1
thr       146     102          74          28      0.4           0.4
trp       259     217          190         27      3.4           3.1
tyr       229     187          144         43      2.3           1.3
val       160     117          117         –       1.5           1.7
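The linear relation between hydrophobicity and accessible area can be checked with an ordinary least-squares fit. In the Python sketch below, the side-chain hydrophobicity $\Delta g_t$ is taken from Table 4.3 as $\Delta G_t$(X) minus $\Delta G_t$(Gly), and the area is the non-polar side-chain area of Table 4.4. The choice of these two data sources and the omission of proline are our own assumptions for illustration; proline is left out because, as noted above, its value depends noticeably on the charged α-amino and carboxyl groups.

```python
# Non-polar side-chain area (Å^2) from Table 4.4 and side-chain hydrophobicity
# dg_t (kcal/mol) from Table 4.3 as dG_t(X) - dG_t(Gly), with dG_t(Gly) = -4.63.
DATA = {
    "ala": (67.0, -3.90 + 4.63),
    "val": (117.0, -2.94 + 4.63),
    "leu": (137.0, -2.21 + 4.63),
    "ile": (140.0, -1.69 + 4.63),
    "phe": (175.0, -1.98 + 4.63),
}


def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


areas = [area for area, _ in DATA.values()]
hydrophobicities = [dg for _, dg in DATA.values()]
slope, intercept = least_squares(areas, hydrophobicities)
# Convert the slope from kcal/(mol Å^2) to cal/(mol Å^2) to compare with df_h.
print(f"df_h = {1000 * slope:.1f} cal/mol/A^2, intercept = {intercept:.2f} kcal/mol")
```

With these five residues the fitted slope comes out at about 20 cal/mol/Å², inside the 19–28 cal/mol/Å² range quoted above. The exact number depends on which hydrophobicity and accessibility data are combined, so it need not coincide with the slope read from Fig. 4.7.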
We already know that the energy of transfer of amino acid side chains is additive and proportional to the surface area exposed to the solvent. This proportionality is linear, with an average slope $\Delta f_h$ of about 24 cal/mol/Å². If we know the area,
$$\Delta SA_h = SA_h^F - SA_h^U,$$
of the hydrocarbon constituents of the side chains that becomes inaccessible to the solvent upon folding of the protein (the transition from state A to state B illustrated in Fig. 4.6), we can calculate $\Delta G_h$ using the simple relation
$$\Delta G_h = \Delta f_h\,\Delta SA_h. \tag{4.5}$$
Obviously $\Delta SA_h < 0$, so that $\Delta G_h < 0$, in accordance with the fact that the burial of hydrophobic groups in the protein interior is a favourable process.
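Eq. (4.5) can be applied directly once the non-polar accessible areas of the unfolded and folded states, $SA_h^U$ and $SA_h^F$, are known. In the Python sketch below, $SA_h^U$ is taken as the sum of the tabulated non-polar side-chain areas of Table 4.4 along the sequence, and $\Delta f_h$ = 24 cal/mol/Å² is the average slope quoted above. The four-residue segment and its folded-state area of 100 Å² are hypothetical numbers chosen only to show the arithmetic; in practice $SA_h^F$ would come from a calculation on the three-dimensional structure.

```python
# Non-polar side-chain solvent accessible areas (Å^2) taken from Table 4.4
NONPOLAR_SA = {"ala": 67, "val": 117, "leu": 137, "ile": 140, "phe": 175, "pro": 105}

DF_H = 24.0  # cal/(mol Å^2), average slope quoted in the text


def sa_unfolded(sequence):
    """SA_h^U: sum of tabulated non-polar side-chain areas along the sequence."""
    return sum(NONPOLAR_SA[res] for res in sequence)


def delta_g_h(sa_u, sa_f, df_h=DF_H):
    """Eq. (4.5): dG_h = df_h * (SA_h^F - SA_h^U), returned in kcal/mol."""
    delta_sa = sa_f - sa_u        # negative when hydrophobic area is buried
    return df_h * delta_sa / 1000.0


# Hypothetical example: a segment val-leu-ile-phe whose non-polar side chains
# keep only 100 Å^2 of accessible area in a (hypothetical) folded structure.
segment = ["val", "leu", "ile", "phe"]
sa_u = sa_unfolded(segment)           # 117 + 137 + 140 + 175 = 569 Å^2
print(sa_u, delta_g_h(sa_u, 100.0))
```

For this made-up case $\Delta SA_h$ = -469 Å², which gives $\Delta G_h \approx$ -11.3 kcal/mol.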
Figure 4.8 Solvent accessibility surface (left) and solvent contact surface (right).
The task that remains is the calculation of the solvent accessible area of the unfolded, $SA_h^U$, and folded, $SA_h^F$, states of the protein. The solvent accessible surface is defined as the surface traced by the centre of a spherical solvent molecule rolling over the solute molecule (left hand side panel of Fig. 4.8). The solvent accessible surface is a purely geometrical quantity, so we are interested only in the size and mutual disposition of the atoms. Other properties, such as charge distribution or the ability to form hydrogen bonds, are ignored. In our case, the solvent molecule is water. Usually, the radius of the sphere representing the water molecule is taken to be 1.4 Å. The shape of the solute molecule is determined by the van der Waals radii of the individual atoms (see the end of Section 2.3 for the definition). The solvent accessible surface should be
distinguished from the solvent contact surface. The latter is the area determined by the van der Waals envelope of the solute (right hand side panel of Fig. 4.8). Two issues should be borne in mind when SA is calculated. The first one concerns the van der Waals radii. There are no rigorous rules for choosing van der Waals radii. A few sets of van der Waals radii are given in Table 4.5. One peculiarity of the radii listed in the table is that they are not explicit van der Waals radii: they are defined so as to take into account the hydrogen atoms bound to the "main" atom. For instance, the carbonyl oxygen in the data set of Getzoff has a radius of 1.40 Å, whereas the hydroxyl oxygen atom is somewhat larger, reflecting the presence of a bound hydrogen atom. Atoms with radii that account for bound hydrogen atoms are called united atoms. United atoms are very useful because the majority of protein structural data do not contain information about the hydrogen atoms.
Table 4.5 Van der Waals radii (Å) used for SA calculations. Atom types: carbon, not specified; tetrahedral carbon; trigonal carbon; nitrogen, not specified; tetrahedral nitrogen; trigonal nitrogen; oxygen, not specified; oxygen (carbonyl); sulphur, not specified; sulphur (SH).
Ref. 8: 1.80; 1.70 (Cα); 1.80; 1.80; 1.55; 1.80; 1.52; 1.8
Ref. 9: 1.87; 1.76; 1.50; 1.65; 1.40; 1.85
Ref. 10: 2.00; 1.86 (CH); 1.74; 2.00; 1.80; 1.70; 1.40; 1.60 (OH); 1.80; 1.85
Ref. 11: 1.87; 1.65; 1.40; 1.85
Ref. 12: 2.0; 1.7–1.86; 2.0; 1.5–1.7; 1.4; 1.5 (OH); 1.85; 2.0
The second important point is that SA is calculated for a fixed structure of the protein molecule. For that reason Lee and Richards8, the authors of the first algorithm for calculating SA, called this quantity "static solvent accessibility". Usually the fixed conformation is that of the protein in the crystalline state. In the unfolded state, the number of conformations that the main chain and the side chains can adopt is huge. According to the model of the unfolded state we have accepted, however, the amino acid side chains are completely hydrated. This reduces the complexity arising from the large number of conformations, because complete hydration corresponds to extended conformations of both the side chains and the polypeptide backbone. Hence, the solvent accessibilities of the individual amino acids can be obtained from calculations based on a single conformation. Because the solvent accessibility of the side chains is of interest, the backbone is usually simulated by the tripeptide Gly-X-Gly in extended conformation, where X is an amino acid of a given type. This simplifies the task further: once calculated, the SA values for the different amino acid types can be tabulated (Table 4.4) and used to calculate SA_u^h for any protein. The value of SA_u^h is simply the sum of the solvent accessible surfaces of the individual side chains according to the protein sequence. The solvent accessible area of an individual side chain in the folded state depends on its environment, so calculating SA_f^h requires the three-dimensional structure of the protein.
The solvent accessibility of the individual atoms in a folded protein can be calculated with a very simple scheme. In Fig. 4.9, each of two atoms, A and B, is shown as a pair of concentric spheres: the inner sphere represents the atom, the outer sphere its solvent accessible surface. The radius of the outer sphere A, R_A = r_vdW + r_water, is the sum of the van der Waals radius of atom A (the inner sphere) and the radius of the water molecule. The radius R_B of the outer sphere B is determined in the same way; the two radii differ if the van der Waals radii of atoms A and B differ. Imagine that a large number of points, n, are uniformly distributed on the surface of each outer sphere. Each point then carries an area δSA = 4πR²/n of the sphere surface, where R is the radius of that sphere.
If the two atoms are so close that a water molecule cannot fit between them, their outer spheres overlap, and the points of either outer sphere that lie inside the other one are inaccessible to water. This gives a simple criterion for the solvent inaccessibility (or, alternatively, accessibility) of the points. Consider two points on sphere B. If the distance between a point and the centre of sphere A is less than the radius R_A of outer sphere A, the point is buried inside sphere A and is therefore inaccessible to the solvent. This is the case for point i of sphere B, for which d_ci < R_A, so δSA_i = 0. Point j is accessible because d_cj > R_A, so δSA_j = 4πR_B²/n, R_B being the radius of outer sphere B. The solvent accessible area of atom B is then SA_B = Σ_k δSA_k, where the sum runs over all n points of sphere B.
The same procedure can be applied to a large set of atoms, for instance the atoms of a protein with known three-dimensional structure. The only technical difference is that the accessibility criterion has to be checked against every neighbouring atom, not just one. The total solvent accessibility is then the sum of the accessibilities of the individual atoms.
Figure 4.9 Calculation of solvent accessibility.
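The point-counting scheme of Fig. 4.9 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used for the calculations in this chapter; it follows the same idea as the Shrake-Rupley method. The golden-spiral construction of the points, the function names `sphere_points` and `atomic_sa`, and the default of 1000 points per sphere are our own choices, and hydrogen atoms are assumed to be absorbed into united-atom radii such as those of Table 4.5.

```python
import math

def sphere_points(n):
    """n nearly uniform unit vectors on a sphere (golden-spiral construction, an assumption)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n      # equal-area bands along z
        r = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def atomic_sa(coords, vdw_radii, r_water=1.4, n=1000):
    """Static solvent accessible area (A^2) of every atom by point counting."""
    unit = sphere_points(n)
    big = [r + r_water for r in vdw_radii]      # R = r_vdW + r_water
    areas = []
    for a, centre in enumerate(coords):
        R = big[a]
        d_sa = 4.0 * math.pi * R * R / n        # area carried by one point
        # only atoms whose outer spheres overlap sphere a can bury its points
        neighbours = [b for b in range(len(coords))
                      if b != a and math.dist(centre, coords[b]) < R + big[b]]
        accessible = 0
        for ux, uy, uz in unit:
            p = (centre[0] + R * ux, centre[1] + R * uy, centre[2] + R * uz)
            # a point is buried if it lies inside any neighbour's outer sphere
            if all(math.dist(p, coords[b]) >= big[b] for b in neighbours):
                accessible += 1
        areas.append(accessible * d_sa)
    return areas

# usage: two united-atom carbons (r_vdW = 1.8 A) placed 3.0 A apart
areas = atomic_sa([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.8, 1.8])
print([round(x, 1) for x in areas])
```

For two identical atoms the exact answer is known (the full outer sphere minus a spherical cap of height R - d/2), which makes the pair above a convenient check on the number of points chosen. A protein-sized calculation would also need a neighbour search (for example a grid) instead of the all-pairs loop used here for clarity.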
4.4.3 Evaluation of hydrophobic interactions
We already have all the tools needed to evaluate the energy contribution of hydrophobic interactions to the stability of a protein molecule. We shall make this evaluation using the molecule of human γ-interferon as an example (Fig. 4.10). As seen from the figure, this molecule is a dimer forming two symmetrical domains.
Figure 4.10 Human γ-interferon. Upper panel: topology of the two subunits of the molecule. Subunits L and R are coloured in turquoise and brown, respectively. Each subunit contains six α-helices, depicted as circles connected by non-helical segments. Lower panel: an alternative view of the molecule. Subunit L is illustrated as a cartoon drawing, whereas subunit R is shown as full-space van der Waals spheres. The last helix of each subunit is rich in hydrophobic amino acids (indicated by the arrow) and is immersed in the hydrophobic core (marked with an ellipse) of the other subunit. In this way the molecule is characterised by two domains (L and R) and two hydrophobic cores.
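The γ-interferon evaluation that follows is a single multiplication per domain. The sketch below assumes that Eq. (4.5) has the form ΔG_h = A_fh·ΔSA with ΔSA = SA_f^h - SA_u^h (negative on burial, as in Fig. 4.11), which is the form that reproduces the numbers quoted in this section for the pure hydrocarbon side chains: SA_u^h = 5202 Å², SA_f^h = 786 and 811 Å² for the two domains, and A_fh = 22 cal/mol/Å². The function name `delta_g_h` is hypothetical.

```python
A_FH = 22.0  # cal/mol/A^2, pure hydrocarbon side chains (value quoted in the text)

def delta_g_h(sa_folded, sa_unfolded, a_fh=A_FH):
    """Hydrophobic free energy in kcal/mol, assuming dG_h = a_fh * (SA_f^h - SA_u^h)."""
    delta_sa = sa_folded - sa_unfolded      # negative: area buried on folding
    return a_fh * delta_sa / 1000.0         # cal/mol -> kcal/mol

SA_U = 5202.0                               # A^2, identical for both domains
domains = {"first": 786.0, "second": 811.0} # A^2, SA_f^h of each domain
per_domain = {name: delta_g_h(sa_f, SA_U) for name, sa_f in domains.items()}
total = sum(per_domain.values())

for name, dg in per_domain.items():
    print(f"{name} domain: {dg:.1f} kcal/mol")
print(f"whole molecule: {total:.1f} kcal/mol")
# first domain: -97.2 kcal/mol
# second domain: -96.6 kcal/mol
# whole molecule: -193.8 kcal/mol
```

Changing `a_fh` (for instance to the average value of 24 cal/mol/Å² used below) only rescales the result, which is why the choice of the transfer coefficient and of the set of atoms counted as hydrophobic matters as much as the geometry.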
Taking into account the hydrophobic amino acids only (see Fig. 4.7), one calculates SA_f^h = 786 Å² for the first domain and SA_f^h = 811 Å² for the second domain. The difference between the solvent accessibilities of the side chains is due to a small difference in their conformations. Although this difference is not relevant for the current evaluation, it is worth pointing out that we work with static solvent accessibilities. Static solvent accessibility, or simply solvent accessibility as one often reads in the literature, is sensitive to the conformation. The value SA_u^h = 5202 Å² is the same for the two domains because it corresponds to a fully extended conformation of the backbone and the side chains. Applying Eq. (4.5) with A_fh = 22 cal/mol/Å² (because we took into account only the amino acids with pure hydrocarbon side chains), we obtain ΔG_h(domain) = -97.2 and -96.6 kcal/mol for the two domains, respectively. The contribution of the burial of the pure hydrophobic side chains to the stability of the whole molecule is then the sum of these two values: ΔG_h = -193.8 kcal/mol. If we include in the calculation the hydrophobic constituents of all side chains, such as the aliphatic part of lysine, and use the average value A_fh = 24 cal/mol/Å², we obtain ΔG_h = -405.6 kcal/mol. This result shows that the effect of burying hydrophobic material in the protein interior, i.e. the contribution of hydrophobic interactions, is very large. It also shows that all amino acids except glycine contribute to this large hydrophobic contribution. The large value of ΔG_h suggests that hydrophobic interactions should be the main contributor to the stabilisation of the native protein structure. Based on evaluations of ΔG_h similar to that of our example, a common opinion has formed that fully supports the conclusion of Kauzmann, namely that hydrophobic interactions are one of the driving forces of protein folding. This should be understood as a force driving the polypeptide chain to adopt those folds in which a hydrophobic core can be formed. Among these folds is the one we call native: the functionally active three-dimensional structure of the protein molecule. The question of the interplay of the different interactions leading to this unique fold, i.e. the prediction of the three-dimensional structure encoded in the protein sequence according to Anfinsen's dogma, is still open.
According to the model used to evaluate ΔG_h, the unfolded state is a fully extended conformation of the protein molecule with maximum solvent accessibility. This means that the values of ΔG_h obtained from this model set an upper limit on the contribution of hydrophobic interactions. This, however, does not make the above conclusion less relevant. Unfolding experiments show a significant increase of heat capacity upon unfolding, which is explained by a large increase in the hydration (solvent accessibility) of the hydrophobic moiety of the unfolded protein. Hence, ΔG_h is probably smaller in magnitude than, but still close to, the value estimated by this model.
It is notable that, in spite of the large value of ΔG_h, proteins have a relatively low and, in some cases, marginal stability. For instance, the experimentally measured stability of human γ-interferon is ΔG^{u→f} ≈ -3 kcal/mol at pH 7. The reason for the low stability of this protein is not known. Usually, protein stabilities ΔG^{u→f} lie between -10 and -20 kcal/mol. Still, these values are much smaller in magnitude than ΔG_h. We have seen that the assembly of hydrocarbons is driven by a favourable increase of the entropy of the system. This favourable entropy change is due to the release of water molecules from the clathrates upon assembly of the non-polar molecules, which increases the degrees of freedom of these water molecules. The same applies to ΔG_h; however, we have to take into account an additional factor, namely the change of entropy arising from the reduction of the degrees of freedom of the polypeptide upon folding. The entropy of a system in a given state is given by the expression (A.31, Appendix A)
$$S = -R\sum_i P_i \ln P_i, \tag{4.6}$$
where P_i is the probability of the system being in microscopic state i and R is the gas constant. The value of S is positive because ln P_i < 0 when P_i < 1. Because the main contribution to the entropy change upon folding arises from the loss of conformational degrees of freedom of the protein molecule, we will consider only this part of the entropy: the conformational entropy. In this case, the microstates of the system become the different conformations of the polypeptide, including those of the side chains, and P_i becomes the probability of conformation i being realised. The change of the conformational entropy upon folding is
$$\Delta S_{conf}^{u\to f} = S_{conf}^{f} - S_{conf}^{u}. \tag{4.7}$$
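Eqs. (4.6) and (4.7) are easy to check numerically. The sketch below evaluates the Gibbs expression (4.6) for an arbitrary probability distribution over conformations and confirms that, anticipating the equal-probability case treated next, a uniform distribution over L conformations gives R ln L. The function names and the example of nine equally likely unfolded conformations collapsing to a single folded one are hypothetical, chosen only to show the sign of ΔS_conf^{u→f}.

```python
import math

R_GAS = 8.314462618 / 4.184   # gas constant in cal/(mol K)

def gibbs_entropy(probs):
    """S = -R * sum_i P_i ln P_i (Eq. 4.6), in cal/(mol K); states with P_i = 0 add nothing."""
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -R_GAS * sum(p * math.log(p) for p in probs if p > 0.0)

def delta_s_conf(probs_unfolded, probs_folded):
    """Delta S_conf^(u->f) = S_conf^f - S_conf^u (Eq. 4.7)."""
    return gibbs_entropy(probs_folded) - gibbs_entropy(probs_unfolded)

# hypothetical example: 9 equally likely conformations unfolded, a single one folded
n_conf = 9
uniform = [1.0 / n_conf] * n_conf
print(gibbs_entropy(uniform), R_GAS * math.log(n_conf))  # equal: uniform case gives R ln L
print(delta_s_conf(uniform, [1.0]))                     # negative: folding lowers the entropy
```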
The evaluation of ΔS_conf^{u→f} is not an easy task. Nevertheless, to get a feeling for the magnitude of its contribution, we shall perform some calculations under a few simplifying assumptions. We shall assume that in the denatured state all conformations have equal probability, P_i = 1/L. This reduces Eq. (4.6) to
$$S = -R\sum_{i=1}^{L}\frac{1}{L}\ln\frac{1}{L} = R\sum_{i=1}^{L}\frac{1}{L}\ln L = R\ln L, \tag{4.8}$$
where L is the number of conformations. Although the polypeptide chain is flexible and the combinations of the peptide angles
for the same protein. It is clear that if we change any of the parameters used to evaluate ΔS_conf^{u→f}, we will obtain another value. Therefore, the above estimates should be taken as an example stressing that the conformational entropy change upon folding is unfavourable and significant in magnitude. The contribution of ΔS_conf^{u→f} has been assessed by means of different models and analyses of experimental data, and different values can be found in the literature. Based on a comprehensive analysis of a large body of experimental data and model calculations, Lee et al.13 concluded that the change of the configurational entropy is between 4.1 and 4.4 cal/(mol·K) per residue. The difference between the conformational and the configurational entropy is that the latter contains a vibrational term. If we assume that the oscillations around any conformer are the same in all states, the vibrational term cancels in ΔS_configurational, so that the conformational and configurational entropies can be considered equivalent. Using the value of 4.4 cal/(mol·K) per residue, the entropy change upon folding of our example protein gives TΔS_conf^{u→f} = -391 kcal/mol. It follows from this analysis that the significant reduction of the conformational entropy tends to compensate the favourable contribution of hydrophobic interactions. In this context, the low stability of native proteins is not surprising.
4.4.4 Size of the hydrophobic core
Functionally active proteins often consist of dimers, trimers or higher multimeric aggregates, which can be gigantic in size. Also, large single-chain proteins have substructures, which we refer to as domains. The individual domains are characterised by a hydrophobic core and are limited in size. Hence, the size of the hydrophobic core does not increase in parallel with the molecular mass; rather, two or more domains are formed, each with a separate hydrophobic core. The approximate size of the domains forming a hydrophobic core can be estimated from known three-dimensional protein structures using statistical methods: one domain corresponds to a protein molecule with a molecular weight between 10 and 25 kD. The active sites of enzymes are often situated in clefts or cavities between domains or subunits. However, not only enzymes form domains. Figure 4.10 illustrates that the two subunits of human γ-interferon, which is not an enzyme, form separate hydrophobic cores. It has also been found that the gene-coding regions of some proteins are interrupted by non-coding sequences at positions connecting different domains. This suggests that domains can be independent folding units during synthesis and that their formation is related to the stability of the protein molecule.
Figure 4.11 Dependence of the surface area buried upon folding on the molecular weight of proteins9 (axes: buried area ΔSA versus molecular weight, kD). To be consistent with the notation used in Eq. (4.5), the buried area is designated ΔSA, which is always negative.
It is readily seen that the area buried upon folding increases with the size of the protein molecule (Fig. 4.11). This dependence does not suggest any apparent reason for the formation of domains. To explore the hidden correlation between the burial of hydrophobic components in the protein interior and the formation of domains, we shall employ the idea of optimisation of protein-solvent interactions proposed by Spassov14. He proposed the ratio
$$\xi_h = \frac{SA_f^h}{SA_u^h}$$
as a criterion for the efficiency of the burial of hydrophobic material in folded proteins. As we have pointed out, SA_u^h can be calculated using tabulated values. It depends linearly on the molecular weight and is independent of the amino acid composition. The value of SA_f^h depends on the protein conformation. Thus ξ_h is a specific characteristic of a given protein molecule. By definition, ξ_h can take values between 0 and 1. The lower the value of ξ_h, the larger the area of the hydrophobic moiety that becomes inaccessible to the solvent in the folded state, and hence the higher the efficiency of burial. The dependence of ξ_h on the molecular weight of proteins is given in Fig. 4.12. For proteins with molecular weight larger than 10 kD, ξ_h asymptotically approaches a value of about 0.2 and becomes insensitive to the protein mass.
Figure 4.12 The parameter ξ_h calculated for a set of monomeric proteins of different folding and functional classes versus their molecular weight, kD (open circles)14. The line shows ξ_h calculated for different chain lengths of the protein aconitase, beginning from the N-terminus. The segments of the polypeptide chain corresponding to the different domains (D1, D2, D3 and D4) of aconitase are marked with dashed lines.
Our understanding of the hydrophobic core was developed on the basis of the observed minimisation of the solvent accessible surface. According to the results illustrated in Fig. 4.12, the minimum solvent accessible surface of the hydrophobic moiety that can be achieved in proteins corresponds to ξ_h between 0.2 and 0.3. Proteins for which ξ_h lies above this approximate interval tend to have a hydrophobic accessible surface that is not optimally minimised. This tendency is clearly seen for proteins with molecular weight less than 10 kD: for these proteins, a decrease in molecular weight correlates with a sharp increase of ξ_h. Hence, to develop an optimal hydrophobic core, the polypeptide chain must be long enough to wrap the hydrophobic side chains and at the same time adopt a functionally active conformation. This is probably why the smallest enzymes known so far have molecular weights of around 10 kD. We can regard this size as the lower limit needed for combining two basic features: the formation of an optimal hydrophobic core and the creation of the distinct microenvironment where catalytic reactions take place.
The upper limit of the domain size will be estimated using the protein aconitase as an example. In contrast to many other large proteins, which fold so that segments distant along the polypeptide chain become neighbours in the three-dimensional structure and thus form domains, aconitase folds so that its domains are formed by sequentially adjacent segments. The three-dimensional structure of this protein, together with the segments of the polypeptide chain belonging to the individual domains, is shown in Fig. 4.13. We shall calculate ξ_h for aconitase as a function of the length of the polypeptide chain segment. This can be done by calculating partial values ξ_h(n) while consecutively adding amino acids (n = 1, 2, ..., N, where N = 754 is the total number of amino acids) according to the sequence and the three-dimensional structure. In this way we simulate the growth of the protein molecule beginning with the N-terminal amino acid. The results of these calculations are given in Fig. 4.12. Obviously, ξ_h(1) ≈ 1, because the chain then consists of a single amino acid. As n increases, i.e. as amino acids are added to the polypeptide chain, ξ_h(n) decreases steeply. The pattern of ξ_h(n) fits well to the distribution of ξ_h calculated for proteins with molecular weight less than 10 kD. This can be interpreted as a simulation of the development of the protein hydrophobic core. Thus, in the context of our model calculations, one can say that small proteins are characterised by underdeveloped hydrophobic cores.
Figure 4.13 Full-space view of aconitase. The individual domains (D1, D2, D3 and D4) are coloured according to the palette shown at the bottom of the figure, which also marks the sequence segments belonging to each domain.
The evolution of ξ_h(n) is characterised by three minima, which coincide with the segments of the polypeptide chain connecting the individual domains. Moreover, ξ_h(n) at the three minima is very close to the average value corresponding to an optimal hydrophobic core. Hence, each of the individual domains has an optimal hydrophobic core, typical of a single protein. The separations between successive minima correspond to sequence segments with molecular weights between 10 and 25 kD, that is, 70-80 and 200-220 amino acid residues, respectively. These are the limits within which hydrophobic cores are defined.
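Once the accessibilities are available, the growth simulation and the search for minima reduce to bookkeeping. The sketch below assumes two inputs that a surface program would have to provide: `sa_f_prefix[n-1]`, the folded-state hydrophobic accessible area of residues 1 to n computed with the rest of the chain ignored (for example with a point-counting scheme like that of Fig. 4.9), and the tabulated unfolded-state value of each residue, which simply accumulates along the sequence. The function names, the three-point test for a local minimum and the toy numbers in the test are hypothetical.

```python
from itertools import accumulate

def xi_profile(sa_f_prefix, sa_u_per_residue):
    """xi_h(n) = SA_f^h(1..n) / SA_u^h(1..n) for n = 1..N.

    sa_f_prefix[n-1]    : folded-state hydrophobic SA of residues 1..n (rest of chain ignored)
    sa_u_per_residue[k] : tabulated unfolded-state hydrophobic SA of residue k+1
    """
    if len(sa_f_prefix) != len(sa_u_per_residue):
        raise ValueError("inputs must have one entry per residue")
    sa_u_prefix = list(accumulate(sa_u_per_residue))   # SA_u^h is additive along the sequence
    return [f / u if u > 0 else float("nan")           # nan while no hydrophobic area yet
            for f, u in zip(sa_f_prefix, sa_u_prefix)]

def local_minima(profile):
    """Chain lengths n (1-based) at which xi_h(n) is lower than at n-1 and n+1."""
    return [i + 1 for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]
```

Real profiles are noisy at the level of single residues, so in practice one would smooth ξ_h(n) or keep only pronounced minima before reading off domain boundaries and segment lengths.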
4.4.5 Hydrophobic packing and packing defects
In the previous section we found a connection between the buried hydrophobic surface and the size of the domain. This does not, however, explain why domains have a limited size. In this section we will try to answer this question. Experiments with synthetic proteins show that these molecules are flexible and have the properties of a molten globule. A molten globule is a state of the protein molecule in which the secondary structural elements are present but the three-dimensional structure is not maintained. One of the causes of this effect is the reduced spatial complementarity of the side chains in the protein interior. This leads to a lower packing density of the internal side chains, and hence to weaker van der Waals interactions between the non-polar groups. We can relate this weakening to a reduction of the structural stability. Because the prevailing majority of the groups in the protein interior form the hydrophobic core, one speaks of hydrophobic packing. The question arises whether hydrophobic packing is also responsible for the limited size of the hydrophobic core, i.e. whether the packing density decreases as the molecular weight of the protein increases. Assessments based on the ratio of solvent accessible surface area to volume do not give unambiguous answers. Therefore we choose another way to analyse this problem.
The packing of the protein molecule can be evaluated from measurements and theoretical calculations of the protein partial specific volume, v°, which is the reciprocal of the molecular density, ρ = 1/v°. The partial specific volume is related to the molecular volume by
$$v^{\circ} = \frac{N_A V}{M},$$
where N_A is Avogadro's number, V is the molecular volume, and M is the molar mass of the molecule. The partial specific volume is measured in ml/g. If we use Å³ for the molecular volume and daltons for the molecular weight, the above relation becomes
$$v^{\circ} = 0.6023\,\frac{V}{W}, \tag{4.9}$$
where W is the molecular weight of the protein. The molecular volume can be estimated from the three-dimensional structure of the protein molecule as the sum of three terms:
$$V = V_a + V_v + V_c, \tag{4.10}$$
where V_a is the volume occupied by the protein atoms according to their van der Waals radii, V_v is the volume of voids (the space enclosed by the contact surface but not occupied by protein atoms), and V_c is the volume of the internal cavities. An internal cavity is defined as a space that is large enough to accommodate at least one water molecule and is isolated from the bulk, so that water molecules cannot enter or leave it without overlapping with the protein atoms. According to this definition, cavities too small to accommodate a water molecule are voids (see Fig. 4.14).
Figure 4.14 Cavities and voids in a protein molecule. V_v^0 is the void volume belonging to the individual amino acids when the rest of the protein molecule is ignored.
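The numerical factor in Eq. (4.9) is only a change of units: 1 Å³ = 10⁻²⁴ cm³ (= ml), and a molar mass in g/mol is numerically equal to the molecular weight in daltons, so N_A·V/M = 6.022×10²³ × 10⁻²⁴ × V/W ≈ 0.6022·V/W ml/g with the current value of N_A (the 0.6023 in Eq. (4.9) corresponds to the older value 6.023×10²³). A minimal sketch of Eqs. (4.9) and (4.10) follows; the function names are hypothetical, and in practice the three volume terms would come from a structure-based program.

```python
N_A = 6.02214076e23        # Avogadro's number, 1/mol
A3_TO_CM3 = 1e-24          # 1 A^3 = 1e-24 cm^3 (= ml)

def partial_specific_volume(volume_a3, weight_da):
    """v0 = N_A * V / M in ml/g (Eq. 4.9); V in A^3, M numerically equal to W in Da."""
    return N_A * volume_a3 * A3_TO_CM3 / weight_da

def molecular_volume(v_atoms, v_voids, v_cavities):
    """V = V_a + V_v + V_c (Eq. 4.10), all in A^3."""
    return v_atoms + v_voids + v_cavities

print(round(N_A * A3_TO_CM3, 4))   # factor in front of V/W: 0.6022

# inverse use: molecular volume of a 20 kD protein with v0 = 0.729 ml/g
w = 20000.0
v = 0.729 * w / (N_A * A3_TO_CM3)
rho = 1.0 / partial_specific_volume(v, w)   # molecular density, g/ml
print(round(v), round(rho, 3))
```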
For the purposes of our considerations, the approximations made in this model are satisfactory. It should be noted, however, that estimates of V based on geometrical considerations alone do not take into account a number of factors influencing its value. For instance, because water is approximated as a sphere, the reorganisation of the water molecules in the first hydration shell is omitted. Possible flexibility of the protein molecule, and the related changes of the void and cavity volumes, are not taken into account either. Nor can the temperature dependence of V, and hence of v° and ρ, be explored with this model. The only experimental input is the three-dimensional structure of the protein molecule; therefore, the calculated quantities best correspond to the experimental conditions under which the structure was obtained.
Figure 4.15 Molecular volumes (A) and void volumes (B) versus molecular weight (kD) of proteins.
The relation between molecular volume and molecular mass is linear (Fig. 4.15A), and according to Eq. (4.9) the slope is proportional to the partial specific volume. The value of v° obtained in this way is 0.729 ml/g, which is the average value measured for proteins. It follows that v° and ρ are independent of the size of the protein. Hence, packing and protein size seem to be unrelated.
According to Eq. (4.10), the molecular volume is the sum of three terms, which we shall explore separately. Obviously, V_a is not related to the packing density, because it is just the sum of the volumes of the van der Waals spheres. The void volume, however, amounts to 20-25% of the molecular volume and is related to the packing density. It consists of two parts. The first part, V_v^0, is not related to packing: it is the volume determined by the contact surface envelopes of the individual amino acid residues, including the backbone. In other words, V_v^0 is the sum of the void volumes of the individual amino acids, each calculated while ignoring the rest of the protein. The remaining volume, ΔV_v = V_v - V_v^0, is the space enclosed between the amino acids when they are packed in the protein molecule. As packing increases, i.e. as the number of atoms in a given volume increases, the void volume decreases. In terms of our considerations, this means that atoms from different side chains come closer to each other and the number of interatomic contacts increases, which increases the contribution of van der Waals interactions. Thus the quantity ΔV_v/N (N is the number of atoms in the protein molecule), the void volume per atom, is related to the stability of the protein molecule. The calculations show that ΔV_v depends linearly on the molecular weight, giving a constant void volume per atom, ΔV_v/N = 2.33 Å³ (Fig. 4.15B). It turns out that an increase in protein size is not accompanied by a change in packing. Each point in Fig. 4.15 is an average value for the whole protein. We know, however, that the protein interior is not homogeneous, so different regions of the protein molecule may differ in packing. The third term in Eq. (4.10), the cavity volume V_c, reflects this feature of proteins. By definition, cavities are internal spaces not occupied by protein atoms; in this context they can be considered packing defects. Cavities can be either unoccupied ("empty") or occupied by water. Empty cavities, like voids, do not contribute to stability, because van der Waals interactions are short-range. Burying a polar compound in a non-polar medium, such as water molecules in the hydrophobic core, is an energetically unfavourable process. If a cavity is lined by polar atoms, its occupation by water molecules can have a stabilising effect. However, experimental observations show that, as a rule, such cavities are few. Hence, in both cases cavities, or packing defects, can be considered energetically unfavourable formations in the protein interior.
Figure 4.16 Total cavity volume (A) and number of cavities (B) versus molecular weight (kD).
Figure 4.16 shows the cavity volume and the number of cavities in proteins of different molecular weight. From a geometrical point of view, the total cavity volume is negligibly small compared with the molecular volume. However, cavities are energetically significant and influence the stability of proteins. As seen from the figure, both the cavity volume and the number of cavities increase with molecular weight. In other words, increasing protein size is accompanied by the creation of energetically unfavourable packing defects. The destabilising effect of cavity formation is difficult to assess. One of the most reliable approaches for evaluating the energetics of creating an internal cavity is the combination of site-directed mutagenesis and stability measurements. The free energy change, ΔG_cavity, associated with the creation of a cavity large enough to accommodate one CH3 group amounts to about 1.2 kcal/mol15,16. Other experimental investigations17 suggest the following expression for the average value of ΔG_cavity:
$$\Delta G_{cavity} = 1.9 + 0.024\,\Delta V_c \quad \text{kcal/mol}, \tag{4.11}$$
where ΔV_c is the cavity volume in Å³.
Figure 4.17 Values of ΔG_cavity (kcal/mol) versus molecular weight (kD) according to Eq. (4.11). Dashed lines indicate the approximate maximum molecular weight of a single domain.
The destabilising effect of the internal cavities is illustrated in Fig. 4.17. An increase in molecular weight is accompanied by the formation of energetically unfavourable packing defects. For proteins with molecular weight close to the estimated maximum size of a domain, the energy of these packing defects is about 10 kcal/mol. This value is comparable to the total stability of proteins, so a further increase in the number of cavities could lead to unfolding of the protein. Hence, one can conclude that the main factor setting the upper limit for domain formation is the tendency to form energetically unfavourable packing defects.
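Eq. (4.11) is linear in the cavity volume, so estimating the destabilisation of a structure is a one-line sum once its cavities have been located, for example with a cavity-detection program such as SURFNET (Ref. 11). The sketch assumes that the contributions of the individual cavities are additive; the function names and the three cavity volumes in the example are hypothetical.

```python
def dg_cavity(volume_a3):
    """Destabilisation by one cavity, Eq. (4.11): 1.9 + 0.024 * dV_c kcal/mol, dV_c in A^3."""
    return 1.9 + 0.024 * volume_a3

def dg_cavities_total(volumes_a3):
    """Total destabilisation, assuming the contributions of the cavities simply add up."""
    return sum(dg_cavity(v) for v in volumes_a3)

cavities = [25.0, 40.0, 60.0]   # hypothetical cavity volumes (A^3) of one structure
print(f"{dg_cavities_total(cavities):.2f} kcal/mol")   # 8.70 kcal/mol
```

Note that the constant term of 1.9 kcal/mol is paid once per cavity, so under this assumption many small cavities cost more than one large cavity of the same total volume.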
References
1. Kauzmann W, (1959) Some factors in the interpretation of protein denaturation. Adv. Protein Chem., 14: 1-63.
2. Frank HS and Wen WY, (1957) Structural aspects of ion-solvent interaction in aqueous solutions: a suggested picture of water structure. Discuss. Faraday Soc., 24: 133-140.
3. Jeffrey GA and Saenger W, (1991) Hydrogen Bonding in Biological Structures. Berlin, Heidelberg: Springer-Verlag.
4. Frank HS and Evans MW, (1945) Free volume and entropy in condensed systems. III. Entropy in binary liquid mixtures; partial molal entropy in dilute solutions; structure and thermodynamics in aqueous electrolytes. J. Chem. Phys., 13: 507-532.
5. Nozaki Y and Tanford C, (1971) The solubility of amino acids and two glycine peptides in aqueous ethanol and dioxane solutions. J. Biol. Chem., 246: 2211-2217.
6. Fauchère JL and Pliska V, (1983) Hydrophobic parameters π of amino-acid side chains from the partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem., 18: 369-375.
7. Miller S, Janin J, Lesk AM and Chothia C, (1987) Interior and surface of monomeric proteins. J. Mol. Biol., 196: 641-656.
8. Lee B and Richards FM, (1971) The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol., 55: 379-400.
9. Chothia C, (1975) Structural invariants in protein folding. Nature, 254: 304-308.
10. Kuhn LA, Swanson CA, Pique ME, Tainer JA and Getzoff ED, (1995) Atomic and residual hydrophilicity in the context of folded protein structure. Proteins, 23: 536-547.
11. Laskowski RA, (1995) SURFNET: A program for visualizing molecular surface, cavities, and molecular interactions. J. Mol. Graph., 13: 323-330.
12. Rashin AA, Iofin M and Honig B, (1986) Internal cavities and buried waters in globular proteins. Biochemistry, 25: 3619-3625.
13. Lee KH, Xie D, Freire E and Amzel LM, (1994) Estimation of changes in side chain configurational entropy in binding and folding: general methods and application to helix formation. Proteins, 20: 68-84.
14. Spassov VZ, Karshikoff AD and Ladenstein R, (1995) The optimization of protein solvent interactions. Thermostability and the role of hydrophobic and electrostatic interactions. Protein Sci., 4: 1516-1527.
15. Steif C, Hinz H-J and Cesareni G, (1995) Effects of cavity-creating mutations on conformational stability and structure of the dimeric 4-α-helical protein ROP. Thermal unfolding studies. Proteins, 23: 83-96.
16. Dürr E and Jelesarov I, (2000) Thermodynamic analysis of cavity creating mutations in an engineered leucine zipper and energetics of glycerol-induced coiled coil stabilization. Biochemistry, 39: 4472-4482.
17. Eriksson AE, Baase WA, Zhang X-J, Heinz DW, Blaber M, Baldwin EP and Matthews BW, (1992) Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science, 255: 178-183.
Chapter 5
Electrostatic Interactions
The ubiquitous character of electrostatic interactions is manifested by the fact that they appear whenever charge separation takes place. We derived the expression for the dispersion forces assuming that electrically neutral atoms or molecules interact through the dipoles caused by displacements of electric charges [see Eq. (2.8)]. Likewise, the geometry of hydrogen bonds is governed by electrostatic interactions between the partial charges of the atoms involved. The important role of electrostatic interactions in proteins becomes evident in every pH-dependent property: pH regulation of enzyme activity, acid and alkaline denaturation, protein-substrate and protein-inhibitor interactions, and many others. A correct understanding of any of these phenomena depends on how well we can predict electrostatic interactions. Although the theory of electrostatic interactions is well developed, it faces difficulties when applied to proteins. One obstacle is the complexity of the protein molecule. Another difficulty arises from the fact that some charges of the protein's charge multipole are not known a priori. Their values depend on the protonation state of the corresponding functional groups, which in turn depends on electrostatic interactions. This problem will be approached in the next chapter. Here we focus on the first obstacle, that is, we try to develop an approach for the prediction and analysis of electrostatic interactions in proteins. Our first task is to find a way to calculate the electrostatic potential created by the protein charges at an arbitrary point. We shall
begin with the Debye-Hückel theory because it is the basis on which the understanding of electrostatic interactions in proteins is built.

5.1 Debye-Hückel Theory

Any ionic solution in equilibrium is electrically neutral. Because of this neutrality, each dissolved ion is surrounded by ions of opposite charge, hereafter referred to as counterions. Their distribution and the electrostatic potential around an ion in solution are the subject of the Debye-Hückel theory.

5.1.1 Poisson-Boltzmann equation

The basic assumptions of the Debye-Hückel theory are the following. First, everything around an ion (called below the central ion) is treated as a non-structured, continuum medium. Second, the central ion is a sphere with a continuous charge uniformly distributed on its surface, which we define as the surface charge $\sigma$. This assumption implies that the system is spherically symmetrical. Third, there is a certain charge density of mobile ions, $\rho$, around the central ion, which follows the Boltzmann distribution law. Fourth, the only interactions between the ions are electrostatic in nature.

In accordance with these assumptions one can build the model shown in Fig. 5.1. Let us arbitrarily choose an ion, the central ion, and use its centre as the origin of the coordinate system. It is immersed in the solvent, which we take to be water. The central ion is represented as a sphere with radius $R$ and dielectric constant $\varepsilon_i$. On the surface of the sphere there is a uniformly distributed charge $\sigma = q/(4\pi R^2)$. We call this part of space Zone I. Zone I is surrounded by a spherical shell that determines the minimum distance to which the mobile charges from the bulk can approach the central ion. This shell, defined as Zone II, reflects the fact that ions have a finite size. We denote its outer radius by $a$. The dielectric constant in Zone II is that of the bulk solvent, $\varepsilon_s$. The rest of space, Zone III, is the medium of the
solvent, which is characterised by the dielectric constant $\varepsilon_s$ and a certain charge density distribution $\rho(\mathbf{r})$.

Figure 5.1 Model and parameters of the Debye-Hückel theory.
The value of $\rho(\mathbf{r})$ is defined as the sum of the charges of all ions residing in a given volume element. Because $\rho(\mathbf{r})$ is assumed to be spherically symmetrical, we can work with $\rho(r)$, where $r$ is the distance between the central ion and the volume element. Also, due to the continuity assumption, $\rho(r)$ is a continuous function of $r$ in the whole region where it is defined, i.e. in Zone III. The total charge in Zone III is

$$\int_a^{\infty} 4\pi r^2 \rho(r)\,dr = -q,$$

which is an expression of electrical neutrality. The electrostatic potential, $\varphi$, in the different zones is given by the following expressions:

$$\nabla^2\varphi = 0 \tag{5.1}$$

for Zone I and Zone II, and

$$\nabla^2\varphi = -\frac{\rho(r)}{\varepsilon} \tag{5.2}$$

for Zone III. The above expressions are differential equations for the electrostatic potential, known as the Laplace equation and the Poisson equation, respectively.
The Laplace operator, $\nabla^2$, is commonly used because it makes differential equations shorter and more elegant to write (see also Appendix C). The permittivity of the medium can be expressed as $\varepsilon = \varepsilon_0\varepsilon_r$, where $\varepsilon_0 = 8.85\times10^{-12}$ F/m is the permittivity of vacuum. The quantity $\varepsilon_r$ is called the relative dielectric constant (see also Section 5.3.1). It is a dimensionless quantity relating the permittivity of a given material to that of vacuum:

$$\varepsilon_r = \frac{\varepsilon}{\varepsilon_0}.$$

For vacuum $\varepsilon_r = 1$, whereas for water at 25 °C $\varepsilon_r = 78.3$. In the literature, $\varepsilon_r$ is often given without specifying that it is the relative dielectric constant. Because it is more convenient to work with the relative dielectric constant, we write Eq. (5.2) as

$$\varepsilon_0\varepsilon_s\nabla^2\varphi = -\rho(r), \tag{5.3}$$

where we have replaced $\varepsilon_r$ with $\varepsilon_s$ to denote the relative dielectric constant of the solvent. To solve the Poisson equation, we have to find an expression for $\rho(r)$. According to the third assumption of the Debye-Hückel theory, the charge distribution around the central ion follows the Boltzmann distribution law. This means that the number of ions in a given volume element at a distance $r$ from the central ion will be

$$N_+ = N e^{-e_0\varphi(r)/kT} \tag{5.4a}$$

for positive charges and

$$N_- = N e^{e_0\varphi(r)/kT} \tag{5.4b}$$

for negative charges. In the above equations, $N$ is the number of ions of a given kind when the volume element is sufficiently far from the central ion (i.e. when $r \to \infty$). According to the assumption that the only interactions in the system are electrostatic, the energy change associated with moving a unit charge $e_0$ from infinity ($r \to \infty$) to a finite distance $r$ from the central ion is $E = e_0\varphi(r)$, where $\varphi(r)$ is the electrostatic potential at $r$. This potential consists
of two components. The first is the potential created by the central ion, while the second is the potential created by the mobile ions in its vicinity. Depending on the sign of $q$, the positive and negative mobile ions are repelled or attracted by the central ion, which results in different densities of the two species. This difference in density produces an electrostatic potential that is superposed on that of the central ion. Obviously, at $r \to \infty$, where the electric potential of the central ion tends to zero, there is no difference in the densities of the mobile ions, so that $\varphi(\infty) = 0$. Then $N_+ = N_- = N$, and the total number of ions is $2N$. The charge density at a given point can now be expressed as

$$\rho(r) = e_0N_+ - e_0N_- = e_0N\left(e^{-e_0\varphi(r)/kT} - e^{e_0\varphi(r)/kT}\right). \tag{5.5}$$

To simplify the above equation, we make use of the series expansion

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \dots,$$

which can be found in any mathematics handbook. Substituting the exponential terms in Eq. (5.5) with this series expansion, after simple arithmetic one obtains

$$\rho(r) = -2Ne_0\left[\frac{e_0\varphi(r)}{kT} + \frac{1}{3!}\left(\frac{e_0\varphi(r)}{kT}\right)^3 + \frac{1}{5!}\left(\frac{e_0\varphi(r)}{kT}\right)^5 + \dots\right]. \tag{5.6}$$

This expression for the charge density can be substituted in the Poisson equation, Eq. (5.3):

$$\nabla^2\varphi = \frac{2Ne_0}{\varepsilon_0\varepsilon_s}\left[\frac{e_0\varphi}{kT} + \frac{1}{3!}\left(\frac{e_0\varphi}{kT}\right)^3 + \frac{1}{5!}\left(\frac{e_0\varphi}{kT}\right)^5 + \dots\right].$$

Because $\rho(r)$ obeys the Boltzmann distribution law, the above equation is called the Poisson-Boltzmann equation. A shorter way of writing it is

$$\nabla^2\varphi = \frac{2Ne_0}{\varepsilon_0\varepsilon_s}\sinh\left(\frac{e_0\varphi}{kT}\right).$$

This, however, does not reduce its complexity.
To simplify the Poisson-Boltzmann equation, we analyse the properties of the series expansion in the square brackets. Let us calculate the first three terms separately and compare their values. In Fig. 5.2 the values of the first term ($e_0\varphi/kT$) are set on the abscissa. Using these values one can calculate the second and the third terms and plot them against the first term (right and left ordinates, respectively). It can be seen that for $e_0\varphi/kT < 1$ the higher order terms are very small. For instance, if the energy needed to put an ion at a certain distance from the central ion is comparable to the energy of thermal motion, $e_0\varphi \approx kT$, the second and third terms amount to only $1/3! \approx 0.17$ and $1/5! \approx 0.008$ of the first one. Neglecting the higher order terms, we obtain

$$\nabla^2\varphi = \frac{2Ne_0^2}{\varepsilon_0\varepsilon_s kT}\,\varphi. \tag{5.7}$$

This form of the Poisson-Boltzmann equation is called the linearised Poisson-Boltzmann equation.
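The argument behind Fig. 5.2 can be checked with a few lines of Python. The minimal sketch below (the function names are illustrative, not from the text) evaluates the first three terms of the bracket in Eq. (5.6), which is the series of $\sinh x$ with $x = e_0\varphi/kT$, and the relative error made by keeping only the linear term, as in Eq. (5.7).

```python
import math


def series_terms(x):
    """First three terms of sinh(x) = x + x**3/3! + x**5/5! + ...,
    with x = e0*phi/kT, the reduced electrostatic energy of a mobile ion."""
    return x, x**3 / math.factorial(3), x**5 / math.factorial(5)


def linearisation_error(x):
    """Relative error of replacing sinh(x) by its first term x (x != 0)."""
    return (math.sinh(x) - x) / math.sinh(x)


for x in (0.1, 0.5, 1.0, 2.0):
    t1, t2, t3 = series_terms(x)
    print(f"x = {x:3.1f}: terms = {t1:.3f}, {t2:.4f}, {t3:.6f}; "
          f"linearisation error = {linearisation_error(x):.1%}")
```

The error of the linear approximation is below 0.2% at $x = 0.1$ and about 4% at $x = 0.5$, but grows to roughly 15% at $x = 1$, which illustrates why the linearised form is restricted to $e_0\varphi/kT$ below about 1.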
Figure 5.2 Comparison of the terms in the series expansion on the right hand side of Eq. (5.6): $(e_0\varphi/kT)^3/3!$ and $(e_0\varphi/kT)^5/5!$ plotted against $e_0\varphi/kT$.
The condition under which the linearised Poisson-Boltzmann equation is valid requires the counterions to be in a weak electrostatic field. This can be
fulfilled if the distance between the ions is sufficiently large, which in turn corresponds to a low ion concentration or, as we shall see below, to a low ionic strength. The linearised Poisson-Boltzmann equation is the basic equation of the Debye-Hückel theory because it contains the information about the electrostatic potential and the charge distribution around an ion in solution. For simplicity, from now on we will speak of the Poisson-Boltzmann equation, implying its linearised form.

5.1.2 Parameter of Debye

Equation (5.7) is still unwieldy because it contains the quantity $N$, which is not specified. Therefore, we have to eliminate it. This is a straightforward task, taking into account the meaning of $N$. The molar concentration of the ions of kind $i$ is

$$C_i = \frac{N_i}{1000\,N_A},$$

with $N_i$ equal to the number of ions of kind $i$ per cubic metre and $N_A$ the Avogadro number. The factor 1000 takes into account that molar concentration is given per litre instead of per cubic metre, the standard unit of volume. The ionic strength is defined as

$$I = \frac{1}{2}\sum_i C_i z_i^2,$$

where $z_i$ is the charge number of ion $i$ (in units of the elementary charge). We combine these two equations taking into account that in the model used here the ions carry elementary charges $\pm1$, i.e. $z_i^2 = 1$. Also, we should realise that $2N = \sum_i N_i$. For the connection between $N$ and $I$ we obtain

$$I = \frac{N}{1000\,N_A} \quad \text{or} \quad N = 1000\,N_A I.$$

Substituting the obtained expression for $N$ in the multiplier of Eq. (5.7), we define

$$\kappa^2 = \frac{2Ne_0^2}{\varepsilon_0\varepsilon_s kT} = \frac{2000\,N_A e_0^2}{\varepsilon_0\varepsilon_s kT}\,I. \tag{5.8}$$
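The bookkeeping between concentrations, ionic strength and the number density $N$ can be sketched as follows. This is a minimal illustration assuming concentrations in mol/L; the function names are hypothetical, and `ions_per_m3` applies only to the 1:1 electrolyte assumed in the text.

```python
N_A = 6.02214076e23  # Avogadro number, 1/mol


def ionic_strength(concentrations, charges):
    """I = 1/2 * sum_i C_i * z_i**2, with C_i in mol/L and z_i the charge numbers."""
    return 0.5 * sum(c * z**2 for c, z in zip(concentrations, charges))


def ions_per_m3(I):
    """N = 1000 * N_A * I: number of ions of each kind per cubic metre
    in a 1:1 electrolyte of ionic strength I (mol/L)."""
    return 1000.0 * N_A * I


I_nacl = ionic_strength([0.1, 0.1], [+1, -1])    # 0.1 M NaCl
I_cacl2 = ionic_strength([0.1, 0.2], [+2, -1])   # 0.1 M CaCl2
print(I_nacl, I_cacl2, ions_per_m3(I_nacl))
```

For 0.1 M NaCl the ionic strength equals the molar concentration, whereas for a 2:1 salt such as CaCl2 at the same concentration the definition gives three times that value.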
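In this way we have obtained another, more elegant expression of the Poisson-Boltzmann equation:

$$\nabla^2\varphi = \kappa^2\varphi. \tag{5.9}$$

The quantity $\kappa$ is the parameter of Debye. It is interesting to see what its dimension is. For this purpose, one only needs to substitute the quantities in Eq. (5.8) with their dimensions. Before doing this, we manipulate the expression by multiplying the ratio $e_0^2/(\varepsilon_0\varepsilon_s)$ by $r/r$, where $r$ is an arbitrary distance:

$$\kappa = \left(\frac{2000\,N_A e_0^2}{\varepsilon_0\varepsilon_s kT}\,I\right)^{1/2} = \left(8\pi\cdot1000\cdot\frac{e_0^2}{4\pi\varepsilon_0\varepsilon_s r}\cdot r\cdot N_A\cdot I\cdot\frac{1}{kT}\right)^{1/2}.$$

The purpose of this manipulation is to obtain the familiar Coulomb's law and in this way simplify our work. Substituting the individual terms with their dimensions ($e_0^2/(4\pi\varepsilon_0\varepsilon_s r)$ in J, $r$ in m, $N_A$ in mol$^{-1}$, $1000\,I$ in mol m$^{-3}$ and $kT$ in J) we obtain

$$[\kappa] = \left(\mathrm{J}\cdot\mathrm{m}\cdot\frac{1}{\mathrm{mol}}\cdot\frac{\mathrm{mol}}{\mathrm{m}^3}\cdot\frac{1}{\mathrm{J}}\right)^{1/2} = \left(\frac{1}{\mathrm{m}^2}\right)^{1/2} = \mathrm{m}^{-1}.$$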
The Debye parameter has dimension m$^{-1}$. The metre is an inconveniently large unit for the objects to which the parameter is applied, so one uses the Angstrom instead, 1 Å = $10^{-10}$ m, or the nanometre, 1 nm = $10^{-9}$ m. The Angstrom will be used in all following considerations. The reciprocal of the Debye parameter appears in the literature as the Debye radius or screening parameter. It is a measure of the size of the counterion cloud screening the central ion. The higher the ionic strength, the smaller the Debye radius and the average distance between the central ion and a counterion. Here we can also see the connection between the ionic strength and the limits of the Debye-Hückel theory discussed after the derivation of Eq. (5.7). The Debye parameter is a characteristic of the ionic solution. It depends on temperature via $kT$ and $\varepsilon_s$. The temperature dependence of the relative dielectric constant of the solvent (water), $\varepsilon_s$, is plotted in Fig. 5.3. At a given
temperature, the Debye parameter depends only on the ionic strength. If we substitute the dielectric constant, $\varepsilon_s$, with its value for water at 25 °C and replace all quantities independent of the ionic strength with their values, we obtain the simple formula $\kappa = 0.33\sqrt{I}$ Å$^{-1}$. At 100 °C the relative dielectric constant of water is $\varepsilon_s \approx 55$, which gives for the Debye parameter $\kappa \approx 0.35\sqrt{I}$ Å$^{-1}$.

Figure 5.3 Temperature dependence of $\varepsilon_s$ of water. Note that the temperature is given in °C. To perform calculations with Eqs. (5.8) and (5.16) the temperature has to be converted to absolute temperature.
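Equation (5.8) can be evaluated directly from the SI values of the constants. The Python sketch below (the function name and the use of CODATA constant values are our choices) computes the prefactor of $\sqrt{I}$ and converts the result to Å$^{-1}$; the temperature must be given in kelvin, as noted in the caption of Fig. 5.3.

```python
import math

E0 = 1.602176634e-19      # elementary charge e0, C
EPS0 = 8.8541878128e-12   # vacuum permittivity eps0, F/m
K_B = 1.380649e-23        # Boltzmann constant k, J/K
N_A = 6.02214076e23       # Avogadro number, 1/mol


def debye_parameter(I, eps_s, T):
    """Debye parameter kappa of Eq. (5.8) in 1/Angstrom.

    I: ionic strength (mol/L); eps_s: relative dielectric constant of the
    solvent; T: absolute temperature (K)."""
    kappa_sq = 2000.0 * N_A * E0**2 * I / (EPS0 * eps_s * K_B * T)  # 1/m^2
    return math.sqrt(kappa_sq) * 1e-10  # convert 1/m to 1/Angstrom


k25 = debye_parameter(1.0, 78.3, 298.15)   # prefactor of sqrt(I) at 25 C
k100 = debye_parameter(1.0, 55.0, 373.15)  # prefactor of sqrt(I) at 100 C
print(f"kappa = {k25:.2f} sqrt(I) 1/A at 25 C and {k100:.2f} sqrt(I) 1/A at 100 C")
print(f"Debye radius at I = 0.1 M, 25 C: {1 / debye_parameter(0.1, 78.3, 298.15):.1f} A")
```

With $\varepsilon_s \approx 55$ at 100 °C the same function gives a prefactor of about 0.35. At 0.1 M ionic strength and 25 °C the Debye radius, $1/\kappa$, is close to 10 Å.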
5.1.3 The electrostatic potential of an ion in solution

The complete derivation of the expressions for the electrostatic potential of an ion in solution is given in Appendix C. Here we briefly discuss its properties in the different zones. The solutions of Eqs. (5.1) and (5.9) in the different zones (see Fig. 5.1) are

Zone I:
$$\varphi = \frac{q}{4\pi\varepsilon_0\varepsilon_s R}\left(1 - \frac{\kappa R}{1+\kappa a}\right), \tag{5.10a}$$
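Zone II:
$$\varphi = \frac{q}{4\pi\varepsilon_0\varepsilon_s r}\left(1 - \frac{\kappa r}{1+\kappa a}\right), \tag{5.10b}$$

Zone III:
$$\varphi = \frac{q}{4\pi\varepsilon_0\varepsilon_s r}\,\frac{e^{\kappa(a-r)}}{1+\kappa a}. \tag{5.10c}$$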
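Equations (5.10a) to (5.10c) are easy to code as one piecewise function, which also makes it simple to check that the potential is continuous at $r = R$ and $r = a$ and reduces to Coulomb's law for $\kappa = 0$. In this Python sketch the potential is returned in volts, with $q$ in units of $e_0$ and distances in Å; the default values of $R$, $a$ and $\kappa$ are arbitrary illustrative choices, not values from the text.

```python
import math

# Potential of one elementary charge at 1 Angstrom in vacuum, in volts
# (e0 / (4 pi eps0 * 1e-10 m), about 14.40 V).
K_VOLT = 1.602176634e-19 / (4 * math.pi * 8.8541878128e-12 * 1e-10)


def dh_potential(r, q=1.0, R=2.0, a=4.0, kappa=0.1, eps_s=78.3):
    """Debye-Hueckel potential (V) of a central ion, Eqs. (5.10a)-(5.10c).
    r, R, a in Angstrom; kappa in 1/Angstrom; q in units of e0."""
    pref = K_VOLT * q / eps_s
    if r <= R:   # Zone I: constant potential inside the ion, Eq. (5.10a)
        return pref / R * (1 - kappa * R / (1 + kappa * a))
    if r <= a:   # Zone II: ion-exclusion shell, Eq. (5.10b)
        return pref / r * (1 - kappa * r / (1 + kappa * a))
    # Zone III: region screened by the counterion cloud, Eq. (5.10c)
    return pref / r * math.exp(kappa * (a - r)) / (1 + kappa * a)


def coulomb(r, q=1.0, eps_s=78.3):
    """Unscreened potential q / (4 pi eps0 eps_s r) in V (r in Angstrom)."""
    return K_VOLT * q / (eps_s * r)


for r in (1.0, 2.0, 4.0, 8.0, 16.0):
    screened = 1000 * dh_potential(r)
    unscreened = 1000 * coulomb(max(r, 2.0))  # inside the ion use r = R
    print(f"r = {r:4.1f} A: DH {screened:6.1f} mV, Coulomb {unscreened:6.1f} mV")
```

Setting `kappa=0.0` reproduces the Coulomb values in every zone, which is the zero ionic strength limit discussed below.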
As seen, $\varepsilon_i$ does not appear in any of Eqs. (5.10), meaning that the electrostatic potential is independent of the dielectric constant inside the central ion. This is a result of the second assumption, namely that the charge is continuously and uniformly distributed on the surface of the spherical ion. We should note in advance that this assumption is too strong for proteins, where the charge distribution is far from uniform. As we shall see, this does not mean that the Debye-Hückel theory is irrelevant for the analysis of electrostatic interactions in proteins.
Figure 5.4 The potential, $\varphi$ (mV), as a function of the distance $r$ (Å).
At zero ionic strength the Debye parameter is $\kappa = 0$, and Eqs. (5.10) take the form of Coulomb's law. This is illustrated graphically in Fig. 5.4. Equation (5.10a) gives a constant value for the potential inside
the central ion, equal to the electrostatic potential created at a distance $R$ by a point charge located at the centre of the ion. When $I > 0$, the potential in Zone I is reduced by subtracting from the Coulomb term the fraction

$$\frac{q}{4\pi\varepsilon_0\varepsilon_s R}\,\frac{\kappa R}{1+\kappa a},$$

which reflects the influence of the counterions via the Debye parameter. The larger the ionic strength, and hence $\kappa$, the larger the subtracted fraction. Equations (5.10b) and (5.10c) are read in the same way. At $I = 0$, the two equations are identical and equivalent to Coulomb's law. At non-zero ionic strength, the electrostatic potential is reduced due to the screening effect of the counterions, whose measure is the Debye parameter. In this case the two expressions differ, as the electrostatic potential in Zone II is reduced more strongly with $r$ than that in Zone III.

5.1.4 Extension for proteins

We can directly employ the ideas of the Debye-Hückel model for proteins. For instance, the protein molecule can be represented as a sphere with a low dielectric constant immersed in the solvent, the latter being characterised by a high dielectric constant. In this way a dielectric cavity is formed, which gives the model its name: the dielectric cavity model. The charges of the protein can be considered as uniformly distributed on the surface of this sphere. Such a model was first proposed by Linderstrøm-Lang¹ for the calculation of protein titration curves (for titration curves see the next chapter). An extended theory of the spherical dielectric cavity model was proposed by Kirkwood² and later adapted for proteins by Tanford and Kirkwood³. The fundamental difference between this and the Debye-Hückel model is that the charges on the surface of the dielectric cavity are represented as point charges with fixed spatial coordinates. The solution for the electrostatic potential of such a system is based on the same ideas as in the Debye-Hückel theory, presented in more detail in Appendix C; however, the lack of spherical symmetry makes it
more complicated. We will skip this model because nowadays it has limited practical use. Instead, in Section 5.3 we shall consider a more general model that assumes a dielectric cavity of arbitrary shape.

5.2 Ion-Solvent Interactions

It was shown in the previous chapter that the non-polar constituents of the amino acid side chains tend to minimise contact with the solvent (water), thus forming a hydrophobic core. In this way polar and charged groups remain preferentially on the surface of the protein molecule. It is interesting to see whether the expulsion of polar and charged groups from the protein interior depends on the hydrophobic effect alone or whether other forces also favour this process. Consider NaCl, a compound whose covalent bond has a partial ionic character of about 75% (see also Fig. 3.2). In water NaCl spontaneously dissociates into Na⁺ and Cl⁻. The electrostatic field created by each of the ions causes reorientation of the surrounding water molecules. Their permanent dipoles align with the field of the ion, thus forming an organised structure that tends to compensate its electrostatic field. This organised structure is called the hydration shell. The formation of the hydration shell is energetically favourable and drives the dissociation of the salt molecules. In the continuum model used in the previous section, the compensation of the electrostatic field by the solvent molecules is accounted for by the dielectric constant, $\varepsilon_s$. Solvents with different dielectric constants have different compensatory effects. In the following discussion we continue to use this model, thus representing the properties of the solvent by its dielectric constant.

5.2.1 Born model

We are interested in finding an expression that allows evaluation of the ion-solvent interactions. For this purpose we will use the Born model. This model is based on the assumptions of the Debye-Hückel theory; hence it is a continuum model.
Figure 5.5 Thermodynamic cycle for the calculation of the transfer of a charged sphere from vacuum to a solvent (uncharged, $q = 0$, and charged sphere in vacuum and in the solvent, connected by the charging steps, $\Delta G_{solv}$ and $\Delta G_{mass}$).
The dissociation of a salt molecule, such as NaCl, can be considered as a process of creation of charges. Where these charges come from is not important here, so the description of their appearance is a matter of convenience. In the Born model it is assumed that the charges are transferred from vacuum to the solvent. The energy of this transfer is then the ion-solvent interaction energy. It can be obtained by means of the thermodynamic cycle shown in Fig. 5.5. The processes involved in this cycle are the following. The anti-clockwise direction begins with charging a sphere in vacuum, after which the charged sphere (the ion) is transferred from vacuum to the solvent. The clockwise direction describes the transfer of the uncharged sphere from vacuum to the solvent and the subsequent charging of the sphere in the solvent. The two sides of the cycle are by definition equivalent (see Appendix A and Fig. A.1):

$$W^0 + \Delta G_{solv} = \Delta G_{mass} + W,$$

where $\Delta G_{solv}$ is the ion-solvent interaction energy or the solvation energy, $\Delta G_{mass}$ is the energy of transfer of the uncharged sphere, whereas $W^0$ and $W$ are the work needed to charge the sphere in vacuum and in the solvent, respectively. All quantities have the meaning of free
energies. Provided that only electrostatic interactions are relevant, $\Delta G_{mass}$ can be set to zero (which is an approximation). For $\Delta G_{solv}$ one obtains:

$$\Delta G_{solv} = W - W^0. \tag{5.11}$$

The quantities $W^0$ and $W$ can be obtained by charging the sphere from zero to its final charge $q$:

$$W = \int_0^q \varphi(q')\,dq',$$

where the electrostatic potential $\varphi$ is given by Eq. (5.10a). Because we want to extract the ion-solvent interactions only, we set the ionic strength to zero. After performing the integration for $W^0$ (with $\varepsilon_s = 1$) and for $W$ one obtains

$$W^0 = \int_0^q \frac{q'}{4\pi\varepsilon_0 R}\,dq' = \frac{1}{2}\,\frac{q^2}{4\pi\varepsilon_0 R} \tag{5.12a}$$

and

$$W = \int_0^q \frac{q'}{4\pi\varepsilon_0\varepsilon_s R}\,dq' = \frac{1}{2}\,\frac{q^2}{4\pi\varepsilon_0\varepsilon_s R}, \tag{5.12b}$$

respectively. The quantities $W^0$ and $W$ are called self energies. As seen, the self energy is always positive and increases as the radius $R$ decreases. For a point charge it becomes infinitely large. Substituting the expressions for the self energies in Eq. (5.11) and performing simple arithmetic, one obtains for the solvation energy

$$\Delta G_{solv} = -\frac{q^2}{8\pi\varepsilon_0 R}\left(1 - \frac{1}{\varepsilon_s}\right). \tag{5.13}$$

This is the Born formula for the interaction energy between the solvent and a single ion. It is important to note the minus sign on the right hand side of Eq. (5.13). All quantities on this side of the equation, including $(1 - 1/\varepsilon_s)$, are positive, meaning that $\Delta G_{solv} < 0$: the ion-solvent interactions favour the solvation of ions. We are interested in the solvation energy of a single ion, which is given by Eq. (5.13). However, the solvation energy is determined experimentally from measurements of the solvation enthalpy of salts. If the salt is NaCl, an equal number of positive and
negative ions are added to the solution, thus maintaining its electrical neutrality. The experimental record will then contain information on two species, which is not modelled by Eq. (5.13). To approach this problem, one can measure the solvation enthalpies of different salts that have a common ion (NaCl and KCl, for example). The observed difference in the solvation enthalpies is then attributed to the different ions (Na⁺ and K⁺ in this example). As reference values of the solvation enthalpies, one can use the solvation data for salts whose ions share the solvation enthalpy equally. According to Eq. (5.16), ions with equal radii (and equal magnitude of charge) have equal solvation enthalpies. This is the case for KF, for which X-ray structural analysis suggests equal radii of the K⁺ and F⁻ ions. The connection between the Born model, Eq. (5.13), and the experimentally measured solvation enthalpy is straightforward. The fundamental thermodynamic equation (see Appendix A)

$$\Delta H_{solv} = \Delta G_{solv} + T\Delta S_{solv} \tag{5.14}$$
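connects the measured $\Delta H_{solv}$ and the wanted quantity $\Delta G_{solv}$. To obtain this connection we just need to rearrange Eq. (5.14). The entropy of solvation is obtained from the relation

$$\Delta S_{solv} = -\left(\frac{\partial\Delta G_{solv}}{\partial T}\right)_p.$$

In Eq. (5.13) only $\varepsilon_s$ depends on $T$. Thus

$$\Delta S_{solv} = \frac{q^2}{8\pi\varepsilon_0 R\,\varepsilon_s^2}\,\frac{\partial\varepsilon_s}{\partial T}. \tag{5.15}$$

Because $\partial\varepsilon_s/\partial T < 0$ for water, $\Delta S_{solv}$ is negative. The temperature dependence of the relative dielectric constant of water is illustrated in Fig. 5.3. As a first approximation it can be assumed to be linear, so that $\partial\varepsilon_s/\partial T$ becomes a constant, independent of $T$. Substituting Eqs. (5.13) and (5.15) in Eq. (5.14) one finally obtains

$$\Delta H_{solv} = -\frac{q^2}{8\pi\varepsilon_0 R}\left(1 - \frac{1}{\varepsilon_s} - \frac{T}{\varepsilon_s^2}\,\frac{\partial\varepsilon_s}{\partial T}\right). \tag{5.16}$$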
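The Born expressions (5.13), (5.15) and (5.16) can be evaluated numerically as in the following Python sketch. Energies are converted to kcal/mol; the radius $R = 2$ Å and the slope $\partial\varepsilon_s/\partial T \approx -0.36$ K$^{-1}$ for water near 25 °C are assumed illustrative values, not data from the text.

```python
import math

E0 = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
N_A = 6.02214076e23       # Avogadro number, 1/mol
J_PER_KCAL = 4184.0       # thermochemical calorie


def born_prefactor(q, R):
    """q^2 / (8 pi eps0 R) in kcal/mol; q in units of e0, R in Angstrom."""
    return (q * E0) ** 2 / (8 * math.pi * EPS0 * R * 1e-10) * N_A / J_PER_KCAL


def born_dG(q, R, eps_s):
    """Eq. (5.13): solvation free energy (kcal/mol), i.e. W - W0 of Eq. (5.11)."""
    return -born_prefactor(q, R) * (1 - 1 / eps_s)


def born_dS(q, R, eps_s, deps_dT):
    """Eq. (5.15): solvation entropy in kcal/(mol K)."""
    return born_prefactor(q, R) / eps_s**2 * deps_dT


def born_dH(q, R, eps_s, deps_dT, T):
    """Eq. (5.16), equivalent to dG + T*dS from Eq. (5.14)."""
    return -born_prefactor(q, R) * (1 - 1 / eps_s - T / eps_s**2 * deps_dT)


q, R, eps_s, deps_dT, T = 1.0, 2.0, 78.3, -0.36, 298.15
print(f"dG = {born_dG(q, R, eps_s):.1f} kcal/mol")
print(f"dH = {born_dH(q, R, eps_s, deps_dT, T):.1f} kcal/mol")
print(f"T*dS = {T * born_dS(q, R, eps_s, deps_dT):.2f} kcal/mol")
```

With these assumed values $\Delta H_{solv}$ comes out about 1.5 kcal/mol more negative than $\Delta G_{solv}$, because the negative $\partial\varepsilon_s/\partial T$ makes $\Delta S_{solv}$ negative.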
As a rule, the values calculated with Eq. (5.16) are somewhat larger (in absolute value) than the experimental ones. One can point out a few factors responsible for this discrepancy. First, the value of $R$ can be too small, i.e. the radii of the ions observed in ionic crystals are smaller than those of the dissolved ions. Second, $\Delta G_{mass}$ can differ from zero, contrary to the assumption of the model. Third, the dielectric constant around the ion can be lower than that in the bulk due to non-linear effects. There are approaches that account to a certain extent for these effects, but they are beyond the scope of our considerations.

5.2.2 Application of the Born model for proteins: why do charges tend to be on the protein surface?

The outcome of the Born model answers the question posed at the beginning of this section. The expulsion of charged groups from the protein interior is an energetically favourable process. We shall investigate this process by analysing the burial of a charged group in the protein interior upon folding. To employ the Born theory directly, we approximate the charged group by a sphere with radius $R$ and charge evenly distributed on its surface. Let us make use of the model used for evaluating the energetics of hydrogen bonds between the peptide groups in proteins (Section 3.4.1). Following the logic of this model, we assume that in the unfolded state the charged groups are fully hydrated. This means that the charged sphere is entirely immersed in the solvent with dielectric constant $\varepsilon_s$. Clearly, such an assumption ignores the fact that even in a fully hydrated polypeptide chain the charged groups are not surrounded by a homogeneous medium, as required by the Born theory. We will discuss this matter later in this section and in the next chapter. Here we adhere to this model to benefit from its simplicity. Assume that upon folding the charged groups become buried in the protein interior, which has a dielectric constant $\varepsilon_p < \varepsilon_s$. We approximate this process as the transfer of a charged sphere from a medium with dielectric constant $\varepsilon_s$ to a medium with dielectric constant $\varepsilon_p$. The energy of this transfer is obtained from Eq. (5.11):
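$$\Delta G_{Born} = W_p - W_s,$$

where $W_p = \dfrac{1}{2}\,\dfrac{q^2}{4\pi\varepsilon_0\varepsilon_p R}$ and $W_s = \dfrac{1}{2}\,\dfrac{q^2}{4\pi\varepsilon_0\varepsilon_s R}$ are the self energies of the charged group in the protein interior and in the solvent, respectively. The energy $\Delta G_{Born}$ is then

$$\Delta G_{Born} = \frac{q^2}{8\pi\varepsilon_0 R}\left(\frac{1}{\varepsilon_p} - \frac{1}{\varepsilon_s}\right). \tag{5.17}$$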
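To get a feeling for the magnitude of Eq. (5.17), the Python sketch below evaluates the Born energy of a monovalent group for a few values of $\varepsilon_p$. The radius of 2 Å and the range of $\varepsilon_p$ are illustrative assumptions; the prefactor $e_0^2/(4\pi\varepsilon_0\cdot1\,\text{Å})$ is computed from SI constants and comes to about 332 kcal/mol.

```python
import math

# e0^2 / (4 pi eps0 * 1 Angstrom) in kcal/mol, from SI constants (about 332.06)
COULOMB_KCAL = (1.602176634e-19**2 / (4 * math.pi * 8.8541878128e-12 * 1e-10)
                * 6.02214076e23 / 4184.0)


def born_energy(q, R, eps_p, eps_s=78.3):
    """Eq. (5.17): Delta G_Born = q^2/(8 pi eps0 R) (1/eps_p - 1/eps_s) in kcal/mol.
    q in units of e0, R in Angstrom."""
    return 0.5 * COULOMB_KCAL * q**2 / R * (1 / eps_p - 1 / eps_s)


for eps_p in (2.0, 4.0, 10.0, 20.0):
    print(f"eps_p = {eps_p:4.1f}: Delta G_Born = {born_energy(1.0, 2.0, eps_p):5.1f} kcal/mol")
```

For $\varepsilon_p = 4$ the cost is about 20 kcal/mol, many times the thermal energy $kT \approx 0.6$ kcal/mol at room temperature.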
Because $(1/\varepsilon_p - 1/\varepsilon_s) > 0$, $\Delta G_{Born} > 0$. It follows that the burial of a charged group in the protein interior is energetically unfavourable. For simplicity, the self energies were derived for $I = 0$. It can easily be shown that the above result holds for $I \neq 0$ as well. The quantity $\Delta G_{Born}$ is often called the desolvation energy, although it would be better to call it the dehydration energy insofar as the solvent is water. It is also called the Born energy, a name which we shall use from now on. Also, because $\Delta G_{Born}$ is always positive and therefore appears to be a destabilising factor, one speaks of the desolvation penalty. The above model can be further developed by representing the protein molecule as a sphere with a low dielectric constant. We have already mentioned this representation in Section 5.1.4 when we introduced the cavity model of Tanford and Kirkwood. This model is worth mentioning here, too, because it directly demonstrates that burying a charge in the protein interior increases the electrostatic energy, i.e. the burial of a charge is energetically very costly. This theoretical observation received little attention because charge-charge
interaction calculations were a satisfactory basis for the interpretation of the experimental data available at the time. In several fundamental works⁴⁻⁶, Warshel et al. drew attention again to the phenomenon of desolvation by showing that the energy cost of moving a charged sphere into the protein interior, represented as a spherical dielectric cavity, is about 35 kcal/mol. Warshel et al. stressed that the high energy cost of bringing a charge into the protein is more significant for the correct evaluation of electrostatic interactions in proteins than the charge-charge interactions between the ionisable groups. In the following discussions (see Chapter 6) we will have the opportunity to convince ourselves of the validity of this statement.

Summarising the above considerations, it becomes clear that due to the positive Born energy, atoms or groups with a non-zero net charge prefer to be on the surface of the protein molecule, thereby reducing the desolvation penalty. In the context of the question posed at the beginning of this section we can conclude the following: the desolvation of charges upon folding is unfavourable, i.e. it is a factor destabilising folded proteins. However, we see that the tendency of the non-polar constituents to form a hydrophobic core, driven by the hydrophobic effect, runs parallel to the tendency to minimise the desolvation penalty by exposing charged groups to the solvent.

5.2.3 Generalised Born theory for proteins

Let us consider a system of charges as shown in Fig. 5.6. The energy needed to place a charge $q_j$ at position $\mathbf{r}$ in the electrostatic field of the charge constellation ($q_1, q_2, \dots, q_i, \dots$) is $W_j(\mathbf{r}) = q_j\varphi(\mathbf{r})$, where $\varphi(\mathbf{r})$ is the sum of the electrostatic potentials of the individual charges $q_i$ in the system. $W_j(\mathbf{r})$ can then be written as

$$W_j(\mathbf{r}) = q_j\sum_i \varphi_i(\mathbf{r}).$$
The total electrostatic energy of the system is the sum of the energies needed to put all charges in their positions:

$$W = \frac{1}{2}\sum_j W_j(\mathbf{r}_j) = \frac{1}{2}\sum_j q_j\sum_i \varphi_i(\mathbf{r}_j). \tag{5.18}$$

In the above summation the index pairs $i,j$ and $j,i$ are equivalent, referring to the same interaction between charges $q_i$ and $q_j$. Multiplying by 1/2 corrects for the double counting of these interactions. The electrostatic potential $\varphi_i(\mathbf{r}_j)$ is created by charge $q_i$ at the point with space coordinates $\mathbf{r}_j$ (where charge $q_j$ resides).
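The role of the factor 1/2 in Eq. (5.18) can be verified numerically. The Python sketch below assumes point charges in a homogeneous dielectric (an illustrative simplification); because the self energy of a point charge is infinite, it keeps only the $i \neq j$ terms of the inner sum and compares the result with a sum taken once over unordered pairs. The function names are hypothetical.

```python
import math
from itertools import combinations

# e0^2 / (4 pi eps0 * 1 Angstrom) in kcal/mol (about 332.06)
COULOMB_KCAL = (1.602176634e-19**2 / (4 * math.pi * 8.8541878128e-12 * 1e-10)
                * 6.02214076e23 / 4184.0)


def potential_at(j, charges, positions, eps):
    """Potential at r_j created by all charges q_i with i != j, in kcal/(mol e0)."""
    return sum(COULOMB_KCAL * charges[i] / (eps * math.dist(positions[i], positions[j]))
               for i in range(len(charges)) if i != j)


def energy_eq_5_18(charges, positions, eps):
    """W = 1/2 sum_j q_j phi(r_j); the factor 1/2 removes the double counting."""
    return 0.5 * sum(q * potential_at(j, charges, positions, eps)
                     for j, q in enumerate(charges))


def energy_pairs(charges, positions, eps):
    """The same interaction energy, summing each pair i < j only once."""
    return sum(COULOMB_KCAL * charges[i] * charges[j]
               / (eps * math.dist(positions[i], positions[j]))
               for i, j in combinations(range(len(charges)), 2))


charges = [1.0, -1.0, 1.0]
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(energy_eq_5_18(charges, positions, 4.0), energy_pairs(charges, positions, 4.0))
```

Both functions return the same value; without the factor 1/2, `energy_eq_5_18` would count every pair twice.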
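Equation (5.18) can also be written as

$$W = \frac{1}{2}\sum_j q_j\varphi_j(\mathbf{r}_j) + \frac{1}{2}\sum_j q_j{\sum_i}'\varphi_i(\mathbf{r}_j). \tag{5.19}$$

The first sum in the above equation collects the self energies of the charges in the system. Indeed, if we rewrite Eqs. (5.12) for the general case of $I \neq 0$, we obtain

$$W = \int_0^q \frac{q'}{4\pi\varepsilon_0\varepsilon_s R}\left(1 - \frac{\kappa R}{1+\kappa a}\right)dq' = \frac{1}{2}\,\frac{q^2}{4\pi\varepsilon_0\varepsilon_s R}\left(1 - \frac{\kappa R}{1+\kappa a}\right).$$

Comparing this equation with Eq. (5.10a) we can write it in a more general and simple way:

$$W = \frac{1}{2}\,q\,\varphi_I,$$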
where $\varphi_I$ is given by Eq. (5.10a). The potential $\varphi_I$ is independent of the space coordinates. The second sum in Eq. (5.19) contains only the terms with different indices, $i \neq j$, which is denoted by $\sum'$. The potential $\varphi_i(\mathbf{r}_j)$ is calculated by Eq. (5.10c). We can use Eq. (5.19) to calculate electrostatic interactions in a protein molecule. For this purpose we need to know the values of the charges and their positions in space. Suppose that this information is available. The electrostatic interaction energy is then given by

$$\Delta G_{el} = \frac{1}{2}\sum_j q_j\left[\varphi_j^{p}(\mathbf{r}_j) - \varphi_j^{s}\right] + \frac{1}{2}\sum_j q_j{\sum_i}'\varphi_i(\mathbf{r}_j). \tag{5.20}$$

The first term on the right hand side of the above expression is the sum of the desolvation energies of the individual charges, $\varphi_j^p$ and $\varphi_j^s$ being the self potentials of charge $q_j$ in the protein and in the solvent, respectively. When these potentials are taken from Eq. (5.10a), this term does not depend on the space coordinates of the charges. The potential $\varphi_j^p(\mathbf{r}_j)$ can, however, differ from charge to charge, because the charges can be in different dielectric environments in the protein molecule. These environments are determined by the configuration of the protein atoms that form the dielectric cavity, and in this sense $\varphi_j^p$ depends on the position of atom $j$ in the protein molecule. If we knew how to calculate the electrostatic potential $\varphi_j^p(\mathbf{r}_j)$ for an arbitrary arrangement of the protein atoms, we would be able to predict the exact desolvation energy. For the moment we have at our disposal only Eq. (5.10a), which is valid for a homogeneous dielectric environment. We will introduce a modification of the Born model that enables us to apply it to complicated systems, such as proteins. A method that takes into account a dielectric cavity of arbitrary shape will be considered in Section 5.3. Before starting with the adaptation of the Born model to proteins, a few words should be said about the second sum of Eq. (5.20). It gives the electrostatic interaction energy between the charges in the protein molecule only, assuming that in the solvent they do not interact. Thus, Eq. (5.20) expresses the work needed to bring the charges from infinity to their positions in the protein molecule. As we shall see in the next chapter, this formulation is not appropriate for all charges. If we are interested in the energetics of a group with zero net charge, for instance a carboxyl group in its protonated state, the electrostatic interactions between the partial charges within the group
should be taken into account in both states: the group in solvent and the group on its site in the protein molecule. In this way we take into account the influence of the dielectric medium on the charge separation that takes place in the group of interest.
Figure 5.7 Charged spheres in a dielectric cavity of arbitrary shape representing the protein molecule. The ion exclusion radii are also shown (see Fig. 5.1). The arrows indicate that the charged spheres are brought from infinity to their positions in the protein.
As we have pointed out, the electrostatic potential $\varphi_j$ depends on the environment of the individual charge. This is a serious obstacle to applying Eqs. (5.10). To overcome this difficulty one can use a method known as the generalised Born model, which was first developed to improve estimates of the solvation free energy of small molecules⁷. Let us consider a protein molecule with embedded charged spheres (Fig. 5.7), each characterised by a radius $R_j$ and a charge $q_j$. To obtain the electrostatic energy we can rewrite Eq. (5.17), substituting the potential
$$\Delta G = \frac{1}{8\pi\varepsilon_0}\sum_{i}\sum_{j}\frac{q_i q_j}{f_{GB}(r_{ij})}\left(\frac{1}{\varepsilon_s}-1\right)$$
Introduction to Non-covalent Interactions in Proteins
where we have set $I = 0$. The essential modification in the above equation is the introduction of the function $f_{GB}(r_{ij})$. This function must be such that for $i = j$ it reduces to $R_j$, whereas for $i \neq j$ it becomes the distance between the interacting charges. The introduction of $f_{GB}$ changes the meaning of the radii $R_j$: they have to be treated here as effective radii reflecting the dielectric environment of each individual charge. In contrast to the Debye-Hückel and Born theories, one and the same atom species can here have different radii if its dielectric environment differs. These radii are called effective Born radii, and we designate them $R_j^*$. The most common form of the function $f_{GB}$ is

$$f_{GB}(r_{ij}) = \sqrt{r_{ij}^2 + R_i^* R_j^* \exp\!\left(-\frac{r_{ij}^2}{4R_i^* R_j^*}\right)}$$
At long distances, where $r_{ij} \gg R_i^*, R_j^*$, we have $f_{GB}(r_{ij}) \approx r_{ij}$ and the electrostatic interactions are given by Coulomb's law. It is easy to see that the other extreme case, $r_{ij} = 0$, corresponds to the desolvation energy, because then $i = j$, $R_i^* = R_j^*$, and $f_{GB}(0) = R_j^*$. Between these two extremes, the estimates of the effective Born radii are essential. Onufriev et al. have derived an expression for $R_j^*$:

$$\frac{1}{R_j^*} = \frac{1}{R_j} - \frac{1}{4\pi}\int_{\text{solute},\; r>R_j}\frac{d^3r}{r^4}$$
where $R_j$ is the van der Waals radius of atom $j$, $r$ is measured from the centre of atom $j$, and the integration is over the solute volume excluding the volume of atom $j$. If the solute consists of a single atom, the integral vanishes because there is no solute volume with $r > R_j$; in this case $R_j^* = R_j$. In the generalised Born model, the ionic strength is taken into account by the substitution

$$\frac{1}{\varepsilon_s} \rightarrow \frac{e^{-\kappa f_{GB}(r_{ij})}}{\varepsilon_s}$$
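To make the formulas above concrete, here is a minimal Python sketch of the generalised Born energy with the form of $f_{GB}$ given above and the salt substitution. It assumes charges in units of e, distances and effective Born radii in Å, and energies in kcal/mol (332.0637 is $e^2/(4\pi\varepsilon_0 \cdot 1\,\text{Å})$ in these units). The effective Born radii $R^*$ are taken as inputs rather than computed from the integral for $R_j^*$, and the function names `f_gb` and `gb_energy` are hypothetical.

```python
import math

# e^2 / (4*pi*eps0 * 1 Angstrom) expressed in kcal/mol
COULOMB_KCAL = 332.0637


def f_gb(r_ij, ri_eff, rj_eff):
    """f_GB(r_ij) = sqrt(r_ij^2 + R_i* R_j* exp(-r_ij^2 / (4 R_i* R_j*)))."""
    rr = ri_eff * rj_eff
    return math.sqrt(r_ij ** 2 + rr * math.exp(-r_ij ** 2 / (4.0 * rr)))


def gb_energy(charges, coords, born_radii, eps_s=78.5, kappa=0.0):
    """Generalised Born energy in kcal/mol.

    The double sum runs over all i and j, so the i == j terms give the
    Born-type self (desolvation) contributions. With kappa > 0 (in 1/Angstrom)
    the factor 1/eps_s is replaced by exp(-kappa * f_GB) / eps_s.
    """
    total = 0.0
    for qi, ri, Ri in zip(charges, coords, born_radii):
        for qj, rj, Rj in zip(charges, coords, born_radii):
            f = f_gb(math.dist(ri, rj), Ri, Rj)
            total += qi * qj / f * (math.exp(-kappa * f) / eps_s - 1.0)
    # prefactor 1/(8*pi*eps0) = (1/2) * 1/(4*pi*eps0)
    return 0.5 * COULOMB_KCAL * total


# A single ion should recover the Born expression -(332.0637 / (2 R)) (1 - 1/eps_s)
print(gb_energy([1.0], [(0.0, 0.0, 0.0)], [2.0]))
```

For a single ion the call reproduces the Born solvation energy, which is a convenient sanity check; in a real application the $R^*$ values would come from the integral above or from an approximation to it.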
One of the main advantages of the generalised Born model is its computational simplicity. We are not going to analyse its advantages and limitations further, since this model is still being developed and continuously improved. Instead, in the next section we will consider an approach that takes into account the actual shape of the dielectric cavity and uses a minimum number of parameters.

5.3 Calculation of Electrostatic Interactions in Proteins

Until now, for the calculations of electrostatic interactions we have used the analytical solution of the potential,
Figure 5.8 Reaction of a non-polar molecule (A) and a polar molecule with a permanent dipole moment $\mu_{\text{perm}}$ (B) in an external electrostatic field. $\langle\mu_E\rangle$ is the average component of the permanent dipole along the external electrostatic field $\mathbf{E}$. Panel C illustrates the reorientation of the dipoles of a dielectric material, producing an electrostatic field $\mathbf{E}_r$ opposite to the external one.
The relative dielectric constant is a measure of the reduction of the electrostatic field caused by the reaction of the dielectric material, and can be defined as:

$$\varepsilon_r = \frac{|\mathbf{E}_{\text{vacuum}}|}{|\mathbf{E}_{\text{dielectric}}|}$$
where $\mathbf{E}_{\text{dielectric}} = \mathbf{E} - \mathbf{E}_r$. If the material consists of non-polar molecules (Fig. 5.8A), its reaction to an external electrostatic field consists only of the formation of induced dipoles (electronic and atomic polarisation). Electronic polarisation is the shift of the electron clouds caused by the field acting on the molecule, whereas atomic polarisation results from the displacement of the atomic nuclei relative to one another, i.e. a distortion of the molecular geometry. In both cases the magnitude of $\mathbf{E}_r$ is small, which leads to a low value of $\varepsilon_r$. If the material contains polar groups (Fig. 5.8B), the reorientation of their dipoles leads to a stronger reaction field and consequently to a higher relative dielectric constant. We are not going to discuss the reaction of dielectric materials in detail. For our purposes it is enough to realise that $\mathbf{E}_r$ is an average field resulting from the average alignment of the dipoles (induced and permanent) of the dielectric material. This makes $\varepsilon_r$ a macroscopic quantity reflecting the collective reaction of the molecules of the dielectric. On a scale comparable to the size of individual molecules, hereafter referred to as the microscopic level, this quantity cannot be rigorously defined. Protein molecules are at the boundary between the microscopic and macroscopic levels. The large number of atoms, which varies from several hundred in small proteins to hundreds of thousands in large multi-subunit aggregates, justifies the assumption that macroscopic characteristics, such as the dielectric constant, are applicable. On the other hand, we are interested in interactions between atoms or between groups of atoms in the protein molecule, which brings us to the microscopic level. This is the main reason for criticism of methods that calculate electrostatic interactions by treating the protein molecule macroscopically, as is done in the dielectric cavity model. In principle, one can evaluate electrostatic interactions in proteins at the microscopic level. Some of the basic tools of such an approach are given in Appendix B. One needs to know the partial charges of all atoms, as well as their polarisabilities. The implementation of the microscopic approach is, however, not as straightforward as it might look. Upon the introduction of a charge, for instance upon deprotonation of a carboxyl group, the material reacts not only by induction of dipole moments in the surroundings (Fig. 5.8A), but also by reorganisation of the permanent dipoles (Fig. 5.8B), which for proteins means reorientation of chemical bonds and hence a change of conformation. These conformational changes cannot be predicted exactly from electrostatic calculations alone. One can simplify the task by assuming a fixed protein structure. In fact, this is the most common assumption when electrostatic interactions are analysed. Its advantage is that it allows one to account for the delicate
structural features of individual protein molecules observed experimentally, for instance by X-ray analysis, such as the organisation of active sites, ligand binding sites, etc. The obvious disadvantage is that, when the protein structure is kept fixed, the reorganisation of the dipoles upon protonation or deprotonation of the ionisable groups is ignored. One possible way out of this situation is the introduction of a dielectric constant, i.e. returning to the macroscopic treatment of the protein molecule while bearing in mind that it is still a microscopic object. As we have already mentioned, the macroscopic models treat the protein molecule as a dielectric material. The dielectric constant of this material is, however, not known. One can try to evaluate it using the connection between the macroscopic dielectric constant and the microscopic properties of the material given by the Kirkwood-Fröhlich theory⁹,¹⁰:

$$\frac{(\varepsilon_p - 1)(2\varepsilon_s + 1)}{\varepsilon_p + 2\varepsilon_s} = \frac{4\pi\langle\delta M^2\rangle}{3VkT} \tag{5.21}$$
where $\varepsilon_p$ is the sought relative dielectric constant of the protein material, represented as a sphere of volume $V$ immersed in a medium with relative dielectric constant $\varepsilon_s$. The quantity $\langle\delta M^2\rangle = \langle M^2\rangle - \langle M\rangle^2$ is the mean square fluctuation of the total dipole moment $M$ of the material within the volume $V$; it can be obtained by molecular dynamics simulation¹¹. The relative dielectric constants of different proteins evaluated by this method vary between 11 and 30. Calculations¹² based on models consisting of randomly oriented α-helices, with a density of dipolar groups equal to that in real proteins, gave a protein dielectric constant between 2 and 4. This large difference between the values obtained by the different approaches illustrates that describing the protein as a dielectric material is a serious problem, which is far from being solved. Experimental measurements on dried protein films and powders show that the dielectric constant of these materials is between 2.5 and 3.5. These values perhaps correspond best to the dielectric constant of the protein
molecule. This conclusion is supported by permittivity measurements on materials with chemical structures similar to that of proteins, for instance the polyamides, which have a relative dielectric constant between 2.5 and 2.6. The aliphatic polymers, such as polypropylene ($\varepsilon_r \approx 1.5$), and hydrocarbons, such as pentene ($\varepsilon_r \approx 1.8$), which resemble the hydrophobic core of proteins, are characterised by low permittivity. The values of $\varepsilon_r$ quoted above (all at 25 °C) correspond to a reaction of the material mainly by electronic and atomic polarisation. The relative dielectric constants of alcohols, such as ethanol ($\varepsilon_r \approx 24$) or methanol ($\varepsilon_r \approx 33$), correspond to the values assessed by Eq. (5.21) and molecular dynamics simulation. These high values of $\varepsilon_r$ originate from the reorientation of the dipoles in an applied electrostatic field. The contribution of dipolar reorientation is not seen in the experiments on protein powders and films, suggesting that the polar groups of proteins are immobilised under these conditions. This does not seem to be the real situation for proteins in aqueous solution. As we shall see in Chapter 7, amino acid side chains on the protein surface are mobile and can adopt different conformations. Another possible way to combine microscopic and macroscopic properties of the protein material is the introduction of a distance-dependent dielectric constant. This reduces the calculation of electrostatic interactions to a summation of Coulomb terms, each of which depends on the distance between the charges in two ways: through Coulomb's law itself and through the distance dependence of the dielectric constant:

$$W = \frac{1}{4\pi\varepsilon_0}\sum_{i<j}\frac{q_i q_j}{\varepsilon_r(r_{ij})\,r_{ij}}$$

Curve a in Fig. 5.9 is obtained by

$$\varepsilon_r(r) = \varepsilon_s - (\varepsilon_s - 1)\left(\frac{r}{z}\right)^2\frac{e^{r/z}}{\left(e^{r/z} - 1\right)^2} \tag{5.22a}$$
where $z = 2.5$ Å. Curve b is obtained by the equation

$$\varepsilon_r(r) = \varepsilon_{\text{eff}}(r) = 1 + 60\left(1 - e^{-r/z}\right) \tag{5.22b}$$
In this case $z = 10$ Å. In Eq. (5.22b) we have introduced the notation $\varepsilon_{\text{eff}}(r)$ to reflect the emphasis its authors placed on the fact that this is an effective dielectric constant, obtained empirically by fitting a number of different experimental data on ionisation and redox equilibria.
[Figure 5.9 plot: $\varepsilon_r$ (0 to 80) versus distance (0 to 30 Å)]
Figure 5.9 Distance-dependent relative dielectric constant. a: Eq. (5.22a)¹³; b: Eq. (5.22b)¹⁴.
As seen from the figure, at short distances the value of $\varepsilon_r(r)$ given by Eqs. (5.22) is low, and it increases with $r$. Equation (5.22a) becomes practically independent of $r$ at distances of about 20 Å or more, where it reaches the value set a priori in the equation, $\varepsilon_r \rightarrow 78$. Equation (5.22b) levels off more slowly: it reaches about 58 at 30 Å and approaches its limiting value $1 + 60 = 61$ only at larger distances. These and other similar functions for a distance-dependent relative dielectric constant should be considered handy computational tools rather than a rigorous description of the dielectric properties of the protein interior. It is also easy to see that the two functions give curves $\varepsilon_r(r)$ of different shapes.
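The two functions in Eqs. (5.22), and the Coulomb sum with a distance-dependent dielectric constant, can be sketched as follows. This is a minimal sketch assuming $\varepsilon_s = 78$ in Eq. (5.22a), charges in e, distances in Å and energies in kcal/mol; the function names are hypothetical. Eq. (5.22a) is rewritten with $e^{-r/z}$ so that it does not overflow at large $r$, and its $r \to 0$ limit, $\varepsilon_r = 1$, is returned explicitly.

```python
import math
from itertools import combinations

COULOMB_KCAL = 332.0637  # e^2/(4*pi*eps0*1 Angstrom) in kcal/mol


def eps_sigmoidal(r, eps_s=78.0, z=2.5):
    """Eq. (5.22a); x^2 e^x / (e^x - 1)^2 rewritten as x^2 e^-x / (1 - e^-x)^2."""
    if r == 0.0:
        return 1.0  # limit r -> 0
    x = r / z
    em = math.exp(-x)
    return eps_s - (eps_s - 1.0) * x * x * em / (1.0 - em) ** 2


def eps_exponential(r, z=10.0):
    """Eq. (5.22b): 1 + 60 (1 - e^{-r/z})."""
    return 1.0 + 60.0 * (1.0 - math.exp(-r / z))


def coulomb_energy(charges, coords, eps_of_r):
    """Sum over pairs of q_i q_j / (eps(r_ij) r_ij), in kcal/mol."""
    energy = 0.0
    for (qi, ri), (qj, rj) in combinations(zip(charges, coords), 2):
        r = math.dist(ri, rj)
        energy += qi * qj / (eps_of_r(r) * r)
    return COULOMB_KCAL * energy


for r in (2.0, 5.0, 10.0, 20.0, 30.0):
    print(f"{r:5.1f} A  eps_a = {eps_sigmoidal(r):5.1f}  eps_b = {eps_exponential(r):5.1f}")
```

Passing either function to `coulomb_energy` gives the screened pair energy; the printed table shows the different shapes of the two curves.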
5.3.2 Dielectric model for calculation of electrostatic interactions in proteins

We have seen in the previous section that treating the protein molecule as a dielectric medium faces difficulties. However, models based on this concept are the most developed and hence the most common for the analysis of electrostatic interactions in proteins. One of their advantages is that they require a relatively small number of parameters, which facilitates both the calculations and the interpretation of the results. Below we will further develop the dielectric cavity model. We should be aware, however, that the protein is still a microscopic object. In this context we should treat the dielectric constant of proteins as a working parameter to which we ascribe the meaning of a relative dielectric constant. We have mentioned that spherical and other analytical shapes do not provide a satisfactory description of the protein dielectric material. Therefore, the electrostatic potential has to be found for a dielectric boundary of arbitrary shape. This is achieved by a numerical solution of the equations for the electrostatic potential. Distasio and McHarris¹⁵ were probably the first to propose a numerical solution for the electrostatic potential with arbitrary dielectric boundaries. Because their work was focused on the solution of the Laplace equation (Eq. 5.1), it did not find direct application to proteins. The decisive step towards the solution of the problem was made in the fundamental work of Warwicker and Watson¹⁶. They proposed a numerical procedure based on the finite difference method for the solution of the Poisson equation, taking the real protein structure as the determinant of the dielectric boundary. The finite difference solution of the Poisson-Boltzmann equation was obtained by Klapper et al.¹⁷ This solution will be considered in the next section.
The protein molecule is represented as a dielectric material with relative dielectric constant $\varepsilon_p$ immersed in the solvent, which has relative dielectric constant $\varepsilon_s$ (Fig. 5.10). If the solvent is an aqueous solution, it contains mobile ions, characterised by the ionic strength. The shape of the dielectric cavity formed by the protein is determined by the protein/solvent contact surface (for a definition see
Chapter 4 and Fig. 4.8). The charges of the protein molecule are located in this cavity, at positions determined by the three-dimensional structure of the protein molecule. We note that this model requires the three-dimensional structure of the protein of interest to be known. The protein dielectric region is surrounded by an ion exclusion layer, which belongs to the solvent but is free of mobile ions ($I = 0$). This layer, which accounts for the finite size of the mobile ions, is often referred to as the Stern layer. We have met this formulation in the Debye-Hückel theory.
[Figure 5.10 label: solvent ($q = 0$, $\varepsilon_s$, $I$ = const)]
Figure 5.10 The continuum dielectric model for calculation of electrostatic interactions in proteins.
Before solving for the electrostatic potential, some essential differences between the Debye-Hückel model and this one should be clarified. The analytical form of the dielectric boundary assumed in the Debye-Hückel theory allows us to partition space into regions where the dielectric constant is independent of the spatial coordinates. Here we have to give up this convenience, because the dielectric boundary is not analytically defined. When $\varepsilon$ is a function of position, the Poisson equation can be written as

$$\nabla\cdot\left(\varepsilon(\mathbf{r})\nabla\varphi(\mathbf{r})\right) = -\left(\rho(\mathbf{r}) + q(\mathbf{r})\right)$$

If we use the relative dielectric constant, the above equation becomes:
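$$\varepsilon_0\nabla\cdot\left(\varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\right) = -\left(\rho(\mathbf{r}) + q(\mathbf{r})\right)$$

The charge distribution $q(\mathbf{r})$ is known. This is the set of charges that belong to the protein atoms or groups, whose coordinates are determined by the three-dimensional structure of the molecule. We assume that the charge values are also known. The charge density of the mobile ions in the solution, $\rho(\mathbf{r})$, is however not known. As before, we assume that it obeys the Boltzmann distribution law. Applying the same procedure as in Section 5.1.1, we obtain a result very similar to Eq. (5.7):

$$\nabla\cdot\left(\varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\right) = \frac{e^2\sum_i c_i z_i^2}{\varepsilon_0 kT}\,\varphi(\mathbf{r}) - \frac{q(\mathbf{r})}{\varepsilon_0}$$

where $c_i$ and $z_i$ are the concentration (number per unit volume) and the valence of mobile ion species $i$.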
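After introducing the ionic strength $I$ (in mol dm⁻³), the above equation becomes

$$\nabla\cdot\left(\varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\right) = \frac{2000\,N_A e^2 I}{\varepsilon_0 kT}\,\varphi(\mathbf{r}) - \frac{q(\mathbf{r})}{\varepsilon_0}$$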
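The multiplier of $\varphi$ on the right hand side of this equation has the dimension of the square of the Debye parameter [Eq. (5.8)], but it is independent of the relative dielectric constant of the medium. We call $\bar\kappa$, defined by $\bar\kappa^2 = \varepsilon_s\kappa^2$, the modified Debye parameter, which allows us to write the linearised Poisson-Boltzmann equation in the following form:

$$\nabla\cdot\left(\varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\right) = \bar\kappa^2(\mathbf{r})\,\varphi(\mathbf{r}) - \frac{q(\mathbf{r})}{\varepsilon_0} \tag{5.23}$$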
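Before Eq. (5.23) can be discretised, the modified Debye parameter has to be evaluated for the solvent region. A minimal sketch, assuming the ionic strength $I$ is given in mol dm⁻³ (hence the factor 2000), SI values of the constants, and $\bar\kappa^2$ reported in Å⁻² for use on a grid whose spacing is in Å; the function names are hypothetical. Dividing $\bar\kappa^2$ by $\varepsilon_s$ recovers the ordinary Debye parameter $\kappa$ and the Debye length $1/\kappa$.

```python
import math

# SI values (CODATA 2018)
N_A = 6.02214076e23         # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
K_B = 1.380649e-23          # Boltzmann constant, J/K


def modified_debye_sq(ionic_strength, temperature=298.15):
    """kbar^2 = 2000 N_A e^2 I / (eps0 k T), returned in 1/Angstrom^2.

    ionic_strength is in mol/dm^3; the factor 1000 converts it to mol/m^3.
    """
    kbar_sq_si = 2000.0 * N_A * E_CHARGE ** 2 * ionic_strength / (EPS0 * K_B * temperature)
    return kbar_sq_si * 1e-20  # 1/m^2 -> 1/Angstrom^2


def debye_length(ionic_strength, eps_s=78.5, temperature=298.15):
    """Debye length 1/kappa in Angstrom, using kappa^2 = kbar^2 / eps_s."""
    kappa_sq = modified_debye_sq(ionic_strength, temperature) / eps_s
    return 1.0 / math.sqrt(kappa_sq)


print(f"kbar^2 at 0.1 M: {modified_debye_sq(0.1):.4f} 1/A^2")
print(f"Debye length at 0.1 M: {debye_length(0.1):.2f} A")
```

At 0.1 M and 25 °C with $\varepsilon_s = 78.5$ the Debye length comes out at roughly 9.6 Å, the familiar textbook value, and it scales as $1/\sqrt{I}$.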
This equation is solved numerically.

5.3.3 Numerical solution of the Poisson-Boltzmann equation, finite difference method

The finite difference method is widely used for the numerical solution of differential equations. Here we will focus only on the basic ideas of one of its applications, namely the numerical integration of Eq. (5.23). A more comprehensive presentation of this method can be found in most of
the books on numerical methods, such as Press et al.'s "Numerical Recipes"¹⁸.
[Figure 5.11 labels: solvent; protein; $\varepsilon(i)$, $q(i)$, $\kappa(i)$]
Figure 5.11 Left: Two-dimensional lattice representation of the protein molecule and the surrounding solvent. Each grid point is the centre of a cubic element. Right: A cubic element whose centre is grid point $i$. The six neighbouring points, $j$, are also shown. The edge length of the cubic element, $h$, is equal to the distance between grid points $i$ and $j$.
The main idea of the finite difference method is to transform the continuous problem into a discrete one. We do this by partitioning space into a set of regular cubic elements. For this purpose, the protein and part of the surrounding solvent are placed in a box with a three-dimensional grid forming a cubic lattice (Fig. 5.11, left). We refer to this box hereafter as the computational box. Each grid point is then the centre of a cubic element (Fig. 5.11, right). Instead of calculating the continuous function $\varphi(\mathbf{r})$, we calculate the electrostatic potential,
values to the grid points is often called "mapping" of the system onto the grid. In our case, we have to produce a few three-dimensional "maps". At each grid point the quantities involved in Eq. (5.23) have a distinct value. For instance, the dielectric constant at grid point $k$ with coordinates $\mathbf{r}_k$ has the value $\varepsilon_r(\mathbf{r}_k) = \varepsilon_s$ if the grid point is in the solvent, or $\varepsilon_r(\mathbf{r}_k) = \varepsilon_p$ if the grid point is inside the protein material. The points $i$ and $j$ in Fig. 5.11 are inside the protein, whereas point $k$ is in the solvent region. In this way, we can replace the continuous function $\varepsilon_r(\mathbf{r})$ with the discrete one $\varepsilon_r(i)$ and create a three-dimensional map of the dielectric constant: the dielectric map. In the same way we can assign the values of the Debye parameter $\kappa$ or $\bar\kappa$. In the protein region, where no mobile ions are present, obviously $\bar\kappa = 0$, whereas in the solvent region $\bar\kappa(i)$ has a value determined by the ionic strength and the temperature according to Eq. (5.8) and
This is the so-called Debye map. The charges that belong to the protein molecule, $q(\mathbf{r})$, are point charges, so they are not represented by a continuous function. In general, the positions of these charges do not coincide with any grid point, so the charges cannot be assigned directly. There are different ways to relate the point charge positions to the grid points. The simplest, but inaccurate, one is to assign the charge $q(\mathbf{r})$ to the nearest grid point. In the example shown in Fig. 5.12A this is grid point $i$. A better way to convert $q(\mathbf{r})$ into $q(i)$ is to distribute fractional charges over the eight grid points at the corners of the cube in which $q(\mathbf{r})$ is situated. The fractional charge on each grid point should decrease with the distance between that grid point and the charge, so that the nearest corner receives the largest share. The fractional charge of each of the eight grid points, $q(j)$, can be calculated by the formula
$$q(j) = q(\mathbf{r})\,(1 - x_j)(1 - y_j)(1 - z_j)$$
where $x_j$, $y_j$, and $z_j$ are the distances of the charge from grid point $j$ in the $x$, $y$, and $z$ directions, respectively, expressed as fractions of the edge
length of the cube, $h$ (Fig. 5.12B). When this procedure is repeated for all charges, the charge map is created.
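The eight-corner assignment can be written compactly. Below is a minimal NumPy sketch, assuming a grid whose point (0, 0, 0) sits at a given origin, a spacing $h$ in Å, and a charge lying strictly inside the grid; `spread_charge` is a hypothetical helper name. The weights $(1 - x_j)(1 - y_j)(1 - z_j)$ sum to one, so the total charge is conserved, and the nearest corner receives the largest share.

```python
import numpy as np


def spread_charge(charge_map, q, position, origin, h):
    """Distribute point charge q over the 8 corners of its grid cube.

    position and origin are in Angstrom; origin is the coordinate of grid
    point (0, 0, 0). The charge must lie inside the grid, not on its last
    plane, so that all eight corners exist.
    """
    s = (np.asarray(position, float) - np.asarray(origin, float)) / h
    base = np.floor(s).astype(int)
    frac = s - base  # distances from the lower corner, in units of h
    for corner in np.ndindex(2, 2, 2):
        # distances x_j, y_j, z_j of the charge from this corner (fractions of h)
        d = np.where(np.array(corner) == 0, frac, 1.0 - frac)
        weight = np.prod(1.0 - d)
        charge_map[tuple(base + np.array(corner))] += q * weight
    return charge_map


grid = np.zeros((4, 4, 4))
spread_charge(grid, 1.0, (1.25, 1.4, 1.0), origin=(0.0, 0.0, 0.0), h=1.0)
print(grid.sum())     # total charge is conserved
print(grid[1, 1, 1])  # share of the nearest corner
```

A charge that falls exactly on a grid point gives all of its value to that point, so the nearest-point scheme is recovered as a special case.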
Figure 5.12 A charge, q(r), in the lattice of grid points. (A) Distance of the charge q(r) to the nearest grid points. (B) Interpolation scheme for distribution of charge values on the grid points. The fractional charges on the vertices of the cube are graphically illustrated by spheres with different radii.
The three maps, namely the dielectric map, the Debye map, and the charge map, describe the system completely, and we can proceed with the integration of Eq. (5.23). According to the finite difference method, the integration is performed over each cubic element:

$$\int_V \nabla\cdot\left(\varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\right)dv - \int_V \bar\kappa^2\varphi(\mathbf{r})\,dv + \frac{1}{\varepsilon_0}\int_V q(\mathbf{r})\,dv = 0 \tag{5.24}$$
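where $V$ is the volume of the cubic element. We will treat the three integrals in the above equation separately. To solve the first integral, we first transform it into a surface integral using Gauss' theorem (see Appendix C):

$$\int_V \nabla\cdot\left(\varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\right)dv = \oint_S \varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\cdot d\mathbf{s} \tag{5.25}$$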
where $d\mathbf{s}$ is the surface element vector, normal to the surface of the integration region, which in our case consists of the six faces of the cubic element. The regular shape of the integration region suggests a simple solution. We can split the integral on the right hand side of Eq. (5.25) into six surface integrals over the individual faces of the cubic element. On each face the dielectric constant $\varepsilon_r(\mathbf{r})$ has a distinct value. This value is taken to be the
average of the values assigned to grid point $i$ and the adjacent grid point $j$, $\varepsilon(i,j) = [\varepsilon(i) + \varepsilon(j)]/2$, where $j$ runs over the six neighbouring grid points (Fig. 5.13). Because $\varepsilon(i,j)$ is independent of the spatial coordinates, it can be taken out of the integral.
Figure 5.13 Integration cubic element: determination of the dielectric constant on the walls.
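The operation $\nabla\varphi(\mathbf{r})$ can be written as (see Appendix C)

$$\nabla\varphi = \operatorname{grad}\varphi = \left(\frac{\partial\varphi}{\partial x}, \frac{\partial\varphi}{\partial y}, \frac{\partial\varphi}{\partial z}\right)$$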
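If we take one of the faces perpendicular to the $x$-axis of the coordinate system (the grey one in Fig. 5.13), $d\mathbf{s}$ is parallel to the $x$-axis, so only the $x$-component of the gradient contributes:

$$\nabla\varphi\cdot d\mathbf{s} = \left(\frac{\partial\varphi}{\partial x}, 0, 0\right)\cdot d\mathbf{s} = \frac{\partial\varphi}{\partial x}\,ds$$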
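According to the definition of the first derivative,

$$\frac{\partial\varphi}{\partial x} = \lim_{\Delta x \to 0}\frac{\varphi(x + \Delta x) - \varphi(x)}{\Delta x}$$

Here we should take into account that our system is discrete and the minimum shift is limited by the grid spacing, $h$. Hence $\Delta x \to h$, and $\partial\varphi/\partial x \approx [\varphi(j) - \varphi(i)]/h$. Also, taking into account that the area of the face is $h^2$, the integral over this face becomes

$$\int_{\text{face}}\varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\cdot d\mathbf{s} \approx \varepsilon(i,j)\,h^2\,\frac{\varphi(j) - \varphi(i)}{h} = h\,\varepsilon(i,j)\left[\varphi(j) - \varphi(i)\right]$$

which is valid for all other faces of the cubic element. In this way, the integral on the right hand side of Eq. (5.25) becomes

$$\oint_S \varepsilon_r(\mathbf{r})\nabla\varphi(\mathbf{r})\cdot d\mathbf{s} = h\sum_{j=1}^{6}\varepsilon(i,j)\left[\varphi(j) - \varphi(i)\right]$$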
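where the index $j$ denotes the six neighbouring grid points. The solution of the second integral in Eq. (5.24) is straightforward. The modified Debye parameter has a distinct value, $\bar\kappa(i)$, in the cubic element. Also, we should not forget that within the cubic element $\varphi(\mathbf{r}) = \varphi(i)$, so that

$$\int_V \bar\kappa^2\varphi(\mathbf{r})\,dv = \bar\kappa^2(i)\,\varphi(i)\,h^3$$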
The third integral in Eq. (5.24) gives the charge assigned to the cubic element, because the charge $q(\mathbf{r})$ within the element is the charge assigned to grid point $i$, $q(i)$:

$$\frac{1}{\varepsilon_0}\int_V q(\mathbf{r})\,dv = \frac{q(i)}{\varepsilon_0}$$
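Substituting the results of the integrations into Eq. (5.24) we obtain

$$h\sum_{j=1}^{6}\varepsilon(i,j)\left[\varphi(j) - \varphi(i)\right] - \bar\kappa^2(i)\,\varphi(i)\,h^3 + \frac{q(i)}{\varepsilon_0} = 0$$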
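The sought potential at grid point $i$, $\varphi(i)$, can be expressed after reorganisation of the above equation:

$$h\sum_{j=1}^{6}\varepsilon(i,j)\varphi(j) - \varphi(i)\left[h\sum_{j=1}^{6}\varepsilon(i,j) + \bar\kappa^2(i)h^3\right] + \frac{q(i)}{\varepsilon_0} = 0$$

and finally

$$\varphi(i) = \frac{\displaystyle\sum_{j=1}^{6}\varepsilon(i,j)\varphi(j) + \frac{q(i)}{h\varepsilon_0}}{\displaystyle\sum_{j=1}^{6}\varepsilon(i,j) + \bar\kappa^2(i)h^2} \tag{5.26}$$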
Equation (5.26) is the finite difference formula for the potential $\varphi(i)$ at any point of the computational box. This formula, however, contains the values of the potential at the neighbouring points, which means that in order to calculate $\varphi(i)$ we need to know the values $\varphi(j)$. Therefore, Eq. (5.26) is solved iteratively. We will consider only the main principles of an exemplary iterative procedure. First, we set the electrostatic potential at all grid points to zero, $\varphi(i) = 0$. Also, to keep the example simple, we assign the charges $q(\mathbf{r})$ to the nearest grid point. Then we number the grid points starting from one of the vertices of the computational box. During the iterations we will trace the behaviour of three consecutive grid points, numbered $i_1$, $i_2$, and $i_3$. The subscripts 1, 2, and 3 indicate only that the numbering of the grid points is consecutive. Also, we choose $i_1$ to be an odd number. Grid point $i_1$ is in the solution (see Fig. 5.14), meaning that $q(i_1) = 0$ and $\bar\kappa(i_1) \neq 0$.
Equation (5.26) for this point (and all other points with these characteristics) becomes

$$\varphi(i) = \frac{\sum_{j=1}^{6}\varepsilon(i,j)\varphi(j)}{\sum_{j=1}^{6}\varepsilon(i,j) + \bar\kappa^2(i)h^2} \tag{5.27}$$
Figure 5.14 Even (open circles) and odd (full circles) grid points. The grid points on the walls of the computational box are distinguished by somewhat larger circles. The electrostatic potential at these grid points is fixed by the boundary conditions, so they participate in the iterative solution of Eq. (5.26) only as neighbouring points.
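Grid point $i_2$ is in the protein dielectric region and is characterised by a charge $q(i_2) = q$ and $\bar\kappa(i_2) = 0$. For grid points with such characteristics Eq. (5.26) becomes

$$\varphi(i) = \frac{\displaystyle\sum_{j=1}^{6}\varepsilon(i,j)\varphi(j) + \frac{q(i)}{h\varepsilon_0}}{\displaystyle\sum_{j=1}^{6}\varepsilon(i,j)} \tag{5.28}$$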
Grid point $i_3$ is also in the protein dielectric region, but here $q(i_3) = 0$, and Eq. (5.26) is written as

$$\varphi(i) = \frac{\sum_{j=1}^{6}\varepsilon(i,j)\varphi(j)}{\sum_{j=1}^{6}\varepsilon(i,j)} \tag{5.29}$$
We see that in none of these cases does the full form of Eq. (5.26) have to be evaluated, which is an advantage from a computational point of view. The points on the walls of the computational box have a fixed potential defined by the boundary conditions, which will be considered in the next section. We start the solution of Eq. (5.26) with the odd inner grid points (the full circles in Fig. 5.14). At point $i_1$, Eq. (5.27) gives
making grid point $i_2$ a source of electrostatic potential. As seen, this is the electrostatic potential of a point charge at a distance equal to the grid spacing, $h$. The procedure continues by repeating the calculations for the odd grid points. For point $i_1$, Eq. (5.27) no longer gives zero, because the numerator on its right hand side contains at least one non-zero term, the one containing the potential $\varphi(i_2)$ calculated in the previous step. If only this term is non-zero, the potential at this point is

$$\varphi(i_1) = \frac{\varepsilon(i_1,i_2)\,\varphi(i_2)}{\sum_{j=1}^{6}\varepsilon(i_1,j) + \bar\kappa^2(i_1)h^2}$$

The potential at the next odd point, $i_3$, is also non-zero because of the contribution of the potential $\varphi(i_2)$. Equation (5.29) gives

$$\varphi(i_3) = \frac{\varepsilon(i_3,i_2)\,\varphi(i_2)}{\sum_{j=1}^{6}\varepsilon(i_3,j)}$$
Thus the potential in the protein region, $\varphi(i_3)$, grows faster than that in the solution, $\varphi(i_1)$. This effect is due to the influence of the ionic strength in the solvent region. With each iteration the potential $\varphi(i)$ increases according to Eqs. (5.27)-(5.29). It can be shown that the iteration procedure converges and that $\varphi(i)$ asymptotically approaches its true value.

5.3.4 Boundary conditions

The grid points on the walls of the computational box, the boundary grid points, have at least one neighbour missing, which makes Eq. (5.26) inapplicable. This is why the iterative procedure is executed for the inner grid points only. The values of the electrostatic potential at the boundary grid points are assigned beforehand and kept constant. In the example considered in the previous section we set the initial values of the electrostatic potential to $\varphi(i) = 0$, including at the boundary grid points. This means that at the walls of the computational box the electrostatic potential remains zero after the iterative procedure has converged. These boundary conditions are called zero boundary conditions. They are justified only if the protein is sufficiently far from the walls of the computational box; otherwise they lead to an underestimation of the magnitude of the calculated electrostatic potential at the inner points. A reasonable approximation is to assign values to the electrostatic potential at the boundary according to Eq. (5.10c); these are the Debye boundary conditions. Assuming that $a \ll r_{kn}$, where $a$ is the ion exclusion radius (see Fig. 5.1), or the thickness of the Stern layer, Eq. (5.10c) can be approximated as
$$\varphi(k) = \frac{1}{4\pi\varepsilon_0\varepsilon_s}\sum_{n=1}^{N}\frac{q_n\,e^{-\kappa r_{kn}}}{r_{kn}}$$
The summation in the above expression runs over all $N$ charges in the protein molecule. The distance between boundary grid point $k$ and charge $q_n$ is denoted $r_{kn}$. This approximation requires that the walls of the grid box be sufficiently distant from all of the protein charges.
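The Debye boundary values amount to a screened Coulomb sum evaluated at every boundary grid point. A minimal NumPy sketch, assuming charges in e, coordinates in Å, $\kappa$ in Å⁻¹ and potentials in volts (14.399645 V is the potential of one elementary charge at 1 Å in vacuum); the function name is hypothetical, and, as in the approximation above, the Stern-layer factor of the full Eq. (5.10c) is neglected.

```python
import numpy as np

# e / (4*pi*eps0 * 1 Angstrom) in volts
COULOMB_V_ANGSTROM = 14.399645


def debye_boundary_potential(points, charges, positions, eps_s=78.5, kappa=0.0):
    """phi(k) = sum_n q_n exp(-kappa r_kn) / (4 pi eps0 eps_s r_kn), in volts.

    points: (M, 3) boundary grid coordinates in Angstrom
    charges: (N,) protein charges in units of e
    positions: (N, 3) charge coordinates in Angstrom
    kappa: Debye parameter in 1/Angstrom (0 gives unscreened Coulomb potentials)
    """
    pts = np.asarray(points, float)[:, None, :]
    pos = np.asarray(positions, float)[None, :, :]
    r_kn = np.linalg.norm(pts - pos, axis=2)  # (M, N) distance matrix
    q = np.asarray(charges, float)[None, :]
    return COULOMB_V_ANGSTROM / eps_s * np.sum(q * np.exp(-kappa * r_kn) / r_kn, axis=1)


# two opposite charges; potential at two boundary points about 20 A away
phi = debye_boundary_potential([[20.0, 0.0, 0.0], [0.0, 20.0, 0.0]],
                               [1.0, -1.0], [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
                               kappa=1.0 / 9.6)
print(phi)
```

Because the whole distance matrix is formed at once, memory grows as (boundary points) × (charges); for large grids one would loop over the box walls one at a time.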
The Debye boundary conditions are not exact either, because Eq. (5.10c) is derived for a spherical ion in a homogeneous dielectric medium. Although the atoms carrying the protein charges can be considered as spheres, they are immersed in a heterogeneous dielectric medium: some of them can be buried in the low-dielectric protein medium, whereas others can be at the protein/solvent interface.
Figure 5.15 Three-step focusing. Lattice I is the initial one. F1 is the focused box, which lies within I and has a smaller grid spacing. Four grid points on one of the walls of F1 are selected and enlarged on the left-hand side to illustrate their positions in the cubic elements of grid box I. The second focused box, F2, is centred on a protein region of interest.
A good way to increase the accuracy of the boundary conditions is so-called focusing, a simple but efficient procedure. It works as follows. After the iterative procedure has converged in a computational box, which we call the initial box I, we define a second box (the focused box, F1) that lies within the initial box and has the same number of grid points (Fig. 5.15). The grid spacing in F1 is therefore smaller, $h_{F1} < h_0$, which increases the accuracy of the calculations. The iterative procedure is executed again for F1, with boundary conditions taken from box I as follows. Each of the boundary grid points of box F1, $k_{F1}$, is located in one of the cubic elements of box
I. The value of
Formulae and procedures for three-dimensional linear interpolation can be found in handbooks of mathematics and numerical methods. Here we give a simple example of linear interpolation of the potential at a point within a cubic element.
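A minimal NumPy sketch of this trilinear interpolation, and of its use to set the boundary values of a focused box, is given below. It assumes that both boxes share a Cartesian frame with coordinates in Å and that every wall point of the focused box lies inside the coarse grid. The helper names are hypothetical; `a`, `b`, `c` play the role of the fractional distances in the text, and the order of the three one-dimensional interpolations (here $x$, then $y$, then $z$) does not affect the result.

```python
import numpy as np


def interpolate_potential(phi, point, origin, h):
    """Trilinear interpolation of grid potentials phi at an arbitrary point.

    a, b, c are the fractional distances (in units of h) from the lower
    corner i of the enclosing cubic element.
    """
    s = (np.asarray(point, float) - np.asarray(origin, float)) / h
    i, j, k = np.floor(s).astype(int)
    a, b, c = s - np.array([i, j, k])
    # interpolate along x on the four cube edges parallel to x
    p00 = phi[i, j, k] + a * (phi[i + 1, j, k] - phi[i, j, k])
    p10 = phi[i, j + 1, k] + a * (phi[i + 1, j + 1, k] - phi[i, j + 1, k])
    p01 = phi[i, j, k + 1] + a * (phi[i + 1, j, k + 1] - phi[i, j, k + 1])
    p11 = phi[i, j + 1, k + 1] + a * (phi[i + 1, j + 1, k + 1] - phi[i, j + 1, k + 1])
    # then along y on two faces, and finally along z
    p0 = p00 + b * (p10 - p00)
    p1 = p01 + b * (p11 - p01)
    return p0 + c * (p1 - p0)


def focused_boundary(phi_coarse, origin_coarse, h_coarse, wall_points):
    """Boundary values for a focused box, taken from a converged coarse solution."""
    return np.array([interpolate_potential(phi_coarse, p, origin_coarse, h_coarse)
                     for p in wall_points])
```

Since trilinear interpolation reproduces any linear function exactly, a potential that varies linearly in space is a convenient check of such an implementation.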
Figure 5.16 Left: A grid point on the wall of the focused grid box, inside a volume element of the previous grid box. Right: Values of the electrostatic potential at the grid points of the previous grid box. At the front face they are denoted $\varphi_1$ to $\varphi_4$, whereas at the rear face, as
On the left hand side of Fig. 5.15 the position of a boundary grid point, $k$, is related to the grid points defining a cubic element of the previous grid box. The parameters $a$, $b$, and $c$ are the distances from point $k$ to grid point $i$ in the three spatial directions, expressed as fractions of $h$. Let the grid points at the front face of the cubic element have potentials $\varphi_1$ to $\varphi_4$ (Fig. 5.16, right) and the other four grid points at the rear face have potentials $\varphi_1'$ to $\varphi_4'$. The potential at point $k$ is then

$$\varphi(k) = \varphi_{ac} + b\left(\varphi_{ac}' - \varphi_{ac}\right)$$

where $\varphi_{ac}$ and $\varphi_{ac}'$ are the potentials at points $ac$ and $ac'$ on the front and the rear faces of the cubic element, respectively, as shown in Fig. 5.15. These potentials are obtained by the same formula:
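$$\varphi_{ac} = \varphi_{12} + c\,(\varphi_{34} - \varphi_{12}), \qquad \varphi_{ac}' = \varphi_{12}' + c\,(\varphi_{34}' - \varphi_{12}').$$

The unknown potentials on the right hand side of the above equations are

$$\varphi_{12} = \varphi_1 + a\,(\varphi_2 - \varphi_1), \qquad \varphi_{34} = \varphi_3 + a\,(\varphi_4 - \varphi_3),$$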
and the same holds for the rear face, where the potentials are denoted with a prime. The above series of interpolations can be combined into a single, though rather long, formula in which only the known values, $\varphi_1$ to $\varphi_4$ and $\varphi_1'$ to $\varphi_4'$, appear.
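The interpolation scheme above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the program used for the calculations in this book: the function names are hypothetical, the coarse grid is assumed to be stored as `phi[ix, iy, iz]` with spacing `h` starting at `origin`, and the assignment of the corner labels $\varphi_1$ to $\varphi_4$ to particular grid points is our own choice ($\varphi_1$ and $\varphi_2$ differ in $x$, $\varphi_1$ and $\varphi_3$ in $z$, and the rear face lies at the next $y$ index, so that $b$ is the front-to-rear fraction).

```python
import numpy as np


def face_value(phi, x, y, z, a, c):
    """Bilinear interpolation on one face (fixed y index) of a cubic element.

    Hypothetical corner labelling: phi1 = phi[x, y, z], phi2 = phi[x+1, y, z],
    phi3 = phi[x, y, z+1], phi4 = phi[x+1, y, z+1].
    """
    phi12 = phi[x, y, z] + a * (phi[x + 1, y, z] - phi[x, y, z])
    phi34 = phi[x, y, z + 1] + a * (phi[x + 1, y, z + 1] - phi[x, y, z + 1])
    return phi12 + c * (phi34 - phi12)  # this is phi_ac


def interpolate_boundary_value(phi, origin, h, point):
    """Potential at `point` (e.g. a boundary node k of box F1) from the coarse grid `phi`."""
    s = (np.asarray(point, dtype=float) - np.asarray(origin, dtype=float)) / h
    # grid point i: lower corner of the cubic element that contains the point
    i = np.clip(np.floor(s).astype(int), 0, np.array(phi.shape) - 2)
    a, b, c = s - i  # fractional distances in units of h
    x, y, z = (int(v) for v in i)
    phi_ac = face_value(phi, x, y, z, a, c)            # front face
    phi_ac_rear = face_value(phi, x, y + 1, z, a, c)   # rear face (primed values)
    return phi_ac + b * (phi_ac_rear - phi_ac)


# usage: a linear test field must be reproduced exactly by linear interpolation
h0 = 0.5
coords = np.arange(9) * h0
X, Y, Z = np.meshgrid(coords, coords, coords, indexing="ij")
phi_coarse = 2.0 * X + 3.0 * Y - Z
value = interpolate_boundary_value(phi_coarse, (0.0, 0.0, 0.0), h0, (1.3, 2.05, 0.7))
print(value)  # close to 2*1.3 + 3*2.05 - 0.7 = 8.05
```

Because trilinear interpolation reproduces any linear function exactly, a linear test field is a convenient check. In a focusing run the same function would simply be called once for each boundary grid point of box F1.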
The simplest example of a calculation that can be made with the finite difference method is the prediction of the interaction energy between two charges in a homogeneous medium. Suppose we have coded a finite difference algorithm and applied it to calculate $\varphi(\mathbf{r})$ created by a point charge. The boundary conditions in this case are exact, because the system is homogeneous and has an analytical solution (Coulomb's law). The energy of interaction between this charge and a probe charge, $q$, at the point with coordinates $\mathbf{r}$ is then simply $W = q\varphi(\mathbf{r})$. The calculated values of $W$ as a function of the distance between the two charges are compared with the exact solution given by Coulomb's law in Fig. 5.17. We see that for an ideal system the solution of the finite difference method agrees fairly well with the analytical one.

[Plot: $W$ (kcal/mol) versus $r_{ij}$ (Å)]
Figure 5.17 Electrostatic energy of interaction between two unit positive charges in vacuum as a function of the distance between them. Continuous line: calculation based on Coulomb's law; points: finite difference calculations.
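A comparison in the spirit of Fig. 5.17 can be reproduced with a very small finite difference solver. The sketch below is an illustration under several assumptions and is not the algorithm behind Fig. 5.17: it uses plain Jacobi relaxation of the seven-point Poisson equation in vacuum, places the point charge on a single grid node, fixes the potential on the faces of the box by Coulomb's law (the exact boundary condition mentioned above), and uses the constant 332.06 kcal·Å/(mol·e²) so that $\varphi$ comes out in kcal/(mol·e). The grid size, spacing and number of iterations are arbitrary choices, and the function name is hypothetical.

```python
import numpy as np

K_COULOMB = 332.06  # kcal*Angstrom/(mol*e^2): Coulomb constant in kcal/mol units


def fd_point_charge_potential(q=1.0, n=25, h=1.0, n_iter=2000):
    """Jacobi relaxation of the 7-point finite difference Poisson equation in vacuum.

    A point charge q (in e) sits on the central node of an n x n x n grid with
    spacing h (Angstrom); the potential on the faces of the box is fixed by
    Coulomb's law. Returns phi in kcal/(mol*e) and the index of the central node.
    """
    c = n // 2
    x = (np.arange(n) - c) * h
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    r = np.sqrt(X**2 + Y**2 + Z**2)

    phi = np.zeros((n, n, n))
    on_face = np.ones((n, n, n), dtype=bool)
    on_face[1:-1, 1:-1, 1:-1] = False
    phi[on_face] = K_COULOMB * q / r[on_face]  # exact boundary conditions

    # charge density q/h^3 on one node gives the source term 4*pi*K*q/h in the update
    source = 4.0 * np.pi * K_COULOMB * q / h

    for _ in range(n_iter):
        new_inner = (
            phi[2:, 1:-1, 1:-1] + phi[:-2, 1:-1, 1:-1]
            + phi[1:-1, 2:, 1:-1] + phi[1:-1, :-2, 1:-1]
            + phi[1:-1, 1:-1, 2:] + phi[1:-1, 1:-1, :-2]
        )
        new_inner[c - 1, c - 1, c - 1] += source  # central node in inner-array indexing
        phi[1:-1, 1:-1, 1:-1] = new_inner / 6.0
    return phi, c


phi, c = fd_point_charge_potential()
for k in (2, 4, 6, 8, 10):
    w_fd = 1.0 * phi[c + k, c, c]  # W = q * phi(r) for a unit probe charge q = +1
    w_coulomb = K_COULOMB / k
    print(f"r = {k:2d} A   W_fd = {w_fd:6.1f}   W_Coulomb = {w_coulomb:6.1f} kcal/mol")
```

Within one or two grid spacings of the charge the finite difference values deviate noticeably from Coulomb's law, because a point charge cannot be represented exactly on a discrete grid; at larger distances the agreement improves.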
The electrostatic potential created by the charge constellation of a protein molecule can be very complicated. Fig. 5.18 illustrates the electrostatic potential created by the charges of the protein NK-lysin, presented as isopotential surfaces at 0.1 V and −0.1 V. The region enveloped by the positive isopotential surface is larger than that enveloped by the negative one and embraces a substantial part of the protein molecule. A wide region of positive potential is observed around an α-helix that is rich in positively charged side chains. This region of the protein (lower part of the image in Fig. 5.18) is thought to interact with cell membranes. The negative potential is confined mainly to the protein interior.
Figure 5.18 Electrostatic potential, $\varphi(\mathbf{r})$, around NK-lysin in aqueous solution at pH 7, contoured at 0.1 V (blue) and −0.1 V (red)19. The protein molecule is presented as a Cα backbone.
It is often assumed that the shape of the electrostatic field is determined mainly by the charge constellation formed by the titratable groups (see next chapter for the definition) of the protein molecule. This is to a large extent correct, but it is not a universal rule. Partial atomic charges, if they are arranged in an appropriate way, can also produce a significant electrostatic potential. The atoms of the polypeptide backbone that determine the dipole moment of the peptide bond (see Fig. 1.7 and Table 6.5) can adopt such an arrangement. In secondary structure elements such as α-helices, the dipoles formed by the partial charges of these atoms are quasi-parallel. This results in a distinctive pattern of the electrostatic potential at the two ends of the helix. As illustrated in Fig. 5.19, α-helices are characterised by a pronounced positive potential in the region of the N-terminus (upper part of the figure) and a negative potential at the C-terminus. This results from the accumulation of the uncompensated partial charges of the first and the last three peptides of the α-helix. These atoms can be recognised in Fig. 1.3. This feature of the electrostatic field of α-helices affects various properties of proteins. Often, α-helices are oriented antiparallel in the three-dimensional structures of proteins, thereby compensating the excess electric charge at their termini. Also, acidic amino acid residues are observed more frequently in the vicinity of the N-termini of α-helices; correspondingly, basic residues are found more frequently around the C-termini.
Figure 5.19 Electrostatic potential created by the partial charges of the atoms of the polypeptide backbone in α-helical conformation20. The isopotential surfaces are defined as in Fig. 5.18.
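The origin of the potentials at the helix ends can be illustrated with a one-dimensional toy model: a row of identical, parallel point dipoles along the helix axis, each represented by charges $+\delta$ and $-\delta$ a distance $d$ apart, with Coulomb's law evaluated in vacuum. All numbers ($\delta = 0.4\,e$, $d = 1.0$ Å, 12 dipoles) and the function names are hypothetical; only the rise of 1.5 Å per residue corresponds to a real α-helix. The dipoles are oriented, as described in the text, with their positive ends towards the N-terminus. Real peptide dipoles are only roughly aligned with the axis and are screened by their environment, so the sketch shows the qualitative effect only: inside the row, neighbouring charges of opposite sign nearly cancel, leaving an uncompensated positive end and an uncompensated negative end.

```python
import numpy as np

K_COULOMB = 332.06  # kcal*Angstrom/(mol*e^2); vacuum (eps = 1) is a simplifying assumption


def dipole_row(n_dipoles=12, rise=1.5, delta=0.4, d=1.0):
    """Charges (e) and axial positions (Angstrom) of n parallel point dipoles.

    Dipole i is centred at z = i*rise with its positive end towards +z, which here
    plays the role of the N-terminal end of the helix (hypothetical toy geometry).
    """
    centres = np.arange(n_dipoles) * rise
    positions = np.concatenate([centres + d / 2, centres - d / 2])
    charges = np.concatenate([np.full(n_dipoles, delta), np.full(n_dipoles, -delta)])
    return charges, positions


def potential_on_axis(z, charges, positions):
    """Coulomb potential in kcal/(mol*e) at axial position z."""
    return float(np.sum(K_COULOMB * charges / np.abs(z - positions)))


charges, positions = dipole_row()
z_n_end = positions.max() + 2.0  # 2 Angstrom beyond the "N-terminal" end
z_c_end = positions.min() - 2.0  # 2 Angstrom beyond the "C-terminal" end
phi_n = potential_on_axis(z_n_end, charges, positions)
phi_c = potential_on_axis(z_c_end, charges, positions)

# a single isolated dipole, probed at the same distance from its positive end
q1, p1 = dipole_row(n_dipoles=1)
phi_single = potential_on_axis(p1.max() + 2.0, q1, p1)

print(f"N-terminal end: {phi_n:+.1f}, C-terminal end: {phi_c:+.1f}, "
      f"single dipole: {phi_single:+.1f} kcal/(mol e)")
```

The comparison with a single dipole shows that the contributions of the aligned dipoles add up at the helix ends.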
There is another interesting feature of the electrostatic potential created by the peptide dipoles, namely that the side chains are located in a region of positive potential. This very important property will be discussed in Chapter 7. The examples given in Figs. 5.18 and 5.19 clearly show that the electrostatic potential created by the protein charges has a non-trivial shape. The information provided by the spatial distribution of the electrostatic potential,
9. Kirkwood JG, (1939) The dielectric polarization of polar liquids. J. Chem. Phys., 7: 911-919.
10. Fröhlich H, (1958) Theory of Dielectrics. Oxford: Clarendon.
11. Dominy BN, Minoux H and Brooks III CL, (2004) An electrostatic basis for the stability of thermophilic proteins. Proteins, 57: 128-141.
12. Gilson M and Honig B, (1986) The dielectric constant of a folded protein. Biopolymers, 25: 2097-2191.
13. Hingerty BE, Ritchie RH, Ferell TL and Turner JE, (1985) Dielectric effects in biopolymers: the theory of ionic saturation revisited. Biopolymers, 24: 427-439.
14. Sham YY, Chu ZT and Warshel A, (1997) Consistent calculations of pKa's of ionizable residues in proteins: semi-microscopic and microscopic approaches. J. Phys. Chem. B, 101: 4458-4472.
15. Distasio M and McHarris WC, (1979) Electrostatic problem? Relax! Am. J. Phys., 47: 440-444.
16. Warwicker J and Watson NC, (1982) Calculation of the electric field potential in the active site cleft due to alpha-helix dipoles. J. Mol. Biol., 157: 671-679.
17. Klapper I, Hagstrom R, Fine R, Sharp K and Honig B, (1986) Focusing of electric fields in the active site of Cu-Zn superoxide dismutase: effects of ionic strength and amino-acid modification. Proteins, 1: 47-59.
18. Press WH, Flannery BP, Teukolsky SA and Vetterling WT, (1986) Numerical Recipes. Cambridge: Cambridge University Press.
19. Miteva M, Anderson M, Karshikoff A and Otting G, (1999) Molecular electroporation: a unifying concept for the description of membrane pore formation by antibacterial peptides, exemplified with NK-lysin. FEBS Lett., 462: 155-158.
20. Spassov VZ, Ladenstein R and Karshikoff A, (1997) Optimization of the electrostatic interactions between ionized groups and peptide dipoles in proteins. Protein Sci., 6: 1190-1195.
Chapter 6
Ionisation Equilibria in Proteins
Proteins are rich in ionisable amino acid residues. In Table 1.1, four amino acid side chains are classified by convention as chargeable or charged because at neutral pH they carry either a positive (lysines and arginines) or a negative (aspartic and glutamic acids) charge. The aromatic residues histidine and tyrosine, as well as cysteine, can also be charged. Together these account for about 1/3 of all natural amino acids. Thus, one of the major tasks when investigating electrostatic interactions in proteins is the prediction of the ionisation equilibria of the individual ionisable groups. The factors influencing these equilibria in proteins are summarised in Table 6.1.

Before we start studying ionisation equilibria, some specific terminology should be clarified. One of the factors determining the ionisation equilibria is charge-charge interactions. By charge-charge interactions we refer to the interactions between the charges of the ionisable groups. In this way we formally distinguish them from electrostatic interactions between the ionisable side chains and the partial atomic charges of the polar groups; the latter can be designated as charge-dipole interactions. We also use the term charge-charge interactions to distinguish electrostatic interactions between individual charges from the effect of desolvation. The terms titratable group and ionisable group are synonyms. A titratable group is a functional group of an amino acid side chain that changes its protonation state when the pH changes. This change can be observed in potentiometric titration experiments, hence the name titratable group. Upon protonation, i.e. upon binding of a hydrogen ion, the charge of the titratable group changes, so it is also an ionisable group. Accordingly, deprotonation is the release of a hydrogen ion from the functional group. We will also use the term titratable site, referring to titratable groups that have distinct locations in the protein molecule. For instance, two histidine residues in a given protein molecule have the same titratable group (the imidazole ring), but they are different titratable sites because they have different locations in the protein molecule and, as we shall see, can show different ionisation behaviour.

Table 6.1 Major factors determining the ionisation equilibria of the titratable groups in proteins.
No. | Factor | Comments
1 | Standard energy of ionisation | The energy of ionisation of the titratable amino acid side chains taken as model compounds. This factor reflects the chemical nature of the titratable groups.
2 | Desolvation of ionisable groups | The energy of transfer of a titratable group or a model compound from solvent to its location in the protein molecule. This energy is always unfavourable and is often called the "desolvation penalty" (see Section 5.2).
3 | Electrostatic interaction of the titratable groups with the permanent protein charges, such as charge-dipole interactions | For instance, the interactions of a given titratable group with the dipoles of the polypeptide backbone.
4 | Electrostatic interaction between titratable groups (charge-charge interactions) | These interactions determine the cooperative character of the ionisation equilibria in proteins.
5 | Conformational flexibility | Native proteins can adopt different conformations under different conditions. For instance, conformational changes may occur upon the ionisation of a given titratable group. Also, under certain conditions more than one conformation of the protein molecule can be in equilibrium.
6.1 Why Does One Need to Know Ionisation Equilibria?
Titratable groups change their charge upon binding or releasing a hydrogen ion. Hence, the charge state of a given titratable group is determined by the equilibrium of the hydrogen ion association (or dissociation) reaction. Because this equilibrium depends on the concentration of hydrogen ions in the solution, the charge states of the titratable groups in proteins change with pH. This, of course, changes the electrostatic interactions. Thus, any pH-induced change of the properties of protein molecules results from changes of electrostatic interactions.
Figure 6.1 (A) A pH-dependent property regulated by the ionisation equilibrium of a distinct titratable group in the protein molecule. (B) pH dependence of the reciprocal value of the kinetic coefficient $\varphi_2$, relating substrate binding and proton abstraction from the substrate by alcohol dehydrogenase from Drosophila lebanonensis. The continuous line is the titration curve of a titratable amino acid side chain with pK 7.3.
Figure 6.1A illustrates an idealised pH dependence of an observable quantity related to a certain protein property. The nature and the measured values of the observable quantity are not relevant at the moment. The observable can be the chemical shift recorded by nuclear magnetic resonance (NMR) or a wavelength shift observed in spectroscopic experiments. It can also be just the fraction of protein molecules in a certain state, for instance the native state. In all these cases, changes of electrostatic interactions in the protein molecule underlie the observed pH dependence.
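Attributing a measured pH dependence to a single titratable group amounts to fitting a two-state curve, $y(\mathrm{pH}) = y_{\mathrm{p}}(1-\alpha) + y_{\mathrm{d}}\,\alpha$, where $\alpha$ is the degree of deprotonation given below by Eq. (6.6). In this minimal Python sketch the data are synthetic (generated with pK = 7.3 to mirror Fig. 6.1B, not the measured values), the plateau values $y_{\mathrm{p}}$ and $y_{\mathrm{d}}$ are fitted by linear least squares for each trial pK on a grid, and the function names are hypothetical. A real analysis would use measured data with error estimates and possibly a nonlinear optimiser.

```python
import numpy as np


def degree_of_deprotonation(ph, pk):
    """Eq. (6.6), written in the equivalent form 1 / (1 + 10**(pK - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pk - np.asarray(ph, dtype=float)))


def fit_single_pk(ph, y, pk_grid=None):
    """Least-squares fit of y = y_p*(1 - alpha) + y_d*alpha over a grid of trial pK values."""
    if pk_grid is None:
        pk_grid = np.arange(2.0, 12.0 + 1e-9, 0.01)
    ph = np.asarray(ph, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for pk in pk_grid:
        a = degree_of_deprotonation(ph, pk)
        design = np.column_stack([1.0 - a, a])  # model is linear in y_p and y_d
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((design @ coef - y) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(pk), coef)
    return best[1], best[2]


# synthetic, noise-free "observable" generated with pK = 7.3 (illustration only)
ph_values = np.linspace(4.0, 10.0, 25)
observable = 0.05 + 0.35 * degree_of_deprotonation(ph_values, 7.3)
pk_fit, (y_p, y_d) = fit_single_pk(ph_values, observable)
print(f"pK = {pk_fit:.2f}, plateaus: {y_p:.3f} (protonated), {y_d:.3f} (deprotonated)")
```

Splitting the problem into a one-dimensional search over pK and a linear solve for the plateaus keeps the sketch free of external optimisers; the midpoint found in this way is exactly the quantity that is then attributed to a titratable group.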
Since changes of pH induce changes of the ionisation state of the titratable groups, the midpoint of the observed experimental curve is usually attributed to the pK of a titratable group in the protein molecule. This group is regarded as the one regulating the observed transition. The identification of this group and the prediction of its ionisation equilibrium is the first step towards understanding the phenomenon manifested by the observed pH dependence. This is not a straightforward task. Despite the decisive progress in developing methods for the experimental detection of ionisation equilibria of individual titratable groups in proteins, for instance by NMR, pK values cannot always be determined unambiguously. In some cases the observed pH dependence cannot be attributed to any of the experimentally measured pK values. Such is, for instance, the case of proton abstraction from alcohol catalysed by the alcohol dehydrogenase from Drosophila lebanonensis. Fig. 6.1B shows the pH dependence of the kinetic coefficient that relates substrate binding and proton abstraction. It corresponds to the deprotonation curve of a titratable group with a pK value of 7.3. So far, no titratable group with such a pK value has been identified experimentally. This example shows that a theoretical analysis of electrostatic interactions aimed at predicting the ionisation equilibria in proteins is a prerequisite for understanding the mechanisms of the observed pH-dependent phenomena.

6.2 Basic Definitions
6.2.1 Protonation/deprotonation equilibria
Since the titratable groups in proteins are acids and bases, it is perhaps useful to recall their definitions. An acid is a substance from which a hydrogen ion can be removed. In a similar way, one defines a base: a substance that can accept a hydrogen ion. The dissociation of an acid can be described by the equilibrium

$$\mathrm{AH} + \mathrm{H_2O} \rightleftharpoons \mathrm{A^-} + \mathrm{H_3O^+}. \qquad (6.1a)$$
The above equilibrium is an acid-base equilibrium in which the water molecule acts as a base. Equation (6.1a) is often written as

$$\mathrm{AH} \rightleftharpoons \mathrm{A^-} + \mathrm{H^+}. \qquad (6.1b)$$

The difference between these two ways of writing is that Eq. (6.1a) records the fact that the solvent is involved in the dissociation of AH, whereas Eq. (6.1b) omits it. Using Eq. (6.1b), we write the dissociation equilibrium constant as

$$K_a = \frac{[\mathrm{A^-}][\mathrm{H^+}]}{[\mathrm{AH}]}, \qquad (6.2)$$

where the subscript a indicates acid dissociation. The square brackets denote concentrations in mol/L. Similarly, the binding of a hydrogen ion to a base is described by the equilibrium

$$\mathrm{B} + \mathrm{H_2O} \rightleftharpoons \mathrm{BH^+} + \mathrm{OH^-}, \qquad (6.3a)$$

which is also an acid-base reaction; in this case the water molecule acts as an acid. If we formally ignore the solvent (water), we write

$$\mathrm{B} + \mathrm{H^+} \rightleftharpoons \mathrm{BH^+}. \qquad (6.3b)$$

The association equilibrium constant of this reaction is

$$k_b = \frac{[\mathrm{BH^+}]}{[\mathrm{B}][\mathrm{H^+}]},$$

whereas the corresponding dissociation constant is

$$K_b = \frac{[\mathrm{B}][\mathrm{H^+}]}{[\mathrm{BH^+}]}. \qquad (6.4)$$

The subscript b stands for base. The equilibria expressed by Eqs. (6.1b) and (6.3b) are more convenient for the purposes of our considerations. Therefore we will use these forms, keeping in mind that the above dissociation/association equilibria depend on the solvent. The removal or association of a hydrogen ion is nothing but the removal or association of a proton. Hence, the equilibria (6.1b) and (6.3b) and the corresponding equilibrium constants [Eqs. (6.2) and (6.4)] reflect one and the same process: the protonation/deprotonation equilibrium. In
this sense, we are going to use neither $K_b$ nor $k_b$ for characterising the protonation/deprotonation equilibria. Moreover, there is no need for the subscript a either, so instead of $K_a$ we will use the notation $K$ for the dissociation constants.

6.2.2 Henderson-Hasselbalch equation
Taking the negative logarithm of both sides of Eq. (6.2) one obtains

$$-\lg K = -\lg\frac{[\mathrm{A^-}][\mathrm{H^+}]}{[\mathrm{AH}]} \quad\text{or}\quad -\lg K = -\lg[\mathrm{H^+}] - \lg\frac{[\mathrm{A^-}]}{[\mathrm{AH}]}.$$

By definition $\mathrm{p}K = -\lg K$ and $\mathrm{pH} = -\lg[\mathrm{H^+}]$, so that we can also write

$$\mathrm{p}K = \mathrm{pH} - \lg\frac{[\mathrm{A^-}]}{[\mathrm{AH}]}.$$

This equation, reorganised as

$$\mathrm{pH} = \mathrm{p}K + \lg\frac{[\mathrm{A^-}]}{[\mathrm{AH}]}, \qquad (6.5)$$

is the Henderson-Hasselbalch equation, which is used for the preparation of buffer solutions and for the calculation of the charge of ionisable species in solution. Proteins, being ionisable species, are expected to obey the Henderson-Hasselbalch equation. We would like to see how and to what extent this equation can be applied to the complex problem of protonation/deprotonation equilibria in proteins. Before this, two aspects of the Henderson-Hasselbalch equation deserve comment. The compound AH is by definition an acid. Its deprotonated form, $\mathrm{A^-}$, can be considered a base because it is able to take up a proton, i.e. $\mathrm{A^-}$ is the conjugate base of AH. The same is valid for the base B: its conjugate acid is the protonated form $\mathrm{BH^+}$. Taking this into account, the Henderson-Hasselbalch equation can be written as
$$\mathrm{pH} = \mathrm{p}K + \lg\frac{[\text{conjugate base}]}{[\text{conjugate acid}]}.$$

In terms of protonation/deprotonation equilibria, we can also write

$$\mathrm{pH} = \mathrm{p}K + \lg\frac{[\text{deprotonated}]}{[\text{protonated}]}.$$

These equations describe the ability of a given compound to donate or accept a proton and are valid for both acids and bases.

When writing and deriving equations, errors inevitably occur. There are a few rules for checking for errors, if they are just lapsus calami. The simplest one is to check whether the dimension of the expressed quantity makes sense. In this context the Henderson-Hasselbalch equation seemingly contains an error, because pH, being the negative logarithm of the hydrogen ion concentration, would have the dimension of the logarithm of a concentration. Of course, such a dimension does not make sense. It is important to note that pH is defined as

$$\mathrm{pH} = -\lg\frac{[\mathrm{H^+}]}{[\mathrm{H^+}]^\circ},$$

where $[\mathrm{H^+}]^\circ$ is a reference hydrogen ion concentration. In this way pH becomes a dimensionless quantity and the Henderson-Hasselbalch equation is correct. By convention, this reference concentration is taken to be $[\mathrm{H^+}]^\circ = 1$ mol/L. If we chose another reference concentration, for instance the concentration of hydrogen ions in pure water, $10^{-7}$ mol/L, the pH of pure water would be just zero. Acidic pH would then have negative values, whereas alkaline pH would have positive values. Such a standard state may have some advantages; for instance, the value of zero coincides with the intuitive understanding of "neutral". On the other hand, the standard reference concentration $[\mathrm{H^+}]^\circ = 1$ mol/L makes pH directly indicate the actual concentration of hydrogen ions in solution, which is a stronger advantage.
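The two points made above, that Eq. (6.5) fixes the ratio of deprotonated to protonated species at a given pH and that the numerical value of pH depends on the chosen reference concentration, can be checked with a few lines of Python. The function names are hypothetical; concentrations are in mol/L.

```python
import math


def ph_from_concentration(h_conc, h_ref=1.0):
    """pH = -lg([H+]/[H+]°), computed as lg([H+]°/[H+]); h_ref = 1 mol/L by convention."""
    return math.log10(h_ref / h_conc)


def deprotonated_to_protonated_ratio(ph, pk):
    """Ratio [deprotonated]/[protonated] from the Henderson-Hasselbalch equation, Eq. (6.5)."""
    return 10.0 ** (ph - pk)


# pure water with the conventional reference and with a hypothetical reference of 1e-7 mol/L
print(ph_from_concentration(1e-7), ph_from_concentration(1e-7, h_ref=1e-7))
# an acidic solution becomes "negative" on the hypothetical scale
print(ph_from_concentration(1e-5, h_ref=1e-7))

# one pH unit above the pK the deprotonated form dominates tenfold; one unit below, the reverse
print(deprotonated_to_protonated_ratio(5.0, 4.0), deprotonated_to_protonated_ratio(3.0, 4.0))
```

Only the ratio of concentrations enters the logarithm, which is exactly why the reference concentration makes pH dimensionless.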
6.2.3 Degree of deprotonation and degree of protonation
The Henderson-Hasselbalch equation as expressed in Eq. (6.5) does not suit our purposes: the ratio in the logarithmic term on the right hand side contains mutually dependent variables. It is more convenient to use variables that are independent of each other. For this purpose we introduce the degree of dissociation, $\alpha$, which is obviously also the degree of deprotonation. The degree of deprotonation is defined as the ratio of the concentration of the deprotonated species to the total concentration:

$$\alpha = \frac{[\text{deprotonated}]}{[\text{total}]},$$

where [total] is the sum of the concentrations of the protonated and deprotonated forms and is, of course, a constant. In the case of Eq. (6.5) the degree of deprotonation is

$$\alpha = \frac{[\mathrm{A^-}]}{[\mathrm{AH}] + [\mathrm{A^-}]}.$$

We can start rearranging the above equation with $\alpha[\mathrm{AH}] + \alpha[\mathrm{A^-}] = [\mathrm{A^-}]$ and

$$\alpha + \alpha\frac{[\mathrm{A^-}]}{[\mathrm{AH}]} = \frac{[\mathrm{A^-}]}{[\mathrm{AH}]},$$

or finally

$$\frac{\alpha}{1-\alpha} = \frac{[\mathrm{A^-}]}{[\mathrm{AH}]}.$$

The right hand side of the above equation is just the ratio in the logarithm in Eq. (6.5). Substituting it, we obtain another form of the Henderson-Hasselbalch equation:

$$\mathrm{pH} = \mathrm{p}K + \lg\frac{\alpha}{1-\alpha}.$$

Rearranging once again gives the form of the Henderson-Hasselbalch equation that expresses the degree of deprotonation:
$$\alpha = \frac{10^{(\mathrm{pH}-\mathrm{p}K)}}{1 + 10^{(\mathrm{pH}-\mathrm{p}K)}}. \qquad (6.6)$$
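Equation (6.6) and its counterpart for the degree of protonation, $\theta$ (derived in the next paragraphs), are easy to evaluate numerically. The sketch below, with hypothetical function names, tabulates $\alpha$ and $\theta$ around pK = 7.3 and checks two properties used in the text: $\alpha = 1/2$ at pH = pK, and the midpoint is the inflection point, where the slope reaches its maximum value. Differentiating Eq. (6.6) gives $d\alpha/d\mathrm{pH} = \ln 10\,\alpha(1-\alpha)$, i.e. $\ln 10/4 \approx 0.576$ per pH unit at the midpoint.

```python
import math


def degree_of_deprotonation(ph, pk):
    """Eq. (6.6): alpha = 10**(pH - pK) / (1 + 10**(pH - pK))."""
    x = 10.0 ** (ph - pk)
    return x / (1.0 + x)


def degree_of_protonation(ph, pk):
    """theta = 10**(pK - pH) / (1 + 10**(pK - pH)); always equal to 1 - alpha."""
    x = 10.0 ** (pk - ph)
    return x / (1.0 + x)


pk = 7.3
for ph in (5.3, 6.3, 7.3, 8.3, 9.3):
    a = degree_of_deprotonation(ph, pk)
    t = degree_of_protonation(ph, pk)
    print(f"pH {ph:4.1f}: alpha = {a:.3f}, theta = {t:.3f}")

# central-difference slope at pH = pK compared with the analytical value ln(10)/4
eps = 1e-5
slope = (degree_of_deprotonation(pk + eps, pk) - degree_of_deprotonation(pk - eps, pk)) / (2 * eps)
print(f"slope at pK: {slope:.4f}  (ln(10)/4 = {math.log(10) / 4:.4f})")
```

One pH unit away from the pK the group is already about 91% (or 9%) deprotonated, which is why a single titratable group changes its charge over a window of roughly two pH units.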
The pH dependence of $\alpha$ has a sigmoidal shape, as illustrated in Fig. 6.1; its midpoint, $\alpha = 1/2$ (which is at the same time the inflection point), corresponds to pH = pK. This feature is widely used in the analysis of pH-dependent properties of proteins. For instance, it has been used to correlate the experimental observation shown in Fig. 6.1B with the ionisation behaviour of a putative titratable group with a pK of 7.3.

Sometimes it is more convenient to use the degree of protonation or, what is the same, the degree of association, $\theta$. An expression for $\theta$ is derived in the same way. Working with the association constant

$$k = \frac{[\mathrm{AH}]}{[\mathrm{A^-}][\mathrm{H^+}]},$$

we write the Henderson-Hasselbalch equation as

$$\mathrm{pH} = -\lg\frac{[\mathrm{AH}]}{[\mathrm{A^-}]} - \mathrm{p}k.$$

The degree of protonation is then defined by the ratio

$$\theta = \frac{[\text{protonated}]}{[\text{total}]} \quad\text{or}\quad \theta = \frac{[\mathrm{AH}]}{[\mathrm{AH}] + [\mathrm{A^-}]}.$$

After performing the same reorganisation as above we obtain

$$\mathrm{pH} = -\mathrm{p}k - \lg\frac{\theta}{1-\theta} \quad\text{or}\quad \theta = \frac{10^{(-\mathrm{pH}-\mathrm{p}k)}}{1 + 10^{(-\mathrm{pH}-\mathrm{p}k)}}.$$

Since $k = 1/K$, we have $\mathrm{p}K = -\mathrm{p}k$. Because we have decided to work with pK, we substitute it for pk in the above expression:

$$\theta = \frac{10^{(\mathrm{p}K-\mathrm{pH})}}{1 + 10^{(\mathrm{p}K-\mathrm{pH})}}.$$
The last equation relates the degree of protonation to the dissociation constant. Some researchers prefer this equation when analysing the pH dependence of chemical shifts observed in NMR experiments.

6.2.4 Ionisation equilibrium constants of model compounds
We have already seen from Eqs. (6.5) and (6.6) that if the concentrations of the conjugate base and acid are equal, i.e. $\alpha = 1/2$, then pH = pK. In other words, pK is a characteristic of a given acid or base, equal to the pH at which this acid or base is half dissociated. The titratable groups in proteins differ in their ability to release or bind hydrogen ions, so pK values are an appropriate parameter for distinguishing the different types of titratable groups. If their pK values are known, the Henderson-Hasselbalch equation gives the ratio of protonated and deprotonated species at any pH. In this way the task of finding the charge constellation of a protein molecule at a given pH seems to be solved. This, however, is not the case because, as we have already mentioned, the protonation/deprotonation equilibria of the individual titratable sites depend on the environment. As an example of this dependence we can refer to the data in Table 6.2, which lists the pK values of the standard amino acids. The variation of the pK values of the side chains is attributed to differences in the chemical nature of the titratable groups. Hence, these pK values can be considered properties of the distinct titratable groups. However, it can also be seen that the α-carboxyl groups, which have the same chemical composition, are characterised by different pK values. One can notice that the α-carboxyl groups belonging to amino acids with positively charged side chains have the lowest pK values. This shows that pK values depend on electrostatic interactions, an observation that is not surprising, given that pK values refer to ionisation equilibria.
For us, the important point is that pK values depend on the environment. Still, the Henderson-Hasselbalch equation is applicable, because this environmental influence is inherent to a given type of amino acid. In
proteins this influence may vary, a feature that is not reflected by Eqs. (6.5) and (6.6).

Table 6.2 Values of pK of the standard amino acids found in proteins.
Amino acid | α-Carboxyl group | α-Amino group | Side chain
ala | 2.4 | 9.9 | -
arg | 1.8 | 9.0 | 12.5
asn | 2.1 | 8.8 | -
asp | 2.0 | 9.9 | 3.9
cys | 1.9 | 10.8 | 8.3
gln | 2.2 | 9.1 | -
glu | 2.1 | 9.5 | 4.1
gly | 2.4 | 9.8 | -
his | 1.8 | 9.2 | 6.0
ile | 2.3 | 9.8 | -
leu | 2.3 | 9.7 | -
lys | 2.2 | 9.2 | 10.8
met | 2.1 | 9.3 | -
phe | 2.2 | 9.2 | -
pro | 2.0 | 10.6 | -
ser | 2.2 | 9.2 | -
thr | 2.1 | 9.1 | -
trp | 2.4 | 9.4 | -
tyr | 2.2 | 9.1 | 10.1
val | 2.2 | 9.7 | -
In proteins, the titratable groups in the α-position are not present (except at the chain termini), so their electrostatic influence on the ionisation equilibria of the side chains is absent as well. This makes the pK values listed in Table 6.2 inappropriate for the analysis of the ionisation properties of proteins. Compounds that provide unperturbed pK values are those in which the titratable groups in the α-position are blocked, such as the series of acetyl-X-amides, where X stands for the various titratable side chains. The equilibrium constants, $\mathrm{p}K_{\mathrm{mod}}$, of the titratable side chains in different model compounds are listed in Table 6.3. One can notice that a variety of $\mathrm{p}K_{\mathrm{mod}}$ values is used to represent the ionisation equilibrium of each titratable side chain. Different researchers have different arguments for choosing different model compounds and hence use different $\mathrm{p}K_{\mathrm{mod}}$ values.
Table 6.3 Values of pK ($\mathrm{p}K_{\mathrm{mod}}$) of the titratable amino acid side chains in different model compounds.
Amino acid | Titratable group | pKmod
Asp | β-COOH ⇌ β-COO⁻ + H⁺ | 4.0
Glu | γ-COOH ⇌ γ-COO⁻ + H⁺ | 4.4, 4.5
His | imidazolium ⇌ imidazole + H⁺ | 6.3, 6.4
His (tautomers) | imidazolium ⇌ individual neutral tautomers + H⁺ | 6.6, 7.0
His (2nd deprotonation) | imidazole ⇌ imidazolate + H⁺ | 10.8, 14.0
Cys | -SH ⇌ -S⁻ + H⁺ | 8.3, 9.0, 9.5
Tyr | -OH ⇌ -O⁻ + H⁺ (phenolic) | 9.6, 10.0
Lys | ε-NH₃⁺ ⇌ ε-NH₂ + H⁺ | 10.4
Arg | -NHC(=NH₂⁺)NH₂ ⇌ -NHC(=NH)NH₂ + H⁺ | 12.0
N-terminus | α-NH₃⁺ ⇌ α-NH₂ + H⁺ | 7.5, 8.0, 8.2
C-terminus | α-COOH ⇌ α-COO⁻ + H⁺ | 3.6, 3.67
Polar groups: Ser, Thr | -OH ⇌ -O⁻ + H⁺ | 16.0
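If the titratable groups behaved independently and kept their model-compound pK values, the Henderson-Hasselbalch equation would immediately give the net charge of a protein at any pH. The sketch below does exactly this for a hypothetical peptide, using one value per group from Table 6.3 (N-terminus 7.5, C-terminus 3.6, Glu 4.4, His 6.3, Lys 10.4); the composition and the function names are invented for illustration. It deliberately ignores factors 2 to 5 of Table 6.1, which is precisely the simplification the rest of this chapter sets out to remove.

```python
def net_charge(ph, groups):
    """Net charge from independent Henderson-Hasselbalch sites.

    groups: iterable of (pK, charge_protonated, charge_deprotonated, count).
    """
    total = 0.0
    for pk, z_prot, z_deprot, count in groups:
        alpha = 1.0 / (1.0 + 10.0 ** (pk - ph))  # degree of deprotonation, Eq. (6.6)
        total += count * ((1.0 - alpha) * z_prot + alpha * z_deprot)
    return total


def isoelectric_point(groups, lo=0.0, hi=14.0, tol=1e-6):
    """pH of zero net charge by bisection (net charge decreases monotonically with pH)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(mid, groups) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# hypothetical peptide; pKmod values picked from Table 6.3
peptide = [
    (7.5, +1, 0, 1),   # N-terminus
    (3.6, 0, -1, 1),   # C-terminus
    (4.4, 0, -1, 2),   # Glu side chains
    (6.3, +1, 0, 1),   # His side chain
    (10.4, +1, 0, 3),  # Lys side chains
]
for ph in (3.0, 7.0, 11.0):
    print(f"pH {ph:4.1f}: net charge = {net_charge(ph, peptide):+.2f}")
print(f"estimated pI = {isoelectric_point(peptide):.2f}")
```

Because each site is treated in isolation, the choice among the alternative $\mathrm{p}K_{\mathrm{mod}}$ values in Table 6.3 translates directly into the estimated charges.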
6.3 Factors Determining Ionisation Equilibria in Proteins
In this section we return to Table 6.1 and consider the different factors influencing the ionisation equilibria of the individual titratable groups in proteins. The first factor in this table is determined by the chemical nature of the titratable group. We have already taken it into account by introducing $\mathrm{p}K_{\mathrm{mod}}$. Factors 2 to 4 in Table 6.1 reflect the interactions of the titratable group with its environment. Our goal is to evaluate the energies of these interactions and their impact on the ionisation equilibria. Therefore it is more convenient to work with the corresponding energies rather than with the equilibrium constants. For the reaction $\mathrm{AH} \rightleftharpoons \mathrm{A^-} + \mathrm{H^+}$ the energy of dissociation is given by

$$\Delta G_{\mathrm{diss}} = \Delta G^\circ + RT\ln\frac{[\mathrm{A^-}][\mathrm{H^+}]}{[\mathrm{AH}]}.$$

At equilibrium $\Delta G_{\mathrm{diss}} = 0$ and the concentration ratio equals $K$, so that $\Delta G^\circ = -RT\ln K$. To relate the standard free energy, $\Delta G^\circ$, to the commonly used quantity pK, we rewrite this as $\Delta G^\circ = 2.3RT\,\mathrm{p}K$, where we have simply changed the base of the logarithm: $\ln K = \ln 10 \times \lg K = -\ln 10 \times \mathrm{p}K \approx -2.303\,\mathrm{p}K$. Under non-standard conditions, and if other factors influence the dissociation equilibrium, the general expression for the free energy of the reaction is

$$\Delta G_{\mathrm{p}\to\mathrm{d}} = \Delta G^\circ - 2.3RT\,\mathrm{pH} + \Delta G_{\mathrm{env}}. \qquad (6.8)$$
We write $\Delta G_{\mathrm{p}\to\mathrm{d}}$ instead of $\Delta G_{\mathrm{diss}}$ to stress that we consider the process of deprotonation. The term $\Delta G_{\mathrm{env}}$ accounts for the factors that cause deviations from the standard conditions. We assume that these are factors 2 to 4 from Table 6.1 and introduce them as additive terms constituting $\Delta G_{\mathrm{env}}$:
$$\Delta G_{\mathrm{env}} = \Delta G_{\mathrm{Born}} + \Delta G_{\mathrm{pc}} + \Delta G_{\mathrm{cc}}. \qquad (6.9)$$
In the above equation $\Delta G_{\mathrm{Born}}$ is the desolvation energy (factor 2), $\Delta G_{\mathrm{pc}}$ is the electrostatic influence of the permanent protein charges (factor 3), and $\Delta G_{\mathrm{cc}}$ is the influence of the charge-charge interactions between titratable groups (factor 4).

[Plot: $\alpha$ versus pH]
Figure 6.2 Examples of ionisation curves. Curve a corresponds to ionisation according to Eq. (6.6). Curve b is the ionisation of a titratable group for which $\Delta G_{\mathrm{env}} \neq 0$. The dashed curve is curve b shifted so that pK(curve a) and pK(curve b) coincide. In this way the broadening of curve b is clearly seen.
The most general manifestation of these factors is a shift of the pK value and a broadening of the ionisation curve of the titratable group (Fig. 6.2). The ionisation curve broadens if $\Delta G_{\mathrm{env}}$ depends on pH. In this case the ionisation equilibrium constant and the corresponding pK become pH-dependent as well. Therefore, the pK value read off at $\alpha = 1/2$ is often referred to as $\mathrm{p}K_{1/2}$.
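Combining Eq. (6.8), with $\Delta G^\circ = 2.3RT\,\mathrm{p}K_{\mathrm{mod}}$, and Eq. (6.6) gives $\alpha = 1/[1 + \exp(\Delta G_{\mathrm{p}\to\mathrm{d}}/RT)]$, i.e. a curve of the form of Eq. (6.6) with an apparent pK equal to $\mathrm{p}K_{\mathrm{mod}} + \Delta G_{\mathrm{env}}/(2.3RT)$. The sketch below (hypothetical names; energies in kJ/mol; $T$ = 298.15 K assumed; $\ln 10$ used instead of 2.3) finds $\mathrm{p}K_{1/2}$ and the width of the ionisation curve by bisection. A pH-independent $\Delta G_{\mathrm{env}}$ only shifts the curve; the linear, pH-dependent $\Delta G_{\mathrm{env}}$ in the usage example is purely hypothetical and is chosen to reproduce the kind of broadening shown in Fig. 6.2.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)
T = 298.15          # temperature in K (assumed)
ONE_PK_UNIT = math.log(10.0) * R * T  # about 5.7 kJ/mol


def dg_deprotonation(ph, pk_mod, dg_env):
    """Eq. (6.8) with dG° = ln(10)*R*T*pK_mod; all energies in kJ/mol."""
    return math.log(10.0) * R * T * (pk_mod - ph) + dg_env


def alpha(ph, pk_mod, dg_env_fn):
    """Degree of deprotonation; dg_env_fn(pH) returns dG_env in kJ/mol."""
    return 1.0 / (1.0 + math.exp(dg_deprotonation(ph, pk_mod, dg_env_fn(ph)) / (R * T)))


def ph_at_alpha(target, pk_mod, dg_env_fn, lo=-5.0, hi=20.0, tol=1e-9):
    """Bisection for the pH where alpha = target (assumes alpha increases with pH)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if alpha(mid, pk_mod, dg_env_fn) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def width(pk_mod, dg_env_fn):
    """pH interval between alpha = 0.1 and alpha = 0.9 (about 1.91 for an ideal curve)."""
    return ph_at_alpha(0.9, pk_mod, dg_env_fn) - ph_at_alpha(0.1, pk_mod, dg_env_fn)


def no_env(ph):
    return 0.0


def constant_env(ph):
    return ONE_PK_UNIT  # pH-independent: shifts the curve by one pK unit


def linear_env(ph):
    return 1.0 + 1.5 * (ph - 4.0)  # hypothetical pH-dependent dG_env in kJ/mol


for name, fn in [("none", no_env), ("constant", constant_env), ("linear", linear_env)]:
    print(f"{name:9s} pK1/2 = {ph_at_alpha(0.5, 4.0, fn):5.2f}  width = {width(4.0, fn):4.2f}")
```

A $\Delta G_{\mathrm{env}}$ that grows with pH, as it would if the deprotonation of neighbouring acidic groups made further deprotonation harder, stretches the curve, while a constant term leaves its width unchanged.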
6.3.1 Desolvation
The ionisation of a titratable group can be treated as a charging process. We have already stressed that the energy needed to create a charge depends on the dielectric properties of the environment. The difference between the energy needed to create a charge on a titratable group when it is part of a model compound and when it is at its location in the protein molecule is the desolvation energy we need to evaluate. We will do this using the idea of the Born model, and we therefore call this energy the Born energy, $\Delta G_{\mathrm{Born}}$. Because the environments of the individual titratable sites in a protein differ, the corresponding values of $\Delta G_{\mathrm{Born}}$ differ as well. This, of course, leads to different contributions to $\Delta G_{\mathrm{env}}$ and hence to different pK values.

6.3.1.1 Born energy
The change of the ionisation equilibrium of a given titratable group can be evaluated by means of the thermodynamic cycle shown in Fig. 6.3.
Figure 6.3 Thermodynamic cycle used for the calculation of the energy of deprotonation, $\Delta G^{\mathrm{P}}_{\mathrm{p}\to\mathrm{d}}$, of a given titratable group in the protein molecule.
Each titratable group is treated as part of an appropriate model compound, which is transferred from solution (upper part of Fig. 6.3) to its place in the protein (lower part of Fig. 6.3). This transfer is performed for both forms of the titratable group: the protonated (left hand side of the figure) and the deprotonated (right hand side) form. We are interested in the deprotonation energy when the titratable group is at its location in the protein molecule. According to the thermodynamic cycle it can be expressed as
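$$\Delta G^{\mathrm{P}}_{\mathrm{p}\to\mathrm{d}} = \Delta G^{\mathrm{S}}_{\mathrm{p}\to\mathrm{d}} + \Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{d}} - \Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{p}}.$$

In the above expression $\Delta G^{\mathrm{S}}_{\mathrm{p}\to\mathrm{d}}$ is given by Eq. (6.8) and can be written as $\Delta G^{\mathrm{S}}_{\mathrm{p}\to\mathrm{d}} = 2.3RT(\mathrm{p}K_{\mathrm{mod}} - \mathrm{pH})$. In this way we introduce the measurable quantity $\mathrm{p}K_{\mathrm{mod}}$, and thereby the pH dependence of $\Delta G^{\mathrm{P}}_{\mathrm{p}\to\mathrm{d}}$:

$$\Delta G^{\mathrm{P}}_{\mathrm{p}\to\mathrm{d}} = 2.3RT(\mathrm{p}K_{\mathrm{mod}} - \mathrm{pH}) + \Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{d}} - \Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{p}}. \qquad (6.10)$$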
The last two terms on the right hand side of the equation do not depend on pH. These two terms define the Born energy:

$$\Delta G_{\mathrm{Born}} = \Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{d}} - \Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{p}},$$

where $\Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{d}}$ and $\Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{p}}$ are the energies of transfer of the model compound in its deprotonated and protonated form, respectively. Note that the Born energy refers to the deprotonation of a given group rather than to its ionisation. For groups that become ionised upon protonation (see Table 6.3) the Born energy has the opposite sign. The principle of calculating the energies $\Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{d}}$ and $\Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{p}}$ was considered in the previous chapter: we need to calculate the difference in the self energies of the titratable group in the two environments. Let us consider one of the forms of a titratable group, say the protonated form of aspartic acid. This group has zero net charge; however, it is characterised by a certain distribution of discrete partial atomic charges, schematically illustrated in Fig. 6.4. The electrostatic energy of this distribution can be calculated by Eq. (5.18) or by Eq. (5.19). It consists of the self energies of the individual charges as well as of the electrostatic interactions between them. We refer to this energy as the self energy of the titratable group.
Figure 6.4 Transfer of the charge constellation of a model compound from water to its location in the protein molecule.
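Let us denote the distribution of discrete charges of the titratable group in the protonated state by $\rho_{\mathrm{p}}$. For the desolvation energy we can write

$$\Delta G^{\mathrm{S}\to\mathrm{P}}_{\mathrm{p}} = \frac{1}{2}\sum_k q_{\mathrm{p}}(k)\left[\varphi^{\mathrm{P}}(\rho_{\mathrm{p}},k) - \varphi^{\mathrm{S}}(\rho_{\mathrm{p}},k)\right]. \qquad (6.11)$$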
The summation index k runs over all partial charges in the distribution $\rho_p$, and $q_p(k)$ is the k-th partial charge (see Fig. 6.4). The potentials $\varphi^{\mathrm{S}}(\rho_p,k)$ and $\varphi^{\mathrm{P}}(\rho_p,k)$ are the potentials created by the whole charge distribution $\rho_p$ at the position of charge k. The superscript S stands for the case when the titratable group belongs to the model compound free in solution, whereas P indicates that the titratable group is at its location in the protein molecule. Equation (6.11) is very similar to Eq. (5.20). Here, however, we drop the assumption that the charges in the solvent environment do not interact. We do not need this assumption because we are interested in the transfer energy of the charge distribution as a whole, rather than in the energy needed to bring the individual charges from infinity to their positions in the protein molecule. The example illustrated in Fig. 6.4 was deliberately chosen with a neutral form of the titratable group, in order to stress the fact
that the transfer energy is non-zero even for charge distributions with zero net charge. The same considerations are valid for the deprotonated state of the titratable group. We denote the charge distribution in this state by $\rho_d$. The distribution $\rho_d$ differs from $\rho_p$ not only by the missing proton charge: the release of the proton may also displace the electron cloud and change the polarisation of the chemical bonds, which changes the atomic partial charges of the compound. Therefore the desolvation energies of the deprotonated and protonated forms must be calculated separately, which means that we have to rewrite Eq. (6.11) for the deprotonated form:
$$\Delta G_{d}^{\mathrm{S}\to\mathrm{P}} = \frac{1}{2}\sum_{k} q_d(k)\left[\varphi^{\mathrm{P}}(\rho_d,k) - \varphi^{\mathrm{S}}(\rho_d,k)\right]. \qquad (6.12)$$
The difference between Eqs. (6.12) and (6.11) gives the desired Born energy:
$$\Delta G_{\rm Born} = \frac{1}{2}\sum_{k} q_d(k)\left[\varphi^{\mathrm{P}}(\rho_d,k) - \varphi^{\mathrm{S}}(\rho_d,k)\right] - \frac{1}{2}\sum_{k} q_p(k)\left[\varphi^{\mathrm{P}}(\rho_p,k) - \varphi^{\mathrm{S}}(\rho_p,k)\right]. \qquad (6.13)$$
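Once the four sets of potentials are available, Eq. (6.13) is a plain weighted sum. The sketch below is a minimal illustration, assuming that the potentials (in kJ mol^-1 per proton charge) have already been computed, for instance by the finite difference method on identical grids, and are supplied as lists indexed in the same order as the charges. The function names are hypothetical.

```python
def transfer_energy(charges, phi_protein, phi_solvent):
    """(1/2) * sum_k q(k) * [phi^P(rho, k) - phi^S(rho, k)], cf. Eqs. (6.11), (6.12)."""
    if not (len(charges) == len(phi_protein) == len(phi_solvent)):
        raise ValueError("charges and potentials must have the same length")
    return 0.5 * sum(q * (p - s) for q, p, s in zip(charges, phi_protein, phi_solvent))


def born_energy(q_d, phi_P_d, phi_S_d, q_p, phi_P_p, phi_S_p):
    """Eq. (6.13): Delta G_Born = Delta G_d^{S->P} - Delta G_p^{S->P}."""
    return transfer_energy(q_d, phi_P_d, phi_S_d) - transfer_energy(q_p, phi_P_p, phi_S_p)
```

Keeping the two transfer energies as separate calls mirrors the text: the protonated and deprotonated charge sets (for example the two columns of Table 6.4) generally differ in more than the proton charge, so each needs its own potentials.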
6.3.1.2 Calculation of the Born energy
Equation (6.13) contains all the information needed to calculate $\Delta G_{\rm Born}$. All we need is to calculate the electrostatic potentials for the protonated and deprotonated forms of the titratable group in the two environments, provided that the charge distributions are known (Table 6.4). In proteins, the titratable groups can be involved in interactions that may cause additional delocalisation of the electron clouds and hence differences in the values of the partial charges. These effects are assumed to be small and are ignored. We will illustrate the calculation of the Born energy using the example shown in Fig. 5.15 of the previous chapter. The protein molecule is situated in the initial computational box I. The source of the electrostatic potential is the charge distribution of the titratable amino acid side chain of interest. After a series of focusing steps we reach the final computational box F2. In this box we calculate the values of the potential $\varphi^{\mathrm{P}}(\rho_d,k)$. The calculations of the potential
Table 6.4 Partial atomic charges (in proton units) of the titratable groups in the protonated, deprotonated and tautomeric forms.

Amino acid  Atom    Protonated  Deprotonated  Tautomer
Asp         Cβ        -0.03       -0.10
            Cγ         0.75        0.62
            Oδ1       -0.55       -0.76
            Oδ2       -0.61       -0.76
            Hδ         0.44        0.00
Glu         Cγ        -0.03       -0.10
            Cδ         0.75        0.62
            Oε1       -0.55       -0.76
            Oε2       -0.61       -0.76
            Hε         0.44        0.00
His         Cβ         0.13        0.10          0.09
            Cγ         0.19        0.22         -0.05
            Nδ1       -0.51       -0.70         -0.36
            Hδ1        0.44        0.00          0.32
            Cδ2        0.19       -0.05          0.22
            Hδ2        0.13        0.09          0.10
            Cε1        0.32        0.25          0.25
            Hε1        0.18        0.13          0.13
            Nε2       -0.51       -0.36         -0.70
            Hε2        0.44        0.32          0.00
Tyr         Cδ1       -0.12       -0.16
            Hδ1        0.12        0.09
            Cδ2       -0.12       -0.16
            Hδ2        0.12        0.09
            Cε1       -0.12       -0.17
            Hε1        0.12        0.09
            Cε2       -0.12       -0.17
            Hε2        0.12        0.09
            Cζ         0.10       -0.05
            Oη        -0.54       -0.65
            Hη         0.44        0.00
Cys         Cβ         0.07        0.07
            Sγ        -0.23       -0.23
            Hγ         0.16        0.00
Lys         Cε         0.31        0.00
            Nζ        -0.30       -0.30
            Hζ1        0.33        0.00
            Hζ2        0.33        0.15
            Hζ3        0.33        0.15
Arg         Cδ         0.30        0.20          0.20
            Nε        -0.64       -0.70         -0.70
            Hε         0.44        0.44          0.44
            Cζ         0.70        0.40          0.40
            Nη1       -0.80       -0.64         -0.90
            1Hη1       0.45        0.30          0.45
            2Hη1       0.45        0.00          0.45
            Nη2       -0.80       -0.90         -0.64
            1Hη2       0.45        0.45          0.30
            2Hη2       0.45        0.45          0.00
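A quick consistency check on a charge table of this kind is that the partial charges of each form add up to the expected net charge: 0 and -1 for the acidic groups, +1 and 0 for the basic ones. The sketch below transcribes part of Table 6.4 into a dictionary (the layout is our own choice) and sums each column.

```python
# Partial charges transcribed from Table 6.4: (protonated, deprotonated)
TABLE_6_4 = {
    "Asp": ([-0.03, 0.75, -0.55, -0.61, 0.44], [-0.10, 0.62, -0.76, -0.76, 0.00]),
    "Glu": ([-0.03, 0.75, -0.55, -0.61, 0.44], [-0.10, 0.62, -0.76, -0.76, 0.00]),
    "His": ([0.13, 0.19, -0.51, 0.44, 0.19, 0.13, 0.32, 0.18, -0.51, 0.44],
            [0.10, 0.22, -0.70, 0.00, -0.05, 0.09, 0.25, 0.13, -0.36, 0.32]),
    "Tyr": ([-0.12, 0.12, -0.12, 0.12, -0.12, 0.12, -0.12, 0.12, 0.10, -0.54, 0.44],
            [-0.16, 0.09, -0.16, 0.09, -0.17, 0.09, -0.17, 0.09, -0.05, -0.65, 0.00]),
    "Lys": ([0.31, -0.30, 0.33, 0.33, 0.33], [0.00, -0.30, 0.00, 0.15, 0.15]),
    "Arg": ([0.30, -0.64, 0.44, 0.70, -0.80, 0.45, 0.45, -0.80, 0.45, 0.45],
            [0.20, -0.70, 0.44, 0.40, -0.64, 0.30, 0.00, -0.90, 0.45, 0.45]),
}


def net_charges(table):
    """Return {residue: (net charge protonated, net charge deprotonated)}.
    Sums are rounded to 2 decimals; adding 0.0 turns a possible -0.0 into 0.0."""
    return {name: (round(sum(p), 2) + 0.0, round(sum(d), 2) + 0.0)
            for name, (p, d) in table.items()}


for name, (q_prot, q_deprot) in net_charges(TABLE_6_4).items():
    print(f"{name}: protonated {q_prot:+.2f}, deprotonated {q_deprot:+.2f}")
```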
The origin of the grid energy is the following. According to Eq. (5.12), the self energy of a charge evenly distributed on a sphere is inversely proportional to the radius of the sphere, R, and becomes infinitely large when $R \to 0$. In other words, the self energy of a point charge is undefined. In the finite difference method the atomic charges are distributed onto the grid points (the charge map). Hence, we have a set of point charges whose (undefined) self energy has to be calculated. The self energies calculated by the finite difference method are nevertheless finite, because the electrostatic potential is an average over a cubic element of finite size. The self energy increases and becomes infinitely large when the grid spacing tends to zero [see also Eq. (5.26)]. Hence, the values of the self energies depend on the grid spacing. However, we are interested in the difference of the self energies, rather than in their absolute values. Thus, when the calculations are run under identical computational conditions, as described above, the grid energies cancel when the potentials are subtracted in Eq. (6.13). The calculation of $\Delta G_{\rm Born}$ for a given group requires four runs of the finite difference calculation, corresponding to the four states of the
thermodynamic cycle shown in Fig. 6.3. In this respect the generalised Born model is considerably faster, which is why many researchers prefer it to finite difference calculations or other numerical methods.
6.3.2 Interactions with the protein permanent charges
We define the protein permanent charges as charges whose values do not change with pH. A typical example of protein permanent charges that can influence the ionisation equilibria of the titratable groups is the set of partial charges of the atoms of the polypeptide backbone (Table 6.5). In secondary structure elements, such as α-helices or β-sheets, the dipoles formed by these charges are organised, which leads to a local amplification of their electrostatic field and hence to an increase of the charge-dipole interactions. This effect is particularly pronounced at the ends of α-helical segments, where the backbone dipoles are oriented approximately parallel to each other and their fields are not compensated (see Fig. 5.19). Metal ions coordinated in the protein molecule are another type of permanent charge that can dramatically change the ionisation equilibria of the titratable groups.
Table 6.5 Example of partial charges (in proton units, p.u.) of the atoms of the polypeptide backbone.
Atom      N      H      Cα     C      O
Charge   -0.47   0.31   0.16   0.51  -0.51
The contribution of the permanent charges can be expressed as follows:
$$\Delta G_{\rm pc} = \sum_{k\in\{pc\}} q_{pc}(k)\left[\varphi^{\mathrm{P}}(\rho_d,k) - \varphi^{\mathrm{P}}(\rho_p,k)\right], \qquad (6.14)$$
where the summation is over the set of the protein permanent charges, {pc}, and $q_{pc}(k)$ is the k-th permanent charge in this set. The products $q_{pc}(k)\,\varphi^{\mathrm{P}}(\rho_d,k)$ and $q_{pc}(k)\,\varphi^{\mathrm{P}}(\rho_p,k)$
give the electrostatic energy of interaction between the charge $q_{pc}(k)$ and the set of partial charges of the titratable group of interest in its deprotonated and protonated forms, respectively. The values of these potentials at each point k belonging to the set {pc} can be taken from the finite difference calculation in the focused box F1, carried out during the calculation of the Born energy considered in the previous section.
6.3.3 Definition of intrinsic pK
The terms $\Delta G_{\rm Born}$ and $\Delta G_{\rm pc}$ are independent of pH. As we have already pointed out, their influence on the ionisation equilibria is manifested by a shift of the pK value of a given titratable group. We can relate this shift to the desolvation energy and to the energy of interaction with the protein permanent charges as follows:
$$\Delta pK_{\rm Born} = \frac{\Delta G_{\rm Born}}{2.3RT} \quad\text{and}\quad \Delta pK_{\rm pc} = \frac{\Delta G_{\rm pc}}{2.3RT},$$
respectively. Thus, Eq. (6.9) becomes
$$\Delta G_{\rm env} = 2.3RT\,\Delta pK_{\rm Born} + 2.3RT\,\Delta pK_{\rm pc} + \Delta G_{\rm cc}.$$
Substituting the above expression for $\Delta G_{\rm env}$ into Eq. (6.8) and combining the result with Eq. (6.10), we can write
$$\Delta G^{\mathrm{P}}_{p\to d} = 2.3RT\left(pK_{\rm mod} + \Delta pK_{\rm Born} + \Delta pK_{\rm pc} - \mathrm{pH}\right) + \Delta G_{\rm cc}. \qquad (6.15)$$
The above expression suggests that a more convenient quantity, the intrinsic pK, can be introduced:
$$pK_{\rm int} = pK_{\rm mod} + \Delta pK_{\rm Born} + \Delta pK_{\rm pc},$$
which, like $pK_{\rm mod}$, is pH independent. The difference is that $pK_{\rm int}$ already contains factors 2 and 3 from Table 6.1 influencing the ionisation equilibrium of a given titratable group in the protein molecule. The definition of $pK_{\rm int}$ was given by Tanford and Kirkwood3: it is the pK value that a titratable group in a protein molecule would have if there were no charge-charge interactions, i.e. if $\Delta G_{\rm cc} = 0$. Using $pK_{\rm int}$ we can write Eq. (6.15) in a shorter form:
$$\Delta G^{\mathrm{P}}_{p\to d} = 2.3RT\left(pK_{\rm int} - \mathrm{pH}\right) + \Delta G_{\rm cc}. \qquad (6.16)$$
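Taken together, Eq. (6.14) and the definition of the intrinsic pK amount to a few lines of arithmetic. The sketch below assumes energies in kJ/mol and T = 298.15 K, and uses ln 10 (about 2.303) for the factor written as 2.3 in the text. The Born energy is taken as an input (for instance from Eq. (6.13)), and the potentials in the usage line are made-up numbers for illustration only.

```python
import math

R_KJ = 8.314462618e-3   # gas constant, kJ mol^-1 K^-1


def two_point_three_rt(temperature=298.15):
    """The factor written 2.3RT in the text (ln 10 * R * T), in kJ/mol."""
    return math.log(10.0) * R_KJ * temperature


def permanent_charge_energy(q_pc, phi_P_d, phi_P_p):
    """Eq. (6.14): sum over permanent charges of q_pc(k)[phi^P(rho_d,k) - phi^P(rho_p,k)]."""
    return sum(q * (pd - pp) for q, pd, pp in zip(q_pc, phi_P_d, phi_P_p))


def intrinsic_pk(pk_mod, dG_born, dG_pc, temperature=298.15):
    """pK_int = pK_mod + Delta pK_Born + Delta pK_pc, with Delta pK = Delta G / 2.3RT."""
    f = two_point_three_rt(temperature)
    return pk_mod + dG_born / f + dG_pc / f


# Illustrative only: pK_mod = 4.0, Delta G_Born = +5.7 kJ/mol, and the backbone
# N and H charges of Table 6.5 placed in hypothetical potentials (kJ/mol per e).
dG_pc = permanent_charge_energy([-0.47, 0.31], [-2.0, -1.5], [0.0, 0.0])
print(round(intrinsic_pk(4.0, 5.7, dG_pc), 2))
```

A positive desolvation energy raises the intrinsic pK of an acidic group, in line with the sign convention used above.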
Equation (6.16) has the same form as Eq. (6.8). Here, the term $2.3RT\,pK_{\rm int}$ takes the place of (but is not identical to) the standard free energy $\Delta G^\circ$, and the influence of the environment is defined by the energy of the charge-charge interactions.
6.3.4 Charge-charge interactions
The charge-charge interactions are electrostatic interactions between the set of charges, {i}, belonging to titratable site i, and the set of charges, {j}, belonging to titratable site j (Fig. 6.5). Here we have started using the term titratable site instead of titratable group, because the individual titratable groups are in different environments and consequently have different intrinsic pK values. Thus, even if group i and group j are of the same kind, their intrinsic ionisation equilibria can be different, $pK_{{\rm int},i} \neq pK_{{\rm int},j}$, and they should be treated as different ionisable species. The principle of calculating the charge-charge interactions does not differ from that used in the previous section. We have to calculate the interaction energy of the charges $q_j(k)$ in the electrostatic potential created by the charge distribution $\rho_i$ of titratable site i. The electrostatic energy of interaction between these two sets can be written as
$$W_{ij} = \sum_{k\in\{j\}} \varphi^{\mathrm{P}}(\rho_i,k)\,q_j(k).$$
Figure 6.5 Charge-charge interactions between site i with charge distribution $\rho_{i,x_i}$ and site j with charge distribution $\rho_{j,x_j}$. The charges of site i define the set {i}, whereas the charges of site j define the set {j}.
The expression for $W_{ij}$ is missing important information: we did not specify the protonation states of the interacting titratable sites. In order to do this we introduce the variable $x_i$, which identifies the protonation state of a given titratable site i. In this way, the expression for the charge-charge interactions becomes
$$W_{ix_i,\,jx_j} = \sum_{k\in\{j\}} \varphi^{\mathrm{P}}(\rho_{i,x_i},k)\,q_{j,x_j}(k). \qquad (6.17)$$
The index $ix_i$ indicates the protonation state of site i, which can be protonated ($x_i = 0$) or deprotonated ($x_i = 1$). The same convention is valid for site j. These values of $x_i$ follow from our choice to work with the dissociation constant: they coincide with the values of the degree of deprotonation α in the protonated and deprotonated state, respectively. Thus, Eq. (6.17) gives the energy of interaction of site i in protonation state $x_i$ with site j in protonation state $x_j$. The term $\Delta G_{\rm cc}$ for site i can then be written as
$$\Delta G_{{\rm cc},i} = \sum_{j\neq i} W_{ix_i=1,\,jx_j} - \sum_{j\neq i} W_{ix_i=0,\,jx_j}. \qquad (6.18)$$
The sums in Eq. (6.18) run over all titratable sites in the protein except site i. It follows that the charge-charge interactions of site i with the other sites described by Eq. (6.18) correspond to only one set of the variables $x_j$, in other words, to only one protonation state of the protein. It also
follows that the energy of deprotonation of this group given by Eq. (6.16) is valid for only one protonation state of the protein. There is a large number of protonation states, each of which influences the ionisation equilibrium of the titratable site of interest differently, and we do not know which protonation state of the protein should be used in Eq. (6.18). Thus, in order to evaluate the energy of deprotonation of site i at a given pH, we need to know the protonation states, i.e. the values of $x_j$, of all other sites at this pH. It turns out that predicting the ionisation behaviour of site i requires predicting the ionisation behaviour of all other sites j.
6.4 Combinatorial Problem
A single protonation state can be described by the sequence (or the vector, as it is also called in the literature)
$$\mathbf{x} = (x_1, x_2, \ldots, x_i, x_j, \ldots, x_N),$$
where N is the total number of titratable sites in the protein molecule and the element $x_j$ determines the protonation state of the corresponding site j. As we have pointed out, Eq. (6.16) gives the energy of deprotonation of site i for a single set of protonation states of the rest of the titratable sites in the protein, determined by the sequence x. Taking this into account, we rewrite Eq. (6.16) as follows:
$$\Delta G_{i,p\to d}(\mathbf{x}) = 2.3RT\left(pK_{{\rm int},i} - \mathrm{pH}\right) + \sum_{j\neq i} W_{ix_i=1,\,jx_j} - \sum_{j\neq i} W_{ix_i=0,\,jx_j}. \qquad (6.19)$$
The above expression, as well as Eq. (6.16), originates from Eq. (6.10), deduced on the basis of the thermodynamic cycle (Fig. 6.3) relating the free energies of the transitions between the different states of the system considered in the cycle. These states are macroscopic states and should not be confused with the protonation states determined by x, which are microscopic states. Hence, after the introduction of the variables $x_i$ determining distinct protonation (microscopic) states of the protein molecule, the energy given by Eq. (6.19) loses the meaning of a free energy of deprotonation. It is rather the change of the energy of the system (the charge constellation created by the titratable sites at a given
pH) upon the transition from one microscopic state (site i protonated) to another microscopic state (site i deprotonated). This energy change can also be expressed as the difference between the energies corresponding to the two microscopic states:
$$\Delta G_{i,p\to d}(\mathbf{x}) = E_{i,d}(\mathbf{x}_{x_i=1}) - E_{i,p}(\mathbf{x}_{x_i=0}).$$
The energy $E_{i,p}$ consists of the energy of the charge-charge interactions of site i in its protonated form with the rest of the titratable groups. The energy corresponding to the deprotonated form also contains the energy needed to remove the hydrogen ion, which is characteristic of site i, is determined by its intrinsic pK value and depends on pH:
$$E_{i,p}(\mathbf{x}) = \sum_{j\neq i} W_{ix_i=0,\,jx_j}, \qquad E_{i,d}(\mathbf{x}) = 2.3RT\left(pK_{{\rm int},i} - \mathrm{pH}\right) + \sum_{j\neq i} W_{ix_i=1,\,jx_j}.$$
These two expressions can be united:
$$E_i(\mathbf{x}) = 2.3RT\,x_i\left(pK_{{\rm int},i} - \mathrm{pH}\right) + \sum_{j\neq i} W_{ix_i,\,jx_j}.$$
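This equation gives the electrostatic energy of interaction of titratable site i with its environment at a given protonation state x of the protein molecule. The total electrostatic energy of the protein at this protonation state is then obtained by summing over all sites, counting each pairwise interaction only once:
$$E(\mathbf{x},\mathrm{pH}) = 2.3RT\sum_i x_i\left(pK_{{\rm int},i} - \mathrm{pH}\right) + \frac{1}{2}\sum_{i\neq j} W_{ix_i,\,jx_j}. \qquad (6.20)$$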
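Equations (6.19) and (6.20) can be checked against each other numerically: switching $x_i$ from 0 to 1 in Eq. (6.20) must change the total energy by exactly the amount given by Eq. (6.19), provided the interaction energies are symmetric, $W_{ix_i,jx_j} = W_{jx_j,ix_i}$. In this minimal sketch the interaction energies are stored as a nested list `W[i][j][x_i][x_j]` (our own layout, not a standard format), energies are in kJ/mol, RT is taken at 298.15 K, and the two-site example is hypothetical.

```python
import math

RT_DEFAULT = 8.314462618e-3 * 298.15   # kJ/mol
LN10 = math.log(10.0)


def total_energy(x, pH, pk_int, W, RT=RT_DEFAULT):
    """Eq. (6.20): 2.3RT * sum_i x_i (pK_int,i - pH) + 1/2 * sum_{i != j} W[i][j][x_i][x_j]."""
    n = len(x)
    energy = LN10 * RT * sum(x[i] * (pk_int[i] - pH) for i in range(n))
    energy += 0.5 * sum(W[i][j][x[i]][x[j]]
                        for i in range(n) for j in range(n) if i != j)
    return energy


def deprotonation_energy(i, x, pH, pk_int, W, RT=RT_DEFAULT):
    """Eq. (6.19): energy change of site i going from x_i = 0 to x_i = 1, others fixed."""
    return (LN10 * RT * (pk_int[i] - pH)
            + sum(W[i][j][1][x[j]] - W[i][j][0][x[j]]
                  for j in range(len(x)) if j != i))


# Two hypothetical carboxyl groups that repel by 2.0 kJ/mol when both are
# deprotonated; the diagonal entries W[i][i] are never used.
pair = [[0.0, 0.0], [0.0, 2.0]]
W = [[None, pair], [pair, None]]
pk = [4.0, 4.5]
flip = total_energy([1, 1], 5.0, pk, W) - total_energy([0, 1], 5.0, pk, W)
print(abs(flip - deprotonation_energy(0, [0, 1], 5.0, pk, W)) < 1e-12)  # True
```

The factor 1/2 in `total_energy` is what makes the two functions agree: each pair appears twice in the double sum over i and j.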
6.4.1 Solution based on the Boltzmann weighted sum
Equation (6.20) reflects the fact that the microscopic protonation states of the protein molecule differ in their energies. Hence, at a given pH there is a certain distribution of the protonation states, and our task is to find these distributions. For this purpose we will use a simple example of a molecule with three titratable sites, shown in Fig. 6.6. It is easy to recognise that this is the amino acid lysine. We are not going to study the
ionisation properties of this amino acid as such; we are interested in finding the energy distributions of the protonation states at different pH. Therefore, we take the pK values of the individual titratable sites from Table 6.2. These pK values, being experimental observations, already include the influence of the charge-charge interactions. Thus, substituting the pK values from Table 6.2 into Eq. (6.20) and setting the interaction sum on its right-hand side to zero, we can calculate the electrostatic energy of each protonation state at a given pH. The results for three pH values are illustrated in Fig. 6.7A. As seen, the distributions differ substantially. From statistical physics (see Appendix A) we know that the probability p(x) that a certain state is occupied is proportional to its Boltzmann factor, $e^{-E(\mathbf{x},\mathrm{pH})/RT}$:
$$p(\mathbf{x}) = \frac{e^{-E(\mathbf{x},\mathrm{pH})/RT}}{\sum_{\{\mathbf{x}\}} e^{-E(\mathbf{x},\mathrm{pH})/RT}}, \qquad (6.21)$$
where E(x,pH) is the energy calculated by Eq. (6.20) and {x} is the set of all possible protonation states of the molecule. If we calculate the probabilities for all states, we see that the most populated states are those with the most favourable energy (Fig. 6.7B). Thus, at pH 4 and 7 the most populated state is x(5), corresponding to the deprotonated form of the α-carboxyl group and the protonated form of the other two sites. The probability distributions show, however, that other states can also be occupied with a significant probability. This is the case for states x(6) and x(8) in the distribution for pH 10. Hence, the degree of deprotonation of the individual sites observed at this pH will be determined mainly by the values $x_i$ in the sequences corresponding to states x(6) and x(8).
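The probabilities of Eq. (6.21) for the three-site example can be obtained by enumerating all $2^3$ states. As in the text, the interaction sum of Eq. (6.20) is set to zero. The pK values used below (2.2, 9.0 and 10.5 for sites 1, 2 and 3) are typical textbook values for lysine chosen for illustration; they are not necessarily those of Table 6.2. RT is taken at 298.15 K. Subtracting the lowest energy before exponentiation leaves Eq. (6.21) unchanged but avoids numerical overflow.

```python
import itertools
import math

RT = 8.314462618e-3 * 298.15   # kJ/mol


def state_probabilities(pk, pH):
    """Eq. (6.21) with the interaction sum of Eq. (6.20) set to zero."""
    states = list(itertools.product((0, 1), repeat=len(pk)))
    energies = [math.log(10.0) * RT * sum(xi * (p - pH) for xi, p in zip(x, pk))
                for x in states]
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / RT) for e in energies]
    z = sum(weights)
    return {x: w / z for x, w in zip(states, weights)}


pk_lys = (2.2, 9.0, 10.5)   # illustrative values for sites 1, 2, 3 of Fig. 6.6
for pH in (4.0, 7.0, 10.0):
    probs = state_probabilities(pk_lys, pH)
    best = max(probs, key=probs.get)
    print(pH, best, round(probs[best], 2))
```

With these illustrative values the most probable state at pH 4 and 7 is (1, 0, 0), i.e. x(5), and the two most probable states at pH 10 are (1, 1, 0) and (1, 1, 1), i.e. x(6) and x(8), in line with the discussion above.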
[Figure 6.6: the lysine molecule with titratable sites 1 (α-carboxyl group), 2 (α-amino group, Nα) and 3 (ε-amino group, Nε), together with the list of its protonation states:]
State   x1  x2  x3
x(1)    0   0   0
x(2)    0   1   0
x(3)    0   1   1
x(4)    0   0   1
x(5)    1   0   0
x(6)    1   1   0
x(7)    1   0   1
x(8)    1   1   1
Figure 6.6 An example of a system with three titratable sites. W(x) is the charge-charge interaction energy of the different protonation states. The number of possible protonation states is 2^3 = 8.
In a given distribution, the variable $x_i$, which determines whether site i is in its protonated or deprotonated form, appears with the same probability as the corresponding protonation state x. The observed degree of deprotonation at a given pH is then the average of the variable $x_i$ over all protonation states, weighted by their probabilities in the distribution corresponding to this pH:
$$\alpha_i(\mathrm{pH}) = \langle x_i\rangle = \frac{\sum_{\{\mathbf{x}\}} x_i\, e^{-E(\mathbf{x},\mathrm{pH})/RT}}{\sum_{\{\mathbf{x}\}} e^{-E(\mathbf{x},\mathrm{pH})/RT}}. \qquad (6.22)$$
The symbol $\langle x_i\rangle$ stands for the (ensemble) average of the variable $x_i$. The sums on the right-hand side of the above expression run over all possible protonation states, and the energy E(x,pH) is given by Eq. (6.20). The above equation, which is the Boltzmann weighted sum (see also Appendix A), solves the problem of finding the ionisation behaviour of the individual titratable sites in proteins, namely the degree of deprotonation as a function of pH, $\alpha_i(\mathrm{pH})$. Equation (6.22) was first implemented for the calculation of ionisation equilibria in proteins by Bashford and Karplus4. Within the model defined by Eq. (6.20) this equation is exact, and it therefore gives the most rigorous solution for $\alpha_i(\mathrm{pH})$.
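A direct implementation of Eq. (6.22) enumerates all $2^N$ states, so it is practical only for small N. The sketch below reuses the energy of Eq. (6.20) with the nested-list layout `W[i][j][x_i][x_j]` for the interaction energies (our own convention) and RT at 298.15 K. For a single site without interactions it reduces to the Henderson-Hasselbalch curve, α = 1/(1 + 10^(pK − pH)), which serves as a check.

```python
import itertools
import math

RT = 8.314462618e-3 * 298.15   # kJ/mol


def degree_of_deprotonation(pH, pk_int, W):
    """Eq. (6.22): Boltzmann-weighted average of x_i over all 2^N protonation states."""
    n = len(pk_int)
    states = list(itertools.product((0, 1), repeat=n))

    def energy(x):  # Eq. (6.20)
        e = math.log(10.0) * RT * sum(x[i] * (pk_int[i] - pH) for i in range(n))
        return e + 0.5 * sum(W[i][j][x[i]][x[j]]
                             for i in range(n) for j in range(n) if i != j)

    energies = [energy(x) for x in states]
    e_min = min(energies)              # shift for numerical stability only
    weights = [math.exp(-(e - e_min) / RT) for e in energies]
    z = sum(weights)
    return [sum(w * x[i] for x, w in zip(states, weights)) / z for i in range(n)]


# One isolated site: the result must follow the Henderson-Hasselbalch curve.
alpha = degree_of_deprotonation(5.0, [4.0], [[None]])[0]
print(abs(alpha - 1.0 / (1.0 + 10.0 ** (4.0 - 5.0))) < 1e-12)   # True
```

The list of states grows as $2^N$, which is exactly the limitation discussed next.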
[Figure 6.7: panel (A) energies and panel (B) probabilities of the protonation states at pH 4, pH 7 and pH 10; horizontal axis: states.]
Figure 6.7 (A) Energy distribution of the protonation states of the system illustrated in Fig. 6.6 at different pH. (B) The corresponding probability distribution.
A decisive advantage of Eq. (6.22) is that it gives the function $\alpha_i(\mathrm{pH})$ directly, which provides considerably more information about the ionisation properties of proteins than the pK value alone. As we shall see in the last section of this chapter, the simple rule pK = pH when α = 0.5 is not valid in the case of cooperative ionisation. In such cases the pK value cannot be treated as an individual characteristic of the ionisation behaviour of the titratable sites, and hence $\alpha_i(\mathrm{pH})$ should be obtained by means of Eq. (6.22) instead of directly from the Henderson-Hasselbalch equation.
We should be aware of the limitations of Eq. (6.22), which arise from the assumptions made for the energy term E(x,pH). First, we have assumed that the protein molecule is represented by a single conformer. This assumption excludes conformational flexibility (factor 5 in Table 6.1). Second, according to Eq. (6.20) the energy of the system is purely electrostatic in character. These limitations will be discussed in the next chapter. Here we turn to another limitation, which is technical in character, namely that Eq. (6.22) can be applied only to small proteins. For a protein with N titratable sites the number of possible states is $2^N$. A value of N of about 30 ($2^{30} \approx 10^9$ states) is a reasonable upper limit for the computer technology currently available for academic purposes. Calculations for larger values of N can be performed on parallel computers; however, the increased computational cost makes the rigorous implementation of Eq. (6.22) unattractive. Thus, for medium and large proteins alternative solutions or modifications have to be found. One reasonable approximation that reduces the number of states is to eliminate the sites that titrate in a pH region sufficiently distant from the pH for which the calculations are made. For instance, if the pH region of interest is around pH 4, groups that titrate at highly alkaline pH (see Table 6.3), such as the guanidinium groups of the arginine side chains, can be treated as permanent charges and excluded from the sums of Eq. (6.22). This approximation helps only if the number of titratable sites remaining for the statistical mechanical calculation [Eq. (6.22)] is less than about 30, a condition that medium and large proteins do not fulfil.
6.4.2 Solution based on the Monte Carlo simulation
The problem of the large number of protonation states can be addressed by another, very powerful approach, namely Monte Carlo simulation.
This approach was first adopted for the prediction of ionisation equilibria in proteins by Beroza et al.5 (University of California). Let us return to the protonation state distributions illustrated in Fig. 6.7. We see that at a given pH only a few protonation states have a significant probability of being occupied. If we are able to identify these protonation states, the task of determining $\alpha_i(\mathrm{pH})$ is considerably simplified. The criterion for identifying these states at a given pH is suggested by Eq. (6.21): the more favourable the energy E(x,pH), the higher the probability of the corresponding microscopic state. However, in the general case of a large number of states, neither the number of relevant states nor their energies are known. We can then pick out states randomly and calculate their energies. After a sufficiently large number of such random choices, the collected states and their energies can be assumed to be representative of the given distribution. Because hitting a state with a relevant, favourable energy is a matter of chance, the method is called Monte Carlo. The very name suggests that the chance to "win", i.e. to select states with statistical relevance (with favourable energy), is low. The collection of states can be considerably improved by a method named after its author: the Metropolis algorithm6. This algorithm starts with a Monte Carlo step. That is, we randomly pick out one protonation state x, with energy E(x) calculated from Eq. (6.20). In the next step of the Metropolis algorithm we try to find a protonation state with lower energy by varying the value of one of the variables $x_i$ in the sequence $\mathbf{x}(x_1,\ldots,x_i,\ldots,x_N)$. The variable can be picked out randomly or consecutively. We choose to vary the variables consecutively, starting with $x_1$. In the example given in Fig. 6.8, $x_1 = 0$. Changing it to $x_1 = 1$, we obtain a new protonation state of the molecule, which we denote by $\mathbf{x}_{\delta 1}$. Let the energy of this protonation state calculated by Eq. (6.20) be $E(\mathbf{x}_{\delta 1})$. The energy difference between the two states is
$$\delta E = E(\mathbf{x}_{\delta 1}) - E(\mathbf{x}).$$
If the new protonation state is energetically more favourable, i.e. δE < 0, we discard the protonation state x and choose $\mathbf{x}_{\delta 1}$ to be the representative one. If δE > 0, the protonation state $\mathbf{x}_{\delta 1}$ is selected if
$$e^{-\delta E/RT} > \eta, \qquad (6.23)$$
where η is a random number between 0 and 1. In other words, the protonation state is selected with a probability given by the Boltzmann factor $e^{-\delta E/RT}$. Assume that the microscopic state $\mathbf{x}_{\delta 1}$ is selected, as illustrated in Fig. 6.8. We denote it by $\mathbf{x}_{\rm sel}$ and continue with the variation of the variable $x_2$. The selection of the new representative microscopic state is made in the same way, using the selected microscopic state as a reference:
$$\delta E = E(\mathbf{x}_{\delta 2}) - E(\mathbf{x}_{\rm sel}).$$
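[Figure 6.8: a Markov chain in which the elements $x_1, \ldots, x_N$ of x are varied in turn, giving the trial states $\mathbf{x}_{\delta 1}, \mathbf{x}_{\delta 2}, \mathbf{x}_{\delta 3}, \ldots$ with energies $E(\mathbf{x}_{\delta 1}), E(\mathbf{x}_{\delta 2}), E(\mathbf{x}_{\delta 3}), \ldots$; each trial state is either selected (yes) or rejected (no) by comparison with $E(\mathbf{x}_{\rm sel})$.]
Figure 6.8 A Metropolis algorithm for selecting a state with minimum energy.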
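The Metropolis procedure described in this section can be sketched as follows: a random starting state, consecutive trial changes of $x_1 \ldots x_N$ accepted according to inequality (6.23), a few cycles at artificially high temperature (simulated annealing), and finally the plain average over the selected states, given below as Eq. (6.24). This is a minimal sketch, not the implementation of Beroza et al.; the temperature schedule, the number of cycles at the actual temperature, the number of Monte Carlo steps and the seed are arbitrary illustrative choices, and `energy` stands for any function returning E(x, pH) of Eq. (6.20) at fixed pH.

```python
import math
import random


def markov_chain_cycle(x, energy, rt, rng):
    """Vary x_1 ... x_N consecutively; accept each trial state if dE <= 0 or,
    otherwise, if exp(-dE/rt) > eta with eta uniform in [0, 1) (inequality 6.23)."""
    x = list(x)
    e_sel = energy(x)
    for i in range(len(x)):
        trial = x.copy()
        trial[i] = 1 - trial[i]
        e_trial = energy(trial)
        d_e = e_trial - e_sel
        if d_e <= 0.0 or math.exp(-d_e / rt) > rng.random():
            x, e_sel = trial, e_trial
    return x


def monte_carlo_alpha(energy, n_sites, rt, n_steps=2000,
                      annealing=(8.0, 4.0, 2.0), final_cycles=10, seed=1):
    """Estimate alpha_i = <x_i> as a plain average over the selected states."""
    rng = random.Random(seed)
    counts = [0] * n_sites
    for _ in range(n_steps):
        x = [rng.randint(0, 1) for _ in range(n_sites)]   # random start
        for factor in annealing:                          # artificially high T
            x = markov_chain_cycle(x, energy, factor * rt, rng)
        for _ in range(final_cycles):                     # actual temperature
            x = markov_chain_cycle(x, energy, rt, rng)
        for i, xi in enumerate(x):
            counts[i] += xi
    return [c / n_steps for c in counts]


RT = 8.314462618e-3 * 298.15   # kJ/mol


def example_energy(x):
    """Eq. (6.20) for two non-interacting sites, pK_int = 4.0 and 6.0, at pH 5.0."""
    return math.log(10.0) * RT * (x[0] * (4.0 - 5.0) + x[1] * (6.0 - 5.0))


print(monte_carlo_alpha(example_energy, 2, RT))   # exact values: 10/11 and 1/11
```

For independent sites the exact answer is known from Eq. (6.22), so the example serves as a sanity check; for interacting sites the same functions apply unchanged, only `energy` has to include the W terms.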
In the example given in Fig. 6.8 the protonation state $\mathbf{x}_{\delta 2}$ is not selected. This procedure, also called the Markov chain, continues with the variation of the next variables. It terminates when all elements of $\mathbf{x}(x_1,\ldots,x_i,\ldots,x_N)$ have been varied. The last selected protonation state, $\mathbf{x}_{\rm sel}$, is considered as the state resulting from this Monte Carlo step. The length of the Markov chain is not limited, so within one Monte Carlo step several cycles of variation of the variables $x_i$ can be executed. In order to facilitate the "search" for a protonation state with low energy, these cycles can be started at an artificially high temperature. At high temperature the energy penalty δE in inequality (6.23) carries less weight, so unfavourable states are accepted more often. This allows us to explore a larger area of the configurational space by increasing the probability of overcoming energy barriers, and thus to avoid trapping in statistically irrelevant minima. In this picture, the initial random choice of a microscopic state can be considered as a choice made at infinitely high temperature, where the exponent in inequality (6.23) is zero and every state is accepted. The temperature is reduced gradually in the next cycles, so that the last cycle is executed at the actual temperature. The last cycle also provides the selected protonation state. This procedure is called simulated annealing. After a protonation state is selected, the next Monte Carlo step is performed in the same way. The degree of deprotonation $\alpha_i$ at a given pH is then calculated as the average of $x_i$ over all selected protonation states:
$$\alpha_i(\mathrm{pH}) = \langle x_i\rangle = \frac{1}{M}\sum_{\{\mathbf{x}_{\rm sel}\}} x_i, \qquad (6.24)$$
where M is the number of Monte Carlo steps. The average $\langle x_i\rangle$ looks like a simple average. In fact, it is a weighted average, because the set of states $\{\mathbf{x}_{\rm sel}\}$ is selected using an energy criterion based on the Boltzmann probability factor [inequality (6.23)]. Equation (6.24) replaces the exact Boltzmann weighted sum [Eq. (6.22)] when the number of titratable sites in the protein molecule is large.
6.5 Cooperative Ionisation
The most common effects of the factors considered above are the shift of the pK values and the broadening of the ionisation curves of the titratable sites. We illustrated these effects at the beginning of Section 6.3 (Fig. 6.2). Taking into account that the desolvation energy is always positive, one can easily see that this factor shifts the pK values of the acidic groups towards higher pH, whereas for the basic groups the shift is towards lower pH. The influence of the electrostatic potential depends on its sign, rather than on the type of the titratable group: a positive electrostatic potential lowers the pK values and, vice versa, a negative potential raises them.
Figure 6.9 Calculated titration curve of human serum albumin. At pH 5.3 the protein has zero net charge. This pH is the isoelectric point, pI.
The effects of desolvation and of the electrostatic potential partly compensate each other. This arises from the fact that the net charge of the protein molecule depends on pH, a dependence illustrated by the well-known protein titration curves. As seen from the example in Fig. 6.9, at acidic pH, to the left of the isoelectric point, the protein is positively charged. Owing to the average electrostatic potential created by the net positive charge of the protein molecule, the pK values of the groups titrating in this region (aspartic and glutamic acids) are shifted towards lower pH. This shift is opposite to that of the desolvation effect. The same tendency towards compensation is observed at alkaline pH. The negative net charge of the protein molecule creates an average electrostatic field which stabilises the charged form of the basic groups, i.e. their pK values are shifted upwards, opposite to the shift due to the desolvation effect. Because both the desolvation energy and the electrostatic potential differ from site to site, the compensation may differ substantially, or may be missing altogether. One of the most famous examples of an extreme pK shift due to desolvation with little compensation is the active-site residue Glu35 of hen egg white lysozyme, which has a pK value of 6.2. The titration curve shown in Fig. 6.9 also suggests the source of the broadening of the ionisation curves. Let us consider the ionisation of an
acidic group. At pH = pK the group is half deprotonated, α = 0.5. As the pH is reduced, the positive potential of the net protein charge increases and stabilises the deprotonated form of the group. This leads to an increase of α compared with what it would be if the electrostatic influence were constant with pH. At pH higher than pK, the stabilising effect of the net charge is reduced, which leads to a reduction of α. As a result, the ionisation curve broadens, as illustrated in Fig. 6.2. The ionisation behaviour described above is the one most commonly observed in proteins. Therefore, experimental records showing a pH dependence like that in Fig. 6.1 can be analysed by fitting to Eq. (6.6) or (6.7), which provides the pK value of the putative titratable site responsible for the observed phenomenon. However, there are cases where these clear rules are not followed.
Figure 6.10 Constriction zone of the channel formed by the transmembrane protein porin PhoE. The titratable groups of four basic side chains form a cluster. The distances between neighbouring titratable sites are within 3 to 4 Å, which results in strong mutual electrostatic interactions. The ionisation of these sites is expected to be cooperative in character. The biological unit is a trimer of identical subunits. Only one subunit is shown in the figure.
If titratable groups with similar $pK_{\rm int}$ values are situated close to each other, they form a cluster within which the mutual electrostatic interactions are the dominant factor regulating their ionisation behaviour. An example of a cluster of several basic amino acids is illustrated in Fig. 6.10. The ionisation of a site involved in such a cluster is coupled with the ionisation of the other members of the cluster, so that its titration curve may not follow the familiar sigmoidal form illustrated in Fig. 6.1 or Fig. 6.2. In such cases one speaks of cooperative ionisation. Cooperative ionisation was first investigated theoretically by Yang et al.7 Consider two carboxyl groups with equal $pK_{\rm int}$. As the degree of deprotonation of one of the groups increases, its negative potential increases and stabilises the protonated form of the second group. In this way the deprotonation of the latter is suppressed. The deprotonation of the second group has the same influence on the first one. Owing to this buffering effect, the titration curve of each group is characterised by a plateau, as shown in Fig. 6.11. The plateau indicates the pH region where the buffering effect occurs. If the $pK_{\rm int}$ values of the groups differ, the two-step sigmoidal character of the titration curves remains; however, their plateaus are shifted. The plateau of the ionisation curve of the group with the lower $pK_{\rm int}$ is shifted towards higher values of α (the upper dashed curve in Fig. 6.11), whereas the plateau of the group with the higher $pK_{\rm int}$ is symmetrically shifted towards lower values of α. This shift arises from the fact that the buffering effect occurs at different values of the degree of deprotonation of the two groups. It is easy to see that if the difference between the $pK_{\rm int}$ values approaches or exceeds about 2 pH units, the buffering effect vanishes.
It can also be shown that the distance between the inflection points $\mathrm{p}K'$ and $\mathrm{p}K''$ is equal to $\Delta W_{aa}/2.3RT$, where $\Delta W_{aa}$ is the difference between the charge-charge interaction energies in the protonated and deprotonated forms of the groups. The same is valid for pairs of basic groups.
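The buffering effect can be reproduced with a minimal sketch. It assumes a hypothetical two-site model in which each site has only a protonated ($x = 0$) and a deprotonated ($x = 1$) state. The only coupling is an interaction energy $w = \Delta W/2.3RT$, expressed in pK units, that applies when both sites are deprotonated. The function name `pair_titration` and all parameter values are illustrative and are not taken from the text. The degree of deprotonation of each site is obtained by Boltzmann-weighting the four microscopic states, in the spirit of the statistical average of Eq. (6.22).

```python
import itertools


def pair_titration(ph, pk_int, w):
    """Degree of deprotonation (alpha_1, alpha_2) of two coupled sites.

    pk_int : intrinsic pK values (pKint_1, pKint_2) of the two sites
    w      : interaction energy of the doubly deprotonated state in pK
             units, W/(2.3RT); w > 0 for two mutually repelling carboxyls
    """
    weights = {}
    for x in itertools.product((0, 1), repeat=2):  # x_i = 1 means deprotonated
        # free energy relative to the fully protonated state, in pK units
        g = sum(xi * (pk - ph) for xi, pk in zip(x, pk_int)) + x[0] * x[1] * w
        weights[x] = 10.0 ** (-g)  # Boltzmann factor exp(-G/RT)
    z = sum(weights.values())
    return tuple(
        sum(wt for x, wt in weights.items() if x[i] == 1) / z for i in range(2)
    )


for ph in range(2, 11):
    print(ph, pair_titration(float(ph), (4.0, 4.0), 4.0))
```

With equal pKint values of 4 and $w = 4$, both sites are exactly half deprotonated at pH 6 and stay within about 0.03 of 0.5 between pH 5 and 7. That is the plateau. With pKint values of 4 and 5, the two curves split above and below 0.5 in this region, as described for the dashed curves in Fig. 6.11.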
Ionisation Equilibria in Proteins
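[Figure 6.11: degree of deprotonation α (0.0 to 1.0) plotted against pH; the separation between the inflection points $\mathrm{p}K'$ and $\mathrm{p}K''$ is marked as $\Delta W_{aa}/2.3RT$.]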
Figure 6.11 Cooperative ionisation curves. The continuous line corresponds to the cooperative ionisation of a group whose ionisation is coupled with the ionisation of another group with the same pKint. The ellipse marks the region of the plateau. The dashed lines correspond to the ionisation curves of a pair with different pKint values. The upper curve is that of the group with the lower pKint.
The environment of the individual titratable sites may induce large shifts in their intrinsic ionisation properties, so that the pKint of a basic group becomes lower than that of an acidic group. If these groups are close enough, cooperative ionisation occurs. The appearance of the cooperative ionisation of an acid-base pair does not differ from that shown in Fig. 6.11. However, the formal conditions, first deduced by Koumanov et al.⁸, are different. They can be summarised as follows. First, as we have already stressed,
$$\Delta\mathrm{p}K_{\mathrm{int}} = \mathrm{p}K_{\mathrm{int}}(\mathrm{acid}) - \mathrm{p}K_{\mathrm{int}}(\mathrm{base}) > 0.$$
Second, the absolute value of the difference between ΔpKint and the magnitude of the charge-charge interaction energy between the sites should satisfy the inequality
$$\left|\Delta\mathrm{p}K_{\mathrm{int}} - \Delta W_{ab}/RT\right| < 2.3RT,$$
where $\Delta W_{ab} = W_{ab}(\text{deprotonated}) - W_{ab}(\text{protonated})$. Third,
$$\left|\Delta W_{ab}\right| > 2.3RT.$$
The shape of the titration curves illustrated in Fig. 6.11 can be described as a sum of two sigmoidal curves of the type of Eq. (6.6), with inflection points $\mathrm{p}K'$ and $\mathrm{p}K''$, respectively. These inflection points do not have the meaning of pK values, because in the case of cooperative ionisation $\alpha \neq 0.5$ at these points. Thus, if an experimental record shows a pH dependence similar to that in Fig. 6.11, it does not necessarily mean that the ionisation of two groups with equilibrium constants $\mathrm{p}K'$ and $\mathrm{p}K''$ is being observed. If Koumanov's conditions are fulfilled, it might be a case of cooperative ionisation.
[Figure 6.12: two panels plotted against pH from 4 to 12, with the vertical axis running up to 1.0; the right panel is labelled Arg82+Arg132.]
Figure 6.12 Ionisation curves of Arg82 and Arg132 belonging to the charge cluster shown in Fig. 6.10. Left: Individual ionisation curves. Right: The sum of the ionisation curves.
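The claim that such curves can be written as a sum of two sigmoids can be checked numerically. The sketch below uses a hypothetical two-site model with intrinsic pK values and an interaction $w$ in pK units. It assumes $w \ge 0$ and $K' \neq K''$. The approach factorises the binding polynomial $1 + (K_1 + K_2)/[\mathrm{H}^+] + K_1K_2 10^{-w}/[\mathrm{H}^+]^2$ into $(1 + K'/[\mathrm{H}^+])(1 + K''/[\mathrm{H}^+])$. Each individual titration curve then turns out to be a weighted sum of two Henderson-Hasselbalch curves with inflection points $\mathrm{p}K'$ and $\mathrm{p}K''$. This is one way to see the result, not necessarily the derivation used by Koumanov et al. The function names are illustrative.

```python
import math


def macroscopic_constants(pk_int, w):
    """K' and K'' from factorising the two-site binding polynomial (w >= 0)."""
    k1, k2 = 10.0 ** -pk_int[0], 10.0 ** -pk_int[1]
    s, p = k1 + k2, k1 * k2 * 10.0 ** -w  # K' + K'' and K' * K''
    kx = (s + math.sqrt(s * s - 4.0 * p)) / 2.0
    return kx, p / kx  # Vieta's formula avoids cancellation for the small root


def site_alpha_enumerated(ph, pk_int, w, i):
    """Degree of deprotonation of site i from the four-state partition function."""
    u = 10.0 ** ph  # 1/[H+]
    k = (10.0 ** -pk_int[0], 10.0 ** -pk_int[1])
    both = k[0] * k[1] * 10.0 ** -w * u * u  # doubly deprotonated state
    z = 1.0 + (k[0] + k[1]) * u + both
    return (k[i] * u + both) / z


def site_alpha_two_sigmoids(ph, pk_int, w, i):
    """The same quantity written as A*sigma(pK') + (1 - A)*sigma(pK'')."""
    kx, ky = macroscopic_constants(pk_int, w)
    ki = 10.0 ** -pk_int[i]
    a = (ki - ky) / (kx - ky)  # weight of the pK' sigmoid (requires kx != ky)
    u = 10.0 ** ph

    def sigma(k):  # Henderson-Hasselbalch deprotonation curve
        return k * u / (1.0 + k * u)

    return a * sigma(kx) + (1.0 - a) * sigma(ky)
```

With pKint values of 4 and 5 and $w = 4$, the factorisation gives $\mathrm{p}K' \approx 3.96$ and $\mathrm{p}K'' \approx 9.04$. Site 0 is only about 45% deprotonated at pH = $\mathrm{p}K'$, which illustrates why these inflection points should not be read as ordinary pK values.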
Let us return to the cluster of titratable groups shown in Fig. 6.10. As we have mentioned, its ionisation is expected to be cooperative. The ionisation curves of two titratable sites from this cluster are shown in Fig. 6.12. Indeed, they exhibit the characteristics typical of cooperative ionisation. The ionisation curve of Arg82 has a plateau in the region of pH 9, very close in shape to the example given in Fig. 6.11. The ionisation curve of the other titratable site in the cluster, Arg132, is also characterised by a plateau in this pH region. However, it shows a tendency to "win" back its protonated state at pH above 10. This is a result of the cooperative ionisation of the titratable sites in the cluster. It is worth quoting the researchers who were the first to become aware of cooperative ionisation⁹: this effect should not be considered as "theoretical exercises" without connection to experimental observations and to the properties of proteins. The right panel of Fig. 6.12 gives the sum of the ionisation curves of the two sites. It closely resembles the ionisation curve of a single site with a pK of 6.8. Interestingly, the size of the channel formed by this class of porin molecules shows a pH dependence with a midpoint around pH 7. Hence, the ionisation behaviour of the pair Arg82-Arg132, which can be treated as a single titratable site, can be related to this experimental observation.
There is another very important issue that should be pointed out. We should always be aware of the limitation arising from our basic assumption, namely that the protein molecule is represented by a single conformation. Upon ionisation of sites involved in strong electrostatic interactions with a neighbouring titratable group, the change in these interactions can induce conformational transitions, thus changing the environment of the interacting groups. This leads to changes in both ΔpKint and the charge-charge interaction energies, and the conditions for cooperative ionisation can be lost. Even in such cases, the analysis based on a single three-dimensional protein structure can help to reveal details of phenomena that show pH dependence. If the theoretical analysis suggests cooperative ionisation that is not confirmed by experimental pK measurements, this discrepancy can be interpreted as evidence that conformational changes occur upon the ionisation of the titratable groups in question.
References
1. Winberg JO, Brendskag MK, Sylte I, Lindstad RI and McKinley-McKee JS, (1999) The catalytic triad in Drosophila alcohol dehydrogenase: pH, temperature and molecular modelling studies. J. Mol. Biol., 294: 601-616.
2. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S and Karplus M, (1983) CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comp. Chem., 4: 187-217.
3. Tanford C and Kirkwood JG, (1957) Theory of titration curves. I. General equations for impenetrable spheres. J. Am. Chem. Soc., 79: 5333-5339.
4. Bashford D and Karplus M, (1990) pKa's of ionizable groups in proteins: atomic detail from a continuum electrostatic model. Biochemistry, 29: 10219-10225.
5. Beroza P, Fredkin DR, Okamura MY and Feher G, (1991) Protonation of interacting residues in a protein by a Monte Carlo method: application to lysozyme and the photosynthetic reaction center of Rhodobacter sphaeroides. Proc. Natl. Acad. Sci. U.S.A., 88: 5804-5808.
6. Metropolis N, Rosenbluth AW, Rosenbluth MN and Teller AH, (1953) Equation of state calculations by fast computing machines. J. Chem. Phys., 21: 1087-1092.
7. Yang A-S, Gunner MR, Sampogna R, Sharp K and Honig B, (1993) On the calculation of pKas in proteins. Proteins, 15: 252-265.
8. Koumanov A, Rüterjans H and Karshikoff A, (2002) Continuum electrostatic analysis of irregular ionization and proton allocation in proteins. Proteins, 46: 85-96.
9. Bashford D and Gerwert K, (1992) Electrostatic calculations of the pKa values of ionizable groups in bacteriorhodopsin. J. Mol. Biol., 224: 473-486.
Chapter 7
Conformational Flexibility
We define the conformational flexibility of proteins as the ability of the side chains and fragments of the polypeptide backbone to adopt different conformations while maintaining the native protein structure. According to this definition, the native protein molecule is represented by a set of structures instead of one fixed structure. This does not contradict the idea of a "unique native protein structure"; we simply extend it by giving the structure the freedom to vary within a set of conformations. Conformational flexibility and non-covalent interactions are closely related. Any conformational change of an amino acid side chain that alters its solvent accessibility also changes its van der Waals interactions. In different conformations the functional groups may also change their hydrogen bond partners. As we shall see, electrostatic interactions are especially sensitive to variations in the conformation of the protein molecule.
7.1 Allocation Variation of Polar Hydrogen Atoms
The term "polar hydrogen atom" was introduced in Chapter 3 to distinguish the hydrogen atoms whose nuclei are deshielded, making the corresponding chemical bond polar. Polar hydrogen atoms are often involved in hydrogen bonding, and if so, their locations are of particular interest. Some of these hydrogen bonds involve functional groups that are at the same time titratable. Hence, upon deprotonation of such groups the hydrogen bonds break or are rearranged. This leads to reorientation of the polar hydrogen atoms in the vicinity of the titratable group.
So far, the ionisation of the titratable groups has been considered as a change in the protonation state, and hence in the charge, of the individual titratable groups in the protein molecule, ignoring their ability to participate in hydrogen bonds. Here, we take this property into account by "allowing" the polar hydrogen atoms to change their positions. We have already pointed out that conformational changes may occur upon deprotonation (or protonation) of the titratable groups. The relocation of the polar hydrogen atoms can be considered as the simplest case of such conformational changes, reflecting transitions between various rotamers and tautomers. The introduction of this type of conformational flexibility is in fact an extension of the theoretical approach presented in the previous chapter, and it changes neither the underlying concept nor the computational strategy. However, we have to make one additional assumption, namely that changing the location of the hydrogen atoms does not change the shape of the protein dielectric material. This assumption concerns the choice of the set of van der Waals radii used to create the dielectric map, rather than the basis of the theoretical approach. If we use the united-atom radii (see Table 4.5), we can employ the finite difference solution given in the previous chapter without modification.
7.1.1 Titratable and pH-sensitive sites
The essential difference between the basic approach and the extension we are going to develop in this section is that the polar hydrogen atoms can change their positions. These changes occur only under the influence of the ionisation of the titratable sites; hence they are pH dependent. Functional groups that undergo such conformational changes, induced by the ionisation of the titratable sites in the protein molecule, are referred to as pH-sensitive sites.
The formal difference between pH-sensitive and titratable sites is that the former can only change the orientation of their dipole moment, whereas the latter can also change their protonation state. The simplest example of a pH-sensitive site is the hydroxyl group of the threonine or serine side chains. Water molecules incorporated in the protein molecule can also be considered as pH-sensitive sites. The amide groups of the glutamine and asparagine side chains can be treated as pH-sensitive sites if the rotamers of these groups are assumed to depend on electrostatic interactions only. The term "pH-sensitive sites" does not describe a distinct property of proteins; rather, it delineates the limits of the extension and facilitates our further considerations. It should also be noted that the pH-sensitive sites "feel" pH only via the change in the ionisation states of the titratable sites. Hence, their properties of interest, namely the orientations of the polar bonds, are not a direct function of pH.
7.1.2 Microscopic pK
The variation of the positions of the polar hydrogen atoms, which includes rotamers and tautomers, can be described by introducing additional microscopic states of the individual titratable groups, as well as of the functional groups defined as pH-sensitive sites. For instance, the protonated form of the carboxyl group can be described as a form with two microscopic states. In this way, instead of two microscopic states (protonated and deprotonated), the carboxyl group is characterised by three microscopic states: one deprotonated and two protonated (Fig. 7.1A). The two protonated forms can be considered as rotamers of the carboxyl group. Similarly, the deprotonated form of the histidine residue is characterised by two microscopic states corresponding to the Nε-H and Nδ-H tautomers (Fig. 7.1B). Three possible rotamers of the serine side chain are illustrated in Fig. 7.1C. Each of these rotamers corresponds to one microscopic state of this pH-sensitive site. The choice of these rotamers is based on the stereochemistry of the compound, i.e. the orientations of the O-H bond correspond to the three energy minima for rotation around the Cβ-Oγ bond (see also Fig. 7.10). For the description of the microscopic states of the titratable and pH-sensitive sites we will employ the formalism developed in the previous chapter.
A given titratable or pH-sensitive site, $i$, is fully described by a set of $n_i$ microscopic states $S_{\alpha,i}$ ($\alpha = 0, 1, \ldots, n_i - 1$). Hence, instead of two values ($x_i = 0$ for the protonated and $x_i = 1$ for the deprotonated form), the variable $x_i$ determining the microscopic state of site $i$ now has $n_i$ values, each corresponding to a single microscopic state $S_{\alpha,i}$.
[Figure 7.1: panels (A), (B) and (C), with the microscopic states in each panel labelled 0, 1 and 2.]
Figure 7.1 Examples of various allocations of hydrogen atoms: (A) in titratable sites (rotamers of glutamic acid), (B) in titratable sites (tautomers of histidine), and (C) in pH-sensitive sites (serine). Note that only polar hydrogen atoms are shown.
In our previous considerations we chose to work with dissociation constants. In this way we defined the reference microscopic state of each titratable site to be the protonated form, $x_i = 0$. Similarly, we define $\alpha = 0$ as the reference microscopic state of a given titratable or pH-sensitive site. The choice of reference state is in fact unconstrained, so we are free to define the reference states according to the criteria that best suit the task at hand. For instance, in the example given in Fig. 7.1 the reference state of the carboxyl group is chosen to be the deprotonated form, whereas for histidine $\alpha = 0$ when this group is in its protonated form. In the following considerations we temporarily drop the index $i$ in order to focus on a single titratable or pH-sensitive site. We also use Greek letters when referring to microscopic states or quantities. Each microscopic state, $\alpha$, is characterised by a certain number of titratable hydrogen atoms, $\nu_\alpha$, and a specific charge distribution, $\rho_\alpha$. For the protonated form of the carboxyl group given as an example in Fig. 7.1, $\alpha$ takes the values 1 or 2 and $\nu_\alpha = 1$. The pH-sensitive sites do not titrate, hence $\nu_\alpha = 0$ for all of their microscopic states. The transition from the reference microscopic state, $S_0$, to state $S_\alpha$ can be described by a microscopic equilibrium constant, $K^{\mu}_{\alpha}$, or its equivalent, $\mathrm{p}K^{\mu}_{\alpha}$, defined as:
$$\mathrm{p}K^{\mu}_{\alpha} = -\lg K^{\mu}_{\alpha} = -\lg\left(\frac{p_\alpha}{p_0}\,[\mathrm{H}^+]^{\Delta\nu_\alpha}\right) = -\lg\frac{p_\alpha}{p_0} + \Delta\nu_\alpha\,\mathrm{pH}, \tag{7.1}$$
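where $p_\alpha$ is the population of state $S_\alpha$ ($\sum_\alpha p_\alpha = 1$) and $\Delta\nu_\alpha = \nu_0 - \nu_\alpha$. The superscript $\mu$ indicates that Eq. (7.1) is written for equilibria between microscopic states, whereas $\alpha$ identifies the equilibrium between the reference state $S_0$ and the microscopic state $S_\alpha$.
[Figure 7.2: the $m$ protonated microscopic states $S_0, \ldots, S_{m-1}$ ($\nu_0 = \ldots = \nu_{m-1} = \nu$) and the $n - m$ deprotonated microscopic states $S_m, \ldots, S_{n-1}$ ($\nu_m = \ldots = \nu_{n-1} = \nu - 1$), each with its population $p$, linked by microscopic constants $\mathrm{p}K^{\mu}$; the macroscopic constant $\mathrm{p}K^{\mathrm{obs}}$ links the two forms.]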
Figure 7.2 A general scheme of the equilibrium constants relating the microscopic states. The observable macroscopic equilibrium constant pKobs relates the protonated (left) and deprotonated (right) forms of the site.
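As a minimal numerical sketch of Eq. (7.1), consider a hypothetical carboxyl site with two equally populated protonated rotamers, $S_0$ (the reference) and $S_1$, and a single deprotonated state $S_2$. This follows the reference-state choice of Fig. 7.2, not of Fig. 7.1. Thus $\Delta\nu = 1$ for $S_0 \to S_2$ and $\Delta\nu = 0$ for $S_0 \to S_1$. The populations are generated from an assumed macroscopic pK of 4.4. That value and the function names are illustrative only.

```python
import math


def micro_pk(p_alpha, p_0, delta_nu, ph):
    """Eq. (7.1): pK^mu_alpha = -lg(p_alpha / p_0) + Delta nu_alpha * pH."""
    return -math.log10(p_alpha / p_0) + delta_nu * ph


def carboxyl_populations(ph, pk_obs):
    """Hypothetical site: S0, S1 = equally populated protonated rotamers,
    S2 = deprotonated state; pk_obs is an assumed macroscopic pK."""
    f_prot = 1.0 / (1.0 + 10.0 ** (ph - pk_obs))
    return [f_prot / 2.0, f_prot / 2.0, 1.0 - f_prot]


for ph in (3.0, 4.4, 6.0):
    p = carboxyl_populations(ph, 4.4)
    # S0 -> S2 releases one proton (Delta nu = 1); S0 -> S1 does not (Delta nu = 0)
    print(ph, micro_pk(p[2], p[0], 1, ph), micro_pk(p[1], p[0], 0, ph))
```

At every pH the transition $S_0 \to S_2$ gives $\mathrm{p}K^{\mu} = 4.4 - \lg 2 \approx 4.099$, and the rotamer transition $S_0 \to S_1$ gives 0. The explicit pH term in Eq. (7.1) cancels the pH dependence of the population ratio, so the microscopic constant is indeed a constant.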
Consider a compound with $m$ protonated states, $\nu_0 = \ldots = \nu_{m-1} = \nu$, and $n - m$ deprotonated states, $\nu_m = \ldots = \nu_{n-1} = \nu - 1$ ($1 \le m \le n - 1$), as illustrated in Fig. 7.2. The equilibrium between the protonated and deprotonated forms is given by the macroscopic equilibrium constant, which we designate here as pKobs. If this is a model compound as defined in Section 6.4.2, pKobs = pKmod. In terms of our considerations, pKmod is a key quantity characterising the chemical nature of the titratable groups. It is therefore important to find a relation between the macroscopic and microscopic equilibria. Within the protonated form of the model compound, the equilibrium constant of the transition from the reference microscopic state $S_0$ to a state $S_\alpha$ can be written as
$$\mathrm{p}K^{\mu}_{\alpha} = -\lg\frac{p_\alpha}{p_0}. \tag{7.2}$$
This formula follows directly from Eq. (7.1), taking into account that no deprotonation occurs, i.e. $\Delta\nu_\alpha = 0$. For the transition from the reference microscopic state $S_0$ to a state in the deprotonated form of the compound, $S_\beta$, the microscopic equilibrium constant is written as
$$\mathrm{p}K^{\mu}_{\beta} = -\lg\frac{p_\beta}{p_0} + \mathrm{pH}. \tag{7.3}$$
For the macroscopic equilibrium constant, Eq. (6.2) can be written as
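$$\lg K^{\mathrm{obs}} = \lg\frac{\sum_{\beta=m}^{n-1} p_\beta}{\sum_{\alpha=0}^{m-1} p_\alpha} + \lg[\mathrm{H}^+]$$
or
$$-\mathrm{p}K^{\mathrm{obs}} = \lg\sum_{\beta=m}^{n-1} p_\beta - \lg\sum_{\alpha=0}^{m-1} p_\alpha - \mathrm{pH}.$$
The indices $\alpha$ and $\beta$ enumerate the microscopic states in the protonated and the deprotonated forms of the titratable group, respectively. A connection between pKobs and the microscopic constant $\mathrm{p}K^{\mu}_{\eta}$ of a particular deprotonated state $S_\eta$ ($m \le \eta \le n - 1$) can be obtained by formally adding and subtracting the same term:
$$-\mathrm{p}K^{\mathrm{obs}} = \lg\sum_{\beta=m}^{n-1} p_\beta - \lg\sum_{\alpha=0}^{m-1} p_\alpha - \mathrm{pH} + \left(\lg\frac{p_\eta}{p_0} - \lg\frac{p_\eta}{p_0}\right),$$
so that
$$-\mathrm{p}K^{\mathrm{obs}} = \lg\sum_{\beta=m}^{n-1} p_\beta - \lg\sum_{\alpha=0}^{m-1} p_\alpha - \lg\frac{p_\eta}{p_0} - \mathrm{p}K^{\mu}_{\eta},$$
where we have used Eq. (7.3) to replace $\lg(p_\eta/p_0) - \mathrm{pH}$ by $-\mathrm{p}K^{\mu}_{\eta}$. The desired connection can then be written as
$$\mathrm{p}K^{\mu}_{\eta} = \mathrm{p}K^{\mathrm{obs}} + \lg\sum_{\beta=m}^{n-1} p_\beta - \lg\sum_{\alpha=0}^{m-1} p_\alpha - \lg\frac{p_\eta}{p_0}.$$
Now we separate the microscopic equilibria within the two protonation forms of the compound:
$$\mathrm{p}K^{\mu}_{\eta} = \mathrm{p}K^{\mathrm{obs}} + \lg\sum_{\beta=m}^{n-1} p_\beta - \lg\sum_{\alpha=0}^{m-1} p_\alpha - \left(\lg p_\eta - \lg p_0\right) + \lg p_m - \lg p_m,$$
where, as before, we have added and subtracted the same term. After regrouping, the above equation can be written as follows:
$$\mathrm{p}K^{\mu}_{\eta} = \mathrm{p}K^{\mathrm{obs}} + \delta\mathrm{p}K^{\mu}_{\mathrm{p}} - \delta\mathrm{p}K^{\mu}_{\mathrm{d}} - \lg\frac{p_\eta}{p_m}, \tag{7.4}$$
where $\delta\mathrm{p}K^{\mu}_{\mathrm{p}} = -\lg\left(\sum_{\alpha=0}^{m-1} p_\alpha / p_0\right)$ and $\delta\mathrm{p}K^{\mu}_{\mathrm{d}} = -\lg\left(\sum_{\beta=m}^{n-1} p_\beta / p_m\right)$. The terms $\delta\mathrm{p}K^{\mu}_{\mathrm{p}}$ and $\delta\mathrm{p}K^{\mu}_{\mathrm{d}}$ reflect the distribution of microscopic states within the protonated and deprotonated forms of the compound, and hence have a purely entropic meaning. Usually, the charged forms of the titratable groups can be represented by a single microscopic state. In acidic groups the deprotonated form is the charged one, so there is no need to define positions of hydrogen atoms. In the charged (protonated) form of basic groups, the positions of the hydrogen atoms are uniquely defined. It follows that for acidic groups $\delta\mathrm{p}K^{\mu}_{\mathrm{d}} = 0$ and $\delta\mathrm{p}K^{\mu}_{\mathrm{p}} < 0$, hence $\mathrm{p}K^{\mu}_{\eta} < \mathrm{p}K^{\mathrm{obs}}$. For basic groups $\delta\mathrm{p}K^{\mu}_{\mathrm{p}} = 0$, so that $\mathrm{p}K^{\mu}_{\eta} > \mathrm{p}K^{\mathrm{obs}}$.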
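Eq. (7.4) is obtained purely by adding and subtracting terms, so it must hold for any set of populations. The sketch below makes this explicit. It uses hypothetical function names and made-up populations. It evaluates pKobs from the macroscopic relation above and computes $\mathrm{p}K^{\mu}_{\eta}$ in two ways: directly from Eq. (7.3) and through Eq. (7.4). States are indexed as in Fig. 7.2: 0 to $m-1$ are protonated, with $S_0$ as the reference, and $m$ to $n-1$ are deprotonated.

```python
import math


def eq_7_4_terms(p, m, ph):
    """Return (pK_obs, delta_pK_p, delta_pK_d) for populations p[0..n-1]."""
    prot, deprot = sum(p[:m]), sum(p[m:])
    pk_obs = -math.log10(deprot / prot) + ph  # macroscopic constant
    d_p = -math.log10(prot / p[0])  # spread over protonated states
    d_d = -math.log10(deprot / p[m])  # spread over deprotonated states
    return pk_obs, d_p, d_d


def micro_pk_direct(p, eta, ph):
    """Eq. (7.3): pK^mu_eta = -lg(p_eta / p_0) + pH for a deprotonated state."""
    return -math.log10(p[eta] / p[0]) + ph


def micro_pk_eq_7_4(p, m, eta, ph):
    """Eq. (7.4): pK_obs + delta_pK_p - delta_pK_d - lg(p_eta / p_m)."""
    pk_obs, d_p, d_d = eq_7_4_terms(p, m, ph)
    return pk_obs + d_p - d_d - math.log10(p[eta] / p[m])


populations = [0.2, 0.1, 0.05, 0.4, 0.25]  # made-up: 3 protonated, 2 deprotonated
for eta in (3, 4):
    print(eta, micro_pk_direct(populations, eta, 5.0),
          micro_pk_eq_7_4(populations, 3, eta, 5.0))
```

For two equally populated protonated rotamers and one deprotonated state, for example populations [0.25, 0.25, 0.5] at pH 4, pKobs is 4.0. Both routes then give $4.0 - \lg 2 \approx 3.699$, which is the lg 2 correction that appears for a carboxyl group with two protonated rotamers.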
As an example, let us evaluate the microscopic pK values of the glutamic acid side chain from Fig. 7.1A, considered as a model compound. The protonated form of this titratable group has two states, whereas the deprotonated form has only one. We assume that the rotamers in the protonated form are equally probable, i.e. equally populated. The correction terms in Eq. (7.4) become $\delta\mathrm{p}K^{\mu}_{\mathrm{p}} = -\lg 2$ and $\delta\mathrm{p}K^{\mu}_{\mathrm{d}} = 0$. The term $\lg(p_\eta/p_m) = 0$ because the deprotonated form has only one microscopic state, so $p_\eta = p_m$. Thus, for the microscopic pK values of this titratable group we obtain
$$\mathrm{p}K^{\mu}_{\eta} = \mathrm{p}K^{\mathrm{obs}} - \lg 2.$$
This evaluation is justified for titratable groups that are part of model compounds in solution, where the rotamers of the carboxyl group are indistinguishable and hence equally populated. In proteins, the rotamer populations may differ because of differences in the environment.
7.1.3 Population of the microscopic states
Each rotamer or tautomer of a given titratable or pH-sensitive site is treated as a microscopic state, $S_\alpha$, with a population $p_\alpha$. Our main goal is to find these populations as a function of pH. The methodology for achieving this goal does not differ from that developed in the previous chapter for the protonation/deprotonation equilibria; only a few modifications, mainly in terminology, are needed. First, we need to modify the thermodynamic cycle relating the equilibria defined for a model compound to those in the protein molecule. Instead of referring to protonated and deprotonated states as in the previous chapter (Fig. 6.3), we consider a thermodynamic cycle connecting different microscopic states (Fig. 7.3). The values of $\mathrm{p}K^{\mu}_{\alpha,\mathrm{mod}}$ of the titratable sites are determined by Eq. (7.4), and those of the pH-sensitive sites by Eq. (7.2). This means that the populations of the microscopic states of the model compounds must be known. It is reasonable to assume that these populations are equal. If so, $\mathrm{p}K^{\mu}_{\alpha,\mathrm{mod}} = 0$ for transitions within the same protonation form of a titratable group in the model compound. Likewise, $\mathrm{p}K^{\mu}_{\alpha,\mathrm{mod}} = 0$ for the pH-sensitive sites. For proton binding sites with different affinities, such as the Nε and Nδ atoms of the histidine imidazole ring, this assumption is not valid. The microscopic pK values in such cases are determined by the difference between the experimentally measured equilibrium constants.
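[Figure 7.3: thermodynamic cycle linking the transition $S_0 \to S_\alpha$ of the site in the solvent (model compound, $\Delta G^{\mathrm{mod}}_{0\to\alpha}$) with the same transition in the protein ($\Delta G_{0\to\alpha}$).]
Figure 7.3 Thermodynamic cycle used for the calculation of the transition from the reference microscopic state, $S_0^P$, to microscopic state, $S_\alpha^P$, of a titratable or pH-sensitive site in the protein molecule.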
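Following the formalism developed in the previous chapter and the thermodynamic cycle of Fig. 7.3, we can write an expression for the transition $S_0 \to S_\alpha$ of a given site in the protein molecule:
$$\Delta G_{0\to\alpha} = 2.3RT\left(\mathrm{p}K^{\mu}_{\alpha,\mathrm{mod}} - \Delta\nu_\alpha\,\mathrm{pH}\right) + \Delta G_{\alpha,\mathrm{Born}} + \Delta G_{\alpha,\mathrm{pc}} + \Delta G_{\alpha,\mathrm{cc}}.$$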
The terms on the right-hand side of the above equation are already known. The Born energy term $\Delta G_{\alpha,\mathrm{Born}}$ is obtained from the transfer energies of the site from the model compound to its position in the protein [see also Eqs. (6.12) and (6.13)]:
$$\Delta G_{\alpha,\mathrm{Born}} = \sum_k q_\alpha(k)\left[\varphi_p(\rho_\alpha,k) - \varphi_s(\rho_\alpha,k)\right] - \sum_k q_0(k)\left[\varphi_p(\rho_0,k) - \varphi_s(\rho_0,k)\right],$$
where $\rho_\alpha$ and $\rho_0$ are the charge distributions of the site in microscopic states $S_\alpha$ and $S_0$, respectively. As before, the sums are taken over all charges $k$ of the site. The electrostatic potentials $\varphi$ have the same meaning as in Eqs. (6.12) and (6.13). We only need to note that within the same protonation form of the site, the charge distributions $\rho_\alpha$ differ only in the position of the partial charge of the hydrogen atom. The term $\Delta G_{\alpha,\mathrm{pc}}$ is given by [see Eq. (6.14)]:
$$\Delta G_{\alpha,\mathrm{pc}} = \sum_{k\in\{pc\}} q_{pc}(k)\left[\varphi(\rho_\alpha,k) - \varphi(\rho_0,k)\right].$$
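As in the previous chapter, we can represent these two terms as shifts of the microscopic pK values:
$$\Delta\mathrm{p}K_{\alpha,\mathrm{Born}} = \frac{\Delta G_{\alpha,\mathrm{Born}}}{2.3RT} \quad\text{and}\quad \Delta\mathrm{p}K_{\alpha,\mathrm{pc}} = \frac{\Delta G_{\alpha,\mathrm{pc}}}{2.3RT}.$$
In this way we can introduce the intrinsic microscopic pK:
$$\mathrm{p}K^{\mu}_{\alpha,\mathrm{int}} = \mathrm{p}K^{\mu}_{\alpha,\mathrm{mod}} + \Delta\mathrm{p}K_{\alpha,\mathrm{Born}} + \Delta\mathrm{p}K_{\alpha,\mathrm{pc}}. \tag{7.5}$$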
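Eq. (7.5) only requires converting free energies into pK units. The small helper below makes that conversion explicit. It assumes energies in kJ/mol and a default temperature of 298.15 K, and it uses $\ln 10 \approx 2.303$ in place of the rounded factor 2.3. The function names are illustrative, not part of the original method.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def delta_pk(delta_g_kj, temperature=298.15):
    """pK shift Delta G / (2.3RT), with Delta G in kJ/mol and 2.3 taken as ln 10."""
    return delta_g_kj / (math.log(10.0) * R_KJ * temperature)


def intrinsic_micro_pk(pk_mod, dg_born_kj, dg_pc_kj, temperature=298.15):
    """Eq. (7.5): pK^mu_int = pK^mu_mod + Delta pK_Born + Delta pK_pc."""
    return pk_mod + delta_pk(dg_born_kj, temperature) + delta_pk(dg_pc_kj, temperature)


# pH-sensitive site with equally populated model states: pK^mu_mod = 0
print(intrinsic_micro_pk(0.0, 2.0, -1.0))
```

At 298.15 K one pK unit corresponds to about 5.71 kJ/mol. A Born penalty of 2.0 kJ/mol partly offset by -1.0 kJ/mol from the permanent charges therefore shifts the microscopic pK by about +0.18. For a pH-sensitive site, where $\mathrm{p}K^{\mu}_{\alpha,\mathrm{mod}} = 0$, the result is simply the sum of the two environmental shifts.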
The intrinsic microscopic pK does not have the meaning given by the Tanford definition used in Section 6.3.3. Here, $\mathrm{p}K^{\mu}_{\alpha,\mathrm{int}}$ is intrinsic for a transition between microscopic states, including transitions within the same protonation form of the site. Hence, the intrinsic protonation/deprotonation equilibrium of a given titratable site is characterised by a set of microscopic intrinsic pK values. It is interesting to see how pH-sensitive sites obey Eq. (7.5). If the populations of the microscopic states in the model compound are equal, then according to Eq. (7.2) $\mathrm{p}K^{\mu}_{\alpha,\mathrm{mod}} = 0$ for all transitions. However, the terms $\Delta G_{\alpha,\mathrm{Born}}$ and $\Delta G_{\alpha,\mathrm{pc}}$ can be non-zero because the charge distributions $\rho_\alpha$, and hence the potentials $\varphi(\rho_\alpha,k)$, differ. Thus, for the pH-sensitive sites we obtain
$$\mathrm{p}K^{\mu}_{\alpha,\mathrm{int}} = \Delta\mathrm{p}K_{\alpha,\mathrm{Born}} + \Delta\mathrm{p}K_{\alpha,\mathrm{pc}}.$$
This expression shows that the equilibria between the microscopic states of the pH-sensitive sites are determined by their environment in the protein molecule. This is an expected result, which follows directly from our basic assumptions. We mention it in order to stress that Eq. (7.5) and the expressions that follow implicitly account for the difference between titratable and pH-sensitive sites. In principle, the evaluation of the charge-charge interactions does not differ from that described in the previous chapter. Again, the only difference is that here we consider the transition of a given site $i$ from microscopic state $S_0$ to state $S_\alpha$ in the context of the microscopic states of all other sites. The charge-charge interaction of site $i$ in state $S_\alpha$ with site $j$ in microscopic state $S_\beta$ is
$$W_{i\alpha,j\beta} = \sum_{k\in\{j\}} q_{j\beta}(k)\,\varphi(\rho_{i\alpha},k),$$
where $\{j\}$ is, as before, the set of charges of site $j$. The contribution of the charge-charge interactions to the transition $S_0 \to S_\alpha$ is then
$$\Delta G_{i\alpha,\mathrm{cc}} = \sum_{j\neq i} W_{i\alpha,j\beta} - \sum_{j\neq i} W_{i0,j\beta}.$$
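The summations in the above expressions run over all titratable and pH-sensitive sites except site $i$. Each site $j$ is in a certain microscopic state $\beta$, indicated by the double index $j\beta$. Having all terms composing the energy of the transition of a site $i$ from state $S_0$ to state $S_\alpha$, we can write
$$\Delta G_{i,0\to\alpha} = 2.3RT\left(\mathrm{p}K^{\mu}_{i\alpha,\mathrm{int}} - \Delta\nu_{i\alpha}\,\mathrm{pH}\right) + \sum_{j\neq i} W_{i\alpha,j\beta} - \sum_{j\neq i} W_{i0,j\beta}.$$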
The index $j$ enumerates all titratable and pH-sensitive sites in the protein molecule. To stay as close as possible to the considerations of the previous chapter, we have kept practically the same notation. We should therefore note again that $\Delta G_{i,0\to\alpha}$ is the energy difference between two microscopic states. We can also express the energy of interaction of each site $i$ in microscopic state $S_\alpha$ with the protein molecule and with the rest of the sites, each of which is in a certain microscopic state $S_\beta$, as follows:
$$E_{ix_\alpha}(\mathbf{x},\mathrm{pH}) = 2.3RT\left(\mathrm{p}K^{\mu}_{ix_\alpha,\mathrm{int}} - \Delta\nu_{ix_\alpha}\,\mathrm{pH}\right) + \sum_{j\neq i} W_{ix_\alpha,jx_\beta}, \tag{7.6}$$
where $\Delta\nu_{ix_\alpha}$ is the difference between the numbers of titratable hydrogen atoms in the reference state $S_0$ and in the current state $S_\alpha$ of site $i$. As before, the sequence $\mathbf{x} = (x_1, \ldots, x_i, \ldots, x_N)$ corresponds to a single microscopic state of the system. It consists of $N$ variables, where $N$ is the total number of titratable and pH-sensitive sites. Each variable describes the microscopic state of an individual site: $x_i = 0, 1, \ldots, \alpha, \ldots, n_i - 1$. Thus, the index $ix_\alpha$ is determined by the value $\alpha$ of the element $x_i$ of the sequence $\mathbf{x}$ and uniquely defines the microscopic state $S_\alpha$ of site $i$. In the same way, the index $jx_\beta$ defines the microscopic state $S_\beta$ of site $j$. The notations $\alpha$ and $\beta$ should not be confused with those used in Eqs. (7.2) and (7.4), where we formally indicated the protonated and the deprotonated forms of a given site. Here, $\alpha$ and $\beta$ indicate the microscopic states of two different sites. The energy of the system in a microscopic state $\mathbf{x}$ is obtained by summing these contributions over all sites, counting each pairwise interaction only once. Taking into account that it depends on pH via Eq. (7.6), we obtain
$$E(\mathbf{x},\mathrm{pH}) = 2.3RT\sum_i\left(\mathrm{p}K^{\mu}_{ix_\alpha,\mathrm{int}} - \Delta\nu_{ix_\alpha}\,\mathrm{pH}\right) + \sum_{i<j} W_{ix_\alpha,jx_\beta}. \tag{7.7}$$
Equation (7.7), embedded in Eq. (6.22), can be used to calculate the average degree of deprotonation of the titratable sites or the average occupancy of the different tautomers or rotamers of the titratable and pH-sensitive sites. To do this, we have to take into account that the microscopic states of a given site $i$, $S_{i\alpha}$ ($\alpha = 0, 1, \ldots, n_i - 1$), cannot exist simultaneously. This can be done by means of the Kronecker delta $\delta(x_i,\alpha)$: if $x_i = \alpha$ in a given microscopic state of the system, then $\delta(x_i,\alpha) = 1$; otherwise $\delta(x_i,\alpha) = 0$. The average population of a given state is a macroscopic and, in general, observable quantity. We can finally write it as the statistically weighted sum
$$\langle p_{i\alpha}\rangle(\mathrm{pH}) = \frac{\sum_{\{\mathbf{x}\}} \delta(x_i,\alpha)\, e^{-E(\mathbf{x},\mathrm{pH})/RT}}{\sum_{\{\mathbf{x}\}} e^{-E(\mathbf{x},\mathrm{pH})/RT}}. \tag{7.8}$$
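Eqs. (7.7) and (7.8) translate directly into an exhaustive enumeration over all microscopic states $\mathbf{x}$. The sketch below is a minimal illustration, not the program used for the calculations in this chapter. Energies are expressed in pK units ($E/2.3RT$), and the couplings $W$ are supplied as a dictionary in the same units. The example system is hypothetical: a carboxyl group with two protonated rotamers plus a deprotonated state, next to a hydroxyl group with two rotamers. All parameter values and the names `Site`, `populations`, `ACID`, `HYDROXYL` and `W_EXAMPLE` are illustrative.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Site:
    """A titratable or pH-sensitive site with n_i microscopic states.

    pk_int[a]: intrinsic microscopic pK of S_0 -> S_a (pk_int[0] = 0)
    dnu[a]:    Delta nu_a = nu_0 - nu_a (0 for all states of a pH-sensitive site)
    """
    pk_int: List[float]
    dnu: List[int]


# W/(2.3RT) in pK units, keyed by (i, alpha, j, beta) with i < j; absent keys mean 0
Couplings = Dict[Tuple[int, int, int, int], float]


def state_energy(x, sites, w: Couplings, ph):
    """Eq. (7.7) divided by 2.3RT: energy of the system state x in pK units."""
    e = sum(s.pk_int[a] - s.dnu[a] * ph for s, a in zip(sites, x))
    for i, j in itertools.combinations(range(len(sites)), 2):
        e += w.get((i, x[i], j, x[j]), 0.0)  # each pair counted once
    return e


def populations(sites, w: Couplings, ph):
    """Eq. (7.8): average populations <p_{i alpha}> by exhaustive enumeration."""
    states = list(itertools.product(*(range(len(s.pk_int)) for s in sites)))
    energies = [state_energy(x, sites, w, ph) for x in states]
    e_min = min(energies)  # shift energies to avoid overflow in the weights
    pops = [[0.0] * len(s.pk_int) for s in sites]
    z = 0.0
    for x, e in zip(states, energies):
        weight = 10.0 ** (e_min - e)  # exp(-E/RT) with E = 2.3RT * e
        z += weight
        for i, a in enumerate(x):
            pops[i][a] += weight  # Kronecker delta: only the occupied state of site i
    return [[p / z for p in row] for row in pops]


# Hypothetical example: carboxyl (S0, S1 protonated rotamers; S2 deprotonated) and a
# hydroxyl whose rotamer 1 is assumed to hydrogen-bond favourably to the carboxylate.
ACID = Site(pk_int=[0.0, 0.0, 4.0 - math.log10(2.0)], dnu=[0, 0, 1])
HYDROXYL = Site(pk_int=[0.0, 0.5], dnu=[0, 0])
W_EXAMPLE: Couplings = {(0, 2, 1, 1): -1.5}

for ph in (2.0, 8.0):
    print(ph, populations([ACID, HYDROXYL], W_EXAMPLE, ph))
```

In this made-up example the hydroxyl rotamer that hydrogen-bonds to the carboxylate rises from roughly 0.29 at pH 2 to 0.91 at pH 8. This is a pH-dependent hydrogen-bond rearrangement of the kind discussed in this section. The number of states is the product of the $n_i$, so exhaustive enumeration quickly becomes impractical as the number of sites grows. Monte Carlo sampling, as in the Beroza et al. reference listed at the end of Chapter 6, is a common alternative.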
The sums in Eq. (7.8) are taken over the set $\{\mathbf{x}\}$ of all possible microscopic states of the system. This set is considerably larger than the one defined in the previous chapter and used in Eq. (6.22), because it contains more than two microscopic states of the titratable sites and varying numbers of microscopic states of the pH-sensitive sites. Equation (7.8) can be used for the prediction and analysis of both ionisation equilibria and possible pH-dependent rearrangements of hydrogen bonds in proteins. The latter can be considered as elementary pH-induced conformational changes that do not affect non-hydrogen atoms. There are no fundamental difficulties in extending this approach to include other energy terms, such as van der Waals interactions and conformational variation of the side chains. Efforts in this direction have already been made by Alexov and Gunner¹. Still, calculations based on this approach are extremely demanding computationally, which makes it difficult to benefit from the rigour of Eq. (7.8).
7.2 Examples for pH-Dependent Hydrogen Bonding
As we have pointed out, the extension introduced in Eq. (7.8) is limited to variation of the polar hydrogen atoms only. In spite of this restriction, it is a useful tool for the interpretation of experimental observations whose analysis, if based on the "classical" model presented in Chapter 6, may lead to erroneous conclusions. In this section we consider two examples of the application of this extension. It is important to note, however, that the interpretations proposed in the examples should not be taken as valid beyond our basic assumption, namely that the non-hydrogen atoms of the protein molecule remain fixed in a single structure.
7.2.1 Ionisation properties of Asp76 in ribonuclease T1
The experimentally observed ionisation equilibria of two aspartic acid side chains in ribonuclease T1 from Aspergillus oryzae are shown in Fig. 7.4. The pH dependence of Asp66 (the open circles) is characterised by a pK value of 3.9. This feature, as well as the whole titration curve (the continuous line), is fairly well predicted by Eq. (6.22). The situation of Asp76 is very different. The chemical shift recorded for this residue shows only a weak pH dependence. Judging from the values of the chemical shift, one could suppose that the carboxyl group of this residue is in its protonated form. Comparing it with the titration curve of Asp66 (see Fig. 7.4), an alternative suggestion can be made, namely that Asp76 remains half protonated in the pH region investigated. Calculations based on Eq. (6.22) show that Asp76 has a pK value of 6.5. This reflects the fact, observed in the crystal structure of the molecule, that this residue is buried in the protein interior and does not participate in salt bridges. Hence, Asp76 is protonated over a large pH range. However, there is no experimental indication of the predicted transition at pH 6.5. Moreover, neither the interpretations suggested by the experiment nor the theoretical expectation can explain another observation. On the basis of denaturation experiments with the wild-type and the Asp76Asn mutant of ribonuclease T1, a pK of 0.5 has been evaluated² for Asp76. This means that the residue is in its deprotonated form over the whole pH range investigated by NMR spectroscopy.
[Figure 7.4: data plotted on a scale from 0.0 to 1.0 against pH 2 to 8.]
Figure 7.4 pH dependence of the NMR chemical shift of the carboxyl carbon resonances for two aspartic acids, Asp76 (solid circles) and Asp66 (open circles)³. The continuous line is the theoretical prediction of the degree of deprotonation of Asp66 based on Eq. (6.22).
The side chain of Asp76 is situated at the bottom of a cleft formed by the protein molecule and has a low accessibility to the solvent. According to the crystal structure of the protein the solvent accessible region of the carboxyl group of this residue is occupied by a water molecule, which is immobilised in its position by hydrogen bonds with the polar atoms lining the cleft (Fig. 7.5).
Figure 7.5 Titratable and pH-sensitive sites in the vicinity of Asp76 of ribonuclease T1 from Aspergillus oryzae. Only polar hydrogen atoms that can vary their locations are shown. (A) One of the possible configurations of the polar hydrogen atoms at pH < 2. (B) Configuration of the polar hydrogen atoms at pH > 2. The rotation of the hydroxyl group of Thr93 is shown with an arrow.
It is interesting to see whether the extension introduced in Eq. (7.8) can provide information about the behaviour of these hydrogen bonds and about the peculiar ionisation behaviour of Asp76. Because the water molecule appears to be an important hydrogen bond partner, it is reasonable to treat it as a part of the protein molecule. In terms of our considerations made in Chapter 5, this means that the water molecule should belong to the low dielectric material, rather than to the
Introduction to Non-covalent Interactions in Proteins
surrounding solvent. As seen from Fig. 7.5, this molecule is surrounded by functional groups (titratable and pH-sensitive sites) that can act as proton donors or acceptors depending on the orientation of the corresponding polar hydrogen atoms. Accordingly, we can define a number of orientations (microscopic states) of the water molecule and treat it as a pH-sensitive site. Calculations based on Eq. (7.8) give a pK of 2.0 for Asp76, which is substantially lower than the value calculated by means of Eq. (6.22). This is a somewhat surprising result, given that the carboxyl group is buried in the protein moiety and isolated from the bulk solvent by the immobilised water molecule, which belongs to the low dielectric material. The explanation lies in the reorganisation of the hydrogen bonds in this region. At pH < 2 the water molecule acts as a proton acceptor to the carboxyl oxygen of Asp76, which is in its protonated form (Fig. 7.5A). At the same time, the water molecule is a proton donor in the hydrogen bond with the hydroxyl group of Thr93. The second hydrogen atom of the water molecule in this configuration points towards the bulk. There are other energetically favourable, but less populated, orientations (not illustrated) in which the water is a proton acceptor to the carboxyl oxygen of Asp76. At pH > 2 another configuration of the hydrogen bonds in this region becomes favourable (Fig. 7.5B). Under these conditions, the water molecule acts as a proton donor to the carboxyl oxygen of Asp76, which is now in its deprotonated form. In the hydrogen bond (water)O-H···OH(Thr93), which is also present at low pH, the roles of the two oxygen atoms are exchanged and the water molecule is now the proton acceptor. Thus, the transition from the protonated to the deprotonated form of Asp76 is transmitted by the water molecule, which is accessible to the bulk solvent.
The coupling between the reorientation of the water molecule and the deprotonation of Asp76 is illustrated in Fig. 7.6. This result suggests a possible, though speculative, interpretation of the lack of pH dependence of the chemical shift observed for Asp76 (Fig. 7.4). The carboxyl oxygen atoms of this residue are hydrogen bonded to the same partners over the whole pH range of the experiment; the oxygen atom bound to the water molecule merely changes its role from proton donor to proton acceptor. Thus, the environment of the monitored nucleus does not change substantially, which results in the observed weak pH dependence.
Figure 7.6 Degree of deprotonation, α, of Asp76 (dashed line) and occupational probability, p, of the orientation of the immobilised water molecule acting as proton donor to the carboxyl oxygen of Asp76 (solid line), as functions of pH.
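The kind of ensemble average behind Fig. 7.6 can be illustrated with a minimal sketch. It assumes, rather than reproduces, the form of Eq. (7.8): each microscopic state combines a protonation state of the carboxyl group with an orientation of the water molecule and is weighted by a Boltzmann factor in which the bound proton contributes 10^(-n_H pH). The energies in STATES are hypothetical, chosen only so that the carboxyl group titrates with an effective pK of 2 and the water prefers to donate a hydrogen bond to the anion; the names populations, STATES, G_PROT and G_MISMATCH are illustrative.

```python
import math

LN10 = math.log(10.0)
KT = 0.593  # kcal/mol, roughly RT at 298 K


def populations(pH, states, kT=KT):
    """Boltzmann-weight microscopic states at a given pH.

    Each state is (n_h, orientation, g): n_h = 1 for the protonated carboxyl,
    0 for the anion; orientation labels the water molecule; g is a
    pH-independent free energy in kcal/mol. The bound proton contributes the
    factor 10**(-n_h * pH). Returns (degree of deprotonation, occupancies).
    """
    weighted = [(n_h, orient, math.exp(-g / kT - n_h * LN10 * pH))
                for n_h, orient, g in states]
    z = sum(w for _, _, w in weighted)
    alpha = sum(w for n_h, _, w in weighted if n_h == 0) / z
    occupancy = {}
    for _, orient, w in weighted:
        occupancy[orient] = occupancy.get(orient, 0.0) + w / z
    return alpha, occupancy


# Hypothetical energies (kcal/mol). "donor"/"acceptor" describe the role of
# the water molecule towards the carboxyl oxygen.
G_PROT = -KT * LN10 * 2.0  # protonation free energy giving an effective pK of 2
G_MISMATCH = 2.0           # penalty for the less favourable water orientation
STATES = [
    (1, "acceptor", G_PROT),
    (1, "donor", G_PROT + G_MISMATCH),
    (0, "donor", 0.0),
    (0, "acceptor", G_MISMATCH),
]

for pH in (0.0, 1.0, 2.0, 3.0, 4.0):
    alpha, occ = populations(pH, STATES)
    print(f"pH {pH:.1f}  alpha = {alpha:.3f}  p(donor) = {occ['donor']:.3f}")
```

In this toy model p(donor) tracks α because the favoured water orientation depends on the protonation state; setting G_MISMATCH to 0 makes both orientations equally populated at every pH and removes the coupling.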
The behaviour of the immobilised water molecule prompts further reflections. The reorganisation of the hydrogen bonds and the reorientation of the corresponding chemical bonds upon ionisation of the titratable groups can be treated as a response of the medium. Thus, the introduction of this limited conformational flexibility also partially accounts for the dielectric properties of the protein molecule. With this in mind, it is reasonable to assume that the parameter εp describing these properties should reflect the effects of electronic and atomic polarisation (see Section 5.3.1) rather than the reorientation of dipoles, i.e. εp should be around 4. Another instructive conclusion from this example is that water molecules hydrogen bonded to the protein should not be considered part of the bulk solution. Although immobilised, these molecules are free to reorient their dipoles and can thereby substantially influence electrostatic interactions and ionisation equilibria in proteins.
7.2.2 Hydrogen bond rearrangement related to protein function
The ionisation behaviour of the active site of alcohol dehydrogenase from Drosophila lebanonensis is another example illustrating the importance of the hydrogen bonding of titratable sites. This enzyme catalyses the oxidation of primary and secondary alcohols to aldehydes and ketones using NAD+ as coenzyme. In the catalytic process, the active site (Tyr151, Lys155, Ser138) abstracts a proton from the hydroxyl group of the substrate, and a hydride is transferred from the resulting alcoholate anion to the NAD+ molecule. Kinetic studies of the pH dependence of the different reaction steps revealed that the substrate binding and dehydrogenation steps are pH dependent, with a midpoint at pH 7.1 (see Fig. 6.1A). This pH dependence has been attributed to the deprotonation of a titratable site assumed to have a pK of 7.3. Such a site has been identified neither experimentally nor theoretically. Using the approach of position variation of the polar hydrogen atoms, we can investigate the ionisation behaviour of the active site of this protein. Three amino acid residues of the active site are suspected to play a key role in the observed pH dependence: Ser138, Tyr151 and Lys155 (Fig. 7.7). The active site is isolated from the solvent by the cofactor NAD+, which also restricts the conformational flexibility of the side chains in the region. The NAD+ ribose hydroxyl group points towards the interior of the active site and is in the immediate vicinity of the titratable groups of Tyr151 and Lys155 (see the O2' atom in Fig. 7.7). This group is also considered a pH-sensitive site. Because there is no three-dimensional structure of the complex with the alcohol substrate, we use a structure in which the position of the alcohol hydroxyl group is occupied by a water molecule, labelled W in Fig. 7.7.
As before, we will treat the water molecule as a pH-sensitive site which can adopt different orientations.
Figure 7.7 The active site of alcohol dehydrogenase from Drosophila lebanonensis. The titratable hydrogen atoms are shown in cyan, whereas the remaining polar hydrogen atoms of the titratable and pH-sensitive sites are shown as usual; all other hydrogen atoms are omitted. (A) Hydrogen bond network at low pH. The orientation of the O2'-H group in which the hydrogen is not involved in hydrogen bonding is the most populated (0.7). The ε-amino group of Lys155 is preferentially a proton donor to the NAD+ ribose O3' hydroxyl group. (B) Intact proton relay chain at high pH. The direction of proton abstraction is indicated by an arrow.
The calculations made by means of Eq. (7.8) reveal that the titratable groups of the active site ionise cooperatively. As seen from the left-hand panel of Fig. 7.8, both Tyr151 and Lys155 are partially deprotonated even at low pH. The buffering effect of the parallel deprotonation of these groups is reflected in the formation of a plateau of Q(pH) at pH > 9. The sum of the two titration curves closely resembles the deprotonation of a single group with an apparent pK (midpoint) of 7.2.
Figure 7.8 Left: degree of deprotonation of the two titratable side chains in the active site of alcohol dehydrogenase as a function of pH. Right: occupational probability of the ribose hydroxyl group (O2'-H) rotamers as a function of pH.
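How two interacting sites can produce a summed curve that looks like a single titration with an apparent midpoint can be explored with a minimal two-site sketch. It enumerates the protonation microstates, adds a hypothetical coupling energy w_pair for every pair of simultaneously deprotonated sites, and locates the apparent pK as the pH at which half of the protons have dissociated on average. The intrinsic pK values and coupling energies are illustrative assumptions, not parameters of alcohol dehydrogenase, and the sketch leaves out the hydrogen-atom rotamers that Eq. (7.8) includes; all function names are hypothetical.

```python
import itertools
import math

LN10 = math.log(10.0)
KT = 0.593  # kcal/mol near 298 K


def mean_deprotonated(pH, pk_intr, w_pair, kT=KT):
    """Average number of deprotonated sites for interacting titratable sites.

    pk_intr: intrinsic pK of each site.
    w_pair: coupling energy (kcal/mol) added for every pair of sites that are
        simultaneously deprotonated (positive = mutual destabilisation).
    """
    z = 0.0
    total = 0.0
    for state in itertools.product((0, 1), repeat=len(pk_intr)):  # 1 = deprotonated
        log_w = sum(LN10 * (pH - pk) for s, pk in zip(state, pk_intr) if s)
        n_dep = sum(state)
        log_w -= w_pair * n_dep * (n_dep - 1) / 2 / kT  # number of deprotonated pairs
        weight = math.exp(log_w)
        z += weight
        total += weight * n_dep
    return total / z


def apparent_pk(pk_intr, w_pair, lo=0.0, hi=14.0, tol=1e-9):
    """pH at which half of all sites are deprotonated on average (bisection)."""
    half = len(pk_intr) / 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_deprotonated(mid, pk_intr, w_pair) < half:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for w in (0.0, 1.0, 2.0):
    print(f"W = {w:.1f} kcal/mol  apparent pK = {apparent_pk([6.0, 8.0], w):.2f}")
```

For two sites the midpoint has a closed form, (pK1 + pK2)/2 + W/(2 kT ln 10), because the average equals one exactly when the fully deprotonated and fully protonated microstates have equal weight; the bisection is kept so that the same function also works for more than two sites.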
Tyrosine hydroxyl groups have two stereochemically favourable orientations. In the case of Tyr151 these orientations point towards the water molecule and towards the NAD+ ribose hydroxyl group (O2'-H), respectively. The calculations show that the latter orientation is predominantly populated over the whole interval between pH 4 and pH 11. It follows that the tyrosine hydroxyl oxygen atom is a proton acceptor in the hydrogen bond with the water molecule in this pH range. Hence, the deprotonation curve shown in Fig. 7.8 is that of the rotamer forming a hydrogen bond with the ribose hydroxyl group (O2'-H).
The orientation of the ribose hydroxyl group as a function of pH is worth noting. The pH dependence of the populations of its three stereochemically favourable orientations is presented in the right-hand panel of Fig. 7.8. The orientations in which the hydroxyl group acts as proton donor in the hydrogen bonds with Tyr151 and Lys155 are occupied with probabilities that follow the degree of deprotonation of the corresponding groups (compare the dashed curves α(pH) and β(pH) in Fig. 7.8). The population of the third rotamer, designated as free in Fig. 7.8 and shown in Fig. 7.7A, follows the sum of the degrees of deprotonation of the two titratable sites. This behaviour of the ribose hydroxyl, which is consistent with the deprotonation of the titratable sites in the active site, cannot be detected within the concept of two microscopic states (protonated and deprotonated). Moreover, it suggests an idea about the pH-dependent component of the catalytic mechanism of substrate dehydrogenation. Here, we will speculate and presume that the hydroxyl group of the substrate alcohol can be represented by the water molecule. As we have already seen (Fig. 7.7), the tyrosine hydroxyl group acts as a proton acceptor in the hydrogen bond with the water molecule (or with the presumed substrate hydroxyl). As the pH increases, and with it the degree of deprotonation of Tyr151, this residue can act as a general base abstracting the proton from the substrate. At the same time, the population of the "free" orientation of the O2' ribose hydroxyl group decreases, whereas the orientations in which this group is a proton donor in the hydrogen bonds with Tyr151 and Lys155 become more populated. At pH values around the midpoint of the observed pH dependence, the three rotamers of this group are approximately equally populated. One can imagine that the hydroxyl group rotates.
As the pH increases further, a situation is reached in which one proton is shared between the pairs Tyr151/O2' ribose hydroxyl and O2' ribose hydroxyl/Lys155 (Fig. 7.7). The second pair becomes possible because of the increasing degree of deprotonation of Lys155. Here, one can imagine that the hydroxyl group oscillates between the two hydrogen bond partners. Summarising these theoretical observations, the following picture of the process emerges. After an event of proton abstraction from the alcohol substrate by the hydroxyl group of Tyr151, the latter becomes protonated. This, however, is an unfavourable microscopic state. The proton can then be spontaneously transferred from the Tyr151 hydroxyl group via the ribose hydroxyl group to Lys155. As seen from Fig. 7.8, the degree of deprotonation of Lys155 is significant at alkaline pH, so that the proton donated by the NAD+ ribose hydroxyl group can be accepted. We can conclude that the hydroxyl group of Tyr151, the NAD+ O2' ribose hydroxyl and the ε-amino group of Lys155 form a proton relay chain, indicated by an arrow in Fig. 7.7B. Lys155 is at the end of a channel in which hydrogen-bonded water molecules form a chain leading to the bulk solution (Fig. 3.21). This channel can connect the proton relay chain in the active site with the bulk. Thus, once accepted by Lys155, the hydrogen ion can be transported out of the active site.
Figure 7.9 Comparison of the population, p, of the rotamers of the NAD+ ribose hydroxyl group forming the proton relay chain in the active site with the reciprocal value of the kinetic coefficient φ2 measured by Winberg and co-workers⁴, as functions of pH. See also Fig. 6.1B.
At low pH, the proton relay chain is broken because Lys155 is protonated. Hence, although Tyr151 acts as a proton acceptor towards the substrate hydroxyl group, a possible event of proton abstraction will be prohibited by the break in the proton relay chain between the NAD+ O2' ribose hydroxyl and the ε-amino group of Lys155, which disables the transport of the proton out of the active site. It follows that the NAD+ ribose hydroxyl group is a key participant in the mechanism of proton abstraction, acting as a switch in the proton relay chain. We can further assume that the observed pH dependence shown in Fig. 6.1 reflects the population of the O2' ribose hydroxyl rotamer that keeps the proton relay chain intact. This hypothesis is illustrated graphically in Fig. 7.9. As seen, the theoretical curve follows the experimental points fairly well. It should be noted that the interpretation given in this example is based on strongly restrictive assumptions, which make the conclusions speculative. Hence, the reasonable agreement between the experimental data and the theoretical prediction cannot be taken as definitive proof. Nevertheless, we can draw an important conclusion, namely that electrostatic interactions and ionisation equilibria are strongly coupled with the behaviour of hydrogen bonds and hydrogen bond networks in proteins.
7.3 Conformational Flexibility Involving Non-hydrogen Atoms
The ability of native proteins to adopt different conformations is one of their inherent properties. Theoretical models of electrostatic interactions that treat the protein molecule as a dielectric material ignore this feature or take it into account implicitly, for instance by adjusting the relative dielectric constant to an appropriate value. We have seen from the examples given in the previous section that introducing even a restricted conformational flexibility may substantially improve our understanding of the relation between electrostatic interactions and the physical chemical and functional properties of proteins.
Besides the computational obstacles to introducing conformational flexibility into the theoretical model, the use of a single protein conformation rests on the fact that three-dimensional protein structures obtained by X-ray crystallography are given in a single conformation. This argument is becoming less and less tenable, because crystal structures at atomic and sub-atomic resolution show multiple conformations of amino acid side chains. This heterogeneity of protein conformation has a direct impact on the prediction of the ionisation equilibria. An example of this influence is given in Table 7.1, where the experimental and theoretically predicted pK values of bovine pancreatic trypsin inhibitor are compared.

Table 7.1 Experimental and calculated pK values of some titratable amino acid side chains of bovine pancreatic trypsin inhibitor. The values of pK given in bold are closest to the experimental measurements.

                          Theory
Residue   Experiment      4PTI    5PTI-1   5PTI-2   5PTI-3   5PTI
N-term.   8.1⁵, 7.49⁶     5.48    7.38     7.32     7.39     7.33
Glu7      3.7⁵            2.86    1.94     4.13     4.14     1.93
Glu49     3.8⁵            1.41    0.86     0.84     0.86     0.84
Asp3      3.0⁵            2.85    3.29     3.11     3.29     3.11
Asp50     3.4⁵            <1      1.41     1.40     1.38     1.38
Tyr21     10.16⁶          9.75    10.28    10.30    10.24    10.25
Lys15     10.43⁶          10.67   10.95    10.96    10.95    10.96
Lys26     10.10⁶          10.09   10.42    10.40    10.40    10.42
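How much the calculated pK of a single site changes between crystal structures can be read off Table 7.1 with a few lines of Python. The sketch below transcribes the calculated values from the table (the "<1" entry for Asp50 in 4PTI is stored as None and skipped) and reports, for each residue, the spread over the five structures and the mean over the four 5PTI variants; the dictionary layout and helper names are illustrative choices.

```python
from statistics import mean

# Calculated pK values transcribed from Table 7.1, in the column order
# 4PTI, 5PTI-1, 5PTI-2, 5PTI-3, 5PTI (None = reported as "<1").
CALCULATED = {
    "N-term.": [5.48, 7.38, 7.32, 7.39, 7.33],
    "Glu7": [2.86, 1.94, 4.13, 4.14, 1.93],
    "Glu49": [1.41, 0.86, 0.84, 0.86, 0.84],
    "Asp3": [2.85, 3.29, 3.11, 3.29, 3.11],
    "Asp50": [None, 1.41, 1.40, 1.38, 1.38],
    "Tyr21": [9.75, 10.28, 10.30, 10.24, 10.25],
    "Lys15": [10.67, 10.95, 10.96, 10.95, 10.96],
    "Lys26": [10.09, 10.42, 10.40, 10.40, 10.42],
}


def spread(values):
    """Largest minus smallest calculated pK, ignoring missing entries."""
    known = [v for v in values if v is not None]
    return max(known) - min(known)


def mean_5pti(values):
    """Mean over the four 5PTI variants (columns 2 to 5)."""
    return mean(v for v in values[1:] if v is not None)


for residue, values in CALCULATED.items():
    print(f"{residue:8s} spread = {spread(values):.2f}  mean(5PTI) = {mean_5pti(values):.2f}")
```

Glu7, the residue with a double conformation in 5PTI, shows the largest spread (about 2.2 pH units), followed by the N-terminus (about 1.9), where the difference arises mainly between 4PTI and 5PTI.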
Bovine pancreatic trypsin inhibitor is a small globular protein with a known three-dimensional structure and a well-documented experimental record of various physical chemical quantities. This makes it a preferred subject for testing theoretical approaches to predicting ionisation equilibria in proteins. The pK values listed in Table 7.1 are calculated for two crystal structures of the molecule using Eq. (6.22). The structures are designated 4PTI and 5PTI. The structure 5PTI is characterised by double conformations of two side chains, one of which is Glu7, whilst the other is a methionine. The structures of 5PTI resulting from the alternative conformations are numbered in the table. Comparing the calculated pK values with the experimental data, one can notice that good agreement is obtained when the theoretical results from different crystal structures of the protein are combined. One can also notice that the calculated pK values of a single site (Glu7, for example) can differ by about 2 pH units. This illustrates the effect of conformational variation on the calculated ionisation equilibria. For some side chains no agreement between the experimental observations and the theoretical prediction can be achieved. In these cases the discrepancy can also reach 2 pH units. One may suspect that this disagreement originates from differences between the conformations of these titratable side chains in solution and in the crystal structure.
7.3.1 Conformations generated by means of molecular dynamics simulation
The experimentally measured pK values are an average over all the conformations that the native protein molecule can adopt in solution. In contrast, the corresponding calculated value, as we have pointed out several times, is obtained on the basis of a single protein structure. Hence, the assumption of a fixed protein structure can be the source of disagreement between the observed and predicted ionisation equilibria. The methods of statistical physics provide the necessary tools for predicting average (observable) quantities. Equation (7.8) is one example. By means of this equation we are able to predict the average degree of deprotonation of titratable sites located in regions of the protein molecule where conformational flexibility involving only hydrogen atoms is tolerated. There is no difficulty in principle in extending this equation to account for multiple conformations involving side chains or even the whole protein. We need to describe the energy of all microscopic states of the system and to apply a formula of the type of Eq. (7.8) or (6.22). However, the direct application of such a formula is not straightforward because of the enormous number of microscopic states. We could consider alternative approaches for sampling conformations (microscopic states), such as the Monte Carlo method considered in the previous chapter. Here, however, we choose another way: we collect conformations using molecular dynamics simulation. Molecular dynamics simulation is a powerful method for the theoretical investigation of processes in condensed matter at the molecular level.
This method has been applied successfully to the analysis of dynamic processes in protein molecules, such as conformational transitions, ligand binding and molecular motion, among many others. A wide spectrum of physical chemical quantities can also be interpreted at the molecular level by means of molecular dynamics simulation. We have already mentioned some results of this method in Chapter 5, when discussing the dielectric properties of proteins. The theoretical background and the computational techniques of this method are beyond the scope of our considerations. We will use just one of the outcomes of molecular dynamics, namely a set of protein structures with different conformations. Therefore, we need to know only the basic idea of the method. Molecular dynamics simulation is based on the numerical integration of the equation of motion (Newton's second law) for each atom in the system (the protein molecule and the surrounding water molecules):

$$\mathbf{F}_i = m_i \frac{d\mathbf{v}_i}{dt} = m_i \frac{d^2\mathbf{r}_i}{dt^2}$$

The force F_i acting on each individual atom i with coordinates r_i is determined by the gradient of the potential energy of interaction:

$$\mathbf{F}_i = -\frac{\partial U}{\partial \mathbf{r}_i}, \qquad \text{hence} \qquad -\frac{\partial U}{\partial \mathbf{r}_i} = m_i \frac{d^2\mathbf{r}_i}{dt^2} \tag{7.9}$$
The potential U is a sum of different components:

$$U = U_{\mathrm{vdW}} + U_{\mathrm{HB}} + U_{\mathrm{El}} + U_{\mathrm{bond}} + U_{\mathrm{ang}} + U_{\mathrm{tor}}$$
The first three components arise from the non-covalent interactions we have already considered: van der Waals interactions, hydrogen bonding and electrostatic interactions. The van der Waals and hydrogen bonding terms are often combined in empirical functions of the type of Eq. (2.26). The electrostatic term, U_El, is usually calculated by means of Coulomb's law in vacuum:

$$U_{\mathrm{El}} = \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

where q_i and q_j are the partial charges of atoms i and j, respectively, r_ij is the distance between them and ε_0 is the permittivity of vacuum. The last three terms are potentials reflecting the chemical nature of the molecules in the system. The potential U_bond is the bond stretching potential and arises from the deviation of the chemical bond length from its ideal value. It can be described as a harmonic function

$$U_{\mathrm{bond}} = \sum_{\mathrm{bonds}} k_r (r - r_0)^2$$
This function is Hooke's law with a force constant k_r that depends on the character of the chemical bond. Its main property is that U_bond always increases when the bond length r deviates from the ideal length r_0. There are also more rigorous potential functions, such as the Buckingham potential [see Eq. (2.25)]. However, during a molecular dynamics simulation U_bond is calculated at each step for all chemical bonds in the system, which makes the use of functions like the Buckingham potential computationally very demanding. The bond bending potential, U_ang, has the same form,

$$U_{\mathrm{ang}} = \sum_{\mathrm{angles}} k_\theta (\theta - \theta_0)^2,$$

and arises from the deviation of the bond angle θ from its ideal value θ_0. The force constant k_θ depends on the types of atoms forming the bond angle. The term U_tor is a potential function for rotation around a chemical bond. It has the form

$$U_{\mathrm{tor}} = \sum_{\mathrm{tor}} A_n \left[1 + \cos(n\chi - \chi_0)\right],$$

where A_n is a parameter characterising the chemical bond, n is the periodicity (the number of potential minima per full rotation), χ is the torsion (dihedral) angle and χ_0 is the phase shift. An example of U_tor for the torsion angle -Cβ-Oγ- that determines the orientation of the hydroxyl group of the serine side chain is given in Fig. 7.10. For comparison, U_tor(-CH2-CH2-) is also shown in the same graph.
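The bond, angle and Coulomb terms defined above translate almost literally into code. The sketch below assumes kcal/mol, ångström and elementary-charge units, uses the harmonic forms exactly as written in the text (without the factor 1/2 that some force fields include) and, unlike production force fields, does not exclude bonded atom pairs from the Coulomb sum. The conversion constant of about 332.06 kcal·Å/(mol·e²), i.e. 1/(4πε_0) in these units, and the numerical parameters in the usage lines are not taken from the text; the function names are illustrative.

```python
import math

COULOMB_KCAL = 332.0636  # 1/(4*pi*eps0) in kcal*Angstrom/(mol*e**2), approximate


def u_bond(r, r0, k_r):
    """Bond stretching term k_r * (r - r0)**2 for one bond."""
    return k_r * (r - r0) ** 2


def u_angle(theta, theta0, k_theta):
    """Bond bending term k_theta * (theta - theta0)**2; angles in radians."""
    return k_theta * (theta - theta0) ** 2


def u_coulomb(coords, charges):
    """Vacuum Coulomb energy summed over all atom pairs i < j."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r_ij = math.dist(coords[i], coords[j])
            energy += COULOMB_KCAL * charges[i] * charges[j] / r_ij
    return energy


# Illustrative values only: a stretched bond and a +1/-1 pair 3 Angstrom apart.
print(u_bond(1.10, 1.00, 300.0))
print(u_coulomb([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.0, -1.0]))
```

Because every term is evaluated at every time step for every bond, angle and atom pair, the cheap closed forms are what keep the simulation affordable, which is the point made in the text about more elaborate potentials.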
Figure 7.10 Torsion angle potential function, U_tor, for the bond -Cβ-Oγ- in the serine side chain (continuous line), plotted against the torsion angle χ. The minima correspond to the rotamers of the hydroxyl group, Oγ-H, shown in Fig. 7.1. The dashed line is U_tor for an aliphatic (-CH2-CH2-) bond. In both cases, n = 3 and χ0 = 0.
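For n = 3 and χ0 = 0 the torsion term has its minima where cos(3χ) = -1, that is at χ = 60°, 180° and -60°, the three staggered rotamers. A short grid search over one full turn confirms this; the amplitude A_n = 0.2 is an illustrative number (the positions of the minima do not depend on it), and the function names are hypothetical.

```python
import math


def u_tor(chi_deg, terms):
    """Torsion energy: sum of A_n * (1 + cos(n*chi - chi0)) over (A_n, n, chi0_deg) terms."""
    chi = math.radians(chi_deg)
    return sum(a_n * (1.0 + math.cos(n * chi - math.radians(chi0)))
               for a_n, n, chi0 in terms)


def torsion_minima(terms, step_deg=1):
    """Local minima of u_tor on a periodic grid covering [-180, 180) degrees."""
    angles = list(range(-180, 180, step_deg))
    values = [u_tor(a, terms) for a in angles]
    n = len(angles)
    # values[k - 1] wraps around for k = 0, reflecting the periodicity in chi.
    return [angles[k] for k in range(n)
            if values[k] < values[k - 1] and values[k] < values[(k + 1) % n]]


print(torsion_minima([(0.2, 3, 0.0)]))  # prints [-180, -60, 60]; -180 is the same rotamer as 180
```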
Equation (7.9) is integrated in short time steps, δt, usually 0.5×10⁻¹⁵ s. Within this time interval we can assume that the acceleration of a given atom,

$$\mathbf{a}_i = \frac{d\mathbf{v}_i(t)}{dt},$$

is constant. Integrating this equation, we obtain for the velocity

$$\mathbf{v}_i(t) = \mathbf{a}_i t + \mathbf{v}_{0,i},$$

where the integration constant v_{0,i} is the initial velocity of atom i. Taking into account that v_i = dr_i/dt, one can write

$$\frac{d\mathbf{r}_i}{dt} = \mathbf{a}_i t + \mathbf{v}_{0,i}.$$

After integration,

$$\int_0^t \frac{d\mathbf{r}_i}{dt}\,dt = \int_0^t \left(\mathbf{a}_i t + \mathbf{v}_{0,i}\right) dt,$$

one obtains for the coordinates of atom i

$$\mathbf{r}_i = \frac{1}{2}\mathbf{a}_i t^2 + \mathbf{v}_{0,i}\, t + \mathbf{r}_{0,i},$$
where r_{0,i} are the initial coordinates of the atom. The acceleration can be evaluated by means of Eq. (7.9):

$$\mathbf{a}_i = -\frac{1}{m_i}\frac{\partial U(\mathbf{r})}{\partial \mathbf{r}_i}.$$

The integration of the equation of motion, Eq. (7.9), is performed numerically. There are different algorithms for integrating this equation, all of them based on the assumption that the atomic positions, velocities and accelerations can be expressed by a Taylor series expansion (which we have already used in Chapter 2):

$$\mathbf{r}_i(t) = \mathbf{r}_i(t_0) + \frac{d\mathbf{r}_i}{dt}(t - t_0) + \frac{1}{2}\frac{d^2\mathbf{r}_i}{dt^2}(t - t_0)^2 + \dots$$

If we know the coordinates of the atoms at time t_0 and would like to know them at time t_0 + δt, we can write

$$\mathbf{r}_i(t_0 + \delta t) = \mathbf{r}_i(t_0) + \mathbf{v}_i(t_0)\,\delta t + \frac{1}{2}\mathbf{a}_i(t_0)\,\delta t^2.$$

For time t_0 - δt we can write

$$\mathbf{r}_i(t_0 - \delta t) = \mathbf{r}_i(t_0) - \mathbf{v}_i(t_0)\,\delta t + \frac{1}{2}\mathbf{a}_i(t_0)\,\delta t^2.$$

Adding these two equations and rearranging gives

$$\mathbf{r}_i(t_0 + \delta t) = 2\mathbf{r}_i(t_0) - \mathbf{r}_i(t_0 - \delta t) + \mathbf{a}_i(t_0)\,\delta t^2,$$

the formula used in the Verlet algorithm for calculating the atomic coordinates at time t_0 + δt. At each step of this algorithm (time t_0 + δt), the coordinates of the atoms are calculated from the coordinates obtained in the two previous steps (times t_0 and t_0 - δt) and the accelerations at time t_0. In this way the coordinates of the individual atoms, r_i, are determined as a function of time. They, as well as the components of the potential U, are collected at certain time intervals and used for the analysis of the phenomena of interest. We are interested in the collection of atomic coordinates, which is, in fact, a collection of different conformations of the protein molecule.
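The Verlet recursion can be tried out on a system whose exact trajectory is known. The sketch below integrates a one-dimensional harmonic oscillator, a hypothetical stand-in for a single atom (in a real simulation the acceleration comes from -(1/m_i)∂U/∂r_i for every atom in three dimensions), and starts the recursion with the Taylor step r(δt) = r0 + v0 δt + a0 δt²/2, because r(t0 - δt) is not available at the first step. The function name verlet is illustrative.

```python
import math


def verlet(r0, v0, accel, dt, n_steps):
    """Position Verlet: r(t+dt) = 2 r(t) - r(t-dt) + a(t) dt**2.

    accel(r) returns the acceleration at position r. The first step uses the
    Taylor expansion because r(-dt) is unknown. Returns the n_steps + 1
    positions r(0), r(dt), ..., r(n_steps*dt); n_steps must be >= 1.
    """
    traj = [r0, r0 + v0 * dt + 0.5 * accel(r0) * dt ** 2]
    for _ in range(n_steps - 1):
        r_prev, r_curr = traj[-2], traj[-1]
        traj.append(2.0 * r_curr - r_prev + accel(r_curr) * dt ** 2)
    return traj


# Harmonic oscillator with omega = 1: a = -r, exact solution r(t) = cos(t).
dt = 0.01
traj = verlet(1.0, 0.0, lambda r: -r, dt, 628)
worst = max(abs(r - math.cos(k * dt)) for k, r in enumerate(traj))
print(f"largest deviation from cos(t) over one period: {worst:.2e}")
```

Note that velocities never appear in the recursion itself; if they are needed (for example for the kinetic energy), they can be estimated from the difference of neighbouring positions.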
7.3.2 Average pK values
Our working hypothesis is that the main source of discrepancy between the observed and calculated pK values is that the latter result from calculations based on a single structure. Thus, if we generate a representative set of structures and average the pK values calculated for each structure, the resulting value should approach the experimentally observed one. Here we employ the hypothesis that the time average equals the ensemble average, ⟨A⟩_t = ⟨A⟩_e (Appendix A). For the purposes of our analysis we make use of a molecular dynamics study of the protein xylanase⁷. We consider a set of protein structures collected from a molecular dynamics simulation of 1 ns (10⁻⁹ s). The individual structures are snapshots taken every 2.5 ps (1 ps = 10⁻¹² s). For all 400 structures the pK values of the titratable sites are calculated by means of Eq. (6.22).
Figure 7.11 Snapshot pK values of Asp121 and Asp119 of xylanase from Bacillus circulans, taken every 2.5 ps. The time evolution of the average pK values is given as continuous lines. The dashed lines correspond to the experimental pK values of 3.6 for Asp121 and 3.2 for Asp119.
In Fig. 7.11 the "snapshot" pK values of two aspartic acids (Asp121 and Asp119) are shown. These two residues are accessible to the solvent and presumably flexible, but they are characterised by different environments. Asp121 is completely accessible to the solvent and interacts weakly with the other titratable sites of the protein (see Fig. 7.12). As expected, the pK values calculated for the different snapshot structures vary. Notably, the variation of pK covers a band of about 2.5 pH units, an effect that should not surprise us either. The snapshots are instantaneous structures reflecting thermal fluctuations, so they are not necessarily in an optimal conformation. This is reflected in the pK values, which fluctuate as well. The plot of the snapshot pK values helps us to develop a feeling for the magnitude of these fluctuations. The quantity we are interested in is the time-averaged pK value of the group. It is given in Fig. 7.11 as a function of time, pK(t). The value of pK(t) is obtained by averaging all snapshot values from the start of the sampling (time zero) up to time t. It is seen that with time pK(t) approaches the experimental value. The time evolution of the average pK of Asp119 (right-hand side of Fig. 7.11) exhibits similar behaviour: it approaches the experimentally observed value sufficiently closely after about 500 ps. The crystal structure of the protein shows that this residue forms a hydrogen bond (salt bridge) with an arginine residue (Arg73, Fig. 7.12). It is reasonable to presume that Asp119 is immobilised by this hydrogen bond. Nevertheless, its snapshot pK values fluctuate with practically the same amplitude as those of Asp121. One can also notice that the fluctuations show a pattern with a periodicity of about 300 ps, indicating conformational transitions of this side chain or of other side chains in its vicinity. The magnitude of these fluctuations is again about 2 pH units. These two examples show that the average pK values do indeed approach the observed ones.
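The running average pK(t) plotted in Fig. 7.11 is the cumulative mean of the snapshot values. A minimal sketch, assuming the snapshot pK values are already available as a list ordered in time (one value every 2.5 ps, hence 400 values for 1 ns); the sample numbers in the usage lines are made up and the function names are illustrative.

```python
from itertools import accumulate

SNAPSHOT_INTERVAL_PS = 2.5
N_SNAPSHOTS = round(1000.0 / SNAPSHOT_INTERVAL_PS)  # 1 ns of trajectory -> 400


def running_pk(snapshot_pk):
    """pK(t): mean of all snapshot pK values from time zero up to each snapshot."""
    return [total / k for k, total in enumerate(accumulate(snapshot_pk), start=1)]


def snapshot_times(n, interval_ps=SNAPSHOT_INTERVAL_PS):
    """Time (ps) of each snapshot, starting at zero."""
    return [k * interval_ps for k in range(n)]


made_up = [4.1, 2.9, 3.8, 3.2, 3.6]  # hypothetical snapshot pK values
for t, pk in zip(snapshot_times(len(made_up)), running_pk(made_up)):
    print(f"t = {t:5.1f} ps  pK(t) = {pk:.2f}")
```

As in the text, this averages the pK values themselves; averaging the degree of deprotonation over the snapshots at each pH would be a different, though related, estimate.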
We should note, however, that other examples can be given in which pK(t) does not approach the experimentally observed values. Two reasons for this discrepancy can be pointed out. First, the time window of 1 ns may be too short for the hypothesis ⟨A⟩_t = ⟨A⟩_e to hold. For instance, the influence on the ionisation equilibria of aromatic side-chain flipping or of possible cooperative motions, which occur on time scales longer than 1 μs, cannot be explored within this short time interval.
Figure 7.12 Asp121 and Asp119 in the context of the three-dimensional structure of xylanase from Bacillus circulans. The bifurcated hydrogen bond between the carboxyl oxygen atom of Asp119 and the guanidinium group of Arg73 is marked with a dotted line.
The second reason lies in the concept of the calculation of ionisation equilibria itself. Regardless of how the protein structure is provided (a single structure obtained by X-ray analysis or a snapshot structure from molecular dynamics simulation), the main assumption, that the structure does not change with pH, remains. In both cases electrostatic interactions are calculated for a single structure, whereas the ionisation equilibria are calculated over a wide range of pH. The set of snapshot structures collected by the molecular dynamics simulation is predetermined, and hence constrained, by the a priori defined protonation state of the protein molecule, for instance with all acidic and basic groups in their charged forms. Hence, even in a substantially longer simulation, conformations corresponding to another protonation state of the protein cannot be collected in statistically relevant numbers. In this context, the optimistic results illustrated in Fig. 7.11 are to be expected for sites that are less sensitive to pH-induced structural changes of the protein molecule.
7.3.3 Desolvation and charge-dipole energy compensation
We have pointed out in Chapter 6 that there is a certain compensation between the desolvation effect and the influence of the protein net charge. The charge-dipole interactions also tend to compensate the desolvation energy. This effect can be clearly seen in Fig. 7.13, where the pK shifts of Asp121 and Asp119 caused by the desolvation penalty, ΔpK_Born, and by the electrostatic influence of the peptide dipoles, ΔpK_pc, are shown. The time evolution of ΔpK_Born suggests two preferred conformations of Asp121. During the first 300 ps of the molecular dynamics simulation the influence of desolvation and of the charge-dipole interactions is negligible. This corresponds to the conformation observed in the crystal structure of the protein, in which this residue is completely accessible to the solvent. After another 200 ps, the side chain of Asp121 adopts a conformation in which the desolvation effect shifts its pK value by about 2 pH units. At the same time the influence of the charge-dipole interactions increases and partially compensates the shift due to desolvation.
Figure 7.13 Snapshot values of the pK shifts of Asp121 and Asp119 of xylanase from Bacillus circulans due to desolvation (squares) and due to electrostatic interactions of their titratable sites with the dipole moments of the polypeptide backbone, plotted against simulation time (0 to 1000 ps).
Asp119 shows the same tendency, which can be recognised in the fluctuation in the time window between 600 and 700 ps in Fig. 7.13. We also notice that the interaction of this residue with its environment results in two sets of ΔpK_Born values, which differ by about 2 pH units. Thus, we can conclude that factor 5 from Table 6.1, namely conformational flexibility, may induce changes of the desolvation energy corresponding to a pK shift of about 2 pH units on average. Let us return to the compensation of desolvation and charge-dipole interactions illustrated in Fig. 7.13. It is pronounced only for the acidic groups. The reason for this is the chemical nature of the polypeptide backbone, which determines the direction of the peptide dipoles. The electrostatic potential of the peptide dipoles is predominantly positive in the region where the side-chain atoms are situated (see Fig. 5.19). This effect can also be detected by calculating the charge-dipole interaction in fixed crystal structures of proteins. Figure 7.14 shows the values of the electrostatic potential created by the peptide dipoles at the positions of the atoms along the side chains, averaged over a common pool of 127 non-homologous proteins. The atoms most proximal to the polypeptide backbone, those at position β, are on average in the most positive electrostatic potential. With increasing separation from the backbone, the electrostatic potential decreases in magnitude but remains practically positive. Only the nitrogen atoms at the ζ and η positions, belonging to the lysine and arginine residues, are in a slightly negative electrostatic potential. It turns out that the carboxyl oxygen atoms of the aspartic and glutamic acids are predominantly in the positive potential of the peptide dipoles. This leads to a reduction of the pK values by about 2.3 and 1.1 pH units for the aspartic and glutamic acids, respectively.
These shifts of the ionisation equilibria are opposite to the shift caused by the desolvation energy and are the main source of the compensation effect. The effect of the peptide dipoles on the ε-amino and guanidino groups of the lysines and arginines is on average less than 0.1 pH units, which indicates that for these groups the compensation is negligible.
Conformational Flexibility
Figure 7.14 Average electrostatic potential of the peptide bond dipoles at the side chain atoms⁸ of the titratable residues (asp, glu, tyr, his, lys, arg): carbon atoms (squares), oxygen atoms (open circles), nitrogen atoms (solid circles).
Figure 7.14 reveals an interesting relation between the electrostatic potential of the peptide dipoles and the chemical nature of the titratable side chains. The basic amino acids, which are positively charged in their protonated forms, have longer side chains than the acidic amino acids. In this way the effects of desolvation and of the positive electrostatic potential of the peptide dipoles, which in this case act in the same direction, are minimised. Imagine proteins that, instead of lysines and arginines, contain amino acid residues with titratable sites at the δ and ε positions along the side chain. Such residues could have side chains with γ- and δ-amino groups. Compounds analogous to these imaginary side chains are n-propylamine and n-butylamine, which have pK values between 10.5 and 10.6. Taking into account the pK shifts estimated above for the peptide dipoles at the δ and ε positions (2.3 and 1.1 pH units) and assuming a moderate value of 2 for $\Delta pK_{Born}$, it turns out that the intrinsic pK values of these side chains would be 6.2 and 7.5. Hence, such proteins would be destabilised by uncompensated negative charge in the physiologically relevant pH region, which in most cases is around pH 7. It follows that the asymmetry in the charge-dipole interaction in proteins, which is
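The estimate for the imaginary side chains is simple arithmetic: the model-compound pK is lowered both by the assumed Born shift and by the peptide-dipole shift at the δ or ε position. The sketch below reproduces it and, as an added illustration, uses the Henderson-Hasselbalch equation to estimate how much of each group would still be protonated at pH 7. It assumes pK 10.5 for n-propylamine and 10.6 for n-butylamine (the text gives only the range 10.5 to 10.6); the function names are hypothetical.

```python
# Hypothetical helpers illustrating the pK estimate for side chains whose
# titratable amino group sits at the delta or epsilon position.

def intrinsic_pk(pk_model, dpk_born, dpk_dipole):
    """For a basic group, both the desolvation (Born) penalty and the positive
    potential of the peptide dipoles lower the pK, so the two shifts add up."""
    return pk_model - dpk_born - dpk_dipole

def fraction_protonated(pk, ph):
    """Henderson-Hasselbalch fraction of a basic group carrying a proton."""
    return 1.0 / (1.0 + 10 ** (ph - pk))

DPK_BORN = 2.0  # moderate Born shift assumed in the text

cases = {
    # name: (assumed model pK, peptide-dipole shift at that position)
    "gamma-amino (N at delta, cf. n-propylamine)": (10.5, 2.3),
    "delta-amino (N at epsilon, cf. n-butylamine)": (10.6, 1.1),
}

for name, (pk_model, dpk_dip) in cases.items():
    pk = intrinsic_pk(pk_model, DPK_BORN, dpk_dip)
    print(f"{name}: pK_int = {pk:.1f}, "
          f"protonated at pH 7: {fraction_protonated(pk, 7.0):.2f}")
```

The printed values, pK_int of 6.2 and 7.5 with roughly 14% and 76% of the groups protonated at pH 7, make concrete why such residues would leave much of the negative charge of the acidic groups uncompensated.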
inherent to the polypeptide backbone (due to the peptide bond), is compensated by the asymmetry in the chemical nature of the side chains (in terms of their length).
7.3.4 Dynamics of salt bridges
The observation of the protein molecule in motion reveals a number of interesting phenomena which otherwise remain hidden. For instance, in a single fixed conformation of the protein molecule, salt bridges are treated as stable configurations. This is correct for the salt bridge Asp119-Arg73: the bifurcated hydrogen bond connecting the functional groups of these residues acts as a stabilising factor (Fig. 7.12). As a consequence of this stabilisation, the molecular dynamics trajectory shows only a single fluctuation breaking the salt bridge (between 650 and 700 ps). Other salt bridges behave differently. In Fig. 7.15 the snapshot pK values of two residues forming a salt bridge are shown. The titratable groups involved in the salt bridge are the carboxyl group of Asp15 and the ε-amino group of Lys52 of Bacillus agaradhaerens xylanase. Due to the flexibility of the side chains of these residues, the salt bridge turns out to be unstable. The time periods within which the residues adopt conformations corresponding to a salt bridge are shown as filled bar segments in the figure. One can see that the salt bridge exists for approximately half of the simulation time. In accord with the formation and breaking of the salt bridge, the pK values of the two titratable groups shift significantly. Thus, one can identify two sets of pK values for each group in the pair. When involved in the salt bridge, the average pK value of Asp15 is about 2, whereas that of Lys52 is about 14. Upon breaking of the salt bridge, the pK values of Asp15 and Lys52 shift to average values of about 4.5 and 10.5, respectively. These values are close to the model pK values of the groups, suggesting that the other factors influencing the ionisation equilibria are compensated or small.
In Fig. 7.15B the snapshot values of $\Delta pK_{Born}$ for Asp15 and Lys52 are shown. We notice the same pattern as in Fig. 7.15A. The Born energy (the desolvation penalty) shifts the pK value of Asp15 by about 5 pH units when this residue forms a salt bridge. The value of $\Delta pK_{Born}$ decreases to about 3 upon breaking of the salt bridge. Hence, participation in a salt bridge "costs" this residue a shift of its ionisation equilibrium of about 2 pH units (or about 2.5 kcal/mol). Similarly, upon formation of the salt bridge the Born energy of Lys52 changes by about 1.4 kcal/mol. Hence, the formation of a salt bridge is accompanied by an increase of the desolvation penalty, which is a destabilising factor.
Figure 7.15 (A) Snapshot pK values of Lys52 (squares) and Asp15 (circles) from Bacillus agaradhaerens xylanase. (B) Snapshot $\Delta pK_{Born}$ values of the same residues. In both panels the horizontal axis is time, ps (0 to 1000). The time intervals in which the two residues form a salt bridge are marked with filled segments in the horizontal bars.
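The discussion of Asp15 and Lys52 switches between pK shifts and free energies. The two are related by $\Delta G = \ln(10)\,RT\,\Delta pK \approx 2.3RT\,\Delta pK$. A minimal conversion sketch, assuming T = 298.15 K (the simulation temperature is not restated here) and R in kcal/(mol K); the function names are illustrative only.

```python
import math

R_KCAL = 1.98720e-3  # gas constant, kcal/(mol K)

def dpk_to_kcal(dpk, temperature=298.15):
    """Free-energy equivalent of a pK shift: dG = ln(10) * R * T * dpK."""
    return math.log(10) * R_KCAL * temperature * dpk

def kcal_to_dpk(dg, temperature=298.15):
    """Inverse conversion: pK shift equivalent of a free-energy change."""
    return dg / (math.log(10) * R_KCAL * temperature)

print(f"1 pH unit    = {dpk_to_kcal(1.0):.2f} kcal/mol")
print(f"2 pH units   = {dpk_to_kcal(2.0):.2f} kcal/mol")
print(f"1.4 kcal/mol = {kcal_to_dpk(1.4):.2f} pH units")
```

At 25 °C one pH unit corresponds to about 1.36 kcal/mol, so the 2 pH unit penalty for Asp15 is roughly 2.7 kcal/mol, close to the rounded figure above, and the 1.4 kcal/mol change of the Born energy of Lys52 corresponds to about one pH unit.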
In the example illustrated in Fig. 7.15, the factors influencing the ionisation equilibria are relatively clearly separated, which helps to assess their dependence on the conformational flexibility. On this basis we can conclude that salt bridges can be dynamic formations. Such dynamic salt bridges contribute less to the stability of the three-dimensional structure of native proteins than they would if they were unbreakable links. This behaviour of salt bridges is sometimes overlooked, which leads to an overestimation of their stabilising role. Finally, we have to note that the above examples are not given to demonstrate that the problem of describing protein conformational flexibility as a factor influencing the ionisation equilibria in proteins has been solved. Rather, they should be considered as indicators of the importance of this factor. Being flexible, the side chains in the crystalline state of the protein can adopt conformations which are not populated when the protein is in solution. This can occur, for instance, in the regions of crystal contacts. Hence, the calculated ionisation properties of such sites do not correspond to those measured for proteins in solution. Although molecular dynamics simulations or other methods, such as the Monte Carlo method, can account for these effects, incorporating conformational flexibility into an integrated theoretical method remains, in general, an unsolved problem.
References
1. Alexov E and Gunner MR, (1997) Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophys. J., 72: 2075-2093.
2. Giletto A and Pace CN, (1999) Buried, charged, non-ion-paired aspartic acid 76 contributes favorably to the conformational stability of ribonuclease T1. Biochemistry, 38: 13379-13384.
3. Spitzner N, Frank L, Pfeiffer S, Koumanov A, Karshikoff A and Rüterjans H, (2001) Ionization properties of titratable groups in ribonuclease T1. I. pKa values in the native state determined by two-dimensional heteronuclear NMR spectroscopy. Eur. Biophys. J., 30: 186-197.
4. Winberg JO, Brendskag MK, Sylte I, Lindstad RI and McKinley-McKee JS, (1999) The catalytic triad in Drosophila alcohol dehydrogenase: pH, temperature and molecular modelling studies. J. Mol. Biol., 294: 601-616.
5. March KL, Maskalick DG, England RD, Friend SH and Gurd FRN, (1982) Analysis of electrostatic interactions and their relationship to conformation and stability of bovine pancreatic trypsin inhibitor. Biochemistry, 21: 5241-5251.
6. Yang A-S, Gunner MR, Sampogna R, Sharp K and Honig B, (1993) On the calculation of pKas in proteins. Proteins, 15: 252-265.
7. Koumanov A, Karshikoff A, Friis EP and Borchert TV, (2001) Conformational averaging in pK calculations. Improvement and limitations in prediction of ionization properties of proteins. J. Phys. Chem. B, 105: 9339-9344.
8. Spassov VZ, Ladenstein R and Karshikoff A, (1997) Optimization of the electrostatic interactions between ionized groups and peptide dipoles in proteins. Protein Sci., 6: 1190-1195.
Chapter 8
Electrostatic Interactions and Stability of Proteins
The understanding of the role of the different types of non-covalent interactions in the stabilisation of the native protein structure is one of the fundamental questions of the physical chemistry of proteins. We have already briefly discussed the contribution of hydrogen bonding and hydrophobic interactions to protein stability in Chapters 3 and 4. In this chapter we shall try to analyse the role of electrostatic interactions. For this purpose we will address two problems. The first is the pH dependence of protein stability, whilst the second concerns a special class of proteins: the hyperthermostable proteins.
8.1 Definitions
In the various research areas of protein science, different terminology is used when protein stability is considered. Since in some cases this may lead to misunderstandings, it is useful first to specify some terms which can cause ambiguity. Usually, the folded state is identified with the native state of a protein, whereas the unfolded state corresponds to the denatured state. However, there are proteins whose biological function is coupled with the folding/unfolding transition. From a functional point of view, both the folded and the unfolded states of such proteins can be treated as native. Other proteins, in their biologically active forms, have unstructured segments of the polypeptide chain; hence they are partially unfolded. Therefore, in this chapter we will avoid the terms "native" and "denatured" proteins. Instead, we will use the terms "folded" protein, meaning a state of the protein in which the three-dimensional structure is (or can be) defined, and "unfolded" protein, referring to proteins whose polypeptide chain adopts arbitrary, yet stereochemically allowed, conformations. The stability of proteins under certain conditions is defined as the difference between the free energies of their folded and unfolded states. By convention, the reference state is taken to be the folded protein. Thus, the stability of proteins is measured by the free energy of unfolding

$$\Delta G_{unf}(\mathrm{pH},T,\mu_i) = G_U(\mathrm{pH},T,\mu_i) - G_F(\mathrm{pH},T,\mu_i), \tag{8.1}$$
where $G_U$ and $G_F$ are the free energies of the unfolded and folded states, respectively. The above equation shows the dependence of protein stability on the most important factors: pH, the temperature T, and the concentration of the co-solvents, $\mu_i$. The quantity $\Delta G_{unf}$ refers to the thermodynamic stability of proteins. Often just the term stability is used to denote $\Delta G_{unf}$ or some of its components. Throughout the previous chapters we have also used the term stability in the context of structural stability, or when specifying the contribution of different factors to the stabilisation of the native (folded) structure. In all these cases we were discussing thermodynamic stability. It should be noted that, according to Eq. (8.1), $\Delta G_{unf} > 0$ when the folded state is stable, a peculiarity of the terminology which follows from the choice of the reference state. In this chapter we adhere to the definition given by Eq. (8.1). In Fig. 8.1 the temperature dependencies of $\Delta G_{unf}$ of two different proteins are shown as examples. The temperature at which $\Delta G_{unf} = 0$ is called the unfolding or melting temperature. The melting temperature, $T_m$, characterises the thermal stability of proteins: the higher the melting temperature, the higher the thermal stability of the protein. According to the $\Delta G_{unf}$ curves shown in Fig. 8.1, protein b is more thermostable than protein a. However, high thermal stability does not necessarily mean high overall thermodynamic stability. As seen from the figure, at temperatures below 40 °C the less thermostable protein a is considerably more stable thermodynamically than protein b. It should also be noted that $\Delta G_{unf}$ and $T_m$ are defined only if the folded and the unfolded states are in equilibrium.
Figure 8.1 Examples of the temperature dependence of $\Delta G_{unf}$ of two proteins (a and b) with different temperatures of unfolding, $T_m$ (vertical axis: $\Delta G_{unf}$; horizontal axis: temperature, °C, from 20 to 100).
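The text does not give the functional form of the curves in Fig. 8.1. One standard way to generate such stability curves is the Gibbs-Helmholtz expression $\Delta G_{unf}(T) = \Delta H_m(1 - T/T_m) + \Delta C_p[T - T_m - T\ln(T/T_m)]$, which is used here only as an illustration. The parameters for proteins a and b below are hypothetical, chosen so that protein a has the lower $T_m$ but the larger $\Delta G_{unf}$ at 25 °C; they are not read off the figure.

```python
import math

def dg_unfold(t_c, tm_c, dh_m, dcp):
    """Gibbs-Helmholtz stability curve, returns dG_unf in kcal/mol.

    t_c, tm_c : temperature and melting temperature, deg C
    dh_m      : unfolding enthalpy at Tm, kcal/mol
    dcp       : heat-capacity change of unfolding, kcal/(mol K)
    """
    t, tm = t_c + 273.15, tm_c + 273.15
    return dh_m * (1 - t / tm) + dcp * (t - tm - t * math.log(t / tm))

# Hypothetical parameters (not taken from Fig. 8.1)
protein_a = dict(tm_c=55.0, dh_m=100.0, dcp=1.5)   # less thermostable
protein_b = dict(tm_c=85.0, dh_m=60.0, dcp=1.0)    # more thermostable

for t in (25, 60):
    print(f"{t} C: a = {dg_unfold(t, **protein_a):.2f}, "
          f"b = {dg_unfold(t, **protein_b):.2f} kcal/mol")
```

With these numbers protein a is the more stable one at 25 °C, yet at 60 °C it is already unfolded ($\Delta G_{unf} < 0$) while protein b is still folded, which is the qualitative behaviour described for Fig. 8.1.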
8.2 Unfolding Induced by pH
The pH dependence of protein stability is expressed by the difference between the free energy of the unfolded and folded protein states:

$$\Delta G(\mathrm{pH}) = G_U(\mathrm{pH}) - G_F(\mathrm{pH}). \tag{8.2}$$
Written like that, the above equation gives the free energy of unfolding, $\Delta G(\mathrm{pH})$, and corresponds to Eq. (8.1) as a function of pH with all other parameters, such as T and $\mu_i$, kept constant. Our goal is to find an expression for $\Delta G(\mathrm{pH})$ in terms of measurable quantities, which would allow us to predict its value. The free energy of a system in a given state is given by the expression (see Appendix A) $G = -RT\ln Z$, where Z is the partition function of the system:

$$Z = \sum_{\mathbf{x}} e^{-E(\mathbf{x},\mathrm{pH})/RT}. \tag{8.3}$$
We assume that only electrostatic interactions change with pH. Hence, $E(\mathbf{x},\mathrm{pH})$ in the exponent is the energy of the system in a given microscopic protonation state $\mathbf{x}$ of the molecule at a given pH, defined by Eq. (6.20) or Eq. (7.7). In the following we will work with Eq. (6.20):

$$E(\mathbf{x},\mathrm{pH}) = 2.3RT\sum_i x_i\left(\mathrm{p}K_{int,i} - \mathrm{pH}\right) + \frac{1}{2}\sum_{i\neq j} W_{ij}\,x_i x_j. \tag{8.4}$$
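Using Eq. (7.7) leads to the same results. The pH dependence of the free energy G can be examined by taking the derivative

$$\frac{\partial G}{\partial \mathrm{pH}} = -\frac{RT}{Z}\,\frac{\partial Z}{\partial \mathrm{pH}},$$

where

$$\frac{\partial Z}{\partial \mathrm{pH}} = \frac{\partial}{\partial \mathrm{pH}}\sum_{\mathbf{x}} e^{-E(\mathbf{x},\mathrm{pH})/RT} = -\frac{1}{RT}\sum_{\mathbf{x}} e^{-E(\mathbf{x},\mathrm{pH})/RT}\,\frac{\partial E(\mathbf{x},\mathrm{pH})}{\partial \mathrm{pH}}.$$

Thus, for the derivative of G we obtain

$$\frac{\partial G}{\partial \mathrm{pH}} = \frac{1}{Z}\sum_{\mathbf{x}} e^{-E(\mathbf{x},\mathrm{pH})/RT}\,\frac{\partial E(\mathbf{x},\mathrm{pH})}{\partial \mathrm{pH}}. \tag{8.5}$$

For a single microscopic state $\mathbf{x}$ (see Eq. 8.4)

$$\frac{\partial E(\mathbf{x},\mathrm{pH})}{\partial \mathrm{pH}} = -2.3RT\sum_{i=1}^{N} x_i. \tag{8.6}$$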
We have chosen to work with the dissociation constant, so that the variables $x_i$ reflect the degree of deprotonation. The sum on the right-hand side of Eq. (8.6) is then the number of protons released at the given pH (and, of course, for the given microscopic protonation state $\mathbf{x}$ of the molecule). This can be expressed as

$$\sum_{i=1}^{N} x_i = N^{d}_{acid} + N^{d}_{base},$$

where $N^{d}_{acid}$ and $N^{d}_{base}$ are the numbers of deprotonated acidic and basic groups, respectively. On the other hand, the net charge of the protein molecule in protonation state $\mathbf{x}$ is

$$Q(\mathbf{x}) = -N^{d}_{acid} + N^{p}_{base},$$

where $N^{p}_{base}$ is the number of protonated basic groups. Combining the last two equations, we obtain $N^{d}_{acid} = N^{p}_{base} - Q(\mathbf{x})$ and

$$\sum_{i=1}^{N} x_i = N^{p}_{base} + N^{d}_{base} - Q(\mathbf{x}) = N_{base} - Q(\mathbf{x}),$$

where $N_{base} = N^{p}_{base} + N^{d}_{base}$ is the total number of basic groups. The above result can be substituted in Eq. (8.6), so that we can write

$$\frac{\partial E(\mathbf{x},\mathrm{pH})}{\partial \mathrm{pH}} = 2.3RT\,Q(\mathbf{x}) - 2.3RT\,N_{base}.$$
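Thus, for the derivative of the free energy given in Eq. (8.5) we obtain

$$\frac{\partial G}{\partial \mathrm{pH}} = 2.3RT\,\frac{\sum_{\mathbf{x}} Q(\mathbf{x})\,e^{-E(\mathbf{x},\mathrm{pH})/RT}}{Z} - 2.3RT\,N_{base}\,\frac{\sum_{\mathbf{x}} e^{-E(\mathbf{x},\mathrm{pH})/RT}}{Z}. \tag{8.7}$$

The first term on the right-hand side of the above equation contains the Boltzmann-weighted average of the net charge of the protein molecule at the given pH:

$$Q(\mathrm{pH}) = \langle Q \rangle = \frac{\sum_{\mathbf{x}} Q(\mathbf{x})\,e^{-E(\mathbf{x},\mathrm{pH})/RT}}{\sum_{\mathbf{x}} e^{-E(\mathbf{x},\mathrm{pH})/RT}}.$$

Taking into account Eq. (8.3), the ratio in the second term of Eq. (8.7) is just unity. Hence,

$$\frac{\partial G}{\partial \mathrm{pH}} = 2.3RT\,Q(\mathrm{pH}) - 2.3RT\,N_{base}.$$

In order to obtain the free energy at a given pH, we integrate the above equation:

$$\int_{\mathrm{pH}_0}^{\mathrm{pH}} dG(\mathrm{pH}) = 2.3RT\int_{\mathrm{pH}_0}^{\mathrm{pH}} Q(\mathrm{pH})\,d\mathrm{pH} - 2.3RT\int_{\mathrm{pH}_0}^{\mathrm{pH}} N_{base}\,d\mathrm{pH},$$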
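where $\mathrm{pH}_0$ is an appropriate reference pH value. For the folded state of the protein we obtain

$$G_F(\mathrm{pH}) = 2.3RT\int_{\mathrm{pH}_0}^{\mathrm{pH}} Q_F(\mathrm{pH})\,d\mathrm{pH} - 2.3RT\,N_{base}\,(\mathrm{pH} - \mathrm{pH}_0),$$

and the same for the unfolded state

$$G_U(\mathrm{pH}) = 2.3RT\int_{\mathrm{pH}_0}^{\mathrm{pH}} Q_U(\mathrm{pH})\,d\mathrm{pH} - 2.3RT\,N_{base}\,(\mathrm{pH} - \mathrm{pH}_0).$$

Substituting the expressions obtained for $G_U(\mathrm{pH})$ and $G_F(\mathrm{pH})$ in Eq. (8.2), we obtain the pH dependence of the free energy of unfolding; the $N_{base}$ terms cancel, because the number of basic groups is the same in both states:

$$\Delta G(\mathrm{pH}) = 2.3RT\int_{\mathrm{pH}_0}^{\mathrm{pH}} \left[Q_U(\mathrm{pH}) - Q_F(\mathrm{pH})\right] d\mathrm{pH} + \Delta G_0. \tag{8.8}$$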
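The key step behind Eq. (8.8) is the identity $\partial G/\partial \mathrm{pH} = 2.3RT\,[Q(\mathrm{pH}) - N_{base}]$, which holds for any pH-independent interaction term. It can be checked numerically by brute-force enumeration of all $2^N$ protonation microstates of a small system. The sketch below assumes a hypothetical three-site molecule (made-up intrinsic pK values and interaction matrix W in kcal/mol, T = 298.15 K), uses an energy of the form of Eq. (8.4) with $x_i = 1$ for a deprotonated site, and compares a finite-difference derivative of $G = -RT\ln Z$ with the analytic expression. It also checks the counting identity $\sum_i x_i = N_{base} - Q(\mathbf{x})$ for every microstate.

```python
import itertools
import math

R, T = 1.98720e-3, 298.15          # kcal/(mol K), assumed temperature in K
LN10_RT = math.log(10) * R * T      # the "2.3RT" of the text

# Hypothetical toy molecule: one acid and two bases
pk_int = [4.0, 6.5, 10.5]
is_base = [False, True, True]
W = [[0.0, -0.8, -0.5],             # symmetric, pH-independent interactions
     [-0.8, 0.0, 0.3],
     [-0.5, 0.3, 0.0]]

def energy(x, ph):
    """E(x, pH) in the form of Eq. (8.4); x[i] = 1 means site i is deprotonated."""
    e = LN10_RT * sum(xi * (pk - ph) for xi, pk in zip(x, pk_int))
    n = len(x)
    e += 0.5 * sum(W[i][j] * x[i] * x[j]
                   for i in range(n) for j in range(n) if i != j)
    return e

def charge(x):
    """Net proton charge: deprotonated acids give -1, protonated bases +1."""
    return sum((1 - xi) if base else -xi for xi, base in zip(x, is_base))

def free_energy_and_charge(ph):
    """Return G = -RT ln Z and the Boltzmann-averaged net charge at this pH."""
    states = list(itertools.product((0, 1), repeat=len(pk_int)))
    weights = [math.exp(-energy(x, ph) / (R * T)) for x in states]
    z = sum(weights)
    q_avg = sum(w * charge(x) for w, x in zip(weights, states)) / z
    # counting identity used in the derivation: sum(x) = N_base - Q(x)
    assert all(sum(x) == sum(is_base) - charge(x) for x in states)
    return -R * T * math.log(z), q_avg

ph, h = 7.0, 1e-4
g_plus, _ = free_energy_and_charge(ph + h)
g_minus, _ = free_energy_and_charge(ph - h)
numeric = (g_plus - g_minus) / (2 * h)          # central difference
_, q = free_energy_and_charge(ph)
analytic = LN10_RT * (q - sum(is_base))
print(f"dG/dpH numeric = {numeric:.6f}, 2.3RT(Q - N_base) = {analytic:.6f}")
```

The two printed numbers agree to within the finite-difference error for any choice of W, which is why the interaction term drops out of the derivation.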
Equation (8.8) is the desired expression for the pH dependence of the free energy of protein unfolding. The quantities $Q_U$ and $Q_F$ are the net proton charges of the unfolded and folded states of the protein molecule, respectively. They can be measured directly, for instance by potentiometric titration. The term $\Delta G_0$ reflects the fact that only changes of the free energy of unfolding can be calculated and measured; hence $\Delta G(\mathrm{pH})$ is defined only up to an additive constant $\Delta G_0$. Let us consider a simple example of a protein molecule containing only two titratable groups: one acidic and one basic group. Assume that in the unfolded state the titratable groups have pK values of 4.4 and 10.4, respectively. The chemical nature of the titratable groups is of no interest for this particular example; nevertheless, one can think of a pair of a glutamic acid and a lysine residue with pK values equal to those of the corresponding model compounds (see Table 6.3). Thus, we have implicitly assumed that in the unfolded state the two groups do not interact. As we shall see in Section 8.3, this assumption is a rather rough approximation. In the folded state the two groups interact, which shifts their equilibrium constants. Let this shift be 0.5 pH units, i.e. let the pK values of the two groups be 3.9 and 10.9, respectively. The titration curves of the putative protein in the folded and unfolded states are shown in Fig. 8.2A. The two curves differ only in the pH regions where the titratable groups change their protonation state. Accordingly, only in these regions is the unfolding free energy expected to depend on pH. Before calculating $\Delta G(\mathrm{pH})$, we have to choose the reference value, $\mathrm{pH}_0$, as required by Eq. (8.8). The most convenient reference pH is one at which both states of the molecule are completely protonated (the leftmost flank of the titration curves) or completely deprotonated (the rightmost flank of the titration curves). Usually one chooses the former option. Let us therefore choose pH 0 as the reference pH. As seen from Fig. 8.2A, $Q_U(\mathrm{pH}_0) = Q_F(\mathrm{pH}_0)$, so that $\Delta G(\mathrm{pH}_0) = \Delta G_0$. We can set $\Delta G_0 = 0$, which means that we do not consider the pH-independent part of the unfolding free energy.
Figure 8.2 pH-Dependence of a putative protein with two titratable groups. (A) Titration curves of the folded protein (continuous line) and of the unfolded protein (dashed line). (B) pH-Dependence of the unfolding free energy calculated by Eq. (8.8). The dashed line results from a symmetric shift of the pK values, whereas in the case of the continuous line, the two groups have different pK shifts.
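The numbers quoted for this two-group example can be reproduced by integrating Eq. (8.8) numerically. The sketch below is one way to do it, not the author's program: it treats each state as independent Henderson-Hasselbalch sites with the pK values of the example (4.4/10.4 unfolded; 3.9/10.9 folded, or 4.0/10.9 when the acidic group is made less accessible), sets $\Delta G_0 = 0$, uses the trapezoidal rule, and assumes T = 298.15 K. The function names are illustrative.

```python
import math

R = 1.98720e-3                     # gas constant, kcal/(mol K)
T = 298.15                         # assumed temperature, K
LN10_RT = math.log(10) * R * T     # "2.3RT" in Eq. (8.8)

def net_charge(ph, acid_pks, base_pks):
    """Net proton charge of independent (Henderson-Hasselbalch) sites."""
    q = 0.0
    for pk in acid_pks:
        q -= 1.0 / (1.0 + 10 ** (pk - ph))   # deprotonated acid carries -1
    for pk in base_pks:
        q += 1.0 / (1.0 + 10 ** (ph - pk))   # protonated base carries +1
    return q

def delta_g(ph, ph0, unfolded, folded, n=20000):
    """Eq. (8.8) with dG0 = 0: trapezoidal integral of Q_U - Q_F from ph0 to ph."""
    h = (ph - ph0) / n             # negative when integrating down from ph0 = 14
    total = 0.0
    for k in range(n + 1):
        x = ph0 + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * (net_charge(x, *unfolded) - net_charge(x, *folded))
    return LN10_RT * total * h

unfolded = ([4.4], [10.4])
folded_sym = ([3.9], [10.9])       # equal and opposite shifts of 0.5
folded_asym = ([4.0], [10.9])      # acidic group shifted by only 0.4

print(f"symmetric,  pH0 = 0 : {delta_g(7.0, 0.0, unfolded, folded_sym):.3f} kcal/mol")
print(f"asymmetric, pH0 = 0 : {delta_g(7.0, 0.0, unfolded, folded_asym):.3f} kcal/mol")
print(f"asymmetric, pH0 = 14: {delta_g(7.0, 14.0, unfolded, folded_asym):.3f} kcal/mol")
```

With the assumed temperature this gives about 0.68 kcal/mol for the symmetric case, about 0.54 kcal/mol for the asymmetric case with pH 0 as reference, and again about 0.68 kcal/mol when pH 14 is the reference. The values agree with those discussed for Fig. 8.2B to within about 0.01 kcal/mol (the last digit depends on the temperature and on the rounding of 2.3RT), and the dependence on the reference pH is reproduced.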
The pH dependence calculated with the above input data is given by the dashed line in Fig. 8.2B. As seen, the acidic and the alkaline flanks of the curve asymptotically approach zero. If we are interested in the electrostatic term of the free energy of unfolding at pH 7, Eq. (8.8) gives a value of about 0.69 kcal/mol. The same result is obtained if we choose the reference pH corresponding to all titratable groups in the deprotonated state (i.e. the alkaline flank of the curve in Fig. 8.2A). Thus, one might conclude that Eq. (8.8) gives the absolute value of the electrostatic term of the free energy of unfolding. Although this is true for our example, in general such a conclusion is incorrect. What makes the example misleading is that the two groups have equal but opposite pK shifts. One can interpret this as an equal contribution of the desolvation and charge-dipole energies to the ionisation equilibria of the groups in the folded state of the protein. If so, the pK shift of each group is determined by the strength of the charge-charge interaction alone, which naturally leads to a symmetrical pH dependence of $\Delta G(\mathrm{pH})$. Let us now assume that the acidic group is somewhat less accessible to the solvent than the basic group. This can be simulated by increasing the pK value of the acidic group in the folded state. Let this increase be 0.1 pH units. $\Delta G(\mathrm{pH})$ calculated with pK 4.0 for the acidic group (and all other parameters kept as before) is shown in Fig. 8.2B as a continuous line. As expected, the stabilising contribution of electrostatic interactions is reduced: at pH 7 the electrostatic term of the free energy of unfolding is about 0.55 kcal/mol. However, if we decide to use pH 14 (all titratable groups deprotonated) as the reference pH, Eq. (8.8) again gives $\Delta G(\mathrm{pH}\ 7) = 0.69$ kcal/mol. Hence, the value of $\Delta G(\mathrm{pH})$ calculated by Eq. (8.8) depends on the reference pH. It is important to realise that the electrostatic term of the free energy of unfolding is defined relative to a reference pH value. In other words, Eq. (8.8) gives the pH dependence of the free energy of unfolding, rather than its absolute electrostatic term.
8.3 Modelling of Unfolded Proteins
The theoretical prediction of $\Delta G(\mathrm{pH})$ requires $Q_U(\mathrm{pH})$ and $Q_F(\mathrm{pH})$ to be known. We have already considered a method for predicting the ionisation equilibria in native (folded) proteins in Chapters 5 and 6.
Now we need a method that enables us to calculate the ionisation equilibria in unfolded proteins. Simple models of unfolded proteins have already been used in Chapters 3 and 4, where we analysed the contribution of hydrogen bonding and hydrophobic interactions to protein stability. In these cases just a single property of the unfolded state was used, namely that the amino acid side chains are completely hydrated. Such an approach was justified because we were interested in short-range interactions whose contributions to protein stability are assumed to be additive. Therefore, we needed to investigate only the environment of the individual amino acids in the unfolded state. Often, the same approach is employed for electrostatic interactions in the unfolded state. It is assumed that charge-charge interactions in the unfolded state are negligible, so that the ionisation equilibria can be determined from the pK values of the free amino acids or of the corresponding model compounds. Thus, using the pK values listed in Table 6.3, one can calculate $Q_U(\mathrm{pH})$ from the degree of deprotonation of the individual groups, using Eq. (6.6):

$$Q_U(\mathrm{pH}) = -\sum_{i}^{N_{acid}} \alpha_i(\mathrm{pH}) + \sum_{j}^{N_{base}} \left(1 - \alpha_j(\mathrm{pH})\right),$$

where the indexes i and j enumerate the acidic and the basic groups of the protein. This approximation is, however, not adequate because electrostatic interactions, in contrast to hydrogen bonding and van der Waals interactions, are long-range interactions. This has been demonstrated independently by Pace et al.¹ and Oliveberg et al.² The latter showed experimentally that in unfolded proteins the carboxyl groups have somewhat lower pK values than the corresponding model compounds. The origin of the lowering of the pK values of acidic groups was analysed at the beginning of Section 6.5, where we concluded that it arises from the electrostatic influence of the net positive charge of the protein at acidic pH. Hence, electrostatic interactions are present also in unfolded proteins. Moreover, it has been demonstrated experimentally that the pK values of the individual titratable sites in unfolded proteins can have distinguishable, discrete values.³ All these recent findings show that a more adequate model of unfolded proteins is needed, one that accounts for the non-zero electrostatic interactions in this state.
8.3.1 Spherical model of unfolded proteins
The polypeptide chain of an unfolded protein can be considered as a flexible chain that can adopt all stereochemically allowed conformations. The ensemble of these conformations also contains those which belong to the set of native conformations, as well as those in which the polypeptide chain is fully extended. The contribution of these two extreme sets of conformations to the properties of unfolded proteins is, of course, negligible. From polymer science we know that, on average, a flexible polymer chain adopts a spherical form. We assume that the polypeptide chain of unfolded proteins follows this rule. Hence, we will construct a spherical model of unfolded proteins.
Figure 8.3 Schematic representation of an unfolded protein. The polypeptide chain occupies a spherical region with a radius equal to the radius of gyration, $R_g$. Two charges at an instantaneous distance $r_{ij}$ are also schematically illustrated.
The unfolded protein is represented as a sphere of material surrounded by the solvent (Fig. 8.3). The material within the sphere consists of the polypeptide chain and solvent (water) molecules; hence it has a lower dielectric constant than the surrounding solvent. Thus, this is a dielectric cavity model in which the region of low dielectric constant has a simple analytical shape. Evaluating the permittivity of the dielectric sphere is as difficult as for native (folded) proteins. Perhaps the most appropriate value is the relative dielectric constant of aqueous solutions of alcohols or other organic compounds, which is on average $\varepsilon_r \approx 40$. As we have pointed out, the unfolded state should be considered to be in equilibrium. Because of the desolvation energy, the charged titratable groups will tend to be maximally accessible to the solvent. It follows that, in terms of our model, these charges will be distributed on the surface of the sphere. In principle, we now have all the tools needed to calculate the electrostatic interactions and obtain the ionisation constants of the titratable groups. Before proceeding with the investigation of the ionisation equilibria, it is interesting to explore how the two main parameters of the model, the radius of the sphere and the average distance between the charges, depend on electrostatic interactions.
8.3.2 Size of the dielectric sphere
It is reasonable to assume that the radius of the sphere is equal to the radius of gyration, $R_g$, of a flexible polymer chain. For a given (instant) conformation of the polymer the radius of gyration is given by
$$R_{g,(inst)} = \left(\frac{\sum_i m_i r_i^2}{\sum_i m_i}\right)^{1/2}, \tag{8.9}$$
where the index i runs over the atoms, each of mass $m_i$, and $r_i$ is the distance of atom i from the centre of mass of the polymer. For a flexible polymer chain the radius of gyration is an average of $R_{g,(inst)}$ over all conformations:

$$R_g = \left\langle R^2_{g,(inst)} \right\rangle^{1/2}, \qquad \left\langle R^2_{g,(inst)} \right\rangle = \frac{1}{M}\sum_{k=1}^{M} \frac{\sum_i m_i r_{i,k}^2}{\sum_i m_i},$$

where $r_{i,k}$ is the distance of atom i from the centre of mass in conformation k, and M is the number of all possible conformations of the chain. Of course, carrying out the sum over M is an unrealistic task. Different approaches can be used to evaluate $R_g$ while avoiding this summation. For instance, it can be shown that for an excluded volume chain, which is the case we are interested in, the radius of gyration is

$$R_g \sim N_r^{\alpha}\, b, \tag{8.10}$$
where $N_r$ is the number of bonds between the polymer segments and b is the bond length. The exponent α is approximately 0.588. The above relation connects the radius of gyration with the length of a polymer in which the intramolecular interactions are limited to avoiding overlap of the chain (hence the term excluded volume chain). Electrostatic interactions, whose influence on the radius of gyration we would like to investigate, are, however, not taken into account. Therefore, we have to establish a connection between the radius of gyration and electrostatic interactions. We would also like to take into account the fact that our polymer is branched. To achieve this goal we shall employ the Monte Carlo method for a polypeptide chain which contains charged side chains. We will keep the side chains in extended conformation to reduce the complexity of the calculations. A Monte Carlo step begins with building a polypeptide chain of a given length and a sterically allowed conformation by a random choice of the dihedral angles $\phi_i$ and $\psi_i$ of each residue (see Fig. 1.6). For the obtained instant conformation the interaction energy
$$E = \sum_{i<j} E^{vdW}_{ij} + \sum_{i<j} E^{el}_{ij} \tag{8.11}$$
is calculated. If we use the corresponding full expressions, Eq. (2.26) for the van der Waals interaction energy $E^{vdW}$ and Eq. (6.22) for the electrostatic interaction energy $E^{el}$, the calculations become prohibitively time consuming. Therefore, we shall make some essential simplifications. The van der Waals interactions can be approximated by the so-called hard sphere potential, which is
$$E^{vdW}_{ij} = \begin{cases} 0, & \text{if } r_{ij} > R_i + R_j,\\ \infty, & \text{if } r_{ij} \le R_i + R_j, \end{cases}$$
where $R_i$ and $R_j$ are the van der Waals radii of atoms i and j. With this simplification we merely prevent the atoms from overlapping. The electrostatic interaction energy is calculated using Coulomb's law

$$E^{el}_{ij} = \frac{q_i q_j}{4\pi\varepsilon_0\varepsilon_r r_{ij}},$$
where $q_i$ and $q_j$ are the charges of atoms i and j. Here we use the approximation of a distance-dependent relative dielectric constant given by Eq. (5.22a). The Monte Carlo simulation continues with the variation of a randomly chosen pair of dihedral angles, $\phi_i$ and $\psi_i$. The energy of the new conformation is calculated and compared with that of the previous conformation. This is the Metropolis procedure already considered in Section 6.4.2. The varied conformation is accepted with the probability

$$P(k^*) = \begin{cases} 1, & \text{if } E(k^*) \le E(k^*-1),\\ e^{-[E(k^*) - E(k^*-1)]/RT}, & \text{if } E(k^*) > E(k^*-1), \end{cases}$$

where the parameter $k^*$ is the consecutive number of the conformation variation step. A conformation is selected after all pairs $(\phi_i, \psi_i)$ have been varied. This sampling procedure is repeated L times; however, the first $L_0$ selected conformations are discarded. In this way, as in Section 6.4.2, the less statistically relevant conformations are ignored. The remaining selected conformations are used to calculate the radius of gyration, $R_{g,(inst)}(k)$, by means of Eq. (8.9). The radius of gyration of a flexible polypeptide chain is then calculated as
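$$R_g = \left[\frac{1}{M(L-L_0)}\sum_{k} R^2_{g,(inst)}(k)\right]^{1/2}, \tag{8.12}$$

where the sum runs over the $M(L-L_0)$ retained conformations and M is the number of Monte Carlo steps.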
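Two ingredients of this procedure, the instantaneous radius of gyration of Eq. (8.9) and the Metropolis acceptance rule, are easy to state in code. The sketch below is not the program used for the calculations in the text: it assumes Cartesian coordinates in Å, masses in arbitrary units and energies in kcal/mol, and it assumes a root-mean-square average over the retained conformations. Building the chain from dihedral angles and evaluating the energy of Eq. (8.11) are omitted, and all names are hypothetical.

```python
import math
import random

R_KCAL = 1.98720e-3  # gas constant, kcal/(mol K)

def radius_of_gyration(coords, masses):
    """Eq. (8.9): mass-weighted RMS distance of the atoms from the centre of mass."""
    m_tot = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / m_tot for k in range(3)]
    s = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
            for m, c in zip(masses, coords))
    return math.sqrt(s / m_tot)

def metropolis_accept(e_new, e_old, temperature=298.15, rng=random):
    """Accept a trial conformation with probability min(1, exp(-(E_new - E_old)/RT))."""
    if e_new <= e_old:
        return True                # downhill (or equal) moves are always accepted
    if math.isinf(e_new):
        return False               # hard-sphere overlap: infinite energy, always rejected
    return rng.random() < math.exp(-(e_new - e_old) / (R_KCAL * temperature))

def average_rg(rg_values, n_discard):
    """RMS average of R_g after discarding the first n_discard (L0) conformations."""
    kept = rg_values[n_discard:]
    return math.sqrt(sum(r * r for r in kept) / len(kept))

print(radius_of_gyration([(0, 0, 0), (1, 0, 0)], [1.0, 1.0]))  # two equal masses 1 A apart
print(metropolis_accept(-1.0, 0.0))
print(metropolis_accept(math.inf, 0.0))
```

The three calls print 0.5, True and False: the two atoms each lie 0.5 Å from their common centre of mass, a downhill move is accepted, and a move creating an atomic overlap is rejected. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.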
Let us first investigate how the radius of gyration calculated by Eq. (8.12) depends on the length of the chain and on its amino acid side chain content. For this purpose we set $E^{el} = 0$ in Eq. (8.11). In this case the conformations are collected by Monte Carlo steps only, i.e. L = 1 and $L_0 = 0$ in Eq. (8.12). If the polypeptide chain is homogeneous, the calculated radius of gyration, $R_g^{\circ}(N_r)$, is expected to follow relation (8.10). Indeed, as illustrated in Fig. 8.4, the radius of gyration follows a power law in the chain length: plotting lg $R_g^{\circ}$ against lg $N_r$ gives straight lines with a slope α = 0.66. One can also see that the radius of gyration increases slightly with the length of the side chains.
Figure 8.4 lg $R_g^{\circ}$ versus lg $N_r$ (horizontal axis from about 1.0 to 1.8). The symbols, as indicated in the legend of the panel, correspond to aliphatic side chains of different lengths taken into account in the calculations.
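The exponent α in Eq. (8.10) is read off as the slope of lg $R_g^{\circ}$ against lg $N_r$. A minimal least-squares sketch of that step follows; the data in the usage lines are synthetic, generated from an exact power law with α = 0.66 and a hypothetical b = 2.0 Å, and are not the values plotted in Fig. 8.4.

```python
import math

def scaling_exponent(n_bonds, rg):
    """Least-squares slope of lg(Rg) versus lg(N_r), i.e. alpha in Rg ~ N_r^alpha * b."""
    xs = [math.log10(n) for n in n_bonds]
    ys = [math.log10(r) for r in rg]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Synthetic example (not data from Fig. 8.4): exact power law, hypothetical b = 2.0 A
n_r = [10, 20, 30, 40, 60]
rg = [2.0 * n ** 0.66 for n in n_r]
print(round(scaling_exponent(n_r, rg), 3))
```

For exact power-law data the fitted slope is the exponent itself (0.66 here); with simulated radii of gyration the scatter of the points about the line indicates how well relation (8.10) holds.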
Now we can continue with the analysis of the influence of electrostatic interactions on the magnitude of Rg. Repeating the above procedure with non-zero electrostatic interactions we obtain an essential reduction of the radius of gyration. Thus, for instance, for a charged polypeptide chain with length of 80 peptide bonds and 20 charged side chains (10 positive and 10 negative) the Monte Carlo simulation gives a radius of gyration, Rg~ 17 A. For comparison, the radius of gyration of a uncharged chain with the same length is Rg° ~ 28 A. This result is worth commenting. First, it turns out that electrostatic interactions in unfolded state tend to reduce the size of the molecule. This effect is parallel to the
hydrophobic effect, which forces the protein chain to collapse and to form a hydrophobic core. It should be noted that the energy function used in our Monte Carlo simulation does not include the desolvation penalty; hence the influence of electrostatic interactions may be overestimated. We can assess this overestimation by comparing the calculated and the experimentally observed radii of gyration of unfolded proteins. Table 8.1 gives such a comparison for two proteins: cytochrome c and hen egg lysozyme. Cytochrome c is a small but highly charged protein: among its 104 amino acid residues, 21 are basic and 12 are acidic. At pH 3 the net proton charge of the unfolded form of this protein is +10. For this charge the spherical model gives a radius of gyration $R_g = 20$ Å, which is fairly close to the experimental value. As the pH decreases, the positive charge of the molecule increases, which leads to stronger electrostatic repulsion and consequently to a larger radius of gyration. As seen from the table, the experimentally measured and the predicted radii of gyration increase to 30.1 Å and 33 Å, respectively. A similar correlation between the charge content, or pH, and the radius of gyration is observed for the second protein, hen egg lysozyme. The satisfactory agreement between the theoretical and experimental results suggests that omitting the desolvation energy does not substantially distort the predicted effect of the charge-charge interactions on the size of unfolded proteins. Hence, our conclusion that electrostatic interactions tend to reduce the size of unfolded proteins holds.

Table 8.1 Experimental and predicted values of $R_g$ (in Å) for two proteins unfolded under different conditions.

Protein        Conditions          $R_g$, spherical model    $R_g$, experimental
Cytochrome c   pH 3                20                        18.1
               pH < 2              33                        30.1
               pH 7, 4 M GdmCl     33-38 ($R_g^0$)           31.0
Lysozyme       pH 5                23                        23.5
               pH 2.0              39                        37.9
               pH 2, 4 M GdmCl     39-42 ($R_g^0$)           35.8
Independent evidence that electrostatic interactions tend to reduce the size of unfolded proteins is provided by chemical denaturation with guanidinium chloride. This denaturant is known to screen charge-charge interactions efficiently. Accordingly, the radii of gyration of the chemically unfolded proteins are larger than those at pH 3 (cytochrome c) and pH 5 (lysozyme), approaching the calculated $R_g^0$, as indicated in Table 8.1. The radii of gyration obtained at pH 2 or pH < 2 are comparable to, or (for lysozyme) even larger than, the values for the chemically unfolded proteins, which approximate $R_g^0$. This is due to the electrostatic repulsion between the uncompensated positive charges (the carboxyl groups are protonated, and hence neutral, at these pH values). Summarising the above results, we can put forward the hypothesis that at pH values close to the isoelectric point of a protein, which lies within the pH range where proteins spontaneously fold into their native structures, electrostatic interactions act in parallel with the hydrophobic effect, i.e. they contribute to the formation of the compact protein molecule.

8.3.3 Average distance between charges

The determination of the distances between the titratable groups in unfolded proteins is a complex task. A direct application of results from the statistics of linear polymers may be misleading, because proteins are branched polymers with heterogeneous side chains. However, we have seen that the radius of gyration of unfolded proteins obeys Eq. (8.10), which is in fact an equation derived for linear polymers. We can presume that the average distance between two titratable groups in an unfolded protein, being a geometrical quantity, also obeys the relations derived for linear polymers. Let us explore this presumption. A possible way to express the distance between two charges in an unfolded polypeptide chain is to adopt the distribution function of the end-to-end distance of a flexible polymer.
This approach was chosen by Zhou⁴, who proposed an analytical expression for the probability of finding two charges at a distance between $r$ and $r + dr$ ($r = r_{ij}$ in Fig. 8.3):
$$p(r)\,dr = 4\pi r^2 \left(\frac{3}{2\pi d^2}\right)^{3/2} \exp\left(-\frac{3r^2}{2d^2}\right) dr \qquad (8.13)$$
A comprehensive derivation of the original expression is given in Tanford's book "Physical Chemistry of Macromolecules"⁵. Instead of the end-to-end distance, Zhou takes $r$ to be the distance between two charged groups. In this context, the parameter $d$ is the root-mean-square distance between the groups if they were at the ends of the chain. For a flexible chain, $d$ is related to the separation of the polymer elements by $d = b l^{1/2}$, where $b$ is the bond length and $l$ is the number of bonds separating the individual polymer elements. The fact that the titratable groups reside on the side chains can be taken into account by introducing a correcting parameter: $d = b l^{1/2} + s$. The bond length, $b$, and the correcting parameter, $s$, are adjustable parameters. Alternatively, we can employ the Monte Carlo method already used to evaluate the radius of gyration. Because Eq. (8.13) is for a homogeneous chain, we perform Monte Carlo simulations for various homogeneous polypeptides, each with a different type of amino acid side chain. Here we let the side chains adopt different stereochemically allowed conformations. Fig. 8.5 gives examples for two values of $l$ for poly-glutamic acid and poly-lysine. The comparison of the probability distributions obtained by the two approaches shows some differences. The distributions obtained by Eq. (8.13), which are in fact Gaussian distributions of the charge-charge distances, are broader and allow unrealistic distances, for example $r < 3$ Å, at which the atoms would overlap. Eq. (8.13) also allows distances larger than two charges can actually reach. For instance, the maximum distance between the ε-amino groups of two lysine residues adjacent in the sequence is less than 16 Å. As seen from the upper panel of Fig. 8.5, according to Eq. (8.13) distances beyond this physical limit have a substantial probability.
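Eq. (8.13) with $d = b l^{1/2} + s$ is easy to evaluate directly. The Python sketch below (function names hypothetical) uses the parameters quoted for Fig. 8.5, $b = 7.5$ Å and $s = 5$ Å. Setting $dp/dr = 0$ gives the most probable distance $r^* = d\sqrt{2/3}$; for $l = 4$ this is about 16.3 Å, close to the maximum quoted for Eq. (8.13) in the text. Eq. (8.13) has the form of a Maxwell distribution with scale parameter $d/\sqrt{3}$, so its tail probability has a closed form; the sketch uses it to estimate how much probability Eq. (8.13) assigns to distances beyond 16 Å, and below 3 Å, for $l = 1$.

```python
import math


def zhou_d(l, b=7.5, s=5.0):
    """RMS distance parameter d = b*l**0.5 + s (in Å); default b and s as in Fig. 8.5."""
    return b * math.sqrt(l) + s


def gaussian_chain_pdf(r, d):
    """Probability density p(r) of Eq. (8.13)."""
    return (4.0 * math.pi * r**2 * (3.0 / (2.0 * math.pi * d**2)) ** 1.5
            * math.exp(-3.0 * r**2 / (2.0 * d**2)))


def most_probable_distance(d):
    """Maximum of p(r): dp/dr = 0 gives r* = d*sqrt(2/3)."""
    return d * math.sqrt(2.0 / 3.0)


def probability_beyond(r_cut, d):
    """P(r > r_cut) from the closed-form cumulative distribution of Eq. (8.13),
    which is a Maxwell distribution with scale parameter d/sqrt(3)."""
    x = r_cut * math.sqrt(3.0) / d
    cdf = math.erf(x / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * x * math.exp(-x * x / 2.0)
    return 1.0 - cdf


print(round(most_probable_distance(zhou_d(4)), 2))  # 16.33 (l = 4)
print(probability_beyond(16.0, zhou_d(1)))          # probability of r > 16 Å for l = 1
print(1.0 - probability_beyond(3.0, zhou_d(1)))     # probability of r < 3 Å for l = 1
```

The tail beyond 16 Å comes out at a sizeable fraction of the total probability, which is the deficiency of the Gaussian model noted above; the most probable distance, however, is reproduced reasonably well.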
On the other hand, the maxima of the distributions calculated by Monte Carlo simulation for the different types of polypeptides are close to each other and very close to those predicted by Eq. (8.13). This is clearly seen in both panels of Fig. 8.5. The distributions calculated by Monte Carlo simulation for $l = 4$, shown in the lower panel of Fig. 8.5, have peaks at 13 Å and 15 Å for poly-glutamic acid and poly-lysine, respectively, whereas Eq. (8.13) gives a maximum at 16.5 Å. Monte Carlo simulations also show that heterogeneous polypeptide chains behave similarly. Hence, if we are interested in the most probable distances between the charges, either of the two approaches can be applied.
Figure 8.5 Probability distribution of the distance between the charge sites of the titratable groups in poly-glutamic acid and poly-lysine. The distributions calculated by Monte Carlo simulation⁶ (labelled accordingly) are compared with the distributions calculated by Eq. (8.13) with parameters $b = 7.5$ Å and $s = 5$ Å⁴.
8.3.4 Ionisation equilibria in unfolded proteins

As we mentioned at the beginning of this chapter, the tools for calculating ionisation equilibria in the unfolded state are already available. The degree of deprotonation can be calculated by Eq. (6.22), with the energy function defined by Eq. (6.20). We can make the simplifying assumption that all titratable sites have the same environment. This means that all titratable groups of a given type are characterised by the same shifts of their ionisation equilibria due to the desolvation energy and charge-dipole interactions. Hence, only charge-charge interactions can make the individual sites in unfolded proteins distinguishable. To calculate the charge-charge interactions we need to know the average distances between the charges. They can be obtained from Eq. (8.13) or from the results of the Monte Carlo simulation. In both cases, however, electrostatic interactions are not taken into account.
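How charge-charge interactions alone can shift the ionisation equilibria of otherwise identical sites can be illustrated with a small Metropolis titration. The sketch below is not Eq. (6.20) or (6.22) of Chapter 6: it assumes a generic model in which each protonated site contributes $RT\ln 10\,(\mathrm{pH} - \mathrm{p}K_{int})$, charges interact through an unscreened Coulomb term $332.06/(\varepsilon r)$ kcal/mol (with $r$ in Å), and the pK of a site is taken as the pH at which it is half protonated. All function names and parameter values are hypothetical.

```python
import math
import random

LN10 = math.log(10.0)
COULOMB = 332.06  # kcal*Å/(mol*e^2): energy of two unit charges 1 Å apart in vacuum


def coulomb_matrix(positions, eps):
    """Pairwise Coulomb energies W_ij (kcal/mol) of unit charges in a uniform dielectric."""
    n = len(positions)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = COULOMB / (eps * math.dist(positions[i], positions[j]))
    return w


def site_charge(kind, protonated):
    """Acids carry -1 when deprotonated, bases +1 when protonated."""
    if kind == "acid":
        return 0.0 if protonated else -1.0
    return 1.0 if protonated else 0.0


def state_energy(x, kinds, pk_int, w, ph, rt):
    """Free energy of protonation state x (True = protonated) relative to the fully
    deprotonated state, plus the pairwise charge-charge interactions."""
    g = sum(rt * LN10 * (ph - pk) for xi, pk in zip(x, pk_int) if xi)
    q = [site_charge(k, xi) for k, xi in zip(kinds, x)]
    n = len(x)
    g += sum(w[i][j] * q[i] * q[j] for i in range(n) for j in range(i + 1, n))
    return g


def mean_protonation(kinds, pk_int, w, ph, rt=0.593, n_steps=20000, seed=1):
    """Metropolis sampling of protonation states at a single pH; returns <x_i> per site."""
    rng = random.Random(seed)
    n = len(kinds)
    x = [True] * n
    e = state_energy(x, kinds, pk_int, w, ph, rt)
    counts = [0] * n
    for _ in range(n_steps):
        i = rng.randrange(n)
        x[i] = not x[i]  # trial move: flip the protonation state of one site
        e_new = state_energy(x, kinds, pk_int, w, ph, rt)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / rt):
            e = e_new
        else:
            x[i] = not x[i]  # rejected: restore the previous state
        for k in range(n):
            counts[k] += x[k]
    return [c / n_steps for c in counts]


def half_titration_pk(kinds, pk_int, w, ph_grid, **kwargs):
    """pK of each site = pH at which <x_i> crosses 0.5 (linear interpolation on ph_grid)."""
    curves = [mean_protonation(kinds, pk_int, w, ph, **kwargs) for ph in ph_grid]
    pks = []
    for i in range(len(kinds)):
        pk = float("nan")
        for j in range(len(ph_grid) - 1):
            a, b = curves[j][i], curves[j + 1][i]
            if a >= 0.5 >= b and a != b:
                pk = ph_grid[j] + (a - 0.5) * (ph_grid[j + 1] - ph_grid[j]) / (a - b)
                break
        pks.append(pk)
    return pks


grid = [3.0 + 0.25 * k for k in range(11)]
# Hypothetical pair of carboxyl groups, intrinsic pK 4.0, 6 Å apart, eps = 80
w = coulomb_matrix([(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)], eps=80.0)
print(half_titration_pk(["acid", "acid"], [4.0, 4.0], w, grid))
```

For this pair, the repulsion between the two deprotonated forms raises both half-titration points above the intrinsic value of 4.0. Solving the four-state partition function by hand gives a shift of about 0.25 pH units, which the Monte Carlo estimate should reproduce within sampling noise. Sites at different distances from their neighbours would be shifted by different amounts, which is what makes them distinguishable.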
Figure 8.6 An instantaneous configuration of the virtual chain (black lines) connecting the charges on the surface of the dielectric sphere representing the unfolded protein. The polypeptide backbone is shown schematically in white lines. The variation of a charge position on the surface of the sphere is indicated by an arrow. The values of $b_i$ and $b_{i+1}$ do not change upon the variation.
According to the spherical model, presented in Section 8.3.1, the protein charges are located on the surface of a sphere. We will combine this assumption with the results from the previous section by
representing the charges as the nodes of a virtual chain with different bond lengths, $b_i(l_i)$, which depend on the number of peptide bonds, $l_i$, separating the titratable amino acid residues $i$ and $i + 1$ (Fig. 8.6). For simplicity, we can use fixed bond lengths corresponding to the maxima of the distributions of $r$ obtained by one of the methods described in the previous section. Each configuration of the virtual chain determines a constellation of charges on the surface of the sphere and a set of mutual charge-charge distances $r_{ij}$. All sets $\{r_{ij}\}$ satisfy the condition $r_{i,i+1} = b_i(l_i)$, i.e. adjacent charges, $i$ and $i + 1$, in the virtual chain must maintain a fixed distance. Different configurations are characterised by different magnitudes of the electrostatic interactions, resulting in different ionisation equilibria of the titratable sites. This can be taken into account by treating the individual configurations of the virtual chain as microscopic states. For this purpose we need to extend the sequence $x$ used in Eqs. (6.20) and (6.22) by introducing additional states. Such an extension was already discussed in Chapter 7, where we looked for ways to take into account the conformational flexibility of folded proteins. There we formulated Eq. (7.8), which was limited to variations of the positions of the polar hydrogen atoms only; this restriction helped us reduce an otherwise prohibitively complex computational task. In the case of the spherical model, the computational complexity is already substantially reduced. For instance, here we do not need to calculate the desolvation energy of the titratable sites for each individual configuration of the virtual chain. For the purposes of our task, we extend the sequence $x$ defining the microscopic states in Eqs. (6.22) and (7.8) to include the charge positions:
$$x = (x_1, x_2, \ldots, x_i, \ldots, x_N, \mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_i, \ldots, \mathbf{r}_N),$$
where $\mathbf{r}_i$ is the location of the $i$-th node (charge) of the virtual chain on the surface of the sphere. Clearly, the sums in Eq. (6.22) cannot be calculated directly. We can, however, replace the rigorous solution of Eq. (6.22) by Monte Carlo simulation and Eq. (6.24), as described in detail in Section 6.4.2. According to the extension of $x$, the variation of the microscopic state within one Metropolis step includes not only the
protonation variables, $x_i$, but also the charge positions, $\mathbf{r}_i$. The variation of $\mathbf{r}_i$ must leave unchanged the bond lengths $b_i(l_i)$ and $b_{i+1}(l_{i+1})$ connecting the adjacent nodes in the virtual chain (see Fig. 8.6).

Table 8.2 Comparison of experimental and calculated pK values of the titratable sites in the Drosophila protein drk.

Site     Experimental³   Calculated⁷   Calculated⁶
Glu02    4.08            4.23          4.51
His07    7.07            6.71          6.94
Asp08    3.75            3.71          3.89
Asp14    3.97            3.94          4.00
Asp15    3.99            3.95          3.94
Glu16    4.44            4.43          4.47
Glu31    4.37            4.48          4.39
Asp32    4.05            3.99          4.01
Asp33    4.13            3.97          3.99
Glu40    4.42            4.37          4.39
Asp42    4.05            3.87          3.99
Glu45    4.45            4.29          4.49
Glu54    4.28            4.32          4.43
His58    7.83            6.85          7.06
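The constraint $r_{i,i+1} = b_i(l_i)$ with all nodes on the sphere has a simple geometric reading: the next node must lie on the circle where the sphere of radius $R$ intersects a sphere of radius $b_i$ centred on the current node. The Python sketch below uses this to grow one random configuration of the virtual chain. It only illustrates configurations that satisfy the constraint and is not necessarily the move set used to obtain Table 8.2; the function names, the sphere radius and the bond lengths are hypothetical.

```python
import math
import random


def next_node(p, b, radius, rng):
    """Random point on the sphere |r| = radius at straight-line distance b from node p."""
    if b > 2.0 * radius:
        raise ValueError("bond length exceeds the sphere diameter")
    norm_p = math.sqrt(sum(c * c for c in p))
    p_hat = [c / norm_p for c in p]
    while True:
        # random direction projected onto the tangent plane at p
        t = [rng.gauss(0.0, 1.0) for _ in range(3)]
        dot = sum(ti * pc for ti, pc in zip(t, p_hat))
        u = [ti - dot * pc for ti, pc in zip(t, p_hat)]
        norm_u = math.sqrt(sum(c * c for c in u))
        if norm_u > 1e-12:
            u = [c / norm_u for c in u]
            break
    # central angle theta of a chord of length b: b**2 = 2 R**2 (1 - cos theta)
    cos_t = 1.0 - b * b / (2.0 * radius * radius)
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    return [radius * (cos_t * pc + sin_t * uc) for pc, uc in zip(p_hat, u)]


def grow_virtual_chain(bonds, radius, seed=0):
    """One random configuration with all nodes on the sphere and |r_{i+1} - r_i| = b_i."""
    rng = random.Random(seed)
    nodes = [[0.0, 0.0, radius]]
    for b in bonds:
        nodes.append(next_node(nodes[-1], b, radius, rng))
    return nodes


# Hypothetical values: sphere radius 20 Å, bond lengths 13, 15 and 16.5 Å
chain = grow_virtual_chain([13.0, 15.0, 16.5], radius=20.0, seed=42)
for a, c in zip(chain, chain[1:]):
    print(round(math.dist(a, c), 6))  # 13.0, then 15.0, then 16.5
```

Each chord length equals the imposed bond length, while the angular position of every new node is random. The distances $r_{ij}$ between non-adjacent nodes therefore vary from one configuration to the next, which is why the resulting charge distribution is quasi-random rather than fully random.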
The above condition makes the distribution of the charges on the sphere quasi-random. This has an important consequence for the ionisation equilibria in unfolded proteins: the individual titratable sites have distinguishable pK values. As seen from Table 8.2, where two sets of calculated values are compared with the experimental data, the experimental pK values of the individual sites vary by about 0.4 pH units. This variation is smaller than that usually observed in folded proteins; still, it is large enough to be measured and hence to have a detectable influence on the pH dependence of protein stability. It is important to note that the pK values were predicted without any assumption about residual structural organisation of the protein molecule. It follows that the obtained variation of the ionisation equilibria results from the quasi-random distribution of the charges in the unfolded state. If we calculate the pK values using a random charge
distribution, i.e. if we ignore the condition $r_{i,i+1} = b_i(l_i)$, the splitting of the pK values within each type of titratable group vanishes. This is illustrated in Table 8.3, where the data for the protein barnase are listed. For the unfolded state of this protein only average pK values are available experimentally (column A). Columns B and C present calculations based on quasi-random and random distributions of the charges, respectively. As seen, the pK values calculated for a random charge distribution do not differ within a given type of titratable group. Our analysis shows that electrostatic interactions in the unfolded state are relevant and should be taken into account whenever the pH dependence of protein stability is at issue. Thus, in Eq. (8.8), which gives the pH dependence of the unfolding free energy, the charge of the unfolded state should not be derived from the pK values of free amino acids or of model compounds. Rather, it should be calculated by taking electrostatic interactions in the unfolded state into account. This is illustrated by the comparison between the experimental and theoretical values of $\Delta G(\mathrm{pH})$ given in Fig. 8.7, which shows immediately that neglecting electrostatic interactions in the unfolded state leads to a substantial overestimation of the stability of the protein at pH above 4.

Table 8.3 Experimental and calculated pK values of the titratable sites in the protein barnase.

Site       A, Experiment²   B, Quasi-random   C, Random
Asp8       -                3.31              3.36
Asp12      -                3.20              3.35
Asp22      -                3.17              3.35
Asp44      -                3.33              3.36
Asp54      -                3.28              3.36
Asp75      -                3.09              3.36
Asp86      -                3.27              3.36
Asp93      -                3.15              3.36
Asp101     -                3.28              3.37
Average    3.50             3.23              3.36
Glu29      -                3.78              3.87
Glu60      -                3.64              3.87
Glu73      -                3.62              3.87
Average    3.70             3.68              3.87
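The splitting discussed above can be quantified directly from the tabulated numbers. The short Python script below transcribes the experimental column of Table 8.2 and columns B and C of Table 8.3 and computes, for each residue type, the spread (maximum minus minimum) of the pK values; the helper name `spread` is hypothetical.

```python
# pK values transcribed from Table 8.2 (experimental column) and Table 8.3 (columns B and C)
drk_experimental = {
    "Glu02": 4.08, "His07": 7.07, "Asp08": 3.75, "Asp14": 3.97, "Asp15": 3.99,
    "Glu16": 4.44, "Glu31": 4.37, "Asp32": 4.05, "Asp33": 4.13, "Glu40": 4.42,
    "Asp42": 4.05, "Glu45": 4.45, "Glu54": 4.28, "His58": 7.83,
}
barnase_sites = ["Asp8", "Asp12", "Asp22", "Asp44", "Asp54", "Asp75", "Asp86",
                 "Asp93", "Asp101", "Glu29", "Glu60", "Glu73"]
barnase_quasi_random = dict(zip(barnase_sites, [3.31, 3.20, 3.17, 3.33, 3.28, 3.09, 3.27,
                                                3.15, 3.28, 3.78, 3.64, 3.62]))
barnase_random = dict(zip(barnase_sites, [3.36, 3.35, 3.35, 3.36, 3.36, 3.36, 3.36,
                                          3.36, 3.37, 3.87, 3.87, 3.87]))


def spread(pks, residue):
    """Range (max - min) of the pK values of one residue type, in pH units."""
    values = [v for site, v in pks.items() if site.startswith(residue)]
    return round(max(values) - min(values), 2)


for label, table in [("drk, experiment", drk_experimental),
                     ("barnase, quasi-random", barnase_quasi_random),
                     ("barnase, random", barnase_random)]:
    print(label, "Asp:", spread(table, "Asp"), "Glu:", spread(table, "Glu"))
```

It reports spreads of 0.38 (Asp) and 0.37 (Glu) pH units for the experimental drk values, 0.24 and 0.16 for the quasi-random barnase calculation, and only 0.02 and 0.0 for the random one.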
[Plot: denaturation free energy versus pH, with pH from 1 to 6]
Figure 8.7 pH dependence of the denaturation free energy of the protein barnase. Experimental data² (open circles) are compared with theoretical curves obtained by taking into account electrostatic interactions in the unfolded state (continuous line) and by using the model pK values from Table 6.3 (dashed line).
8.4 Thermal Stability of Proteins

During the last few decades new, and in a sense surprising, knowledge about the conditions determining the limits of life has been gained. It has been found that there are organisms which grow at the temperature of boiling water. These organisms are called hyperthermophilic organisms or hyperthermophiles. Accordingly, the proteins from hyperthermophiles are called hyperthermostable proteins. Organisms that grow at temperatures between 50 and 80°C are called thermophilic organisms or thermophiles. Obviously, thermo- and hyperthermostable proteins must maintain their functions, and hence their three-dimensional structures, at temperatures at which their counterparts from mesophilic organisms (those preferring temperatures between 20 and 50°C) are unfolded. We should stress again that the increased thermodynamic stability of proteins from thermo- and hyperthermophiles manifests itself only at temperatures around or above the $T_m$ of their mesophilic counterparts. It is striking that no essential difference in the fold pattern or in the amino acid
composition between these two classes has been found. It follows that the enhanced thermal stability of proteins from thermophiles is coded in their amino acid sequences in such a way that the balance of non-covalent interactions is appropriately shifted without affecting the fold pattern or the functional properties. It is commonly held that thermal stabilisation can be achieved by small changes at different locations in the protein structure, involving electrostatic interactions, hydrogen bonding, packing and the hydrophobic effect to different and varying extents. There is, however, a growing body of experimental evidence showing that proteins from hyperthermophiles are relatively richer in salt bridges and salt bridge networks than their mesostable counterparts. For instance, the salt bridge network shown in Fig. 3.20 stabilises the subunit assembly of the hyperthermostable protein disulfide oxidoreductase from Pyrococcus furiosus. Since this protein is active as a dimer at the physiological temperature of this organism (~90°C), one can presume that the salt bridge network contributes to the (thermodynamic) stabilisation of the dimer at high temperature and hence to the increased thermal stability. Another example of increased salt bridge content is lumazine synthase from the hyperthermophile Aquifex aeolicus. This hyperthermostable protein has a melting temperature of about 119°C and is characterised by the largest ion pair content observed so far⁸. Lumazine synthases are large complexes of 60 subunits organised in an icosahedral capsid resembling small viruses. The geometrical organisation of the complex approximates a sphere, which minimises the solvent accessible surface and hence favours protein-solvent interactions. The charged side chain contents of two lumazine synthases, from the mesophile Bacillus subtilis and from the hyperthermophile Aquifex aeolicus, are compared in Fig. 8.8. As seen, most of the polar residues in the Bacillus subtilis protein are substituted by charged ones in the protein from Aquifex aeolicus. We have to note that the lumazine synthase from Bacillus subtilis is also a thermostable protein, with $T_m \approx 93$°C, yet it is considerably less thermostable than its counterpart from Aquifex aeolicus.
Figure 8.8 Titratable and polar side chains on the capsid surface of lumazine synthase from Bacillus subtilis (left panel) and from the hyperthermophile Aquifex aeolicus (right panel)⁸. The side chains are coloured as follows: acidic - red, basic - blue, polar - green. Images were kindly provided by Dr. Xiaofeng Zhang, Karolinska Institutet, Dept. Biosciences and Nutrition.
The large number of observations similar to those shown in Figs. 3.20 and 8.8 suggests that the elevated thermal stability of proteins from hyperthermophiles and the increased number of salt bridges are parallel phenomena. There are arguments in favour of this hypothesis. At temperatures around 100°C the relative dielectric constant of water is about 55. This strengthens charge-charge interactions compared with those at temperatures typical of the environment of mesophiles (compare Fig. 5.3). Moreover, since the desolvation effect results from the difference between the relative dielectric constants of the protein and its environment (see, for instance, Eq. 5.17), the desolvation penalty for forming a salt bridge is reduced. These two effects counteract the increased thermal motion at high temperature, thus contributing to salt bridge stabilisation. There is also an indirect indication that electrostatic interactions may be important for the stabilisation of protein structure at high temperatures. At temperatures above 110°C the hydrophobic effect diminishes, and with it its stabilising contribution to protein-solvent interactions. On the other hand, the analysis of electrostatic interactions in unfolded
proteins shows that the reduction of $R_g$ depends only weakly on temperature in the range of interest (Fig. 8.9). At high temperatures, this effect can compensate for the reduced role of the hydrophobic interaction, which, as we have seen in Chapter 4, is a major stabilising factor.

[Plot: radius of gyration versus temperature, 0 to 140°C; vertical axis from 16 to 28]
Figure 8.9 Temperature dependence of the radius of gyration of a charged polypeptide chain of 80 amino acid residues (circles) (see Section 8.3.2). The radius of gyration of the same chain in the absence of electrostatic interactions is given for comparison (squares).
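The dielectric argument made earlier in this section can be checked with a back-of-the-envelope calculation. The sketch below assumes a widely used empirical cubic fit for the relative permittivity of water (due to Malmberg and Maryott), valid between about 0 and 100°C, and treats an ion pair as two point charges 4 Å apart in bulk water; desolvation and the protein interior are ignored, and the 4 Å separation and function names are hypothetical.

```python
def eps_water(t_celsius):
    """Relative permittivity of liquid water, empirical cubic fit (about 0-100 °C)."""
    t = t_celsius
    return 87.740 - 0.40008 * t + 9.398e-4 * t**2 - 1.410e-6 * t**3


def pair_energy(r_angstrom, eps, q1=1.0, q2=-1.0):
    """Coulomb energy (kcal/mol) of two point charges in a uniform dielectric."""
    return 332.06 * q1 * q2 / (eps * r_angstrom)


R_KCAL = 1.986e-3  # gas constant, kcal/(K mol), as in Appendix A

for t in (25.0, 100.0):
    eps = eps_water(t)
    e = pair_energy(4.0, eps)  # hypothetical ion pair 4 Å apart
    rt = R_KCAL * (t + 273.15)
    print(f"{t:5.1f} °C  eps = {eps:5.1f}  E = {e:6.2f} kcal/mol  E/RT = {e / rt:5.2f}")
```

Under these assumptions the permittivity drops from about 78 to about 56, and the ion-pair energy strengthens from roughly -1.06 to -1.49 kcal/mol. Even in units of $RT$ it becomes stronger (about -1.79 versus -2.01), which is the sense in which the lower permittivity counteracts the increased thermal motion.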
The observed increase in the size of the salt bridge networks in hyperthermostable proteins also deserves attention. We know that hydrophobic material in water, such as an oil drop, is surrounded by a clathrate of hydrogen-bonded water molecules. In proteins this clathrate is more complex, including hydrogen bond networks in which protein functional groups are involved. Clathrate formation is an effect of protein-solvent interactions which stabilises the three-dimensional structure of the protein molecule. We have noted that the mesophilic lumazine synthase is characterised by optimised protein-solvent interactions and is also a thermostable protein. This observation suggests that the thermal stabilisation of proteins may be driven by the optimisation of protein-solvent interactions. Replacing polar functional groups with charged ones modifies the hydrogen bond network forming the clathrate around the protein molecule, and such a modification can be regarded as an optimisation of protein-solvent interactions at high temperature. In this context, one can say that hyperthermostable proteins "rely" less on the surrounding solvent for the formation of a stabilising clathrate.
On the basis of the experimental material collected so far, one can conclude that electrostatic interactions act as an optimiser of protein-solvent interactions and in this way play an important role in elevating the thermal stability of proteins from hyperthermophilic organisms. However, it should be noted that our analysis is largely speculative and descriptive, which may make the above conclusion premature. The increase in the charge content and in the number of salt bridges of hyperthermostable proteins may be a side effect. At high temperatures the amide groups of the glutamine and asparagine side chains tend to be chemically labile; at temperatures around 100°C these side chains may undergo deamidation. This correlates with the observation that hyperthermostable proteins contain fewer glutamine and asparagine residues than their mesophilic counterparts. At the same time, hyperthermostable proteins have more glutamic acids and arginines. One could presume that during evolution the unstable, solvent-accessible glutamines and asparagines were substituted by glutamic and aspartic acids. In parallel, appropriate mutations of neutral to basic amino acids can compensate for the resulting excess of negative charge. Moreover, the statistically significant increase in the number of arginines, whose guanidinium group can participate in up to four hydrogen bonds (see Table 3.4), can be correlated with a more efficient formation of hydrogen bond (salt bridge) networks, thus stabilising the hydrogen bond clathrate around the molecule. The net effect turns out to be favourable for both electrostatic interactions and protein-solvent interactions. Whether this effect is sufficient to stabilise proteins at high temperature, however, remains an open question.

References

1. Pace CN, Alston RW and Shaw KL, (2000) Charge-charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci., 9: 1395-1398.
2. Oliveberg M, Arcus VL and Fersht AR, (1995) pKa values of carboxyl groups in the native and denatured states of barnase: the pKa values of the denatured state are on average 0.4 units lower than those of model compounds. Biochemistry, 34: 9424-9433.
3. Tollinger M, Forman-Kay JD and Kay LE, (2002) Measurement of side-chain carboxyl pKa values of glutamate and aspartate residues in an unfolded protein by multinuclear NMR spectroscopy. J. Am. Chem. Soc., 124: 5714-5717.
4. Zhou H-X, (2002) A Gaussian-chain model for treating residual charge-charge interactions in the unfolded state of proteins. Proc. Natl. Acad. Sci. U. S. A., 99: 3569-3574.
5. Tanford C, (1961) Physical Chemistry of Macromolecules. NY, London, Sydney: John Wiley & Sons.
6. Kundrotas PJ and Karshikoff A, (2004) Charge sequence coding in statistical modeling of unfolded proteins. Biochim. Biophys. Acta, 1702: 1-8.
7. Zhou H-X, (2003) Direct test of the Gaussian-chain model for treating residual charge-charge interactions in the unfolded state of proteins. J. Am. Chem. Soc., 125: 2060-2061.
8. Zhang XF, Meining W, Fischer M, Bacher A and Ladenstein R, (2001) X-ray structure analysis and crystallographic refinement of lumazine synthase from the hyperthermophile Aquifex aeolicus at 1.6 angstrom resolution: determinants of thermostability revealed from structural comparisons. J. Mol. Biol., 306: 1099-1114.
Appendix A
Basic Definitions of Thermodynamics and Statistical Thermodynamics
Thermodynamics and statistical thermodynamics (also called statistical mechanics or statistical physics) are fundamental scientific disciplines. In this appendix, only the basic definitions and equations used in the textbook are given. Some of the equations are derived in order to reveal the connection between the observed phenomena and the terminology used to describe them on the one hand, and the mathematical instruments used to develop the appropriate formulae on the other.

Subject and Main Definitions of Thermodynamics

Thermodynamics is a phenomenological science, based on human experience collected through observations and experiments. This experience is summarised in laws called the laws of thermodynamics. It is important to note that thermodynamics is not concerned with the structure of matter, nor with the nature of the various interactions that give rise to the properties of the observed objects. Thermodynamics describes and predicts the behaviour of objects in nature without taking their structure into account. Thermodynamics, like any other science, has its own "language", consisting of specific terms and basic definitions used to describe the phenomena of interest. In terms of this "language" the subject of thermodynamics can be summarised as follows: thermodynamics studies macroscopic systems and the processes taking place in them. This includes 1) the thermodynamic state of a macroscopic system; 2) the direction of the
processes; 3) the conditions under which the macroscopic system reaches equilibrium; and 4) the criteria determining equilibrium.

Macroscopic system

Any part of the world (i.e. of matter) separated from the rest of the world (the environment) by an exactly defined boundary is called a macroscopic system. Since thermodynamics does not deal with systems other than macroscopic ones, the word "macroscopic" is omitted below and only the term system is used. A system can be a single object or a group of interacting objects. Depending on the possible interactions of the system with the environment, one distinguishes several types of systems. They are listed in Table A.1.

Table A.1 Different macroscopic systems.

System       Exchange of matter   Exchange of heat   Exchange of work
Open         yes                  yes                yes
Closed       no                   yes                yes
Adiabatic    no                   no                 yes
Isolated     no                   no                 no
Equilibrium, thermodynamic state of a system

Each system is described by a variety of parameters. Some of them, such as its position relative to a given coordinate system or its velocity, do not change the nature of the system. Other parameters, such as temperature, density and volume, are intrinsic to the system. These parameters are called internal parameters and determine the state of the system. If the parameters describing the system do not change with time, the system is in an equilibrium state, or simply in equilibrium. The states treated by thermodynamics are equilibrium states; hence equilibrium states are also called thermodynamic states. It is common practice to use the term state instead of thermodynamic state.
Process

The change of a system from one thermodynamic state to another due to interactions with the environment is called a thermodynamic process. Some processes in nature are spontaneous, whereas others are not. Spontaneous processes can only proceed towards equilibrium. For example, heat can only be transferred from a warm body to a cold body, until equilibrium is achieved (the temperatures of the bodies become equal). One can also distinguish reversible and irreversible processes. A reversible process is a process after which both the system and the environment can return to their initial states. The cyclic process shown and discussed in connection with Fig. A.1 is reversible. A cyclic process is not always reversible. If a cyclic process is irreversible, the system may return to its initial state, but some changes remain in the environment. In general, a process after which either the system or the environment cannot return to its initial state is called an irreversible process.

Function of state

As pointed out above, a system is described by a set of internal parameters, such as volume, pressure, temperature, etc. A very important characteristic of these parameters is that they do not depend on the way the system attains a given state. For instance, if the system is in state a, characterised by volume $V_a$ and temperature $T_a$, and is brought to another state b, with parameters $V_b$ and $T_b$, it is irrelevant whether first $V_a$ is changed to $V_b$ and then $T_a$ to $T_b$, or first the temperature is changed and then the volume. Another important feature of the internal parameters is that they are mutually dependent. Each parameter can be expressed as a function of the other parameters determining a given state of the system:
$$Y = f(P_1, P_2, \ldots)$$
The above equation is called an equation of state, and $Y$ is a function of state. The simplest example of an equation of state is that of the ideal gas,
$$V = kT/p,$$
where Vis the volume, p is the pressure, and k = 1.381 X 10" J/K is the Boltzmann constant. It is useful to note here that the parameters of a system can be extensive, such as mass, energy, volume, or intensive, such as temperature, pressure, density. The extensive parameters are additive. For example, if the system is composed by sub-systems, each with a given volume, the volume of the main system is the sum over the volumes of the sub-systems. Also, if the sub-systems do not interact, the energy of the system is the sum over the energies of the individual subsystems. The temperature of the system cannot be a sum of the temperatures of the subsystems, hence temperature is an intensive parameter. In many cases it is convenient to work with intensive parameters. In such cases the Gas constant, R = 8.315 J/(K.mol) = 1.986 cal/(K.mol), is used instead of the Boltzmann constant [see also Eq. (2.1)]: Vp = RT . First law of thermodynamics, energy The first law of thermodynamics is the law of conservation of energy. The term energy is difficult to formulate. Perhaps the first statement to treat energy and its conservation has been given by Rene Descartes (in 1644), who postulated that in the universe there is an amount of motion that could neither be increased nor decreased. A few decades later, Huegens discovered that when two elastic bodies collide, the sum of the products of the mass and the square of the velocity of each of the bodies remains constant. Leibniz called this unchangeable quantity vis viva. Almost a century later, in 1735, Bernoulli noticed that a part of vis viva vanishes. This part he called vis mortua. The first quantitative formulation of the principle of conservation of the energy has been given by Gabrielle du Chatelet, who made the assumption that the sum of vis viva and vis mortua should be preserved. In 1787 Thomas Young renamed vis viva and vis mortua to actual energy and potential energy, respectively. 
Basic Definitions of Thermodynamic and Statistical Physics
William Thomson (Lord Kelvin) introduced the term kinetic energy instead of actual energy. The term energy was introduced for the first time by d'Alembert in 1785 to describe the fact that a moving body can do work. This short historical survey illustrates the long development of the concept of energy. Today we understand energy as a general quantitative measure of the interactions and motion of matter. Energy cannot arise from nothing and cannot be transformed into nothing. A system can change its energy by receiving energy from, or emitting energy to, the environment. The experimental determination of the absolute energy content of a system is practically impossible; only the change of energy can be evaluated experimentally. The energy of a system can change by exchange of matter, by exchange of heat and by exchange of work. Before giving a mathematical expression of the first law of thermodynamics, some comments on heat and work should be made.
Heat
The concept of heat is sometimes confused with temperature. This confusion comes from the observation that when two bodies with different temperatures are in contact, their temperatures become equal once equilibrium is reached. However, what is transferred between the two bodies is not temperature but heat. According to Galilei, heat is a substance present in all bodies. This substance is transferred between bodies in contact and causes changes in their temperatures. It was called caloric. Caloric has the properties of a fluid, but is weightless. Observing the coupling between friction and heat emission, Francis Bacon proposed an alternative concept of heat: he related heat to the internal motion of the particles in a system. One can speak about heat only if two bodies are in contact, i.e. we speak about exchange of heat. As a result of this exchange, the energy of one of the bodies is reduced, whereas the energy of the other is increased. Thus, a system can change its energy by exchanging heat with the environment. Because we speak only about exchange of heat, it is not correct to say that a system contains heat.
Work
Work is the transfer of energy by ordered movement of the particles constituting a given system. Like heat, work is not inherent to any system. Heat and work are related, and this relation was first noticed by Julius Robert Mayer in the middle of the 19th century. Analysing his observations of the colour of venous blood and connecting them with his knowledge of physiology, he arrived at the idea that heat and work must be interconvertible. He published his results (not without problems) in 1842. The mathematical expression of this relation is given by the equation
$$\Delta U = W + Q, \qquad \text{(A.1)}$$
where $\Delta U$ is the change of the internal energy of the system, and $W$ and $Q$ are the work and the heat exchanged, respectively (both taken as positive when heat is received by the system or work is done on it). This expression is the first law of thermodynamics: the change of the internal energy of a system is equal to the sum of the heat exchanged with the environment and the work done by or on the system.
Internal energy
The internal energy of a system is the sum of the kinetic and potential energies of the particles constituting it. The internal energy is a function of state. This statement follows directly from the first law of thermodynamics. Let us consider a reversible cyclic process, a thermodynamic cycle (Fig. A.1). For the moment we are not interested in how the system changes its internal energy. From state a to state b the system changes its internal energy from $U_a$ to $U_b$, $\Delta U_{a\to b} = U_b - U_a$. From state b it returns to state a along a different, arbitrarily chosen path (shown in Fig. A.1 as a dashed curve). Let us assume that along this path the system changes its energy from $U_b$ to $U_a'$. If $U_a \neq U_a'$, the system would have received energy from nothing, or energy would have been transformed into nothing, which contradicts the first law of thermodynamics. It follows that $U_a = U_a'$ and
$$\Delta U_{a\to b} + \Delta U_{b\to a} = 0. \qquad \text{(A.2)}$$
This also means that the internal energy has a fixed value for a given state and does not depend on the path by which the system attains this state.
Figure A.1 Thermodynamic cycle (states a and b; $\Delta U_{a\to b}$ and $\Delta U_{b\to a}$).
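The path independence of $U$, in contrast to $W$ and $Q$, can be illustrated with a small numerical sketch. It assumes one mole of a monatomic ideal gas ($C_V = 3R/2$, internal energy depending only on temperature) and two hypothetical reversible paths between the same states a and b; the gas model and all numbers are assumptions, not taken from the text.

```python
import math

R = 8.314        # gas constant, J/(K mol)
N_MOL = 1.0      # amount of gas, mol (assumed)
CV = 1.5 * R     # molar C_V of a monatomic ideal gas (assumed working substance)


def isochoric(t_from, t_to):
    """Reversible heating at constant volume: no volume work, Q = n C_V dT."""
    return 0.0, N_MOL * CV * (t_to - t_from)


def isothermal(t, v_from, v_to):
    """Reversible isothermal step of an ideal gas.

    W is the work done ON the gas, W = -nRT ln(V2/V1); since U of an ideal gas
    depends only on T, Delta U = 0 and therefore Q = -W.
    """
    w = -N_MOL * R * t * math.log(v_to / v_from)
    return w, -w


def totals(steps):
    """Sum W and Q along a path and apply the first law, Delta U = W + Q (Eq. A.1)."""
    w = sum(step[0] for step in steps)
    q = sum(step[1] for step in steps)
    return w, q, w + q


# state a: V = 0.010 m^3, T = 300 K; state b: V = 0.020 m^3, T = 400 K
path1 = [isochoric(300.0, 400.0), isothermal(400.0, 0.010, 0.020)]
path2 = [isothermal(300.0, 0.010, 0.020), isochoric(300.0, 400.0)]

W1, Q1, dU1 = totals(path1)
W2, Q2, dU2 = totals(path2)
print(f"path 1: W = {W1:8.1f} J  Q = {Q1:8.1f} J  dU = {dU1:7.1f} J")
print(f"path 2: W = {W2:8.1f} J  Q = {Q2:8.1f} J  dU = {dU2:7.1f} J")
```

The work and the heat differ between the two paths, while their sum, $\Delta U$, is the same, which is what makes $U$ (but not $W$ or $Q$) a function of state.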
If the changes of the system are sufficiently small, we can replace $\Delta U$ with $dU$. The thermodynamic cycle can then be described as
$$\oint dU = 0.$$
In this case $dU$ is called an exact differential. In general, if a quantity $A$ is a function of state, $dA$ is an exact differential.
Enthalpy
Consider a process in which the pressure is constant (isobaric process). From the first law of thermodynamics we have
$$(dU)_p = (dQ)_p + (dW)_p. \qquad \text{(A.3)}$$
We assume that the work is due only to the change of the volume: $(dW)_p = -p\,dV$. The negative sign is conventional and means that the system does work when it expands ($dV > 0$). Substituting this into the above equation, one obtains $(dU)_p = (dQ)_p - p\,dV$, or $(dQ)_p = (dU)_p + p\,dV$.
Both terms on the right-hand side of the above equation are differentials of functions of state (at constant pressure $p\,dV = d(pV)$). The sum of functions of state is also a function of state, hence $Q$ is a function of state for an isobaric process. For an isobaric process we can write $(dQ)_p = d(U + pV)_p$. Because the left-hand side of the above equation is an exact differential, the right-hand side is also an exact differential. It follows that $U + pV$ is a function of state. It is called enthalpy:
$$H = U + pV. \qquad \text{(A.4)}$$
Enthalpy is usually symbolised by $H$, which comes from the definition "heat content". This is to a certain extent misleading because, as we have pointed out, heat is not inherent to the system; it is a function of state only for isobaric processes. In contrast, according to Eq. (A.4), enthalpy is inherent to the system and is a function of state. The exact differential of the enthalpy can be written as $dH = dU + p\,dV + V\,dp$. Taking into account Eq. (A.3), the above equation becomes
$$dH = dQ + V\,dp. \qquad \text{(A.5)}$$
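Equation (A.5) implies that at constant pressure ($dp = 0$) the heat exchanged equals the change of enthalpy. A minimal sketch, assuming one mole of a monatomic ideal gas heated reversibly at atmospheric pressure (assumed numbers), checks that $Q_p = \Delta U + p\Delta V = \Delta H$; it also uses the ideal-gas relation $C_p - C_V = R$, which is not derived in the text.

```python
R = 8.314        # gas constant, J/(K mol)
N_MOL = 1.0      # amount of gas, mol (assumed)
CV = 1.5 * R     # monatomic ideal gas (assumed)
CP = CV + R      # for an ideal gas C_p - C_V = R
P = 101325.0     # constant pressure, Pa (assumed)

t1, t2 = 300.0, 350.0
v1 = N_MOL * R * t1 / P      # ideal-gas volumes from pV = nRT
v2 = N_MOL * R * t2 / P

dU = N_MOL * CV * (t2 - t1)  # U of an ideal gas depends only on T
W = -P * (v2 - v1)           # (dW)_p = -p dV: work done on the gas
Q_p = dU - W                 # first law, Eq. (A.1): Delta U = W + Q
dH = dU + P * (v2 - v1)      # Delta H = Delta U + Delta(pV) at constant p, Eq. (A.4)

print(f"Q_p = {Q_p:.2f} J, Delta H = {dH:.2f} J, n C_p Delta T = {N_MOL * CP * (t2 - t1):.2f} J")
```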
Second law of thermodynamics, entropy
The first law of thermodynamics gives the energy balance of processes. Equation (A.1) relates the work and the heat exchanged by the system, but it cannot be used to predict whether a process is possible, nor to predict the direction of a possible process. Let us return to the cyclic process shown in Fig. A.1. Assume that the system moves from state a to state b by absorbing heat: $\Delta U_{a\to b} = Q$. Moving from state b back to state a, the system performs work: $\Delta U_{b\to a} = -W$.
Thus, the heat introduced is transformed into work. This transformation occurs only through an appropriate engine. It follows from Eq. (A.2) that the heat introduced into the system is equal to the work performed by the system. This conclusion does not contradict the first law of thermodynamics. However, it is not true for all types of processes. It is a common feature of processes in nature that, while work can be converted entirely into heat, heat cannot be converted entirely into work. Also, human experience shows that natural processes are spontaneous and irreversible. These two empirical statements are the basis of the second law of thermodynamics.
There are different ways to formulate the second law of thermodynamics. On the basis of the fact that heat cannot be entirely transformed into work, Thomson formulated the second law as follows: "It is impossible to build a cyclic engine whose only action is to produce work from an equivalent quantity of heat extracted from only one reservoir". It follows from the second law of thermodynamics that there is a part of the energy of a system that cannot be converted into work. This part is connected with a function of state called entropy. It is important to find an expression that connects entropy with the energy of the system. This expression can be derived from a special cyclic process realised in the so-called Carnot cycle, or Carnot engine. We will briefly consider this cycle. The Carnot cycle consists of four steps (Fig. A.2).
Step 1 (from state 1 to state 2): The gas (the working substance of the Carnot engine) is in contact with a hot reservoir at temperature $T_1$. Heat $Q_1$ is transferred from the reservoir to the engine, causing isothermal expansion of the gas. The work performed by the engine is $-W_1$. Because the internal energy of the gas does not change (for an ideal gas it depends only on temperature), according to Eq. (A.1) $Q_1 = -W_1$.
Step 2 (from state 2 to state 3): Adiabatic (no exchange of heat) expansion of the gas.
The temperature is reduced to $T_2$ and the work $-W_2$ is done at the expense of the internal energy. Obviously, in this step the internal energy decreases. It can be shown that $W_2 = C_V(T_2 - T_1)$, where $C_V$ is the heat capacity of the gas. The heat capacity is equal to the heat needed to raise the temperature of a system by 1 degree at constant volume ($C_V$) or at constant pressure ($C_p$).
Step 3 (from state 3 to state 4): The gas is in contact with a cold reservoir at temperature $T_2$. Isothermal compression of the gas takes place. The heat released by the gas is $-Q_2$ and the work done on it is $W_3$. Similarly to step 1, we use Eq. (A.1) to write $Q_2 = -W_3$.
Step 4 (from state 4 back to state 1): Adiabatic compression; the work done on the gas is $W_4 = C_V(T_1 - T_2) = -W_2$.
Figure A.2 Cycle of Carnot (hot reservoir at $T_1$, cold reservoir at $T_2$; abscissa: volume $V$).
On the basis of the above considerations, for the work performed by the Carnot engine one obtains
$$-W = Q_1 + Q_2, \qquad \text{(A.6)}$$
where $W = W_1 + W_2 + W_3 + W_4$. From the expression for the efficiency of a thermal engine, $\eta$, which we take here as given, we can relate the temperatures of the reservoirs and the exchanged heat:
$$\eta = \frac{-W}{Q_1} = \frac{Q_1 + Q_2}{Q_1} = \frac{T_1 - T_2}{T_1}. \qquad \text{(A.7)}$$
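Equations (A.6) and (A.7) can be checked numerically for an ideal gas as the working substance. The sketch below assumes one mole of a monatomic ideal gas and arbitrary reservoir temperatures and volumes (none of these numbers come from the text); the adiabatic steps use the ideal-gas relation $TV^{R/C_V} = \text{const}$ with the molar $C_V$, which the text does not derive.

```python
import math

R = 8.314                 # gas constant, J/(K mol)
N_MOL = 1.0               # amount of working gas, mol (assumed)
CV = 1.5 * R              # molar C_V of a monatomic ideal gas (assumed)
T1, T2 = 500.0, 300.0     # hot and cold reservoir temperatures, K (assumed)
V1, V2 = 0.010, 0.030     # volumes at the start and end of step 1, m^3 (assumed)

# Reversible adiabats of an ideal gas: T V**(R/CV) = const fixes V3 and V4
V3 = V2 * (T1 / T2) ** (CV / R)
V4 = V1 * (T1 / T2) ** (CV / R)

# Work done ON the gas in each step (sign convention of Eq. A.1)
W1 = -N_MOL * R * T1 * math.log(V2 / V1)   # step 1: isothermal expansion at T1
W2 = N_MOL * CV * (T2 - T1)                # step 2: adiabatic expansion
W3 = -N_MOL * R * T2 * math.log(V4 / V3)   # step 3: isothermal compression at T2
W4 = N_MOL * CV * (T1 - T2)                # step 4: adiabatic compression, = -W2
Q1, Q2 = -W1, -W3                          # isothermal steps: Delta U = 0
W = W1 + W2 + W3 + W4

eta = -W / Q1
print(f"-W = {-W:.2f} J, Q1 + Q2 = {Q1 + Q2:.2f} J")                  # Eq. (A.6)
print(f"efficiency = {eta:.4f}, (T1 - T2)/T1 = {(T1 - T2) / T1:.4f}")  # Eq. (A.7)
print(f"Q1/T1 + Q2/T2 = {Q1 / T1 + Q2 / T2:.2e} J/K")
```

The last line anticipates the reversible-cycle relation derived next: the sum of the reduced heats $Q/T$ vanishes.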
Equations (A.6) and (A.7) describe the ideal case in which heat is converted into work with the maximum possible efficiency. This is valid only for reversible processes. According to the second law of thermodynamics, if at least one of the processes in the Carnot cycle is irreversible, a smaller part of the exchanged heat is transformed into work. Hence, in general,
$$\frac{Q_1 + Q_2}{Q_1} \le \frac{T_1 - T_2}{T_1}.$$
For a reversible process the equality holds, and Eq. (A.7) can be written as
$$\frac{Q_1}{T_1} + \frac{Q_2}{T_2} = 0.$$
It can be shown that any cyclic process can be represented as a chain of elementary Carnot cycles, i.e. as a chain of small isothermal and adiabatic steps. The above equation can then be written in the form
$$\oint \frac{dQ_r}{T} = 0, \qquad \text{(A.8)}$$
where the subscript r indicates a reversible process. The quantity $dQ_r/T$ is an exact differential, hence it is a function of state. This function of state is the entropy:
$$dS = \frac{dQ_r}{T}. \qquad \text{(A.9)}$$
For an irreversible process the inequality above gives $Q_1/T_1 + Q_2/T_2 < 0$, and for a chain of elementary cycles $\oint dQ/T < 0$. If we compare this inequality with Eq. (A.8), we obtain
$$\oint \frac{dQ}{T} < \oint \frac{dQ_r}{T} = 0.$$
Applying this to a cycle in which the system goes irreversibly from state a to state b and returns reversibly to state a gives $\int_a^b dQ/T < S_b - S_a$, or, for an infinitesimal step,
$$dS > \frac{dQ}{T}. \qquad \text{(A.10)}$$
For an isolated system $dQ = 0$, so that $dS > 0$. This inequality shows that for an isolated system the irreversible processes are associated with an increase of entropy. Since spontaneous processes are irreversible, one can state that spontaneous processes in an isolated system proceed with an increase of entropy. The entropy increases until the system reaches equilibrium. At equilibrium all parameters determining the state of the system, including the entropy, remain constant. Thus, at equilibrium the entropy of an isolated system is constant, and this constant is the maximum value that the system can attain.
Connection between the first and second laws of thermodynamics
Equation (A.9) and inequality (A.10) can be united:
$$dS \ge \frac{dQ}{T}. \qquad \text{(A.11)}$$
From the first law of thermodynamics, $dU = dW + dQ$, we substitute $dQ$ and obtain
$$dS \ge \frac{dU - dW}{T}.$$
Because $p\,dV = -dW$,
$$dS \ge \frac{dU + p\,dV}{T}.$$
For a reversible process, the link between the first and the second law of thermodynamics is then given by
$$dU = T\,dS - p\,dV, \qquad \text{(A.12)}$$
which is called the fundamental equation of thermodynamics. From Eq. (A.5) one can make the link between entropy and enthalpy:
$$dS \ge \frac{dH - V\,dp}{T},$$
or, for a reversible process,
$$dH = T\,dS + V\,dp. \qquad \text{(A.13)}$$
This equation is also a fundamental equation of thermodynamics.
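Inequality (A.11) can be made concrete with the classic comparison of a reversible isothermal expansion and a free expansion into vacuum between the same two states. The sketch assumes one mole of ideal gas (so $dU = 0$ at constant $T$); the numbers are illustrative assumptions, not from the text.

```python
import math

R = 8.314          # gas constant, J/(K mol)
N_MOL = 1.0        # amount of ideal gas, mol (assumed)
T = 298.15         # temperature, K (assumed)
V1, V2 = 1.0, 2.0  # initial and final volumes; only the ratio matters

# Entropy is a function of state: Eq. (A.12) with dU = 0 (ideal gas, constant T)
# gives dS = p dV / T = n R dV / V, so Delta S = n R ln(V2/V1) for ANY path.
delta_s = N_MOL * R * math.log(V2 / V1)

# Path 1: reversible isothermal expansion, heat absorbed Q_rev = -W = nRT ln(V2/V1)
q_rev = N_MOL * R * T * math.log(V2 / V1)
# Path 2: irreversible free expansion of the isolated gas into vacuum: no heat, no work
q_irr = 0.0

print(f"Delta S   = {delta_s:.3f} J/K")
print(f"Q_rev / T = {q_rev / T:.3f} J/K  (equality in Eq. A.11)")
print(f"Q_irr / T = {q_irr / T:.3f} J/K  (strict inequality in Eq. A.11)")
```

Both paths produce the same $\Delta S$, but only along the reversible path is it equal to $Q/T$; for the free expansion $Q/T = 0 < \Delta S$, as required for an irreversible process in an isolated system.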
Free energy
Consider Eq. (A.12). If we replace $p\,dV$ back with $-dW$, we obtain
$$dW = dU - T\,dS.$$
This equation is, of course, valid only for reversible processes. If the temperature is kept constant (isothermal process), the above equation can be written as
$$(dW)_T = d(U - TS)_T.$$
On the right-hand side of the above equation we have $(U - TS)$, which is a combination of functions of state. Hence, $W_T$ is also a function of state. This function of state,
$$F = U - TS, \qquad \text{(A.14)}$$
is called free energy or Helmholtz free energy. The free energy is the part of the internal energy of a system that can be transformed into work at constant temperature. Exactly the same procedure can be performed for Eq. (A.13) to obtain the Gibbs free energy:
$$G = H - TS. \qquad \text{(A.15)}$$
Processes at constant pressure and temperature
Biological processes take place mainly at constant pressure and temperature (isothermal-isobaric processes). Therefore, it is useful to have a fundamental equation of thermodynamics for these processes. Combining Eqs. (A.4) and (A.15) one can write $G = U + pV - TS$. The exact differential of the Gibbs free energy is then $dG = dU + p\,dV + V\,dp - S\,dT - T\,dS$. At constant temperature and pressure $V\,dp = 0$ and $S\,dT = 0$, so $dG = dU + p\,dV - T\,dS$. Taking into account that for an isobaric process $dH = dU + p\,dV$, we can write $dG = dH - T\,dS$, which is the desired expression. Thus, for an isothermal-isobaric process that takes the system from one state to another,
$$\Delta G = \Delta H - T\,\Delta S. \qquad \text{(A.16)}$$
Equation (A.16) is among the most used thermodynamic expressions in the physical chemistry of biological macromolecules, including proteins. It states that the enthalpy change of a system, $\Delta H = \Delta G + T\Delta S$, consists of two parts: a part that can be transformed into work, $\Delta G$, and a part that cannot be transformed into work, $T\Delta S$.
Direction of processes
For an irreversible process, Eq. (A.11) gives $T\,dS > dQ$, or $dQ - T\,dS < 0$. If we replace $dQ$ with $dU + p\,dV$, we obtain $dU + p\,dV - T\,dS < 0$, or, for an isothermal-isobaric process,
$$d(U + pV - TS)_{T,p} = d(H - TS)_{T,p} < 0,$$
or
$$(dG)_{T,p} < 0.$$
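The sign criterion on $\Delta G$ from Eq. (A.16) can be sketched for a two-state process such as protein unfolding. The enthalpy and entropy changes below are hypothetical round numbers, and they are assumed to be independent of temperature; for real proteins the heat capacity change on unfolding makes both $\Delta H$ and $\Delta S$ temperature dependent, which this sketch ignores.

```python
DELTA_H = 400.0e3   # J/mol, hypothetical enthalpy change (e.g. of unfolding)
DELTA_S = 1.2e3     # J/(mol K), hypothetical entropy change


def delta_g(t):
    """Eq. (A.16), assuming Delta H and Delta S do not depend on temperature."""
    return DELTA_H - t * DELTA_S


def is_spontaneous(t):
    """At constant T and p the process proceeds spontaneously only if Delta G < 0."""
    return delta_g(t) < 0.0


t_mid = DELTA_H / DELTA_S   # temperature at which Delta G = 0
for t in (298.15, 350.0):
    print(f"T = {t:6.2f} K  Delta G = {delta_g(t) / 1e3:7.2f} kJ/mol  "
          f"spontaneous: {is_spontaneous(t)}")
print(f"Delta G = 0 at T = {t_mid:.1f} K")
```

Below the crossover temperature the enthalpy term dominates and the process is not spontaneous; above it the entropy term $T\Delta S$ wins.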
The inequality $(dG)_{T,p} < 0$ shows that for an irreversible isothermal-isobaric process the Gibbs free energy decreases. When the system attains equilibrium, the Gibbs free energy reaches its minimum. Thus, for an isolated system the entropy tends towards its maximum, while for a system at constant temperature and pressure the Gibbs free energy tends towards its minimum. These two criteria allow us to predict the direction of a process.
Temperature
In thermodynamics, only the absolute temperature is used. It differs from the temperature scales used in other areas of science, such as Fahrenheit or Celsius. The connection between the different temperature scales is simple, but important. The thermodynamic temperature is measured in kelvin: 0 K = −273.15 °C, i.e. $T(\mathrm{K}) = t(^\circ\mathrm{C}) + 273.15$, whereas $t(^\circ\mathrm{F}) = \tfrac{9}{5}\,t(^\circ\mathrm{C}) + 32$. It is important to note that thermodynamic relations are valid only if the thermodynamic (absolute) temperature is used.
Subject and Main Definitions of Statistical Thermodynamics
As we have pointed out, thermodynamics does not give a molecular picture of nature. Being a phenomenological science, it gives only a mathematical description of macroscopic systems. For instance, Eqs. (2.1) and (2.2) given in Chapter 2 are equations of state of a gas. The first is valid for ideal gases, the second for real gases. Equation (2.2) takes into account the interactions between the gas molecules by introducing additional parameters; however, it does not reveal the nature of these interactions. The explanation of the phenomena described by thermodynamics at the molecular level is the subject of statistical thermodynamics. One can say that statistical thermodynamics (or statistical mechanics) builds a bridge between the thermodynamic description of matter and the quantities characterising it at the molecular level.
Usually, the experimental observation of a certain quantity is recorded as a smooth curve. Such a record is illustrated in Fig. A.3. This could be a spectroscopic record of the thermal unfolding of a protein. The left flank of the curve corresponds to the native state of the protein, where the measurable quantity has the value $Q_0$; the system is in state 0. The right flank of the curve corresponds to the unfolded state, for which $\langle Q\rangle$ has the value $Q_1$; the system is in state 1. The record could also be the NMR chemical shift, $\delta$, of a histidine residue. At low pH (the left flank of the curve) the histidine residue is protonated (see, for instance, the equilibrium shown in Table 6.3, row 3) and the chemical shift has the value $\delta = Q_0$. Upon increasing pH, deprotonation occurs, which is reported by $\langle Q\rangle$. In the deprotonated state (the right flank of the curve) $\delta = Q_1$. The feature of interest here is that $\langle Q\rangle$ changes gradually.
Taking the example of histidine deprotonation, we notice that a single histidine residue cannot be, say, half protonated. A hydrogen atom is either bound to the imidazole ring of the histidine residue or not bound, i.e. for a single histidine residue the measured $\delta$ is either equal to $Q_0$ or equal to $Q_1$. The interpretation of the continuous change of $\langle Q\rangle$ is that, at any stage of the experiment, a certain fraction of the histidine residues is protonated while the rest are deprotonated. The observable is then an average value. Another way to read this experimental record is that $\langle Q\rangle$ correlates with the probability of finding a single histidine residue in the protonated state. If $\langle Q\rangle$ is close to $Q_0$, this probability is close to 1. We can also say that the population of protonated histidine residues decreases with increasing pH.
Figure A.3 Example of a dependence of an observable quantity $Q$ on external conditions, such as temperature or pH (abscissa: condition, T or pH; the curve goes from $Q_0$ to $Q_1$). The observed value at a given condition is the average $\langle Q\rangle$.
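The interpretation of $\langle Q\rangle$ as a population average can be sketched directly: each residue contributes either $Q_0$ or $Q_1$, never an intermediate value, yet the mean over many residues varies smoothly with the protonated fraction. The signal values and fractions below are hypothetical, and the random sampling is only an illustration of the averaging, not a model of the measurement.

```python
import random

Q0, Q1 = 8.0, 7.0   # hypothetical signal of the protonated (0) and deprotonated (1) form


def exact_average(f0):
    """Population-weighted average <Q> = f0*Q0 + (1 - f0)*Q1 for a fraction f0 in state 0."""
    return f0 * Q0 + (1.0 - f0) * Q1


def sampled_average(f0, n_residues, rng):
    """Each residue is fully in state 0 or in state 1; the instrument sees the mean."""
    total = 0.0
    for _ in range(n_residues):
        total += Q0 if rng.random() < f0 else Q1
    return total / n_residues


rng = random.Random(0)
for f0 in (0.95, 0.5, 0.05):   # e.g. low, intermediate and high pH
    print(f"f0 = {f0:4.2f}  exact <Q> = {exact_average(f0):.3f}  "
          f"sampled <Q> = {sampled_average(f0, 100_000, rng):.3f}")
```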
In order to describe the mechanism of the dependence of $\langle Q\rangle$ on the external parameters, i.e. to give a statistical-mechanical interpretation, we have to become familiar with some basic concepts.
Postulates of statistical thermodynamics
The theory of statistical thermodynamics is based on two postulates. Similarly to the laws of thermodynamics, these postulates follow from human experience. However, while the laws of thermodynamics seem obvious and do not need to be proved, the postulates of statistical thermodynamics are not obvious, are not directly related to experimental observation, and cannot be proved. Before stating the postulates, we should introduce the concept of an ensemble.
Consider a system in equilibrium. This means that the macroscopic parameters describing the system do not change with time. An ensemble is an imaginary set of a very large number, $\mathcal{N} \to \infty$, of exact macroscopic copies of the real system (Fig. A.4).
Figure A.4 Ensemble: a set of copies of the original system.
All the systems in the ensemble are thermodynamically equivalent, but they are microscopically different. A very large number of microscopic states (quantum-mechanical or classical) correspond to one thermodynamic state. The microscopic states reflect the molecular character of the system. Consider a system of two molecules, and assume that each molecule can exist in only two different forms, a and b. This simple system has four microscopic states: aa, bb, ba and ab. Because the forms a and b are different, the parameters describing the different microscopic states should also be different. The quantity $Q$ is uniquely defined for each microscopic state; the states ab and ba are characterised by the same value of $Q$. The system cannot exist in two microscopic states simultaneously. However, it can go from one microscopic state to another and then return to the first one, or go to a third microscopic state. Thus, in a certain interval of time the system occupies different microscopic states. If we measure the quantity $Q$ in this interval, we observe its average over time, $\langle Q\rangle_t$. As defined above, the ensemble is a collection of systems in different microscopic states. The average over the values of $Q$ corresponding to the individual microscopic states of the systems in the ensemble is called the ensemble average, $\langle Q\rangle_e$.
The first postulate of statistical thermodynamics is: In equilibrium, the time average of a measurable quantity is equal to the average of this quantity over the ensemble: $\langle Q\rangle_t = \langle Q\rangle_e$. Our goal is to predict $\langle Q\rangle_e$. In order to achieve this goal we have to know how the individual microscopic states of the systems are represented in the ensemble. Before approaching the problem of finding a mathematical expression for $\langle Q\rangle_e$ (hereafter denoted simply by $\langle Q\rangle$), we have to acquaint ourselves with the different types of ensembles.
An ensemble comprising isolated systems is called a microcanonical ensemble. An isolated system is defined by the number of molecules, $N$, its volume, $V$, and its energy, $E$. If the ensemble comprises closed isothermal systems (defined by $N$, $V$ and temperature, $T$), it is called a canonical ensemble. The third major type of ensemble is the grand canonical ensemble, which comprises open systems. We are not going to consider this type of ensemble. While the first postulate of statistical thermodynamics applies to all types of ensembles, the second postulate concerns only the microcanonical ensemble: In a microcanonical ensemble the distribution of the systems in the ensemble is uniform. Another expression of this postulate is that the individual microscopic states are represented by an equal number of systems in the ensemble. This means that if there are $\Omega$ possible states, the probability of finding a system in a given state is $1/\Omega$. It follows from the first and the second postulates that an isolated system spends equal time in each microscopic state. This statement is known as the ergodic hypothesis.
Ensemble average
We will consider a canonical ensemble of systems characterised by a fixed number of molecules, volume and temperature (NVT-ensemble). There are other ensembles of closed isothermal systems, such as NpT-ensembles comprising systems with a fixed number of molecules, pressure and temperature (isothermal-isobaric systems). As we have pointed out, the thermodynamic quantities describing these systems are the ones mostly used in the physical chemistry of biological macromolecules. We choose to work with the NVT-ensemble because the assumptions we are going to make are directly connected to the postulates of statistical thermodynamics and are closest to our intuition.
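The two-molecule system with forms a and b, together with the first postulate and the ergodic hypothesis, can be illustrated by following one isolated system in time. The dynamics (at each step one randomly chosen molecule switches its form) and the choice of $Q$ as the number of molecules in form b are hypothetical assumptions made only for this sketch; under them all four microscopic states are equally probable, as the second postulate requires.

```python
import random
from collections import Counter

STATES = ("aa", "ab", "ba", "bb")  # microscopic states of the two-molecule system


def q_value(state):
    """Hypothetical observable: number of molecules in form b (ab and ba share Q = 1)."""
    return state.count("b")


def time_average(n_steps, rng):
    """Follow one system in time; at each step one randomly chosen molecule switches form."""
    state = "aa"
    visits = Counter()
    q_sum = 0
    for _ in range(n_steps):
        i = rng.randrange(2)
        flipped = "b" if state[i] == "a" else "a"
        state = state[:i] + flipped + state[i + 1:]
        visits[state] += 1
        q_sum += q_value(state)
    fractions = {s: visits[s] / n_steps for s in STATES}
    return q_sum / n_steps, fractions


# Ensemble average with all microscopic states equally represented (second postulate)
ensemble_avg = sum(q_value(s) for s in STATES) / len(STATES)

q_t, fractions = time_average(200_000, random.Random(42))
print(f"<Q>_t = {q_t:.3f}, <Q>_e = {ensemble_avg:.3f}")
print({s: round(f, 3) for s, f in fractions.items()})
```

The system spends roughly a quarter of the time in each microscopic state, and the time average of $Q$ approaches the ensemble average.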
Figure A.5 Left: Canonical ensemble. Each system in the ensemble is characterised by the number of particles, $N$, volume, $V$, and temperature, $T$: the $(N, V, T)$ ensemble. Right: Super-system, a member of the $(\mathcal{N}N, \mathcal{N}V, E_t)$ microcanonical ensemble, surrounded by thermal isolation. Each of the systems comprising the super-system is in a thermal bath. The energy $E_j$ is a function of $N$ and $V$.
All systems in the canonical ensemble are characterised by a fixed number of molecules, volume and temperature, but the energy of the individual systems can fluctuate (Fig. A.5). As the individual systems in the ensemble reflect microscopic states, it is better to avoid the term "internal energy", which is a macroscopic characteristic. Instead we will use only the term energy, $E_j$, to denote the energy of interaction between the particles (atoms, molecules) comprising the system. We can arrange the systems as shown on the right-hand side of Fig. A.5. The boundaries between the systems are permeable to heat, but impermeable to the molecules of the systems. Let us imagine that all systems are immersed in a large thermostat with temperature $T$. On attaining equilibrium, the thermostat is replaced by thermal isolation. This ensemble is now an isolated system, which we call the super-system. We have come to an essential point in our considerations: we will consider the canonical ensemble as an original thermodynamic system (the super-system). Since the super-system is isolated, we can treat it in terms of the microcanonical ensemble and apply the second postulate of statistical thermodynamics: all microscopic states of the super-system are equally probable and are characterised by equal energy, $E_t$. The components of the super-system (the systems comprising the canonical ensemble) can have different energies. As illustrated in Fig. A.5, two components have energy $E_i$ and three components have energy $E_j$. The total energy of the ensemble is
$$E_t = \sum_j n_j E_j, \qquad \text{(A.17)}$$
where the summation is over the whole energy spectrum accessible to the components of the super-system. The product $n_j E_j$ gives the energy of all components that have energy $E_j$, and $n_j$ is the number of these components (in the example given in Fig. A.5, $n_j = 3$, whereas $n_i = 2$). The total number of components of the super-system is constant:
$$\mathcal{N} = \sum_j n_j. \qquad \text{(A.18)}$$
The set $n_1, n_2, \ldots$ is called a distribution, which we denote by $\mathbf{n}$. There is a large number of distributions, but all of them must satisfy the conditions (A.17) and (A.18). The number of microscopic states, $\Omega(\mathbf{n})$, of the super-system corresponding to a given distribution $\mathbf{n}$ can be obtained by the combinatorial formula
$$\Omega(\mathbf{n}) = \frac{\mathcal{N}!}{\prod_j n_j!} = \frac{(n_1 + n_2 + \cdots)!}{n_1!\,n_2!\cdots}. \qquad \text{(A.19)}$$
Perhaps it is convenient at this point to remind ourselves of our initial task. We would like to construct a mathematical expression allowing us to describe the change of the value of the measurable quantity $\langle Q\rangle$ when the conditions determining the state of the system are changed (see Fig. A.3). A starting point for our considerations will be the conclusion that $\langle Q\rangle$ reflects the probability of finding our system in a given state, and the postulate that the measured value is an ensemble average, $\langle Q\rangle_e$. Each system in the canonical ensemble is in a microscopic state characterised by energy $E_j$. Also, to each microscopic state there corresponds one value of the quantity of interest, $Q_j$. We have to find the systems in the ensemble which are in the state characterised by $E_j$. For an arbitrary distribution, $\mathbf{n} = (n_1, n_2, \ldots, n_j, \ldots)$, the fraction of the systems with energy $E_j$ is $n_j/\mathcal{N}$. The overall probability will then be the average of $n_j/\mathcal{N}$ over all distributions satisfying the conditions of Eqs. (A.17) and (A.18):
$$p_j = \frac{\bar n_j}{\mathcal{N}}.$$
According to the second postulate, all microscopic states of the super-system are equally probable. The number of these states can differ between distributions. In this way, the individual distributions contribute to the average value $\bar n_j$ with different weights. This situation is exactly described by the standard formula for a weighted average:
$$\bar n_j = \frac{\sum_{\mathbf{n}} n_j(\mathbf{n})\,\Omega(\mathbf{n})}{\sum_{\mathbf{n}} \Omega(\mathbf{n})}.$$
The summation in the above equation is over all distributions satisfying the conditions given by Eqs. (A.17) and (A.18). Thus, the probability $p_j$ of finding the system in the state with energy $E_j$ is
$$p_j = \frac{\bar n_j}{\mathcal{N}} = \frac{1}{\mathcal{N}}\,\frac{\sum_{\mathbf{n}} n_j(\mathbf{n})\,\Omega(\mathbf{n})}{\sum_{\mathbf{n}} \Omega(\mathbf{n})}. \qquad \text{(A.20)}$$
The ensemble average of the energy is then
$$\langle E\rangle = \sum_j p_j E_j, \qquad \text{(A.21)}$$
and for any parameter, such as the quantity $Q$:
$$\langle Q\rangle = \sum_j p_j Q_j. \qquad \text{(A.22)}$$
Equation (A.20) provides the entire information needed for the calculation of the ensemble average of any parameter of the system; however, it is extremely inconvenient to use. The same equation suggests the solution of this problem. Taking into account that it is a weighted sum, the values $n_j(\mathbf{n})$ belonging to the distributions with the largest weight (which can be calculated by Eq. (A.19)) are closest to the average values. The most probable distribution, which we hereafter denote by $\mathbf{n}^*$, is the distribution with the largest value of $\Omega(\mathbf{n})$. It can be shown that if $\mathcal{N} \to \infty$, the average distribution coincides with the most probable distribution $\mathbf{n}^*$. Thus Eq. (A.20) reduces to
$$p_j = \frac{\bar n_j}{\mathcal{N}} = \frac{n_j^*\,\Omega(\mathbf{n}^*)}{\mathcal{N}\,\Omega(\mathbf{n}^*)} = \frac{n_j^*}{\mathcal{N}}. \qquad \text{(A.23)}$$
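The counting behind Eqs. (A.17) to (A.20) can be carried out exactly for a small super-system. The sketch below assumes four hypothetical, equally spaced energy levels, $\mathcal{N} = 20$ components and a total energy of 10 units (all illustrative choices); it enumerates every distribution allowed by the two conditions, weights it by $\Omega(\mathbf{n})$, and compares the weighted average occupations with the most probable distribution $\mathbf{n}^*$.

```python
from math import factorial, prod

LEVELS = (0, 1, 2, 3)  # hypothetical energy levels E_j of a component (arbitrary units)
N_SYS = 20             # number of components of the super-system (script N in the text)
E_TOTAL = 10           # fixed total energy E_t of the super-system


def omega(dist):
    """Eq. (A.19): number of microscopic states of the super-system for a distribution n."""
    return factorial(sum(dist)) // prod(factorial(nj) for nj in dist)


def distributions(n_sys, e_total):
    """All (n0, n1, n2, n3) for the levels 0..3 with sum n_j = n_sys (A.18)
    and sum n_j E_j = e_total (A.17)."""
    for n3 in range(e_total // 3 + 1):
        for n2 in range((e_total - 3 * n3) // 2 + 1):
            n1 = e_total - 3 * n3 - 2 * n2
            n0 = n_sys - n1 - n2 - n3
            if n0 >= 0:
                yield (n0, n1, n2, n3)


dists = list(distributions(N_SYS, E_TOTAL))
weights = [omega(d) for d in dists]
total = sum(weights)

# Eq. (A.20): average occupation of each level, weighted by Omega(n)
n_bar = [sum(w * d[j] for w, d in zip(weights, dists)) / total for j in range(len(LEVELS))]
n_star = max(dists, key=omega)  # most probable distribution n*

print("number of allowed distributions:", len(dists))
print("average n_j     :", [round(x, 2) for x in n_bar])
print("most probable n*:", n_star)
```

For such a small super-system the two distributions are close but not identical; increasing `N_SYS` and `E_TOTAL` in proportion lets one watch the fractions $\bar n_j/\mathcal{N}$ and $n_j^*/\mathcal{N}$ approach each other, as stated in the text for $\mathcal{N} \to \infty$.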
Although the expression for the probability is now simplified, the problem of finding the most probable distribution among the infinitely large number of distributions remains unsolved. Fortunately, there is a mathematical technique that can tackle this problem. We will go through it, skipping the details. The aim is to point out the connection between the mathematical routines and the resulting formula describing an observed phenomenon, for instance the dependence illustrated in Fig. A.3. First, we will use the fact that the largest value of $\Omega(\mathbf{n})$ gives the largest value of its logarithm. Thus, we substitute $\Omega(\mathbf{n})$ with $\ln\Omega(\mathbf{n})$. In this way Eq. (A.19) becomes
$$\ln\Omega(\mathbf{n}) = \ln\frac{\mathcal{N}!}{\prod_j n_j!} = \ln(\mathcal{N}!) - \ln\prod_j n_j! = \ln(\mathcal{N}!) - \sum_j \ln n_j!.$$
In order to represent the logarithms in a more convenient way, we use the so-called Stirling approximation,
$$\ln y! = \sum_{j=1}^{y} \ln j \approx \int_1^y \ln x\,dx \approx y\ln y - y,$$
which we use to eliminate the factorials in the above expression:
$$\ln\Omega = \mathcal{N}\ln\mathcal{N} - \mathcal{N} - \sum_j \left(n_j\ln n_j - n_j\right).$$
Substituting $\mathcal{N}$ from Eq. (A.18), one obtains
$$\ln\Omega(\mathbf{n}) = \Big(\sum_j n_j\Big)\ln\sum_j n_j - \sum_j n_j\ln n_j. \qquad \text{(A.24)}$$
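How good the Stirling approximation is for the large numbers relevant here can be checked against the exact value of $\ln y!$, which Python's standard `math.lgamma` gives as $\ln\Gamma(y+1)$. This is only a numerical illustration; the sample values of $y$ are arbitrary.

```python
import math


def ln_factorial(y):
    """Exact ln(y!) via the log-gamma function, ln Gamma(y + 1)."""
    return math.lgamma(y + 1)


def stirling(y):
    """Stirling approximation used in the text: ln(y!) ~ y ln y - y."""
    return y * math.log(y) - y


for y in (10, 100, 1000, 10**6):
    exact = ln_factorial(y)
    approx = stirling(y)
    print(f"y = {y:>8}  ln y! = {exact:14.3f}  y ln y - y = {approx:14.3f}  "
          f"relative error = {(exact - approx) / exact:.2e}")
```

The absolute error grows only like $\tfrac{1}{2}\ln(2\pi y)$, so the relative error becomes negligible for the very large occupation numbers $n_j$ of the super-system.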
The values of $n_j$ corresponding to the maximum of $\ln\Omega(\mathbf{n})$ must also satisfy the conditions of Eqs. (A.17) and (A.18). Thus, the most probable distribution $\mathbf{n}^* = (n_1^*, n_2^*, \ldots)$ can be found by solving the set of equations
$$\frac{\partial}{\partial n_j}\Big(\ln\Omega(\mathbf{n}) - \alpha\sum_i n_i - \beta\sum_i n_i E_i\Big) = 0, \qquad j = 1, 2, \ldots,$$
where the terms containing the sums are the conditions of Eqs. (A.17) and (A.18). The coefficients $\alpha$ and $\beta$ are undetermined parameters (Lagrange multipliers). The above equation is just the well-known condition for finding an extremum of a function subject to constraints. After substituting $\ln\Omega(\mathbf{n})$ according to Eq. (A.24) and performing the differentiation, one obtains
$$\ln\sum_i n_i - \ln n_j - \alpha - \beta E_j = 0.$$
Because $\sum_i n_i = \mathcal{N}$ and $n_j$ is the wanted $n_j^*$, the above equation can be written as
$$\ln\mathcal{N} - \ln n_j^* - \alpha - \beta E_j = 0.$$
After reorganising the above equation, one finally obtains
$$n_j^* = \mathcal{N}\,e^{-\alpha}\,e^{-\beta E_j}. \qquad \text{(A.25)}$$
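The parameters $\alpha$ and $\beta$ are mutually dependent. To see this we will use Eq. (A.18) again:
$$\mathcal{N} = \sum_j n_j^*,$$
and substituting $n_j^*$ from Eq. (A.25) into the above equation we obtain
$$\mathcal{N} = \mathcal{N}\,e^{-\alpha}\sum_j e^{-\beta E_j},$$
which finally gives
$$e^{-\alpha} = \frac{1}{\sum_j e^{-\beta E_j}}, \qquad \frac{n_j^*}{\mathcal{N}} = \frac{e^{-\beta E_j}}{\sum_i e^{-\beta E_i}}.$$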
It can be shown that the parameter in the exponent is $\beta = 1/kT$, where $k$ is the Boltzmann constant. We have mentioned that Eq. (A.20) provides the entire information needed for the calculation of the ensemble average of any parameter of the system. We also noted that this equation is inconvenient for calculation and reduced it to Eq. (A.23), where $n_j^*$ had to be expressed. Eq. (A.25) provides the wanted expression. Let us find the ensemble average of the quantity $Q$ given by Eq. (A.22):
$$\langle Q\rangle = \sum_j p_j Q_j.$$
Introducing consecutively Eq. (A.23) and Eq. (A.25), the above equation can be rewritten as follows:
$$\langle Q\rangle = \sum_j \frac{n_j^*}{\mathcal{N}}\,Q_j = \sum_j \frac{e^{-\beta E_j}}{\sum_i e^{-\beta E_i}}\,Q_j,$$
or finally
$$\langle Q\rangle = \frac{\sum_j Q_j\,e^{-\beta E_j}}{\sum_j e^{-\beta E_j}}. \qquad \text{(A.26)}$$
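Eq. (A.26) can be evaluated for a hypothetical two-state model that mimics the curve of Fig. A.3. The sketch assumes one native microscopic state (energy 0, $Q = Q_0 = 0$) and a large, assumed effective number of unfolded microscopic states with a common energy (each with $Q = Q_1 = 1$). Grouping equal-energy states with a count $g_j$ is a convenience not used in the text; it is equivalent to listing each of those states separately in the sums.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def boltzmann_average(states, temperature):
    """Eq. (A.26): <Q> = sum_j Q_j exp(-E_j/kT) / sum_j exp(-E_j/kT).

    `states` is a list of (E_j, g_j, Q_j); g_j counts microscopic states that share
    the same energy and value of Q (equivalent to listing each of them separately).
    """
    beta = 1.0 / (K_B * temperature)
    e_min = min(e for e, _, _ in states)  # energy shift avoids overflow; cancels in the ratio
    weights = [g * math.exp(-beta * (e - e_min)) for e, g, _ in states]
    return sum(w * q for w, (_, _, q) in zip(weights, states)) / sum(weights)


# Hypothetical two-state model: one native microstate (E = 0, Q = 0)
# and G_U unfolded microstates (E = E_U, Q = 1).
E_U = K_B * 3000.0      # assumed energy of an unfolded microstate, J
G_U = math.exp(9.0)     # assumed effective number of unfolded microstates
model = [(0.0, 1.0, 0.0), (E_U, G_U, 1.0)]

for t in (280.0, 333.3, 400.0):
    print(f"T = {t:5.1f} K  <Q> = {boltzmann_average(model, t):.3f}")
```

At low temperature the low-energy native state dominates the weighted sum; at high temperature the many unfolded states take over, and $\langle Q\rangle$ changes gradually from $Q_0$ to $Q_1$ with the midpoint where $G_U e^{-E_U/kT} = 1$.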
Equation (A.26), called Boltzmann weighted sum, is the final solution of our problem. It has the form of Eq. (A.20), after all a weighed sum, however, now it contains quantities with physical meaning. It provides an expression of any parameter of a system and connects the macroscopic parameter, such as the quantity Q, with the microscopic character of the system. These are the microscopic states of the system, each characterised by energy Ej and a value of the quantity Q;. The exponent -PE: e ' is a weight factor for the states with energy E}. It determines the contribution of the value Qj to the observable average . It is seen that the lower the energy, the higher the weight of these states in the sum
ZQJ
Basic Definitions of Thermodynamic and Statistical Physics
307
hence the average value is mostly determined by the values $Q_j$ belonging to the microscopic states with the lowest energies.

Partition function

The denominator of Eq. (A.26) is of special interest. It is called the partition function, which we denote here as $A$:
$$A = \sum_j e^{-\beta E_j}. \qquad (A.27)$$
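The weighting in Eqs. (A.26) and (A.27) is easy to explore numerically. The short Python sketch below uses a hypothetical set of four microscopic states in reduced units (Boltzmann constant set to 1, so that $\beta = 1/T$); the energies, the values $Q_j$ and the helper names are invented for illustration only. At low temperature the average should be dominated by the lowest-energy state and $A \approx 1$, while at very high temperature all states contribute almost equally and $A$ approaches the number of states.

```python
import math

def partition_function(energies, kT):
    """Eq. (A.27): A = sum_j exp(-E_j / kT)."""
    return sum(math.exp(-e / kT) for e in energies)

def boltzmann_average(energies, values, kT):
    """Eq. (A.26): <Q> = sum_j Q_j exp(-E_j / kT) / A."""
    # Shifting all energies by the minimum avoids overflow; the shift cancels in the ratio.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    return sum(w * q for w, q in zip(weights, values)) / sum(weights)

# Hypothetical states in reduced units (k = 1); the lowest state has E = 0.
energies = [0.0, 1.0, 2.0, 5.0]
values = [10.0, 20.0, 30.0, 40.0]

for kT in (0.05, 1.0, 1000.0):
    print(f"kT = {kT:7.2f}  A = {partition_function(energies, kT):.4f}  "
          f"<Q> = {boltzmann_average(energies, values, kT):.4f}")
```

Subtracting the lowest energy before exponentiating is a common precaution against floating-point overflow. It changes $A$ by a constant factor but leaves every ratio, and hence $\langle Q\rangle$, unchanged. The partition function itself is evaluated without the shift so that the limiting values of $A$ can be seen directly.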
The partition function depends on the temperature through $\beta = 1/kT$. Let us consider two extreme cases. The first one is $T = 0$ (absolute zero). Then $\beta \to \infty$ and $e^{-\beta E_j} = 0$ for every $j$ except the term with $E_j = 0$, for which the exponential is unity, i.e. the partition function is $A = 1$. This case is only conditional, because according to the third law of thermodynamics absolute zero cannot be reached; also, there is no matter with zero energy. If $T \to \infty$, then $\beta \to 0$ and all terms of the sum on the right hand side of Eq. (A.27) are equal to 1. It follows that $A = \sum_j 1$, the total number of microscopic states. These two cases demonstrate that the partition function is a measure of the number of states accessible to the system at a given temperature.

The energy $E_j$ arises from the interactions between the elements (particles, molecules, atoms, etc.) of the system in the microscopic state $j$. In another microscopic state this energy is, of course, different. The ensemble average, Eq. (A.26), of the energies $E_j$ of the microscopic states gives the internal energy of the system:
$$U = \langle E\rangle = \frac{\sum_j E_j\, e^{-\beta E_j}}{\sum_j e^{-\beta E_j}}.$$
This equation can be written in a more compact form:
$$U = -\frac{1}{A}\frac{\partial A}{\partial \beta} = -\frac{\partial \ln A}{\partial \beta},$$
or, after changing the differentiation variable from $\beta$ to $T$ (with $d\beta = -dT/kT^2$), we finally obtain
$$U = kT^2\,\frac{\partial \ln A}{\partial T}.$$
This is an expression connecting a thermodynamic quantity, the internal energy, with the partition function of the system. We would also like to have expressions connecting the entropy and the free energy of the system with the partition function. First we shall derive an expression for the entropy. We rewrite Eq. (A.21),
$$U = \sum_j p_j E_j,$$
and after differentiating it we obtain
$$dU = \sum_j E_j\, dp_j + \sum_j p_j\, dE_j. \qquad (A.28)$$
Our system is in thermal equilibrium with the environment (see Fig. A.5), so the energies $E_j$ of the microscopic states depend only on the volume. Thus, if we apply Eq. (A.3) to one microscopic state, $dE_j = -P_j\,dV$, and the second term of Eq. (A.28) becomes
$$\sum_j p_j\, dE_j = -\sum_j p_j P_j\, dV = -p\, dV. \qquad (A.29)$$
In the above expression we denoted the microscopic values of the pressure by the capital letter $P_j$ in order to distinguish them from the probabilities $p_j$. In the last term of Eq. (A.29), $p$ is the macroscopic pressure. The first sum of Eq. (A.28) will be rearranged as follows. The probability of a given state is
$$p_j = \frac{e^{-\beta E_j}}{A}, \quad\text{i.e.}\quad e^{-\beta E_j} = A\, p_j. \qquad (A.30)$$
From this we can express $E_j$ as follows:
$$E_j = -\frac{1}{\beta}\left(\ln A + \ln p_j\right).$$
Substituting $E_j$ into Eq. (A.28) we obtain
$$dU = -\frac{1}{\beta}\ln A\sum_j dp_j - \frac{1}{\beta}\sum_j \ln p_j\, dp_j - p\, dV.$$
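Because $\sum_j p_j = 1$, $\sum_j dp_j = 0$. Hence the first term on the right hand side of the above equation is zero. For the same reason $d\left(\sum_j p_j \ln p_j\right) = \sum_j \ln p_j\, dp_j + \sum_j dp_j = \sum_j \ln p_j\, dp_j$, and therefore
$$dU = -\frac{1}{\beta}\, d\left(\sum_j p_j \ln p_j\right) - p\, dV.$$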
If we compare this equation with Eq. (A.12), we see that
$$T\,dS = -\frac{1}{\beta}\, d\left(\sum_j p_j \ln p_j\right).$$
To obtain $S$ we integrate the above equation, using $\beta = 1/kT$:
$$S = -k\sum_j p_j \ln p_j. \qquad (A.31)$$
If we now substitute $p_j$ back from Eq. (A.30), we obtain the desired connection:
$$S = k\ln A + \frac{U}{T}.$$
This equation immediately gives the connection between the free energy and the partition function,
$$-kT\ln A = U - TS,$$
which is Eq. (A.14). Hence the free energy is
$$F = -kT\ln A.$$
The above expression gives the Helmholtz free energy, i.e. the free energy of a system held at constant volume and temperature. In many cases systems at constant pressure and temperature are of interest. For such systems the (NPT)-ensemble is used. The free energy related to the partition function of this ensemble, called the isobaric-isothermal partition function, is the Gibbs free energy. In this case, the enthalpy is the quantity at constant pressure that corresponds to the internal energy at constant volume
[compare Eqs. (A.14) and (A.15)]. All final relations between the thermodynamic quantities and the partition function $Z$ of the (NPT)-ensemble have the same form. Thus, we can write
$$G = -kT\ln Z,$$
where $Z$ is the isobaric-isothermal partition function.
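The relations derived in this appendix can be checked against each other numerically. The sketch below again uses a hypothetical four-level system in reduced units ($k = 1$). It computes $U$ both as the Boltzmann average and as $kT^2\,\partial\ln A/\partial T$, the derivative being taken by a central finite difference. It computes $S$ both from Eq. (A.31) and from $S = k\ln A + U/T$, and it checks that $F = -kT\ln A$ equals $U - TS$. The energy levels, the temperature, the step `dT` and the function names are arbitrary choices made for illustration.

```python
import math

K_B = 1.0  # reduced units: Boltzmann constant set to 1 (illustrative assumption)

def ln_partition_function(energies, T):
    """ln A, with A from Eq. (A.27)."""
    return math.log(sum(math.exp(-e / (K_B * T)) for e in energies))

def probabilities(energies, T):
    """p_j = exp(-E_j / kT) / A."""
    weights = [math.exp(-e / (K_B * T)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def thermodynamics(energies, T, dT=1e-4):
    p = probabilities(energies, T)
    ln_a = ln_partition_function(energies, T)
    u_average = sum(pj * e for pj, e in zip(p, energies))           # U = sum_j p_j E_j
    dlnA_dT = (ln_partition_function(energies, T + dT)
               - ln_partition_function(energies, T - dT)) / (2 * dT)
    u_from_a = K_B * T**2 * dlnA_dT                                   # U = kT^2 dlnA/dT
    s_gibbs = -K_B * sum(pj * math.log(pj) for pj in p)              # Eq. (A.31)
    s_from_a = K_B * ln_a + u_average / T                             # S = k lnA + U/T
    f_helmholtz = -K_B * T * ln_a                                     # F = -kT lnA
    return u_average, u_from_a, s_gibbs, s_from_a, f_helmholtz

energies = [0.0, 1.0, 2.0, 5.0]   # hypothetical energy levels
T = 1.5
U, U_A, S, S_A, F = thermodynamics(energies, T)
print(f"U = {U:.6f} (average), {U_A:.6f} (from ln A)")
print(f"S = {S:.6f} (Eq. A.31), {S_A:.6f} (k lnA + U/T)")
print(f"F = {F:.6f}, U - TS = {U - T * S:.6f}")
```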
Appendix B
Electric Dipoles
Definition

The electrostatic potential $\varphi_A$ at point A, created by a charge distribution with volume density $\rho(\mathbf r)$ in a given volume $V$, is given by the expression
$$\varphi_A = \frac{1}{4\pi\varepsilon_0}\int_V \frac{\rho(\mathbf r)}{R}\, dV, \qquad (B.1)$$
where $\mathbf r$ is the radius-vector of the spatial coordinates $(x, y, z)$ and $R$ is the distance between the volume element $dV$ and point A. Since point A is chosen arbitrarily, we write $\varphi$ instead of $\varphi_A$.
Figure B.1 Charge distribution with density $\rho(\mathbf r)$ and point A distant from it. The distance between point A and the origin of the coordinate system, O, is $r$.
If $\rho(\mathbf r)$ represents the charge distribution of an atom or molecule, two basic cases are of interest. The first one is when the atom or the molecule is ionised. In this case the net charge $Q$ of the distribution is not zero:
$$\int_V \rho(\mathbf r)\, dV = Q \neq 0.$$
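The second case concerns a neutral atom or molecule, i.e. $Q = 0$. We will find an expression for the electrostatic potential that describes both cases. Consider the triangle formed by O, A and $dV$ (Fig. B.1), where $m$ is the distance of the volume element $dV$ from the origin and $\theta$ is the angle at O. According to the cosine theorem
$$R = \left(r^2 + m^2 - 2rm\cos\theta\right)^{1/2} = r\left(1 + \frac{m^2}{r^2} - 2\frac{m}{r}\cos\theta\right)^{1/2},$$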
and the integral can be written as follows:
$$\varphi = \frac{1}{4\pi\varepsilon_0 r}\int_V \rho\left(1 + \frac{m^2}{r^2} - 2\frac{m}{r}\cos\theta\right)^{-1/2} dV.$$
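Because the integration does not involve $r$, the factor $1/r$ was taken out of the integral. If point A is sufficiently distant from the charge distribution, one can assume that $m \ll r$. This allows us to use the series expansion
$$(1 + \delta)^{-1/2} = 1 - \frac{\delta}{2} + \frac{3}{8}\delta^2 - \dots, \qquad \text{where } \delta = \frac{m^2}{r^2} - 2\frac{m}{r}\cos\theta \ll 1,$$
and, after ignoring all terms of higher than second order in $m/r$ (because $m \ll r$),
$$\varphi = \frac{1}{4\pi\varepsilon_0 r}\int_V \rho\left[1 + \frac{m}{r}\cos\theta + \frac{m^2}{2r^2}\left(3\cos^2\theta - 1\right)\right] dV.$$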
The above expression can be presented in a more convenient way:
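$$\varphi = \frac{1}{4\pi\varepsilon_0 r}\int_V \rho\, dV + \frac{1}{4\pi\varepsilon_0 r^2}\int_V m\cos\theta\,\rho\, dV + \frac{1}{4\pi\varepsilon_0 r^3}\int_V \frac{m^2}{2}\left(3\cos^2\theta - 1\right)\rho\, dV. \qquad (B.2)$$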
The first integral on the right hand side of the above equation gives the net charge $Q$. Consider the case $Q \neq 0$. As we have assumed that $m \ll r$, the second and the third terms of Eq. (B.2) are negligibly small in comparison with the first term and can be ignored. Thus, for the electrostatic potential one obtains
$$\varphi = \frac{1}{4\pi\varepsilon_0 r}\int_V \rho\, dV = \frac{Q}{4\pi\varepsilon_0 r}.$$
This is the formula for the electrostatic potential created by a point charge. It follows that the electrostatic potential of an arbitrary charge distribution can be approximated by that of a point charge. We should remember that this is true only if the point at which the electrostatic potential is evaluated lies at a distance from the charge distribution much larger than the size of the distribution, i.e. if $r \gg m$. If $Q = 0$, the term
$$\frac{1}{4\pi\varepsilon_0 r}\int_V \rho\, dV = 0$$
and $\varphi$ at point A is determined by the second integral on the right hand side of Eq. (B.2). This term is determined by the dipole moment of the charge distribution. Similarly, if the dipole moment is zero, then the electrostatic potential is determined by the third integral, which contains the quadrupole moment of the charge distribution. The electrostatic potential of a charge distribution with zero net charge at point A is thus given by
$$\varphi = \frac{1}{4\pi\varepsilon_0 r^2}\int_V m\cos\theta\,\rho\, dV.$$
As before, we have ignored the third term in Eq. (B.2). The angle $\theta$ is defined as the angle between the radius-vectors $\mathbf r$ and $\mathbf m$, which determine the position of point A and the displacement of the elementary volume $dV$
with respect to the origin, respectively (see Fig. B.1). From vector algebra, $\cos\theta = \hat{\mathbf r}\cdot\hat{\mathbf m}$, where
$$\hat{\mathbf r} = \frac{\mathbf r}{r} \quad\text{and}\quad \hat{\mathbf m} = \frac{\mathbf m}{m}$$
are unit vectors and $\hat{\mathbf r}\cdot\hat{\mathbf m}$ denotes their scalar product. Thus, taking into account that $m\cos\theta = \hat{\mathbf r}\cdot\mathbf m$, for the electrostatic potential one obtains
$$\varphi = \frac{\hat{\mathbf r}}{4\pi\varepsilon_0 r^2}\cdot\int_V \mathbf m\,\rho\, dV.$$
The dipole moment $\boldsymbol\mu$ of a given charge distribution can then be defined as
$$\boldsymbol\mu = \int_V \mathbf m\,\rho\, dV. \qquad (B.3)$$
In this notation, the electrostatic potential takes the form
$$\varphi = \frac{\boldsymbol\mu\cdot\hat{\mathbf r}}{4\pi\varepsilon_0 r^2}. \qquad (B.4)$$
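How quickly the exact potential approaches the leading terms of Eq. (B.2) can be seen with a few point charges. The Python sketch below assumes SI units and two hypothetical charge arrangements: a neutral pair of elementary charges 1 Å apart, and a single elementary charge displaced from the origin. For each, it compares the direct Coulomb sum with the monopole plus dipole terms, $Q/4\pi\varepsilon_0 r + \boldsymbol\mu\cdot\hat{\mathbf r}/4\pi\varepsilon_0 r^2$, with the origin at (0, 0, 0). The relative error is expected to fall off roughly as $(m/r)^2$, where $m$ is the size of the distribution. The function names are illustrative.

```python
import math

EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19    # elementary charge, C
ANGSTROM = 1e-10              # m

def potential_exact(charges, point):
    """Direct Coulomb sum over point charges (discrete form of Eq. (B.1)), in volts."""
    return sum(q / (4 * math.pi * EPS0 * math.dist(point, pos)) for q, pos in charges)

def potential_multipole(charges, point):
    """First two terms of Eq. (B.2): Q/(4 pi eps0 r) + mu.r_hat/(4 pi eps0 r^2), origin at 0."""
    r = math.hypot(*point)
    r_hat = [c / r for c in point]
    Q = sum(q for q, _ in charges)
    mu = [sum(q * pos[k] for q, pos in charges) for k in range(3)]   # Eq. (B.5)
    mu_dot_r_hat = sum(m * u for m, u in zip(mu, r_hat))
    return Q / (4 * math.pi * EPS0 * r) + mu_dot_r_hat / (4 * math.pi * EPS0 * r**2)

def relative_error(charges, r, theta_deg=45.0):
    """Relative error of the multipole approximation at distance r, angle theta to the x-axis."""
    th = math.radians(theta_deg)
    point = (r * math.cos(th), r * math.sin(th), 0.0)
    exact = potential_exact(charges, point)
    return abs(potential_multipole(charges, point) - exact) / abs(exact)

# Hypothetical charge sets: a neutral pair 1 A apart, and one displaced elementary charge.
pair = [(E_CHARGE, (0.5 * ANGSTROM, 0.0, 0.0)), (-E_CHARGE, (-0.5 * ANGSTROM, 0.0, 0.0))]
ion = [(E_CHARGE, (0.3 * ANGSTROM, 0.2 * ANGSTROM, 0.0))]

for r in (5 * ANGSTROM, 50 * ANGSTROM):
    print(f"r = {r / ANGSTROM:4.0f} A   pair: {relative_error(pair, r):.1e}   "
          f"ion: {relative_error(ion, r):.1e}")
```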
Without limiting the generality of the above considerations, we can apply Eq. (B.3) to a system of point charges. In this case the integral transforms into a sum:
$$\boldsymbol\mu = \sum_i q_i\mathbf r_i, \qquad (B.5)$$
where $q_i$ is the magnitude of the $i$-th charge and $\mathbf r_i$ is its radius-vector. The geometrical centres of the positive and negative charges are
$$\mathbf r_p = \frac{\sum_{i^+} q_i\mathbf r_i}{\sum_{i^+} q_i} \quad\text{and}\quad \mathbf r_n = \frac{\sum_{i^-} q_i\mathbf r_i}{\sum_{i^-} q_i}, \qquad (B.6)$$
where the indices $i^+$ and $i^-$ indicate summation over the positive and the negative charges, respectively. Since the net charge of the system is zero, one can write
$$\sum_{i^+} q_i = q \quad\text{and}\quad \sum_{i^-} q_i = -q.$$
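Because $\sum_{i^+} q_i\mathbf r_i = q\,\mathbf r_p$ and $\sum_{i^-} q_i\mathbf r_i = -q\,\mathbf r_n$, Equation (B.5) can then be written as
$$\boldsymbol\mu = (\mathbf r_p - \mathbf r_n)\, q = \mathbf m\, q. \qquad (B.7)$$
In Eq. (B.7) $\mathbf m$ is the vector equal to the difference between the centres of the positive and negative charges (Fig. B.2).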
Figure B.2 Left: vector difference. Right: vector sum.
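Eqs. (B.5)-(B.7), and the dependence of the dipole moment on the origin discussed below, can be illustrated with a handful of point charges. In the sketch below charges are in elementary charges and coordinates in ångströms, so the dipole moment comes out in e·Å (1 e·Å is about 4.8 D). The charge sets and helper names are hypothetical. For the neutral set, the dipole moment should equal $q(\mathbf r_p - \mathbf r_n)$ and should not change when the origin is moved; for a set with a net charge it should change.

```python
import math

def dipole_moment(charges, origin=(0.0, 0.0, 0.0)):
    """Eq. (B.5), mu = sum_i q_i r_i, with radius-vectors measured from `origin`."""
    return tuple(sum(q * (pos[k] - origin[k]) for q, pos in charges) for k in range(3))

def charge_centres(charges):
    """Centres r_p and r_n of Eq. (B.6), and the total positive charge q."""
    def centre(group):
        total = sum(q for q, _ in group)
        return tuple(sum(q * pos[k] for q, pos in group) / total for k in range(3))
    positive = [(q, pos) for q, pos in charges if q > 0]
    negative = [(q, pos) for q, pos in charges if q < 0]
    return centre(positive), centre(negative), sum(q for q, _ in positive)

def close(u, v, tol=1e-12):
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(u, v))

# Hypothetical neutral set (charges in e, coordinates in Angstrom).
neutral = [(0.5, (1.0, 0.0, 0.0)), (0.5, (0.0, 1.0, 0.0)), (-1.0, (0.0, 0.0, 0.0))]
# The same set plus one extra positive charge, so that Q = +1.
charged = neutral + [(1.0, (2.0, 2.0, 2.0))]

r_p, r_n, q = charge_centres(neutral)
mu = dipole_moment(neutral)
mu_eq_b7 = tuple(q * (a - b) for a, b in zip(r_p, r_n))   # Eq. (B.7)
shift = (10.0, -3.0, 2.0)                                  # arbitrary new origin
print(mu, mu_eq_b7, dipole_moment(neutral, shift))
print(dipole_moment(charged), dipole_moment(charged, shift))
```

Moving the origin by a vector $\mathbf p$ changes $\sum_i q_i\mathbf r_i$ by $-Q\,\mathbf p$, which is why the two printed moments of the charged set differ while those of the neutral set coincide.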
It might be useful to recall the definitions of scalar and vector quantities. The electrostatic potential is a scalar quantity. It is characterised by its magnitude alone and does not depend on the choice of the coordinate system: the value of $\varphi$, calculated for instance by Eq. (B.4), is the same in any coordinate system. Vector quantities, such as the dipole moment, are characterised by magnitude and direction. Vectors should not be confused with radius-vectors. For instance, the coordinates $(x_A, y_A, z_A)$ of point A in Fig. B.1 are the components of a radius-vector $\mathbf r$. The elements $q_i\mathbf r_i$ of the sums in Eqs. (B.6) are also radius-vectors; they are called electric moments. If the origin of the coordinate system is changed, then at least one of the values $x_A$, $y_A$, and $z_A$ changes, and in general both the magnitude and the direction of the radius-vector $\mathbf r$ change. Hence, radius-vectors depend on the choice of the coordinate system. In contrast, vectors do not depend on the choice of the coordinate system. The dipole moment is a vector and hence a quantity independent of the origin of the coordinate system. It should be noted, however, that the dipole moment has this physical meaning, i.e. it is a vector, only if $Q = 0$. Indeed, if we shift the origin of the coordinate system by $\mathbf p'$, then, according to vector algebra, the
geometric centres of the positive and negative charges will have the coordinates $\mathbf r'_n = \mathbf r_n + \mathbf p'$ and $\mathbf r'_p = \mathbf r_p + \mathbf p'$. According to Eq. (B.7), the dipole moment in the new coordinate system will then be
$$\boldsymbol\mu' = (\mathbf r'_p - \mathbf r'_n)\, q = (\mathbf r_p + \mathbf p' - \mathbf r_n - \mathbf p')\, q = (\mathbf r_p - \mathbf r_n)\, q = \boldsymbol\mu,$$
i.e. $\boldsymbol\mu' = \boldsymbol\mu$. If $Q \neq 0$, then $\sum_{i^+} q_i \neq -\sum_{i^-} q_i$ and we cannot construct Eq. (B.7). The charged groups in proteins are often represented as point charges. If the dipole moment of such a system of charges is of interest, one should take into account that this quantity has a physical meaning only at the isoelectric point, i.e. at the pH at which the net charge of the protein molecule is zero. At any other pH, Eqs. (B.3), (B.5), and (B.7) are not applicable. Discussions of the pH-dependence of the dipole moment of a protein molecule, sometimes found in the literature, are therefore too liberal in the light of the above considerations.

Electrostatic Field of a Dipole

The electric field of a charge distribution is a vector quantity defined as
$$\mathbf E = -\nabla\varphi = -\left(\frac{\partial\varphi}{\partial x}, \frac{\partial\varphi}{\partial y}, \frac{\partial\varphi}{\partial z}\right),$$
where the elements in brackets on the right hand side give the components $(E_x, E_y, E_z)$ of the vector $\mathbf E$. From Eq. (B.4), and taking into account that $\boldsymbol\mu\cdot\hat{\mathbf r} = \mu\cos\theta$, where $\theta$ is now the angle between $\boldsymbol\mu$ and $\mathbf r$, the electrostatic potential of a dipole can be written as
$$\varphi = \frac{1}{4\pi\varepsilon_0}\,\frac{\mu\cos\theta}{r^2}. \qquad (B.8)$$
For a given value of $\mu$, two geometric parameters determine the magnitude of $\varphi$: the angle $\theta$ between the direction of the dipole moment and the line connecting the dipole with point A, and the distance $r$ between the dipole and point A*. Using the fact that $\boldsymbol\mu$ does not depend on the origin of the coordinate system, we place the dipole at the origin, collinear with the x-axis (Fig. B.3). In this orientation the electrostatic potential is symmetric with respect to the x-axis, i.e. when point A is rotated around the x-axis at fixed $\theta$ and $r$, the value of $\varphi$ does not change. This allows us to describe the electrostatic potential fully in two-dimensional space, for instance in the xy-plane.
Figure B.3 Dipole aligned with the x-axis. The electrostatic field is symmetric with respect to the x-axis, which means that when point A is rotated around the x-axis at fixed $\theta$ and $r$, the value of $\varphi$ does not change.
To find $\mathbf E$, it is first necessary to express $\cos\theta$ and $r$ in terms of the Cartesian coordinates $x$ and $y$ of point A. We will use the well-known trigonometric relations
$$r^2 = x^2 + y^2 \quad\text{and}\quad \cos\theta = \frac{x}{r} = \frac{x}{(x^2 + y^2)^{1/2}}. \qquad (B.9)$$
For the electrostatic potential one then obtains
$$\varphi = \frac{\mu x}{(x^2 + y^2)^{3/2}}.$$
In the above expression we omitted the factor $1/4\pi\varepsilon_0$ for simplicity.

* Recall once more that the assumption $m \ll r$ implies that the positive and negative poles of the dipole are practically at the same distance from the point at which the potential is evaluated.
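The x and y components of the electrostatic field are
$$E_x = -\frac{\partial\varphi}{\partial x} = -\frac{\partial}{\partial x}\,\frac{\mu x}{(x^2 + y^2)^{3/2}} = \frac{\mu\left(2x^2 - y^2\right)}{(x^2 + y^2)^{5/2}},$$
$$E_y = -\frac{\partial\varphi}{\partial y} = -\frac{\partial}{\partial y}\,\frac{\mu x}{(x^2 + y^2)^{3/2}} = \frac{3\mu xy}{(x^2 + y^2)^{5/2}}.$$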
Because of the symmetry, the electrostatic potential does not change when the system is rotated around the x-axis by 90°. Therefore, the z component can be obtained from $E_y$ simply by substituting $y$ with $z$. In the xy-plane, the magnitude of the electrostatic field of a dipole can be calculated using the expression $E = (E_x^2 + E_y^2)^{1/2}$:
$$E = \frac{\mu}{4\pi\varepsilon_0 r^3}\left(1 + \frac{3x^2}{x^2 + y^2}\right)^{1/2},$$
where we have put the omitted factor back. Using Eq. (B.9), one finally obtains
$$E = \frac{\mu}{4\pi\varepsilon_0 r^3}\left(3\cos^2\theta + 1\right)^{1/2}. \qquad (B.10)$$
It should be noted that the magnitude of the electric field of a dipole varies as $r^{-3}$. Equation (B.10) also shows that at a given $r$, $E$ reaches its maximum when $\theta \to 0$, i.e. along the dipole axis:
$$E_{\max} = \frac{2\mu}{4\pi\varepsilon_0 r^3}.$$
For $\theta = 90^\circ$, $E$ has its minimum value, $E_{\min} = \mu/4\pi\varepsilon_0 r^3$. Note that Eq. (B.8) gives $\varphi = 0$ for $\theta = 90^\circ$.
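The field components and Eq. (B.10) can be cross-checked by differentiating the dipole potential numerically. The sketch below works in the same reduced form as the text: the factor $1/4\pi\varepsilon_0$ is omitted, $\mu = 1$, and the dipole lies along the x-axis. The evaluation point, the finite-difference step and the function names are arbitrary choices.

```python
import math

MU = 1.0  # dipole moment in reduced units; 1/(4 pi eps0) omitted as in the text

def phi(x, y):
    """Dipole potential in the xy-plane, phi = mu x / (x^2 + y^2)^(3/2)."""
    return MU * x / (x * x + y * y) ** 1.5

def field_numeric(x, y, h=1e-6):
    """E = -grad(phi) by central finite differences."""
    ex = -(phi(x + h, y) - phi(x - h, y)) / (2 * h)
    ey = -(phi(x, y + h) - phi(x, y - h)) / (2 * h)
    return ex, ey

def field_analytic(x, y):
    """E_x and E_y as derived in the text."""
    r2 = x * x + y * y
    return MU * (2 * x * x - y * y) / r2 ** 2.5, 3 * MU * x * y / r2 ** 2.5

def magnitude_b10(r, theta):
    """Eq. (B.10) without the factor 1/(4 pi eps0)."""
    return MU * math.sqrt(3 * math.cos(theta) ** 2 + 1) / r ** 3

x, y = 1.2, 0.7
r, theta = math.hypot(x, y), math.atan2(y, x)
print(field_numeric(x, y), field_analytic(x, y))
print(math.hypot(*field_analytic(x, y)), magnitude_b10(r, theta))
print(magnitude_b10(2.0, 0.0) / magnitude_b10(2.0, math.pi / 2))   # axis vs. perpendicular
```

The last line compares the field on the dipole axis with the field perpendicular to it at the same distance, i.e. the ratio $E_{\max}/E_{\min}$.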
Appendix C
Solution of Laplace and Poisson-Boltzmann Equation
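Laplace Operator

The Laplace operator can be written as
$$\nabla^2\varphi = \nabla\cdot(\nabla\varphi). \qquad (C.1)$$
The symbol $\nabla$ is called the nabla (or del, or Hamilton's nabla) operator; applied to a scalar field, it gives the gradient of the field. The symbol $\Delta$ is also used to denote the Laplace operator ($\Delta = \nabla^2$). If $\varphi$ is a scalar field (for instance, the electrostatic potential created by a charge distribution $\rho$), one can write
$$\nabla\varphi = \operatorname{grad}\varphi = \left(\frac{\partial\varphi}{\partial x}, \frac{\partial\varphi}{\partial y}, \frac{\partial\varphi}{\partial z}\right). \qquad (C.2)$$
The scalar product of the nabla operator with a vector field $\mathbf E$ gives the divergence of the field:
$$\nabla\cdot\mathbf E = \operatorname{div}\mathbf E = \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z}, \qquad (C.3)$$
where $E_x$, $E_y$, and $E_z$ are the components of $\mathbf E$ along the Cartesian axes x, y, and z, respectively. The divergence measures the density of the sources of the field $\mathbf E$. An important connection between the divergence of the field $\mathbf E$ and its sources is given in the last section of this appendix. If $\mathbf E$ is an electrostatic field,
$$\nabla\cdot\mathbf E = \frac{\rho}{\varepsilon_0\varepsilon_s},$$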
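where $\rho$ is the charge distribution creating the field, and $\varepsilon_0$ and $\varepsilon_s$ are the dielectric constant of vacuum and the relative dielectric constant, respectively. Because $\mathbf E = -\nabla\varphi$, this relation is equivalent to $\nabla^2\varphi = -\rho/\varepsilon_0\varepsilon_s$, which is the Poisson equation. If there is no source, $\rho = 0$, one obtains $\nabla^2\varphi = 0$, which is the Laplace equation [see Eqs. (5.1)].

Spherically Symmetric Form of the Laplace and Poisson-Boltzmann Equations

In the case of spherical symmetry, the only spatial variable is the distance $r$ between a point and the origin of the coordinate system, which is also the centre of symmetry. Equation (C.1) can be written as
$$\nabla^2\varphi(r) = \nabla\cdot\nabla\varphi(r) = \operatorname{div}\operatorname{grad}\varphi(r).$$
For simplicity, in the following we will write just $\varphi$ instead of $\varphi(r)$. Because $\varphi$ depends on the coordinates only through $r$, its gradient is $(d\varphi/dr)\operatorname{grad}(r)$, so that
$$\nabla^2\varphi = \operatorname{div}\left(\frac{d\varphi}{dr}\operatorname{grad}(r)\right). \qquad (C.4)$$
The components of $\operatorname{grad}(r)$ are $\partial r/\partial x$, $\partial r/\partial y$, and $\partial r/\partial z$,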
Nabla-Operator, Solutions of the Poisson-Boltzmann Equations
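and, taking into account that $r$ is the distance to the origin, $r = (x^2 + y^2 + z^2)^{1/2}$, for the individual components one obtains $x/r$, $y/r$, and $z/r$, respectively. Thus
$$\operatorname{grad}(r) = \frac{\mathbf r}{r} = \mathbf r^0, \qquad (C.5)$$
where $\mathbf r^0$ is the unit vector along $\mathbf r$. In fact, the above relation is valid for any radius-vector. We continue with Eq. (C.4) by substituting $\operatorname{grad}(r)$ with the first expression of the series (C.5):
$$\operatorname{div}\left(\frac{d\varphi}{dr}\operatorname{grad}(r)\right) = \operatorname{div}\left(\frac{1}{r}\frac{d\varphi}{dr}\,\mathbf r\right). \qquad (C.6)$$
The term $\frac{1}{r}\frac{d\varphi}{dr}$ is a scalar, so we use the rule
$$\operatorname{div}(s\mathbf r) = \nabla\cdot(s\mathbf r) = \nabla s\cdot\mathbf r + s\,\nabla\cdot\mathbf r,$$
or $\nabla\cdot(s\mathbf r) = \mathbf r\cdot\operatorname{grad}(s) + s\operatorname{div}(\mathbf r)$, where $s$ is a scalar. After applying the above rule to Eq. (C.6) we obtain
$$\operatorname{div}\left(\frac{1}{r}\frac{d\varphi}{dr}\,\mathbf r\right) = \nabla\left(\frac{1}{r}\frac{d\varphi}{dr}\right)\cdot\mathbf r + \frac{3}{r}\frac{d\varphi}{dr}, \qquad (C.7)$$
where we have used $\operatorname{div}(\mathbf r) = 3$. This value can easily be obtained from the definition of the divergence [Eq. (C.3)]. Recalling that $\varphi$ depends only on $r$, for the first term on the right hand side of the above equation we use the rule, valid for any function $f(r)$,
$$\nabla f(r) = \frac{df}{dr}\operatorname{grad}(r).$$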
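Thus the right hand side of Eq. (C.7) becomes
$$\left(-\frac{1}{r^2}\frac{d\varphi}{dr} + \frac{1}{r}\frac{d^2\varphi}{dr^2}\right)\frac{\mathbf r}{r}\cdot\mathbf r + \frac{3}{r}\frac{d\varphi}{dr},$$
where $\operatorname{grad}(r)$ is substituted according to Eq. (C.5). Taking into account that $\mathbf r\cdot\mathbf r = r^2$, we can write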
$$\left(-\frac{1}{r^2}\frac{d\varphi}{dr} + \frac{1}{r}\frac{d^2\varphi}{dr^2}\right)\frac{r^2}{r} + \frac{3}{r}\frac{d\varphi}{dr} = \frac{d^2\varphi}{dr^2} + \frac{2}{r}\frac{d\varphi}{dr}.$$
The above expression depends only on the distance to the origin. In this case the Laplace equation is written as
$$\frac{d^2\varphi}{dr^2} + \frac{2}{r}\frac{d\varphi}{dr} = 0, \qquad (C.8a)$$
whereas the Poisson-Boltzmann equation [Eq. (5.9)] becomes
$$\frac{d^2\varphi}{dr^2} + \frac{2}{r}\frac{d\varphi}{dr} = \kappa^2\varphi. \qquad (C.8b)$$
For the purposes of the further analysis, the above equations can be brought into a more convenient form. Let us make the substitution $z = \varphi r$.
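Then
$$\frac{dz}{dr} = \varphi + r\frac{d\varphi}{dr}$$
and
$$\frac{1}{r}\frac{d^2z}{dr^2} = \frac{2}{r}\frac{d\varphi}{dr} + \frac{d^2\varphi}{dr^2}.$$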
The right hand side of this equation is exactly the left hand side of Eqs. (C.8). Thus, combining these equations and substituting $z$ back with $\varphi r$, we obtain for the Laplace equation
$$\frac{1}{r}\frac{d^2(\varphi r)}{dr^2} = 0,$$
or, after multiplying both sides by $r$,
$$\frac{d^2(\varphi r)}{dr^2} = 0. \qquad (C.9)$$
The Poisson-Boltzmann equation becomes, respectively,
$$\frac{d^2(\varphi r)}{dr^2} = \kappa^2(\varphi r). \qquad (C.10)$$
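For a spherically symmetric $\varphi$, this derivation gives $\nabla^2\varphi = \frac{1}{r}\frac{d^2(\varphi r)}{dr^2}$. The identity can be verified numerically by comparing it with the Cartesian Laplacian of Eq. (C.1), evaluated by finite differences. In the sketch below, the test function $e^{-kr}$, the constant `K`, the evaluation point, the step size and the function names are arbitrary choices. For this test function, both forms should also agree with the analytic value $(k^2 - 2k/r)\,e^{-kr}$.

```python
import math

K = 0.7  # decay constant of the test function (arbitrary choice)

def f(r):
    """Spherically symmetric test function phi(r) = exp(-K r)."""
    return math.exp(-K * r)

def phi_xyz(x, y, z):
    return f(math.sqrt(x * x + y * y + z * z))

def laplacian_cartesian(x, y, z, h=1e-3):
    """Sum of second central differences along x, y and z, i.e. Eq. (C.1) in Cartesian form."""
    c = phi_xyz(x, y, z)
    return ((phi_xyz(x + h, y, z) - 2 * c + phi_xyz(x - h, y, z))
            + (phi_xyz(x, y + h, z) - 2 * c + phi_xyz(x, y - h, z))
            + (phi_xyz(x, y, z + h) - 2 * c + phi_xyz(x, y, z - h))) / h**2

def laplacian_radial(r, h=1e-3):
    """(1/r) d^2(r phi)/dr^2 by a central second difference."""
    def g(s):
        return s * f(s)
    return (g(r + h) - 2 * g(r) + g(r - h)) / (h**2 * r)

x0, y0, z0 = 0.6, -0.8, 1.2
r0 = math.sqrt(x0**2 + y0**2 + z0**2)
print(laplacian_cartesian(x0, y0, z0), laplacian_radial(r0), (K**2 - 2 * K / r0) * f(r0))
```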
Solution of Laplace and Poisson-Boltzmann Equation in the Debye-Hückel Theory

Boundary conditions

The conditions that the solutions of Eqs. (C.9) and (C.10) have to fulfil are called boundary conditions. They are applied at the boundaries between the three zones (see Fig. C.1), as well as at values of $r$ where the electrostatic potential should have a physically justified behaviour. In our case they are the following:

a. The electrostatic potential is continuous at the boundaries between the different zones. This is expressed by the equations
$$\varphi_I(R) = \varphi_{II}(R) \qquad (C.11)$$
at the boundary between Zones I and II, and
$$\varphi_{II}(a) = \varphi_{III}(a) \qquad (C.12)$$
at the boundary between Zones II and III.

b. The normal component of the electric field, multiplied by the dielectric constant of the respective zone, undergoes a discontinuity equal to the surface charge density at the boundary. For the two boundaries one writes:
$$\varepsilon_s\varepsilon_0\frac{\partial\varphi_{II}}{\partial n} - \varepsilon_i\varepsilon_0\frac{\partial\varphi_I}{\partial n} = -\sigma \qquad (C.13)$$
at $r = R$, where $\varepsilon_i$ is the relative dielectric constant of Zone I and $\sigma$ is the surface charge density,
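and
$$\varepsilon_s\varepsilon_0\frac{\partial\varphi_{II}}{\partial n} = \varepsilon_s\varepsilon_0\frac{\partial\varphi_{III}}{\partial n} \qquad (C.14)$$
at $r = a$, the boundary between Zones II and III, where there is no surface charge.

Figure C.1 Boundary conditions for the solution of Eqs. (C.9) and (C.10). The different zones are designated by I, II, and III in accordance with Fig. 5.1.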
Solution

We will begin with the Poisson-Boltzmann Eq. (C.10), which is valid in Zone III. With $z = \varphi r$, this equation falls into the class of differential equations $z'' + p z' + q z = 0$ (here $p$ and $q$ are just the coefficients of the equation). The solution of such an equation is given by the roots $\lambda_1$ and $\lambda_2$ of its characteristic equation $\lambda^2 + p\lambda + q = 0$ and is written as
$$z = C_1 e^{\lambda_1 r} + C_2 e^{\lambda_2 r}.$$
In our case $p = 0$ and $q = -\kappa^2$: $z'' - \kappa^2 z = 0$ and $\lambda^2 - \kappa^2 = 0$, which gives $\lambda_{1,2} = \mp\kappa$. Thus, the solution of Eq. (C.10) is
$$\varphi r = C_1 e^{-\kappa r} + C_2 e^{\kappa r}.$$
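Both exponential branches of this general solution satisfy the spherical Poisson-Boltzmann equation, Eq. (C.8b); only their behaviour at large $r$ distinguishes them. The Python sketch below checks this with finite differences for a hypothetical Debye parameter $\kappa = 0.1$ Å$^{-1}$ (distances in ångströms). It also shows that the $e^{\kappa r}/r$ branch grows without bound while the $e^{-\kappa r}/r$ branch decays. The function names and numerical values are illustrative.

```python
import math

KAPPA = 0.1  # hypothetical Debye parameter, 1/Angstrom

def radial_operator(phi, r, h=1e-3):
    """d2phi/dr2 + (2/r) dphi/dr by central differences (left hand side of Eq. (C.8b))."""
    d1 = (phi(r + h) - phi(r - h)) / (2 * h)
    d2 = (phi(r + h) - 2 * phi(r) + phi(r - h)) / h**2
    return d2 + 2 * d1 / r

def decaying(r):
    return math.exp(-KAPPA * r) / r   # the C_1 term of the solution

def growing(r):
    return math.exp(KAPPA * r) / r    # the C_2 term, which diverges as r -> infinity

for r in (5.0, 20.0):
    # Each ratio should be close to 1 if the function obeys phi'' + (2/r) phi' = kappa^2 phi.
    print(r, radial_operator(decaying, r) / (KAPPA**2 * decaying(r)),
          radial_operator(growing, r) / (KAPPA**2 * growing(r)))
print(decaying(100.0) / decaying(10.0), growing(100.0) / growing(10.0))
```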
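In order for condition d to be fulfilled, $C_2 = 0$. Otherwise, the electrostatic potential would grow to infinity as the distance from the origin increases ($r \to \infty$), which is absurd. For the potential in Zone III we finally write
$$\varphi_{III} = C_1\frac{e^{-\kappa r}}{r}. \qquad (C.15)$$
To obtain the electrostatic potential in Zones II and I we use the Laplace equation, Eq. (C.9). Because
$$\frac{d^2(\varphi r)}{dr^2} = 0, \qquad \frac{d(\varphi r)}{dr} = \text{const}$$
and hence
$$\int d(\varphi r) = \text{const}\int dr.$$
For Zone II the integration of the above equation gives
$$\varphi_{II} = C_3 + \frac{C_4}{r}. \qquad (C.16)$$
An expression of the same form is valid for Zone I; there, however, the term proportional to $1/r$ must vanish, because the potential has to remain finite at $r = 0$:
$$\varphi_I = C_5. \qquad (C.17)$$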
The integration constants $C_1$, $C_3$, $C_4$, and $C_5$ remain undefined. They will be found via the boundary conditions a and b. Let us begin with the boundary between Zones I and II, where $r = R$. According to condition a, Eq. (C.11) gives
$$C_5 = C_3 + \frac{C_4}{R}, \qquad (C.18)$$
whereas Eq. (C.13) takes the form
$$\varepsilon_s\varepsilon_0\frac{d}{dr}\left(C_3 + \frac{C_4}{r}\right)\bigg|_{r=R} - \varepsilon_i\varepsilon_0\frac{dC_5}{dr}\bigg|_{r=R} = -\frac{q}{4\pi R^2}.$$
The second term on the left hand side of this equation is zero because $C_5$ does not depend on $r$. Also, we have taken into account that the surface charge density is $\sigma = q/4\pi R^2$. Thus, according to condition b, for the boundary $r = R$ we obtain
$$\varepsilon_s\varepsilon_0\frac{C_4}{R^2} = \frac{q}{4\pi R^2},$$
or
$$C_4 = \frac{q}{4\pi\varepsilon_0\varepsilon_s}. \qquad (C.19)$$
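Condition b for the boundary between Zones II and III [Eq. (C.14)] gives
$$\frac{d\varphi_{II}}{dr} = \frac{d\varphi_{III}}{dr},$$
or, taking into account Eqs. (C.15) and (C.16),
$$\frac{d}{dr}\left(C_3 + \frac{C_4}{r}\right)\bigg|_{r=a} = C_1\frac{d}{dr}\left(\frac{e^{-\kappa r}}{r}\right)\bigg|_{r=a}.$$
Performing the differentiation gives $C_4 = C_1 e^{-\kappa a}(1 + \kappa a)$; after substituting $C_4$ from Eq. (C.19), for $C_1$ one obtains
$$C_1 = \frac{q\, e^{\kappa a}}{4\pi\varepsilon_0\varepsilon_s(1 + \kappa a)}. \qquad (C.20)$$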
Condition a, applied to the same boundary, where $r = a$ [Eq. (C.12)], gives
$$C_1\frac{e^{-\kappa a}}{a} = C_3 + \frac{C_4}{a},$$
and after substituting $C_4$ and $C_1$ from Eq. (C.19) and Eq. (C.20), respectively,
$$\frac{q\, e^{\kappa a}\, e^{-\kappa a}}{4\pi\varepsilon_0\varepsilon_s(1 + \kappa a)\, a} = C_3 + \frac{q}{4\pi\varepsilon_0\varepsilon_s a}.$$
For $C_3$ we finally obtain
$$C_3 = \frac{q}{4\pi\varepsilon_0\varepsilon_s a}\left(\frac{1}{1 + \kappa a} - 1\right) = -\frac{q\kappa}{4\pi\varepsilon_0\varepsilon_s(1 + \kappa a)}. \qquad (C.21)$$
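The last constant that remains without an explicit expression is $C_5$. We obtain it by substituting $C_3$ from Eq. (C.21) and $C_4$ from Eq. (C.19) into Eq. (C.18):
$$C_5 = -\frac{q\kappa}{4\pi\varepsilon_0\varepsilon_s(1 + \kappa a)} + \frac{q}{4\pi\varepsilon_0\varepsilon_s R} = \frac{q}{4\pi\varepsilon_0\varepsilon_s R}\left(1 - \frac{\kappa R}{1 + \kappa a}\right).$$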
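After all the integration constants have been determined, we can substitute them into the expressions for the electrostatic potential in the different zones. In Zone I, Eq. (C.17) is valid:
$$\varphi_I = \frac{q}{4\pi\varepsilon_0\varepsilon_s R}\left(1 - \frac{\kappa R}{1 + \kappa a}\right). \qquad (C.22)$$
For Zone II, according to Eq. (C.16), we write
$$\varphi_{II} = \frac{q}{4\pi\varepsilon_0\varepsilon_s r}\left(1 - \frac{\kappa r}{1 + \kappa a}\right), \qquad (C.23)$$
and for Zone III from Eq. (C.15) we obtain
$$\varphi_{III} = \frac{q}{4\pi\varepsilon_0\varepsilon_s}\,\frac{e^{\kappa(a - r)}}{(1 + \kappa a)\, r}. \qquad (C.24)$$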
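Eqs. (C.22)-(C.24) translate directly into a piecewise function of $r$. The sketch below assumes SI units, with $q$ in coulombs, $R$ and $a$ in metres, $\kappa$ in m$^{-1}$ and $\varepsilon_s$ the relative dielectric constant of the solvent. The numerical values ($R$ = 2 Å, $a$ = 4 Å, $\kappa$ = 0.1 Å$^{-1}$, $\varepsilon_s$ = 80) and the function name are illustrative choices only. Note that $\varepsilon_i$ does not appear, because the potential in Zone I is constant. The example checks the boundary conditions numerically: continuity of $\varphi$ at $r = R$ and $r = a$, continuity of $d\varphi/dr$ at $r = a$, and $\varepsilon_s\varepsilon_0\,d\varphi_{II}/dr = -\sigma$ at $r = R$.

```python
import math

EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19    # elementary charge, C

def debye_huckel_potential(r, q, R, a, kappa, eps_s):
    """Potential (V) of Eqs. (C.22)-(C.24); assumes R <= a."""
    pref = q / (4 * math.pi * EPS0 * eps_s)
    if r <= R:                                        # Zone I, Eq. (C.22)
        return pref / R * (1 - kappa * R / (1 + kappa * a))
    if r <= a:                                        # Zone II, Eq. (C.23)
        return pref / r * (1 - kappa * r / (1 + kappa * a))
    return pref * math.exp(kappa * (a - r)) / ((1 + kappa * a) * r)   # Zone III, Eq. (C.24)

# Illustrative parameters: charge +e, R = 2 A, a = 4 A, kappa = 0.1 1/A, eps_s = 80.
q, R, a, kappa, eps_s = E_CHARGE, 2e-10, 4e-10, 1e9, 80.0

def phi(r):
    return debye_huckel_potential(r, q, R, a, kappa, eps_s)

h = 1e-15                                              # small step in metres
slope_II_at_R = (phi(R + h) - phi(R)) / h              # one-sided derivative in Zone II
slope_II_at_a = (phi(a) - phi(a - h)) / h
slope_III_at_a = (phi(a + h) - phi(a)) / h
sigma = q / (4 * math.pi * R**2)                       # surface charge density at r = R

print(phi(R), phi(a), phi(10e-10))
print(slope_II_at_a, slope_III_at_a)
print(eps_s * EPS0 * slope_II_at_R, -sigma)
```

Setting `kappa = 0` (no mobile ions) reduces Zones II and III to the plain Coulomb potential $q/4\pi\varepsilon_0\varepsilon_s r$, which is a convenient sanity check.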
Gauss' Theorem

Let us return to Eq. (C.3), which defines the divergence of a vector field. We will consider an arbitrary vector field $\mathbf E(\mathbf r)$, which is defined and differentiable in a certain space region $V$. This field could, of course, be an electrostatic field. We denote the surface enclosing this region by $S$.
Figure C.2 A vector field E and its source. The space region V is enclosed by the surface S.
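Let $dv$ be the volume element and $d\mathbf s$ the surface element vector, directed along the outward normal to the surface $S$ (Fig. C.2). Gauss' theorem states that
$$\int_V \operatorname{div}\mathbf E\, dv = \oint_S \mathbf E\cdot d\mathbf s \qquad (C.25)$$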
and is valid for an arbitrarily chosen vector field $\mathbf E$ and space region $V$. This means that the integral of the divergence of a given vector field, which accounts for its sources in the region $V$ (in Fig. C.2 the source is just one point charge), is equal to the flux of the vector field through the surface $S$. If the source of $\mathbf E$ is outside $V$, i.e. there is no source in the space region,
$$\int_V \operatorname{div}\mathbf E\, dv = 0,$$
meaning that the flux into the region $V$ equals the flux out of it.
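Gauss' theorem can be illustrated by integrating the flux of a point-charge field numerically over a sphere. The sketch below uses reduced units ($q = 1$, $\varepsilon_0 = 1$, a sphere of unit radius centred at the origin) and a simple midpoint rule in the spherical angles. The charge positions, the grid size `n` and the function name are arbitrary choices. With the charge inside the sphere the flux should come out close to $q/\varepsilon_0 = 1$. With the charge outside it should be close to zero, i.e. the flux in equals the flux out.

```python
import math

def flux_through_sphere(charge_pos, q=1.0, eps0=1.0, radius=1.0, n=100):
    """Flux of a point-charge field through a sphere centred at the origin (midpoint rule)."""
    d_theta = math.pi / n
    d_azimuth = 2 * math.pi / (2 * n)
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        for j in range(2 * n):
            azimuth = (j + 0.5) * d_azimuth
            # Outward unit normal of the sphere at this surface element.
            normal = (math.sin(theta) * math.cos(azimuth),
                      math.sin(theta) * math.sin(azimuth),
                      math.cos(theta))
            sep = [radius * nc - pc for nc, pc in zip(normal, charge_pos)]
            dist3 = math.sqrt(sum(s * s for s in sep)) ** 3
            e_dot_n = sum(q * s / (4 * math.pi * eps0 * dist3) * nc for s, nc in zip(sep, normal))
            ds = radius**2 * math.sin(theta) * d_theta * d_azimuth   # area of the surface element
            total += e_dot_n * ds
    return total

print(flux_through_sphere((0.0, 0.0, 0.3)))   # charge inside the sphere
print(flux_through_sphere((0.0, 0.0, 2.0)))   # charge outside the sphere
```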
Index
absolute temperature, 27, 101, 296 aconitase, 118-120 active site, 2, 4, 9, 23, 27, 84, 211, 235-240 alcohol dehydrogenase, 83, 84, 179, 180, 234-236 amino acid sequence, 7, 20 amino acids, 15, 16, 19 Anfinsen's dogma, 8, 22, 113
backbone, 13 barnase, 276, 277, 281 Boltzmann constant, 39, 286, 306 Boltzmann distribution, 130, 132, 159 Boltzmann factor, 207, 209 Boltzmann weighted sum, 39, 306 Born energy, 145, 146, 191-194, 225, 252, 253 Born model, 140-144, 146, 151 Born radius, 45 effective, 150 boundary conditions, 168, 323-325 Debye, 168, 169 zero, 168 Buckingham potential, 46, 47, 243
caloric, 287 cavity volume, 123-125 central ion, 130-134 charge map, 162, 196 charge-charge interactions, 146, 177, 178, 199, 200, 270 in unfolded state, 262, 269, 273 charge-dipole interactions, 177, 178, 197, 249, 250 chirality, 18 clathrate hydrates, 97 computational box, 160, 166 conformational entropy, 114 conformational flexibility, 178, 239 conjugated acid, 182 conjugated base, 182 cooperative ionisation, 212-215, 236 Coulomb's law, 32, 43, 136, 138, 155, 172, 242, 267 C-terminus, 17, 174 cycle of Carnot, 291, 292 cytochrome c, 269
Debye boundary conditions, 168, 169 Debye map, 161, 162 Debye parameter, 136-139, 159 Debye-Hückel theory, 130-136, 139, 140, 151, 158, 323 degree of association, 185 degree of deprotonation, 184, 200, 203, 204, 209, 212, 228, 230, 233, 236, 258, 273 degree of dissociation, 184 degree of protonation, 184-186 denatured state, 5, 115, 225 desolvation energy, 145, 150, 190, 191, 193, 194, 198, 209, 249, 265 desolvation penalty, 145, 252, 253, 279 dielectric cavity, 139, 146, 148, 149, 151, 157, 264
dielectric constant, 94, 153-155 distance dependent, 155, 267 effective, 156 relative, 132, 143, 152, 154, 156, 158, 159, 239, 265 of water, 93, 94, 136, 137, 279 in proteins, 154, 155, 157 vacuum, 33, 132 dielectric map, 161, 162, 218 dihedral angle, 18, 19, 59, 243, 266, 267 dipole moment, 29, 30, 35, 38, 51, 52, 153, 154, 313-316 peptide, 18, 173, 249 induced, 35, 37, 42 instant, 30, 31 dipole-dipole interactions, 38, 42-44 dipole-induced dipole interactions, 30, 42, 44 dispersion forces, 29-31, 35-38, 42-44, 71 disulfide oxidoreductase, 82, 83, 279 disulphide bridge, 14 divergence, 319-321, 328 effective Born radius, 150 electronegativity, 52-55, 66, 67, 70 end-to-end distance, 271 ensemble, 298-303, 310 average, 204, 246, 302, 303, 306 canonical, 300, 303, 304 microcanonical, 300 enthalpy, 142, 143, 290, 296, 310 entropy, 98, 99, 102, 114, 143, 291, 293, 294, 296, 308 configurational, 115 conformational, 114 of folding, 114, 116 of transfer, 101 equation of Laplace, 131, 319, 323 equation of motion, 242 equation of Poisson, 131, 158 equation of Poisson-Boltzmann, 131, 134-136, 159, 319-324 equation of Schrödinger, 31-33
equation of state, 27, 285 equilibrium state, 284 ergodic hypothesis, 300 exact differential, 289 excluded volume chain, 266 finite difference formula, 165 finite difference method, 157, 159-162, 172, 196 flexible chain, 264, 271 flexible polymer, 264, 265, 270 flickering clusters model, 95 focusing, 169 folded state, 5, 20, 86, 104, 110, 255, 256 free energy, 86, 87, 105, 149, 189, 259, 294, 295, 309, 310 of Gibbs, 295, 310 of Helmholtz, 295, 309 of hydrogen bond formation, 84, 85 of transfer, 85, 101, 104, 105 of unfolding, 256, 257, 260, 261, 277 function of state, 285, 289, 290, 293 functional group, 56, 63, 66, 76, 78, 83, 87, 88, 177, 218, 219 functional properties, 2 gas constant, 27, 114, 286 Gauss' theorem, 162, 328 Gaussian distributions, 271 Generalised Born model, 149 Gibbs free energy, 295 hard sphere potential, 266 harmonic function, 243 heat, 287 heat capacity, 291 Helmholtz free energy, 295, 309 hen egg lysozyme, 269 Henderson-Hasselbalch equation, 182, 184 Hooke's law, 243 human serum albumin, 210 human γ-interferon, 20, 111, 112, 114, 115
hydration shell, 140 hydrogen bond, 6, 9, 51, 56-65, 84-86 angle, 57, 72, 74 bifurcated, 67 intermolecular, 58 intramolecular, 58-60, 73, 74 length, 64-66, 68-70, 72, 74 local, 79 long-range, 79 moderate, 69 networks, 80-83 potential functions, 71 strength, 67 strong, 67, 68 three-centre, 67 two-centre, 66 weak, 70 hydronium, 68 hydrophobic bond, 10, 91 hydrophobic compounds, 10, 91, 92 hydrophobic core, 10, 25, 102, 104, 112, 116, 118 hydrophobic effect, 102 hydrophobic force, 91, 92 hydrophobic groups, 14 hydrophobic packing, 121 hydrophobicity, 105-107 hydroxonium, 68 hyperthermophiles, 277-279 hyperthermostable proteins, 277, 278, 280, 281 iceberg model, 96-99, 101 inclusion compounds, 96, 97 insulin, 7, 23 internal cavity, 122, 125 internal energy, 288, 289, 301, 307, 308 intrinsic pK, 198, 202, 226, 251 ion exclusion layer, 158 ion pairs, 80 ionic bond, 52 ionic strength, 135 ionisable groups, 146, 177, 178 isoelectric point, 210, 270, 316
kinetic energy, 26, 37, 286 Laplace equation, 131, 319, 323 Laplace operator, 32, 131, 319 Laplacian, 131 Lennard-Jones potential, 47 London dispersion forces, 29 Lorentz-Berthelot mixing rule, 49 lumazine synthase, 278-280 macroscopic system, 283, 284, 297 main chain, 12 melting temperature, 94, 256, 278 Metropolis algorithm, 207, 208, 267 Metropolis step, 274 microscopic pK, 219, 223, 224, 226 microscopic state, 114, 201, 207-209, 219-222, 224, 225-228, 232, 241, 257, 258, 274, 299, 300-303, 307, 308 model compound, 84, 178, 186-188, 191-193, 195, 221-226, 263 models of unfolded proteins, 262 molten globule, 121 Monte Carlo simulation, 206, 207, 266, 271-274 mutagenesis, 8, 21, 81, 87, 125, 230 site directed, 81, 125 native state, 5, 255 NK-lysin, 172, 173 N-terminus, 17, 174 overlap forces, 46 oxonium ion, 68 pancreatic trypsin inhibitor, 240 parameter of Debye, 136-139, 159 internal, 284, 285 partial ionic character, 53, 54, 140 partial specific volume, 121, 123 partition function, 257, 307, 310 peptide bond, 12, 17, 251 pH, 183
pH-sensitive sites, 218, 219, 221, 226 Poisson-Boltzmann equation, 131, 134-136, 159, 319-324 polar hydrogen, 54, 217-220 polarisability, 35-37 polarisation, 36, 194 atomic, 152, 155 electronic, 152, 155 polypeptide chain, 17 porin, 75, 211, 215 potential bond bending, 243 bond stretching, 243 Buckingham, 46 energy, 31-34, 38, 42, 242, 286, 288 functions, hydrogen bond, 72 hard sphere, 266 Lennard-Jones, 47 primary structure, 8 principle of Pauli, 44, 56 process, 285 cyclic, 285, 288, 290, 291, 293 irreversible, 285, 291, 293, 296 reversible, 285, 294-296 spontaneous, 285 thermodynamic, 285 proton acceptor, 51, 55, 58 proton donor, 51, 52, 54, 55, 57, 58 proton relay chain, 235, 238 protonation state, 14, 78, 129, 177, 200-209, 258 distribution, 202, 203, 205, 258 protonation/deprotonation equilibrium, 181 pseudo forces, 91, 92, 102 quadrupole moment, 313 quasi-random distribution, 275 quaternary structure, 11 radius of gyration, 264-269, 280 radius-vector, 311, 315 Ramachandran plot, 19 resonance assisted bonding, 75
ribonuclease Sa, 88, 89 ribonuclease Sa3, 88, 89 ribonuclease T1, 2, 3, 229, 231 mutant, 230 salt bridge, 78-80, 82, 83, 231, 253, 256 dynamics, 252, 253 networks, 80-82, 247, 278-280 scalar quantity, 315 secondary structure, 10, 73, 76, 82, 89 elements, 74, 85, 173, 179 self energy, 142, 193, 196 side chain, 13 simulated annealing, 209 solvation energy, 141, 142 solvation enthalpy, 142, 143 solvent accessibility, 105-107, 109, 111, 118 static, 109, 112 solvent accessible surface, 108, 111 solvent contact surface, 108, 109, 157 spherical model, 264, 269, 274 Stern layer, 158, 168 structural stability, 256 subunit, 11 tertiary structure, 11 thermal stability, 256, 277, 278, 281 thermodynamic process, 285 thermodynamic stability, 256, 257 thermodynamic states, 285 thermophiles, 277 titratable group, 177 titratable site, 178 torsion angle, 18, 144, 243 transcription enhancer factor 2, 78 unfolded state, 5, 20, 84, 102, 104, 109, 113, 255, 273, 277 united atoms, 109 van der Waals radii, 47, 48, 108-110, 150, 267
vector difference, 315 vector quantities, 315, 316 vector sum, 315 Verlet algorithm, 245 virtual chain, 273, 274 void volume, 122-124 volumetric anomaly, 93
water molecule, 55 weighted Boltzmann sum, 204 xylanase, 246, 248, 249, 252, 253 zero boundary conditions, 168
Although textbooks on the physics of condensed matter consider non-covalent interactions in detail, their application to the analysis of protein properties is often poorly presented or omitted. On the other hand, books on biochemistry, molecular modeling or molecular simulation introduce these interactions in the context of the corresponding topic, which sometimes results in superficial explanations of their nature. This book succeeds in uniting comprehensive considerations of non-covalent interactions with the specificity of their application in protein sciences. The ideal aid for students of physics or chemistry with interests in biology and biophysics, the book can also be useful for students of biology, biochemistry, or biomedicine who want to extend their knowledge of how protein properties are described at the molecular level.
Imperial College Press www.icpress.co.uk