Practical Protein Crystallography, Second Edition


Author: Duncan E. McRee (Author)



PRACTICAL PROTEIN CRYSTALLOGRAPHY SECOND EDITION

DUNCAN E. McREE Department of Molecular Biology, The Scripps Research Institute, La Jolla, California

With contributions by Peter R. David, Department of Structural Biology, Stanford University Medical Center, Stanford, California

ACADEMIC PRESS An Imprint of Elsevier San Diego London Boston New York Sydney Tokyo Toronto

This book is printed on acid-free paper.

Copyright © 1999, 1993 by ACADEMIC PRESS

All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permissions may be sought directly from Elsevier's Science and Technology Rights Department in Oxford, UK. Phone: (44) 1865 843830, Fax: (44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage: http://www.elsevier.com by selecting "Customer Support" and then "Obtaining Permissions".

Academic Press, An Imprint of Elsevier, 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.apnet.com

Academic Press, 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/

Library of Congress Catalog Card Number: 99-61577
ISBN-13: 978-0-12-486052-0
ISBN-10: 0-12-486052-4

PRINTED IN THE UNITED STATES OF AMERICA 05 06 07 08 09 BB 9 8 7 6 5 4 3

CONTENTS

Foreword to the Second Edition
Preface to the Second Edition
Acknowledgments

1 LABORATORY TECHNIQUES
1.1 Preparing Protein Samples
  History and Purification
  Exchanging Buffers
  Concentrating Samples
  Storage of Samples
  Ultrapurification
1.2 Protein Crystal Growth
  Protein Solubility Grid Screen
  Initial Trials
  Growth of X-Ray Quality Crystals
1.3 Crystal Storage and Handling
1.4 Crystal Soaking
1.5 Anaerobic Crystals

2 DATA COLLECTION TECHNIQUES
2.1 Preparing Crystals for Data Collection
  Crystal-Mounting Supplies
  Mounting Crystals
  Drying Crystals
  Preventing Crystal Slippage
2.2 Optical Alignment
2.3 X-Ray Sources
  Nickel Foil Filtering
  Filtering by Monochromators
  Focusing with Mirrors
  Increasing Brilliance with Mirrors
2.4 Preliminary Characterization
  Precession Photography
  Rotation Photography
  Blind Region
  White-Radiation Laue Photography
  Space Group Determination
  Unit-Cell Determination
  Evaluation of Crystal Quality
2.5 Heavy-Atom Derivative Scanning with Film
2.6 Overall Data Collection Strategy
  Unique Data
  Bijvoet Data
  Indexing of Data
2.7 Overview of Older Film Techniques
2.8 Four-Circle Diffractometer Data Collection
2.9 Area Detector Data Collection
  Increasing Signal-to-Noise Ratio
2.10 Image Plate Data Collection
2.11 Synchrotron Radiation Light Sources
  Differences from Standard Sources
  Special Synchrotron Techniques
  Time-Resolved Data Collection
2.12 Data Reduction
  Integration of Intensity
  Error Estimation
  Polarization Correction
  Lorentz Correction
  Decay or Radiation Damage
  Absorption

3 COMPUTATIONAL TECHNIQUES
3.1 Terminology
  Reflection
  Resolution
  Coordinate Systems
  R-Factor
  Space Groups and Symmetries
  Matrices for Rotations and Translation
  B-Value
  Anisotropic B-Values
3.2 Basic Computer Techniques
  File Systems
  Portability Considerations
  Setting Up Your Environment
  File Formats and mmCIF
  CCP4 Crystallographic Programs
3.3 Data Reduction and Statistical Analysis
  Evaluation of Data
  Filtering of Data
  Merging and Scaling Data
  Heavy-Atom Statistics
3.4 The Patterson Synthesis
  Patterson Symmetry
  Calculating Pattersons
  Harker Sections
  Solving Heavy-Atom Difference Pattersons
3.5 Fourier Techniques
  Types of Fourier
  Solving Heavy Atoms with Fouriers
3.6 Isomorphous Replacement Phasing
  Heavy-Atom Refinement
  Isomorphous Phasing
  Heavy-Atom Phasing Statistics
  Including Anomalous Scattering
  Fine-Tuning of Derivatives
3.7 Molecular Replacement
  Rotation Methods
  Translation Methods
3.8 Noncrystallographic Symmetry
  Self-Rotation Function
3.9 Density Modification
  Solvent Flattening
  Histogram Modification
3.10 Multiwavelength Data with Anomalous Scattering
  Choice of Wavelengths
  Collection of Data
  Location of Anomalous Scatterers
  Phasing of Data
3.11 Refinement of Coordinates
  Available Software
  Rigid-Body Refinement
  R-Factor or Correlation Search for Rigid Groups
  Protein Refinement
  Evaluating Errors
  Very-High-Resolution Refinement with SHELX-97
  Block Diagonal Calculations
  SHELX Refinement Strategy
3.12 Fitting of Maps
  Calculating Electron Density Maps
  Evaluating Map Quality
  Fitting and Stereochemistry
  Chain Tracing
  General Fitting
  Phase Bias
  Adding Waters and Substrates
3.13 Analysis of Coordinates
  Lattice Packing
  Hydrogen Bonding
  Solvent-Accessible Surfaces

4 XtalView TUTORIALS
4.1 Installation
4.2 Obtaining Help
4.3 XView
  XView Widgets
  The xtalmgr
  Preparing Data
  Merging Heavy-Atom Data
  Patterson Solutions
  Bijvoet Difference Pattersons
  Difference Fouriers
  Heavy-Atom Refinement and Phase Calculations
  Absolute Configuration (Hand)
  Enantiomorphic Space Groups
  Exporting Data
  Xfit
  Atom Stack
  Fitting with the Mouse
  Additional Xfit Functions
  Semiautomated Fitting
4.4 A Typical Manual Fitting Session with Xfit
  Loading the Map
  Contouring the Map
  Combining Phases
  Improving Phases
  Saving Phases
  Some Additional Xfit Features
  Fitting a Residue
  Raster 3D
  Model Window
  SfCalc Window
  Finding Geometry Errors
  Editing Waters
4.5 Interfacing to Other Programs
  XPLOR, TNT, PROLSQ, and Other Refinement Programs

5 PROTEIN CRYSTALLOGRAPHY COOKBOOK
5.1 Multiple Isomorphous Replacement
  Example 1: Patterson from Endonuclease III
  Example 2: Single-Site Patterson from Photoactive Yellow Protein
  Example 3: Two-Site Patterson from Photoactive Yellow Protein
  Example 4: Complete Solution of Chromatium vinosum Cytochrome c'
5.2 Mutant Studies
  Example 1: MKT D235E Mutant
  Example 2: SOD C6A Mutant
5.3 Substrate-Analog Example
  Example: Isocitrate-Aconitase
5.4 Molecular Replacement
  Example: Yeast Copper-Zinc Superoxide Dismutase
5.5 Multiwavelength Anomalous Dispersion (MAD) Phasing
  Example: CUA Subunit from T. thermophilus Cytochrome c Oxidase
  XAS Scan
  Data Collection
  Patterson Maps

6 CRYOCRYSTALLOGRAPHY: BASIC THEORY AND METHODS
Peter R. David
6.1 Overview
6.2 Theory
6.3 Room Temperature versus Low-Temperature Crystallography
6.4 Cryogenic Safety
6.5 Equipment
  Crystal Supports
  Making Loops
6.6 Crystal Loop Mounting Techniques
  The Use of Platinum Wire
  Attaching Pins to Goniometer Heads
6.7 Crystal Storage
  Overview
  Lists of Materials
  Additional Practical Considerations
  Manipulating and Storing Frozen Crystals
  Storing Crystals for Later Use
6.8 Crystal Removal and Storage
  Freezing Away from the Cold Stream
  Selecting a Cryoprotectant
  Testing Cryoprotectant Solutions
  Setup
  Method
  Rationale
  Cryoprotectant Optimization
6.9 Crystal Handling
  Osmotic Effects

APPENDIX A Crystallographic Equations in Computer Code
APPENDIX B Useful Web Sites
  Practical Protein Crystallography II Web Site
  Software Web Sites
  Databases
  Synchrotrons
  Useful Information
  Heavy-Atom Information
  Crystallization
  X-Ray Anomalous Scattering
  X-Ray Equipment Vendors
  Crystallographic Associations

Index

FOREWORD TO THE SECOND EDITION

The five years since the first edition of this remarkably useful book have been marked by a number of significant advances in protein crystallography. Synchrotron radiation and cryocrystallography now routinely give data of much higher resolution and superior quality than the average datasets available in the past. The use of multiple wavelength anomalous dispersion and improved methods of locating heavy atoms have made high-quality experimentally determined phases much more common. Serious progress has been made on the direct phasing problem, to the extent that structures containing some metal atoms and data to better than 1.2 Å resolution have a realistic chance of being solved with one dataset. Triclinic lysozyme, which contains 1000 atoms (all light atoms), has been solved ab initio by two methods. The increasing occurrence of atomic resolution diffraction data for protein molecules is changing the methods used for refinement. It is now becoming evident that the mathematical and physical model that describes the diffracting contents of a protein crystal is essentially the same as that which has stood the test of time for small molecules.

The focus of this book, and one of the keys to its popularity, is the word "Practical" in the title. Duncan McRee has an excellent ability to strip a problem to its essentials and then cast those essentials into computer programs that can be used by people who are not experts, without sacrificing the power and features required by people who are experts. His program XtalView, introduced with the first edition of this book, has rendered the arcane comprehensible. It has been demonstrated time and again that it is not sufficient for computer programs to be comprehensive and correct. If the task is complex, the user interface must be well designed or the program becomes a barrier instead of an aid. XtalView provides a clear, interactive, modular interface to the complex tasks of macromolecular crystallography. It thus gets the apparatus out of the way of the science and becomes a powerful tool both for the beginner learning the techniques and for the expert seeking higher productivity.

The technical advances in macromolecular crystallography, particularly the advent of higher and higher resolution, improved phasing methods, and improved methods for highlighting problem areas of a structure, have required supporting changes in the software. In most cases these make the underlying science more discernible. XtalView has been enhanced to support atomic resolution structures, improve the phasing methods to include MAD (multiwavelength anomalous dispersion) phasing, and greatly improve the ease with which models of new structures can be built. XtalView's comprehensive analysis tools have also proven useful beyond the crystallographic community. The program is increasingly popular with molecular modelers.

XtalView, which is described in this book, is distributed through the Computational Center for Macromolecular Structure (CCMS) at the San Diego Supercomputer Center. Readers can obtain copies of the program through the CCMS web page at http://www.sdsc.edu/CCMS. Support services are available by e-mail at [email protected]. XtalView provides a modern user interface and runs on very inexpensive computer systems, including PC platforms running Linux. Because XtalView is free to academic users, the total cost of the crystallographic workstation is that of a personal computer with a reasonably powerful graphics card and a good-quality monitor. With XtalView, every postdoctoral and graduate student in a laboratory can have his or her own graphical workstation, with all of the corresponding increases in productivity. This point cannot be made too strongly.

The productivity we are talking about here is the removal of barriers to the interactive exploration of ideas. I have great faith in the creativity of students. I consider it my great good fortune to be able to participate in the development and distribution of a tool that is capable of changing a whole field of science.

Lynn F. Ten Eyck
San Diego Supercomputer Center
University of California, San Diego

PREFACE TO THE SECOND EDITION

This book is a practical handbook for anyone who wants to solve a structure by protein crystallography. It should prove useful both to new protein crystallographers and to old hands. The topics covered in this book are well-tested, robust methods commonly used in our laboratory and in others. The well-informed crystallographer will note, however, that many techniques and methods are not mentioned or are mentioned very briefly. These have been omitted because of space, less common use, difficulty of application, the need for special expertise, or perhaps oversight on the author's part. The exclusion of a method should not be taken in any way as a disapproval; one book cannot possibly include everything.

For the second edition, a new chapter on cryocrystallography, written by Peter R. David, has been added. Other topics that have been added or expanded are very high-resolution refinements, MAD phasing, a tutorial section on XtalView, and material on CCP4 software and mmCIF (macromolecular crystallographic information file), as well as minor changes throughout the book.

When I wrote the first edition of this book in 1993, there were a few hundred entries in the Protein Databank; in 1999, the number is rapidly approaching ten thousand. I hope that the first edition had some small part in this explosion of protein structures solved by crystallography. Now we are looking forward to the day when the structures of most of the proteins in an entire genome will be solved. If you are reading this book, then you want to be part of this coming revolution in structural biology.

One exciting development is the dramatic drop in the cost of the computers needed for solving structures. An IBM PC running LINUX is more than adequate for all of the tasks of solving a protein structure and can be purchased for a minimal cost. Often an existing Windows computer can be converted to a dual boot Windows/LINUX machine and structures solved with freely available software.

The book is divided into six chapters: (1) Laboratory Techniques, (2) Data Collection Techniques, (3) Computational Techniques, (4) XtalView Tutorial, (5) Protein Crystallography Cookbook, and (6) Cryocrystallography, by Peter David. There is no need to read the book in any order; chapters that use information from other chapters will reference them when necessary. In fact, the best use of the book may be to read the cookbook first and then refer to the other chapters as needed. Chapters 4 and 6 are new to the second edition, and there is new material in all of the chapters reflecting the advances since the first edition. In particular, freezing techniques, better detectors, and the wide availability of synchrotron beamlines have increased the resolution of the average protein structure considerably. To cover this, new material on very high-resolution structure refinement and analysis has been added.

In Chapter 1, only cursory information on crystal growth is presented because several excellent texts on crystal growth already exist. A fair amount of attention is given to protein sample handling, because proper attention to the sample can often mean the difference between success and failure. Proteins are delicate materials that demand special handling and are very difficult to purify in large quantities. Protein crystals are also very delicate and require special handling techniques different from those of small molecule crystals.

Chapter 2 bridges the gap between the laboratory and the computer. Special emphasis is placed on the newer techniques using area detectors and synchrotron sources. Because of the high cost of these area detectors, the user will probably have access to only one. Experience has shown that the user will then come to regard the available detector as best and will defend it vehemently against all others. Emphasis is placed less on a specific system and more on general techniques relevant to all area detectors.

Chapter 3 provides information about using computers and file systems and how theory translates into actual methods. Rules of thumb are provided throughout to serve as a guide. Like all rules, these are made to be broken and should not be taken too literally. The variety of software used by different groups is enormous and no book could hope to cover even a small portion of it. General information that should be applicable to most techniques is given. Although this book can never substitute for the individual manuals of each program, it does give guidelines that will allow the reader to make intelligent choices among the program options. To allow discussion of specifics, the XtalView system and CCP4 are used. Information on obtaining XtalView, CCP4, and other crystallographic software can be found at this book's web site at http://ppcII.scripps.edu.

Chapter 4 is a tutorial for using XtalView to perform common crystallographic tasks and covers everything from making Patterson maps to automated map fitting.

Chapter 5 contains examples drawn from the experience of the author and his colleagues, providing some examples of protein structures solved by various methods. These examples can be used as guides for the user's own projects and to give a feel for how to apply the varied methods. Real numbers are given as a basis for interpreting the user's own data. By following the examples in multiple isomorphous replacement, users can, with luck and perseverance, solve their own structures. For the second edition, new material on a MAD phasing example has been added.

Chapter 6 covers the now standard method of collecting data on frozen crystals, including the apparatus needed and many practical pointers.

Appendix A contains formulas commonly used in protein crystallography, but with a twist: the formulas are coded in both FORTRAN77 and C. These two languages easily account for 99% of all protein crystallographic software. This will be of great aid to users in writing their own software and in understanding other software. Also, for those of us who understand a computer language better than we do math, this appendix explains the formulas. One goal of this book is to provide enough information for the computer neophyte to write a simple program to reformat or filter data. Unfortunately, because of the incredible variety of software available and the consequent large variety of file formats, this is a necessary skill. Appendix A provides information for writing programs that will continue to be useful with different operating systems and for other projects.

Suggested Reading

Crystal Growth:

McPherson, A. (1998). Crystallization of Biological Macromolecules. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

Crystallography:

Stout, G. H., and Jensen, L. H. (1989). X-Ray Structure Determination: A Practical Guide, 2nd Ed. Wiley, New York.

Ladd, M. S. B., and Palmer, R. A. (1985). Structure Determination by X-Ray Crystallography, 2nd Ed. Plenum, New York.

McKie, D., and McKie, C. (1986). Essentials of Crystallography. Blackwell Scientific, Oxford.

Protein Crystallography:

Blundell, T. L., and Johnson, L. N. (1976). Protein Crystallography. Academic Press, San Diego.

Wyckoff, H., ed. (1985). Diffraction Methods for Biological Macromolecules, Methods in Enzymology, Vols. 114 and 115. Academic Press, San Diego.

Carter, C. W., Jr., and Sweet, R. M., eds. (1985). Macromolecular Crystallography A and B, Methods in Enzymology, Vols. 276 and 277. Academic Press, San Diego.

Drenth, J. (1994). Principles of Protein X-Ray Crystallography. Springer-Verlag, New York.

ACKNOWLEDGMENTS

My thanks go to a number of people for their assistance in preparing the manuscript for this book. In particular, I thank Emelyn Eldredge for giving me the opportunity and impetus to do a second edition. I thank David Stout, Yolaine Stout, Michele McTigue, and Pamela Williams for critically reading the manuscript and making many helpful suggestions. Any errors in the book are, of course, my responsibility.

I also thank the large number of patient people in the Structural Biology Group at the Scripps Research Institute who have beta-tested XtalView for me. I hope this book will answer some of their questions. In particular, I thank David Goodin, John Tainer, Elizabeth Getzoff, Ian Wilson, Jack Johnson, Ethan Merritt, Chris Bruns, Cliff Mol, Jean-Luc Pellequer, Marieke Thayer, Yi Cao, Sheri Wilcox, Rabi Musah, Gerard Jensen, Melissa Fitzgerald, G. Sridhar Prasad, Brian Crane, Andy Arvai, John Irwin, Alex Shah, Gary Gippert, Arno Pahler, Nicole Kresge, Jacek Nowakowski, Robin Rosenfeld, John Blankenship, Nathalie Jourdan, Ward Smith, Paul Swepston, Phil Bourne, John Badger, and John Rose for their numerous comments and suggestions over the years. I thank Mark Israel and the folks at CCMS for several years of dedicated support of XtalView users while always remaining cheerful. I especially thank Mike Pique for keeping me up to date on programming and computer graphics.

I have worked with many people over the years who have provided the intellectual stimulus, knowledge, and help that made this book possible. David Richardson was my Ph.D. advisor and started me on the right path. Jane Richardson has been a major inspiration over the years. Wayne Hendrickson took me under his wing and taught me much about anomalous scattering, phasing, refinement, scaling, and critical thinking. Bi-Cheng Wang and Bill Furey taught me all about phase modification; Fred Brooks taught me the virtues of a good user interface and the true powers of computers. John Tainer and Elizabeth Getzoff completed my education in protein crystallography while I worked as a postdoctoral fellow in their lab. Lynn Ten Eyck taught me all about the mathematics of crystallography and provided many good ideas. George Sheldrick has been an inspiration to keep programming despite the academic consequences, and his encouragement has been invaluable.

XtalView distribution through the Center for Macromolecular Structure (www.sdsc.edu/CCMS) at the San Diego Supercomputer Center is funded by Grants BIR 93-31436 and 96-16114 from the National Science Foundation.

Finally, I thank my wife, Janice Yuwiler, for her dedication and support to her sometimes trying husband, and her father, Art Yuwiler, for always believing in me. And last, but not least, I want to thank my three children, Alisa, Kevin, and Alex (one more than the first edition!) for putting up with me while I spent so many evenings late at work.

Peter R. David, the author of Chapter 6, thanks Duncan McRee and Elspeth Garman for critically reading the chapter; Roger Kornberg and Michael Levitt for their support and encouragement on difficult problems; Kerstin Leuther for many things, especially for asking why; and finally, Mike Blum for starting him out on crystallography.

1 LABORATORY TECHNIQUES

1.1 PREPARING PROTEIN SAMPLES

A protein sample must be properly prepared before it can be used in a crystal growth experiment. There are many ways that this can be done to accomplish the same goal: to put the protein at a high concentration in a defined buffer solution. Methods of preparing a protein sample that have worked well in our laboratory are outlined here, but if you know of a quicker, easier method then by all means use it.

History and Purification

Since it is not uncommon for one batch of protein to crystallize while the next will not, it is vital to keep a history of each sample and to track each batch separately. Your records may provide the only clue to the differences between samples that produce good crystals and samples that are unusable. For example, there have been several cases where the presence of a trace metal is needed for crystallization. The most famous case is insulin. It seemed that the only insulin that would crystallize was that purified from material collected in a galvanized bucket. It was eventually discovered that zinc was required, and later it was added directly to the crystallization mix.

In our lab any sample received is logged into a notebook with a copy of any letters or material sent with the sample. Samples should be shipped to you on dry ice and kept frozen at -70 °C until they are ready for use. Ask the person sending the sample to aliquot the protein into several tubes and to quick-freeze each one. This way you can thaw one aliquot at a time without having to repeatedly freeze-thaw the entire sample, which can damage many proteins. Keep a small portion of the sample apart and save it for future comparison with samples that do not crystallize or crystallize differently.

Always keep protein samples cold, at 4 °C or in an ice bucket, to prevent denaturation and to retard bacterial growth. Perform all sample manipulations at 4 °C, either in a cold room or on ice. When the samples are finally set up they can be brought to room temperature. Proteins are usually stabilized by the presence of the precipitants used in crystallization, and agents are added to retard microbial and fungal growth. A common antimicrobial agent is 0.02% sodium azide. Other broad-spectrum cocktails sold for use with tissue culture are quite effective.

Exchanging Buffers

If the desired buffer of the sample is not already known, the sample should be placed in a weak buffer near neutrality. A good choice is 50 mM Tris-HCl at pH 7.5 with 0.02% sodium azide. Some proteins will not be stable at low ionic strength, so a small portion should be tried first to see if a precipitate forms. Be sure to wait several hours before deciding if the sample is stable. Observe the sample in a clear glass vial and hold it near a bright light to detect any cloudiness in the sample.

There are two methods for exchanging the buffer solution of the protein sample: dialysis and use of a desalting column. The desalting column is the faster method, and if a disposable desalting column such as a PD-10 column from Pharmacia-LKB is used, it is very convenient. A single pass through the column will remove 85-90% of the original salt in the sample and, if this is not enough, two passes can be used. Every time the sample is passed over the column it will be diluted about 50%. Unless the sample is very concentrated to begin with, it may be necessary to concentrate the sample after desalting.

Dialysis on small volumes is best carried out in finger-shaped dialysis membrane or with a Microcon (Fig. 1.1). Always soak the membrane first in the buffer to remove the storage solution. A minimum of two changes of dialysis buffer 12 h apart is recommended. The dialysis buffer should be stirred and should be at least 100 times the sample volume. Use membranes with a molecular weight cutoff less than half the sample molecular weight; otherwise you risk losing a substantial portion of the sample. When removing the sample, wash the inside of the membrane with a small amount of buffer to recover the sample completely.

FIG. 1.1 Dialyzing samples to exchange buffers. (Figure labels: dialysis membrane, flask, solution.)
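The arithmetic behind these two methods can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not a validated protocol: the function names are invented for illustration, the 85% removal per pass is the low end of the range quoted above, "diluted about 50%" is read here as a 1.5-fold increase in volume (use 2.0 if you read it as a halving of concentration), and each dialysis round is assumed to reach full equilibrium.

```python
def desalting_passes(salt_mM, protein_mg_ml, passes,
                     removal=0.85, dilution_factor=1.5):
    """Estimate salt and protein concentrations after desalting-column passes.

    Assumptions (illustrative, not from a vendor data sheet): each pass
    lowers the salt concentration by the fraction `removal` and dilutes
    the protein by `dilution_factor`.
    """
    for _ in range(passes):
        salt_mM *= 1.0 - removal
        protein_mg_ml /= dilution_factor
    return salt_mM, protein_mg_ml


def dialysis_residual(sample_ml, buffer_ml, rounds):
    """Fraction of the original buffer components left after dialysis.

    Assumes every round runs to complete equilibrium, so each round dilutes
    the old buffer by sample_ml / (sample_ml + buffer_ml).
    """
    per_round = sample_ml / (sample_ml + buffer_ml)
    return per_round ** rounds


if __name__ == "__main__":
    salt, protein = desalting_passes(500.0, 20.0, passes=2)
    print(f"after 2 passes: {salt:.1f} mM salt, {protein:.1f} mg/ml protein")
    left = dialysis_residual(1.0, 100.0, rounds=2)
    print(f"after 2 dialysis rounds: {left:.1e} of the original buffer remains")
```

Under these assumptions, two column passes leave only a few percent of the salt but more than halve the protein concentration, which is why a concentration step usually follows desalting.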

Concentrating Samples

First measure the starting concentration of the sample. The simplest way is to measure the absorbance of a 50-times dilution of the sample at 280-nm wavelength and assume the concentration in mg/ml is simply the absorbance times 50 (this treats a 1 mg/ml solution as having an absorbance of about 1). While this method is not very accurate, it is reproducible, and the sample should be pure enough to warrant the assumption that all absorption is due to the protein.1 Always use buffer as a blank and check the buffer versus distilled water to be sure it does not have significant absorption. Some buffers have a significant amount of absorption at 280 nm, which can greatly reduce the accuracy of the absorbance measurement. The diluted absorbance must be below 1.0 or the measurement will be inaccurate.

1 For further information on determining protein concentration, see Scopes, Robert K., and Canter, Charles R., eds. (1994). Protein Purification, Principles and Practice, 3rd Ed. Springer-Verlag, New York.

Concentrate the sample to 10-20 mg/ml. If you have enough sample, it is better to concentrate to 30 mg/ml, wash the concentrator with one-half the sample volume, and then add the wash to the sample to make a final concentration of 20 mg/ml. A Centricon is one of the best ways to concentrate the sample. An Amicon will also work well.

Another method is to dialyze against polyethylene glycol 20,000 (PEG-20K) using a finger-shaped dialysis membrane. The advantages of this method are that it can be combined with dialysis and that the same membrane used to dialyze the sample can be transferred directly to the PEG-20K for concentrating. The dialysis tube can be put directly onto solid PEG-20K. The water in the sample will be quickly removed, so check the sample often. However, be aware that PEG is often contaminated with salts and/or metals and this may or may not be desirable. In many cases, though, such contamination has actually contributed to crystallization. You may want to keep a sample of the particular batch of PEG you use. If, in the future, a new batch causes problems, you can analyze the differences.

Another method often used is precipitation with ammonium sulfate. If an ammonium sulfate step has already been used in the purification procedure, this may be an easy way to achieve a high concentration. You will want to use a high level of ammonium sulfate to ensure that the entire sample is precipitated. The ammonium sulfate should be added slowly to the solution while it is kept cold. Let the solution sit for at least 30 min after all the ammonium sulfate has dissolved. Spin down the precipitate and remove the supernatant. The pellet can be redissolved in a small amount of buffer. However, since the pellet will contain some salt, a dialysis step will be needed before the protein is ready.

It is not uncommon for a protein to precipitate at high concentrations. If this happens while you are concentrating, add buffer back slowly until all the sample has dissolved. Raising the level of salt by using a more concentrated buffer or by adding sodium chloride can often help stabilize protein solutions. If a precipitate forms, examine it carefully to make sure it is not crystalline. Amorphous precipitates are cloudy and have a matte appearance. Crystalline precipitates are often shiny and, if from a colored protein, are brightly colored with little cloudy appearance. Two proteins have been crystallized in our lab accidentally during concentration. One was found in an ammonium sulfate precipitation step and the other during concentration on an Amicon to lower the salt concentration.
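The concentration estimate and the wash step described in this section reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, the parameter `a280_per_mg_ml` encodes the rough assumption that a 1 mg/ml solution has an absorbance near 1 (replace it if you know the value for your protein), and the wash is assumed to carry no protein of its own, so the real final concentration may be slightly higher.

```python
def estimate_concentration(a280_diluted, dilution=50.0, a280_per_mg_ml=1.0):
    """Rough protein concentration (mg/ml) from the A280 of a diluted sample.

    Rejects readings of 1.0 or above, which the text warns are inaccurate;
    dilute further and measure again in that case.
    """
    if not 0.0 <= a280_diluted < 1.0:
        raise ValueError("diluted absorbance must be between 0 and 1.0")
    return a280_diluted * dilution / a280_per_mg_ml


def concentration_after_wash(conc_mg_ml, sample_volume, wash_volume):
    """Concentration after adding a protein-free wash to the sample.

    Volumes may be in any unit as long as both use the same one.
    """
    return conc_mg_ml * sample_volume / (sample_volume + wash_volume)


if __name__ == "__main__":
    # A 50-fold dilution reading 0.40 suggests roughly 20 mg/ml.
    print(estimate_concentration(0.40))
    # Concentrate to 30 mg/ml, then add a wash of half the sample volume.
    print(concentration_after_wash(30.0, sample_volume=1.0, wash_volume=0.5))
```

The second call reproduces the rule of thumb above: a sample at 30 mg/ml combined with a wash of half its volume ends up near 20 mg/ml.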

Storage of Samples

The entire sample will not be used all at once, and the remaining protein solution should be aliquoted and stored frozen at -70 °C. Divide the sample into 100- to 200-µl aliquots in freezer-proof tubes (not glass, which becomes brittle and shatters at low temperatures) and quick-freeze the tubes in an acetone-dry ice or liquid nitrogen bath (Fig. 1.2). Label each tube with the date, a code to identify the sample and the particular batch of the sample, and your initials. Cover the label with transparent tape to prevent the ink from rubbing off when you handle the frozen tubes later. Place the tubes in a cardboard box and store in the freezer. Freeze-thawing harms protein samples; although they may often withstand several cycles of freeze-thawing, it is best not to find out the hard way. Thaw the samples in an ice bucket or the cold room when they are to be used. If some sample is left over and it will be used the next day, it can usually be stored at 4 °C overnight.

FIG. 1.2 Storage of samples. The procedure involves (1) aliquoting samples into several tubes, then (2) quick-freezing each sample in an acetone-dry ice bath and storing at -70 °C.

Ultrapurification
While it is beyond the scope of this handbook to cover purification techniques, the crystallographer has one special technique, usually not tried by others, to further improve the sample: recrystallization (Fig. 1.3). We will assume that you have succeeded in finding conditions that will grow small crystals but are having trouble growing larger ones. It may be worthwhile to recrystallize the sample to improve its purity. A large sample of the protein can be set up in the crystallization mixture and seeded with a crushed crystal. After crystals have grown, you may wish to add slightly more salt to push more protein into the crystalline state. Gently centrifuge down the crystals, or allow them to settle by gravity, and remove the supernatant. To wash the crystals without redissolving them, resuspend them in a mother liquor about 10% higher in precipitant, and then remove the supernatant. Resuspend the crystals in distilled water to dissolve them. If a large amount of precipitate is present with the crystals, this method will not remove the precipitate unless it settles more slowly than the crystals. In these cases, the crystals can be resuspended in 2 ml of artificial mother liquor in a petri dish, and individual crystals can then be picked up with a capillary and manually separated from the precipitate. For the crystals to redissolve well, they should be freshly grown. Old crystals that still diffract well often will not redissolve even in distilled water because the surface of the crystal has become cross-linked. This is especially true of crystals grown from polyethylene glycol.

FIG. 1.3 Redissolving crystals.

1.2 PROTEIN CRYSTAL GROWTH

Several excellent texts have been published on methods for growing protein crystals (see Suggested Reading in the preface) and I will not repeat this material here except briefly, to add some of our own experience. Like fine wine, protein crystals are best grown in a temperature-controlled environment. Most cold rooms have a defrost cycle that makes them especially poor places to grow crystals. Investing in an air conditioner for a small room to keep it a few degrees colder than the rest of the laboratory is the best way to keep a large area at a constant temperature for crystal growth. To ensure that the room is tightly regulated, get a unit with a capacity larger than needed. Another alternative is to use a temperature-controlled incubator. However, a room is best because you will need to examine your setups periodically under a microscope, and in a room everything can be kept at the same temperature. Invest in a good dissecting stereomicroscope and remove the lightbulb in the base. Substitute a fiber-optic light source so that the base does not heat up and dissolve your crystals as you observe them. Even so, be careful not to put the setup down near the fiber-optic light source itself, which gets hot during operation. Have a Plexiglas base built over the dissecting scope base (Fig. 1.4) to provide a large surface on which to place setups so that they do not fall off the edges during examination. This will also provide a base to steady your hands during delicate mounting procedures.

FIG. 1.4 Modified dissecting scope (labeled parts: fiber-optic light source, Plexiglas cover). A Plexiglas base is put over the scope to make a larger area, which provides a place to rest your hands during mounting operations and prevents hanging-drop plates from tipping over the edge. A fiber-optic light source is used instead of the built-in light to prevent the base from heating and damaging crystals.


Protein Solubility Grid Screen
Before embarking on crystallization trials of a protein, we routinely screen it for solubility with a grid screen invented by Enrico Stura.² The grid consists of a number of common precipitants, each at a wide range of concentrations, and covers a wide range of pH values (Fig. 1.5). To make the grid screen, we make up 100 ml of stock solution of each precipitant at its highest concentration (row D) in the buffer listed and store the solutions in lighttight bottles (wrap them with aluminum foil or keep them in a dark place). Make up a 4 × 6 Linbro plate³ with 1 ml in each well of the solutions shown in Fig. 1.5, diluting the stock solution with buffer as appropriate toward the top of the grid. The top edges of the wells are liberally coated with vacuum grease to make a seal with the 22-mm siliconized circular glass coverslips to be added in the next step. Make a hanging drop of protein solution over each well by mixing 5 µl of protein solution (10-20 mg/ml) with 5 µl of well solution in the center of a coverslip; then quickly invert the coverslip with forceps, place it over the well, and press it into the grease seal. Make sure the slip is sealed completely by looking for air gaps in the grease. After preparing all 24 wells, place the tray in a dark place with a constant temperature. We use large incubators⁴ set at 17-22°C for crystal trials in our lab, although a well-air-conditioned room can also be used. Check the plates for precipitation soon after setting up and again the next day by observing the drop on the coverslip with a dissecting microscope. What you are looking for as you scan the grid are columns in which relatively clear drops turn cloudy with precipitate as the precipitant concentration rises. The midpoint of this transition is where you want to start crystal trials. You want to avoid precipitants where every drop is fully precipitated; these precipitants may interact specifically with the protein. You also want to avoid precipitants that never precipitate the protein. If you are lucky you may find a well with crystals. Armed with this solubility information, you can make intelligent choices of precipitant concentrations to start with.

Unstable proteins will form precipitate in every condition regardless of precipitant or concentration. In this case, you need to find conditions that stabilize the protein for at least 24 h or you stand little chance of growing crystals. Possible stabilizers are lower temperature, metal ions, cofactors, non-ionic detergents, glycerol, and ligands. If a number of possible variants of a protein are candidates for crystal trials, such as different species or mutant constructs, the grid screen can be used to select the best candidates for further trials. Look for proteins that exhibit sharp transitions from clear to cloudy drops. We have used the grid to screen for solubility of a membrane-attached protein that was engineered to be soluble, screening various constructs to test hypotheses about how the protein was attached to the membrane. In our initial attempts all the drops were cloudy, and after many rounds of mutagenesis we found a mutant that showed a clear transition from soluble to precipitated in high salt. Prior constructs precipitated within 24 h in all the conditions and never produced crystals. The soluble mutant eventually produced large crystals that we subsequently used to solve the structure.

² Stura, E. A., Satterthwait, A. C., Calvo, J. C., Kaslow, D. C., and Wilson, I. A. (1994). Reverse screening. Acta Crystallogr. D50, 448-455.
³ These plates and other crystallization supplies, as well as excellent material on crystallization, can be obtained from Hampton Research, Irvine, California, http://www.hamptonresearch.com. Another source is Emerald Biosciences.
⁴ These incubators need to be of the type that can both cool and heat to keep a constant temperature so close to room temperature. Heat-only incubators designed for bacterial growth at 37°C are not suitable.

Column  Precipitant  Buffer                            Row A    Row B    Row C    Row D
1       PEG 600      0.2 M imidazole malate, pH 5.5    15%      24%      33%      42%
2       PEG 4K       0.2 M imidazole malate, pH 7.0    10%      15%      20%      25%
3       PEG 10K      0.2 M imidazole malate, pH 8.5    7.5%     12.5%    17.5%    22.5%
4       (NH4)2SO4    0.15 M sodium citrate, pH 5.5     0.75 M   1.0 M    1.5 M    2.0 M
5       PO4          NaH2PO4/K2HPO4, pH 7.0            0.8 M    1.32 M   1.6 M    2.0 M
6       Citrate      10 mM sodium borate, pH 8.5       0.75 M   1.0 M    1.2 M    1.5 M

FIG. 1.5 Enrico Stura's grid screen.
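Filling the 24 wells means diluting each row D stock with its buffer, which is a direct use of C1·V1 = C2·V2. The sketch below is one way to tabulate the pipetting volumes. It is an illustration under stated assumptions, not a prescribed protocol: the row labels A-D and the grid values are taken as read from Fig. 1.5, the stock is assumed to be made up in the same buffer used for dilution (so the buffer concentration stays constant), and the function and variable names are my own.

```python
# Grid as read from Fig. 1.5: precipitant -> concentrations for rows A-D.
# Row D is the undiluted stock; units are % for PEG and M for the salts.
GRID = {
    "PEG 600": [15, 24, 33, 42],
    "PEG 4K": [10, 15, 20, 25],
    "PEG 10K": [7.5, 12.5, 17.5, 22.5],
    "(NH4)2SO4": [0.75, 1.0, 1.5, 2.0],
    "Phosphate": [0.8, 1.32, 1.6, 2.0],
    "Citrate": [0.75, 1.0, 1.2, 1.5],
}

WELL_VOLUME_UL = 1000  # 1 ml per well, as in the text


def well_recipe(target, stock, well_volume_ul=WELL_VOLUME_UL):
    """Return (stock_ul, buffer_ul) for one well, using C1*V1 = C2*V2."""
    stock_ul = round(well_volume_ul * target / stock)
    return stock_ul, well_volume_ul - stock_ul


for precipitant, concentrations in GRID.items():
    stock = concentrations[-1]  # row D holds the stock concentration
    for row, target in zip("ABCD", concentrations):
        stock_ul, buffer_ul = well_recipe(target, stock)
        print(f"{precipitant:10s} row {row}: "
              f"{stock_ul:4d} ul stock + {buffer_ul:4d} ul buffer")
```

Volumes are rounded to the nearest microliter, which is finer than most 1-ml pipetting; the buffer volume is computed from the rounded stock volume so that every well totals exactly 1 ml.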

Initial Trials
In all aspects of protein crystallography except initial crystal trials, the more past experience you have the better. Beginner's luck is definitely a factor in finding conditions for crystallizing a protein the first time. This is partly because beginners are more willing to try new conditions and will often do naive things to the sample, thus finding novel conditions for crystal growth. It is also because no one can predict the proper conditions for crystallizing a new protein. There are conditions that are more successful than others, but to use these exclusively means that you will never grow crystals of proteins that are not amenable to these conditions. So fiddle away to your heart's content. What is needed is to observe carefully what happens to your sample under different conditions and to note the results carefully. The least experienced part-time student can outperform the most expensive crystallization robot because he or she has far more powerful sensing faculties and reasoning abilities. Leave the "shotgun" setups to the robots.

Having said all this, I present in Table 1.1 a recipe to use for initial trials. The most commonly used methods for initial crystal trials are the hanging-drop and sitting-drop (Fig. 1.6) vapor-diffusion methods. The batch method, however, can actually save much protein if done properly. In the hanging-drop method many different drops are set up. Most of these will never crystallize. It is hoped that just the right conditions will be hit upon in a few of the drops.

A method that I have used successfully for many years is to place a small amount of protein in a 1/4-dram shell vial with a tightly fitting lid (caplug). Small aliquots of precipitant are added slowly. After each addition, the shell vial is tapped to mix the sample, then held up to a bright light. When the protein reaches its precipitation point, it will start to scatter light as the protein molecules form large aggregates that cause a faint "opalescence." Make the additions slowly, waiting several minutes before each one to avoid overshooting the correct conditions. If two vials are used, they can be leapfrogged so that when one reaches saturation, the other will be just below. For example, set up the first vial with 20 µl of protein plus 2 µl of precipitant and the second with 20 µl of protein plus 4 µl of precipitant. Then add 4 µl to the first and then 4 µl to the second, and so forth, so that one of the vials is always 2 µl ahead of the other. When opalescence is achieved in one vial, put both away overnight to be observed the next day. If the precipitation point is overshot, a small amount of water may be added to clear the precipitate

TABLE 1.1 Conditions for Initial Trials

Precipitant                    Concentration range    Additives
Polyethylene glycol 4000       10-40% w/v             0.1 M Tris, pH 7.5
Polyethylene glycol 8000       10-30% w/v             0.2 M ammonium acetate
Ammonium sulfate, pH 7.0       50-80% saturation
Ammonium sulfate, pH 5.5       50-80% saturation
Potassium phosphate, pH 7.5    0.5-2.5 M
2-Methyl-2,4-pentanediol       15-60%                 50-200 mM potassium phosphate, pH 7.8
Low ionic strength             Dialysis
Sodium citrate                 0.5-2.5 M

FIG. 1.6 Crystal setup using ACA plates. A cross section through a single well is shown. (A) The lips of the wells are greased where the coverslips will later be placed. High-vacuum grease can be used alone or mixed with about 20% silicone oil. The addition of oil makes the grease less viscous so that it flows more easily. (B) The lower coverslip is pressed into place. Make sure there are no gaps in the grease for air to leak through. (C) Place the reservoir solution in the well. (D) Put the protein solution onto the lower coverslip. (E) Carefully layer the precipitant (often some of the reservoir solution) onto the protein. (F) Mix the two layers together quickly by drawing up and down with an Eppendorf pipette. (G) Put on the upper coverslip to seal the well completely. Again check that there are no gaps in the grease. Wait several hours to several weeks for crystals to appear.


and the precipitation point can be approached again more slowly. This method is less wasteful because it allows a finer search of conditions in just two shell vials, substituting for the large number of hanging drops needed to do as fine a scan. It also encourages more careful observation of the samples. Finally, some proteins do not fare well during the evaporation that occurs in hanging drops.

It is impractical, if not impossible, to scan systematically every precipitant that has been used for growing protein crystals. Therefore, another approach is an incomplete factorial experiment.⁵ A small subset of all possible conditions is scanned in a limited number of experiments by combining a subset of solutions. These drops are scanned for crystals or promising precipitates. If anything is found, then a finer scan can be done to find better growth conditions. A particularly successful version of this method was developed by Jancarik and Kim⁶ and has been optimized to 50 conditions combining a large number of precipitants and conditions. A kit is available from Hampton Research that contains all 50 solutions premixed, so all one has to do is set up 3-5 µl of protein sample with each of the solutions. This method recommends that you first dialyze the protein against distilled water to allow better control over pH and other conditions. Try this on a small sample first. Many proteins will not tolerate distilled water and will precipitate (or sometimes crystallize). Use as low a concentration of buffer as you can. Phosphate buffers will give phosphate crystals in several of the drops that contain divalent cations. We have used this method with some success. While you may not get usable crystals on the first trial, you may get some good leads. Some of the drops may stay clear for a couple of weeks. You can raise the precipitant concentration in the drop by adding saturated ammonium sulfate to the reservoir (but not to the drop). This will cause the drop to dry up somewhat. More ammonium sulfate can be added until the drop either precipitates or crystallizes.

⁵ Carter, C. W., Jr., and Carter, C. W. (1979). Protein crystallization using incomplete factorial experiments. J. Biol. Chem. 254, 12,219-12,223.
⁶ Jancarik, J., and Kim, S.-H. (1991). Sparse matrix sampling: A screening method for crystallization of proteins. J. Appl. Crystallogr. 24, 409-411.

Another crystallization method, not often tried, approaches the crystallization point from the other end by first precipitating the protein and then slowly adding water until the critical point is reached. Often microcrystals are formed when the protein is precipitated. As the precipitant concentration is lowered, protein redissolves, and crystals large enough to see may grow out of the precipitate using these microcrystals as growth centers. Also, if the excess solution is removed from the precipitate, the result is a high concentration of the protein, which may force crystals. This can be done on a micro scale using a variation of the hanging drop. For example, to use this method with ammonium sulfate, mix a drop of 10 µl of protein sample and 3 µl of saturated ammonium sulfate and set this over a reservoir of saturated ammonium sulfate. The drop should dry slowly and the protein should precipitate, giving a final concentration about three times higher than at the start. Every few hours add some water to the well to lower the ammonium sulfate concentration. Keep careful track of the amount added. If the drop starts to clear, slow the additions down to once a day and add water very slowly. I have grown a large number of crystals using this method. Although they are rarely suitable for diffraction, they can be used as seeds to grow better crystals. This method allows searching a large number of conditions with a small amount of sample.

Never give up on a setup unless it is completely dried; it may take several months for crystals to appear. Proteins that are not stable in buffer are often stabilized by high precipitant concentrations. Also, the presence of precipitate in a setup does not preclude crystallization. Often a crystal will grow from the precipitate. Nucleation is a rare event and may require a very long time to occur if you are near, but not right on, the correct crystallization conditions.
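The "about three times higher" figure for the reverse ammonium sulfate drop described above can be checked with a simple mass balance. This is a rough sketch under idealized assumptions that are mine, not the text's: only water moves through the vapor phase, all the salt stays in the drop, the drop stops shrinking when its salt concentration matches the reservoir, and salt already present in the protein buffer is ignored. The function name is illustrative.

```python
def equilibrated_drop(protein_ul, precipitant_ul,
                      precipitant_saturation=1.0, reservoir_saturation=1.0):
    """Return (final drop volume in ul, protein concentration factor).

    Saturation values are fractions of a saturated solution (1.0 = 100%).
    The concentration factor is relative to the protein stock added.
    """
    # Salt content expressed as "ul of saturated solution equivalents".
    salt_equivalents = precipitant_ul * precipitant_saturation
    final_volume_ul = salt_equivalents / reservoir_saturation
    return final_volume_ul, protein_ul / final_volume_ul


volume, factor = equilibrated_drop(10, 3)
print(f"final drop about {volume:.1f} ul, protein about {factor:.1f}x the stock")
```

For 10 µl of protein plus 3 µl of saturated ammonium sulfate over a saturated reservoir, the drop shrinks toward 3 µl, so the protein ends up a little over three times more concentrated than the stock. Adding water to the well lowers the reservoir saturation and lets the drop swell again, which is the lever this method uses.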

Growth of X-Ray Quality Crystals
The elation that you experience following the appearance of the first crystals of a new protein can be short-lived. It is often discovered that the first crystals grown are of insufficient quality to use for data collection. A long series of experiments may be needed before large, single crystals can be obtained. The first step to try is a very fine scan of conditions near those used initially, to find the optimal conditions for growing only a few large crystals in a single setup. Vary the precipitant concentration, the protein concentration, the pH, and the temperature. You may also want to try varying the buffer used and its concentration. Using different types of setup will vary the equilibration rate, which can often lead to improved growth. What are needed are conditions where nucleation is rare and crystal growth is not too rapid. Do not look at your setups too frequently (once a day is enough), since disturbing them can result in the formation of extra nuclei. Leave fine-scan plates alone for a week before disturbing them. Since nucleation is a stochastic process, preparing a large number of identical setups will often yield a few drops that produce nice crystals by chance. This is useful only if you have sample to waste.

If nucleation is unreliable, then seeding is often the answer. Two methods of seeding are used: microseeding and macroseeding. In microseeding, small seeds, obtained by crushing a crystal or taken from old setups where they are usually present in large numbers, are introduced into a fresh drop of preequilibrated protein. Seeds will usually grow in conditions where nucleation will not occur. An extreme example of this is photoactive yellow protein, where seeds will grow in ammonium sulfate solutions at 71% of saturation but nucleation will not occur at concentrations less than about 100%. The microseeds are diluted until only a few will be introduced. This usually requires serial dilutions and can be very difficult to control. Another method is to place a very small amount of solution from a drop in which a crystal has been crushed at one point of a fresh drop without any mixing. A mass of crystals will grow at this point, but often a few seeds will diffuse to another part of the drop where a large crystal may develop. The author has found that 30-100 µl sitting drops are good for this technique.

The steps involved in microseeding are illustrated in Fig. 1.7A. The first step in microseeding is to establish the proper growth conditions. Drops with precipitating agent of increasing concentration are set up and preequilibrated overnight. Then a crystal is crushed with a needle so that the entire drop fills with microscopic seeds. A whisker or eyelash glued to a rod is then dragged through the solution to pick up a small amount of the liquid containing microcrystals. The whisker is then streaked through or dipped into the preequilibrated drops. After several hours or days, crystals should grow in the drops with sufficiently high precipitant concentration. To prevent unwanted nucleation, it is desirable to use the lowest concentration that will sustain crystal growth. When proper growth conditions are established, several drops are preequilibrated to this concentration. A crystal is then crushed as before, and a few microliters of the mother liquor in the drop is pipetted into the first of a series of test tubes containing stabilizing solution and mixed well. These are then serially diluted about 10- to 20-fold so that each successive tube contains fewer microcrystals. A few microliters from each tube is then put into the preequilibrated growth drops, which are examined for growth after several days (Fig. 1.7B). Each drop should contain progressively fewer crystals. The goal is to find a dilution that will provide just a few crystals per drop. If the microcrystals are stable enough, it may be possible to seed many drops from this same tube to grow many large crystals.

In macroseeding a single seed is washed and placed in a fresh, preequilibrated drop (Fig. 1.8). The seeds need to be well washed in 2 ml of artificial mother liquor in a plastic petri dish. The dish is gently swirled to dilute any microseeds. The seed is then transferred with a minimum amount of solution to another dish with a precipitant concentration (found by experiment) in which crystals slowly dissolve. This produces a fresh growth surface on the seed and dissolves any microcrystals. The crystal is then transferred with a small amount of solution and placed in a fresh setup. Microseeds can be

FIG. 1.7 Microseeding techniques. (A) Find the correct concentration: 1. Crush crystal. 2. Wet whisker (drops of increasing concentration).

broken off by mechanical disturbances. Because protein crystals are soft and fragile, a gentle technique is necessary for this method to work. Let the crystal fall into the fresh drop and settle of its own accord. Do not disturb the crystal after placing it in the growth drop. Often the less-dense dissolving solution will layer on top of the drop and mix only slowly, allowing any microseeds to dissolve. Several problems can be found with this technique


FIG. 1.8 Macroseeding method of growing larger crystals. First, two solutions are prepared in small petri dishes, a storage solution (usually a few percent higher in precipitant than the growth concentration) in which the crystals are stable for a long time and an etching solution in which the crystals will slowly dissolve over several minutes (usually a few percent in precipitant lower than the growth concentration). About 2 ml of each is needed, and the dishes should be kept covered to prevent evaporation. (1) Using a thin capillary, draw up several small but well-formed seed crystals. (2) Transfer these crystals into the petri dish with the storage solution. (3) Gently rock and swirl the storage solution petri dish to disperse the seeds throughout the dish. This dilutes microseeds and separates the crystals from each other and from any precipitate that might have been transferred with them. (4) Pick up a single seed from the storage solution and transfer it to the etching solution, bringing with it as little of the storage solution as possible. (5) While observing the crystal through a microscope, let it sit and occasionally rock the dish gently. (6) The corners of the crystal should start to round and the faces may etch, leaving scars and pits. (7) Pick up the crystal with as little solution as possible and gently transfer it to a fresh drop of protein preequilibrated to the growth conditions overnight. Often the crystal will fall out of the transfer capillary of its own accord so that no solution need be added to the drop. (8) Over several hours or days the crystal should grow larger.


and it will not work in every case. Sometimes the crystal may be so fragile that a trail of microcrystals is left in its wake. Often the seed will not grow uniformly; instead, spikes form on the seed surface. Other times the new growth will not align perfectly with the seed, causing a split diffraction pattern. In this case try using a smaller seed that will not contribute substantially to the overall scattering. Often the dislocation between the old and new crystal can be seen. It may be possible to expose only a region of the crystal away from the old seed. The seed should be freshly grown and well formed. If imperfect seeds are used, then you will only grow larger imperfect crystals. Often several generations of seeding will be needed to produce single crystals. Multiply twinned crystals can be crushed and fragments macroseeded until single crystals are obtained.

Better crystals can sometimes be obtained by further purifying the protein sample to make it more homogeneous. Isoelectric focusing is especially useful. If you have a large amount of sample (>100 mg), then you can use preparative isoelectric focusing. Smaller amounts of sample (<20 mg) can be chromatofocused on Pharmacia-LKB XX media. This method has proved successful in several cases. Be aware, though, that both these methods introduce ampholytes into the solution that can be difficult to remove.

The last method for improving crystal quality, and often the best, is simply to look for more conducive conditions. For example, Chromatium vinosum cytochrome c' can be crystallized easily from ammonium sulfate, but these crystals are always so highly twinned that they are unusable, even for preliminary characterization. By searching for new conditions, it was found that PEG-4K at pH 7.5 produces usable crystals that can be grown to large dimensions and will diffract to high resolution. This is very common for proteins. If they crystallize under one condition, chances are they will crystallize under another condition in a different space group and in a different habit. If there were only one condition out of all possible ones, I doubt that very many proteins would ever crystallize.
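Because the serial dilution used in microseeding is hard to control by eye, it can help to write down the dilution series before starting. The sketch below is a planning aid under assumptions of my own: every tube receives the same transfer volume, mixing is complete before each transfer, and the seed count in the crushed drop is unknown, so only relative dilutions are reported. The function name and the 5-µl transfer are hypothetical; the text specifies only "a few microliters" and a 10- to 20-fold step.

```python
def serial_dilution(n_tubes, fold, transfer_ul):
    """Plan a serial dilution of microseed stock.

    Returns one (tube number, diluent volume in ul, cumulative dilution)
    tuple per tube. Each tube gets `transfer_ul` from the previous tube
    (or from the crushed-crystal drop for tube 1) plus enough stabilizing
    solution to make a `fold`-fold dilution.
    """
    diluent_ul = transfer_ul * (fold - 1)
    return [(tube, diluent_ul, fold ** tube) for tube in range(1, n_tubes + 1)]


# Hypothetical plan: four tubes, 10-fold steps, 5 ul carried forward each time.
for tube, diluent_ul, dilution in serial_dilution(4, 10, 5):
    print(f"tube {tube}: 5 ul into {diluent_ul} ul stabilizing solution "
          f"(1:{dilution} overall)")
```

With 10-fold steps the fourth tube is already 1:10,000 relative to the crushed drop, so a handful of tubes spans the range from a shower of crystals to none at all; the useful tube is the one that gives just a few crystals per drop.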

1.3 CRYSTAL STORAGE AND HANDLING

Protein crystals can be stored for a few years and still diffract. Some precautions will help increase their lifetime. Some crystals can simply be left in the drops in which they grew. Others, though, will grow small unaligned projections on their surfaces if kept in the original drop. These need to be transferred to an artificial mother liquor for storage. The artificial mother liquor must be found by experimentation. Usually, raising the precipitant a few percent is all that is needed. Do not use mother liquor with precipitant at the growth level. The protein in the crystal is in equilibrium with the protein in solution, and if mother liquor at the growth conditions without protein is substituted, the crystal will partially or wholly dissolve to reestablish equilibrium. Higher precipitant concentrations drive the equilibrium toward the crystal. For the same reason, do not store the crystal in a volume larger than necessary. Too high a precipitant concentration will result in cracked crystals because the change in osmotic pressure will cause them to shrink. Change the reservoirs in vapor-equilibrium setups to prevent drying. Observe the crystal in the artificial mother liquor for several days before committing more crystals to it. Ideally, a crystal in a new artificial mother liquor should be examined by X-ray to confirm that no damage to the diffraction pattern has occurred. Keep the crystals in the dark. Light causes free radical chain reactions in the solution which will cross-link and eventually destroy the crystals. This is especially true of polyethylene glycol. Commercial PEG contains an antioxidant to retard polymerization caused by light; it will slow, but not completely prevent, oxidation of PEG solutions. Solid PEG and PEG solutions must be stored in the dark at all times.

1.4 CRYSTAL SOAKING

To solve the phase problem, the most common method used is multiple isomorphous replacement. In this method one or more heavy atoms are introduced into the structure with as little change to the original structure as possible. This gives phasing information from the pattern of intensity changes. A heavy atom must be used to produce changes large enough to be reliably measured. Minimal change, or isomorphism, is necessary because the primary assumption of the phasing equations is that the soaked crystal's diffraction pattern is equal to the unsoaked crystal's diffraction pattern plus that of the heavy atoms alone. For more details, see later chapters and the suggested readings.

Heavy atoms or substrates are usually introduced into protein crystals by soaking the crystals in an artificial mother liquor containing the reagent of choice. The compound is prepared in an artificial mother liquor solution at about 10 times the desired final concentration, and then one-tenth of the total volume is layered onto the drop containing the target protein crystal. Diffusion occurs within several hours to saturate the crystal completely. With some heavy atoms, secondary reactions often occur that can take several days. Heavy-atom compounds are usually introduced at 0.1- to 1.0-mM concentration. A typical protein in a 10-µl drop requires roughly micromolar concentrations for equimolar ratios. Many compounds will not dissolve well in the crystallization solution. In these cases it may be beneficial to place small crystals of the compound directly in the drop. Also, many heavy-atom compounds will take several hours to hydrate. If they do not completely dissolve at first, be patient. Gentle heating may speed dissolution.

Soak time is more difficult to determine. If the soaking drop has several crystals, they can be mounted at different time intervals. Some heavy-atom reagents that are highly reactive may destroy the crystals and yet be useful if soaked for a short time. Other heavy-atom compounds undergo slow reactions that may produce a new compound that will bind. For instance, a platinum compound in an ammonium sulfate solution will eventually replace all of its ligands with ammonia. As a crystallographer you are not as concerned with exactly what binds to your protein as long as something heavy binds at a few sites in an isomorphous manner.

A good way to check that something is binding is to place the crystal in a capillary so that when you invert the capillary the crystal will slowly settle. When heavy atoms bind to the protein, they will increase the crystal's density and cause it to settle faster. Similarly, if an artificial mother liquor can be sufficiently concentrated so that it has a slightly higher density than the protein crystals, the crystals will float. When a sufficient number of heavy atoms bind, the crystal will sink. This property can be used as a way to screen a large number of solutions quickly and has the advantage that small crystals can be observed under a microscope. The change in osmotic pressure of the increased-density mother liquor may cause the unit cell of the crystal to shrink and, if so, any changes found in the diffraction pattern may be due to this effect rather than to heavy-atom binding. In any case, it is a simple matter to resoak a fresh crystal in the usual mother liquor.

What heavy atoms should you try? Table 1.2 presents a partial list of heavy-atom compounds in the order that I usually try them. (Whenever you meet another crystallographer, first you swap crystal-growing tales and then you always ask what heavy-atom compounds he or she has had particular success with.) Because most of the heavy-atom compounds are extremely toxic, extreme caution is in order. Some people experience respiratory distress and allergic reactions when exposed to these compounds. Therefore, always wear suitable (not latex) gloves and work in a well-ventilated area. For reproducibility it is best to use fresh solutions; old solutions may oxidize and/or dismutate with time. Solutions must be kept in the dark and preferably under argon. The bottles of reagents themselves should be stored in a well-ventilated area with the caps sealed with Parafilm.
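The 10-times-stock rule for soaking described above fixes both the stock concentration and the volume to layer onto the drop. The following sketch works through that arithmetic. It assumes complete mixing and ignores the volume of the crystal itself; the function names and the 10-µl, 1-mM example are illustrative choices, not values prescribed by the text.

```python
def soak_addition(drop_ul, final_mM, stock_factor=10.0):
    """Return (stock concentration in mM, stock volume to add in ul).

    The added volume v must be 1/stock_factor of the total volume:
    v / (drop_ul + v) = 1 / stock_factor, so v = drop_ul / (stock_factor - 1).
    """
    stock_mM = final_mM * stock_factor
    add_ul = drop_ul / (stock_factor - 1.0)
    return stock_mM, add_ul


def reagent_nmol(total_ul, conc_mM):
    """Amount of reagent in the drop: 1 mM x 1 ul = 1 nmol."""
    return total_ul * conc_mM


stock_mM, add_ul = soak_addition(drop_ul=10.0, final_mM=1.0)
total_ul = 10.0 + add_ul
print(f"layer {add_ul:.2f} ul of {stock_mM:.0f} mM stock onto the 10 ul drop")
print(f"drop then holds about {reagent_nmol(total_ul, 1.0):.1f} nmol of reagent")
```

Note that "one-tenth of the total volume" means one-ninth of the starting drop, about 1.1 µl for a 10-µl drop. Adding a full 1 µl instead gives 0.91 mM rather than 1 mM, a difference that rarely matters given how empirical soak conditions are.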

TABLE 1.2 Useful Heavy-Atom Reagents and Conditions

Reagent                         Conditions
Platinum tetrachloride          1 mM, 24 h
Mercuric acetate                1 mM, 2-3 days
Ethyl mercury thiosalicylate    1 mM, 2-3 days
Iridium hexachloride            1 mM, 2-3 days
Gadolinium sulfate              100 mM, 2-3 days
Samarium acetate                100 mM, 2-3 days
Gold chloride                   0.1 mM, 1-2 days
Uranyl acetate                  1 mM, 2-3 days
Mercury chloride                1 mM, 2-3 days
Ethyl mercury chloride          1 mM, 2-3 days

1.5 ANAEROBIC CRYSTALS

Many proteins lose activity if exposed to air and so they must be grown anaerobically. In other cases the protein must be kept anaerobic in order to reduce it to its active conformation. The easiest method is to use an anaerobic hood. Solutions are passed in and out through an airlock and crystals can be set up and handled with conventional techniques. For large-scale work this is by far the best method, but not all of us have access to an anaerobic hood. Another method uses a glove bag. This is a plastic bag with gloves that you can put your hands in to manipulate samples. Everything that you are going to use must be inside the bag before you seal it. This can present some logistical problems. A very simple anaerobic apparatus invented by Art Robbins consists simply of a capillary filled with degassed solution into which you float a crystal (Fig. 1.9). One end of the capillary is sealed by melting and the other is sealed with a layer of diffusion pump oil. Dithionite crystals can be dropped into the oil layer, through which they will sink into the lower liquid. Any residual oxygen will be destroyed by the dithionite, and the oil layer prevents the entry of new oxygen. In our laboratory Cu-Zn superoxide dismutase crystals have been reduced in this manner and have stayed reduced for over


FIG. 1.9 Simple anaerobic apparatus. Degassed mother liquor is placed in capillary and then a crystal is introduced. The crystal should be large enough so that it will wedge itself in the tapered portion of the capillary as it sinks. Mineral oil is then layered over the mother liquor to form a seal. The top few millimeters of the mother liquor, which have been exposed to air, can be drawn off with a capillary inserted through the oil layer. Solid reductant, such as dithionite, is then placed on top of the oil and allowed to sink through the oil into the mother liquor. Overnight the dithionite will diffuse to the crystal and reduce it. Excess oxygen is destroyed by the dithionite. The data can then be collected by mounting the capillary on a goniometer head as is normally done. Crystals reduced in this manner have remained oxygen-free for over a year.

a year. The data are collected by mounting the capillary directly on a goniometer head. The size of the capillary in this case is chosen so that the crystal will wedge partway down. The extra solvent decreases the diffraction due to absorption, but it was still possible to get a 2.0-Å data set using a large crystal.


2 DATA COLLECTION TECHNIQUES

2.1 PREPARING CRYSTALS FOR DATA COLLECTION

Protein crystals must be kept wet or they will disorder. Since solvent forms a large portion of the crystal lattice, a large change from the crystallization conditions will cause the crystals either to dehydrate and crack or to melt. For room temperature work, crystals are usually mounted in thin-walled glass capillaries.¹ The thin glass wall minimizes absorption of the scattered X-rays and also minimizes background from the glass. For protein work use the glass capillaries. Quartz capillaries are stronger, but the quartz scatters strongly around 3-Å resolution in a sharper band where the glass scattering is diffuse. Solvent contributes to background, which is always bad, and so as much solvent as possible must be removed without letting the crystal dry up. Alternatively, you may want to use the newer cryocrystallography techniques covered in Chapter 6. However, freezing can cause changes in the unit cell that make the crystals nonisomorphous, and a few proteins are not amenable to freezing. In these cases it may be necessary to use the methods in this chapter.

¹ Available from Charles Supper Company.


Crystal-Mounting Supplies

Before mounting a crystal, make sure you have all the supplies you need at hand:

• Capillaries. Thin-walled capillaries are needed in a variety of sizes. You will want to have a large supply of suckers previously made to pick from.

• Tweezers. Two pairs of tweezers are needed: a pair with straight ends and a curved pair for prying up coverslips.

• Scissors. A sharp pair of surgical scissors is needed for cutting capillaries and another pair for cutting filter paper in thin strips small enough to fit inside capillaries. Do not cut paper with the pair meant for cutting glass or they will quickly dull, and once dull they shatter rather than cut the glass capillaries.

• Capillary sealant. Dental wax and other types of low-temperature wax are the traditional means of sealing capillaries. Recently, 5-min epoxy has become popular. The epoxy requires no heat, sets quickly, and forms an immediate vapor barrier even before it sets. The handiest kind is clear, comes in a dual-barreled syringe, and is quite fluid before it hardens. Avoid types that are thicker and more like clay before hardening.

• Plasticine. This is also known as nonhardening modeling clay and is available in toy stores. It is very useful for sticking capillaries to goniometers and for holding them in position while mounting crystals. When warmed by rolling with the fingers, Plasticine can be wrapped around thin capillaries without breaking them. An alternative is to use pins that are sold for use with goniometers by Supper and Huber. The glass capillary is inserted into a hole in the pin and held in place with wax or epoxy.

• Filter paper strips. Cut Whatman #1 filter paper into strips thin enough to fit into capillaries for drying. The ideal strip is about 50 mm long, tapering from about 1.5 mm at one end to a fine point. The strips tend to curl when cut and can be straightened by gently curving in the opposite direction with fingers. Thin paper points originally meant for dental work are available from Hampton Research. As these come, they are too short to reach into capillaries. Mount the fine size into the end of a #18 syringe needle and then they will reach the crystal in most capillaries.

Mounting Crystals

There are several ways to mount a crystal in a capillary but they all accomplish the same goal. The method used almost exclusively in our lab is as follows. A capillary at least twice the width of a crystal is used. It is shortened by breaking with a pair of sharp tweezers so that it is about 4 cm


long. If it is not shortened it will be too long to fit onto most X-ray cameras. The broken end is sealed with either melted dental wax or 5-min epoxy. The large funnel-shaped end is left open; the crystal will be placed in the capillary through this end. A ring of wax or epoxy is placed where the funnel end narrows to make a place where the capillary can be cut later. Without the ring, the capillary may shatter completely. A small ball of Plasticine is warmed in the fingers and gently wrapped around the capillary to serve as a mounting base. Then the capillary is put aside; use the Plasticine to stick it in a handy spot where it can be reached later. The crystal to be mounted is selected and another capillary that will fit inside the first into which the crystal can be sucked is readied (Fig. 2.1). We use a piece of rubber tubing that fits over the capillary at one end with the other end going into a mouthpiece. It takes a little practice to get the knack of the sucking operation. The liquid will tend to stick at first because of surface tension, and then it comes all in a rush, requiring a little back pressure. Instead of mouth pressure, a syringe can be used to suck up the liquid. It is harder to control the syringe, however, and it takes one of your too-few hands. For toxic solutions, such as heavy-atom soaks, always use a syringe. Before the crystal is sucked out of the drop, a small amount of reservoir solution is sucked up and placed in the bottom of the previously prepared capillary (Fig. 2.2). Often a thin piece of filter paper is then pushed down the capillary to hold the reservoir liquid and to prevent it from moving. The crystal is then sucked up into the transfer capillary (Fig. 2.3). This frequently requires blowing liquid gently back and forth over the crystal to free it from the surface it grew on. More stubborn cases can be removed by very gently inserting the sharp point of a surgical blade between the crystal and the surface to pry it loose. 
Once the crystal is freely floating, it can be sucked up. The sucker is removed from solution and then a little air space is drawn in. This helps prevent the liquid from being drawn out by capillary action at the wrong time, wedging your crystal between the sucker and the mounting capillary. The sucker is then guided into the capillary, while observing through a dissecting microscope. The crystal is then gently expelled into the capillary and the sucker quickly removed. Some mother liquors are easier to handle than others and some will insist on sweeping the crystal between the sucker and the capillary wall, catastrophically crushing the crystal. This result can be avoided by first placing a band of reservoir liquid in the capillary into which the end of the sucker is inserted; then the crystal can be gently blown out. This leaves a large amount of solution to be removed later. In general, you want to blow out the crystal with as little solution as possible. The next step is to remove the solution around the crystal (Fig. 2.4). A very thin, fine capillary about 0.1-0.2 mm in diameter works the best for removing large amounts of liquid. Use one small enough that the crystal will not fit


FIG. 2.1 Making crystal transfer pipettes. The transfer pipettes or suckers used for mounting crystals are made from 200-µl capillaries or any thin-walled piece of glass about 1/8 in. in diameter. The capillaries are useful because they come with tubing and mouthpieces. The glass is easily softened by holding over a Bunsen burner flame while turning slowly. When the glass softens, it is removed from the flame and the ends pulled apart. Hold it still for a moment to cool and then put it aside. Then draw out several more. Each drawn-out capillary is cut in the middle to form two pieces. The cut capillary is then bent by holding briefly over a flame until the end droops. With this technique you should be able to produce a number of different sizes. The ends are usually tapered and the proper bore can be obtained by cutting them off at the appropriate length.

inside. Start removing the liquid at the edges first. Many liquids with a high surface tension cannot be fully removed this way and require further removal with a strip of thin filter paper. This can be worked up next to the crystal, where it will slowly absorb all the free liquid. Leave a small amount of liquid between the crystal and the capillary to hold it in place. The capillary is then sealed with either dental wax or epoxy. Epoxy has the advantage of being


FIG. 2.2 Preparing X-ray capillary for mounting. If at any time during mounting you want to temporarily seal the capillary, you can simply plug it with a softened piece of Plasticine. This gives you an opportunity to find something you forgot or to take a break. Sometimes it is necessary to allow viscous liquids time to bead up again before you can fully remove them.

cool; there is some danger with dental wax that heating will hurt the crystal. This can be minimized by laying a strip of wet tissue on the capillary over the crystal before the melted wax is applied. Another common mounting technique is to fill the capillary with liquid and float the crystal down into it. The crystal can be picked up in a minipipette and placed into the tube held vertically. Or the crystal can be sucked directly up into the capillary along with the mother liquor. Both methods require a large amount of liquid to be removed before the crystal is ready. However, some liquids are very difficult to dry completely as they tend to stick to the glass, and an excessive amount of time and effort may be required


FIG. 2.3 Crystal mounting.

to dry the capillary. These methods are easier and may also be gentler on the crystal. A disadvantage of the method is the necessity of making an artificial mother liquor with which to fill the capillary. This artificial mother liquor sometimes damages the crystal when it is transferred. The gentlest technique of all is to grow the crystal in the capillary and then remove the growth solution for data collection. This has been necessary for some protein crystals with very high solvent contents.

FIG. 2.4 Drying crystals. (A) Remove large amounts of liquid by drawing up into a drawn-out pipette by capillary action or by sucking. (B) Final drying is done with a thin piece of filter paper. (C) The crystal should have a small amount of liquid to keep it wet and to help it adhere to the capillary walls.

Drying Crystals

How dry does the crystal need to be? This depends upon the particular protein, and therefore requires experimentation. If some crystals are left too wet, they will dissolve slowly. If too dry, some crystals may crack. If a low-temperature apparatus is to be used, then temperature gradients may cause the liquid to distill around the capillary, either cracking or dissolving the crystal. To prevent such damage, use a short capillary with as little free liquid as possible. A piece of filter paper may be used to wick solution around the capillary and reequilibrate it. Polyethylene glycol solutions are amazingly tenacious, and a layer that slowly beads up around the crystal will remain bound to the glass, destroying your careful drying work. One remedy for this is to place the unsealed capillary in a sandwich box with a reservoir of crystallization liquid to keep the crystal wet while you wait about half an hour for the liquid residue in the capillary to draw up around the crystal and rewet it. This time when you remove the liquid the crystal will stay dry and you can seal it. The sandwich


box is also handy to have to give yourself a break if you think that the crystal is getting too dry. You can reequilibrate the crystal in the box before continuing.

Preventing Crystal Slippage

Crystals are held in place by the surface tension of the thin film of liquid between the crystal and the capillary wall. In most cases this is adequate, but sometimes the crystals will slip slowly or suddenly. Some methods of data collection are gentler than others and the crystal is less likely to slip. Any slippage of a crystal during data collection is a problem. The crystal can leave the center of the X-ray beam or it can rotate, changing the pattern of the diffraction. There are several ways to avoid this situation. First, dry the crystal thoroughly; excess liquid around the crystal encourages slipping. (See the preceding remark on viscous liquids and drying crystals.) The slippage may be due to excess liquid that builds up around the crystal after data collection has begun. For instance, if you use a low-temperature device there may be a temperature gradient along the capillary that causes water to distill from one end of the capillary to the other. This changes the vapor equilibration point at your crystal and can cause it to get wetter. In the worst cases a bead of liquid may form above the crystal and slip down onto it and dissolve it. To avoid this, keep the capillary as short as possible and put a wicking material such as filter paper in the capillary to encourage reequilibration of the liquid. Second, the crystal may be held in place mechanically with fibers. This should be a last resort, as material used to hold the crystal in place will add to the background scattering. Pipe-cleaner fibers have been found to be useful for this purpose. A third method is to glue the crystal in place with a glue that dries in a thin film over the surface of the crystal and cements it into place. The glue and the method used are described by Rayment.² Finally, freezing the crystals as described in Chapter 6 will prevent crystal slippage. Also consider the shape of the capillary relative to the surface of the crystal you are mounting. If the crystal has a flat face, then mounting inside a large-diameter capillary will provide a better contact between the crystal and the glass (Fig. 2.5). Conversely, a small capillary may be better suited to a crystal with many facets that presents a more curved surface. In fact, by floating crystals down capillaries filled with liquid, it is possible in extreme cases of slippage to wedge the crystals into the capillary where the glass tapers.

² Rayment, I. (1985). In Methods in Enzymology, Vol. 114, pp. 136-140. Academic Press, San Diego.


FIG. 2.5 Choose the capillary size to fit the shape of the crystal.

2.2 OPTICAL ALIGNMENT

The next steps will be made easier if the crystal is first aligned optically. This is accomplished using a special goniometer stand called an optical analyzer and the dissecting scope. The crystal in the capillary is placed on a goniometer head that is, in turn, mounted on the optical analyzer. The first operation is to find the center of rotation of the analyzer with respect to the microscope reticules (Fig. 2.6). Rotate the analyzer to 0° and note the

FIG. 2.6 Steps in centering a crystal on a camera using the crosshairs as guidelines. The view through the microscope is shown in three steps A, B, and C. The rotation axis is horizontal and the direct beam passes through the crystal vertically. The microscope crosshairs in this example are not perfectly aligned with the rotation axis of the camera. (A) View at 0°. (B) View at 180°. The translation on the goniometer is moved so that the crystal is halfway between the two positions observed at 0° and 180°. (C) The final position after correction: the crystal will now be in the identical position at both 0° and 180°. Note that the crosshairs do not go through the center of the crystal. The center is not defined by the crosshairs but by the center of rotation. If the crosshairs and the center do not coincide, the crosshairs should be adjusted to facilitate future alignments. Never assume the crosshairs are centered unless it has been done recently, since high-power microscopes can become misaligned easily, especially if they are frequently moved.


position of the crystal, then rotate it to 180° and note the position. The center is the midpoint between these two positions. The crystal can be translated by using the slide on the goniometer head to move it to the midpoint. Another check of 0° and 180° is usually needed to fine-tune the centering. Repeat these steps for 90° and 270°. Note that the center is defined by the range of motion as the axis is rotated and not by any particular point in the microscope. The most common source of frustration in alignment is to assume that the crosshairs on a piece of equipment correspond to the center of rotation. Never make this assumption: the center is that position at which the crystal does not move when rotated. If the crystal has a definable axis, you may want to align it with the rotation axis. This is done by comparing the views at 0° and 180° and adjusting the arcs on the goniometer until the crystal axis in both views is in the same position. This is then repeated for 90° and 270°. The third alignment that needs to be made is the position of the crystal faces relative to 0°. In general, crystal axes either are perpendicular to a face or they pass through an edge (Fig. 2.7). If the goniometer head provides a z-rotation,

FIG. 2.7 Crystal axes relative to faces. (A) Faces are parallel to the crystal axes shown on the left. (B) Axes pass through the edges of the crystal and the faces are diagonals of the unit cell. Often crystals are combinations of both.


you may want to rotate the crystal with the optical analyzer held at 0° until either a face or an edge is directly facing you. Draw a sketch of your crystal including its dimensions at both 0° and 90° on the optical analyzer. Include two lines for the walls of the capillary. This drawing will be very useful later when you want to correlate diffraction information with crystal morphology (Fig. 2.8). Other optical properties of the crystal may be noted in polarized light. Protein crystals are birefringent and will appear brightly colored in polarized light. The polarizers should be oriented so that one is below the sample and the other above. The top polarizer is then rotated until the field becomes dark. The crystal should then appear bright, since it further rotates the polarized light. As the crystal is rotated there will be four positions where the crystal will appear dark, and it will have maximum brightness in between. The exception is cubic crystals, which are so symmetric that they appear dark in all directions. If one is looking exactly down a symmetry axis of the crystal that is centrosymmetric in projection (a centric zone), then the crystal will not be birefringent and will always appear dark. By noting these directions and comparing them with the external morphology and X-ray photographs, you may be able to identify the directions of the crystal axes. This can be very useful in mounting the next crystal, especially if data need to be collected in a specific direction. The quality of the crystal may be judged to some degree by the brightness in the dark field. For instance, if soaking in an inhibitor or heavy atom destroys the crystalline order, the crystal will become dark. In other cases it may be possible to see a dividing line between twinned sections of the crystal, although one must be cautious because a change in thickness will have the same effect. In the best cases, twins may be cut apart by applying a sharp scalpel to the line that joins the crystals. A more complete description of optical properties of protein crystals may be found in Blundell and Johnson.³

FIG. 2.8 Example of a sketch of crystal on X-ray camera. This rough drawing of the crystal is as it appears 90° apart on the camera, with the probable positions of the axes indicated. Later this sketch can be compared with X-ray photographs to determine the equivalence between morphology and crystal axes. (The abbreviation hν is shorthand for photons, i.e., the X-rays.)

TABLE 2.1 Some X-Ray Sources

X-ray source    Wheel diameter (in.)    Brilliance    Cost ($)
Sealed tube     -                       1             15K
GX-20           4                       4             100K
GX-13           18                      12            150K
Storage ring    -                       10²-10⁴       -
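The centering rule of Section 2.2 (the center of rotation is the midpoint of the crystal positions seen at 0° and 180°, not the crosshair) can be written as a two-line calculation. The sketch below is only an illustration: the function name and the idea of reading positions off the reticule scale in arbitrary units are assumptions, not part of any instrument's software.

```python
# Minimal sketch of the midpoint rule for centering a crystal.
# Positions are readings along the reticule scale, perpendicular to the
# rotation axis, in whatever units the reticule provides (an assumption).


def centering_correction(pos_at_0, pos_at_180):
    """Return (center, shift).

    center: position of the rotation axis on the reticule scale.
    shift:  translation to apply to the goniometer slide while viewing
            at 180 degrees so that the crystal lands on the axis.
    """
    center = (pos_at_0 + pos_at_180) / 2.0
    shift = center - pos_at_180
    return center, shift


# Example: the crystal is seen at 12.0 divisions at 0 degrees and at
# 8.0 divisions at 180 degrees.
center, shift = centering_correction(12.0, 8.0)
print(f"rotation axis at {center} divisions; move crystal by {shift} divisions")
```

The same calculation is repeated with the readings at 90° and 270° for the second slide. Note that the crosshair position never enters the calculation, which is the point made in the text.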

2.3 X-RAY SOURCES

X-rays for protein crystallography are produced by two methods. The first method is to accelerate electrons at high voltage against a metal target, and the second is to use synchrotron radiation emitted by electrons and positrons in high-energy storage rings. Laboratory sources are limited to the former, whereas the latter is available at several international facilities. A brief comparison of sources is in Table 2.1. Laboratory sources fall into two types: sealed tube sources and rotating anode sources (Fig. 2.9). Both produce X-rays by accelerating electrons through a high voltage of 40-50 kV onto a metal target. The factor limiting the power at which the source can operate is the rate at which heat can be removed from the target. A typical sealed tube cannot operate at more than 20 mA of current or 0.8 kW at 40 kV. A rotating anode can reach 8 kW. A brighter source is better because radiation damage to protein crystals is not linear with dose but is a combination of dose and time. Once the crystal has been exposed, a series of chemical reactions starts that eventually damages the crystal. These are triggered by the ionization resulting from the radiation. Higher powers

³ Blundell, T. L., and Johnson, L. N. (1976). Protein Crystallography, pp. 98-104. Academic Press, San Diego.


FIG. 2.9 Sealed tube and rotating anode X-ray sources. Both sealed tubes and rotating anodes generate X-rays by accelerating electrons against a metal target. The electrons are boiled off a filament that is heated by a filament current. A potential difference of 40 kV between the filament and the target produces the electron acceleration. A beryllium window lets the X-rays out. The power of the source is limited by the amount of heat the source can dissipate without melting the target. In the sealed-tube case the current is 12 mA, which produces 480 W. More heat can be dissipated by rotating the anode at high speed, which gives the rotating anode a current of 200 mA, which produces 8000 W.

do not linearly increase the rate at which this damage occurs, and, therefore, more useful data can be collected before the crystal is irreversibly damaged. Another factor in favor of the rotating anode is that many data collection installations are swamped with projects and the faster speed is needed to satisfy demand. The choice of the metal target determines the characteristic wavelength at which the X-rays are emitted. The most common choice is a copper target, which emits at 1.5418-Å wavelength (CuKα). This wavelength is a good compromise between maximum achievable resolution and absorption. The other factor in favor of copper is its superior heat-conducting properties that allow it to be used at higher powers. In fact, most other metals used in targets are actually overlaid onto a copper base. Copper X-ray radiation is a soft X-ray: it will not penetrate very far through most materials, is absorbed quickly by air, and is scattered efficiently by air, water, and glass. The path length through air must be minimized; if the distance from the crystal to the detector is over 150 mm, a helium box should be used. At shorter distances,


the windows of the helium path absorb about as much as, or more than, is recovered by the helium path. Because copper X-rays are scattered by air, the free air path length of the direct beam should be minimized to prevent excessive background scatter. This means placing the collimator and the beam stop as close to the crystal as possible. The amount of glass and water surrounding the sample must be kept to a minimum; this will greatly increase the signal from the sample by reducing both background scatter and absorption of the diffracted rays. Phillips⁴ has an excellent discussion of X-ray sources and their optimization for protein work. In addition to laboratory sources, synchrotron radiation sources are used. Since they may be extremely bright, they make ideal sources for characterization of small crystals. More information is given later on synchrotron sources, and a full discussion of synchrotron sources, including a listing of available sources and the equipment at each, can be found in Helliwell.⁵ Optics at synchrotrons are also generally superior to those found in the laboratory. This is due partly to cost, but also the brilliance of the synchrotron beam means that more can be thrown away and still leave a very bright beam. This should be remembered when one is characterizing a new crystal, as even very small crystals (around 50 µm) can produce diffraction patterns with the very bright beam available at a synchrotron source.
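The operating figures quoted in this section can be reproduced with a few lines of arithmetic. The sketch below computes the tube power from voltage and current, the short-wavelength limit from the relation λ = 12,398/V given in Fig. 2.10, and the fraction of CuKα radiation surviving a given air path. The linear attenuation coefficient for air (about 0.012 per cm, that is, roughly 1% loss per centimeter) is an approximate value assumed here for illustration and is not taken from the text.

```python
import math

HC_EV_ANGSTROM = 12398.0        # hc in eV*Angstrom, as quoted in Fig. 2.10
AIR_MU_PER_CM_CU_KALPHA = 0.012  # ASSUMED approximate attenuation of CuK-alpha in air


def tube_power_watts(voltage_kv, current_ma):
    """Electrical power delivered to the target (kV * mA = W)."""
    return voltage_kv * current_ma


def short_wavelength_limit_angstrom(voltage_kv):
    """Shortest wavelength in the white radiation for a given tube voltage."""
    return HC_EV_ANGSTROM / (voltage_kv * 1000.0)


def air_transmission(path_mm):
    """Fraction of CuK-alpha intensity transmitted through path_mm of air."""
    return math.exp(-AIR_MU_PER_CM_CU_KALPHA * path_mm / 10.0)


sealed_tube_w = tube_power_watts(40, 20)      # sealed-tube limit quoted in the text
rotating_anode_w = tube_power_watts(40, 200)  # rotating anode
lambda_min = short_wavelength_limit_angstrom(40)
transmitted_150mm = air_transmission(150.0)

print(f"sealed tube: {sealed_tube_w} W, rotating anode: {rotating_anode_w} W")
print(f"short-wavelength limit at 40 kV: {lambda_min:.3f} Angstrom")
print(f"fraction transmitted through 150 mm of air: {transmitted_150mm:.2f}")
```

With the assumed coefficient, a 150-mm air path loses a noticeable fraction of the diffracted intensity, which is in line with the recommendation to use a helium box beyond that distance.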

Nickel Foil Filtering

The X-ray radiation as it comes from the tube cannot be used for data collection without some filtering. The spectrum of a copper source consists of two main peaks at CuKα and CuKβ and also has white radiation at both higher and lower energies than the characteristic radiation. The simplest filter to remove the CuKβ radiation is a piece of nickel foil (Fig. 2.10). This will remove most but not all of the Kβ without attenuating the Kα overly much. A piece of foil 0.0005 in. thick is used as a filter⁶ and is mounted in a holder that makes it easily removable.
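To give a feel for how selective the nickel foil is, the following sketch applies the exponential absorption law to the 0.0005-in. foil mentioned above. The mass attenuation coefficients (about 49 cm²/g for CuKα and about 280 cm²/g for CuKβ in nickel) and the nickel density (8.9 g/cm³) are approximate values assumed for illustration; they are not given in the text and should be checked against tabulated data before being relied on.

```python
import math

NI_DENSITY_G_PER_CM3 = 8.9   # ASSUMED approximate density of nickel
MU_RHO_NI_CU_KALPHA = 49.0   # ASSUMED approximate mass attenuation, cm^2/g
MU_RHO_NI_CU_KBETA = 280.0   # ASSUMED approximate mass attenuation, cm^2/g
CM_PER_INCH = 2.54


def transmission(mass_attenuation_cm2_per_g, density_g_per_cm3, thickness_cm):
    """Fraction of intensity transmitted through a foil (exponential law)."""
    return math.exp(-mass_attenuation_cm2_per_g * density_g_per_cm3 * thickness_cm)


foil_thickness_cm = 0.0005 * CM_PER_INCH  # foil thickness quoted in the text

t_alpha = transmission(MU_RHO_NI_CU_KALPHA, NI_DENSITY_G_PER_CM3, foil_thickness_cm)
t_beta = transmission(MU_RHO_NI_CU_KBETA, NI_DENSITY_G_PER_CM3, foil_thickness_cm)
selectivity = t_alpha / t_beta  # improvement in the K-alpha to K-beta ratio

print(f"K-alpha transmitted: {t_alpha:.2f}")
print(f"K-beta transmitted:  {t_beta:.3f}")
print(f"K-alpha/K-beta ratio improved about {selectivity:.0f}-fold")
```

Under these assumptions roughly half of the Kα passes through while only a few percent of the Kβ survives, consistent with the statement that the foil removes most but not all of the Kβ.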

Filtering by Monochromators

A single-crystal monochromator can be used to filter the X-rays. It produces a cleaner output than nickel foil, removing more high-energy radiation than the nickel foil lets pass through. Any radiation that is absorbed by the

⁴ Phillips, W. C. (1985). In Methods in Enzymology, Vol. 114, pp. 300-316. Academic Press, San Diego.
⁵ Helliwell, J. R. (1992). Macromolecular Crystallography with Synchrotron Radiation, pp. 94-135. Cambridge University Press, Cambridge; for addresses of contacts, see http://ppclI.scripps.edu.
⁶ Stout, G. H., and Jensen, L. H. (1989). X-Ray Structure Determination: A Practical Guide, 2nd ed., p. 12. Wiley, New York.


FIG. 2.10 Production of X-rays. Electrons are accelerated in a vacuum to strike a copper target. (1) The electron acceleration voltage is below the threshold for characteristic radiation, and the X-rays produced have wavelengths (λ) corresponding to the energy of the electrons: λ = hc/eV = 12,398/V (λ in Å, V in volts). (2) When the electrons exceed a certain voltage, they will have enough energy to displace a K-shell electron. X-rays are then produced as electrons fall from the L-shell (Kα) or M-shell (Kβ), producing sharp peaks of X-rays superimposed on the white radiation. (3) Nickel foil is used to filter out most of the Kβ radiation. The nickel filter has the absorbance curve indicated by the dotted line. It also filters out most of the higher-energy white radiation.

sample and causes damage but does not contribute to the diffraction pattern is wasteful. Experience has shown that monochromators can extend the useful life of a protein crystal's diffraction by removing harmful radiation that is absorbed by the crystal but does not contribute to the diffraction pattern. A good setup with a monochromator, collimator, and beam stop can produce excellent signal-to-noise ratios. Dramatic reductions in the amount of scattered radiation in the background can be made over the nickel foil filter by using a monochromator. However, the optics are still divergent. For large unit-cell crystals where the diffracted spots are very close together, it may be better to use mirrors, as explained in the following section.

Focusing with Mirrors

X-rays can be focused by deflecting at low angle with curved mirrors (Fig. 2.11). Two mirrors are used, one in the horizontal plane and one in the


FIG. 2.11 Two types of X-ray optics. Top: With a collimator the beam is divergent. Bottom: With mirrors, the beam is convergent and is better suited for use with small crystals and large unit cells. Two mirrors are used: one in the horizontal plane and the other in the vertical.

vertical plane.⁷ The biggest advantage of a mirror is that the X-rays can be focused, increasing brilliance and allowing resolution of very large unit cells with closely spaced diffraction spots. The mirrors are not 100% efficient, and a substantial portion of the direct beam is lost, although the overall brilliance is increased. For smaller cells the advantage is lessened; indeed, for larger crystals with small cells, mirrors can so focus the diffraction spot as to saturate the detector locally, which can be alleviated by defocusing. If you have a large unit cell (> 150 Å), or very small crystals (< 0.3 mm in largest dimension), then mirrors are definitely indicated. Mirrors will not reflect the higher energy X-rays but do reflect the lower energy X-rays that, because they are efficiently absorbed by the sample, are thought to cause much of the radiation damage to protein crystals.

Increasing Brilliance with Mirrors

Another increasingly common use of mirrors is to focus a large beam area onto a small area at the crystal and thus increase the brilliance of the X-ray beam, while at the same time the mirrors monochromate the beam. These systems can increase the number of counts in measured diffraction spots up to 20 times. The chief drawback is that the beam is convergent, and thus the d-spacing that can be resolved is limited to about 200-250 Å. Since most protein crystals have cells smaller than this, the mirrors are almost certainly useful in most labs. Examples of this type of mirror are the Max-Flux® confocal mirrors from Osmic (www.osmic.com), which use two graded multilayer mirrors joined at their edges to form a compact mirror system that is relatively simple to align and adjust. The d-spacing is graded across the length of the mirror to match the changing angle of the incident radiation, and thus a large angular range of X-rays is captured and focused onto the sample.

⁷ Phillips, W. C., and Rayment, I. (1985). In Methods in Enzymology, Vol. 114, pp. 316-330. Academic Press, San Diego.

2.4 PRELIMINARY CHARACTERIZATION

There is nothing more exciting than the first diffraction photos from a new protein crystal. There is nothing more disappointing than to discover that the crystal is really salt! The goals of a preliminary characterization are to discover if the crystal is protein, to what resolution it diffracts, the space group of the crystal, the quality of the crystal, and the relationship of the visible morphologies to the unit-cell axes. While not strictly necessary, a good precession photo contains much information on the quality of the crystal and can greatly aid space group identification. In many cases you can proceed directly to data collection and rely on the software to identify the space group. In difficult cases, a good set of precession photos may resolve difficulties in assigning symmetry and screw axes. The film orientation in the cassette should be standardized to facilitate alignment. We cut the upper right corner as the film is placed in the cassette and always clip the film holder to the top when it is processed. This allows the top, bottom, left, and right sides of the film to be distinguished. The film can be labeled with a #2 pencil to uniquely identify it. The label will be visible after the film has been developed. It is important to keep a careful record of each film. Note the camera and goniometer angles, the exposure time, the exposure type, and the amount of rotation. Make a rough sketch of the crystal that shows how the crystal morphology relates to the camera axes and the direct beam. After each still exposure, mark the direct beam position on the film by closing the shutter, rotating the beam stop out of the way, and opening the shutter for the briefest time possible. This defines the center and allows the alignment angles to be calculated accurately. The goniometer carrying the crystal is placed on the precession camera and aligned using the microscope mounted on the precession camera. 
Never assume the goniometer will mate exactly the same with the camera as it did with the optical analyzer. Rotate the spindle until it is at 0° and set the precession angle to 0°. The first shot is a "still" photo taken without the precession motion. If the crystal's smallest dimension is larger than 0.1 mm, usually a 30-min exposure on a rotating anode is sufficient. Small crystals require longer exposures. The exposure time is also dependent upon the unit-cell size. Smaller cells have fewer spots, with correspondingly more energy in each. For a first photo you will not know the unit-cell size, so start with a large collimator and take a 30-min exposure.

Examine this film for diffraction spots. A small-molecule film is shown in Fig. 2.12 for comparison with a protein film. The protein will have more diffraction spots that fall into families of rings. There will be spots at lower resolutions (nearer the center), and since each spot is usually weaker, it will not be as streaked as in the salt case, which has a white radiation streak. If there are no spots on the film, check the crystal alignment and make sure that X-rays are coming out of the collimator and that the camera is aligned. If all these things check out, try a long exposure of several hours. If there is still nothing, you may have damaged the crystal during mounting and it may be worth trying again. In the meantime you may wish to verify that the camera is working properly, using a crystal that is well characterized. Many laboratories keep a fresh lysozyme crystal around for this purpose.

The spots on the photo should suggest a family of concentric circles, commonly referred to as a "zone." The center of the circles may correspond to a principal axis of the crystal but may also be a diagonal of the lattice. The circles may not be well populated and may not be obvious at first glance. If there are no obvious zones, then rotate φ by 30°-45° and take another photo. If there still are no zones, try 20°. If there still are no zones, you are not looking closely enough.

A piece of acetate with concentric circles scribed onto it with a compass can be overlaid onto the film to make the rings more obvious and to find the center. Mark the center and measure the distance from the direct-beam spot in the vertical and horizontal directions (Fig. 2.13). The correction needed is tan⁻¹(Δ/F), where Δ is the vertical or horizontal distance and F is the crystal-to-film distance (Fig. 2.14). The vertical correction is applied to the spindle and the horizontal correction to the horizontal arc of the goniometer. Move the crystal so that an imaginary line from the crystal to the center of the zone aligns with the direct beam. After you adjust an arc, always remember to check the centering of the crystal and, if necessary, correct the goniometer translations.
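The still-photo correction tan⁻¹(Δ/F) is simple to evaluate by hand or from the chart in Fig. 2.14. As a minimal sketch, it can also be computed directly. The function name and the example offsets below are illustrative assumptions, not values from the text.

```python
import math


def arc_correction_deg(offset_mm: float, film_distance_mm: float) -> float:
    """Angular correction in degrees for a zone center displaced by
    offset_mm from the direct-beam mark, at crystal-to-film distance
    film_distance_mm: the tan^-1(delta / F) relation."""
    return math.degrees(math.atan2(offset_mm, film_distance_mm))


if __name__ == "__main__":
    # Hypothetical measurements from a still photo taken at F = 100 mm.
    vertical_mm, horizontal_mm, film_mm = 12.0, 5.0, 100.0
    print(f"spindle correction:        {arc_correction_deg(vertical_mm, film_mm):.1f} deg")
    print(f"horizontal arc correction: {arc_correction_deg(horizontal_mm, film_mm):.1f} deg")
```

The sketch returns only the magnitude of each correction. The direction must still be chosen so that the line from the crystal to the zone center moves onto the direct beam.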

Precession Photography

After the crystal has been aligned using zones, a small-angle screenless precession photo, of 2° or 3°, is taken to find the center more accurately. This film

FIG. 2.12 X-ray photos of a small-molecule crystal, sucrose, shown as an example of a buffer or salt crystal. (A) X-ray still photo of sucrose crystal taken with a copper rotating anode source at a crystal-to-film distance of 100 mm without any filters for 5 min. The spots are bright and spaced far apart, and there are no pieces of lattice visible. (B) Precession photo (3°) of the same crystal, again without a filter. There is still no lattice visible as there would be for most protein crystals. Note the streaky character of the spots. Similar streaks appear with protein crystals, but they are rarely bright enough to see.


FIG. 2.13 Crystal alignment film. This diagram illustrates the method of aligning a crystal by means of a still photo. The spindle axis of the camera is horizontal and the crystal-to-film distance has been set to 100 mm. C, the center of the camera, has been marked by a brief exposure with the direct beam. The distance from the center of the camera to the center of concentric rings of diffraction spots, R, is measured in the horizontal and vertical directions. The vertical correction that is applied to the spindle is tan⁻¹(S/100), and the horizontal correction to be applied to the horizontal arc on the goniometer head is tan⁻¹(H/100), where S and H are measured in millimeters (mm).

should show a circle in the center with a plane of the reciprocal lattice. The center of the camera is at the center of this pattern. If the pattern is perfectly centered, then the crystal is aligned. If not, the error is measured in the horizontal and perpendicular directions as defined in Fig. 2.15, and the proper correction is applied by looking up the error on a chart to find the corresponding angular correction or by using the following equation, which is accurate for small errors and small precession angles. Given Δ (the error in millimeters), the angular missetting ε (in minutes) is given by ε ≈ Δ × 8.5′ × 100/F. Be careful to measure the errors from where the principal axes meet and not the center of the beam stop. A check photograph at the same precession angle is taken to confirm the corrections. Depending upon how practiced you are at making the corrections, this photo will show either no need for corrections or that small ones are necessary. Repeat the correction process as required. This procedure is illustrated in Fig. 2.16.
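The small-angle missetting relation can be tabulated quickly for the film distances in common use. The following is a rough sketch only, valid under the same small-error, small-precession-angle assumption as the equation itself. The function name and the example error are hypothetical.

```python
def missetting_arcmin(error_mm: float, film_distance_mm: float) -> float:
    """Angular missetting in minutes of arc from the film error in mm,
    using the approximation: error * 8.5' * 100 / F."""
    return error_mm * 8.5 * 100.0 / film_distance_mm


if __name__ == "__main__":
    error_mm = 2.0  # hypothetical error measured on the film
    for film_mm in (100.0, 75.0, 60.0):
        minutes = missetting_arcmin(error_mm, film_mm)
        print(f"F = {film_mm:5.1f} mm: {minutes:5.1f} arcmin ({minutes / 60.0:.2f} deg)")
```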

[Plot, panels A and B: angular correction (Angle, vertical axis) versus film error (Error, mm, horizontal axis), with curves labeled 60, 75, and 100 mm.]

FIG. 2.14 Film error versus angular correction for F = 100, 75, 60 mm at small (A) and large (B) angles.

If the spindle is near one of the cardinal points, 0°, 90°, 180°, or 270°, then the corrections to the arcs are straightforward. The most horizontal arc is corrected and the other arc is left alone. If, however, the spindle is off by more than about 10°, corrections need to be made to both arcs. This condition is known as "crossed arcs." Subtract the spindle angle from the nearest cardinal value and take the sine and the cosine. The most horizontal arc is moved by the correction times the cosine, and the other arc is moved by the correction times the sine. For example, if the spindle is at 75° (after correcting the vertical error) and an angular correction of 12° on the horizontal plane is called for, then the arc closest to horizontal is corrected by cos(90 - 75) × 12, or 11.6°, and the other arc by sin(90 - 75) × 12, or 3.1°. The directions are chosen such that an imaginary line from the crystal to the zone on the film is moved to the center.
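The crossed-arcs arithmetic can be sketched in a few lines. This is an illustration of the cosine and sine split described above, not a replacement for checking the directions on the camera. The function name is an assumption, and only magnitudes are returned.

```python
import math

# Cardinal spindle settings; 360 is included so that angles near 360 wrap to 0.
CARDINALS = (0.0, 90.0, 180.0, 270.0, 360.0)


def crossed_arc_corrections(spindle_deg: float, correction_deg: float) -> tuple[float, float]:
    """Split a horizontal-plane correction between the two goniometer arcs.

    Returns (most_horizontal_arc, other_arc) in degrees: the correction times
    the cosine and the sine of the spindle offset from the nearest cardinal point.
    """
    spindle = spindle_deg % 360.0
    nearest = min(CARDINALS, key=lambda c: abs(c - spindle))
    offset = math.radians(abs(nearest - spindle))
    return correction_deg * math.cos(offset), correction_deg * math.sin(offset)


if __name__ == "__main__":
    # Worked example from the text: spindle at 75 degrees, 12 degree correction.
    main_arc, other_arc = crossed_arc_corrections(75.0, 12.0)
    print(f"most horizontal arc: {main_arc:.1f} deg")
    print(f"other arc:           {other_arc:.1f} deg")
```

With the spindle at 75° and a 12° correction, this reproduces the 11.6° and 3.1° of the worked example.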


FIG. 2.15 Measuring Δ on a precession photo.

The camera is then set up to take a screened precession photo of 12° to 15°. A chart that comes with the precession camera can be used to find the correct screen and screen distance for the precession angle, or you may use the formula

$$CS = \frac{r}{\tan \mu}$$

where CS is the crystal-to-screen distance, r is the radius of the screen, and μ is the precession angle. A simple shortcut is to set the crystal-to-screen distance to 57 mm, at which distance r in millimeters is approximately equal to the precession angle in degrees. Thus, for an 8° precession picture, the radius to use is 8 mm with a screen distance of 57 mm. These values can be confirmed with the equation. The camera is then set to these values and the exposure taken.

The exposure time needed for a screened precession is very long because most of the diffraction is blocked by the screen and only the small area of the annulus is exposed at any given time. The exposure time can be estimated from the time used for the small-angle photo by comparing the areas to be exposed. To a first approximation, the area at small angles is proportional to the square of the angle. So if a 3° precession required 1 h for a good exposure, then a 15° precession will require 1 h × (15 × 15)/(3 × 3), or 25 h.

The precession angle is equal to the θ of the highest-angle reflection that will be recorded. A 15° photo will correspond to a 2θ of 30°, or a 3-Å resolution photo for CuKα radiation (Fig. 2.17). An example of a 15° precession photo is shown in Fig. 2.18.⁸
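The three estimates in this passage (screen radius, exposure scaling, and resolution limit) can be checked numerically. The sketch below assumes the screen relation r = CS × tan μ, the square-law exposure approximation, and Bragg's law with θ equal to the precession angle. The function names are illustrative.

```python
import math

CU_K_ALPHA = 1.54  # wavelength of CuK-alpha radiation in angstroms


def screen_radius_mm(screen_distance_mm: float, precession_deg: float) -> float:
    """Screen radius r for a given crystal-to-screen distance: r = CS * tan(mu)."""
    return screen_distance_mm * math.tan(math.radians(precession_deg))


def scaled_exposure_h(ref_hours: float, ref_angle_deg: float, new_angle_deg: float) -> float:
    """Exposure estimate, assuming exposed area grows as the square of the angle."""
    return ref_hours * (new_angle_deg / ref_angle_deg) ** 2


def resolution_limit_A(precession_deg: float, wavelength_A: float = CU_K_ALPHA) -> float:
    """Bragg's law d = lambda / (2 sin(theta)), with theta = precession angle."""
    return wavelength_A / (2.0 * math.sin(math.radians(precession_deg)))


if __name__ == "__main__":
    print(f"r at CS = 57 mm, mu = 8 deg: {screen_radius_mm(57.0, 8.0):.2f} mm")
    print(f"15 deg exposure from 1 h at 3 deg: {scaled_exposure_h(1.0, 3.0, 15.0):.0f} h")
    print(f"resolution at mu = 15 deg: {resolution_limit_A(15.0):.2f} A")
```

The 57-mm shortcut works because tan μ is close to μ in radians at small angles, and 57 mm is close to 180/π.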

⁸ For more information on precession geometry, see Blundell and Johnson, pp. 260-269.

Rotation Photography

Rotation photography, also called oscillation photography, is done by rotating the crystal about a single axis while the photo is taken. A 1° rotation image is an exposure taken while the crystal is rotated through 1° on the spindle axis. Rotation photography is more efficient than precession photography, since there is no screen and all the diffracted photons are recorded. Rotation photography has been used extensively for data collection and we will not dwell on the details here because there is much material already written on the subject.

Figure 2.19 shows a rotation image taken at a synchrotron source. Note that the pattern is of concentric, nearly circular regions called lunes. The bottom edge of each lune corresponds to the start of the rotation, and the top edge of the lune is the end of the rotation. The spots in between are "swept out" as the crystal rotates (Fig. 2.20). Each lune arises from a single plane of reciprocal space. Compare this to a precession photo, where the entire photo is of a single plane in reciprocal space. In the rotation image several planes are each partially developed. To collect an entire data set, more rotation images, each adjacent to the other in rotation angle, are taken until the crystal has been rotated through enough reciprocal space to collect all unique data. The spots at the edge of the lunes are only partially developed, and, when integrated, either are added to the corresponding partial on the adjacent film or are ignored.

To maximize the number of whole, integrable spots, a large rotation angle is desired. At some point, however, the spots will start to overlap on adjacent lunes. This maximum range is resolution dependent. The farther out from the center of the film, the shorter the angle that can be rotated before spots start to overlap. In Fig. 2.19 the lunes at the top and bottom of the film are just starting to overlap. The other limit to the rotation is the amount of background that can be tolerated. The background, given a constant rotational speed, is proportional to the amount of rotation. Thus, a 2° film has twice the background of a 1° film. Each individual data spot, on the other hand, is the same intensity, as long as it is a whole spot, regardless of the rotation angle. To maximize the ratio of signal (spot intensity) to noise (background), more films of shorter angle need to be taken. Balanced against this is the fact that short rotations require more films, more film handling (which is very time-consuming), and more partial spots. These conflicting needs must be reconciled for each individual data collection.

The maximum rotation angle depends upon three factors:

1. Unit-cell size. The larger the unit cell, the smaller the rotation angle. Only the axes perpendicular to the rotation axis are limiting. The axis that the crystal is rotated about is not limiting.
2. Spot size or mosaic spread. The width of the spot in the rotational direction, typically 0.1° to 0.5°, depending upon crystal quality and X-ray optics.
3. Resolution. The higher the resolution desired, the shorter the permissible rotation angle. The maximum overlap will occur in the area of the


FIG. 2.16 Aligning a precession photograph. All of the photographs were taken at a crystal-to-film distance of 100 mm. (A) The first step is to take a still photograph. This is a photograph for iron-binding protein that has a 110 × 110 × 200 Å³ unit cell. Note the concentric rings centered slightly up and to the right. This film is corrected as described in the text and in Fig. 2.13. (B) After the corrections have been applied, a small-angle screenless precession photo is taken to fine-tune the alignment. The direct beam was marked on this photo by moving the beam stop out of the way with the X-rays off, and then the shutter was opened as briefly as possible. This marks the center and makes determining the correction easier. Note that the beam stop is not centered. In general, the beam stop is never a reliable indicator of the center of the film. The error is measured as described in the text and in Fig. 2.15. This precession photo was taken at 1.5° precession angle because of the large cell spacing of 200 Å in the direction perpendicular to the film. Smaller cells are usually taken at 2.5°-3° precession angle. (C) Before a higher-angle photo is made, a check precession photo is taken. The alignment can be confirmed by noting the position of related spots on opposite sides of the beam stop. This photo still shows a very small misalignment, but it is near the limit of accuracy that the camera and goniometer can be set to. (D) Finally, a high-angle screened photograph is taken. The precession angle was set to 8°, the crystal-to-screen distance was 58 mm, and an 80mm A 2 screen was used. In the photo the major axes are tilted 45°, and the pattern shows 4-fold symmetry. A precession photograph taken at 90° to this shows every second and third spot missing along the c axis, indicating that the space group is either P4₁ or its enantiomorph P4₃. (Photos courtesy of Dr. Michele McTigue, Scripps Research Institute.)

[Plot: resolution in Å (vertical axis) versus 2θ in degrees (horizontal axis).]

FIG. 2.17 Resolution versus 2θ for CuKα (1.54-Å) source.

FIG. 2.18 Precession photograph (15°). This precession photograph is of E. coli endonuclease III h0l zone. Note the mm symmetry with a mirror on each axis. Also note that every odd spot is missing along the two major axes. The other two zones both show the same symmetry and systematic absence, identifying the space group as P2₁2₁2₁. Photo courtesy of Drs. Chefu Kuo and John Tainer, Scripps Research Institute.


FIG. 2.19 Rotation photograph. A 2° rotation photo of aconitase taken at Stanford Synchrotron Radiation Laboratory on beamline 7.1. The rotation axis is about the horizontal. The horizontal band across the center of the film is due to a piece of Mylar that supports the beam stop. (Photo courtesy of Dr. C. David Stout, Scripps Research Institute.)

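film perpendicular to the rotation axis and at the maximum diffraction angle. The maximum rotation angle can be estimated with the formula

$$\Delta\mathrm{rotation}_{\max} = \tan^{-1}\!\left(\frac{\text{resolution}}{\text{cell edge}}\right) - \text{spot width},$$

where the resolution and the cell edge of the limiting axis are in Å, and the angles are in degrees.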

For example, suppose that we want to collect a 2-Å data set on an orthorhombic crystal with unit cell dimensions a = 100, b = 75, c = 50 Å and, given the optics, the spot width is 0.3°. The crystal is mounted so that a, the longest axis, is along the rotation axis. When we are rotating with c in the plane of the film, the maximum permissible rotation is limited by b, so


FIG. 2.20 Rotation geometry projected down rotation axis. The volumes swept out by two successive rotation photographs are marked 1 and 2.

that tan⁻¹(2/75) - 0.3 = 1.2°, and when c is limiting, the maximum rotation angle is 2.0°. The maximum angle is thus 1.2°. To collect a full data set we need 90° + 2θmax. With CuKα radiation (1.54 Å), 2θ is 45° at 2 Å. So, to collect a full data set without overlaps, we need 135/1.2 or 113 films. Some saving can be made by using 2° for the part of the data set where c is limiting and switching to 1.2° when b is limiting.
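The worked example can be reproduced with a short script. This is a sketch under the assumptions stated in the text: the limiting angle is tan⁻¹(resolution/cell edge) minus the spot width, and the total sweep is 90° + 2θmax. The function names are illustrative. The film count uses the rounded values of the worked example (135° and 1.2°), so it matches the 113 films quoted there.

```python
import math


def max_rotation_deg(resolution_A: float, cell_edge_A: float, spot_width_deg: float) -> float:
    """Largest rotation per film before spots on adjacent lunes overlap."""
    return math.degrees(math.atan(resolution_A / cell_edge_A)) - spot_width_deg


def two_theta_deg(resolution_A: float, wavelength_A: float = 1.54) -> float:
    """Scattering angle 2-theta from Bragg's law for a given d-spacing."""
    return 2.0 * math.degrees(math.asin(wavelength_A / (2.0 * resolution_A)))


if __name__ == "__main__":
    resolution, spot_width = 2.0, 0.3
    limit_b = max_rotation_deg(resolution, 75.0, spot_width)  # b limiting
    limit_c = max_rotation_deg(resolution, 50.0, spot_width)  # c limiting
    print(f"b limiting: {limit_b:.1f} deg, c limiting: {limit_c:.1f} deg")

    sweep = 90.0 + round(two_theta_deg(resolution))  # 90 + 45 = 135 deg
    films = math.ceil(sweep / round(min(limit_b, limit_c), 1))
    print(f"total sweep: {sweep:.0f} deg, films needed: {films}")
```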

Blind Region

Near the rotation axis there is a region of reciprocal space that cannot be collected because the Lorentz correction⁹ is very large. The Lorentz correction accounts for the amount of time that a reflection spends in diffracting conditions. Near the rotation axis this time gets to be very large, and at the limit a reflection directly on the rotation axis is always diffracting. To fill in the data, it is necessary to rotate about another axis in order to sweep out the data near the rotation axis. If you have the crystal mounted on a goniostat, this may mean simply moving chi or phi; otherwise the crystal will have to be remounted. In the absence of symmetry, the crystal will have to be rotated by at least 2θmax, and with mirror symmetry the angle will be θmax.

Another strategy is to rotate with the crystal mounted such that the nearest axis is 20°-30° off the rotation axis. If you are using autoindexing this is the ideal solution, as it maximizes the amount of unique data. Offsetting the axes of the crystals from the rotation axis is always desired for maximizing unique data, but in the past crystals had to be nearly perfectly aligned or the software being used could not index, and therefore could not integrate, the film. For instance, consider the film in Fig. 2.19. It is set so that the mirror symmetry in the vertical direction is along the vertical. This makes for a pretty film, but it also means that a partial on the right half has a corresponding mate on the left. If the crystal were rotated so that the mirror was off the vertical by a few degrees, the corresponding mirror-related reflections would be such that when one was partial the other would likely be whole.

It may be desirable to have mirror symmetry when collecting Bijvoet pairs in order to use an anomalous scattering signal. In this case, the mirror-related reflections may be Bijvoets, and thus collecting them both at the same time with the same geometry may maximize the accuracy of the small difference between pairs. An excellent set of three articles on rotation photography can be found in Methods in Enzymology, including use with large cells, synchrotron radiation, and integration of intensities.¹⁰

⁹ Blundell and Johnson, pp. 319-320.

White-Radiation Laue Photography

In Laue photography, the crystal is held still during the exposure, and multiple wavelengths, or white radiation, are used rather than monochromatic radiation. This gives many more diffraction spots at once than with monochromatic radiation, and recently, with the availability of synchrotron radiation light sources that give off a continuous spectrum of useful X-ray wavelengths, several groups have successfully used this technique to study reaction intermediates. A single Laue photograph of 100 ms exposure can have nearly an entire data set for high-symmetry crystals. Laue photography is very sensitive to the mosaicity of the crystal. Crystals that are too mosaic to use with Laue photography can still be used with monochromatic radiation. A Laue photograph is shown in Fig. 2.21. Note that the pattern is that of many rings. Where these rings intersect is called a nodal. The positions of these nodals are used by the Laue software to orient the crystal by searching through all possible positions and finding the one that best predicts that nodal pattern.

FIG. 2.21 White-radiation Laue photographs. (A) Laue photograph of photoactive yellow protein taken at beam line X26C, National Synchrotron Light Source, Brookhaven National Laboratories. Crystal size: 60 × 60 × 30 μm³. Exposure: 30 ms. The image plate used was scanned on a Fuji BAS2000 scanner. The image plate was placed in a cassette with a rubber front that was then mounted on the X-ray camera by means of a mount similar to those used for film cassettes. (B) Close-up of Laue pattern. The horizontal line across the middle of the image is displayed in cross section at the bottom of the screen. Because of the low background noise of image plates and their high sensitivity, weak diffraction spots are well imaged. In the center just below the horizontal line is a nodal where several circles cross. Laue patterns are indexed by noting the positions of several nodals. The software then searches through all possible orientations for one that matches the list of nodals. Nodals are low-index reflections and are usually energy multiples, hence especially bright. (Photos courtesy of Dr. Keith Moffatt and Zhong Ren, University of Chicago.)

¹⁰ See Methods in Enzymology, Vol. 114: Harrison, et al., p. 211; Rossmann, M. G., p. 237; Fourme, R., and Kahn, R., p. 281.

Space Group Determination

Space groups are determined by examining the diffraction pattern and noting any symmetry and systematic absences. It is necessary to have a copy of the International Tables for Crystallography to look up the symmetry patterns to find the identity and number of the space group. A list of the seven crystal systems and their main features is given in Table 2.2. Since proteins

TABLE 2.2 Protein Space Groups

Crystal system                      Class  Lattice restrictions  Angle restrictions      Space groups
Triclinic                           1      none                  none                    P1
Monoclinic (2-fold parallel to b)   2      none                  α = γ = 90°             P2, P2₁, C2
Orthorhombic                        222    none                  α = β = γ = 90°         P222, P222₁, P2₁2₁2, P2₁2₁2₁, C222, C222₁, F222, I222, I2₁2₁2₁
Tetragonal (4-fold parallel to c)   4      a = b                 α = β = γ = 90°         P4, P4₁, P4₂, P4₃, I4, I4₁
                                    422    a = b                 α = β = γ = 90°         P422, P42₁2, P4₁22, P4₁2₁2, P4₂22, P4₂2₁2, P4₃22, P4₃2₁2, I422, I4₁22
Trigonal (3-fold parallel to c)     3      a = b                 α = β = 90°, γ = 120°   P3, P3₁, P3₂, R3
                                    32     a = b                 α = β = 90°, γ = 120°   P312, P321, P3₁12, P3₁21, P3₂12, P3₂21, R32
                                    (R)    a = b = c             α = β = γ < 120°        (R lattice, rhombohedral axes)
Hexagonal (6-fold parallel to c)    6      a = b                 α = β = 90°, γ = 120°   P6, P6₁, P6₂, P6₃, P6₄, P6₅
                                    622    a = b                 α = β = 90°, γ = 120°   P622, P6₁22, P6₂22, P6₃22, P6₄22, P6₅22
Cubic                               23     a = b = c             α = β = γ = 90°         P23, F23, I23, P2₁3, I2₁3
                                    432    a = b = c             α = β = γ = 90°         P432, P4₁32, P4₂32, P4₃32, F432, F4₁32, I432, I4₁32


are asymmetric objects and occur only in the L-form, they cannot be involved in symmetry elements requiring inversion centers, mirrors, or glide planes. This limits the possible space groups to 65 out of the 230 mathematically possible space groups and leaves 2-, 3-, 4-, and 6-fold axes along with the corresponding screw axes, and centering, as the possible symmetries. We will consider each in turn.

2-Folds. A 2-fold causes the presence of a mirror plane perpendicular to the 2-fold axis in the reciprocal space pattern after the addition of the inversion center (see Fig. 2.22). A 2₁ can be distinguished by the absence of every other spot on the axis that lies in the plane of the mirror. The presence of even one exception to the screw axis absences means the axis is a 2-fold. The 2-fold constrains the angles between the 2-fold and the other two axes to be 90°.

3-Folds. A 3-fold causes 3-fold symmetry in the reciprocal axis except on the 0-layer, where the inversion center raises the symmetry to a 6-fold. Thus it is usually necessary to take an upper-level precession photo to differentiate between a 3-fold and a 6-fold. There are two 3-fold screw axes, the 3₁ and the 3₂. They give identical absences along the 3-fold axis: only every third spot is present. They cannot be distinguished from each other at this point and must be left to distinguish later. In addition, the 3-fold symmetry gives rise to a hexagonal lattice that constrains the a and b axes to be identical and the angles between axes to be 90° between the 3-fold and the other two axes, and to be 120° (60° in reciprocal space) between the two non-3-fold axes.

4-Folds. A 4-fold gives rise to 4-fold symmetry in the diffraction pattern in the plane perpendicular to the 4-fold axis. It also constrains two axes to be identical to each other and all axes to be at 90°. A mirror will be found at the plane passing through the origin and perpendicular to the 4-fold (hk0). There are two possible types of screw axis: 4₁ and 4₃, with only every fourth spot present; and 4₂, with every other spot present on the 4-fold axis.

6-Folds. A 6-fold gives rise to 6-fold symmetry both on the zero level and on upper levels. In addition, a mirror is found at the plane passing through the origin and perpendicular to the 6-fold axis (hk0). Both trigonal (non-rhombohedral) and hexagonal space groups have a hexagonal lattice; the symmetry of the intensities must be used to tell the two apart. The 6-fold axis itself can have three different types of screw: 6₁ and 6₅, with only every sixth spot present; 6₂ and 6₄, with only every third spot present; and 6₃, with only every second spot present on the 6-fold axis. Two pairs of the screw axes, 6₁ and 6₅, and 6₂ and 6₄, can be told apart only at a later stage.
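The axial absences for the screw axes can be collected into a small lookup. This is a minimal sketch, assuming the reflection conditions tabulated in the International Tables (00l present only when l is a multiple of n, with n = 3 for 6₂ and 6₄ and n = 2 for 6₃). The dictionary and function names are illustrative. Given the indices of the axial reflections that are clearly present, the code lists the most restrictive axes still consistent with them.

```python
# Spacing n of allowed axial reflections: a reflection at index l on the
# axis is allowed only when l % n == 0. A pure rotation axis has n = 1.
AXIAL_SPACING = {
    "2": 1, "2_1": 2,
    "3": 1, "3_1": 3, "3_2": 3,
    "4": 1, "4_1": 4, "4_2": 2, "4_3": 4,
    "6": 1, "6_1": 6, "6_2": 3, "6_3": 2, "6_4": 3, "6_5": 6,
}


def consistent_axes(order: int, observed_l: list[int]) -> list[str]:
    """Axes of the given order whose absences allow every observed reflection."""
    prefix = str(order)
    return [
        name
        for name, n in AXIAL_SPACING.items()
        if name.split("_")[0] == prefix and all(l % n == 0 for l in observed_l)
    ]


def most_restrictive_axes(order: int, observed_l: list[int]) -> list[str]:
    """Consistent axes that predict the most absences (largest spacing)."""
    axes = consistent_axes(order, observed_l)
    widest = max(AXIAL_SPACING[a] for a in axes)
    return [a for a in axes if AXIAL_SPACING[a] == widest]


if __name__ == "__main__":
    # Hypothetical observation: only l = 3, 6, 9, 12 are present on a 6-fold axis.
    print(most_restrictive_axes(6, [3, 6, 9, 12]))
    # A single present reflection at odd l rules out a 2_1 screw axis.
    print(most_restrictive_axes(2, [2, 3, 4]))
```

Enantiomorphic pairs such as 6₂ and 6₄ share the same absences, so the function returns both, in line with the remark that they can be told apart only at a later stage.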


[Diagram: symmetry of a 2-fold, a 3-fold, and a 4-fold axis, each shown alone and with an added inversion center ("+ center").]

FIG. 2.22 The effect of adding an inversion center to a 2-, 3-, and 4-fold. On the left is the symmetry in real space with a comma used to indicate an asymmetric object. A " + " and a solid symbol indicate an object above the plane of the page and a " - " and an open symbol indicate an object below. Note in the case of the 2-fold that an inversion center forms a mirror perpendicular to the 2-fold. In projection there are two mirrors and, therefore, the 0 level will also show two mirrors, while an upper level will show only one mirror. An inversion center plus a 3-fold still has the same symmetry as before, although in projection it will appear to be a 6-fold. A 4-fold plus an inversion center adds a mirror in the plane of the paper. A 6-fold (not shown) is similar to a 4-fold.

Rhombohedral. Rhombohedral is a special case of trigonal and is characterized by having all three axes equal and all three angles equal. It is the hardest system to diagnose because of the difficulty in finding the zones and determining their relationships when the axes are not near 90°. In certain cases, C-centered monoclinic may really be R32, so this may be worth checking out.


Centering. Centering can be detected by systematic absences throughout the diffraction pattern. Centering can cause confusion about the direction of the principal axes. Always use the symmetry to determine the lattice. For instance, in a C-centered lattice, spots with h + k = odd are missing. At first inspection, the lattice appears to be running on the diagonals of the cell. Symmetry on the zero level will show the presence of two mirrors that give the correct direction of the two axes. The five types of centering possible are A, B, C, F, and I. A, B, and C centering are identical except for the naming of the axes. The convention is to name the axes such that the cell is C centered, so that h + k = 2n + 1 reflections are missing; that is, the ab face of the crystal is centered. (Thus, A centering has the bc face centered and B centering has the ac face centered.) F is face-centered so that all three faces ab, ac, and bc are missing the odd reflections. I is body-centered so that an extra lattice point is found at the body center of the lattice. This would be easy to miss by precession photography of 0 levels alone, although a picture of a diagonal zone might reveal it.

Always assume that you may have higher symmetry than you do until proven otherwise. Never trust 90° angles unless confirmed by symmetry. Never use low resolution photographs to decide the space group. Always keep an open mind about the space group until the structure has been solved and refined.

To determine the correct space group it is necessary to take enough precession photos to determine the symmetry elements present, any systematic absences along the axes, and any centering. The photos are then compared with the diagrams in the International Tables. The tables are grouped according to the highest symmetry present (i.e., if you have a 6-fold, then the molecule is found in the hexagonal section).
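The centering absences described above can be written as a small predicate, which is handy for checking an indexed list of spots against a proposed lattice type. This is a sketch using the standard reflection conditions for A, B, C, I, and F centering. The function name is illustrative.

```python
def is_allowed(h: int, k: int, l: int, centering: str) -> bool:
    """True if reflection (h, k, l) is allowed by the lattice centering."""
    if centering == "P":
        return True
    if centering == "A":  # bc face centered
        return (k + l) % 2 == 0
    if centering == "B":  # ac face centered
        return (h + l) % 2 == 0
    if centering == "C":  # ab face centered
        return (h + k) % 2 == 0
    if centering == "I":  # body centered
        return (h + k + l) % 2 == 0
    if centering == "F":  # all faces centered: h, k, l all even or all odd
        return h % 2 == k % 2 == l % 2
    raise ValueError(f"unknown centering type: {centering!r}")


if __name__ == "__main__":
    # List the allowed hk0 reflections of a C-centered cell for small indices.
    allowed = [(h, k, 0) for h in range(4) for k in range(4) if is_allowed(h, k, 0, "C")]
    print(allowed)
```

Printing the allowed hk0 reflections for a C-centered cell shows the checkerboard pattern that makes the lattice appear to run along the cell diagonals.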
You then search for alternative space groups and ask if the precession photos you have are necessary and sufficient to eliminate all other possible space groups. This may mean taking an upper-level photo to distinguish between some possibilities such as trigonal versus hexagonal (Fig. 2.23). Determine the size of the unit cell by Bragg's law and compare the volume of the cell in cubic angstroms with the best estimate of the protein's molecular weight (MW) in daltons. Listed in the International Tables for each space group is the Z number, or the number of asymmetric units in the unit cell. Use this formula to calculate the cubic angstroms per dalton of the asymmetric unit: volume/(Z × MW).¹¹ The expected value for this number for protein molecules ranges from 1.7 to 3.0, with the average being about 2.3. If you have a number substantially smaller than 1.7, it is likely that something is wrong or, perhaps, there is internal symmetry in the protein molecule that corresponds with a crystallographic axis. For instance, hemoglobin is a tetramer

¹¹ Matthews, B. W. (1968). J. Mol. Biol. 33, 491.


DATA COLLECTION TECHNIQUES


and has a dimer axis that coincides with a crystallographic axis so that the asymmetric unit contains one-half of a tetramer instead of a full molecule. If the number is substantially larger than 3.0, then there are two possibilities. One is that there is more than one molecule in the asymmetric unit, which is very common. The other possibility is that the space group has higher symmetry than you have determined. Try looking for higher-symmetry space groups that contain the symmetry you have already determined to see if there is a precession photograph that you could take that would prove or disprove this possibility. For instance, R32 can be reindexed as monoclinic C2 and it can be difficult to spot the difference. If you are using XENGEN or MOSFLM to reduce the data, it is also possible to try reindexing your data in different space groups and to look at the R-merge values. There are also programs that are commonly used by small-molecule crystallographers to search for additional symmetries in a three-dimensional data set.

Finally, a common error is to mistake pseudosymmetry for true symmetry. An example of a crystal with pseudosymmetry is given in Fig. 2.24. Pseudosymmetry appears correct at lower resolutions but breaks down at higher resolutions. Most protein crystals show some pseudosymmetry in the range of infinity to 6 Å. It is common to have low-order reflections on the axis that are virtually extinct, leading one to the conclusion that there is a screw axis. Always confirm screw axes to at least 3 Å or better resolution. If you cannot confirm the screw axis to high resolution, then bear in mind that the axis may not be a screw axis and try both possibilities. The presence of even a single reflection that breaks the symmetry rules out the presence of a screw axis.
If that reflection is weak, however, it may be worthwhile considering that it is an artifact (in particular, Kβ radiation can cause artifacts), and do not exclude the 2-fold screw until better evidence is found. One of the best confirmations of the space group is a good heavy-atom Patterson. For example, consider the case of determining whether a 2-fold or a 2₁ screw is present on the a axis. The Patterson map for both possibilities is calculated the same way. Even if you enter the incorrect possibility in the symmetry operators, both a 2-fold and a 2₁ degenerate to a mirror in Patterson space. In the case of the 2-fold, the Harker vectors will be on the plane x = 0.0, and for the 2₁ the Harker vectors will be on x = 0.5. It is important to plot out

FIG. 2.23 Distinguishing 3- and 6-folds. (A) The 0-level photograph of photoactive yellow protein taken down the c axis shows a hexagonal net with 6-fold symmetry. Both a 3-fold and a 6-fold will show 6-fold symmetry in the 0-level. Thus an upper-level precession photograph hkl (B) was taken to distinguish the two possibilities. This photo also shows 6-fold symmetry, confirming the presence of a 6-fold axis. Another precession photograph (not shown) was taken of the 6-fold axis and showed every odd spot systematically missing. There are no mirrors, eliminating the possibility of the class 622. This is enough information to assign the space group as P6₃.


FIG. 2.24 Pseudosymmetry. This precession photograph of iron-binding protein shows true 4-fold symmetry. There are also pseudomirrors along the main axes and the main diagonals. Close examination shows these mirrors to be inexact. (Photo courtesy of Andy Arvai, Scripps Research Institute.)

the entire Patterson map and look at all possible Harker sections. (For more information on Patterson maps, see the following.)
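The packing check described earlier in this section, volume/(Z × MW), is easy to script. The sketch below is illustrative only: the function names and the n_mol argument (number of molecules assumed per asymmetric unit) are hypothetical, and the 1.7 and 3.0 limits are simply the range quoted in the text.

```python
def matthews_volume(cell_volume_A3, z, mw_daltons, n_mol=1):
    """Cubic angstroms per dalton for the asymmetric unit.

    cell_volume_A3 : unit-cell volume in cubic angstroms
    z              : number of asymmetric units in the cell (International Tables)
    mw_daltons     : best estimate of the protein molecular weight
    n_mol          : molecules assumed per asymmetric unit (hypothetical argument)
    """
    return cell_volume_A3 / (z * n_mol * mw_daltons)


def interpret(vm, low=1.7, high=3.0):
    """Classify a value against the expected range quoted in the text."""
    if vm < low:
        return "too small: check the cell, or look for molecular symmetry on a crystallographic axis"
    if vm > high:
        return "too large: more than one molecule per asymmetric unit, or higher symmetry"
    return "within the expected range"


# Example: a 50 x 60 x 70 angstrom orthorhombic cell, Z = 4, MW = 25,000.
vm = matthews_volume(50.0 * 60.0 * 70.0, z=4, mw_daltons=25000.0)
print(round(vm, 2), interpret(vm))
```

Trying n_mol = 2 or 3 when the value comes out high is a quick way to see which number of molecules brings it back into range.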

Unit-Cell Determination

A single still photograph taken along one axis can be used to determine roughly the three directions of the unit cell. You should have a pattern of concentric circles around the beam center. Portions of the lattice will be visible and these can be used along with Bragg's law to determine two directions:

$$ d = \frac{\lambda}{2 \sin\left(\tfrac{1}{2}\tan^{-1}(\Delta/F)\right)} $$

where Δ is the spacing between spots, F is the crystal-to-film distance, and λ is the wavelength of the X-rays (1.5418 Å for copper targets). It is best to measure a long row and divide the length by the number of spaces to get a more accurate determination. Also, the closer in to the center the row is, the more accurate the approximation of using Bragg's law will be, because the spacing gets stretched the farther out from the center you are. The third direction can be determined from the spacing of the concentric rings using the equation:

$$ d = \frac{n\lambda}{1 - \cos\left(\tan^{-1}(r/F)\right)}, $$

where r is the radius of the nth circle. The circles need to be close to concentric about the beam stop (i.e., an axis aligned along the direct beam) for this equation to be accurate. The distance determined is correct if you have an orthogonal cell. Otherwise the lengths need to be corrected for the fact that the photographs show d* instead of d directly. To do this simply divide the distance by the sine of the appropriate angle. For example, the corrected value of b is the measured b divided by the sine of the interaxial angle that applies to that axis. More accurate distances can be determined from the undistorted lattices found on precession photographs. For this you will need photographs of at least two zones. Measure a number of spots in a row and divide by the number of spaces as above and use the same equation derived from Bragg's law. Again, for nonorthogonal cells these distances will need to be corrected.
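The two still-photograph relations in this section can be sketched in a few lines. The function names are hypothetical, lengths on the film are assumed to be in the same units as the crystal-to-film distance, and the default wavelength is the copper value quoted in the text.

```python
import math

CU_K_ALPHA = 1.5418  # angstroms, copper target (value quoted in the text)


def d_from_spot_spacing(spacing, film_distance, wavelength=CU_K_ALPHA):
    """Cell edge from the spacing between adjacent spots in a lattice row.

    The scattering angle 2*theta is atan(spacing / film_distance), and
    Bragg's law gives d = wavelength / (2 sin(theta)).
    """
    two_theta = math.atan(spacing / film_distance)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


def d_from_ring_radius(n, radius, film_distance, wavelength=CU_K_ALPHA):
    """Cell edge along the beam from the radius of the nth concentric circle."""
    return n * wavelength / (1.0 - math.cos(math.atan(radius / film_distance)))


def correct_for_angle(length, angle_deg):
    """Nonorthogonal cells: divide by the sine of the appropriate angle."""
    return length / math.sin(math.radians(angle_deg))


# Average over a long row: 10 spaces measured as 15.4 mm at 100 mm.
print(d_from_spot_spacing(15.4 / 10, 100.0))
```

Measuring a long row and dividing by the number of spaces, as recommended above, is done before calling d_from_spot_spacing.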

Evaluation of Crystal Quality

The number-one piece of information requested about protein crystals is the limit of observable diffraction. The desire to have this number be as high a resolution as possible (i.e., to have the smallest numerical value) has led to some rather creative definitions where the single highest-resolution spot is used to report this value. It is better to report the resolution where at least one-third of the possible reflections are still visible above background. Even this is pushing it, but resolution inflation is common and pervasive. Another factor to consider is the mosaicity of the crystal. Mosaicity is a measure of the order within a crystal. If a crystal has low mosaicity, the crystal is highly ordered and diffraction spots will be sharp. High-mosaicity crystals will have broader peaks because of lower crystalline order (Fig. 2.25). Since increased mosaicity means that a spot is in diffracting conditions for a larger range of angles, mosaicity may be recognized as broadened lunes on diffraction patterns. Or, if you are using a diffractometer or


FIG. 2.25 Mosaicity of crystals. Each block is composed of one to many unit cells. Top: This crystal has high order (right) and thus the diffraction of a single spot is sharper (left). Bottom: This crystal has lower order and broader diffraction spots.

area detector, the profile of a peak may be directly measured. While increased mosaicity in itself may not be a problem, it may indicate other problems. For instance, if when mounting the crystal you let it dry too much, the mosaicity will be increased. If the crystal is suffering from radiation damage (heating, drying, etc.), it is quite likely that the mosaicity will increase. If the crystals have high mosaicity, it may be worth trying to see if a crystal can be mounted to give a diffraction pattern with less mosaicity. Mosaicity may also indicate twinning, where the crystal is actually made up of several crystals joined together. Unless the twinning can be accurately accounted for, it is not possible to use the amplitudes of a twinned crystal to determine the X-ray structure. Finally, increased mosaicity is usually accompanied by lower-resolution diffraction overall and lower signal-to-noise ratio, since the counts are spread over a larger diffraction angle. In looking for twinning it is important to look for extra families of circles that cannot be accounted for by a single crystal in the beam (Fig. 2.26). A twinned crystal is two or more crystalline units joined together. Sometimes the joining is apparent in the morphology, but often the only way to tell is from the diffraction pattern. Still photographs or small-angle photos are best for this purpose. Be cautious in assigning twinning solely on the basis of split spots (Fig. 2.27). If the crystal is slightly misaligned about the center of the camera or the camera is misaligned, this can cause split spots because the

FIG. 2.26 Image of twinned crystal. An image of a crystal with a twinning defect. Two separate, unrelated sets of concentric circles are evident. This crystal cannot be used for data collection.

FIG. 2.27 Split-spot profile. A split-spot profile such as this may indicate a cracked crystal or twinning.


Ewald sphere is not centered on the camera center. It is possible then for a reflection to occur twice in near proximity and to produce split spots. It may be more fruitful to align the camera carefully rather than to throw the crystal away. On the area detector or diffractometer, twinning can be recognized from looking at the profile of several spots in different areas of reciprocal space. For area detectors, 10-20 frames of 0.05° in oscillation angle are taken and the profile of spots as a function of oscillation angle is plotted. This can be done using the frameview program from XtalView (see following). Diffractometers allow a continuous scan in one of several angles. The presence of split spots indicates a twinned or cracked crystal, which must not be used for data collection.

Are the crystals big enough for data collection? This is a question with so many parameters that it is not possible to give a good answer. It is always possible to grow a larger crystal, although it may take considerable experimentation and hard work. The size needed is determined by the quality of the diffraction pattern, not by the crystal's physical dimensions. Crystals with large unit cells have weaker diffraction patterns than do similar crystals with smaller cells, with the result that a larger crystal is needed. (Actually they diffract the same number of photons; in the larger cell these photons are spread out over more reflections.) In the end you have to decide which questions you wish to answer with your experiment. If you want an atomic-resolution structure of a mutant protein to look at small changes from the wild type, then clearly a 4-Å diffraction pattern is not enough. It may be worthwhile collecting data on a small crystal for now so that you can start working on structure solution at low resolution while you wait for larger crystals to grow. Avoid collecting data just because you can do it.
If you collect a 2-Å data set but all the spots beyond 3 Å are below the noise level, then it is really a 3-Å data set and will not give you any information beyond this.
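The one-third criterion suggested earlier in this section can be applied to counts tallied in resolution shells. The sketch below assumes you already have, for each shell, the number of reflections visible above background and the number possible; the function name and the input layout are hypothetical, and stopping at the first shell that fails is a design choice made here, not something prescribed in the text.

```python
def effective_resolution(shells, min_fraction=1.0 / 3.0):
    """Return the outer limit (in angstroms) of the last shell in which at
    least min_fraction of the possible reflections are visible.

    shells : list of (d_min, n_visible, n_possible), ordered from low to
             high resolution (that is, decreasing d_min).
    Returns None if even the first shell fails the test.
    """
    limit = None
    for d_min, n_visible, n_possible in shells:
        if n_possible > 0 and n_visible / n_possible >= min_fraction:
            limit = d_min
        else:
            break  # stop at the first shell that falls below the criterion
    return limit


# A nominal "2 angstrom" data set whose outer shells are mostly noise.
shells = [(4.0, 900, 1000), (3.0, 600, 1000), (2.5, 200, 1000), (2.0, 50, 1000)]
print(effective_resolution(shells))
```

With these made-up counts the data set would be reported as a 3-Å data set, in the spirit of the closing remark of this section.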

2.5 HEAVY-ATOM DERIVATIVE SCANNING WITH FILM

The traditional method of scanning for heavy-atom derivatives is to use screened precession photos with a precession angle of 5° or higher. The method is inefficient in that it takes a longer exposure to collect the same number of photons by precession photography than it does by other methods, because most of the diffracted X-rays are blocked by the screen. Shorter exposure times can be used if several-degree-rotation photos or low-angle screenless precession photos are used. In any case the object is to compare the intensities of the heavy-atom film with an equivalent "native" film and to look for intensity changes. The unit cell can be quickly checked by overlaying


equivalent rows on the native film. If the unit cell of the putative derivative changed significantly (>0.5-1.0%), then the derivative may not be usable. Deciding if there are intensity changes can be difficult for the beginner because real changes must be distinguished from the effects of different exposure times and of differences in the rate of falloff for the entire pattern. The best way to convince yourself that the changes are real is to look for reversals, where the intensity of one spot is greater in one photo and there is another pair where the intensity difference is reversed (Figs. 2.28 and 2.29). A good heavy-atom derivative has obvious differences. Most photos will not have large differences but may show one or two differences. Remember, "One difference does not a derivative make." The differences should occur at all resolutions. Differences will be found in the lowest-resolution reflections between infinity and 10 Å from differences in solvent contrast because of the presence of the heavy atom in the solvent, even if there is no binding to the protein. The differences of an isomorphous derivative will fall off slightly with resolution and will increase with resolution if the derivative is not isomorphous. However, unless the pattern of differences is obvious, it is probably better to decide these questions by collecting some data on the derivative and determining the size of the differences with resolution statistically on a large number of reflections.

If rotation photos are used, be careful that you do not compare spots that could be partial in one photograph but not in the other. Examine only reflections roughly perpendicular to the rotation axis and at least one row from the edge of the lunes. Reflections near the rotation axis are probably partial, and very small differences in crystal orientation can cause large intensity changes. Knowing this, it is possible to use rotation photos of about 5° to scan for derivatives for most unit cells.
Choose the largest angle you can without getting overlap below 3 Å. Align the crystal carefully with still photos 90° apart on the spindle. It is not necessary to align the crystal as precisely as for a precession photo, but be aware that a different pattern of partials can be confused with true intensity changes. In any case the worst that can happen is that you falsely identify a derivative and collect an extra data set. This is far better than missing a derivative altogether. The other common method of finding derivatives is to scan using the


FIG. 2.28 An intensity reversal between otherwise identical spots on two films. Note that in film 2 the upper spot is larger than the lower, whereas it is the opposite in film 1.


FIG. 2.29 Derivative and native films compared. Left: native iron-binding protein; right: the iron-binding protein soaked in iridium hexachloride. There are clear intensity changes, and many examples of reversals can be found. Also note that the pseudomirror symmetry between top and bottom has been clearly broken in the derivative.

area detector. As people are gaining familiarity with area detectors, this is becoming more common. In most laboratories with both detectors and cameras it is easier to get time on the camera, and you can usefully fill time by scanning for derivatives with film while waiting for the detector to become available. In using the area detector, collect enough frames to index and integrate a small amount of data. This is then merged with the native data, and the resulting statistics (see following) can be used to determine if the crystal is derivatized. If it is not, it can be removed and another crystal tried with a minimal waste of time. This is known as the "take-it-off" strategy. This method can be used to scan several crystals in a single day. It takes overnight to make a precession photo for the same purpose, and then one is usually comparing fewer total unique spots and the method is not quantitative.
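Two of the checks in this section can be made quantitative once a little derivative data has been merged with the native: the percentage change in the cell edges and the overall size of the amplitude differences. The sketch below is an assumption-laden illustration. The text does not say which statistic is meant, so a mean fractional isomorphous difference is used here as one common choice; the function names and the dictionary layout are hypothetical, and the two data sets are assumed to be on a common scale already.

```python
def max_cell_change_percent(native_edges, derivative_edges):
    """Largest percentage change among the cell edges (a, b, c)."""
    return max(
        abs(d - n) / n * 100.0
        for n, d in zip(native_edges, derivative_edges)
    )


def mean_fractional_difference(native, derivative):
    """sum(|F_deriv - F_nat|) / sum(F_nat) over reflections present in both.

    native, derivative : dicts mapping (h, k, l) to an amplitude.
    """
    common = native.keys() & derivative.keys()
    if not common:
        raise ValueError("no reflections in common")
    numerator = sum(abs(derivative[hkl] - native[hkl]) for hkl in common)
    denominator = sum(native[hkl] for hkl in common)
    return numerator / denominator


native = {(1, 0, 0): 100.0, (0, 1, 0): 200.0}
soaked = {(1, 0, 0): 120.0, (0, 1, 0): 180.0, (0, 0, 1): 50.0}
print(max_cell_change_percent((50.0, 60.0, 70.0), (50.5, 60.0, 70.0)))
print(mean_fractional_difference(native, soaked))
```

A cell change above the 0.5-1.0% mentioned in the text would argue against using the crystal, whatever the size of the intensity differences.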


2.6 OVERALL DATA COLLECTION STRATEGY

Unique Data

The essence of data collection strategy is to collect every unique reflection at least once. First you need to determine the unique volume of data for your space group. This is done by considering the symmetry of your space group and including an additional center of symmetry. Thus the space groups P222, P2₁22, P2₁2₁2, P2₁2₁2₁, C222, and C222₁ all have mmm symmetry in reciprocal space because both a 2-fold and a 2₁ screw degenerate to a mirror plane when a center of symmetry is added. To determine the unique data, you can look up your space group in the International Tables and determine the reciprocal space symmetry (also called the Patterson symmetry because the Patterson function also adds a center of symmetry). In Table 2.3 the volume needed for each space group is listed. For instance, for orthorhombic

TABLE 2.3 Unique Data for the Various Point Groups

Crystal system                      Class   Data symmetry   Unique data
Triclinic                           1       1̄               -h,h; -k,k; 0,l or -h,h; 0,k; -l,l or 0,h; -k,k; -l,l
Monoclinic (2-fold parallel to b)   2       2/m             -h,h; 0,k; 0,l or 0,h; 0,k; -l,l
Orthorhombic                        222     mmm             0,h; 0,k; 0,l
Tetragonal (4-fold parallel to c)   4       4/m             0,h; 0,k; 0,l or 0,l and any 90° about c
                                    422     4/mmm           0,h; k ≥ h, k; 0,l or h ≥ k, h; 0,k; 0,l
Trigonal (3-fold parallel to c)     3       3̄               0,h; -k,0; 0,l or 0,l and any 120° about c
                                    32      3̄m1             0,h; …
                                    32      3̄1m             0,h; …
Rhombohedral                        3       3̄               0,h; …
Hexagonal (6-fold parallel to c)    6       6/m             0,h; 0,k; 0,l or 0,l and any 60° about c
Cubic                               23      m3              0,h; 0,k ≤ h; 0,l ≤ h
                                    432     m3m             0,h; 0,k ≤ h; 0,l ≤ k


space groups we need to collect 0 to hmax, 0 to kmax, and 0 to lmax. If you are using a diffractometer, this is simple: just enter the bounds and the software takes care of the rest. Film and image plates cover a large enough area that you can set up to rotate around any convenient axis by 90°. However, the geometry of diffraction is such that the data occur on the surface of the Ewald sphere. If data collection is started so that the h axis is aligned and we are rotating about the k axis, when we have gone 90° to the l axis we will be missing a piece of data because of the curvature of the Ewald sphere (Fig. 2.30). To collect all the data we need to rotate another 2θ, where 2θ is the scattering angle of the highest-resolution data to be collected. Additionally, we will have missed a blind zone of data near the rotation axis that can be filled in by rotating around another axis; in this case, if h is chosen then the crystal should be aligned so that k is in the plane of the film and l is along the direct beam. Rotation cameras provide no means of rotating the crystal to this new position, but a 90° offset on the crystal permits replacement on the goniometer head with the capillary vertical instead of parallel to the rotation axis. This is hardly very convenient because if the camera spindle is rotated too far, the capillary will hit the collimator or the beam stop and break. In polar space groups, such as P6, any

FIG. 2.30 Diagram of reciprocal space volume swept out by 90° of data collection. In this diagram the rotation axis is perpendicular to the page and revolves around the center, sweeping out the shaded area in 90°. The crystal is orthorhombic with mmm symmetry as indicated by the letters A and B. The asymmetric unit, the minimal volume that needs to be collected, is any one of the four quadrants. Note that the point A is not included in the sweep. To collect this region the sweep must be greater than 90°.


60° around the unique l axis will collect all the data except the blind zone. If you are using sophisticated software, the blind zone can be collected by setting up the rotation axis about 25° off the crystallographic axis. Between the four quarters of the film, all the blind zone will be collected. However, some data collection software requires that the crystal be almost perfectly aligned.

The area detector presents a greater challenge because the area of the detector is smaller than film or image plates. The detector can be swung out to collect higher-resolution data, but its height is not enough to cover all the data needed. It is necessary to rotate the crystal about another axis besides the rotation axis. On a four-circle machine this is χ or φ and on a three-circle detector this is φ. The crystal is "ratcheted" around to collect the data completely. This method is discussed more completely by Xuong and coworkers.¹² In addition, a program, RSPACE,¹³ is available in which the orientation of the crystal can be entered and the data that can be collected at different angles of the camera can be viewed. A similar program, ASTRO, is available for Siemens area detectors. Another option is to use XRSPACE (Fig. 2.31), which can be used to show completeness of data but does not display the goniometer parameters, or to use the strategy option in MOSFLM.

It is not necessary to align the crystal accurately before data collection, although it is more efficient to start data collection with one axis roughly aligned with the direct beam. In fact, it is better not to be perfectly aligned, as this will cause symmetry-related reflections to fall in equivalent positions so that both will be missing if they fall in a blind region. Also, redundancy increases the accuracy of the data. The sigma of n identical measurements of a reflection is¹⁴

$$ \sigma_n = \frac{\sigma}{\sqrt{n}} $$
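Two quantities from this section lend themselves to a quick calculation: the extra sweep needed beyond the symmetry-unique rotation, and the gain from redundancy. The sketch below follows the text in adding the full 2θ of the highest-resolution data to the unique sweep, and it assumes n independent measurements of equal sigma, so that the sigma of their mean falls as the square root of n. The function names are hypothetical and the default wavelength is the copper value used elsewhere in the chapter.

```python
import math


def two_theta_deg(d_min, wavelength=1.5418):
    """Scattering angle 2*theta (degrees) at resolution d_min (angstroms)."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d_min)))


def total_rotation_deg(unique_sweep_deg, d_min, wavelength=1.5418):
    """Unique sweep plus the extra 2*theta needed because of the
    curvature of the Ewald sphere (blind zone not included)."""
    return unique_sweep_deg + two_theta_deg(d_min, wavelength)


def sigma_of_mean(sigma, n):
    """Sigma of the average of n equal, independent measurements."""
    return sigma / math.sqrt(n)


# Orthorhombic crystal (90 degree unique sweep) collected to 2.0 angstroms.
print(total_rotation_deg(90.0, 2.0))
print(sigma_of_mean(2.0, 4))
```

The blind zone near the rotation axis is not covered by this estimate and still has to be collected from a second setting, as described above.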

Bijvoet Data

When an anomalous scatterer is present in the crystal, it may be worthwhile collecting Bijvoet pairs. It is then necessary to know which reflections are Bijvoets in your space group. The Friedel mate of a reflection, -h, -k, -l, is always a Bijvoet. Other reflections may or may not be. A reflection is a Bijvoet pair of another reflection if it takes an odd number of sign changes to transform its indices and if it is related by a symmetry element. Thus, the

¹² Xuong, N. H., Nielson, C., Hamlin, R., and Andersen, D. (1985). J. Appl. Crystallogr. 18, 342-350.
¹³ RSPACE was written by Mark Harris, Computer Graphics Laboratory, University of North Carolina, Chapel Hill.
¹⁴ The sigma of an averaged reflection is calculated as follows: σ²avg = 1/(1/σ₁² + 1/σ₂² + 1/σ₃² + ···), so that if the sigmas are equal this reduces to σ²avg = 1/(n/σ²).


FIG. 2.31 Frames of data with prediction boxes superimposed. Two 0.25° frames from a Siemens area detector are shown with the predicted positions of spots superimposed as boxes. These can be used as a check.

reflections h, k, l and -h, k, l in an orthorhombic space group form a Bijvoet pair. The reflections h, k, l and -h, -k, l are not a Bijvoet pair. In a monoclinic space group (b unique) h, k, l and -h, k, l are not related by any space group symmetry, so they are not Bijvoets in this space group. A general rule of thumb is that reflections related by a mirror or inversion center are Bijvoets. Reflections related by a 2-, 3-, 4-, or 6-fold are not Bijvoets. To collect complete Bijvoet pairs, approximately twice as many data need to be collected. Centric reflections never have a Bijvoet mate, so they need to be collected only once.
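The rule of thumb above can be checked mechanically: two reflections are symmetry equivalents if a rotation of the point group carries one onto the other, and they are Bijvoet mates if a rotation carries one onto the Friedel mate of the other. The sketch below is one way to express that, not a description of any particular program. The function names and the two operator lists (point group 222, and point group 2 with b unique) are written out here for illustration only.

```python
def apply_op(op, hkl):
    """Apply a 3 x 3 integer rotation matrix to the indices (h, k, l)."""
    return tuple(sum(op[i][j] * hkl[j] for j in range(3)) for i in range(3))


def negate(hkl):
    """Friedel mate: -h, -k, -l."""
    return tuple(-x for x in hkl)


def relation(hkl_a, hkl_b, ops):
    """Return 'equivalent', 'bijvoet', or 'unrelated' for two reflections."""
    equivalents = {apply_op(op, hkl_a) for op in ops}
    if hkl_b in equivalents:
        return "equivalent"          # related by a proper rotation
    if negate(hkl_b) in equivalents:
        return "bijvoet"             # related through the added inversion
    return "unrelated"


def is_centric(hkl, ops):
    """A reflection is centric if a rotation maps it onto its Friedel mate."""
    return negate(hkl) in {apply_op(op, hkl) for op in ops}


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
POINT_GROUP_222 = (
    IDENTITY,
    ((1, 0, 0), (0, -1, 0), (0, 0, -1)),   # 2-fold along a
    ((-1, 0, 0), (0, 1, 0), (0, 0, -1)),   # 2-fold along b
    ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),   # 2-fold along c
)
POINT_GROUP_2_B = (IDENTITY, ((-1, 0, 0), (0, 1, 0), (0, 0, -1)))

print(relation((1, 2, 3), (-1, 2, 3), POINT_GROUP_222))
print(relation((1, 2, 3), (-1, -2, 3), POINT_GROUP_222))
print(relation((1, 2, 3), (-1, 2, 3), POINT_GROUP_2_B))
```

The three calls reproduce the three cases discussed in the text: an orthorhombic Bijvoet pair, an orthorhombic pair related by a 2-fold, and a monoclinic pair with no relation at all.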

Indexing of Data

Indexing of data is normally done in two stages. In the first stage a primitive lattice is used to give each reflection an index that is used to identify each individual point in reciprocal space. In the second stage reflections that are related by the point group symmetry of the space group are reduced and collected together. Modern programs can autoindex the data (Fig. 2.32). These fall into

FIG. 2.32 Using XRSPACE to check for data completeness.


two categories. In the first type, the unit cell is entered into the program and all possible orientations of this cell are searched until a match is found. In the second type, vectors are looked at in reciprocal space, three noncoplanar groups are searched for, and the shortest reciprocal space distances (the largest in real space) are taken to be the cell edges. This must be checked against the known cell to find the correct lattice. In either case the cell should be known already, usually from precession photographs. The correct cell is the reduced cell that is consistent with the highest symmetry and the longest axes (in real space).

In most space groups there are ambiguities when it comes to deciding upon a unique volume. In some cases, such as orthorhombic, all possible choices are equivalent and lead to the same answer. In others it makes a difference which volume is chosen. For instance, in hexagonal space groups there is no way to tell the a axis from the b axis on the basis of the lattice itself. The decision must be made at a later stage when some intensities are known. The first time you choose it makes no difference, but a second run of data must be indexed consistently with the first. One sign of inconsistent indexing is if the R-merge of two data sets is very high, around 30-50%, while the individual R-symms are in the normal range, say below 12%. Since there is no way to decide on the correct orientation before the data have been integrated, the correction must be made at the reduction step. XENGEN¹⁵ provides a handy means of doing this with the -i option of reduce. A 3 × 4 matrix is entered that is used to alter the indices of the primitive lattice before the data are reduced to the unique indices that will be used to group the data for scaling and merging purposes. For example, to reindex data in point group 32 on a hexagonal lattice (e.g., P3₁21) for which a and b were incorrectly chosen, use the matrix

$$
\begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
+
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
$$

This will change h to -k and k to -h and leave l as is. The fourth column is added to the indices and in this case is all 0s. When the data are reduced, the effect of this will be to switch h and k (the same as reassigning a and b) while preserving the handedness of the Friedel pairs. This matrix will also switch a and b but makes the data left-handed and thus switches the Friedel pairs so that they are incorrect:

$$
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
+
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
$$

¹⁵ XENGEN was written by A. Howard and is available from Bruker Instruments. See Howard, et al. (1987). J. Appl. Crystallogr. 20, 383-387.

As a final example, here is a matrix for switching a hexagonal space group in point group 6 (e.g., P6₃). These space groups are polar, and the essential problem is that it is not possible to tell whether the c axis is pointing one way or the other. To switch l it is also necessary to reassign h and k thus:

$$
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}
+
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
$$

To find the correct matrix the easiest thing to do is to compare equivalent sections of data from both data sets. It is best not to use the 0 level, as it usually has extra symmetry. With XtalView (discussed in Chapter 4) you can use XRSPACE to view equivalent sections by starting two copies of XRSPACE on both data sets. The problem is usually obvious upon comparison of the patterns of intensities. When making these matrices, be careful to preserve the correct handedness. If you do one operation that changes the hand, then you must do another that changes the hand back. There must be an even number of hand changes. You may not care about the Friedel pairs for native data without an anomalous scatterer, but remember that you will want to keep the Friedel pairs correct for heavy-atom derivative data later.
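The bookkeeping for a reindexing matrix can be checked before any data are touched. The sketch below is not XENGEN's implementation; it assumes the convention described in the text (new indices are the 3 × 3 part times the old indices, plus the fourth column), and all names are hypothetical. It uses the sign of the determinant of the 3 × 3 part as the test for a hand change, which is a standard property of such transformations: +1 keeps the hand and -1 inverts it.

```python
def reindex(matrix_3x4, hkl):
    """New indices = (first three columns) x hkl + fourth column."""
    return tuple(
        sum(row[j] * hkl[j] for j in range(3)) + row[3]
        for row in matrix_3x4
    )


def determinant(matrix_3x4):
    """Determinant of the 3 x 3 rotation part (fourth column ignored)."""
    (a, b, c, _), (d, e, f, _), (g, p, q, _) = matrix_3x4
    return a * (e * q - f * p) - b * (d * q - f * g) + c * (d * p - e * g)


def changes_hand(matrix_3x4):
    """True if applying the matrix would switch the Friedel pairs."""
    return determinant(matrix_3x4) < 0


# Swapping h and k alone changes the hand ...
SWAP_HK = ((0, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 0))
# ... and also reversing l changes it back (an even number of hand changes).
SWAP_HK_FLIP_L = ((0, 1, 0, 0), (1, 0, 0, 0), (0, 0, -1, 0))

print(reindex(SWAP_HK_FLIP_L, (1, 2, 3)))
print(changes_hand(SWAP_HK), changes_hand(SWAP_HK_FLIP_L))
```

Running a candidate matrix through changes_hand is a cheap guard against the odd-number-of-hand-changes mistake warned about above.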

2.7 OVERVIEW OF OLDER FILM TECHNIQUES

X-ray film is still one of the best area detectors around. It is compact and easily stored, has very high pixel resolution, can be bent to form a curved surface, and is inexpensive. Its chief disadvantages are its high background, low efficiency, and low dynamic range. Films must be scanned by an optical scanner to be used as data, and in this step many of the advantages are lost. To increase its dynamic range, three films are usually used, which means that three films must be handled and three files must be produced when the data are scanned. To take advantage of the high resolution, a fine pixel must be used on the scanner, increasing the scan time and producing three very large files. If the cost of the manual labor involved in a large film data set is factored in, film can be a very expensive method of data collection. However, for preliminary X-ray work it is hard to beat film for determining space groups and assessing crystal quality.


The rotation method is usually used with film data collection. Three films are placed in a pack, one behind the other. Each film partially absorbs the X-rays so that the exposure of each film in the pack is lowered by about a factor of three. The cassettes have small holes that are used to make fiducial marks on the films. These marks are needed to align the film precisely because the film can move around inside the cassette a small amount. The films are exposed while the crystal is rotated through a degree or so (see Rotation Photography in Sect. 2.4). When the films are developed, it is important to use consistent times for each film. The films must not touch the sides of the developing tank or each other, and when they are wet care must be taken not to scratch them. Typical development times are 5 min in the developer, 30 s in the stop bath, and 5 min in the fixer. When the developer becomes brown, it should be changed and the fixer changed at the same time. Wash the films for at least 5 min before exposing them to air to look at them; otherwise they will turn brown later. The total wash time should be a minimum of 30 min in clean, flowing water. Dry the films thoroughly before handling them. When the films are scanned, be sure to do it in a consistent manner so that the scanner axes are known relative to the camera axes for each film. If you forget to mark the fiducials on a film, often you can carefully align the film with the film from the rotation before or after and mark the fiducials with a pen.

2.8 FOUR-CIRCLE DIFFRACTOMETER DATA COLLECTION

The term diffractometer is used in this book to mean a goniostat equipped with a single-counter detector (Fig. 2.33). This used to be a complicated topic, but with the advent of modern software, collecting data by diffractometer has become easier and faster. Many diffractometers are capable of autoindexing the crystal. Check reflections are used to check the alignment as data collection proceeds, and more efficient peak scanning methods combined with faster motor slew rates have resulted in faster data collection. In fact, for collecting a low-resolution data set for the purpose of heavy-atom scanning, the modern diffractometer can actually beat an area detector in speed and accuracy. Diffractometer data properly scaled and corrected can be more accurate than data acquired by any other collection method. Diffractometers can be adjusted to cut down on the background reaching the detector through the use of slits and a long helium path. (Background falls off with the square of the distance, while diffraction spots do not fall off at all if the path has no air.) The counter used in diffractometers has very high efficiency and virtually no background, so that fewer counts are needed to achieve the same signal-to-noise ratio. Finally, an accurate absorption correction can be made on the diffractometer experimentally, while on area detectors and film this is done only approximately by scaling equivalent reflections.

FIG. 2.33 A 4-circle diffractometer, consisting of a four-circle goniostat and a scintillation detector attached to an X-ray source. The four circles are ω, χ, φ, and 2θ. By moving the angles, it is possible to bring every reflection into diffracting conditions such that the reflection can be counted with the detector. To integrate the intensity, the crystal is rotated on the ω axis and the profile of the counts used to determine the intensity. With an area detector the scintillation counter is replaced and a means of varying the distance from the crystal to detector is provided. The ω axis is then used to rotate the crystal to collect the data while holding φ and χ fixed. A three-circle goniostat has a fixed χ angle (usually 45°) and a two-circle would have χ = 0 and φ = 0.

It is obvious that the diffractometer is a superior machine, so why isn't everyone using one? There are a number of drawbacks to diffractometers that limit their usefulness in all cases. Depending upon the length of the detector arm on the diffractometer and the optics of the source, the individual reflections may not be resolved if the unit cell is too large (>100-150 Å). Radiation damage must be relatively slow for a complete data set to be collected, or else many crystals must be used, with concomitant scaling problems. Finally, given the scarcity of X-ray ports available in most labs, the diffractometer just takes too long to allow it to compete successfully.
Far more data can be collected on an X-ray generator equipped with an area detector than on a diffractometer. This fact makes the area detector the method of choice for many


groups where the speed with which data can be collected is the single limiting factor and space and money for more generators are limited. (The current detector fad has left many perfectly useful diffractometers languishing. You might be able to get your hands on one for heavy-atom screening, or you may have a crystal that can use the diffractometer to collect a highly accurate data set.) An excellent discussion of the geometry of diffractometers and the factors affecting data collection parameters can be found in a discussion by Wyckoff.¹⁶ For details of data collection, refer to the manual that comes with your diffractometer. In general, the data collection strategy is to collect the data in shells of resolution, starting with the highest resolution, where radiation damage first shows up, and proceeding to shells of progressively lower resolution until all the data have been collected. Standards are measured periodically, about every 100 reflections, to monitor alignment and decay. Some of the standards should be in the resolution range you are measuring. A φ scan of a bright reflection on an axis is used to calculate a correction for absorption as a function of φ. If Bijvoet pairs are desired, they are collected in blocks of 25 reflections to minimize errors between pairs. First, a block of 25 reflections is collected and then the same 25 reflections are remeasured at −h, −k, −l. Some reflections that are symmetry mates can be collected to measure R-symm (see Chapter 3.1, Sect. 3.3). If Bijvoet pairs are collected, then these can be the Bijvoet pairs of centric reflections,¹⁷ which cannot have an anomalous scattering component. R-symm, the agreement between equivalent reflections, is the best indicator of overall data quality and a check on the accuracy of the data set.
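The block-wise ordering of Bijvoet pairs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scheduling code of any real diffractometer: the function name `bijvoet_schedule` and its `block_size` argument are hypothetical names introduced here.

```python
def bijvoet_schedule(reflections, block_size=25):
    """Order reflections so each block is followed by its Bijvoet mates.

    reflections: list of (h, k, l) tuples in the desired collection order.
    Returns a new list: a block of up to `block_size` reflections, then the
    same block remeasured at (-h, -k, -l), then the next block, and so on.
    """
    schedule = []
    for start in range(0, len(reflections), block_size):
        block = reflections[start:start + block_size]
        schedule.extend(block)                               # +h, +k, +l
        schedule.extend((-h, -k, -l) for h, k, l in block)   # -h, -k, -l
    return schedule
```

Keeping each mate within one block of its partner means the two members of a pair are measured close together in time, so slow drifts such as decay affect both almost equally.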

¹⁶ Wyckoff, H. C. (1985). In Methods in Enzymology, Vol. 114, pp. 330-385. Academic Press, San Diego.
¹⁷ Technically speaking, the difference in the distribution of intensities for centric and acentric reflections means that the centric reflections are not a perfect indicator of the symmetry errors in acentric reflections. Centric reflections tend to be either bright or weak, whereas acentrics have a more even distribution of intensities.

2.9 AREA DETECTOR DATA COLLECTION

Area detectors are fast becoming the most common data collection technique. Since an area detector collects a larger volume of reciprocal space than a diffractometer, it is more efficient. The critical value for determining the speed of data collection is the angle subtended by the detector (Fig. 2.34). This angle will depend upon the width of the active surface of the detector and the distance the detector is set back from the crystal. The width of the detector is fixed. The distance is dependent on the largest dimension of the unit cell and the size of a pixel on the area detector. Two adjacent diffraction spots must be separated by at least one pixel. For a Bruker area detector, Andy Howard's rule of thumb is

$$ d\,(\mathrm{cm}) \geq \frac{\text{longest cell edge (Å)}}{8.0} $$

The factor of 8 can be changed to 12 if mirrors are used. In practice, spots have some width, so that the center-to-center distance that two reflections can have and still be resolved is also dependent upon the optics and the particular crystal. It is better to err on the safe side. Trying to squeeze too much data onto the detector and overlapping adjacent spots will lower data quality. The easiest and best way to determine d is to first do a rough back-of-the-envelope calculation and then put the crystal on the machine. Find an orientation with the closest spacing on the detector, collect a short data set with the longest axis on the face of the detector, and examine some of the frames (Figs. 2.35 and 2.36). The closest spots should be clearly separated. On the Hamlin detector this can be a single pixel, while on the Bruker (with XENGEN) detector there should be a separation of at least 3 pixels. If the spacing is too close, the detector must be moved back. Do not collect data with overlapping spots!

FIG. 2.34 Calculating resolutions for area detectors. The highest resolution (2θ) available for a given swing of an area detector is the swing plus half the angular width in degrees. The minimum resolution is the swing minus the half-width. The angular width of the detector is given by tan⁻¹(width of detector / crystal-to-film distance).

An area detector can be equipped with three types of goniostat: two-circle, three-circle, or four-circle. The two-circle is most limited, consisting of a rotation axis and a swing movement for the detector. Some improvement can be made by mounting crystals so that the rotation axis is a diagonal of the unit cell. With such a setup it is very hard to collect a single data set at high resolution from a single mount.
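The distance rule of thumb and the resolution limits of Fig. 2.34 are easy to put into a small calculation. The sketch below is a rough aid for the back-of-the-envelope step, not a substitute for looking at real frames. The function names, the copper Kα wavelength constant, and the detector width used in the usage lines are assumptions made for illustration.

```python
import math

CU_K_ALPHA = 1.5418  # wavelength in Å; assumed here, use your own source value


def min_distance_cm(longest_cell_edge, mirrors=False):
    """Rule-of-thumb minimum crystal-to-detector distance (cm).

    longest_cell_edge is in Å. The divisor is 8, or 12 if mirrors are used.
    """
    factor = 12.0 if mirrors else 8.0
    return longest_cell_edge / factor


def two_theta_range(swing_deg, width_cm, distance_cm):
    """(lowest, highest) 2-theta in degrees covered at a given swing.

    Uses the angular width quoted in the Fig. 2.34 caption:
    atan(width of detector / crystal-to-detector distance).
    """
    angular_width = math.degrees(math.atan(width_cm / distance_cm))
    half = angular_width / 2.0
    return swing_deg - half, swing_deg + half


def d_spacing(two_theta_deg, wavelength=CU_K_ALPHA):
    """Bragg's law: d = wavelength / (2 sin(theta)), in Å."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


# Usage: a 98.8-Å longest cell edge and a hypothetical 10-cm-wide detector.
distance = min_distance_cm(98.8)                    # about 12.35 cm
low, high = two_theta_range(22.5, 10.0, distance)
best_resolution = d_spacing(high)                   # Å, at the detector edge
```

If the lower 2θ limit comes out at or below zero, the detector face includes the direct beam position and the low-resolution limit is set by the beam stop instead.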


FIG. 2.35 Frame from San Diego Multiwire Systems (Hamlin) area detector. Frame from a multiwavelength data collection experiment at beam line I-5 at the Stanford Synchrotron Radiation Laboratories. Spots are well separated. Data collection geometry has been set up to allow Bijvoet pairs to be collected simultaneously on the left and right halves of the detector (rotation axis is horizontal at this beam line) by taking advantage of the mirror symmetry of the sample's space group (P2₁2₁2₁). (Frame courtesy of Brian Crane, Scripps Research Institute.)

A three-circle goniometer has a rotation axis and a φ rotation mounted with χ fixed at 45°. A swing angle is also provided for the detector. Data are usually collected by rotating around the rotation axis for as far as possible. The crystal can be rotated around the φ axis to collect new data. Usually rotating φ by 90° will give the most new unique data. A four-circle goniostat will allow the most control over data collection. It has all the movements of the three-circle, and χ can be adjusted a full 360°. Because of the sizes of the detector and the χ circle, however, these collide after ω has moved over a limited range, usually about 60°. To overcome this disadvantage, the crystal is ratcheted by advancing φ by 60° and another ω sweep is collected. A useful recipe for data collection using a four-circle goniostat with a Bruker area detector and an orthorhombic crystal is given in Table 2.4. It collects the unique data in a minimum amount of time at 2-Å resolution at a crystal-to-detector distance of 12 cm. The crystal is mounted so that one axis is approximately along the capillary (i.e., at χ = 0°, ω and this axis will be coincident). This needs to be accurate only to about ±5°. Optically center the crystal on the goniostat. Move χ to 0°, ω to 50°, and set the swing to 22.5°.

FIG. 2.36 Area detector frame of data collected on a Bruker area detector. The frame is a 0.25° rotation of Fe-binding protein. The detector was located at 17 cm. Note that the spots (see lower right) are just being separated.

TABLE 2.4 Example of Bruker Area Detector Data Collection for Orthorhombic Crystals

Run   Start   End   Swing   χ    φ    Oscillation (ω)   Number of frames
1     50      -10   22.5    15   0    0.25              240
2     50      -10   22.5    15   60   0.25              240
3     50      20    22.5    75   0    0.25              120

All angles are in degrees.

Now rotate φ and take still frames until a 0 layer (the 0 layer is the one that passes through the beam stop) is centered on the detector (Fig. 2.37) so that the outer edge of the circle is at the edge of the detector. Define this φ angle as 0°. As data are collected, the 0 layer circle will move from the center of the detector toward the beam stop. The χ is then offset to maximize the amount of unique data. Otherwise the data on the top and bottom of the detector will be related by mirror symmetry. Since not all the data can be collected in one run (we need 90° + 22.5°), another run is done with φ rotated. To fill in the data that were missed by the limit of the detector height, a fill-in run of half the length at χ = 90° − 15°, or 75°, is used. If Bijvoet pairs are desired, then interleave runs at χ → −χ and φ → φ + 180°, leaving the other angles the same, for a total of six runs (Table 2.5).

FIG. 2.37 Starting position for orthorhombic data collection. (A) The crystal has been aligned so that one axis is vertical, coincident with the rotation axis, and the crystal is rotated until the 0 layer stretches from the beam stop to the edge of the detector. The 0 layer always intersects the beam stop. (B) χ has been rotated 15° to maximize the unique data and to minimize the effect of the blind region around the rotation axis.

TABLE 2.5 Example of Bruker Area Detector Data Collection for Orthorhombic Crystals

Run   Start   End   Swing   χ     φ     Oscillation (ω)   Number of frames
1     50      -10   22.5    15    0     0.25              240
2     50      -10   22.5    -15   180   0.25              240
3     50      -10   22.5    15    60    0.25              240
4     50      -10   22.5    -15   240   0.25              240
5     50      20    22.5    75    0     0.25              120
6     50      20    22.5    -75   180   0.25              120

All angles are in degrees.
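The run lists in Tables 2.4 and 2.5 follow from simple arithmetic: the number of frames is the ω sweep divided by the oscillation width, and each Bijvoet run is obtained from a base run by negating χ and adding 180° to φ. The sketch below reproduces that bookkeeping. The dictionary keys and function names are hypothetical conveniences, not the input format of any data collection program.

```python
def frames_in_run(start, end, oscillation):
    """Number of frames needed to sweep omega from start to end (degrees)."""
    return round(abs(end - start) / oscillation)


def bijvoet_mate(run):
    """Run that measures the Bijvoet mates: chi -> -chi, phi -> phi + 180."""
    mate = dict(run)
    mate["chi"] = -run["chi"]
    mate["phi"] = (run["phi"] + 180) % 360
    return mate


def interleave_bijvoet(runs):
    """Follow every run immediately with its Bijvoet-mate run."""
    out = []
    for run in runs:
        out.append(run)
        out.append(bijvoet_mate(run))
    return out


# The three runs of Table 2.4 (all angles in degrees).
TABLE_2_4 = [
    {"start": 50, "end": -10, "swing": 22.5, "chi": 15, "phi": 0, "osc": 0.25},
    {"start": 50, "end": -10, "swing": 22.5, "chi": 15, "phi": 60, "osc": 0.25},
    {"start": 50, "end": 20, "swing": 22.5, "chi": 75, "phi": 0, "osc": 0.25},
]

six_runs = interleave_bijvoet(TABLE_2_4)  # same order as Table 2.5
```

Interleaving, rather than appending all the mate runs at the end, keeps each run and its mate close together in time.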

Increasing Signal-to-Noise Ratio

Other than modifications to the optics and the beam stop, there are several easy ways to lower the background for marginal crystals. The first is to decrease the width of each frame so that each reflection takes about three frames to diffract completely. The background is a continuous value as the oscillation angle changes, whereas the spot is not. Taken to the extreme, it is easy to see how this helps. If each frame is 1.0° wide and the spot diffracts for 0.25° of this oscillation, then in the pixels containing the reflection, background will have accumulated for four times longer than the reflection counts. This will greatly decrease the signal-to-noise ratio. In tests on our Bruker area detector we have found that collecting 0.1° frames instead of 0.25° frames doubled the I/σ(I) ratio for weak high-resolution reflections beyond 2.0 Å.

A second method of reducing background is to pull the detector back. The background falls off as the square of the distance from the crystal (ignoring air absorption for now). To a first approximation, the diffracted rays are parallel and do not decrease in intensity with distance. So doubling the distance from the crystal will decrease the background four times. Of course, it will also decrease the amount of data that can be collected in a single frame. If distances greater than about 15 cm are used, a helium path is necessary or the gains in background reduction will be lost to air absorption. If you have many small crystals, or if your crystals are not radiation sensitive, an increase in signal-to-noise can be had by pulling the detector back and collecting more crystal positions to make up for the fewer reflections per frame.


In tests at Scripps using a Bruker area detector, we found that I/σ increases about 1% per centimeter for helium versus air for distances greater than 10 cm. So at a d of 20 cm, an increase of 10% in I/σ is expected. Hamlin-style detectors require greater distances, so helium paths are a must.
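Both effects in this section can be illustrated with a crude counting model. The sketch below assumes Poisson statistics, a background that accumulates at a constant rate per degree in the pixels under the spot, and a spot that starts on a frame boundary. It ignores everything else (detector noise, air absorption, spot shape), so the numbers are only indicative. The function names and arguments are hypothetical.

```python
import math


def i_over_sigma(net_counts, background_rate, frame_width, spot_width):
    """Approximate I/sigma for one reflection.

    net_counts: counts belonging to the reflection itself.
    background_rate: background counts per degree under the spot.
    frame_width, spot_width: in degrees.
    Background is summed over every frame the reflection touches.
    """
    # Small tolerance guards against floating-point noise in the division.
    n_frames = math.ceil(spot_width / frame_width - 1e-9)
    background = background_rate * n_frames * frame_width
    return net_counts / math.sqrt(net_counts + background)


def background_ratio(old_distance, new_distance):
    """Inverse-square change in background when the detector is moved."""
    return (old_distance / new_distance) ** 2


# A 0.25-degree spot recorded in 1.0-degree versus 0.25-degree frames.
wide = i_over_sigma(100, 400, 1.0, 0.25)     # 4x the background under the spot
narrow = i_over_sigma(100, 400, 0.25, 0.25)
```

In this simple model, frames narrower than the spot give no further gain. In practice narrow frames also bracket the start and end of the reflection more tightly, which is one reason to aim for about three frames per reflection.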

2.10 IMAGE PLATE DATA COLLECTION

Image plates are relatively new but have the potential of becoming the data collection method of choice. They have a high spatial resolution of 100-150 μm, similar to film, and subtend a large angle so that more data are collected at once.¹⁸ Image plates can be used either as an alternative to film or as a replacement for the detector in an area detector. In the film mode they can be used in the same cassettes that X-ray film is used in and scanned off-line up to several hours later. In the area detector mode they are automatically scanned after each exposure by apparatus built directly into the machine. The dynamic range of an image plate is much higher than that of film: it can reach 12 bits for image plates, whereas film is limited to 8 bits in practice.¹⁹ Image plates are exposed with X-rays, as with any other detector, and the X-ray photon causes a chemical change in the plate coating that releases a fluorescence that is detected by a photomultiplier when scanned with light of the proper wavelength. Image plates are read out by a laser beam on a scanner. The quality of this scanner largely determines the limits of the image plate. The construction of a high-quality scanner is a technically difficult feat because of the mechanical precision needed and the high quality of the electronics needed to take full advantage of the image plate's capability. The photomultiplier must have low noise and use a high-quality analog-to-digital converter, and the laser used for scanning must be stable and must hit precisely when scanned. Image plates have a wider range of sensitivity with respect to X-ray wavelengths, which gives them higher counting efficiency at higher energies. This makes them the detector of choice for white-radiation Laue experiments that use very bright synchrotron light sources. Because only a few exposures are needed for Laue data sets, manual handling of the plates is not a great disadvantage.
For collecting data sets with monochromatic radiation where hundreds of exposures are needed, an automated method of scanning the plates, such as the MAR Research scanner (Fig. 2.38), is a necessity.

¹⁸ Miyahara, J., et al. (1986). Nucl. Instrum. Methods A246, 572-578.
¹⁹ This is based upon practical experience and is not a theoretical limit in either case.


FIG. 2.38 Frame from a MAR Research image plate. The crystal was rotated 1° about the horizontal axis for a 5-min exposure using a rotating anode source. The edge of the image is about 2.1-Å resolution.

Image plates are erased by exposing to white light. This means they can be handled in the room light before they are exposed to X-rays. After exposure they must be protected from light. Cosmic background radiation will slowly expose the plates, so they need to be freshly erased before being used. Exposure to very bright X-rays such as the direct beam will cause a spot that will take a long time to erase and can even show up for many exposure-erasure cycles (months). With the use of an image plate as a film replacement on a monochromatic X-ray setup, the data are collected using the rotation method as previously described. Software used for the analysis of film can be adapted easily


by removing the corrections for film sensitivity, because image plates are linear. Since image plates are becoming common for use in other applications to replace film, such as radiography of gels, an image plate scanner may be available at your institution. These scanners are perfectly adequate for replacing film in preliminary characterizations of crystals.

2.11 SYNCHROTRON RADIATION LIGHT SOURCES

The term synchrotron light is misleading because the sources of synchrotron light are usually electron storage rings. Synchrotrons are very different machines that are never used directly as X-ray sources. The first observation of synchrotron radiation was made using synchrotrons, hence the name. It is inaccurate to say "We are collecting data using a synchrotron," but the term has become so common that "synchrotron" is now synonymous with synchrotron radiation light source in protein crystallographic jargon. An excellent book on synchrotron sources and crystallography is Helliwell's Macromolecular Crystallography with Synchrotron Radiation. There is room here only to touch on the subject and to point out areas of special interest.

Differences from Standard Sources

Synchrotron radiation as available at a storage ring has a continuous spectrum in the area of interest to protein crystallographers and is very bright (Fig. 2.39). Even after tight monochromatization where only a small fraction of the total energy is used, the sources are still up to two orders of magnitude brighter than the best rotating anodes. The tight monochromatization can mean lowered backgrounds and decreased radiation damage for the same exposure. Furthermore, the optics at storage rings are usually far superior to anything used with laboratory sources, providing tightly collimated high-intensity beams. One reason for the better optics is that the source is located meters away instead of within less than a meter, giving effectively a parallel source. The combination of a brighter source and tighter optics makes synchrotron sources the best for very large unit cells such as are found in viruses, with cells from 300 to 1000 Å. In our experience with many different crystals we have always found an increase in signal-to-noise ratio at storage rings. The ability to tune the wavelength allows the use of more optimal energies. Wavelengths near 1.0 Å show very little absorption by the capillary and the solvent around the crystal, allowing wetter mounts while obtaining better signal-to-noise ratios. Corrections due to absorption are minimized. We have not found that harder radiation decreases lifetimes; in fact lifetimes are longer, since the absorption that causes free radical damage is more efficient at lower energies.

FIG. 2.39 Synchrotron radiation source. A storage ring has high-energy electrons held in orbit by bending magnets (A). As the electrons accelerate around the curve they emit synchrotron radiation (B). Because the beam is so intense, all experiments are done in shielded hutches that are interlocked so that personnel cannot be inside while the shutters are open. A wiggler (C) is a method of increasing the brilliance of the X-rays by combining several beams from local excursions of the electron path.

Special Synchrotron Techniques

The simultaneous availability of all wavelengths led to the development of white-radiation Laue photography. Laue photographs can have very short exposure times (10 ms for a typical lysozyme crystal at the best sources) and yet contain almost all the diffraction information in one or two photos. Furthermore, since most of the factors that need to be corrected for in reducing the data are a function of wavelength, especially absorption, the presence of the same reflection measured at different wavelengths in the same data set (Fig. 2.40) allows these parameters to be accurately accounted for by least-squares scaling. Moffat and co-workers have collected lysozyme data sets that compare favorably with data collected on a diffractometer.²⁰ The brightness of the source and high-quality optics make the storage ring an ideal place to collect data on very large unit cells like those found in viruses.

FIG. 2.40 How overlaps arise with white radiation. In white-radiation Laue photography a range of wavelengths is used simultaneously, and thus there are many Ewald spheres in diffracting conditions simultaneously. Two are shown here that differ in wavelength by a factor of 2. The resulting diffraction exits the crystal in the same direction and is recorded on the detector in the same spot, leading to an energy overlap.
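The energy overlap of Fig. 2.40 arises because reflections whose indices are integer multiples of one another, such as (1, 2, 3) and (2, 4, 6), lie on the same line through the reciprocal-space origin and therefore diffract in the same direction at wavelengths λ and λ/2. A minimal sketch of how such groups could be found in a list of indices is given below. It assumes the list already contains only reflections that are stimulated within the incident wavelength range and excludes (0, 0, 0). The function names are hypothetical.

```python
from collections import defaultdict
from math import gcd


def primitive_index(hkl):
    """Divide (h, k, l) by the greatest common divisor of its indices."""
    h, k, l = hkl
    g = gcd(gcd(abs(h), abs(k)), abs(l))
    return (h // g, k // g, l // g)


def energy_overlaps(reflections):
    """Group reflections that share a direction from the origin.

    Returns a dict mapping each primitive index to the list of reflections
    on that ray, keeping only rays that hold more than one reflection.
    """
    rays = defaultdict(list)
    for hkl in reflections:
        rays[primitive_index(hkl)].append(hkl)
    return {ray: members for ray, members in rays.items() if len(members) > 1}
```

Reflections that appear in the returned groups are recorded in the same spot and have to be deconvoluted or rejected. The rest are singles and can be used directly.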

Time-Resolved Data Collection

The short exposure times needed at the storage rings have allowed the collection of time-resolved protein crystallographic data. Reactions are initiated by laser flashing or in flow cells (for very slow reactions) and then data are collected by white-radiation Laue photography at appropriate time points. When an undulator is used to intensify the beam and white-radiation Laue photographs are recorded on storage phosphors, the exposure time can be as short as microseconds. One of the chief difficulties can be the relatively high protein concentration in a crystal. This makes it difficult to deliver enough substrate, and the optical density can be very high. If light is to be used to start a photoreaction, high absorption necessitates intense light sources and causes gradients across the crystal. It is better to illuminate off the absorbance peak at a position where the crystal is still transparent to the light so that light can get to the entire crystal volume. These experiments are, therefore, technically demanding and must be done carefully to ensure that most of the crystal is synchronized; otherwise the time resolution will be lost. Reactions can also be started by diffusing in substrates using a flow-cell apparatus. In this case, the reaction must be very slow (on the order of hours), or else the diffusion time will be greater than the reaction time and the reaction will not be synchronized across the crystal.

²⁰ [...], B., and Moffat, K. (1987). In "Computational Aspects of Protein Crystal Analysis. Proceedings of the Daresbury Study Weekend, DL/SCI/R25" (Helliwell, J. R., Machin, P. A., and Papiz, M. Z., eds.), pp. 84-89. See also, Helliwell, J. R., et al. (1989). J. Appl. Crystallogr. 22, 483-487.

2.12 DATA REDUCTION

Integration of Intensity

Integration of the intensity in a spot is a matter of separating the background counts from the reflection. Two methods are in general use:

1. Mask and count. In this method the region that is to be considered the spot is masked and the pixels within this region are summed (Fig. 2.41). The background is determined from the pixels adjacent to the spot and this value is subtracted to give the final intensity. The method works well when spots are well above background. The pixels can be counts in the case of counters, as in diffractometers and area detectors, or optical densities in the case of film.

$$ I_{hkl} = \sum_{1}^{n_P} \mathrm{counts}_P - \sum_{1}^{n_B} \mathrm{counts}_B $$

2. Profile fitting. In profile fitting a curve is fit to the data and the area under the curve is taken to be the intensity (Fig. 2.42). The curve, or profile, can either be a geometric shape such as a Gaussian or it can be derived by averaging over the brighter spots. The advantage of the latter method is that bright reflections can be used to determine the profile, which is then applied to weak reflections. Different profiles are usually used depending upon the position of the spot on the detector. For example, the detector might be separated into a 4 x 4 array and a different profile used in each of the 16 areas. Then, to find the area of the spot, this curve is best-fit to the counts found in the area where the spot is predicted to be, and the area under the curve is then used to find the integrated intensity rather than the counts themselves.
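The two integration methods can be contrasted with a short sketch on a one-dimensional strip of pixels. It is a toy version under stated assumptions, not the algorithm of any particular data reduction package. In `mask_and_count` the background is scaled to the number of peak pixels, which reduces to the formula above when the peak and background regions contain the same number of pixels. In `profile_fit` the profile is assumed to be normalized to unit area and the background to be a known constant. Both function names are hypothetical.

```python
def mask_and_count(peak_pixels, background_pixels):
    """Summation integration.

    Sums the pixels inside the spot mask and subtracts the mean background
    per pixel multiplied by the number of pixels in the mask.
    """
    mean_background = sum(background_pixels) / len(background_pixels)
    return sum(peak_pixels) - len(peak_pixels) * mean_background


def profile_fit(counts, profile, background=0.0):
    """Least-squares scale of a unit-area profile to the observed counts.

    counts and profile cover the same pixels. Because the profile sums
    to 1, the fitted scale factor is the integrated intensity.
    """
    numerator = sum(p * (c - background) for c, p in zip(counts, profile))
    denominator = sum(p * p for p in profile)
    return numerator / denominator
```

In the profile fit the pixels near the center of the profile carry the most weight. This is why the method does better than simple summation for weak spots, where the outer pixels of the mask contribute mostly noise.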


FIG. 2.41 Integration by masking in one and two dimensions.

Error Estimation

An accurate estimation of the error is important. The error of a single reflection is termed its σ. Contributions to σ:

• Counting statistics: $\sigma_{\mathrm{count}} = \sqrt{N_{\mathrm{peak}} + N_{\mathrm{background}}}$ (note the inclusion of background counts)
• Instability of detector; usually a constant
• Profile fitting: deviation between observed and ideal shape
• Local variation in background

Other sources of error are saturated pixels (photographic film is especially vulnerable to this), overlapped profiles, and errors in background models. When multiple measurements of a reflection are merged, they should be weighted by σ. Different data reduction packages, however, will determine different values of σ, and data from different packages are probably better averaged without σ weighting. In my experience the σ values of some packages can differ by at least a factor of 2. Reflections are often rejected by the ratio of intensity to σ, I/σ(I).

FIG. 2.42 Profile fitting. Profile fitting can more accurately find the intensity of a peak, especially in the example on the right, where the background is sloped.
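The counting-statistics term, the I/σ(I) rejection test, and the choice between weighted and unweighted merging can be sketched as follows. The weighting used here is the conventional 1/σ² weight, which is an assumption on our part since the text does not give the form of the weight. The cutoff of 2.0 is likewise only a placeholder, and the function names are hypothetical.

```python
import math


def counting_sigma(n_peak, n_background):
    """Counting-statistics error: sqrt(N_peak + N_background)."""
    return math.sqrt(n_peak + n_background)


def passes_cutoff(intensity, sigma, cutoff=2.0):
    """Keep a reflection only if I/sigma(I) reaches the cutoff."""
    return intensity / sigma >= cutoff


def merge(measurements, weighted=True):
    """Average repeated measurements of one reflection.

    measurements: list of (intensity, sigma) pairs.
    weighted=True uses 1/sigma**2 weights; weighted=False is a plain mean.
    """
    if weighted:
        weights = [1.0 / (s * s) for _, s in measurements]
    else:
        weights = [1.0] * len(measurements)
    total = sum(weights)
    return sum(w * i for w, (i, _) in zip(weights, measurements)) / total
```

For two measurements of 100 ± 10 and 200 ± 20, the weighted mean is pulled toward the first value while the unweighted mean is 150. If the two σ values come from packages that estimate errors differently, that pull reflects the packages rather than the data.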

Polarization Correction

The polarization correction arises from the dependence of scattering efficiency on scattering angle. For polarized sources, the scattering efficiency is also a function of the change of polarization direction with the angle of the scattering plane. Sources can be polarized by a monochromator, so this correction is dependent upon the optics of the source used. For unpolarized radiation, $p = \tfrac{1}{2}\,[1 + \cos^2(2\theta)]$.
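For the unpolarized case the factor is simple to evaluate: p is one half of the quantity 1 + cos²(2θ). A minimal sketch follows. It covers only an unpolarized incident beam, so it does not apply as written to monochromated or synchrotron beams. The function names are hypothetical.

```python
import math


def polarization_factor(two_theta_deg):
    """p = 0.5 * (1 + cos^2(2 theta)) for an unpolarized incident beam."""
    c = math.cos(math.radians(two_theta_deg))
    return 0.5 * (1.0 + c * c)


def correct_for_polarization(intensity, two_theta_deg):
    """Divide the measured intensity by the polarization factor."""
    return intensity / polarization_factor(two_theta_deg)
```

The factor is 1 in the forward direction and falls to 0.5 at 2θ = 90°, so the correction matters most for the high-resolution data.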

Lorentz Correction

The Lorentz correction accounts for the rate with which a reflection passes through the Ewald sphere. Reflections near the rotation axis remain in diffracting conditions for a longer time. At some point this correction becomes so large that the reflections very close to the rotation axis are rejected.

Decay or Radiation Damage

Prolonged irradiation of a sample induces radiation damage. Decay usually affects higher-resolution reflections faster. If there is a choice, the higher-resolution data should be collected first. Decay should be monitored by collecting a set of standard reflections. If the decay exceeds about 20%, data collection should be halted. Although a decay correction can partially account for decay, different reflections can decay at different rates so that a single decay parameter cannot restore the accuracy of the data set. Radiation damage can be reduced by lowering the temperature (see Chapter 6). This slows down the free radical chain reactions that are thought to induce radiation damage. Decay is a function of time and dose. However, it is not linear with dose, and brighter sources can collect more counts before the same amount of decay sets in. This is a great advantage of synchrotron sources. Also, once a sample is irradiated the free radical chain reactions are initiated and will continue even after the beam has been off for some time. Irradiation affects samples at different rates, and some samples are very sensitive. The presence of a metal that absorbs X-rays more efficiently, such as iron, platinum, or mercury, can speed up decay.

Absorption

Absorption is probably the largest source of uncorrectable error in data sets. The path length of the diffracted X-rays through glass, crystal, solvent, and air determines the amount of absorption. This path length is different for each reflection. Unfortunately, there is no entirely accurate way to model this absorption. Two approaches are generally used. In the first, experimental measurements are made of the absorption in different directions through the sample, and each reflection is corrected by these factors. In the second method, a least-squares fit is made to the differences between symmetry-related reflections as a function of some parameter believed to be a function of absorption. The experimental correction is easily calculated in the case of a diffractometer. In the case of two-dimensional detectors, the second method is normally used. The overall error in a data set can be estimated by comparing symmetry-related reflections, which in the ideal case would be identical. The agreement is calculated as

$$ R_{\mathrm{symm}} = \frac{\sum_{hkl} \sum_{i} \left| I_i - \langle I \rangle \right|}{\sum_{hkl} \sum_{i} I_i} $$

where ⟨I⟩ is the mean of the symmetry-equivalent measurements I_i of a reflection.
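The agreement between symmetry-related reflections can be computed directly once each measurement has been tagged with the index of its unique (symmetry-reduced) reflection. The sketch below assumes the commonly used definition of R-symm, the sum of |I − ⟨I⟩| over all measurements divided by the sum of I, and leaves the reduction to the unique index to the caller. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def r_symm(observations):
    """R-symm from (unique_hkl, intensity) pairs.

    unique_hkl is the symmetry-reduced index shared by all equivalent
    measurements. Reflections measured only once are skipped because
    they carry no agreement information.
    """
    groups = defaultdict(list)
    for key, intensity in observations:
        groups[key].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator
```

The function raises a ZeroDivisionError if no reflection was measured more than once, which is a reasonable signal that the statistic is undefined for that data set.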

3 COMPUTATIONAL TECHNIQUES

There are several different crystallographic software packages available and it would be impossible to cover them all. The XtalView package is used for specific examples in this book. XtalView is a window-based visually oriented package that is especially easy for novices to learn. Options and commands are shown as buttons, sliders, and menus. All options are visible, making it easy to spot them and ideal for publishing in book form. You may not want to use XtalView; perhaps you already have a favorite package. In any case, most programs have similar options and features. For consistency, a single package is necessary for this book so that we can get right to explaining the methods and spend less time explaining the particular implementation. XtalView was written at the Research Institute of Scripps Clinic by the author. It runs under X-windows, which is available on most workstations (Figs. 3.1 and 3.2). At present it has been ported to Sun workstations (including the SparcStation series), Silicon Graphics, and DECstations running ULTRIX. DENZO, MOSFLM, and XPLOR are used as the primary examples for data collection and protein refinement, which XtalView does not include.


FIG. 3.1 XtalView xtalmgr. (A) The XtalView xtalmgr program is used to organize data and to start the individual applications. It has a graphical user interface using buttons, pulldown menus, and scrolling lists. Data are organized into Projects, which can be entered and edited using the field at the top of the window. The Crystal field is a keyword used to access the parameters for a specific crystal type such as the unit-cell parameters and the space group symmetry operators. Other applications are selected from a pulldown menu (not shown) accessed from the Applications glyph. Selecting an application causes all files with the correct extension to be listed in one of the three file lists at the bottom. Files can be selected from these lists by clicking on them with the mouse. The command line is then built up using Add Args and then the application is started with Run Command. (B) The crystal editor is used to enter the unit-cell parameters, space group information, and any other relevant information. The space group symmetry operators for all space groups are kept in a table and can be accessed either by space-group number or by symbol as found in the International Tables for Crystallography, Vol. 1. The information entered into the editor is then available to all XtalView programs by simply entering the crystal keyword (cvccp in this example).

3.1 Terminology


[Figure 3.1B: screenshot of the Crystal Editor window for the crystal cvccp, showing the Title, Unit Cell, Space Group (P2(1)2(1)2(1)), Space Group number, and Symmetries fields, the Find Space Group by number button, an Other Fields area with Keyword and Data entries (here ncrsymm1 and ncrsymm2), and the Replace Field, Create Field, Delete Field, and Update This Crystal buttons.]

FIG. 3.1 (continued).

3.1 TERMINOLOGY

Reflection

A reflection is a single X-ray diffraction vector that is the combined scattering, along a particular direction, of all the electrons in the unit cell. It has a magnitude, |F|, that is referred to as F; a phase, α; and the Miller indices h, k, l. The diffraction vector for the protein is called Fp, and the same diffraction vector with a heavy atom soaked in is FpH. The diffraction vector for the heavy atom alone is fh (the lowercase letters remind us that fh can never be directly measured but is always calculated). Two separate observations of the same reflection will be called F1 and F2.

FIG. 3.2 XtalView application dissected. [Screenshot with callouts: standard OPEN LOOK window frame, Quit button, current crystal database key, input file and directory, several file types supported, radio button controls, program state message, graphical output, Apply button.]


Resolution

"Resolution" is a loosely used term in protein crystallography. It usually refers to the minimum d-spacing (defined shortly) in the diffraction data set in question. It is usually assumed that the data are fairly complete to this resolution. A data set with one reflection at 1.7 Å and all the rest with d-spacings greater than 2.0 Å is really a 2.0-Å data set, not a 1.7-Å data set. In this book, high resolution is taken to mean a smaller number, usually in the range of 2.5 Å or less; low resolution is taken to be larger numbers, usually in the range of 100 to 4 Å; and medium resolution is usually around 3 Å. Not all crystallographers use these terms in a consistent way; it is important to keep the context in mind. Subjectively, a smaller minimum d-spacing implies more data and higher resolution and is, therefore, thought of as "better." Thus the phrase "this structure has been determined to a better resolution" means a higher resolution or a lower minimum d-spacing.

"Resolution" is also commonly used to refer to the minimum distance that can be resolved. In a map with good phasing (low noise), the minimum distance that can be resolved is about 0.92 of the highest resolution. Thus, in a 1.6-Å map two carbon atoms 1.5 Å apart are just resolvable. Resolutions where individual atoms can be resolved are referred to as "atomic" resolution. Resolutions where not even the secondary structure can be seen, somewhere lower than 10 Å, are affectionately called "blobology," since that is the appearance that a typical protein has at very low resolutions.

Another effect of resolution, namely, that the number of data increases as the resolution increases, is especially important in refinement because it determines the ratio of observed data to refinable parameters. Around 2.0 Å, depending upon the exact protein, it becomes possible to refine x, y, z, and B values independently as the ratio of observations to refined parameters becomes greater than 1.

Some properties of radial shells for a typical protein are listed in Table 3.1. The table shows the ratio of the number of observed reflections to the number of parameters needed to refine the protein atom positions, and the same ratio for a typical heavy-atom refinement. Resolution is often expressed in angstrom units, as mentioned above. Resolution is related to the scattering angle θ by the rearranged Bragg equation:

$$ d = \frac{0.5}{(\sin\theta)/\lambda} = \frac{\lambda}{2\sin\theta} \tag{3.1} $$

The equation is rearranged to show the relationship between resolution and (sin θ)/λ, which is often used; (sin θ)/λ is a wavelength-independent quantity, and its numerical value gets larger as the resolution increases. The wavelength, λ, is included to make the value wavelength independent. This makes statements like "A low-resolution 5-Å map is harder to interpret than a high-resolution 2-Å map" read as "A low-resolution (sin θ)/λ = 0.1 Å⁻¹ map is harder to interpret than a high-resolution (sin θ)/λ = 0.25 Å⁻¹ map."

The angstrom unit is a common measurement in chemistry and means something to crystallographers as well as noncrystallographers: 1 Å = 10⁻¹⁰ m, or 0.1 nm. Not being an SI unit, however, the angstrom is not an approved international unit of measure. But since protein bonds are on the order of 1 Å, it does make sense to use angstroms in crystallography. This book will not break from tradition and will continue to use the angstrom. Be aware, though, that some journals have banned its use.

TABLE 3.1
Radial Shells of Resolution

                                   Reflections
Shell no.  d (Å)  2θ (deg)  Total no.  No. centric  Percent centric  N/22,500 (a)  N/20 (b)
0.5        7.92   11.17        500        229            46             0.02          25
1          6.28   14.10      1,000        365            37             0.04          50
2          4.99   17.78      2,000        577            29             0.09         100
3          4.36   20.38      3,000        756            25             0.13         150
4          3.96   22.46      4,000        917            23             0.18         200
5          3.67   24.21      5,000       1068            21             0.22         250
6          3.46   25.76      6,000       1201            20             0.27         300
7          3.28   27.14      7,000       1337            19             0.31         350
8          3.14   28.40      8,000       1459            18             0.36         400
9          3.02   29.57      9,000       1577            18             0.40         450
10         2.92   30.65     10,000       1687            17             0.44         500
12         2.75   32.62     12,000       1902            16             0.53         600
15         2.55   37.72     15,000       2212            15             0.67         750
20         2.32   38.90     20,000       2673            13             0.89        1000
25         2.15   42.04     25,000       3112            12             1.11        1250
30         2.02   44.81     30,000       3526            12             1.33        1500

Note. One thousand reflections per shell for a protein with an orthorhombic unit cell of 69.75 × 77.42 × 87.79 Å.
(a) Ratio of observed reflections to parameters for protein refinement, assuming 7500 atoms and three parameters: x, y, z.
(b) Ratio of observed reflections to parameters for a typical heavy-atom refinement, assuming four sites with five refinable parameters: x, y, z, occupancy, B-value.
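The conversions in Eq. (3.1) and the observation-to-parameter ratios of Table 3.1 are easy to check numerically. The following is a minimal Python sketch, not part of XtalView; the function names are illustrative, and the default wavelength of 1.5418 Å (copper K-alpha) is an assumption, since the text does not state which wavelength was used for the 2θ column.

```python
import math

# Assumed wavelength in angstroms (copper K-alpha); not stated in the text.
CU_K_ALPHA = 1.5418


def d_spacing(two_theta_deg, wavelength=CU_K_ALPHA):
    """Eq. (3.1): d = lambda / (2 sin theta), with the angle given as 2-theta in degrees."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def two_theta(d, wavelength=CU_K_ALPHA):
    """Inverse of Eq. (3.1): scattering angle 2-theta in degrees for a given d-spacing."""
    theta = math.asin(wavelength / (2.0 * d))
    return math.degrees(2.0 * theta)


def sin_theta_over_lambda(d):
    """Wavelength-independent form of the resolution: (sin theta)/lambda = 0.5/d."""
    return 0.5 / d


def obs_to_param_ratio(n_reflections, n_atoms, params_per_atom=3):
    """Ratio of observations to refined parameters (x, y, z per atom by default)."""
    return n_reflections / (n_atoms * params_per_atom)


if __name__ == "__main__":
    print(sin_theta_over_lambda(5.0), sin_theta_over_lambda(2.0))
    print(round(two_theta(7.92), 2))
    print(round(obs_to_param_ratio(30000, 7500), 2))
```

With four parameters per atom (x, y, z, and B), the same 30,000 reflections give a ratio of exactly 1.0 for 7500 atoms, which is one way to see why independent B-value refinement only becomes practical near 2.0 Å.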

Coordinate Systems

Two main coordinate systems are used in crystallography: fractional coordinates and Cartesian coordinates. Fractional coordinates are expressed as fractions of the unit cell in each of the three directions a, b, c separated by the angles α, β, γ. Fractional coordinates are parallel to the crystallographic axes and thus are not necessarily at right angles to each other. Thus the origin of the unit cell is (0, 0, 0), and the point one cell edge away in b is (0, 1.0, 0). The point (0.5, 0.5, 0.5) is in the center of the cell regardless of the cell dimensions and shape. In any space group, 1.0 or −1.0 can be added to any fractional coordinate without changing the meaning of the coordinate because of the infinitely repeating nature of crystal lattices. Thus, the point (−2.3, 1.4, 0.3) is the same as (0.7, 0.4, 0.3). Certain operations such as comparing symmetry relations are more easily done in fractional space. It is obvious in fractional space that (0.5, 0, 0) and (1.5, 0, 0) are the same point.

Usually Cartesian space is used for atomic models, especially since this makes it easier to compare bond angles and lengths. Cartesian coordinates are expressed along three mutually perpendicular axes, x, y, and z, with angstroms as the units. Coordinates in the Brookhaven Protein Data Bank (PDB) are reported in Cartesian coordinates, and the files contain a 3 × 3 matrix for converting the Cartesian coordinates to fractional coordinates. The origins of both coordinate systems are the same point. Crystallographic calculations, such as Fourier transforms, use fractional coordinates, so it is necessary to convert from Cartesian coordinates, which are the best for display and geometry purposes. If the space group of the crystal is orthorhombic, tetragonal, or cubic, then there is a unique way to superimpose the two coordinate systems: a along x, b along y, and c along z. When the space group has a non-90° angle, there can be several ways to superimpose axes. It is necessary then to use the matrix given in the PDB file to perform the Cartesian to fractional transformation.
For instance, in monoclinic space groups, where β is non-90°, there is the choice of aligning a with x or c with z, and b will always be along y. These two possibilities are referred to as abc* and a*bc, respectively (Fig. 3.3). Once the matrix for one of the transforms, Cartesian to fractional or fractional to Cartesian, is known, the matrix can be inverted to obtain the matrix for the opposite transformation. There is no standard method to relate fractional and Cartesian coordinate systems, and this is a common cause of confusion. For instance, if a rotation-translation solution is written out in a*bc, an attempt to refine it in a program that uses abc* will give a random R-factor, leading the crystallographer to believe that the solution is wrong even though everything is fine!

FIG. 3.3 Two ways to superimpose axes: abc* and a*bc. Left: The Cartesian coordinates x, y, z are superimposed upon the crystallographic axes a, b, c such that a and b superimpose with x and y. Since β is not 90°, c cannot be superimposed but, instead, c* is superimposed on z, and thus this superposition is labeled abc*. Right: The other possibility is shown, with a*, b, and c superimposed on x, y, and z; this superposition is labeled a*bc.

Patterson space has its own coordinate system analogous to fractional coordinates except that the terms u, v, w are used instead of a, b, c. Reciprocal space has the coordinate system h, k, l, also referred to as the Miller indices. This coordinate system is related to the fractional system in an inverse manner.¹ In addition, there are many other coordinate systems: polar coordinates, Eulerian angles, camera coordinates, and so forth. In this book we will avoid these additional systems.

¹ Stout and Jensen, in X-Ray Structure Determination (pp. 26-33), give an informative discussion on reciprocal space and its relationship to real space. Especially nice are the figures showing the relationship of the real axes to their reciprocal counterparts.
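The fractional and Cartesian conversions described in this section can be sketched in a few lines of Python. This is an illustration only, not the XtalView code: it assumes the abc* choice (a along x, b in the xy plane, c* along z), the function names are hypothetical, and a matrix taken from a PDB file should always be preferred over one built this way.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> Cartesian matrix with a along x, b in the xy plane, c* along z.

    Cell edges are in angstroms and angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(m, frac):
    """Multiply the orthogonalization matrix by a fractional coordinate."""
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))


def cart_to_frac(m, cart):
    """Invert the (upper triangular) matrix by back substitution."""
    w = cart[2] / m[2][2]
    v = (cart[1] - m[1][2] * w) / m[1][1]
    u = (cart[0] - m[0][1] * v - m[0][2] * w) / m[0][0]
    return (u, v, w)


def wrap_to_cell(frac):
    """Add or subtract whole cell translations so each coordinate lies in [0, 1)."""
    return tuple(x - math.floor(x) for x in frac)


if __name__ == "__main__":
    # Hypothetical monoclinic cell used only for illustration.
    cell = orthogonalization_matrix(50.0, 60.0, 70.0, 90.0, 105.0, 90.0)
    xyz = frac_to_cart(cell, (0.5, 0.5, 0.5))
    print(xyz, cart_to_frac(cell, xyz))
    print(wrap_to_cell((-2.3, 1.4, 0.3)))
```

Writing the a*bc matrix instead changes which axis is kept along the Cartesian frame, which is exactly why coordinates written in one convention look like a wrong solution when read in the other.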

R-Factor

The R-factor is a formula for estimating errors in a data set. It is usually the sum of the absolute differences between the observed (Fo) and calculated (Fc) amplitudes over the sum of the observed:

$$ R_{\text{crystallographic}} = \frac{\sum |F_o - F_c|}{\sum F_o} \tag{3.2} $$

If two random data sets are scaled together, then the R-factor for acentric data is 0.59 and for centric data it is 0.83. The R-factor, often called just the R, is ubiquitous in protein crystallography and is probably given more weight than it deserves because it turns a rich wealth of detail in three dimensions sampled at thousands of points into a single number. It is dangerous to use a low R-factor as a guide to the correctness of a structure, although a high R-factor is a good guide to the incorrectness of a structure. One problem with using the R-factor as a guide is that it has led to a tendency to overrefine models and to "drive in" phase bias by using the power of least-squares algorithms. A good least-squares minimizer can always lower the R-factor by modeling the errors as well as the data. Because of the nature of the Fourier transform, every point on one side of the transform contributes to every point on the other side, so that it is possible to model the error on one side by subtly adjusting the points on the other side (see Kevin Cowtan's web site, listed in Appendix B, for a graphical illustration of this). It is also possible to lower the R-factor by deleting observed data and by raising the number of parameters. For evaluation of an R-factor, it is important to ask how many parameters were used, what percentage of the data were deleted, and by what criteria. Knowing these, direct comparisons of R-factors can be a useful way to compare the relative accuracy of a model or data set. The relationship between R-factors and errors is discussed further in Section 3.10 (see Evaluating Errors; also see Cross Validation with R-Free, Section 3.11).
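As a concrete illustration of the definition in words above (sum of absolute differences over the sum of the observed), the following minimal Python sketch computes an R-factor for two already-scaled lists of amplitudes. The function name is hypothetical, and scaling, resolution cutoffs, and sigma cutoffs are deliberately left out.

```python
def r_factor(f_obs, f_calc):
    """Sum of |Fo - Fc| over the sum of Fo, for amplitudes on a common scale."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have the same length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


if __name__ == "__main__":
    observed = [100.0, 50.0]
    calculated = [90.0, 55.0]
    print(r_factor(observed, calculated))
```

Because the function only sees the reflections it is given, dropping the worst-fitting reflections from both lists lowers the result, which is the reason the text asks what percentage of the data were deleted.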

Space Groups and Symmetries

Crystals are by definition symmetric arrays of molecules in three dimensions and must fall into one of the 230 known space groups that describe all the possible ways identical objects can pack in three dimensions. Of these 230 space groups, only 65 are possible for chiral objects, such as proteins, which are made up of L-amino acids. Crystals are built up from unit cells that are repeated by translation in each of three directions. The edges of a unit cell are denoted by a, b, c, and the angles between them are α, β, γ.² Note that the direction of translation does not have to be on a Cartesian coordinate system but can have any angular value. The unit cell contains from one to several asymmetric units, arranged by symmetry elements. The symmetry elements are expressed mathematically as symmetry operators. By applying the symmetry operators to one asymmetric unit to form a unit cell and then repeating the unit cell in the three directions specified by α, β, γ, the crystal can be reconstructed.

If a computer program is space group general, then it can read a list of the symmetry operators and perform the correct calculation. If a computer program is space group specific, then the symmetry operators of one or a few space groups are hardwired into the program, which can be used only for calculations in these space groups. XtalView is completely space group general, and it has a library file that contains all 230 space groups that can be accessed from the Crystal-Edit function of xtalmgr. These can be overridden if desired to put in nonstandard settings of known space groups, or even to enter impossible space groups, by entering the symmetry operators in algebraic form as found in the International Tables for Crystallography, Volume 1.

Most space groups have several different choices of origin. That is, several positions in the unit cell can be chosen as the point 0, 0, 0 to yield identical crystalline patterns that are shifted by a constant amount. In orthorhombic space groups, for instance, the origin can be placed at 0, 0, 0 or 1/2, 0, 0 or 0, 1/2, 0 or 0, 0, 1/2 or 1/2, 1/2, 0 or 1/2, 0, 1/2 or 0, 1/2, 1/2 or 1/2, 1/2, 1/2. This arrangement can give rise to confusion because, for example, two sets of coordinates can be on different origins and appear to be different solutions when in fact they are identical.

² For a complete description of space groups and symmetry see The International Tables for Crystallography, Vol. 1 or Vol. A, or any of a number of textbooks on crystallography.

Matrices for Rotations and Translation

Symmetry operators can be expressed more generally in the form of matrices, as can any rotation and translation combination in the form

$$ \mathbf{x}' = R\mathbf{x} + \mathbf{t} \tag{3.3} $$

where R is a 3 × 3 matrix and x and t are length-3 vectors. The vector x is transformed to form a new vector x'. In this way the arbitrary symmetry operator x + y, 2/3 + x, 1/4 − z may be expressed as follows:

$$ \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} 0 \\ 2/3 \\ 1/4 \end{pmatrix} \tag{3.3a} $$
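To make Eq. (3.3) concrete, here is a minimal Python sketch that applies the operator x + y, 2/3 + x, 1/4 − z to a fractional coordinate. It is not the Appendix A program; the names R, T, and apply_operator are illustrative, and the matrix is simply read off the algebraic form of the operator given in the text.

```python
from fractions import Fraction

# Rotation part of the operator x + y, 2/3 + x, 1/4 - z:
# each row holds the coefficients of x, y, z for one output coordinate.
R = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, -1],
]

# Translation part, kept as exact fractions of the cell edges.
T = [Fraction(0), Fraction(2, 3), Fraction(1, 4)]


def apply_operator(rot, trans, xyz):
    """Return x' = R x + t for one fractional coordinate."""
    return tuple(
        sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i] for i in range(3)
    )


if __name__ == "__main__":
    point = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
    print(apply_operator(R, T, point))
```

Storing every operator of a space group as such an (R, t) pair, and looping over the list, is what makes a program space group general.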

An example of a program segment applying the foregoing matrix operation can be found in Appendix A, program 2. A common task in crystallography is to rotate from a starting position to a new position about one of the three laboratory axes, φx, φy, φz. The matrices for these transformations are:

$$ \Phi_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi_x & -\sin\phi_x \\ 0 & \sin\phi_x & \cos\phi_x \end{pmatrix} \tag{3.4} $$

$$ \Phi_y = \begin{pmatrix} \cos\phi_y & 0 & \sin\phi_y \\ 0 & 1 & 0 \\ -\sin\phi_y & 0 & \cos\phi_y \end{pmatrix} \tag{3.5} $$

$$ \Phi_z = \begin{pmatrix} \cos\phi_z & -\sin\phi_z & 0 \\ \sin\phi_z & \cos\phi_z & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{3.6} $$

If the vector with the coordinates is X, then the new coordinate X' is found by

$$ X' = \Phi_x \Phi_y \Phi_z X \tag{3.7} $$
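Equations (3.4) to (3.7) translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names; it assumes angles in degrees and the sign convention of the matrices as written above, in which a positive φz carries the x axis toward y.

```python
import math


def rot_x(phi_deg):
    c, s = math.cos(math.radians(phi_deg)), math.sin(math.radians(phi_deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(phi_deg):
    c, s = math.cos(math.radians(phi_deg)), math.sin(math.radians(phi_deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(phi_deg):
    c, s = math.cos(math.radians(phi_deg)), math.sin(math.radians(phi_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3 x 3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    """Product of a 3 x 3 matrix and a length-3 vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def rotate(v, phi_x, phi_y, phi_z):
    """Eq. (3.7): X' = Phi_x Phi_y Phi_z X, so the z rotation acts first."""
    m = matmul(rot_x(phi_x), matmul(rot_y(phi_y), rot_z(phi_z)))
    return matvec(m, v)


if __name__ == "__main__":
    print(rotate((1.0, 0.0, 0.0), 0.0, 0.0, 90.0))
    print(rotate((1.0, 0.0, 0.0), 90.0, 0.0, 90.0))
```

Matrix products do not commute, so the order in Eq. (3.7) matters: applying the same three angles in a different order generally gives a different final position.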

B-Value

The B-value is a measurement of the displacement of an atom from thermal motion, conformational disorder, and static lattice disorder. This vibration will smear out the electron density and will also decrease the scattering power of the atom as a function of resolution.³ The isotropic B-value is related to the displacement u by the equation

$$ B = 8\pi^2 \langle u^2 \rangle \tag{3.8} $$
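A short numerical sketch can give a feel for Eq. (3.8) and for the falloff exp(−Bs²), with s = 0.5/d, quoted in the footnote. The Python below is illustrative only; the function names are hypothetical and the B-value of 20 Å² in the usage lines is just an example input.

```python
import math


def b_from_rms(u_rms):
    """Eq. (3.8): B = 8 pi^2 <u^2>, given the rms displacement u in angstroms."""
    return 8.0 * math.pi ** 2 * u_rms ** 2


def mean_square_displacement(b_value):
    """Inverse of Eq. (3.8): <u^2> in square angstroms for a given B-value."""
    return b_value / (8.0 * math.pi ** 2)


def scattering_falloff(b_value, d):
    """Attenuation exp(-B s^2) at d-spacing d, with s = (sin theta)/lambda = 0.5/d."""
    s = 0.5 / d
    return math.exp(-b_value * s * s)


if __name__ == "__main__":
    b = 20.0  # example B-value in square angstroms
    print(math.sqrt(mean_square_displacement(b)))
    print(scattering_falloff(b, 4.0), scattering_falloff(b, 2.0))
```

The second line of output shows the same atom contributing far less at 2 Å than at 4 Å, which is the resolution dependence described above.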

Anisotropic B-Values

The B-value as expressed in Eq. (3.8) assumes equal movement in all directions and is thus an isotropic B-value. The vibration of an atom need not be the same in all directions, and in this case the motion is described by anisotropic displacement parameters (ADP). In this formulation the motion is described by an ellipsoid that can be rotated in any direction. The ellipsoid is described by a symmetric 3 × 3 matrix:

$$ U = \begin{pmatrix} U_{11} & U_{12} & U_{13} \\ U_{21} & U_{22} & U_{23} \\ U_{31} & U_{32} & U_{33} \end{pmatrix} $$

Since the matrix is symmetric, the three elements below the diagonal are equal to the corresponding elements above the diagonal and can be removed. This leaves six elements. The elements on the diagonal, U11, U22, U33, specify the magnitude of the movement along the three axes, and the off-diagonal elements, U12, U13, U23, specify the rotation of the ellipsoid off the principal axes. If the matrix cannot be solved in terms of an ellipsoid it is termed nonpositive definite. The anisotropy of an atom is defined as the ratio Umin/Umax of the diagonal elements U11, U22, U33, such that a spherical, isotropic atom would have an anisotropy of 1.0 (Fig. 3.4). In a survey of anisotropic thermal parameters in proteins, Ethan Merritt found the average anisotropy of proteins to be about 0.45, with remarkable agreement between structures.⁴ One interesting consequence of this is that it shows that the assumption of an isotropic thermal parameter is a poor fit for the majority of protein structures. On the other hand, there is no way to properly fit the extra parameters at lower resolutions. An overall anisotropic correction for the protein is a good compromise, since it requires only six more parameters for the entire data set. Anisotropic displacement parameters can be converted to the isotropic equivalent, which by Eq. (3.8) uses the mean of the diagonal elements, by the formula

$$ B_{iso} = \frac{8\pi^2}{3}\,(U_{11} + U_{22} + U_{33}) $$

³ The equation for this falloff is exp(−Bs²), where s is (sin θ)/λ or 0.5/d.
⁴ Merritt, E. A. (1998). http://www.bmsc.washington.edu/parvati/parvati_survey.html.
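The two quantities defined in this section, the anisotropy and the equivalent isotropic B-value, can be sketched as follows. This is a minimal Python illustration with hypothetical function names. It follows the definition in the text (ratio of the smallest to the largest diagonal element) and assumes the equivalent isotropic value is taken from the mean of the three diagonal elements, so that an isotropic atom reproduces Eq. (3.8).

```python
import math


def anisotropy(u11, u22, u33):
    """Ratio Umin/Umax of the diagonal elements; 1.0 for a spherical atom."""
    diagonal = (u11, u22, u33)
    return min(diagonal) / max(diagonal)


def b_iso(u11, u22, u33):
    """Equivalent isotropic B: 8 pi^2 times the mean of the diagonal elements."""
    return 8.0 * math.pi ** 2 * (u11 + u22 + u33) / 3.0


if __name__ == "__main__":
    # Example diagonal elements in square angstroms (illustrative values only).
    print(anisotropy(0.2, 0.3, 0.4))
    print(b_iso(0.2, 0.3, 0.4))
```

A fuller treatment would diagonalize U and use its eigenvalues, which also reveals a nonpositive definite matrix (a negative eigenvalue); the diagonal-only version above is kept to match the definition as stated in the text.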


FIG. 3.4 (A) Isotropic spheroid with anisotropy of 1.0 and (B) anisotropic ellipsoid with an anisotropy of 0.5.

3.2 BASIC COMPUTER TECHNIQUES

File Systems

As your projects grow you will gain many files. It is important to keep these files organized in a sensible way or they will soon get out of hand. I have seen some users keep all their files in their home directory until the list became longer than the directory command could process. A better approach is to put all the files for a given protein in a subdirectory. Appropriate subdirectories can also be made for groups of files with a common purpose such as phasing, refinement, or fitting. The use of subdirectories in a tree structure will allow all the files to be easily backed up. Even if you start off with several gigabytes of space, it will eventually all be filled if files are not stored off-line and deleted occasionally.

To keep track of files, a naming convention should be followed. Most files are given an extension to indicate the file type. The root of the file name should be a description of the contents. My convention is to use a short three- or four-letter descriptor of the protein; a one- or two-letter descriptor of the crystal, whether it is native or derivatized (n for native or the element name of the heavy atom); and a number designating the particular crystal. For instance, photoactive yellow protein was called yp, and a file containing the native data from crystal number 5 in .fin format would be called ypn5.fin. If this information is merged with mercury derivative crystal number 1, yphg1.fin, to form a .df file, the new file would be ypn5hg1.df.
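The naming convention just described is simple enough to encode, which helps keep scripts consistent with hand-named files. The following Python sketch is only an illustration of the convention as stated above; the helper names are hypothetical and nothing here is part of XtalView.

```python
def crystal_root(protein, crystal_type, crystal_number):
    """Root name: protein descriptor + crystal descriptor + crystal number."""
    return f"{protein}{crystal_type}{crystal_number}"


def data_file_name(protein, crystal_type, crystal_number, extension="fin"):
    """File name for one crystal, e.g. native data in .fin format."""
    return f"{crystal_root(protein, crystal_type, crystal_number)}.{extension}"


def merged_file_name(protein, native, derivative, extension="df"):
    """File name for native data merged with one derivative.

    native and derivative are (crystal_type, crystal_number) pairs.
    """
    native_part = crystal_root(protein, *native)
    derivative_part = f"{derivative[0]}{derivative[1]}"
    return f"{native_part}{derivative_part}.{extension}"


if __name__ == "__main__":
    print(data_file_name("yp", "n", 5))
    print(data_file_name("yp", "hg", 1))
    print(merged_file_name("yp", ("n", 5), ("hg", 1)))
```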


Portability Considerations

Avoid Space Group Specificity

While it may seem easier to write space-group-specific code, it usually turns out to be more work in the end. Quick-and-dirty programs have a way of growing beyond your original expectations and becoming programs you rely on daily. If a problem is worth solving, it is worth doing it right. Take the extra time to make the program space group independent. It is much harder to modify a program later; when you look back at the code it can be difficult to decipher its meaning. Often space-group-specific code is faster. However, modern computers are so fast that the difference between no time and no time is still no time, whereas it takes humans a relatively much longer time to alter the code. For example, if you multiply by a symmetry operator matrix, most of the elements will be zero. A small amount of time may be saved by writing the code assuming zeros, but later the code will have to be changed to put these terms back in. The time it takes to rewrite the code later will probably add up to more time than the computer uses doing these extra multiplications in all the runs of the program put together. (Actually, a smart math coprocessor will recognize the zero and return zero without actually doing the multiplication.)

Avoid Binary Files

At first glance, binary files seem to be a good idea. Instead of saving data as an ASCII equivalent, files are stored in their computer representation. This saves some space and speeds up I/O. However, binary files cannot be read by humans, so there is no way to tell what is in them. Computers can read binary files only if the order of bytes in the file is precisely known, as well as what they represent (a float, an integer, characters, a double, etc.). Unless the format is precisely specified, binary should not be used. Another problem with binary is that it is not portable between machines. Even worse, there is no universal standard for FORTRAN binary; different implementations use different headers and trailers on the records. Since the only constant in the world of computers is change, you may find in 2 years that your hot new machine cannot read binary files from your last machine.

Scratch files can be binary because they are not meant to stay around and binary I/O is faster than formatted. Also, for very large files, such as electron density maps, binary may be the most practical alternative in terms of size and speed. Electron density maps can always be recalculated from primary phases, so saving them is usually unnecessary. With the speed of modern computers, it is usually faster to fast-Fourier-transform a map than to read it in from a disk file, even if it is binary!
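The byte-order problem mentioned above is easy to demonstrate. The short Python sketch below uses the standard struct module to write one 4-byte float in both byte orders and then reads the bytes back with the right and the wrong assumption; the value 220.05 is just an example amplitude, and the helper name read_float is hypothetical.

```python
import struct

value = 220.05

# The same number stored as a 4-byte float in the two common byte orders.
little_endian = struct.pack("<f", value)
big_endian = struct.pack(">f", value)


def read_float(raw, byte_order):
    """Interpret 4 raw bytes as a float; byte_order is '<' (little) or '>' (big)."""
    return struct.unpack(byte_order + "f", raw)[0]


if __name__ == "__main__":
    print(little_endian.hex(), big_endian.hex())
    print(read_float(little_endian, "<"))  # correct assumption
    print(read_float(little_endian, ">"))  # wrong assumption: a meaningless number
    print(f"{value:.2f}")                  # the ASCII form is readable anywhere
```

Nothing in the four bytes themselves says which order was used, which is why a binary format must be precisely specified before it can be exchanged between machines.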


Language Extensions

Computer language extensions often seem to be temptations designed to make users dependent on a particular brand of computer. Avoid them. Since change is the only constant, your extensions may become a headache when you have to update some old code that refuses to run on your new system.

Setting Up Your Environment

UNIX users need to set up two files in their home directory: .cshrc and .login. The .cshrc file is executed every time a new shell is started, and the .login file is executed once when you first log into the system. VMS users have a similar file called login.com. To use XtalView it is necessary to put a few lines into your .cshrc file or source the XtalView.env file that comes with the installation. Especially important is setting up your path variable, which determines where the operating system will look for commands. You can share executables with other members of your group by including the proper directories in your path. Also useful is the aliasing of common commands or groups of commands to a short word. For instance, XENGEN users will find the following alias useful: alias ypset source -dem/ yp~$1 ~process^S1. cmd. This is used as ypset crystalname to source the appropriate command file to set the environment variables for a particular crystal. Also, alias rm to rm -i and copy to cp -i to prevent overwriting files accidentally. Alias RM to /bin/rm and CP to /bin/cp to override the prompting when you are sure you want to delete the file(s). A related command, set noclobber, prevents overwriting of files with redirection commands.

File Formats and mmCIF

CCP4 MTZ Format

CCP4 uses a file format called MTZ to store crystallographic data. The format is binary and consists of FORTRAN records: header records followed by reflection data records. The header records contain crystallographic information including the cell and space group and the column labels. The data records consist of columns, with each record containing a minimum of three columns holding the h, k, l indices and additional columns containing crystallographic data such as amplitudes, intensities, phases, and weights. The column labels are used to identify data and can contain any name up to 30 characters long, but typically standard names are used for items:

Label                 Data item
H, K, L               Miller indices
IC                    Centricity flag: 0 centric, 1 acentric
M/ISYM                Partial flag M, and symmetry number (a)
BATCH                 Batch number
I                     Intensity
SIGI                  σ(I)
FP                    Native, measured F
SIGFP                 σ(FP)
FC                    Calculated F
FPHn                  F for derivative n
DP                    Anomalous difference for native data
SIGDP                 σ(DP)
DPHn                  Anomalous difference for derivative n
SIGDPHn               σ(DPHn)
PHIC                  Calculated phase in degrees
PHIM                  Most probable phase
PHIB                  Phase
FOM                   Figure of merit
WT                    Weight
HLA, HLB, HLC, HLD    Hendrickson-Lattman coefficients A, B, C, D
FREE                  R-free flag

(a) Isym is stored as 256*M + Isym.

To use the data in a CCP4 program, the user specifies which column labels in the MTZ file correspond to the program labels on a LABIN card in the command file. These don't necessarily have a one-to-one correspondence, as programs and users are free to label columns as they wish.
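The note under the label list says that the partial flag M and the symmetry number are stored together in one column as 256*M + Isym. A minimal Python sketch of that packing and its inverse is given below; the function names are hypothetical, no CCP4 routine is being quoted, and the sketch assumes Isym is smaller than 256 so that the two parts can be separated again.

```python
def pack_m_isym(m, isym):
    """Combine the partial flag M and the symmetry number as 256*M + Isym."""
    if not 0 <= isym < 256:
        raise ValueError("this sketch assumes 0 <= Isym < 256")
    return 256 * m + isym


def unpack_m_isym(value):
    """Recover (M, Isym) from the combined column value."""
    return divmod(int(value), 256)


if __name__ == "__main__":
    combined = pack_m_isym(1, 3)
    print(combined, unpack_m_isym(combined))
```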

XtalView Organization

XtalView Crystal File

XtalView is organized around the concept of a crystal. A crystal is all data that have a common unit cell and space group. The information for such a crystal is kept together in a central file. This information can then be accessed by all XtalView programs by entering the name of this file into the Crystal field or by setting the environment variable CRYSTAL. The file, which is in ASCII, consists of lines with a keyword followed by data, and it can be edited. Xtalmgr provides a facility for editing new crystal files, or an existing one can be copied. Xtalmgr knows about all 230 space groups.

XtalView File Formats

Almost all XtalView files are in ASCII with the notable exception of map files, which are in FORTRAN binary form for backward compatibility.


Records are separated by end-of-lines, and data fields are separated by white space (a comma is not considered as white space). Columns are not necessary except in PDB files, the standard to which XtalView faithfully adheres, for better or worse. The files are easily readable by FORTRAN or C programs. Using ASCII allows easy importing and exporting of data to and from XtalView and permits browsing of the data. It also forms the major bottleneck in running programs, since every line of an ASCII file must be scanned and converted as it is read in or written out. Given the trade-offs, it seems a price worth paying. As few file formats as possible are used. Standard extensions are used. Because strict typing of these extensions would make XtalView less compatible with other software, the practice is not enforced, although such consistency would make the system much more robust. The following list of the most common file types includes most files used, each with a brief description.

.fin. The basic crystallographic data file contains: h k l F1 σ(F1) F2 σ(F2). F1 and F2 can be Bijvoet pairs F(+) and F(−), or isomorphous pairs Fp and Fph, or any other pairable data with an associated sigma. If one of the two observations is missing, then F should equal 0.0 and σ(F) should equal 9999.0 to indicate that the data are missing or not applicable (i.e., centric reflections do not have an anomalous signal, and this is indicated by setting the Friedel mate to 0.0). When a .fin file is read, if one of the observations is missing, it is correctly handled, depending upon how the data are to be used.

.df. A "double fin" file contains enough data to merge two data sets while preserving Bijvoet pairs: h k l F1 σ(F1) F2 σ(F2) F3 σ(F3) F4 σ(F4). Usually, the first two are native data Fp(+) and Fp(−), with F3 and F4 being Fph(+) and Fph(−). Missing or unobserved data are handled as for a .fin file. For example, a centric reflection would look like this:

    0 0 5 220.05 5.76 0.0 9999.0 157.3 9.67 0.0 9999.0

.phs. A "phase" file contains either h k l Fo Fc phi, or h k l Fo f.o.m. phi, where phi is in degrees and f.o.m. is the figure of merit. The Hendrickson-Lattman coefficients A, B, C, D are sometimes saved at the end of each record, in which case the file contains: h k l Fo f.o.m. phi A B C D.

.sol. This is a heavy-atom "solution" file that contains a keyword format. Xheavy provides an editor for creating and editing these files. They are easily read after editing and contain all information necessary to keep track of several derivatives. This allows the use of individual heavy-atom files and avoids merging them into a "superfile" with the concomitant headaches of altering or removing data.


.map. This is an electron density map file in the FSFOUR format originally from Pittsburgh. It is in FORTRAN binary and contains an entire unit cell. Given the very large size of electron density maps, it is not practical to use an ASCII format.

In addition, many other formats are supported here and there by various programs, such as .mu and .rfl files from XENGEN, XPLOR .fobs files, SHELX .fcf files, and TNT .hkl files. Xprepfin can be used to import or export .fin files to and from a variety of formats. Many of your formatting tasks can be solved by learning to use the UNIX command awk. For instance, to read a file with h k l Fp σ(Fp) (Fp(−) − Fp(+)) into .fin format with awk:

    awk '{print $1, $2, $3, $4, $5, $4+$6, $5}' import_file > imported.fin

This .fin file should still be read through xprepfin to set things such as centric reflections and reduce and merge indices. If you save the awk procedure in a script file, you can even add this awk script to xprepfin as an Other file format by editing the file $XTALVIEWHOME/data/OtherFormats and following the instructions there to add your script. Plotting is done using Postscript. Postscript can be viewed with pageview on workstations, psview on SGI machines, and ghostscript under LINUX; other systems have similar commands. Commercial services are available for preparing slides from Postscript files.
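For readers who prefer a scripting language to awk, the same ideas can be sketched in Python. The first function below reads one .fin record and treats F = 0.0 with σ = 9999.0 as a missing observation, as described for the .fin format; the second repeats the awk conversion of h k l Fp σ(Fp) difference into a .fin record. Both function names are hypothetical, and the output should still be passed through xprepfin as the text advises.

```python
MISSING_SIGMA = 9999.0  # sigma value used in the text to mark a missing observation


def parse_fin_line(line):
    """Split one .fin record (h k l F1 sigma1 F2 sigma2) into typed fields.

    Returns ((h, k, l), obs1, obs2), where each observation is an (F, sigma)
    pair, or None when it is flagged as missing.
    """
    fields = line.split()
    h, k, l = (int(x) for x in fields[:3])
    f1, s1, f2, s2 = (float(x) for x in fields[3:7])

    def observation(f, sigma):
        if f == 0.0 and sigma == MISSING_SIGMA:
            return None
        return (f, sigma)

    return (h, k, l), observation(f1, s1), observation(f2, s2)


def anomalous_to_fin(line):
    """Convert 'h k l Fp sigma diff' into a .fin record, as the awk one-liner does.

    The second amplitude is Fp + diff and reuses the same sigma.
    """
    h, k, l, fp, sigma, diff = line.split()
    f2 = float(fp) + float(diff)
    return f"{h} {k} {l} {fp} {sigma} {f2:g} {sigma}"


if __name__ == "__main__":
    print(parse_fin_line("0 0 5 220.05 5.76 0.0 9999.0"))
    print(anomalous_to_fin("1 2 3 100.0 2.5 4.5"))
```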

History Files

XtalView is intended to be more than just another pretty face; it is an entire system for organizing and tracking crystallographic data. The history file is key to the tracking and record keeping of data. All data files created by XtalView also have an associated history file that contains the name(s) of the input file(s), the date created, and all the values of variable parameters. The history file can be used to track all data backward and forward: where the data have been and how they were created. If an error was made at a step in the past, it can be found by searching through the history files. Intermediate data files can be deleted and later recreated by using the information in the history files. This can save a tremendous amount of disk space. A history file has the same name as its parent file with the extension .hist added.

mmCIF

mmCIF is the macromolecular version of the crystallographic interchange file (CIF) format, which was developed by the International Union of Crystallography to describe small-molecule organic structures and the crystallographic experiment. The result of this effort was a core computer-readable dictionary of data items sufficient for archiving a small-molecule crystallographic experiment and its results. The CIF dictionary and the data files based upon that dictionary conform to a subset of the self-defining text archive and retrieval (STAR) representation developed by Hall. STAR is a set of syntax rules from which STAR-compliant dictionaries have been developed by several disciplines. From the dictionaries it is possible to define data files that use data items referenced in the dictionaries. The dictionary allows precise descriptions of crystallographic terms and data items in a manner that can be automated and parsed by computers and is also easily read by humans. One of the most powerful features of STAR is that it is extensible by adding entries to the dictionary. mmCIF is a dictionary that extends CIF to describe macromolecular crystallography data and experimental details. Not surprisingly, given the complexity of biological macromolecules, the dictionary is extensive and contains over 600 data items unique to macromolecules.

mmCIF Syntax

A CIF file consists of one or more data blocks, each of which starts with the keyword data_. Each data block consists of data items, identified by a leading underscore (tag) followed by a value for that data item (value). Repetitive data values may be grouped into a loop_ structure. Lines starting with a # are comments.

    # This example is of a structure factor entry down-
    # loaded from the pdb
    #
    data_sfexample

    # the following loop structure defines the symmetry operators
    # for R3
    loop_
    _symmetry_equiv_pos_as_xyz
    'x, y, z'
    'z, x, y'
    'y, z, x'

    # The unit cell
    _cell_length_a       40.4300
    _cell_length_b       40.4300
    _cell_length_c       40.4300
    _cell_angle_alpha    80.250
    _cell_angle_beta     80.250
    _cell_angle_gamma    80.250

    # This is the beginning of the reflection list which
    # continues to the end of the file.
    loop_
    _refln.index_h
    _refln.index_k
    _refln.index_l
    _refln.F_squared_meas
    _refln.F_squared_sigma
    _refln.F_calc
    _refln.phase_calc
    -1  1  1   46047.80  1063.92  421.55  197.1
    -2 -1  2  113852.10  3460.71  310.21   77.3
    -1 -1  2    2140.88   119.66   90.91   85.1
     0 -1  2    1536.05   117.88  180.45  151.7
     1 -1  2  126277.95  3545.42  442.72   42.8
    -2  0  2   63480.48  1348.86  281.96  142.6

An example of a coordinate file:

    data_coordexample
    loop_
    _atom_site.label_seq_id
    _atom_site.auth_asym_id
    _atom_site.group_PDB
    _atom_site.type_symbol
    _atom_site.label_atom_id
    _atom_site.label_comp_id
    _atom_site.label_asym_id
    _atom_site.auth_seq_id
    _atom_site.label_alt_id
    _atom_site.cartn_x
    _atom_site.cartn_y
    _atom_site.cartn_z
    _atom_site.occupancy
    _atom_site.B_iso_or_equiv
    _atom_site.footnote_id
    _atom_site.label_entity_id
    _atom_site.id
    _atom_site.aniso_U[1][1]
    _atom_site.aniso_U[1][2]
    _atom_site.aniso_U[1][3]
    _atom_site.aniso_U[2][2]
    _atom_site.aniso_U[2][3]
    _atom_site.aniso_U[3][3]
    1 . ATOM N N  GLU * 1 A 4.127 26.179  -7.903 0.49 57.53 . 1 1
        0.9336  0.0004 0.2737 0.7394 0.2771 0.4591
    1 . ATOM N N  GLU * 1 B 3.535 25.448 -12.889 0.51 54.52 . 1 2
        0.8406 -0.0887 0.3093 0.5015 0.0161 0.6783
    1 . ATOM C CA GLU * 1 A 5.490 26.607  -8.207 0.49 52.50 . 1 3
        0.9283 -0.0256 0.2331 0.5563 0.1241 0.4611

• Data names cannot be longer than 32 characters.
• No file record can be greater than 80 characters.
• Loops cannot be nested.
• Comments begin with a hash (#) and end at the end of a line.
• Data items begin with an underscore (_).
• Single data values that contain white space can be contained either in single quotes or within semicolons (;) as the first character of a line.
• To maintain the integrity of a loop structure, missing values are filled in by a question mark (?). A period is used if the value does not have any meaning within the loop.
• Loop_ ends when the next tag is encountered.
• The data block is declared by a data_ and terminates with an end-of-file or another data block.
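The syntax rules above are simple enough that a small reader can be sketched in a few dozen lines. The Python below is a hedged illustration only, not a complete CIF parser: the names tokenize and parse_cif are hypothetical, multi-line semicolon-delimited text fields and save_ frames are not handled, and the quoting rule is simplified so that a quoted value may not itself contain a quote character.

```python
import re

# A token is a single-quoted string, a double-quoted string, a comment, or a bare word.
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|(#.*)|(\S+)""")


def tokenize(text):
    """Yield (kind, value) pairs; kind is 'quoted' or 'bare'. Comments are dropped."""
    for line in text.splitlines():
        for m in TOKEN.finditer(line):
            if m.group(3) is not None:        # '#' starts a comment to end of line
                break
            if m.group(1) is not None:
                yield "quoted", m.group(1)
            elif m.group(2) is not None:
                yield "quoted", m.group(2)
            else:
                yield "bare", m.group(4)


def _finish(block, loop):
    """Store a completed loop in its data block as tags plus rows of values."""
    if block is not None and loop is not None and loop["tags"]:
        n = len(loop["tags"])
        vals = loop["values"]
        rows = [vals[i:i + n] for i in range(0, len(vals), n)]
        block["loops"].append({"tags": loop["tags"], "rows": rows})


def parse_cif(text):
    """Return {block_name: {'items': {tag: value}, 'loops': [{'tags', 'rows'}]}}."""
    blocks, block, loop, pending_tag = {}, None, None, None
    for kind, tok in tokenize(text):
        if kind == "bare" and tok.lower().startswith("data_"):
            _finish(block, loop)
            loop = None
            block = blocks.setdefault(tok[5:], {"items": {}, "loops": []})
        elif kind == "bare" and tok.lower() == "loop_":
            _finish(block, loop)
            loop = {"tags": [], "values": []}
        elif kind == "bare" and tok.startswith("_"):
            if loop is not None and not loop["values"]:
                loop["tags"].append(tok)      # still reading the loop header
            else:
                _finish(block, loop)          # a new tag ends any open loop
                loop = None
                pending_tag = tok
        elif pending_tag is not None:
            block["items"][pending_tag] = tok
            pending_tag = None
        elif loop is not None:
            loop["values"].append(tok)
    _finish(block, loop)
    return blocks
```

All values are returned as strings; converting them to numbers, and treating ? and . as missing values, is left to the caller.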

mmCIF Dictionary

The mmCIF dictionary defines the allowed data items. An example, the dictionary entry for _refln.F_meas, a reflection data item of the kind used in the foregoing reflection loop, is:

    save__refln.F_meas
        _item_description.description
    ;       The measured value of the structure factor in electrons.
    ;
        _item.name                   '_refln.F_meas'
        _item.category_id             refln
        _item.mandatory_code          no
        _item_aliases.alias_name     '_refln_F_meas'
        _item_aliases.dictionary      cif_core.dic
        _item_aliases.version         2.0.1
        loop_
        _item_related.related_name
        _item_related.function_code  '_refln.F_meas_sigma'  associated_esd
                                     '_refln.F_meas_au'     conversion_arbitrary
        _item_type.code               float
        _item_type_conditions.code    esd
        _item_units.code              electrons
    save_

CCP4 Crystallographic Programs

The CCP4 crystallographic system is a suite of continually evolving programs for protein crystallography, written by many crystallographers and maintained by CCP4 staff at Daresbury, UK. It is highly recommended and has been used to solve innumerable protein structures. Combined with XtalView and a data processing package, CCP4 gives you a complete suite of software for solving protein structures by all the major techniques. There is some overlap between CCP4 programs, and some choice over which program to use to perform a specific task, giving the user a lot of flexibility. This is enhanced by documentation, example scripts, and a discussion group, all of which are accessible from the CCP4 Web pages (see Appendix B). All the programs within CCP4 use the same reflection file format, an .mtz file. This binary file has a header containing the cell dimensions, space group, a title statement so you can annotate the history of the .mtz file, the resolution of the data, and the columns within the file, followed by h k l and any choice of columns up to a limit of 200. Thus, all processed and scaled data for your native and derivatives can be kept in one file, along with the sigmas, anomalous differences, an R-free flag, and so on. The order of the columns in the reflection file is not important because each column has a column label, designated by the user, to help make it clear what is going on. Many of the CCP4 programs also have a column type associated with each column, to prevent the user from, for example, using figures of merit as the phases in a map calculation. Since an .mtz file is a binary file, the program mtzdump is used to read the header of the .mtz file and can be used to look at individual reflections.

Handling Reflection Files

CCP4 has programs for the scaling of diffraction data, both internally and against other data sets (SCALA is probably the most frequently used program). The autoindexing and integration of data are done by other programs outside of the CCP4 suite (MOSFLM, DENZO, and others). MOSFLM writes out an .mtz file, while DENZO and its scaling program SCALEPACK write out ASCII files, so you can choose to scale using CCP4 or not by converting the reflection file before or after SCALEPACK. Conversion of I's to F's is done using the program TRUNCATE, which additionally produces useful output regarding the properties of your data, as well as performing a more elegant conversion than simply square-rooting intensities, in which negative intensities are lost. To make sure that your reflections are in a standard asymmetric unit of reciprocal space, run CAD to move everything to the same portion of reciprocal space and the program SORT to sort the reflections by indices. The combination of reflection files is done using mtzutils or CAD, and the scaling of multiple data sets uses SCALA, SCALEIT, or FHSCAL. The program FREER can be used along with the program UNIQUE, which generates all unique reflections for a given cell, space group, and symmetry, to create a master free R reflection set, which simplifies the introduction of different data sets to refinement (e.g., ligand-bound forms, higher-resolution data). In addition, if you change refinement programs later, no doubt the format of the reflection file will have to be changed, and having a master free R reflection set can simplify things.

Density Modification: DM and Solomon

There are two widely used programs for density modification within CCP4: DM and Solomon. DM, written by Kevin Cowtan, is perhaps more widely used, and includes noncrystallographic symmetry averaging without the requirement of a mask file; it needs just the matrices that correspond to the rotation (and translation if applicable) between the identical subunits. DM is a fast-evolving program, reflected in the large number of examples given in the documentation. An example using DM is given later (Sect. 5.5).

PDB File Manipulation

PDBSET can be used for renumbering, relabeling, resetting, and moving PDB files. Note that CCP4 programs generally require that if you have a metal in your PDB file, the first mention of the metal on the line in the PDB file must be displaced by one column to the left relative to other atoms.

Molecular Replacement with AmoRe

One of the quickest and most robust programs for molecular replacement is AmoRe. The program uses keywords to select among its functions, such as generating structure factors from the model, converting file formats, rotation search, translation search, and then rigid-body refinement. Each function is run at the terminal, allowing the user to check at each stage before proceeding. With the change in cell dimensions often observed when freezing crystals, AmoRe provides a quicker alternative, by means of molecular replacement, to the cycles of rigid and positional refinement often needed to move the model into the new cell.

Phasing: MLPHARE

MLPHARE is one of the most popular phasing programs. Prior to use, the native and derivative data sets must be scaled (using SCALA, SCALEIT, FHSCAL, or alternatives) and then merged into one .mtz file using CAD. Successive runs of MLPHARE allow the refinement of position, occupancy (real and anomalous), and B-factors. It is possible to phase MAD data sets by treating them as a special case of MIR.

Refinement: REFMAC

Prior to running REFMAC it is necessary to run PROTIN to create connectivity and geometry files. REFMAC allows refinement against several minimization functions, and the user can dictate the weight given to the X-ray terms, geometric terms, the smoothing of B-factors, and so forth.

Map Calculation: SFALL, SIGMAA, and FFT

In general a σA-weighted map is the most useful, since the phase bias is minimized in such a map. Thus to calculate a map from a model, you would first use SFALL to convert a model to structure factors. By including the observed data at this stage, you can scale the Fcalc to the Fobs. Then run SIGMAA to calculate coefficients, and use FFT to make a map from these coefficients. [Of course, if you are going to use xfit, none of this is necessary: simply read in your .fin file with your observed data as a map, select the σA map type, and instead of calculating the map, click the SFCalc button. Then select your PDB which is to be used as F's, select the .fin file as the map, and calculate.] If you need a figure of merit for a density modification program such as DM and you have molecular replacement phases rather than MIR/MAD phases, SIGMAA will happily provide some for you.

Model Validation

If you want to know what's wrong with your model, use PROCHECK to check on its geometry, and use SFCHECK to see where there is poor agreement between your model and your data. In addition, the non-CCP4 program WHATIF provides a lot of output, some of it quite useful.

SFTOOLS

When converting to and from .mtz files, it is worth taking a few seconds to run SFTOOLS. This utility will check that your reflections haven't become victims of an incorrect format statement.

Some CCP4 Tips

The first hurdle the new user has to overcome is the fear of matching program labels and column labels. Each program has a label for the columns that it is going to recognize and use, and there is some variation between programs. For example, your .mtz file may contain a native, denoted FNAT, and a derivative, denoted FHG. A scaling program may have the program labels FP and FPH1. Thus to let the program know which is which, you need to assign the program labels column labels something like this: FP=FNAT FPH1=FHG.

There is no specific program for data indexing and integration within the CCP4 suite, although MOSFLM has the look and feel of a CCP4 program and outputs a reflection file in the CCP4 reflection format, an .mtz file. Once you master the .mtz file, the CCP4 suite becomes simple to use. Specific examples of CCP4 scripts are presented throughout the book. CCP4 is run by editing command files and then running them from the command line. The new versions of XtalView can read and write CCP4 files in many cases and do so by writing and running command files for you in the background. Thus, for best use, CCP4 must be installed and in your path on the same computer that runs XtalView.

3.3 DATA REDUCTION AND STATISTICAL ANALYSIS

Evaluation of Data

R-symm

R-symm is an internal measure of the errors within a data set. It compares the differences between symmetry-related reflections that should be identical in intensity. In the statistical world, these should be two independent measures of the same number, and the variance of symmetry-related reflections should reflect the variance of the entire data set. In the real world, symmetry-related reflections are usually not completely independent; they usually occur when the crystal is in similar geometry. However, R-symm does give a minimum variance: it is unlikely that the data are more accurate than the R-symm. In practical terms, data with an R-symm less than 0.05 are good data; when R-symm is less than 0.10 the data are probably usable; and above 0.20 the data are of questionable value. Two data sets that have no relationship to each other will have a theoretical R-symm of 0.57 if they are simply scaled together. Random data scaled in bins with multiple scale factors can often reach an R-symm of about 0.35. By rejecting outliers, the R-symm can be lowered even further to the point where one might think the two data sets are related when they are from different crystals or the data were misindexed.
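The text does not spell out a formula for R-symm. A commonly used definition sums the absolute deviations of each observation from the mean of its symmetry-equivalent group and divides by the sum of the intensities. The Python sketch below assumes that definition; the function name r_symm is hypothetical, and the indices are assumed to have already been reduced to the asymmetric unit so that equivalent observations share one key.

```python
from collections import defaultdict


def r_symm(observations):
    """R-symm from (unique_hkl, intensity) pairs.

    unique_hkl is assumed to be the index already mapped into the
    asymmetric unit, so symmetry mates share the same key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = denominator = 0.0
    for values in groups.values():
        if len(values) < 2:
            continue  # a single measurement says nothing about agreement
        mean = sum(values) / len(values)
        numerator += sum(abs(v - mean) for v in values)
        denominator += sum(values)
    return numerator / denominator if denominator else float("nan")
```

Packages differ in details such as weighting and outlier rejection, so values from this sketch should not be expected to match any particular program exactly.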

Ratio of Intensity to Sigma of Intensity

The sigma of a reflection intensity is an estimate of the accuracy of an individual measurement as opposed to the accuracy of the data set as a whole. It is a combination of counting statistics, background height, background variance, and the number of times the reflection was measured. In my experience, no two data reduction packages will produce the same sigma, so direct comparison of data sets on the basis of sigma should be avoided unless some yardstick is available. The ratio of the amplitude of the reflection to the sigma of the amplitude (F/σ(F)) is often used as a rejection criterion. If this ratio is less than approximately 2, a reflection is often rejected as being unreliable.

Anomalous Scattering Signal

Many protein crystals contain a prosthetic group with anomalous scattering properties and, in addition, many heavy atoms used for derivatization have anomalous scattering. One test for the presence of an anomalous signal is to compare the R-symm of the centric data, in which the anomalous signal cancels, and the R-symm of the acentric data. The anomalous signal manifests itself as a breakdown of Friedel's law, which states that F+ is equal to F-, so that a difference is found.⁵ Even though the effect is often small, the Bijvoet pairs (Friedel pairs where the equality no longer holds) can be measured from the same crystal, and this allows measuring a smaller signal than is possible when the measurements are from different crystals, as in the isomorphous replacement case. Anomalous scattering arises when the energy of the incident radiation is close to the resonant frequency of the tightly bound inner-shell electrons of an atom. A simple experiment that demonstrates this effect is easily performed. Tape a lightweight Styrofoam ball onto the end of a 2.5-foot piece of yarn. With your arm outstretched, dangle the ball in front of you. Move your hand slowly back and forth and then gradually increase the speed. At slow speeds, where the driving energy (your hand) is well below the resonant frequency, the ball will follow your hand in phase. As the speed increases it will reach a speed where the motion of the ball is mostly in the vertical direction (the imaginary direction) and the sideways motion (the real direction) is minimal. As the speed increases further, the ball will become out of phase with your hand, being on the opposite side of each swing. The physical meaning of the terms $\Delta f'_\lambda$ (the change in the real part of the scattering at a given wavelength, λ) and $\Delta f''_\lambda$ (the change in the imaginary part of the scattering) can be explained in terms of this mechanical model. The term $\Delta f''_\lambda$ is the vertical movement of the ball at the point of resonance and $\Delta f'_\lambda$ is the damping of the range of oscillation in the horizontal movement. The sign of $\Delta f'_\lambda$ is negative since the term represents an absorption of energy. The total scattering is then

    $f_\lambda = f_0 + \Delta f'_\lambda + i\,\Delta f''_\lambda$    (3.9)

Percentage Completeness

The number of unique reflections measured over the number of unique reflections possible is the "percentage completeness." The amount of completeness needed is dependent upon many factors, but data that are more than 80% complete are sufficient for almost all purposes. Data missing from a single region are worse than data missing in random positions. If there are some data in all directions, a lower completeness can be tolerated.

Filtering of Data

It is often important to filter out unreliable data and outliers that can skew results. The F/σ(F) criterion has been mentioned. If there are two measurements being compared, the ratio (F1 - F2)/[(F1 + F2)/2.0] (the difference over the average) is very useful. If this ratio is greater than 1, one of the measurements may be an outlier.

⁵ Stout and Jensen, pp. 218-222.
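The two filters described here, the F/σ(F) cutoff and the difference-over-average ratio, can be combined in a few lines. The Python below is a minimal sketch under assumptions: the function name keep_reflection and its parameter names are hypothetical, the default thresholds of 2 and 1 are the approximate values quoted in the text, and the absolute value of the difference is used so that the test is symmetric in F1 and F2.

```python
def keep_reflection(f1, sig1, f2, sig2, min_ratio=2.0, max_rel_diff=1.0):
    """Return True if a pair of measurements passes both filters."""
    if sig1 <= 0 or sig2 <= 0:
        return False                      # no usable error estimate
    if f1 / sig1 < min_ratio or f2 / sig2 < min_ratio:
        return False                      # fails the F/sigma(F) criterion
    average = (f1 + f2) / 2.0
    if average <= 0:
        return False
    # Difference over the average; values above 1 suggest an outlier.
    return abs(f1 - f2) / average <= max_rel_diff
```

The thresholds are arguments rather than constants because, as the text notes, sigmas from different data reduction packages are not directly comparable.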

Merging and Scaling Data

Merging data refers to finding reflections with common indices and placing them together. Merging of data sets is desired for isomorphous phasing, for comparing two data sets, and for combining mutant and native data for difference map purposes. Scaling data is the operation of setting the sum of one data set equal to the sum of the other. In the ideal case, a single overall scaling factor is sufficient to scale two data sets. Scaling can also be much more complicated. Systematic errors in the data due to absorption and decay mean that different parts of the data require different scale factors. Scaling is often done in bins based on resolution. Anisotropic scaling uses six scaling parameters derived by least-squares fitting. Local scaling scales together data from the same local region of reciprocal space and scales each reflection individually.

Resolution Bin Scaling

The size of bins in scaling requires careful consideration. In the best circumstances only one scale factor is needed. To overcome systematic errors, the data are usually divided into small bins that are scaled individually. If the bins are too small, the scale factor becomes less accurate. At the extreme, if a separate bin is used for each reflection, the two scaled data sets become equal and any information is lost. It is usually necessary to pick bins that have at least 100-200 reflections each. In single-parameter scaling, a single parameter is used to scale the data in each bin such that

    $\sum |F_1| = \sum |F_2| \times \text{scale}$    (3.10)

A problem with bin scaling is that there is an abrupt transition at the zone between bins. To overcome this, a continuously varying scale factor can be used by fitting a line to the scale factor found in the bins or by interpolating between bins.
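Single-parameter bin scaling as in Eq. (3.10), followed by interpolation between bins to avoid abrupt transitions, might be sketched as follows. This is an illustration, not the procedure of any particular program: the function names bin_scales and interpolated_scale are hypothetical, reflections are assumed to be (s, F1, F2) triples with s = (sin θ)/λ, and bins are formed by sorting on s and cutting into roughly equal populations.

```python
def bin_scales(reflections, n_bins):
    """Return [(mean_s, scale)] per bin, where sum|F1| = scale * sum|F2|."""
    data = sorted(reflections, key=lambda r: r[0])
    n = len(data)
    table = []
    for b in range(n_bins):
        chunk = data[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        sum1 = sum(abs(f1) for _, f1, _ in chunk)
        sum2 = sum(abs(f2) for _, _, f2 in chunk)
        mean_s = sum(s for s, _, _ in chunk) / len(chunk)
        table.append((mean_s, sum1 / sum2))
    return table


def interpolated_scale(s, table):
    """Linearly interpolate the scale between bin centers, clamped at the ends."""
    if s <= table[0][0]:
        return table[0][1]
    if s >= table[-1][0]:
        return table[-1][1]
    for (s0, k0), (s1, k1) in zip(table, table[1:]):
        if s0 <= s <= s1:
            if s1 == s0:
                return k0
            return k0 + (s - s0) / (s1 - s0) * (k1 - k0)
```

Each F2 would then be multiplied by interpolated_scale(s, table) for its own s. The sketch assumes every bin has a nonzero sum of |F2|.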

Anisotropic Scaling

Anisotropic scaling fits six parameters to the data to find a continuously varying scale factor in three dimensions along the reciprocal space lattice. The six directions are along the principal axes, h, k, and l, and along h × k, h × l, and k × l by fitting an equation of the form

    $s = a_{11} h^2 + a_{22} k^2 + a_{33} l^2 + a_{12} hk + a_{13} hl + a_{23} kl$    (3.11)

to the differences between the two data sets to be scaled. If a11, a22, and a33 are equal to 1.0 and a12, a13, and a23 are equal to 0.0, this is equivalent to a single uniform scaling parameter of 1.0. Finally, the two techniques, bin scaling and anisotropic scaling, can be combined. The data are broken up into large bins within which an anisotropic scaling is used. This is the method used by the XtalView program, xmerge (Fig. 3.5). The data are broken up into bins of resolution, and within each bin of resolution a six-parameter anisotropic scale factor is applied. This method is especially useful for heavy-atom data, although it is common for low-resolution, heavy-atom data to have larger differences because the dissolved heavy atom is present in the solvent region. Thus, it is desirable to scale these data separately from the higher-resolution data so that they are not adversely affected by the very large differences in the lower-resolution data. The smallest number of reflections included in each bin should be about 100-200 reflections, with 500 being a good average number, so that the six anisotropic scaling parameters are well overdetermined. Too few reflections will result in overfitting the data.

FIG. 3.5 Xmerge user interface. Xmerge is used to combine and scale two data sets together. The results are graphed on the screen and are listed. The user can change the settings and rescale until the best results are obtained.

To divide data into equal bins of resolution, the bins should be divided into equal segments of ((sin θ)/λ)³ between the minimum and maximum values. This will put approximately the same number of reflections in each bin. Another method is to sort the data on resolution and then divide the data into equal portions. Past experience with data scaling has shown that scale parameters change faster with lower-resolution data than at higher resolutions. Thus, the bins should include fewer reflections at low resolution than at high. In xmerge this is handled by binning based on ((sin θ)/λ)² instead of ((sin θ)/λ)³, which seems to be a good compromise.
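The binning rule just described can be made concrete with a short sketch that returns bin boundaries spaced equally in a chosen power of (sin θ)/λ. The function name resolution_bin_edges and its arguments are hypothetical; the only relation used beyond the text is Bragg's law in the form (sin θ)/λ = 1/(2d). This is not a reproduction of the xmerge code.

```python
def resolution_bin_edges(d_max, d_min, n_bins, power=3):
    """Bin boundaries in d-spacing (Angstroms), low resolution first.

    Boundaries are equally spaced in ((sin theta)/lambda)**power.
    power=3 gives roughly equal numbers of reflections per bin;
    power=2 puts fewer reflections in the low-resolution bins.
    """
    s_low = 1.0 / (2.0 * d_max)      # (sin theta)/lambda = 1/(2d)
    s_high = 1.0 / (2.0 * d_min)
    lo, hi = s_low ** power, s_high ** power
    edges = []
    for i in range(n_bins + 1):
        s = (lo + (hi - lo) * i / n_bins) ** (1.0 / power)
        edges.append(1.0 / (2.0 * s))
    return edges
```

Comparing power=3 with power=2 over the same resolution range shows the effect described in the text: with the square, the first (lowest-resolution) shell ends at a larger d-spacing and therefore holds fewer reflections.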

Local Scaling

Another useful technique is local scaling. Some data sets vary too quickly to be handled by anisotropic scaling. In local scaling, a scale factor is computed for each individual reflection by considering only the data in a local block centered about the reflection. For example, the scale factor could be set by setting the sums of the data in 5 × 5 × 5 blocks based upon h, k, l to be equal, and then applying that scale factor only to the reflection at the center of the block. The block is then moved over one row at a time. This method is used quite often to scale Bijvoet data together. Again it is important not to use too small a block size. Local scaling can smooth out the largest differences of any of the techniques discussed. Beware, however, for in isomorphous derivatives, where large differences are often real, local scaling can scale out a significant portion of the signal.

Multiple Data Set Scaling

So far, the scaling of a single pair of data sets together has been discussed. A more complicated situation is the scaling of several partially overlapping data sets simultaneously: for example, scaling together several different data collection runs on the same or different crystals. This is usually handled in an iterative manner. Since the scaling of one data set affects the scaling of all the other data sets, it is not possible to derive the best scaling parameter for each data set independently. Therefore, the scaling parameters are set to initial values and each bin of data is rescaled until the process converges. Scaling that diverges rather than converges indicates serious errors in one or more of the data sets. Often the bad data can be detected by examining the individual scale factors. The bad data may be due to excessive radiation damage. Alternatively, the crystal may have suffered some catastrophe such as drying out, or a single run among several correct runs may have been misindexed. Throw these data away and continue the scaling with the rest.


Heavy-Atom Statistics

Several statistics are useful in determining the probability that a crystal has been derivatized and that the differences, if any, are isomorphous.

1. The absolute size of the differences, |FP - FPH|, should fall off with resolution (Fig. 3.6). If they increase with resolution, then the derivative is nonisomorphous. This is because the heavy-atom scattering falls off with resolution owing to its scattering and thermal factors. If the derivative is isomorphous, the total scattering is the vector sum of the derivative amplitudes and the original protein amplitudes. In a nonisomorphous derivative, protein amplitudes are changed as a result of changes in the unit cell and heavy-atom-induced movements of the protein. Differences due to these effects increase with resolution.

2. The root-mean-square differences should be larger for centric reflections than for acentric reflections. This is because the centric zones have phase restrictions so that the full heavy-atom magnitude is either added or subtracted. In the acentric case, the vectors can have any angular relationship so that at one extreme the heavy-atom vector produces no change in intensity, and at the other extreme the vectors align to produce the maximal change. The average intensity change is 1/√2 of the heavy-atom magnitude. This is illustrated in Fig. 3.7.

3. The size of the differences should correlate with the size of F, which is partly a restatement of point 1, since the mean size of F also falls off with resolution.

The overall size of the differences is important for the derivative to have any phasing power. In our lab we use some rough rules of thumb, which are fairly liberal to prevent throwing out a potentially useful derivative. If the root-mean-square differences on F (not I) are above 15% for the 5-Å data, there is definitely a signal present. If the differences are 9-15% and the data have been accurately collected, the derivative may be useful. Below this we have never found a useful derivative. We have solved the atom positions for weak derivatives with differences of about 7-8%, but when they are refined they have little to no phasing power.

FIG. 3.6 Graphs of heavy-atom statistics for a derivative with good phasing power ((A), (B), and (C)) and from an unusable derivative ((D), (E), and (F)). See text for an explanation of the expected curves. Panels: (A) R-factor (|F1-F2|/F1) vs. resolution, (B) delta (|F1-F2|) vs. resolution, and (C) delta (|F1-F2|) vs. amplitude for enpt7iso.fin (7411 reflections: 1042 centric, 6369 acentric); (D), (E), and (F) are the same plots for ccpniulab.fin (4808 reflections: 754 centric, 4054 acentric).

FIG. 3.7 Centric and acentric vector constructions. Where FP and FH are the same magnitude, in the acentric case |FPH| is considerably less than |FP| + |FH|. In general, centric reflections will have a difference equal to the amplitude of FH (except in rare crossover cases), but an acentric difference is usually less than the amplitude of FH (except, of course, when FH and FP have the same or opposite phases).

The size of the isomorphous signal for a given heavy atom can be estimated from a simple formula derived here. The size of the resultant vector from adding a series of N random-walk vectors of length f is f√N. The maximum isomorphous difference is $f_H\sqrt{N_H}/(f_P\sqrt{N_P})$, where $f_H$ is the scattering power of the $N_H$ heavy atoms and $f_P$ is the average scattering power of the $N_P$ protein atoms. Given the average ratios of carbon, nitrogen, oxygen, and sulfur atoms in a typical protein, the average scattering power of a protein atom is 6.7 e⁻ and the average molecular weight of a protein atom is 13.4. Substituting these numbers into the equation, we get $f_H\sqrt{N_H}/(6.7\sqrt{MW/13.4})$, or

    $(\Delta F/F)_{\max} = 0.55\, f_H \sqrt{N_H} / \sqrt{MW}$    (3.12)

This gives the signal expected for centric reflections where the vectors are colinear. For the acentric case the vectors are not correlated, and so the difference will be lowered on average by 1/√2:

    $(\Delta F/F)_{\text{acentric}} = 0.39\, f_H \sqrt{N_H} / \sqrt{MW}$    (3.13)

If we consider occupancy, this has the effect of lowering f~, by a proportionate amount, that is, f[~ - fH " Occupancy. This formula gives an upper estimate, since disorder will tend to lower this estimate. For example, if we have a 32,000-Da protein with a one-site mercury (f = 80) derivative that is half-occupied, the expected AF/ ...... is 12 %. The expected AI/I will be twice this, or 2 4 % . A second site of the same occupancy would raise the isomorphous signal by the square root of 2, or 1.4, to 17%.
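The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of Eqs. (3.12) and (3.13) with an occupancy-scaled scattering power; the function name and its arguments are illustrative and are not part of XtalView.

```python
import math

def expected_isomorphous_signal(f_heavy, n_sites, mol_weight,
                                occupancy=1.0, centric=True):
    """Upper estimate of Delta-F / F from Eq. 3.12 (centric) or Eq. 3.13 (acentric)."""
    coefficient = 0.55 if centric else 0.39
    f_effective = f_heavy * occupancy          # occupancy lowers f_H proportionately
    return coefficient * f_effective * math.sqrt(n_sites) / math.sqrt(mol_weight)

# Half-occupied mercury (f = 80) on a 32,000-Da protein, one site and two sites.
one_site = expected_isomorphous_signal(80, 1, 32000, occupancy=0.5)
two_sites = expected_isomorphous_signal(80, 2, 32000, occupancy=0.5)
print(f"one site: {one_site:.1%}, two sites: {two_sites:.1%}")
```

The two values should come out near the 12% and 17% quoted in the text.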

3.4 THE PATTERSON SYNTHESIS

Named for the man who invented it many years ago, the Patterson synthesis is still very useful. Patterson maps are obtained from a Fourier synthesis by assuming the phase is 0.0, squaring the amplitudes, and adding a center of symmetry. A Patterson map shows the vectors between atoms in the unit cell instead of the absolute positions of these atoms (Fig. 3.8). It is still the standard method of locating heavy atoms in the unit cell. A Patterson map of the difference between a data set with a heavy atom added and the data without ($F_{PH} - F_P$) gives the vectors between the heavy atoms.
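That a zero-phase synthesis of squared amplitudes yields interatomic vectors can be seen in a one-dimensional toy calculation. The sketch below assumes point atoms on a 16-point grid with made-up weights; all names are hypothetical and the slow direct sums stand in for a real FFT program.

```python
import cmath
import math

N = 16                                   # grid points along a 1D "unit cell"
atoms = {2: 1.0, 5: 2.0, 11: 1.0}        # grid position -> weight (toy values)
density = [atoms.get(x, 0.0) for x in range(N)]

def structure_factors(rho):
    """Discrete Fourier transform of the density."""
    n = len(rho)
    return [sum(rho[x] * cmath.exp(2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]

def patterson(rho):
    """Fourier synthesis with squared amplitudes and all phases set to zero."""
    n = len(rho)
    intensities = [abs(f) ** 2 for f in structure_factors(rho)]
    return [sum(i_h * math.cos(2 * math.pi * h * u / n)
                for h, i_h in enumerate(intensities)) / n
            for u in range(n)]

def vector_map(rho):
    """Direct sum of weight products over all pairs of points separated by u."""
    n = len(rho)
    return [sum(rho[x] * rho[(x + u) % n] for x in range(n)) for u in range(n)]

p_map = patterson(density)
v_map = vector_map(density)
peaks = [u for u in range(N) if p_map[u] > 0.5]
print(peaks)   # positions of the origin peak and the interatomic vectors
```

The peak list contains the origin plus every difference between atom positions in both directions, which is the added center of symmetry.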

Patterson Symmetry

The symmetry of a Patterson map is the symmetry of the space group in real space with a center of symmetry added and all translational elements set to 0.0. To add the center of symmetry, each operator, in turn, is copied and multiplied by [-1, -1, -1] so that the number of operators is doubled. The asymmetric unit is one-half the size of the real space asymmetric unit. Some symmetry elements are changed by this procedure into new ones. For example, a 2-fold axis turns into a mirror perpendicular to the original axis. The symmetry of Patterson space is similar to that of reciprocal space except for systematic absences and centering.
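The recipe for Patterson symmetry (drop the translations, then add an inverted copy of each operator) can be sketched as follows. The operator representation and the function name are assumptions made for illustration; space group $P2_1$ is used as the example.

```python
from fractions import Fraction

# An operator is (rotation, translation): a 3x3 integer matrix stored as a
# tuple of rows, plus a tuple of fractional translations.
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
TWOFOLD_Y = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))

# Space group P2(1): x, y, z and -x, y + 1/2, -z
P21 = [
    (IDENTITY, (Fraction(0), Fraction(0), Fraction(0))),
    (TWOFOLD_Y, (Fraction(0), Fraction(1, 2), Fraction(0))),
]

def patterson_operators(space_group_ops):
    """Set translations to zero, add the inverted copy of each operator,
    and drop duplicates."""
    rotations = []
    for rotation, _translation in space_group_ops:
        inverted = tuple(tuple(-value for value in row) for row in rotation)
        for candidate in (rotation, inverted):
            if candidate not in rotations:
                rotations.append(candidate)
    return rotations

patterson_ops = patterson_operators(P21)
for rotation in patterson_ops:
    print(rotation)
```

The inverted copy of the 2-fold along y is the matrix for x, -y, z, that is, the mirror perpendicular to the original axis.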

FIG. 3.8 Patterson vectors: a simple three-atom molecule (left) and its Patterson synthesis (right). The Patterson is derived by taking each atom in turn and placing it at the origin (0, 0, 0). Note that the Patterson pattern has an added center of symmetry.

Calculating Pattersons

Patterson maps are calculated as for a normal Fourier except that the Patterson symmetry operators are used, the amplitudes are squared, and the phases are all set to 0.0. If you are using XtalView, the proper symmetry is generated automatically and the amplitudes are squared when a Patterson synthesis is selected in the program xfft (Fig. 3.9). A difference Patterson is calculated by squaring the differences between two amplitudes and adding a phase of 0.0. Xfft will make difference Pattersons from .fin and .df files.

FIG. 3.9 Xfft program interface. Xfft is used to calculate electron density maps by a fast Fourier transform method. The user interface illustrates the options available: the direction of planes, map type, phase file, resolution and outlier filters, grid, and output map file. The user selects the desired options by selecting buttons and entering numbers in the appropriate fields and then pushes the Calculate button. The options can be saved and later loaded using the Defaults menu button.

Because the differences are squared, a few large differences can dominate the map. If these differences are caused by outliers, they can completely swamp the signal. Therefore, a filter has been added to xfft to detect obvious outliers and delete them. A reflection is rejected when $|F_1 - F_2| > p \cdot (F_1 + F_2)/2$, that is, when the absolute value of the difference is greater than p times the average of the two amplitudes. The value of p is usually set to 100% for isomorphous differences and about 30% for anomalous difference Pattersons. If p is set greater than 200%, then no differences will be rejected, because the difference can never exceed the sum of the two amplitudes. An example of the usefulness of this filter is shown in Fig. 3.10.
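The outlier test described above amounts to one comparison per reflection. The following is a minimal sketch of that test and of the squared-difference coefficients; it is not xfft's code, and the data layout, function name, and sample reflections are invented for illustration.

```python
def difference_patterson_coefficients(pairs, p=1.0):
    """Return (h, k, l, (F1 - F2)**2) for pairs that pass the outlier filter.

    pairs: iterable of (h, k, l, f1, f2); p: rejection fraction (1.0 = 100%).
    """
    kept = []
    for h, k, l, f1, f2 in pairs:
        difference = abs(f1 - f2)
        average = (f1 + f2) / 2.0
        if difference > p * average:
            continue   # obvious outlier: difference exceeds p times the average
        kept.append((h, k, l, difference ** 2))
    return kept

reflections = [
    (1, 0, 0, 120.0, 100.0),   # difference 20, average 110: kept
    (0, 2, 1, 45.0, 60.0),     # difference 15, average 52.5: kept
    (3, 1, 2, 300.0, 20.0),    # difference 280, average 160: rejected at p = 100%
]
filtered = difference_patterson_coefficients(reflections, p=1.0)
unfiltered = difference_patterson_coefficients(reflections, p=2.0)
print(len(filtered), len(unfiltered))
```

With p = 2.0 (200%) nothing is rejected, which matches the limit stated in the text.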

FIG. 3.10 Effect of filtering outliers. (A) Gold-derived Patterson map without filtering of outliers. (B) The same data filtered so that if $|F_P - F_{PH}|$ is greater than 100% of the average of $F_P$ and $F_{PH}$, the reflection is rejected. Both panels show the section x = 0.500. This filter rejects a handful of very large differences that were dominating the Patterson and making it uninterpretable. After removal of the outliers, the Patterson was interpretable, and this derivative turned out to have excellent phasing power.

There are $N^2$ peaks in a Patterson map, where N is the number of atoms in the unit cell. Of these, N are vectors from an atom to itself and fall on the origin. This leaves N(N - 1), and the unique peaks are N(N - 1)/Z, where Z is the number of asymmetric units in the Patterson space group. For example, if we have 3 atoms in the asymmetric unit of an orthorhombic space group, then there are 12 atoms in the unit cell. There are 144 vectors, of which 132 are not at the origin. Since the Patterson has a Z of 8, there are 132/8 unique vectors or 16.5. The fraction occurs because some peaks are on mirror planes and are shared by two adjacent asymmetric units.
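The same bookkeeping can be written as a small helper. The function and argument names below are hypothetical; the example assumes four asymmetric units per cell, as implied by 3 atoms giving 12 in the orthorhombic example.

```python
def patterson_peak_counts(atoms_per_asu, asu_per_cell, patterson_z):
    """Count Patterson vectors: total, non-origin, and unique non-origin."""
    atoms_in_cell = atoms_per_asu * asu_per_cell
    total = atoms_in_cell ** 2                 # N squared vectors
    non_origin = total - atoms_in_cell         # N of them fall on the origin
    unique = non_origin / patterson_z          # shared among Z asymmetric units
    return atoms_in_cell, total, non_origin, unique

counts = patterson_peak_counts(atoms_per_asu=3, asu_per_cell=4, patterson_z=8)
print(counts)   # (12, 144, 132, 16.5)
```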

Harker Sections

The peaks on a Patterson map result from all possible combinations of vectors between atoms in the unit cell including symmetry-related atoms. The symmetry-related peaks fall onto special positions, are called Harker peaks, and fall onto Harker sections. For example, in the space group P2, for every atom at x, y, z there is a symmetry mate at -x, y, -z for which a vector will occur at (x, y, z) - (-x, y, -z) = 2x, 0, 2z (Fig. 3.11). Thus, on the Harker section y = 0 peaks can be found at 2x, 2z for each atom in the asymmetric unit. However, while all Harker peaks fall on Harker sections, not all peaks on a Harker section are Harker peaks. If two atoms, not related by symmetry, happen to have the same y coordinate, they will produce a cross-vector on the section y = 0. The positions and relationships of Harker peaks (also called self-vectors) can be found by subtracting the possible combinations of symmetry operators for a given space group pairwise as shown in Table 3.2. The peaks for the example in Table 3.2, space group $P222_1$, fall onto three Harker sections: x = 0, y = 0, and z = 1/2. The Harker sections can be found by noting places in the table where one of the coordinates is a constant. An examination of the space group symmetry will also reveal the Harker sections, although not the full algebraic relationship. A 2-fold axis will have a corresponding Harker section at 0 in the plane perpendicular to the axis, a 2-fold screw at 1/2, a 3-fold screw at 1/3 and 2/3, and so forth.
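The pairwise subtraction of operators is mechanical enough to automate. The sketch below handles only space groups whose operators are diagonal (each coordinate is just sign-flipped and shifted), which covers $P222_1$; the representation and function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)
# Each operator of P222(1) is stored as per-axis signs plus per-axis
# translations: x,y,z / -x,-y,z+1/2 / -x,y,-z+1/2 / x,-y,-z
P2221 = [
    ((1, 1, 1), (0, 0, 0)),
    ((-1, -1, 1), (0, 0, HALF)),
    ((-1, 1, -1), (0, 0, HALF)),
    ((1, -1, -1), (0, 0, 0)),
]

def harker_vector(op_a, op_b):
    """Difference op_a - op_b as [(coefficient, constant)] for x, y, z."""
    (signs_a, trans_a), (signs_b, trans_b) = op_a, op_b
    return [(sa - sb, Fraction(ta - tb) % 1)
            for sa, sb, ta, tb in zip(signs_a, signs_b, trans_a, trans_b)]

def harker_sections(operators):
    """Sections on which some Harker vector has a constant coordinate."""
    sections = set()
    for op_a, op_b in combinations(operators, 2):
        for axis, (coefficient, constant) in zip("xyz", harker_vector(op_a, op_b)):
            if coefficient == 0:       # coordinate no longer depends on the atom
                sections.add((axis, constant))
    return sections

sections = harker_sections(P2221)
print(sorted(sections))
```

For $P222_1$ this reports the sections x = 0, y = 0, and z = 1/2 named in the text.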

FIG. 3.11 Origin of Harker planes. Left: Two atoms related by a 2-fold screw axis along y that displaces the second copy of the atom by 1/2 along y relative to the first. Right: When the Patterson synthesis is constructed, the self-vectors fall along the planes y = ±1/2.

TABLE 3.2 Harker Vectors for Space Group $P222_1$

                     x, y, z            -x, -y, 1/2 + z    -x, y, 1/2 - z     x, -y, -z
  x, y, z            0, 0, 0            2x, 2y, 1/2        2x, 0, 1/2 - 2z    0, 2y, 2z
  -x, -y, 1/2 + z    2x, 2y, 1/2        0, 0, 0            0, 2y, 2z          2x, 0, 1/2 - 2z
  -x, y, 1/2 - z     2x, 0, 1/2 - 2z    0, 2y, 2z          0, 0, 0            2x, 2y, 1/2
  x, -y, -z          0, 2y, 2z          2x, 0, 1/2 - 2z    2x, 2y, 1/2        0, 0, 0

Vectors are given to within the sign changes allowed by the Patterson symmetry.

Solving Heavy-Atom Difference Pattersons

Before you can use a heavy-atom derivative for phasing, the positions of the heavy atoms in the cell must be found. If no phasing information exists, the only map that can be made at this point is a difference Patterson map. The Patterson map must be solved for the x, y, z positions that produce this pattern of vectors. Note that the solution to the Patterson function is not unique; many different sets of positions can explain the same Patterson. The different solutions fall into two categories, origin shifts and opposite-hand choices. Fortunately, any self-consistent solution will give phases that will produce the same protein map. The choice of origin is arbitrary, and it will not matter which is chosen. The choice of hand is important because only a right-handed solution is correct. However, for isomorphous derivatives an incorrect choice of hand will produce a left-handed protein map that otherwise is identical to the right-handed map (this is not true for anomalous data). This can easily be remedied by inverting the signs of all of the heavy-atom positions and recalculating the phases. If two derivatives are solved from their Patterson maps, there is no way to know whether they have the same origin and hand. To solve this problem, difference Fouriers are used (see the following) to put both derivatives in the same framework. If this is not done, then the derivatives cannot be combined.

There are two ways to solve Patterson maps. One involves visual inspection and the calculation of heavy-atom positions by hand. This is usually not as difficult as it might seem at first because symmetry produces many special relationships between Patterson peaks that make it easy to find the solution. The other method involves using the computer, which, when it works, is obviously the easier method. But there is no guarantee that the computer can find the correct solution. Before trying any method, evaluate the quality of the Patterson map to see if it is possible to solve it at all. Frequently, the first derivative has a Patterson that is not easily solved. It is sometimes more fruitful to keep looking for a derivative that produces an easily solved Patterson map that can be used to bootstrap the rest. In my experience, this simple strategy has almost always worked. Of course, if you have four protein molecules in the asymmetric unit, your chances of finding a single-site derivative are slim.
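The origin and hand ambiguity discussed in this section can be demonstrated directly: shifting or inverting a set of sites leaves the set of interatomic vectors unchanged. The sketch below uses two made-up sites in space group P1, where any origin shift is allowed (in space groups with symmetry only certain shifts are permitted); all names are hypothetical.

```python
from fractions import Fraction as Fr

def patterson_vectors(sites):
    """All interatomic vectors (modulo the unit cell) for a list of sites."""
    return sorted(
        tuple((a - b) % 1 for a, b in zip(site_a, site_b))
        for site_a in sites for site_b in sites
    )

sites = [(Fr(1, 10), Fr(1, 5), Fr(3, 10)), (Fr(2, 5), Fr(1, 4), Fr(7, 10))]
shifted = [tuple((c + Fr(1, 2)) % 1 for c in site) for site in sites]   # origin shift
inverted = [tuple((-c) % 1 for c in site) for site in sites]            # opposite hand

same_after_shift = patterson_vectors(sites) == patterson_vectors(shifted)
same_after_inversion = patterson_vectors(sites) == patterson_vectors(inverted)
print(same_after_shift, same_after_inversion)   # True True
```

Because the Patterson cannot distinguish these solutions, two derivatives solved independently have to be brought to a common origin and hand by other means.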
To improve your chances of solving a Patterson, either by hand or by computer, it helps enormously to make the best Patterson map you can. Because the differences are squared, Patterson maps are easily overwhelmed by a few large differences. Outlier differences should be filtered out as discussed earlier. A value of 100% is usually about right for this filter, since larger differences are often incorrect. Deleting differences smaller than 100% can lower the signal-to-noise ratio. If the derivative has a high merging R-value (above approximately 0.20), then a larger percentage may be needed.

The other filter to be set is resolution. Data at resolutions lower than about 25 Å are often clipped by the beam stop on most data collection systems. If the beam stop is not perfectly round, the amount of clipping could be different between data sets. Also, very-low-resolution reflections will have strong differences that are due to the change in contrast from native to derivative mother liquor, either because the soaking mother liquor is higher in precipitant or because the dissolved heavy atoms in the soaking mother liquor interfere. Often, leaving out such data will improve the Patterson map. As resolution increases, data become noisier because they are weaker, and often become less than perfectly isomorphous. This can give higher-resolution Patterson maps lower contrast. As a first approximation, 4-5 Å is a good upper-resolution limit.

The quality of the Patterson can be judged by looking at the peak-to-background ratio of the Patterson map. The background is taken to be the root-mean-square, or sigma (σ), of the entire map. Peak heights are then expressed as ratios of the peak height over the sigma of the map. If you use xcontur to contour your Patterson map, it automatically sets the first contour level to 1σ, the second to 2σ, and so on. Look first at the Harker sections for large peaks. If the Patterson is complex, then the Harker sections may be crowded and looking at general sections may be more fruitful. Try varying the resolution and outlier filter to produce the best contrast ratio. I cannot overemphasize the need to filter outliers and very-low-resolution data; otherwise interpretable Patterson maps may be overwhelmed by a few bad differences (see Fig. 3.10 for an example). The effect of changing the upper-resolution cutoff should be more subtle.
At some point the contrast will be maximal, but the basic features of the Patterson map should not change with resolution. If they do, this is a bad sign, indicating that the differences are not due to the isomorphous addition of heavy atoms.

Classical Methods

In addition to this explanation of how to solve Patterson maps by hand, examples are given in Chapter 5. In most cases the strategy is first to find a Harker peak or set of Harker peaks that gives a first site and then to use cross-peaks to find additional sites. Make a table of symmetry operators and find the positions of the Harker peaks as is illustrated in Table 3.2. Sometimes special relationships can be found between Harker peaks that make it easy to find Harker peaks that arise from the same site (see the Patterson examples in Chapter 5, Section 5.1). This helps in identifying non-Harker peaks that happen to fall on a Harker section. Related sets of cross-peaks can also sometimes be found, and relationships between Harker peaks and cross-peaks can often be identified, making it easy to find related sets of peaks. These relationships can be found with a little simple algebra using the symmetry operators of the space group.

For example, say you have a Patterson in space group P222. The three sections x = 0, y = 0, z = 0 are Harker sections and the peaks on these arise from the vectors (0, 2y, 2z), (2x, 0, 2z), and (2x, 2y, 0). You can look for single-site solutions that match between different Harker sections. In some cases a single site will explain all the Patterson peaks, and there are no significant non-Harker peaks. Congratulations, you are done. If not, look for a cross-peak. You can find a second site by taking the position of site 1 and adding the cross-peak. However, remember that you do not know from which symmetry-related atom in the unit cell this peak arises. Also you do not know in which direction the vector goes: from atom A to B or from B to A. Therefore, you must try all of these possibilities. In this space group there are four symmetry-related atoms: A1, A2, A3, A4. Given a cross-peak X we need to try A1 - X, A2 - X, A3 - X, A4 - X and also A1 + X, A2 + X, A3 + X, A4 + X to find atom B. We can confirm atom B by finding the Harker peaks predicted by atom B and also the other cross-peaks generated by symmetry-related atoms at sites B and A.

These vector calculations are most easily done by a computer. Xpatpred in XtalView can generate all the heavy-atom vectors given a list of sites. These can then be loaded into xcontur and compared against the Patterson map. Each of the possibilities can be tried and the results quickly scanned by looking at the agreement with the peaks in the Patterson map. You can maximize your chances of finding a match by starting with the highest Harker peaks and the highest cross-peak.
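The list of trial positions A1 ± X through A4 ± X for space group P222 can be generated as in the sketch below. It is not xpatpred; the site coordinates are invented, the cross-peak is constructed from a known second site so the result can be checked, and all names are hypothetical.

```python
from fractions import Fraction as Fr

# Symmetry operators of P222 written as per-axis signs (no translations).
P222_SIGNS = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]

def symmetry_mates(site):
    """The four symmetry-related copies A1..A4 of a site, modulo the cell."""
    return [tuple((s * c) % 1 for s, c in zip(signs, site)) for signs in P222_SIGNS]

def candidate_second_sites(site_a, cross_peak):
    """Try A_i - X and A_i + X for every symmetry mate A_i of site A."""
    candidates = set()
    for mate in symmetry_mates(site_a):
        for sign in (-1, 1):
            candidates.add(tuple((m + sign * x) % 1 for m, x in zip(mate, cross_peak)))
    return sorted(candidates)

site_a = (Fr(1, 10), Fr(1, 5), Fr(3, 10))
site_b = (Fr(2, 5), Fr(1, 4), Fr(7, 10))     # "unknown" site, used only to build X
cross_peak = tuple((b - a) % 1 for a, b in zip(site_a, site_b))

candidates = candidate_second_sites(site_a, cross_peak)
print(len(candidates), site_b in candidates)
```

Each candidate would then be checked against the map through its predicted Harker peaks and cross-peaks, starting, as suggested above, from the highest Harker peaks and the highest cross-peak.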
This may fail, though, because the highest peaks may be due to two or more peaks that happen to fall in the same place, and the extra height is fortuitous. Many examples of this can be seen in the Pattersons for the heavy atoms of C. vinosum cytochrome c', illustrated in Chapter 5. A noncrystallographic 2-fold causes many sites to be related, and the vectors fall in clumps. Also, it is possible that the cross-peak X is not between A and another atom, but between two other atoms. If you cannot solve a Patterson, put it aside and look for a derivative with one that is solvable. Later you can easily solve the difficult Patterson by cross-Fourier (see the following) with the solved derivative phases, so all is not lost.

Computer Methods

Several programs have been written that try to solve Patterson maps automatically. I know of three: HASSP, written by Tom Terwilliger; SHELXS, a commonly used small-molecule crystallography program; and XtalView/xhercules, a correlation search method I wrote. HASSP and SHELXS first look for single-site solutions that explain the Harker vectors and then look at cross-peaks and try to find pairs of positions with the best match to the density in the Patterson map. This is very similar to the classical method. One problem I have seen is that, since the programs do not always account adequately for the fact that a peak has already been used, overlapping solutions can be found. Both programs work well on clean Patterson maps.

The XtalView program xhercules uses a different approach to automatic Patterson solution that works well but is very computer-intensive. A single atom is moved around the entire asymmetric unit on a grid, and at each position a correlation is calculated between the observed differences and the calculated heavy-atom amplitudes. (The use of a correlation function, rather than an R-factor, is important because the scale factor cannot be computed correctly and the correlation is independent of scale.) This atom is then placed at the position with the highest correlation (Fig. 3.12). A second atom is then moved about the asymmetric unit and the correlation calculated with the first atom held fixed. This atom is then fixed at its highest correlation. The relative occupancies are then refined by another correlation search. A third atom can then be searched for in the same manner, and so forth. Each correlation search takes a large amount of computer time (several hours on a typical workstation), which gets longer as more atoms are added. An intelligent choice of the asymmetric unit helps reduce time. Remember that the asymmetric unit is one-half the size in reciprocal space. The search grid spacing should be at most one-fourth of the minimum resolution and preferably one-sixth. The resolution cutoff should be as low as 6 Å for large unit cells and as high as 4 Å for small unit cells. There must be a large ratio of differences to atom parameters (x, y, z), because the differences are only an approximation of the heavy-atom vectors. For most proteins at the resolutions suggested, this can be kept at about 50 to 1. Although the method can be used automatically, it is far better to check the results against the Patterson map. This can be done by writing out the solution, reading it into xpatpred, and displaying the output of predicted vectors on the Patterson map with xcontur. For best results, an idea of the relative occupancy of unfound sites is needed, although tests have shown that it is not critical. The relative occupancy can easily be estimated by inspection of the remaining density in the Patterson map.

FIG. 3.12 Xhercules correlation map section (z = 0.000) of a single-site correlation search for a platinum derivative of photoactive yellow protein. The space group is $P6_3$, hexagonal with a $6_3$ screw axis along the z direction. The single site is found at 0.25, 0.08, 0.0 (z is arbitrary because there is no orthogonal symmetry element). The second peak is related to the first by Patterson symmetry.

How successful is the method? It seems very robust. A six-site solution to a complex Patterson with many overlapping vectors was correctly found for a PtCl4 derivative of C. vinosum cytochrome c' in $P2_12_12_1$ (three Harker planes at x = 0.5, y = 0.5, z = 0.5). The solution was found before any other derivative and was later confirmed independently by cross-phasing from another derivative whose Patterson was solvable by inspection. The platinum derivative evaded a manual solution because the largest peaks on the Harker sections turned out to be due to a mixture of cross-peaks and Harker peaks.

Another successful approach to heavy-atom position determination entails direct methods. The programs used by small-molecule crystallographers, MULTAN and SHELXS, have served this purpose, using the derivative differences as input.
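The idea of a scale-independent correlation search can be illustrated with a one-dimensional toy problem. This sketch is not the xhercules algorithm itself: it assumes a centrosymmetric 1D cell (an atom at x and its mate at -x), noise-free "observed" amplitudes on an arbitrary scale, and hypothetical function names.

```python
import math

def amplitudes(x, n_reflections=12, f_heavy=1.0):
    """|F(h)| for one atom at x plus its inversion mate at -x (1D toy cell)."""
    return [abs(2.0 * f_heavy * math.cos(2.0 * math.pi * h * x))
            for h in range(1, n_reflections + 1)]

def correlation(a, b):
    """Pearson correlation; returns 0.0 when either list has no variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

true_x = 0.18
observed = [3.7 * f for f in amplitudes(true_x)]   # unknown, arbitrary scale factor

# Search the asymmetric unit 0 <= x <= 0.25 on a grid of 0.01.
grid = [i / 100 for i in range(26)]
scores = [correlation(observed, amplitudes(x)) for x in grid]
best_x = grid[scores.index(max(scores))]
print(best_x)
```

The search recovers the site without knowing the scale factor, which is the property that motivates using a correlation rather than an R-factor. Real differences are noisy and only approximate the heavy-atom amplitudes, so real correlation maxima are far lower than in this toy case.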

Anomalous Difference Pattersons

If there are atoms in the structure that have an anomalous scattering component (i.e., have an absorption edge near the wavelength being used to collect data), then the differences between the Bijvoet pairs may be large enough for phasing. For proteins, the most likely naturally occurring anomalous scatterer is iron, and many of the heavy-atom compounds used, such as mercury, platinum, uranium, and the lanthanides, have significant anomalous scattering signals. Centric reflections do not have an anomalous component because the signal exactly cancels out in a centric projection. If you have collected symmetry mates of centric reflections, they can be used to estimate the anomalous signal by comparing them to the acentric reflections. The difference between the centrics is the "noise," and the "signal" can be found using the formula (noise)$^2$ + (signal)$^2$ = (differences)$^2$. In practical terms this formula tells us that the signal will be larger than simply the difference between the centrics and the acentrics. The anomalous signal, in fact, can be detected even in noisy data because the pairs are usually collected from the same crystal near in time, which eliminates many of the scaling problems. Thus, random noise is a less serious problem than systematic errors, such as those due to absorption and X-ray damage. The expected anomalous difference $(F^+ - F^-)/\langle F \rangle$ can be estimated using the same equation as for the isomorphous case (see preceding) except that $f_H$ is replaced by $2f''$. Some expected anomalous differences are listed in Table 3.3 for atoms that give usable signals for CuKα radiation. If you can tune the wavelength as at a synchrotron, the signal can be optimized, and edges for more elements become available.

TABLE 3.3 Anomalous Scattering Signals for CuKα (1.54 Å) Radiation

                      Percentage                   Percentage ΔF/F
                      F_PH - F_P                   10-kDa     32-kDa     100-kDa
  Element  Electrons  (30-kDa protein)   Δf''      protein    protein    protein
  S        16         5.1                0.6       0.46       0.26       0.14
  Fe       26         8.3                3.2       2.46       1.38       0.77
  Pd       46         15                 3.9       3          1.68       0.94
  Ag       47         15                 4.3       3.31       1.85       1.03
  I        53         17                 6.8       5.24       2.92       1.63
  Sm       62         20                 12.3      9.47       5.29       2.95
  Gd       64         20                 11.9      9.16       5.12       2.86
  Pt       78         25                 6.9       5.31       2.97       1.66
  Au       79         25                 7.3       5.62       3.14       1.75
  Hg       80         25                 7.7       5.93       3.31       1.85
  Pb       82         26                 8.5       6.54       3.65       2.04
  U        92         29                 13.4      10.32      5.76       3.22

Before an anomalous difference Patterson can be calculated, the data should be nearly complete to the resolution you wish to use. This means complete in terms of Bijvoet pairs, which means collecting twice as many data in the correct positions. An anomalous difference Patterson (also known as a Bijvoet Patterson) is made using the differences between the acentric reflections as the Patterson coefficients (Fig. 3.13). Care must be taken not to use the centrics. Since in XtalView the difference between centrics will be 0, even if the centrics are left in, they will make no contribution to the differences. Interpret an anomalous difference Patterson exactly as you would an isomorphous difference Patterson. Because the centrics are left out, however, even a perfect set of anomalous differences will give series termination errors that can lead to small peaks not due to scatterers. These can be detected by making a calculated Patterson using the same reflections list as the observed Patterson and coefficients calculated from the heavy-atom positions (with XtalView you can use STFACT for this purpose). If a peak appears that is not in the atom list used for the calculation, it must be a series termination error. In fact, the lower the resolution, the higher the percentage of centric reflections, so very-low-resolution anomalous difference Pattersons have more noise due to series termination.

One use of anomalous differences in heavy-atom work is to compare them to the isomorphous difference Patterson. The anomalous differences represent an independent measure of the heavy-atom positions and, as such, comparison of the two Pattersons gives extra confidence in determining the heavy-atom positions. A peak on both Pattersons is more likely to be correct. It is also possible to make a Patterson by combining the information from both sets of differences. Before doing this, it is worthwhile first to check that the anomalous Patterson actually has some signal. Adding noise to the isomorphous Patterson will not make it more interpretable. There are two methods used to combine the signals. The simplest is to make a Patterson with the coefficients $\Delta F_{iso}^2 + \Delta F_{ano}^2$. The other uses $F_{HLE}$ coefficients (footnote 6), which gives slightly higher peaks than the first. In either case, the improvement of the Patterson is better judged by noting the number of contours a peak of interest is above the root-mean-square density (or sigma) of the Patterson map. If the main peaks do not increase, it is possible that the anomalous signal is too small to be of use. If the anomalous data are too incomplete to give a Patterson map alone, it is still possible to use combined coefficients to augment the isomorphous data. Again, use the criterion of peak height to judge the effectiveness. Another criterion is the flatness of areas without peaks. They should become cleaner if the Patterson has been improved.

6 For a discussion and derivation of $F_{HLE}$ coefficients, see Blundell and Johnson, pp. 338-340.
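The anomalous entries of Table 3.3 can be approximated from the acentric estimate with $f_H$ replaced by $2f''$. The sketch below assumes the unrounded constant $\sqrt{13.4}/6.7$ divided by $\sqrt{2}$ for the acentric coefficient and a single fully occupied site; the function name is hypothetical.

```python
import math

# Constants from the text: 6.7 electrons and 13.4 Da per average protein atom.
CENTRIC_COEFFICIENT = math.sqrt(13.4) / 6.7                   # about 0.55 (Eq. 3.12)
ACENTRIC_COEFFICIENT = CENTRIC_COEFFICIENT / math.sqrt(2.0)   # about 0.39 (Eq. 3.13)

def expected_anomalous_percent(f_double_prime, mol_weight, n_sites=1):
    """Expected Bijvoet difference in percent, with f_H replaced by 2 f''."""
    signal = ACENTRIC_COEFFICIENT * 2.0 * f_double_prime * math.sqrt(n_sites)
    return 100.0 * signal / math.sqrt(mol_weight)

for element, f_dp in [("Fe", 3.2), ("Hg", 7.7), ("U", 13.4)]:
    row = [expected_anomalous_percent(f_dp, mw) for mw in (10_000, 32_000, 100_000)]
    print(element, [round(value, 2) for value in row])
```

The results should agree with the tabulated percentages to within about a tenth of a percentage point; the small differences presumably reflect rounding of the coefficient.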

FIG. 3.13 Sulfite reductase Bijvoet difference Patterson. The anomalous scattering of sulfite reductase is due to the presence of an Fe4S4 cluster and a heme iron. At low resolution the individual scatterers are not resolvable and form a single large site. The space group is $P2_12_12_1$ and the coefficients are $(F^+ - F^-)^2$. (A) to (C) Harker sections x = 0.5, y = 0.5, and z = 0.5. Note that the peak on the section x = 0.5 overlaps onto the other Harker sections. At higher resolution the peaks are resolvable from the edges. (D) A three-dimensional stereo view of the Patterson, showing that there are just three peaks, not including the origin peak at 0, 0, 0.

3.5 FOURIER TECHNIQUES

It is a happy day in any structure determination when Fourier maps, which are very straightforward to interpret, can be used instead of Pattersons. On the other hand, a Fourier requires phases, and the quality of the phases very much determines the quality of the resulting map. For Pattersons, the quality is dependent only upon the accuracy of the amplitudes used. In fact, it has been shown that while a Fourier map made with random amplitudes but correct phases is easily interpreted (Fig. 3.14), the opposite case, correct amplitudes and random phases, is not (which is the root of all this phasing trouble). This does not mean that the amplitudes can be ignored when there are phases. In crystallography we work in the gray area between the two extremes of correct and random phases. In this case, correct coefficients (e.g., $2F_o - F_c$) do make an important difference in the quality of the resulting map. Also, do not take this to mean that the amplitudes do not need to be accurate. Without accurate amplitudes there is no way to derive accurate phases.
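The relative importance of phases and amplitudes can be explored numerically in one dimension. The sketch below is a toy under stated assumptions: five point atoms at made-up grid positions, a seeded random generator, and a plain correlation coefficient against the true density as the measure of map quality. None of the names come from XtalView.

```python
import cmath
import math
import random

N = 256
density = [0.0] * N
for position in (17, 60, 101, 150, 222):       # five point atoms (toy positions)
    density[position] = 1.0

def structure_factor(rho, h):
    n = len(rho)
    return sum(value * cmath.exp(-2j * math.pi * h * x / n)
               for x, value in enumerate(rho) if value)

def synthesize(terms, n):
    """Real-space map from {h: F(h)} for 0 < h < n/2, adding the Friedel mates."""
    return [2.0 * sum((f * cmath.exp(2j * math.pi * h * x / n)).real
                      for h, f in terms.items()) / n
            for x in range(n)]

def correlation(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    norm = math.sqrt(sum((p - mean_a) ** 2 for p in a)
                     * sum((q - mean_b) ** 2 for q in b))
    return cov / norm

rng = random.Random(0)
true_terms = {h: structure_factor(density, h) for h in range(1, N // 2)}
# Correct phases, random amplitudes.
random_amplitudes = {h: cmath.rect(rng.random(), cmath.phase(f))
                     for h, f in true_terms.items()}
# Correct amplitudes, random phases.
random_phases = {h: cmath.rect(abs(f), rng.uniform(-math.pi, math.pi))
                 for h, f in true_terms.items()}

cc_correct_phases = correlation(density, synthesize(random_amplitudes, N))
cc_correct_amplitudes = correlation(density, synthesize(random_phases, N))
print(round(cc_correct_phases, 2), round(cc_correct_amplitudes, 2))
```

The map built from correct phases should correlate strongly with the true density, while the map built from correct amplitudes and random phases should correlate only weakly, if at all.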

Types of Fourier

Fo Map

The classic Fourier synthesis combines the observed amplitudes with the most current phases, $F_o$, $\alpha_{calc}$, where $F_o$ is the observed diffraction amplitude. It is not sufficient for all crystallographic needs and there are many other types. In particular, when the phases are calculated, this type of map is subject to model bias.

Fc Map

This is the least useful map for crystallographic purposes: the calculated amplitudes are phased with the calculated phases, and you get back exactly what you put in. Still, it can be used for checking what a map should look like, especially at lower resolutions, and to check for series termination problems. For instance, even an $F_c$ map in the resolution range 5-3 Å can be choppy and hard to interpret because of the missing low-resolution terms. An $F_c$ map can also be used to check if programs are working correctly. If the $F_c$ map does not look like what you put in or lacks the correct symmetry, then there is an error somewhere.

Fo - Fc, or Difference, Map The difference Fourier, F,, - Fc, O~calc, where ~calc is the calculated phase, is very useful in terms of information content but may be hard to interpret. In this map there are peaks where density is not accounted for in the model used to calculate Fc and holes where there is too much density in the model. This map is especially useful for finding corrections to the current model: for example, looking for missing waters, finding movements in mutants, misfittings. Other types of differences map are often used. An isomorphous difference m a p with F p H -- F p , apr,,tci,1, gives the positions of heavy atoms. Another difference map is F ....t~,lt -- Fwila-typc, which can be used for looking at mutant protein structures (if the mutation crystallizes with the same unit cell). Note that a difference Fourier, F .....ta,, I Fwild_typc ' tSgwild_typc' is not the same as the difference between two Fouriers, (Fmutant , c[ ..... tant) -- (Fwild-typc, 15[wild-type), which is equivalent to the difference between two electron density maps. There are three basic patterns of density found in a difference map. A peak indicates electron density in the F,, terms that is not accounted for in the F~ terms. A negative peak indicates a position having less density in the Fo terms than in the F~ terms. These differences can arise from movement of

FIG. 3.14 (A) Fouriers with random phases or amplitudes and (B) the equivalent section of a Fourier synthesis with random phases. The thick lines represent the model that was used to calculate the correct amplitude and phases. Note that the map with random amplitudes is still interpretable, but the random phase map is uninterpretable. This apparently means that phases are more important than amplitudes. However, since phases are not directly measurable and must be determined from the measured amplitudes, accurate amplitudes are necessary to determine the phases accurately.

•

i

,

f

I

140

COMPUTATIONAL TECHNIQUES

atoms, changes in B-values, or a change in occupancy. A third pattern is positive density paired with negative density. This indicates a shift in position from the negative to the positive density. The final position of the atom or group may be difficult to determine from the difference density. Its actual position is somewhat short of the positive peak because the negative hole next to the positive peak distorts its shape. 2Fo - Fc Map The 2Fo - Fc map is the sum of an Fo map plus an Fo - Fc map (Fig. 3.15). It contains information from both the classic Fourier synthesis and a difference map and is easy to interpret because it looks like protein density. The quality of the 2Fo - Fc map depends upon the quality of the phases--a fact that often seems forgotten in the literature, where such maps

FIG. 3.15 Fourier maps: the same section of map is shown using (A) Fo coefficients, (B) Fo - Fc, and (C) 2Fo - Fc. The Fo map does not contain any information that was not already in the model. The difference map shows some unexplained density in the solvent region that is taken to represent disordered solvent molecules. The 2Fo - Fc map is equivalent to adding the two maps together and shows the unexplained solvent density as well as the model density. This property makes it the most popular map type.


are often used as "proof" of the correctness of a structure. At R-values above 0.25, they are of dubious value, since they will look remarkably like whatever model was used to calculate the phases. In these cases a better map to use is the omit map, where the model in question is left out of the phase calculation (see below). Another problem with 2Fo - Fc maps is that they do not show the exact final position of the model in positions where there are errors, although they can come quite close. The 2Fo - Fc map shows where the model is and where the model should be. Usually the final position is farther away than the map shows, but it is usually close enough to bring the model to within the radius of convergence for a refinement program. Variations on the 2Fo - Fc map, such as 3Fo - 2Fc (or 0.5Fo + (Fo - Fc)) and 5Fo - 3Fc, are sometimes used and are claimed to alleviate the problem of phase bias.

Sigma-A-Weighted 2mFo - DFc Map

A 2Fo - Fc map weighted by two terms, m and D, derived from a sigma-A analysis of the data gives the coefficients 2mFo - DFc, αcalc, where m and D are between 0 and 1. In essence the idea is to weight the terms by taking into account the difference between Fo and Fc. The analysis is done in thin shells of resolution, and the shells with lower agreement, or higher R-factor, are downweighted. Since, in general, the R-factor rises with resolution, this convention has the effect of downweighting the higher-resolution terms with higher errors and removing noise from the maps. It also tends to remove phase bias because phase bias is worse for high R-factor data, as noted in the previous section. As the R-factor of the data set approaches 0.0, the maps become 2Fo - Fc maps in the limit. 2mFo - DFc maps are probably the best for general fitting. To a large extent they obviate the need to find the best resolution at which to make maps: if the high-resolution data have a high R-factor, they are automatically downweighted.

Omit Map

In the omit map the portion of the model to be examined is left out of the phase calculation altogether, and the rest of the model is used to phase this portion of the map (Fig. 3.16). This powerful feature of a Fourier is possible because all parts of the model contribute to every reflection. The chief difficulty in an omit map is using the correct scale factor to scale Fo to Fc. The sum of Fc will be smaller than that of Fo if a portion of the model is omitted. The correct procedure is to determine the proper scale before anything has been omitted and keep this scale factor after the model has been omitted. Only about 10% of the model should be left out at any given time, so it is necessary to make many omit maps to examine the entire structure. Even omit maps can have some residual phase bias, as will be explained in the section "Phase Bias."

FIG. 3.16 Omit map. (A) The highlighted and labeled model in the center of the helix was omitted from the phase calculation and then an Fo, αcalc map was made. Note that the density for the omitted atoms comes back because of their presence in the amplitude information. (B) The model with 25% of the residues omitted. The density for the omitted residues, although noisy, still comes back. The maps are at 1.8-Å resolution and contoured at 1σ.

Figure-of-Merit Weighted Fo Map

A figure-of-merit weighted Fo map is used with MIR phases to compensate for the error in those phases. The coefficients are Fo times the figure of merit. This map can be thought of as having each reflection weighted by the confidence in its correctness.
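The map types described in this section differ only in the amplitude coefficient that is paired with each phase. The following Python sketch gathers those coefficients in one place for comparison. The function name `map_coefficient` and its string labels are hypothetical and are not part of XtalView; the sketch assumes that Fo and Fc are already on a common scale.

```python
def map_coefficient(kind, fo, fc=0.0, m=1.0, d=1.0, fom=1.0):
    """Return the amplitude coefficient of one reflection for a given map type.

    kind   : "fo", "fo-fc", "2fo-fc", "2mfo-dfc", or "fo*fom" (labels are illustrative)
    fo, fc : observed and calculated amplitudes, assumed to be on the same scale
    m, d   : sigma-A weights, each between 0 and 1
    fom    : figure of merit, between 0 and 1
    """
    if kind == "fo":
        return fo
    if kind == "fo-fc":
        # Positive where the model lacks density, negative where it has too much.
        return fo - fc
    if kind == "2fo-fc":
        # Sum of an Fo map and an Fo - Fc map.
        return 2.0 * fo - fc
    if kind == "2mfo-dfc":
        return 2.0 * m * fo - d * fc
    if kind == "fo*fom":
        return fo * fom
    raise ValueError("unknown map type: " + kind)


if __name__ == "__main__":
    fo, fc = 100.0, 80.0
    for kind in ("fo", "fo-fc", "2fo-fc", "2mfo-dfc", "fo*fom"):
        print(kind, map_coefficient(kind, fo, fc, m=0.9, d=0.8, fom=0.6))
```

With m = D = 1 the sigma-A-weighted coefficient reduces to the plain 2Fo - Fc coefficient, which matches the limiting behavior described above.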

Fast Fourier Transform

The fast Fourier transform (FFT) is what its name implies: a quick Fourier synthesis algorithm. The FFT works by dividing the unit cell along the three principal directions into an integer number of grid points, nx, ny, nz. Depending upon the particular FFT used, these must be multiples of small prime numbers. The FFT used in XtalView can use multiples of 2, 3, 5, and 7. This gives a wide range of possible integer values. The coarser the grid, the faster the FFT can be calculated and the faster the resulting map can be displayed, but the grid cannot be too coarse. The grid must be at least twice the maximum h, k, l values in the input structure factors. This can be calculated from the formula nx = 2(a/dmin), where a is the cell edge and dmin is the resolution limit of the input structure factors, both in angstroms. If the grid is coarser than this, the FFT will return incorrect values because the cell will be undersampled. At a sampling of two times, a map will be coarse, and a better sampling is 3(a/dmin), which gives a smoother map that is easier to interpret. In cases where one cell edge is substantially longer than the other two, this edge can be sampled at two times and the others at three times. Such a map is hard to distinguish from one that is sampled finely in all three directions. Sampling on grids finer than three times will be slower and is usually not necessary. If no interactive inspection of the map is contemplated, oversampling is sometimes used to make smoother contour lines for displaying a static picture.
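The grid rule above can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, and it assumes that "multiples of 2, 3, 5, and 7" means the grid count may contain no other prime factors. The cell edges are those shown for the example crystal in Fig. 3.17, and the 2.0-Å resolution limit is an assumed value.

```python
import math


def has_only_small_factors(n, primes=(2, 3, 5, 7)):
    """True if n (>= 1) has no prime factors other than those listed."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def fft_grid(cell_edge, d_min, sampling=3.0):
    """Smallest allowed grid count along one cell edge.

    cell_edge, d_min : cell edge and resolution limit, both in angstroms
    sampling         : 2.0 is the minimum that avoids undersampling; 3.0 gives
                       the smoother map recommended in the text
    """
    if cell_edge <= 0 or d_min <= 0:
        raise ValueError("cell_edge and d_min must be positive")
    if sampling < 2.0:
        raise ValueError("sampling below 2.0 undersamples the cell")
    n = max(1, math.ceil(sampling * cell_edge / d_min))
    # Step up to the next count that the FFT can factor.
    while not has_only_small_factors(n):
        n += 1
    return n


if __name__ == "__main__":
    cell = (49.20, 56.70, 98.80)  # angstroms
    d_min = 2.0                   # assumed resolution limit
    print([fft_grid(edge, d_min) for edge in cell])
    # Long edge sampled at two times, the others at three times:
    print(fft_grid(cell[0], d_min), fft_grid(cell[1], d_min),
          fft_grid(cell[2], d_min, sampling=2.0))
```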


Solving Heavy Atoms with Fouriers

Once even a single derivative has been solved, the single isomorphous replacement (SIR) phases are usually of sufficient quality to solve the rest of the derivatives by difference Fourier using the coefficients FPH - FP, αSIR. With XtalView, a derivative difference Fourier, also called a cross-Fourier, is made by the following steps. First, xmerge is used to merge the derivative data with the native data, as previously described for difference Pattersons, to produce a file with the derivative scaled to the native. This file is then phased using xmergephs to merge the .fin file coefficients with the best available protein phases. The option to switch FP and FPH should be checked so that when the Fourier is made the peaks are positive. This new phase file with h, k, l, FPH, FP, αprotein is then run through xfft with the Fo - Fc option, which in this case will make an FPH - FP map. Find the biggest peaks on the map using xcontur. The resulting coordinates are the correct coordinates on the same origin as the derivative(s) used to calculate the phases. The peaks obtained this way should be checked against the Patterson map of the derivative to be sure that the heavy-atom positions are valid.

The most common spurious peaks are ghost peaks, those present at the position of the heavy-atom model for the derivative used to produce the phases. Less obvious is the possibility of ghost peaks at the opposite hand position. This can happen if the solution is on a special position, if it is centrosymmetric, or if it is pseudocentrosymmetric. Particularly confusing is a centrosymmetric heavy-atom solution that gives rise to centric phases in an otherwise acentric space group (footnote 7). Maps made with these phases will contain both left-handed and right-handed solutions to the new derivative superimposed. The map may still be useful if this can be resolved. An easy way to check for this problem is to look at the distribution of phases for acentric reflections. If they fall on the cardinal points, 0, 90, 180, and 270°, then the phases are centric; if they fall near but not exactly on them, the phases are pseudocentric.
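The phase-distribution check just described can be sketched as follows. The function names and the two tolerances are assumptions chosen for illustration; they are not published cutoffs, and in practice a histogram of the acentric phases would be inspected by eye.

```python
def cardinal_deviation(phase_deg):
    """Angular distance in degrees from the nearest of 0, 90, 180, 270."""
    return abs((phase_deg + 45.0) % 90.0 - 45.0)


def classify_phase_set(acentric_phases_deg, exact_tol=0.01, near_tol=10.0):
    """Label the phases of the acentric reflections.

    exact_tol and near_tol (degrees) are illustrative assumptions.
    Randomly distributed phases have a mean deviation of about 22.5 degrees.
    """
    if not acentric_phases_deg:
        raise ValueError("need at least one phase")
    total = sum(cardinal_deviation(p) for p in acentric_phases_deg)
    mean_dev = total / len(acentric_phases_deg)
    if mean_dev <= exact_tol:
        return "centric"
    if mean_dev <= near_tol:
        return "pseudocentric"
    return "general"


if __name__ == "__main__":
    print(classify_phase_set([0, 90, 180, 270, 90]))
    print(classify_phase_set([3, 88, 182, 268]))
    print(classify_phase_set([10, 40, 130, 200, 300, 75]))
```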

Footnote 7. A single site in a polar space group is centrosymmetric, as is a multisite solution where all the coordinates in the polar direction are the same.

3.6 ISOMORPHOUS REPLACEMENT PHASING

Heavy-Atom Refinement

When you have solved one or more derivative data sets for heavy-atom positions, your next step is to refine the positions and search for minor sites and/or missing sites. The goal behind refinement is twofold: to improve the heavy-atom parameters and to get statistics that give information about the quality of the derivative and the highest resolution at which it can be used. Some guidelines about statistics will be given, but it must be remembered that they are only guidelines and that two heavy-atom derivatives with identical statistics can produce maps that differ in quality. To calculate the most accurate phases, heavy-atom parameters are refined to improve the parameters and to get the best estimates of the errors. An accurate estimate of the errors is essential for good multiple isomorphous replacement phasing, since the errors control the figures of merit and the relative weighting of each reflection.

In XtalView, the heavy-atom refinement is done using xheavy (Fig. 3.17). Xheavy departs from the more traditional refinement programs by using a correlation search instead of least-squares refinement. The advantage is that it avoids local minima; the disadvantage is that it takes more computer time. With the faster computers available today, this is not a significant problem. The phases are calculated in two steps. First, each derivative is treated separately, and estimates of the errors and phases are made without protein phasing information. Second, these phases are combined and used to produce a more accurate set of phases.

The correlation search in xheavy is done by moving the atom in a coarse box over a large range and then in progressively finer boxes and over a smaller range until no improvement is found. At each point the correlation is made between the observed differences and the calculated differences. The correlation used is

$$\frac{\sum \Delta_o \Delta_c}{\sqrt{\sum \Delta_o^2 \times \sum \Delta_c^2}}, \tag{3.14}$$

where Δo is the observed heavy-atom difference and Δc is the calculated difference. This function has the advantage of being immune to the scale between the calculated and observed differences (footnote 8). Each atom is moved and then the occupancy and B-value are refined by a second correlation search.
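As a small numerical illustration of Eq. 3.14 and its scale independence, the Python sketch below computes the correlation for a made-up set of differences and then repeats the calculation with the calculated differences multiplied by an arbitrary scale. The function name and the numbers are hypothetical; this is not xheavy's code.

```python
import math


def difference_correlation(observed, calculated):
    """Eq. 3.14: sum(do*dc) / sqrt(sum(do^2) * sum(dc^2))."""
    if len(observed) != len(calculated):
        raise ValueError("lists must have the same length")
    numerator = sum(o * c for o, c in zip(observed, calculated))
    denominator = math.sqrt(sum(o * o for o in observed) *
                            sum(c * c for c in calculated))
    # With no signal in either list the correlation is undefined; report 0.
    return numerator / denominator if denominator > 0.0 else 0.0


if __name__ == "__main__":
    observed = [12.0, 5.0, 30.0, 8.0]     # invented observed differences
    calculated = [10.0, 6.0, 25.0, 9.0]   # invented calculated differences
    rescaled = [3.7 * c for c in calculated]
    print(difference_correlation(observed, calculated))
    print(difference_correlation(observed, rescaled))  # same value, any positive scale
```

Because the scale cancels, the search can compare trial positions without first refining a scale factor between the observed and calculated differences.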

Footnote 8. The scale factor cancels out. Say we have two quantities F1 and F2 related by the scale factor s such that ΣF1 = sΣF2; then the correlation is

$$\frac{\sum F_1 \, sF_2}{\sqrt{\sum F_1^2 \sum (sF_2)^2}} = \frac{s \sum F_1 F_2}{s \sqrt{\sum F_1^2 \sum F_2^2}}$$

and the scale factor s cancels out.

Footnote 9. Be sure to read Watenpaugh, K. D. (1985). "Overview of Isomorphous Replacement Phasing," in Methods in Enzymology, Vol. 115, pp. 3-15. Academic Press, San Diego. Also read Bella, J., and Rossmann, M. G. (1998). A general phasing algorithm for multiple MAD and MIR data. Acta Crystallogr. D54, 159-174.

Isomorphous Phasing (footnote 9)

Because of errors in the data and errors in the solutions and because true isomorphism is probably rare, isomorphous phasing must be done in terms of probabilities. What we know are the amplitudes |FP| and |FPH|, and the vector FH, which is calculated from the heavy-atom model. What we want to find out is the best phase of the vector FP, given the errors in the data. The chief difficulty in this is estimating the errors in the data correctly. Blow and Crick assumed that all of the error lies in the magnitude of FPH, from which follows the equation (footnote 10)

$$P(\alpha) = \exp\left(-\frac{\epsilon^2(\alpha)}{2E^2}\right), \tag{3.15}$$

where E is the estimate of the error and ε(α) is the lack-of-closure error at a given value of α, the phase angle, given by the expression

$$\epsilon(\alpha) = |F_{PH}| - |\mathbf{F}_P + \mathbf{F}_H|, \tag{3.16}$$

which is the difference at a given phase angle between the measured |FPH| and the amplitude of the sum of the vectors FH and FP. The error estimate E is given by the equation

$$\langle E^2 \rangle = \langle (|F_{PH} - F_P| - |F_H|)^2 \rangle, \tag{3.17}$$

which is the difference between the calculated heavy-atom vector and the observed heavy-atom vector. If the reflection is centric, then it is possible to estimate the error by simply assuming that the observed difference FPH - FP is the observed heavy-atom amplitude. For the acentric case, the observed difference FPH - FP will be, on average, smaller than the heavy-atom amplitude FH by a factor of 1/√2 (see above). Thus, E can be estimated by

$$E^2 = \sum_{\text{centrics}} \left(|F_{PH} - F_P| - |F_H|\right)^2 + \sum_{\text{acentrics}} \left(\sqrt{2}\,|F_{PH} - F_P| - |F_H|\right)^2. \tag{3.18}$$

Footnote 10. Blow, D. M., and Crick, F. H. C. (1959). Acta Crystallogr. 12, 794-802.

In the general acentric case, there will be two maxima in Eq. 3.15 that are equally probable. The best phase is the weighted average of the two. In the centric case, the equations give one peak. To resolve the twofold ambiguity of acentric reflections and to overcome the errors, several derivatives are used. For multiple isomorphous replacement, we repeat this process for each derivative in turn and multiply the phase probabilities. To simplify this process, we can use the Hendrickson-Lattman coefficients A, B, C, D to store each phase probability:

$$P(\alpha) = \exp(A\cos\alpha + B\sin\alpha + C\cos 2\alpha + D\sin 2\alpha). \tag{3.19}$$

This equation allows us to reconstruct the probability from the four coefficients and is a more compact form to store. To multiply two probabilities together, the coefficients are simply added (i.e., A_MIR = A_DER1 + A_DER2 + A_DER3 + ...). Equivalent equations for the Blow-Crick equations above have been derived that give nearly identical results (footnote 11). The equations for isomorphous replacement phasing are

$$
\begin{aligned}
A_{iso} &= \frac{-2(F_P^2 + F_H^2 - F_{PH}^2)\,F_P\,a_H}{E^2} \\
B_{iso} &= \frac{-2(F_P^2 + F_H^2 - F_{PH}^2)\,F_P\,b_H}{E^2} \\
C_{iso} &= \frac{-F_P^2\,(a_H^2 - b_H^2)}{E^2} \\
D_{iso} &= \frac{-2F_P^2\,a_H b_H}{E^2}
\end{aligned}
\tag{3.20}
$$

where aH = FH cos(αH) and bH = FH sin(αH), with αH as the calculated heavy-atom phase.

The best phase can then be found from the following procedure. The phase probability is calculated from the combined A_MIR, B_MIR, C_MIR, D_MIR for every 15° using Eq. 3.19 to generate the phase probability distribution. The phase is then calculated by integrating the probability distribution:

$$x = \frac{\sum_{\alpha=0}^{2\pi} P(\alpha)\cos\alpha}{\sum_{\alpha=0}^{2\pi} P(\alpha)}, \qquad y = \frac{\sum_{\alpha=0}^{2\pi} P(\alpha)\sin\alpha}{\sum_{\alpha=0}^{2\pi} P(\alpha)}, \qquad \alpha_{best} = \tan^{-1}(y/x), \tag{3.21}$$

and the figure of merit is given by

$$m = \sqrt{x^2 + y^2}. \tag{3.22}$$
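The procedure of Eqs. 3.19, 3.21, and 3.22 can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the code used in xheavy: the function names are hypothetical, the probability is normalized so that m cannot exceed 1, the largest exponent is subtracted before exponentiation to avoid overflow, and atan2 is used in place of a bare arctangent so that the phase falls in the correct quadrant.

```python
import math


def combine(*derivatives):
    """Multiply phase probabilities by adding (A, B, C, D) coefficient sets."""
    return tuple(sum(terms) for terms in zip(*derivatives))


def phase_probability(coeffs, step_deg=15):
    """Sample Eq. 3.19 every step_deg degrees; returns (angle, probability) pairs."""
    a, b, c, d = coeffs
    angles = list(range(0, 360, step_deg))
    exponents = []
    for angle in angles:
        t = math.radians(angle)
        exponents.append(a * math.cos(t) + b * math.sin(t) +
                         c * math.cos(2 * t) + d * math.sin(2 * t))
    top = max(exponents)  # subtracting the maximum leaves the normalized result unchanged
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [(angle, w / total) for angle, w in zip(angles, weights)]


def best_phase_and_fom(coeffs, step_deg=15):
    """Centroid phase in degrees (Eq. 3.21) and figure of merit (Eq. 3.22)."""
    dist = phase_probability(coeffs, step_deg)
    x = sum(p * math.cos(math.radians(angle)) for angle, p in dist)
    y = sum(p * math.sin(math.radians(angle)) for angle, p in dist)
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)


if __name__ == "__main__":
    der1 = (2.0, 0.0, 0.0, 0.0)   # invented coefficients for two derivatives
    der2 = (2.0, 0.0, 0.0, 0.0)
    print(best_phase_and_fom(der1))
    print(best_phase_and_fom(combine(der1, der2)))  # sharper distribution, higher m
```

When all four coefficients are zero every phase is equally probable and m is essentially zero; a strongly peaked distribution gives m close to 1.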

The figure of merit, m, is the probability of αbest being correct. It ranges from a value of 0.0, where all phases are equally probable, to 1.0, where the phase is correct. The average figure of merit over all reflections gives us an estimate of the accuracy of the protein phase set.

Footnote 11. Hendrickson, W. A., and Lattman, E. E. (1970). Acta Crystallogr. B26, 136.

Xheavy makes two passes through the phasing process in order to get a better estimate of E for each derivative. The initial set of protein phases uses centric data, if available, to calculate the initial estimate of E, the error, which is calculated as a function of the size of FP by fitting a curve to the data. The initial Hendrickson-Lattman coefficients A, B, C, D are calculated with this estimate. If there is more than one derivative, the coefficients of each derivative are summed and the protein phase and figure of merit are calculated. In the next pass, all the data are used to calculate E, using the initial estimate of the protein phases. The new coefficients with the updated E-values are again summed together and a final protein phase and figure of merit are calculated.

When calculating a map with isomorphous replacement data, it is important to calculate a map weighted by the figure of merit for each reflection. This is done by choosing the Fo*fom option in xfft. The effect of using figure-of-merit weighting is shown in Fig. 3.18. The average figure of merit for the entire map gives an idea of the quality of the map. Figures of merit from different programs are not directly comparable unless they use comparable methods of calculating E. The error estimate is used in the denominator of the phasing equation and directly affects the figures of merit calculated. Also, phase sets run through solvent-flattening procedures or density modification will have higher figures of merit, regardless of whether they are really improved. Nonetheless, there are some rules of thumb for figure-of-merit values and the corresponding map quality. If the figure of merit is less than 0.5 for data to 3.0-Å resolution, the map will be noisy and very difficult to interpret. Around 0.6 the map starts to be interpretable, and above 0.75 the map is almost certainly interpretable. A map that is above 0.8, without any modifications after the isomorphous replacement phase calculation, will be of excellent quality and a joy to interpret. In my experience, the increase in figure of merit that occurs with solvent flattening is of little use in judging the final map quality. It may be used as a relative number to judge the effect of using different solvent-flattening parameters. In the end, the best way to judge the quality of the phase set is actually to examine the map (Fig. 3.19).

To search for more heavy-atom sites, use a difference map or a residual map. In the difference map the coefficients used are FPH - FP, αprotein, which produces a map that shows the positions of all of the heavy atoms (see the previous discussion on difference Fouriers). Any peaks on this map that are not already in the heavy-atom solution can be added and refined. Since ghost peaks from other derivatives included in the phasing may show up, especially if the figure of merit is low, be careful not to add peaks that are in other derivatives used to calculate the phases. Whether the site is truly in common or just a ghost peak can be verified by examining the Patterson map. Add peaks in order of size until the relative occupancy falls below 0.10-0.25 times that of the highest site. Adding too many minor sites may only model

[Fig. 3.17A screen image: the xheavy main window for crystal cvccp (unit cell 49.20 56.70 98.80 90.00 90.00 90.00). The log shows the second pass of phasing for a single derivative: mean figure of merit 0.650 for 319 centric reflections, 0.365 for 925 acentric reflections, and 0.438 for all 1244 reflections.]

FIG. 3.17 Xheavy program. (A) Xheavy is used to refine heavy-atom derivatives and to calculate isomorphous replacement phases. A derivative, or solution, file contains the information for one or more derivatives. This information can be edited using the Derivative Edit window (B). Xheavy refines heavy-atom positions by maximizing the correlation of FH and |FP - FPH| by moving each atom in turn until no further improvement can be found. The relative occupancies are then refined using the same correlation. To calculate protein phases, the single isomorphous replacement phases are first calculated for each derivative in turn. These are then combined to give an initial estimate of the protein phase. This protein phase is then used to get a better estimate of the SIR phases, and finally the new SIR phases are combined to give the final protein phases. The two-pass protein phasing method was adapted from PHASIT, a program written by William Furey of the University of Pittsburgh.

[Fig. 3.17B screen image: the xheavy Derivative Edit window for the same derivative (isomorphous phase type, resolution range 1000.00 to 5.00 Å, sigma cut 3). Five Au+3 sites, AU1 to AU5, are listed with fractional coordinates, occupancies between 0.0019 and 0.0181, and B-values of about 17. Site AU1 is at x = -0.61842, y = -0.14710, z = -0.10400 with occupancy 0.01150 and B = 17.01.]

FIG. 3.17 (continued).

errors in the data and make the phases worse, not better. A residual map is similar except that the coefficients are (FPH - FP) - (FPH,calc - FP), αprotein, which removes the peaks already accounted for in the solution so that only new positions are shown. These can be treated in the same manner as for the difference map. As more derivatives are found and added, the resulting protein phases should improve (Fig. 3.20). With the improved protein phases, it is worthwhile repeating the difference Fouriers for all the derivatives to look for minor sites. Always make sure there is something in the Patterson map to justify the addition of the new site (especially check the cross-peaks). A site may not


FIG. 3.19 Low-resolution MIR map: 10-Å slab of a typical low-resolution MIR map at 5 Å. The clean solvent boundaries (dashed lines) indicate that this solution is on the right track.

give all the peaks on the Patterson, but if there is only one or none, the site is not justified. If your heavy-atom refinement program uses the lack-of-closure error refinement, be wary of minor sites in one derivative that have the same position as a strong site in another derivative. This site is likely to refine to a good occupancy whether it is real or not. Thus, lack-of-closure refinement is rarely done anymore.

FIG. 3.18 Effect of figure-of-merit weighting: MIR phased maps with no figure-of-merit weighting (A) and with figure-of-merit weighting (B). The weighted map is somewhat cleaner and easier to interpret, making it the map of choice for MIR phasing.

Heavy-Atom Phasing Statistics

If you look at only one number in the phasing statistics, look at the phasing power or ⟨FH⟩/⟨E⟩, the size of the heavy-atom amplitudes over the error. This statistic takes into account the two factors that determine heavy-atom phasing power: the size of the differences and the size of the errors. This number should be above 1.0 for the derivative to contribute anything to the phasing. Phasing powers above 2.0 are good, and the very best derivatives can sometimes reach 4.0. Since the sizes of the differences are fixed for a given data set, to increase the phasing power you must lower the errors by improving the solution or collecting better data. Otherwise, phasing power may be increased by longer soaking times and/or higher concentrations for a given derivative. More careful data collection and attention to scaling and merging may also lead to lowered systematic errors. As resolution increases, the phasing power falls off, and this can be used as a guide to the maximum resolution at which the derivative can be used. FH decreases with resolution because the scattering factors for the heavy atoms fall off with resolution. The errors increase with poor measurements, scaling errors, and nonisomorphism. Often a derivative is nonisomorphous above a certain resolution but can be used for phasing below this. The program SHARP is particularly good at handling nonisomorphism.

Another useful number is the centric R-value, for which the rule of thumb goes: above 0.70, the solution is wrong; for 0.60-0.69, the derivative may be useful, but look for improvements to the solution; for 0.50-0.59, the solution is definitely useful; below 0.50, it is an excellent derivative. You should find phasing power and the centric R correlated. There are lots of other possible phasing statistics, but the two that are mentioned have been the most used over the years.
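The two rules of thumb above can be written down as a short Python sketch. The function names and verdict strings are hypothetical. The phasing power is taken here as the mean heavy-atom amplitude over the mean absolute lack-of-closure error, following the ⟨FH⟩/⟨E⟩ form given in the text (some programs use root-mean-square values instead), and the treatment of values that fall exactly on a boundary is an assumption.

```python
def phasing_power(fh_calc, lack_of_closure):
    """<FH>/<E>: mean heavy-atom amplitude over mean absolute lack-of-closure error."""
    if not fh_calc or len(fh_calc) != len(lack_of_closure):
        raise ValueError("need two non-empty lists of equal length")
    mean_fh = sum(fh_calc) / len(fh_calc)
    mean_e = sum(abs(e) for e in lack_of_closure) / len(lack_of_closure)
    return mean_fh / mean_e


def judge_phasing_power(power):
    """Rule of thumb from the text: above 1.0 contributes, above 2.0 is good."""
    if power <= 1.0:
        return "contributes little or nothing"
    if power <= 2.0:
        return "contributes"
    return "good"


def judge_centric_r(r):
    """Rule of thumb from the text; boundary handling is an assumption."""
    if r >= 0.70:
        return "solution probably wrong"
    if r >= 0.60:
        return "may be useful, look for improvements"
    if r >= 0.50:
        return "definitely useful"
    return "excellent"


if __name__ == "__main__":
    fh = [30.0, 40.0, 50.0]        # invented calculated heavy-atom amplitudes
    errors = [20.0, -20.0, 20.0]   # invented lack-of-closure errors
    power = phasing_power(fh, errors)
    print(power, judge_phasing_power(power), judge_centric_r(0.55))
```

Evaluating the same ratio in shells of resolution gives the falloff that the text suggests using to pick the resolution limit for a derivative.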

Including Anomalous Scattering

There are two ways anomalous scattering can be included in the phasing. One is as an adjunct to the isomorphous phasing by measuring F+PH and F-PH. Many heavy-atom compounds have a useful anomalous signal (Table 3.3), and provisions for including the anomalous signal are included in most isomorphous phasing programs. The equations for this have been worked out in terms of Aano, Bano, Cano, Dano, and these terms are simply

FIG. 3.20 Addition of heavy-atom derivatives. The same section of MIR map is shown, using one, two, three, and four derivatives to determine the phasing. (A) Notice that the single-derivative map is largely uninterpretable, with breaks in the main chain, and an effective resolution that is quite low. (B) As a second derivative is added, things improve dramatically. (C) With the third derivative, the isoleucine side chain on the left is becoming visible. (D) The fourth derivative has little effect, and further addition of derivatives of this quality will probably not improve the map much. Compare these maps with improved versions shown later (Fig. 3.29).


summed, as for an isomorphous derivative. Also, inclusion of the anomalous scattering can improve Patterson maps. Since the anomalous portion of the heavy-atom structure factor is at right angles to the real part, the Bijvoet difference F+PH - F-PH will be largest when FPH - FP is smallest, so that the two are complementary in their phasing power. One major difference holds when you are using the anomalous scattering: the hand of the heavy-atom positions must be correct. Since there is no way to determine this a priori, both hands must be tried. In one hand the figure of merit should be slightly higher than in the other. The map can also be checked for clues from α-helices and β-turns (see the following), both of which are handed. Another strategy for determining the correct hand is to cross-phase a second derivative with SIRAS (single isomorphous replacement with anomalous scattering) phases in both possibilities. In one map the peak height of the derivative should be higher than in the other map. All lines of evidence should point to the same answer. If there is a conflict (e.g., the map has left-handed helices even though the figure of merit is higher), be sure that F+PH and F-PH have not become switched somewhere. Switching can occur when you use a left-handed description of the machine you collected data on (i.e., assigning the rotation angle to the opposite direction) or apply a transformation of the indices that switches the handedness, such as h = -h, without also switching F+PH and F-PH.

The other method is to use a native anomalous scatterer, that is, a scatterer in the native crystal such as an iron cofactor in a heme or an iron sulfide cluster. There are some extra difficulties in this case. There is no anomalous signal in the centric data, so these reflections must be left out of the syntheses. As can be seen in Table 3.1, centric reflections can represent a sizable fraction of the data in the low-resolution ranges. The centric isomorphous differences do not suffer from the approximations of the acentric ones, and their presence in Patterson maps, heavy-atom refinement, and error estimation in phasing makes a large contribution to the robustness of these methods. To get around this, use native anomalous Pattersons and refinements with the top 30%, or so, of the data (excluding outliers). The theory behind this is that the larger differences are more likely to appear when FA is colinear with the protein phase and, thus, these differences are better approximations of FA. In practice, the author has found in at least two cases that the Patterson maps were little different whether all or 30% of the reflections were included. Since the terms are squared in a Patterson, the smaller differences do not add much to the map and their absence makes little difference. If a derivative exists, a Bijvoet difference Fourier can be used to find the native anomalous scatterer (Fig. 3.21). This is the same as in the derivative case except that 90° is subtracted from the phase. If you are using xmergephs in XtalView, there is an option to do this built into the program. Refinement can be done using


FIG. 3.21 Bijvoet difference Fourier. Sulfite reductase has a large anomalous signal at 1.54-Å wavelength because of the presence of an Fe4S4 cluster and a siroheme iron. A Bijvoet difference Fourier using MIR phases from two derivatives is shown using data in left- and right-handedness. Solid contours are positive density, and dashed contours are negative. (A) The correct right-handed case: a large peak is found at the site of the Fe4S4 cluster. (B) The incorrect left-handed case, showing a large hole.

conventional heavy-atom refinement programs, since the Bijvoet difference is approximately proportional to FA when only the reflections with the 25-30% largest differences are used. This approximation will be valid enough for refining the x, y, and z. A full refinement can be done with the computer program ANOLSQ written by Wayne Hendrickson (Columbia University), SHARP, or xheavy. If the protein contains a cluster of known geometry such as an Fe4S4 cluster, then a rigid-body refinement of a standard cluster (footnote 12) will give improved accuracy by reducing the number of refinable parameters.

Footnote 12. An example of this can be found in McRee, D. E., Richardson, D. C., Richardson, J. S., and Siegel, L. M. (1986). J. Biol. Chem. 261, 10,277-10,281.

Finally, the native anomalous scatterers can be included in the phasing process. While native anomalous scattering has been sufficient to solve a protein without using multiple-wavelength methods in only a few cases (see the following), such scatterers can contribute substantially to an isomorphous solution. Hendrickson-Lattman coefficients can be derived for the native anomalous case:

$$
\begin{aligned}
A_{ano} &= \frac{2(F_P^+ - F_P^-)\,a_A}{E^2} \\
B_{ano} &= \frac{2(F_P^+ - F_P^-)\,b_A}{E^2} \\
C_{ano} &= -\frac{a_A^2 - b_A^2}{E^2} \\
D_{ano} &= -\frac{2a_A b_A}{E^2}
\end{aligned}
\tag{3.23}
$$

where aA = FA cos(αA) and bA = FA sin(αA), and FA, αA is the calculated anomalous scatterer structure factor. Again, the model must be in the correct hand for the phases to be correct. This may be judged by examining maps in both hands and by looking at the figure of merit in both possibilities.

Choosing the Absolute Heavy-Atom Configuration

The heavy-atom configuration that you solve, by any method, is ambiguous. Two different configurations will both equally account for the Patterson maps and refine equally well. These configurations are often called "hands." Since there is no way to know a priori the correct configuration, the one that is actually in the crystal, both possibilities must be tried. One configuration will produce a correct map and the other an incorrect map. You can switch between the two configurations by inverting the heavy-atom solution through the origin. To do this, take each coordinate (x, y, z) and replace it with its negative (-x, -y, -z). This will work in all space groups; there are more possibilities for most groups, but we will not cover them all. Rather, we discuss four of the possibilities for choosing absolute heavy-atom configuration.
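Inverting a heavy-atom solution through the origin is a one-line operation, sketched here in Python. The function name is hypothetical, and the example coordinates are those of site AU1 in the Derivative Edit window of Fig. 3.17.

```python
def invert_hand(sites):
    """Invert fractional coordinates through the origin: (x, y, z) -> (-x, -y, -z)."""
    return [(-x, -y, -z) for x, y, z in sites]


if __name__ == "__main__":
    solution = [(-0.61842, -0.14710, -0.10400)]
    other_hand = invert_hand(solution)
    print(other_hand)
    # Inverting twice restores the original solution.
    print(invert_hand(other_hand) == solution)
```

Both the original and the inverted solution would then be refined and phased in the same way, and the resulting maps compared as described in the cases that follow.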

MIR without Anomalous

In the case of MIR without anomalous scattering, the two possible configurations are indistinguishable except that one will produce a right-handed map and the other will be the left-handed mirror image. If there are helices in the map, it is easy to tell which is the correct map by looking at the handedness of the helical screw axis. Put the helix vertical on the screen. Curl the fingers of your right hand in the direction that the helix rises; if the thumb points up, this is a right-handed, correct, helix. If the thumb points down, this is a left-handed, or incorrect, helix. If you have only β sheet, you can tell by carefully examining the configuration of the amino acids. If the map is left-handed, all the amino acids will be the D- instead of the L-isomer. You can use the CORN method to make this determination. Pick a clear amino acid and turn the density such that the side chain (R) is at the top, the C=O is at the left, and the N is at the right (and thus spells CORN). If the hydrogen on the Cα atom points toward you, the amino acid is the L-isomer; if it points away, it is the D-isomer.
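Once a residue has been fitted, the same decision can be made numerically from the sign of the chiral volume at Cα. The sketch below is not the CORN construction itself but an equivalent check, assuming the atom order N, C, Cβ, for which an L-amino acid gives a positive triple product. The coordinates are idealized values used only for illustration, and the function names are hypothetical.

```python
def chiral_volume(n, ca, c, cb):
    """Triple product (N-CA) . [(C-CA) x (CB-CA)], in cubic angstroms."""
    u = [n[i] - ca[i] for i in range(3)]
    v = [c[i] - ca[i] for i in range(3)]
    w = [cb[i] - ca[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(u[i] * cross[i] for i in range(3))

def isomer(n, ca, c, cb):
    """Return 'L' for a positive chiral volume, 'D' for a negative one."""
    return "L" if chiral_volume(n, ca, c, cb) > 0 else "D"

# Idealized L-alanine geometry (illustrative coordinates, in angstroms).
N, CA, C, CB = (-0.525, 1.363, 0.0), (0.0, 0.0, 0.0), (1.526, 0.0, 0.0), (-0.529, -0.774, -1.205)
mirror = lambda p: (p[0], p[1], -p[2])  # reflecting z gives the wrong-hand residue
```

Here `isomer(N, CA, C, CB)` gives "L", and the mirrored residue gives "D", which is what a map calculated in the wrong hand would show.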

MIR with Anomalous Including Multiwavelength Anomalous Dispersion (MAD)

Unlike isomorphous data, where the wrong configuration is identical except for hand, with anomalous data one configuration produces correct phases, and the other is incorrect. If anomalous data are included, the two maps can be told apart by inspection. The correct map should have proteinlike features and clear solvent channels. The incorrect map will have noisy density and few if any clear solvent regions. It is also possible to tell by looking at the electron density histograms. The incorrect map will have a slightly broader histogram around 0 and consequently less area in the large positive ρ region. (Look ahead to Fig. 3.38 in Section 3.12 to see the effect of adding error to the phases.) If both maps look good, it probably means that the anomalous data are weak and the isomorphous phases are dominating. In this case you can use the criteria for the previous case, MIR without anomalous, and just look for right-handed helices to pick the correct map. If one of the maps is clearly correct and the other is nonsense, but the correct map has left-handed helices, then the Bijvoet pairs got flipped somewhere along the way [i.e., F⁺ is really F⁻ and vice versa]. You can either correct all the data by reversing the Bijvoet pairs and recalculate the phases with inverted heavy-atom positions, or flip the phases by negating them all (i.e., flip them about the real axis).

SIR with Anomalous in a Polar Space Group with a Single Site

There is a special case, for polar space groups only, such as $P2_1$ or $P6_1$, if you have only a single heavy-atom site. In this case, the ambiguity of the phases can be broken by including the anomalous data. These phases can then be used to cross-phase your other derivatives, and the resulting MIR phases will be in the correct configuration. If the map is left-handed, see the explanation for the previous case; the Bijvoets must be flipped.

Enantiomorphic Space Groups

If you are in a space group that has an enantiomorph, such as $P4_1$, which has as its enantiomorph $P4_3$, then you also have an ambiguity in the correct choice of enantiomorph. In protein crystallography, the enantiomorphs occur when you have a $4_1$, $6_1$, or $6_2$ screw axis, in which case the enantiomorph will be the space group with a $4_3$, $6_5$, or $6_4$ screw axis, respectively. You must try both space groups and look at the maps. One of the space groups will be correct and the other incorrect. If you also have anomalous data, you must try both absolute heavy-atom configurations as well as both space groups, which gives four possibilities that all must be tried. If none of these produce usable maps and you think you have enough phasing power, you may have one of the following problems:

• Incorrect space group (look for other related space groups)
• Incorrect heavy-atom solution(s) (check against the Pattersons and cross-phase)
• Derivatives on different origins (recheck the cross-Fouriers)
• Poor or ambiguous derivatives (remove them)
• Twinning

Fine-Tuning of Derivatives

Ideally, the computer should be able to refine all the derivatives to the best possible values and produce the best possible map. In reality, "fiddling" with the derivative parameters often leads to a better map. If you have a multiple-derivative solution, the first step is to see whether removing a poor derivative improves the map. To tell whether the map is better or worse, it helps if you can find a recognizable feature. A helix is usually the best for this, since the geometry of helices is tightly constrained. Another feature to look at is the solvent. As the phases improve, the contrast between the protein and solvent should improve and the solvent should become flatter. Once you have decided on a section of map to look at, you can adjust the parameters, recalculate the map, and try to decide if it is better. If you are using XtalView, you can set up all the windows you need and just keep reexecuting them to view the results. If removing a derivative seems to improve the map, then try lowering its resolution limit to see whether it is still usable at a lower resolution. Check for heavy-atom ghost peaks and holes (large peaks or holes at the position of one of the heavy atoms). These anomalies can be removed by adjusting the occupancy of the site up or down. If you used lack-of-closure error, heavy-atom refinement, and refined occupancy often, ghosts or holes are almost sure to be present. Keep an accurate record of the changes you make so that you can return to a good solution later. Be objective. Increasing the resolution of your map beyond the phasing power of the derivatives will not make it easier to fit. Solvent flattening of very noisy data will only yield solvent-flattened noise. Beware of snake oil and wooden nickels.

3.7 MOLECULAR REPLACEMENT

Many structures can be phased by using a homologous structure and molecular replacement.¹³ In this method, the homologous probe structure is fit into the unit cell of the unknown structure and the phases are used as an initial guess of the unknown structure phases. A six-dimensional search (three angles and three translations) is required to find the best match of the probe's transform to the observed transform. Fortunately, it is possible to split this search into two three-dimensional searches: a rotation search followed by a translation search. As an example of the amount of time this saves, suppose that a search of one dimension takes 10 s of computer time. If we split up the search, the total time is $10^3 + 10^3 = 2000$ s versus $10^6 = 1{,}000{,}000$ s, or more than 11 days.

How identical does the probe need to be? This depends upon the structural identity of the two proteins as opposed to the sequence identity. Since we do not know the structural identity, we are forced to use the sequence identity as a guide. A (very) rough rule of thumb is that above 50% sequence identity a molecular replacement solution should be straightforward, since chances are that these two proteins are structurally very similar. Other factors are the α-helical content (more is better) and the shape of the probe molecule (an anisotropic shape is desirable). The largest problem with searches using low-homology probes is that once the solution is obtained, the phases will be poor estimates of the true phases, and there will be a high bias toward the probe structure, making it difficult to refine the correct structure. This bias is the main drawback of the molecular replacement method. In many cases, a probe that represents only a portion of the structure is available: for example, in an antibody-protein complex. The method is robust enough that the probe can be accurately positioned in many of these cases.
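The timing argument can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, that "10 s per dimension" means 10 grid points per axis at 1 s per point; the function name and both default values are hypothetical.

```python
def search_seconds(dimensions, points_per_axis=10, seconds_per_point=1.0):
    """Cost of an exhaustive grid search over the given number of dimensions."""
    return seconds_per_point * points_per_axis ** dimensions

split = search_seconds(3) + search_seconds(3)  # rotation search, then translation search
full = search_seconds(6)                       # one six-dimensional search
days = full / 86400.0                          # seconds in a day
```

Under these assumptions the split search costs 2000 s, while the full search costs a million seconds, between 11 and 12 days, and the gap widens rapidly as the grid is made finer.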
Molecular replacement is often thought of as an easier alternative to multiple isomorphous replacement. However, in practice I have seen rotation-translation solutions take as long as, or longer than, heavy-atom searches. The combination of both methods is more powerful than either alone. In the heavy-atom case, the phases are noisy but unbiased. In molecular replacement, the phases are heavily biased. A particularly successful strategy is to let a single derivative and a molecular replacement solution cross-check each other. The phases of the molecular replacement solution can be used to cross-Fourier the derivative differences in order to find the heavy-atom solution. These single isomorphous replacement phases can then be combined with the molecular replacement phases to filter them and to remove the phase ambiguity, thus producing a map superior to what either method can produce alone.

¹³ An excellent discussion of molecular replacement along with several examples can be found in an article by E. Lattman (1985). "Use of the Rotation and Translation Functions," in Methods in Enzymology, Vol. 115, pp. 55-77. Academic Press, San Diego. The classic work on molecular replacement is a collection of articles edited by M. G. Rossmann (1972), The molecular replacement method, Int. Sci. Ser., 13.

Rotation Methods

Rotation searches are actually done in Patterson space. Consider the Patterson of a protein molecule packed loosely in a lattice. In general, the short vectors will be intramolecular vectors, and the longer ones will be intermolecular. In a rotation function we want to consider only intramolecular vectors. Since all vectors in a Patterson start at the origin, the vectors closest to the origin will, in general, be intramolecular. Of course, closely spaced lattice contacts will also produce short intermolecular vectors, but they should be in the minority. By judiciously choosing a maximum Patterson radius, we can improve our chances of finding a strong rotation hit. The second choice to be made is the resolution range to use in calculating the Patterson. Higher-resolution reflections (above about 3.5 Å) will differ markedly even between homologous structures, as they reflect the precise conformation of residues. Lower-resolution reflections reflect the grosser features of the structure, such as the relation of secondary structural elements. Very-low-resolution reflections, below about 10 Å, are heavily influenced by the crystal packing and the arrangement of solvent and protein, which is, of course, more dependent on the particular packing arrangement than on the structure of an individual protein molecule. Thus, the resolution range used for rotation searches is usually within 10-3.5 Å, with 8-4 Å being common. In practice, several ranges can be tried.

The first step is to calculate structure factors for the probe structure in an artificial P1 cell. The cell should be about 30 Å larger than the probe in each direction so that there are no intermolecular vectors in the Patterson radius used for the rotation search. The probe is usually centered at the origin of this cell to simplify later steps. These calculated structure factors are then used in the rotation search.
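Setting up the artificial P1 cell amounts to a bounding-box calculation. The sketch below assumes an orthogonal cell, Cartesian coordinates in angstroms, and centering on the midpoint of the bounding box rather than the center of mass; the function name and the `margin` parameter are hypothetical.

```python
def p1_box(coords, margin=30.0):
    """Center the probe at the origin and return (centered_coords, cell_edges)."""
    lows = [min(p[i] for p in coords) for i in range(3)]
    highs = [max(p[i] for p in coords) for i in range(3)]
    center = [(lo + hi) / 2.0 for lo, hi in zip(lows, highs)]
    centered = [tuple(p[i] - center[i] for i in range(3)) for p in coords]
    # Each cell edge is the probe extent plus the margin (all cell angles 90 degrees).
    edges = tuple(hi - lo + margin for lo, hi in zip(lows, highs))
    return centered, edges

centered, cell = p1_box([(0.0, 0.0, 0.0), (10.0, 20.0, 30.0)])
```

For this two-atom toy probe the cell edges come out as 40, 50, and 60 Å; structure factors would then be calculated from the centered coordinates in that cell.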
While the search is not actually done computationally in this manner, it can be conceptualized as follows. A Eulerian angle system is used with three nested angles (Fig. 3.22). The range of the angles is chosen to cover the unique volume of the space group of the unknown structure. One angle is incremented 5°, the other angles are moved through all their values every 5°, and at each point the match of the probe and observed Patterson functions is calculated and stored. The first angle is incremented 5° more and the process repeated. After all angles have been calculated, the list is sorted, grouped into peaks, and printed (Fig. 3.23). The peaks are usually reported in terms of their size relative to the root-mean-square peak size or sigma. If the probe structure is a good match for the unknown structure and the proper resolution ranges have been chosen, then there will be a single large hit at several sigma. Often, a decreasing series of peaks is found at slightly differing sigma. In such cases, the correct peak may not be the first in the list. The order of the list may be altered by changing the resolution ranges slightly. Since the correct peak is unknown, there is no a priori way to decide on the correct ranges. In these cases a new probe should be looked at if one is available. If not, there is the option of continuing the process, going down the list of rotation hits to see whether a decision can be made based on the behavior of subsequent steps.

FIG. 3.22 Definition of Eulerian self-rotation angles.

The asymmetric unit of the rotation function depends on the symmetry of both the probe's Patterson and the Patterson of the unknown. This problem has been examined and reported on in detail,¹⁴ and Table 3.4 lists the more common cases when the probe is space group P1 for each of the 10 possible Patterson space groups for proteins. It makes a difference which Patterson is rotated and which is held still. This can be decided by a careful reading of the rotation program's documentation, because both conventions of rotating the probe or the unknown are used. When comparing hits, keep in mind that in Eulerian space, the operator $(\pi + \theta_1, -\theta_2, \pi + \theta_3)$ is an identity operator. Two rotation hits that look different may in fact be the same if the identity operator is applied.

¹⁴ Rao, S. N., Jih, J., and Hartsuck, J. A. (1980). Acta Crystallogr. A36, 878-884.
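The equivalence of two Eulerian triples can be checked by comparing rotation matrices rather than angles. The sketch below assumes a z-y-z Eulerian convention, $R = R_z(\theta_1) R_y(\theta_2) R_z(\theta_3)$; rotation programs differ in their conventions, so this is an illustration, not the definition used by any particular program. All function names are hypothetical.

```python
import math

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def euler_zyz(t1, t2, t3):
    """Rotation matrix for z-y-z Eulerian angles given in radians."""
    return matmul(rot_z(t1), matmul(rot_y(t2), rot_z(t3)))

def same_rotation(e1, e2, tol=1e-9):
    """True if two Eulerian triples describe the same rotation matrix."""
    m1, m2 = euler_zyz(*e1), euler_zyz(*e2)
    return all(abs(m1[i][j] - m2[i][j]) < tol for i in range(3) for j in range(3))

hit = (0.3, 0.7, 1.1)
alias = (math.pi + hit[0], -hit[1], math.pi + hit[2])
```

Under this convention `same_rotation(hit, alias)` is true, so two entries in a peak list that differ in this way should be treated as a single hit.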

FIG. 3.23 Self-rotation function example. Peaks on this section (κ = 180°) show positions of 2-fold symmetry in the diffraction pattern. The large peak in the center is due to the crystallographic 4-fold (which can be thought of as two 2-folds). The peaks around the outer edge are positions of noncrystallographic 2-folds.

Improving the Probe

It may be possible to improve the hit by systematically leaving out pieces of the probe and doing the rotation search again. For example, every three residues can be deleted and the size of the rotation hit examined. The absolute sizes of the hits are compared, not the peak/σ ratio. If parts of the probe structure that contribute to the match, and are thus likely to be homologous, are removed, a lower hit will be found. If a portion is interfering, then removing it will result in a larger hit (Fig. 3.24). A probe can then be built, leaving out several of the residues that raise the hit and, thus, the overall rotation hit can be improved. A similar procedure can be done leaving off side chains beyond the Cβ atom (i.e., polyalanine), on the hypothesis that the main chains follow the same path but the side chains differ considerably.


TABLE 3.4 Rotation Function 1 Probe rotated Unknown Patterson group

1 Probe held still

01, 02, 03

0+, 02, O_

01, 02, 03

0+, 02, O-

0 - < 0 <2"rr 0 < 02 <,rr 0 -< 03 < 2r

0 -<0+ < 47r 0 < - 0 _ <-2r

0-<0~ < 2r 0<02_
0-<0+ < 47r 0<-0_-<27r

2 / m b unique

0<0~ <27r 0 < 02 < 7r/2 0-<0~ < 2 r

0<0~ <47r 0<0 <27r

0<0,<27r 0 < 02 --< 7r/2 0 < 03 < 2"rr

0<0+<47r 0 <0_ <27r

2 / m c unique

0-<0 <2rr O<02
0<0+ <47r 0<0
0<0~
0<0+ <4r O
lnmm

0 - < 0 <2n" 0 < 02 -< rr/2 O
0-<0~ <47r 0<~0_ <--77"

0<-0, <~r 0<02_<7"r/2 0<03<27r

0-<0+ < 4 7 r 0<_0 _<77-

4/m

0 - < 0 , <217" O
0<0~. <47r 0<0 -
0 < O~ < r 0<02
0<0+ <47r 0<0 <~/2

4/mmm

0-<0, <2n" 0 < 02 < 7r/2 0 < O~ < rr/2

0<-0~<4r 0<-0
O-
0 <0+<47r O
0<0~ <2rr 0<02<710 < O~ < 2rr/3

0<0, <4-rr 0 < 0 < 27r/3

0<0,<2r 0 < 02 < r 0 < 0.~ < 2r

O
0<-0~ < 2 r r

O-
0<0,<2r 0 < 02 < r 0<0,<27r

0<0+<4~ 0 < O_ < 27r/3

0<0, 0<0

0 < O, < 7r/3 0<02
0 < O+ < 4~0<_0 <7r/3

D

3/m

0 < 02 <- rrl2

0 <- O~ < 2rr/3 6/m

0-<0, <2rr O
6/mmm

0-<0~ < 2 r r 0 < 02 < rr/2 0 < O~ < rr/3

<47r
0_<0~<2~

0<-0~ <4r

0-<0~
O-
0<_0

0<02-<7r/2 0<0~<27r

0-<0__<7r/3

-
Note. Asymmetric units in Eulerian angles $\theta_1, \theta_2, \theta_3$ and pseudo-orthogonal Eulerian angles $\theta_+, \theta_2, \theta_-$. The range of $\theta_2$ is the same in both systems. The asymmetric unit given is one of several possible choices in each case.

FIG. 3.24 Systematic deletion of residues and the rotation function. [Plot: absolute rotation peak height, about 1.0 × 10¹³ to 1.5 × 10¹³, versus sequence number; the A, B, C, and D helices and the dimer contacts are marked.] In this plot of the absolute value of the rotation peak, R. molischianum cytochrome c' serves as the probe against C. vinosum cytochrome c' data, with every three residues deleted throughout the entire probe molecule. The number indicates the first of the three residues omitted. The horizontal line across the entire graph indicates the value of the hit found using the entire model. Points above this line represent models that give larger hits when residues are removed; thus these residues probably are not in the same conformation as those in the C. vinosum structure. Points below the line represent those that make the hit smaller when removed and are, thus, probably in similar positions in the two proteins. After the C. vinosum structure was solved (see Chapter 5), the models were compared by superposition of the coordinates. It was found that the best superposition is of the heme-binding pocket, which includes residue 12 and the residues indicated by the arrow and the label "Heme-Binding." The biggest difference was at the dimer contact in helix B, where the last turn of the B helix is missing in C. vinosum. Thus, the rotation function used in this manner can give useful information about the unknown structure before a complete solution is available.

Refining the Rotation

The success of the translation search will depend on the quality of the rotation solution. As a minimum, the rotation search should be repeated in a limited range with a fine grid. In particular, if you used the Crowther fast-rotation method, you should do a finer search with the conventional rotation function. Another alternative is to use a least-squares refinement method based on the Patterson correlations as implemented by XPLOR/CNS.¹⁵ In this method, the probe is treated as a rigid body and its orientation refined to maximize the correlation between $|E_{obs}|^2$ and $|E_{model}|^2$ given by the function

$$
\frac{\left\langle |E_{obs}|^2\,|E_m(\Omega,\Omega_i,t_i)|^2 \right\rangle - \left\langle |E_{obs}|^2 \right\rangle \left\langle |E_m(\Omega,\Omega_i,t_i)|^2 \right\rangle}
{\sqrt{\left( \left\langle |E_{obs}|^4 \right\rangle - \left\langle |E_{obs}|^2 \right\rangle^2 \right)\left( \left\langle |E_m(\Omega,\Omega_i,t_i)|^4 \right\rangle - \left\langle |E_m(\Omega,\Omega_i,t_i)|^2 \right\rangle^2 \right)}},
\tag{3.24}
$$

where the overall orientation $\Omega$ is refined. It is also possible to break up the probe into two or more groups and to refine each orientation, $\Omega_i$, and translation, $t_i$, individually. The data are normalized from F's to E's by binning the data in shells of resolution with equal numbers of reflections per bin and setting the average of each bin to the same value. In the normal case, where the amplitudes fall off with resolution, this will have the effect of emphasizing the shorter vectors. Another program that performs the same function of refinement of the rotation angles of unpositioned models is INTREF.¹⁶ This program incorporates a radial weighting function to downweight longer vectors, which tend to be the undesired intergroup vectors. INTREF also permits refining an individual group orientation and the relative translations between groups.

¹⁵ Brünger, A. T. (1990). Acta Crystallogr. A46, 46-57, and the XPLOR 3.0 manual, p. 278.
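The normalization and the correlation target can be sketched in a few lines. This is an illustration of the idea only, not the XPLOR/CNS implementation: it assumes that "the same value" for every shell is a mean of 1 for E², that reflections are supplied as parallel lists of d-spacings and amplitudes, and that the correlation is the ordinary linear correlation between the two sets of squared E values. Function names are hypothetical.

```python
import math

def normalize_to_e(d_spacings, amplitudes, n_bins):
    """Scale F to E so that the mean of E^2 is 1 in each equally populated resolution shell."""
    order = sorted(range(len(amplitudes)), key=lambda i: -d_spacings[i])  # low resolution first
    n = len(order)
    e_values = [0.0] * n
    for b in range(n_bins):
        shell = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not shell:
            continue
        mean_f2 = sum(amplitudes[i] ** 2 for i in shell) / len(shell)
        for i in shell:
            e_values[i] = amplitudes[i] / math.sqrt(mean_f2)
    return e_values

def patterson_correlation(e_obs, e_model):
    """Linear correlation between |E_obs|^2 and |E_model|^2."""
    x = [e * e for e in e_obs]
    y = [e * e for e in e_model]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    vx = sum(a * a for a in x) / n - mx * mx
    vy = sum(b * b for b in y) / n - my * my
    return cov / math.sqrt(vx * vy)

e = normalize_to_e([10.0, 5.0, 3.0, 2.0], [10.0, 8.0, 6.0, 4.0], n_bins=2)
```

Because the correlation is unchanged when either set of amplitudes is multiplied by a constant, no scale factor between the probe and the observed data is needed at this stage.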

Translation Methods

Translation Function

Once a rotation solution has been found, the translation function can be attempted. This step is not as robust as the rotation method. It is possible to get a correct rotation solution and still not find a translation solution for poor probe structures. Of course, it can be difficult to distinguish this from the case where the rotation is incorrect. One reason for the low signal in the translation function is that correctly and incorrectly positioned molecules differ only in the intermolecular vectors. The intramolecular vectors are identical, being independent of the translation position, and form a large background. The simplest translation function is similar to a correlation function

$$
T(x,y,z) = \sum_{h,k,l} F_c(h,k,l,x,y,z)\,F_o(h,k,l),
\tag{3.25}
$$

where T(x,y,z) should be at a maximum when the probe molecule is in best agreement with the observed data. Notice that $F_c$ is calculated at every x,y,z and for every h,k,l, making this a large calculation. Because they are heavily influenced by the solvent, the lowest-order data below 10-25 Å are left out of the calculation. On the other hand, the logic of this seems questionable because the intermolecular vectors, which represent the longest spacings in the crystal, are also of lower order. The high-resolution cutoff is generally about 4-6 Å because the probe is only an approximation, and the higher-order terms will not agree even at the correct position because of the differences in the precise conformations of the two structures. To remove the background intramolecular vectors, the formula can be recast to remove approximately the self-vectors:

$$
T(x_2 - x_1) = \sum_{h} \left[ I_o - k\left(F_{M1}^2 + F_{M2}^2\right) \right] \times F_{M1} F_{M2}^{*} \exp\left[-2\pi i\, h (x_2 - x_1)\right],
\tag{3.26}
$$

which, although it looks complicated, is now a Fourier summation, where $F_{M1}$ and $F_{M2}$ are the transforms of molecule 1 and molecule 2, and the function gives the vector between molecule 1 and molecule 2. The function is used to calculate Harker sections as in the heavy-atom case, and the problem is solved in a similar manner by considering the peaks on the Harker sections as self-vectors. Thus, for the space group P222 the three Harker sections are x = 0, y = 0, and z = 0. The self-vectors are located at 0, 2y, 2z; 2x, 0, 2z; and 2x, 2y, 0. There should be one unique peak on each section, and solving any two should give a solution that can be confirmed by the third.

¹⁶ Yeates, T. O., and Rini, J. M. (1990). Acta Crystallogr. A46, 352-359.

Translation Search

An alternative to the translation function is an R-factor search, where the molecule is moved on a grid and at each point the R-factor between the calculated probe amplitudes and the observed unknown structure amplitudes is calculated (Fig. 3.25). A more robust method is to calculate the correlation function, since it is immune to scaling errors and an accurate scale cannot be calculated at this point in the structure determination. The translation search takes more computer time than the translation function, but it is less prone to error and, as computers become faster, it will become the method of choice. Several proteins that could not be solved using the translation function were solved using the correlation search. The volume that needs to be searched is dependent upon the space group symmetry. Many space groups (those that belong to classes 1, 2, 3, 4; see Table 2.2) are polar, and one direction is arbitrary. This direction need not be searched, and if it is, the exact same hit will be found on all sections in this direction. For example, in $P2_1$ there is no need to search in the y direction.
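The difference between the two scores can be seen with a toy example. The sketch below assumes a conventional R-factor, Σ|F_o − F_c| / ΣF_o, and an ordinary linear correlation coefficient on amplitudes; the amplitudes and the two grid positions are invented for illustration, and the function names are hypothetical. In a real search the calculated amplitudes would be recomputed from the shifted probe and its symmetry mates at each grid point.

```python
import math

def r_factor(f_obs, f_calc):
    """Conventional R-factor with no scale factor applied."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def correlation(f_obs, f_calc):
    """Linear correlation coefficient; unchanged if f_calc is multiplied by a constant."""
    n = len(f_obs)
    mo, mc = sum(f_obs) / n, sum(f_calc) / n
    cov = sum((o - mo) * (c - mc) for o, c in zip(f_obs, f_calc))
    so = math.sqrt(sum((o - mo) ** 2 for o in f_obs))
    sc = math.sqrt(sum((c - mc) ** 2 for c in f_calc))
    return cov / (so * sc)

def best_grid_point(f_obs, f_calc_by_position):
    """Return the grid position whose calculated amplitudes correlate best with f_obs."""
    return max(f_calc_by_position, key=lambda pos: correlation(f_obs, f_calc_by_position[pos]))

f_obs = [10.0, 40.0, 25.0, 5.0]
grid = {
    (0.0, 0.0, 0.0): [30.0, 20.0, 10.0, 40.0],    # wrong position
    (0.5, 0.0, 0.5): [25.0, 100.0, 62.5, 12.5],   # right position, but on the wrong scale
}
```

In this example the unscaled R-factor is lower for the wrong position, because the scale error dominates, while the correlation search still picks the right one.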

FIG. 3.25 Translation search example: one section, z = 0.5, from a translation search, holding one-half of the C. vinosum cytochrome c' dimer fixed at a previously found position and searching with the second half. The second half was translated throughout the entire cell, and a correlation coefficient was calculated at each point. The map shows the correlation squared times 100. The largest peak in the map gives the correct translation of the second half of the dimer. The streakiness of the map is common in translation searches. Often the correct solution is found where two or more streaks cross.

Molecular Packing

There will be only a limited number of positions in the cell where the molecule can pack without overlapping one of its neighbors (Fig. 3.26). Since it is physically impossible for the molecules to overlap in three-dimensional space, these positions can be safely eliminated. The complication comes when one realizes that the probe structure may have a loop that is not in the unknown structure and, thus, a false overlap may be found. Conversely, the probe may not overlap when the unknown will. Thus, in practice, one needs to be fairly generous in deciding on the amount of overlap allowed. Also, since proteins are fairly symmetrical, one could imagine a perfectly good packing with one or more directions flipped. Although a unique solution may not be found, a large number of possibilities can be eliminated. Harada and co-workers have incorporated an interpenetration penalty into their translation function to incorporate molecular packing information.¹⁷ Another approach is to run a packing search, generate a list of good packing positions, and calculate the correlation function for each of these positions to find the best translation.

FIG. 3.26 Molecular packing illustration. A hypothetical unit cell with a 2-fold in the center is shown. A molecule is moved across the cell and the symmetry mate is generated at each position. Note that in two positions the molecules do not overlap and represent allowable packings; however, in the middle position the molecules overlap. This packing is not allowed and need not be considered in searches of other types.
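A packing check of this kind can be sketched as a simple contact count. The example below assumes Cartesian coordinates in angstroms, a single 2-fold axis parallel to z through the point (cx, cy), and a "generous" criterion expressed as a maximum number of allowed close contacts; the cutoff, the allowance, and all function names are hypothetical choices for illustration.

```python
import math

def two_fold_mate(coords, cx, cy):
    """Symmetry mate generated by a 2-fold axis parallel to z through (cx, cy)."""
    return [(2 * cx - x, 2 * cy - y, z) for x, y, z in coords]

def count_clashes(mol_a, mol_b, cutoff=3.0):
    """Number of atom pairs, one atom from each molecule, closer than cutoff."""
    return sum(1 for p in mol_a for q in mol_b if math.dist(p, q) < cutoff)

def packing_allowed(coords, cx, cy, cutoff=3.0, max_clashes=1):
    """Accept a position unless more than max_clashes close contacts are found."""
    return count_clashes(coords, two_fold_mate(coords, cx, cy), cutoff) <= max_clashes

def translate(coords, shift):
    return [tuple(c + s for c, s in zip(p, shift)) for p in coords]

probe = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # toy two-atom "molecule"
near_edge = translate(probe, (1.0, 10.0, 0.0))
on_axis = translate(probe, (9.0, 10.0, 0.0))
```

With the axis at (10, 10), the position near the cell edge packs cleanly while the position straddling the axis overlaps its own mate and can be dropped from later searches.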

3.8 NONCRYSTALLOGRAPHIC SYMMETRY

Noncrystallographic symmetry, or local symmetry, is symmetry that exists locally within the asymmetric unit of the crystal (Fig. 3.27). For example, the protein may crystallize as a dimer in the asymmetric unit so that there is a dyad axis relating the halves of the dimer that is not coincident with a crystallographic 2-fold. This information can be used to produce averaged maps in which noise will tend to cancel out, and it can also be used as a phase restriction to improve phasing. In the extreme case of viruses with 20- to 30-fold redundancy in the asymmetric unit, the phasing restrictions become so high they can be used to phase reflections de novo.

¹⁷ Harada, Y., Lifchitz, A., Berthou, J., and Jolles, P. (1981). Acta Crystallogr. A37, 273.

FIG. 3.27 Noncrystallographic symmetry averaging. (A) Stereo view of the contents of one asymmetric unit of bovine superoxide dismutase MPD crystal form. There are two dimers arranged head to head. Neither of the two dimer axes intersects a crystallographic axis and, thus, they are noncrystallographic. The densities of all four molecules were averaged in the original solution of the protein and aided in the tracing of the chain. (B) The medium-resolution density map of C. vinosum cytochrome c' shows an example of a noncrystallographic 2-fold. The axis of the 2-fold is indicated by the "X." The noncrystallographic dyad is nearly parallel to Z.

A symmetry operator for noncrystallographic (NCR) symmetry takes the same form as a crystallographic one: a 3 × 3 rotation matrix and a translation vector. Crystallographic symmetry operators can be expressed in the same form but can take only certain values. Noncrystallographic symmetry operators are not so constrained. It is customary to write crystallographic operators in algebraic form to show their generality, while NCR operators are usually written explicitly. Crystallographic symmetry operators can be looked up in tables, while NCR symmetry operators must be calculated on a case-by-case basis. It is necessary to have a list of equivalent points, either from a preliminary model or from equivalent points found in a map, to which a least-squares symmetry operator is calculated. The XtalView program pdbfit can calculate the matrix and vector that best fit two matched PDB models. Since pdbfit does not check the validity of the model, any arbitrary points could be input as fake atoms. It is necessary only that the equivalent pairs have the same atom type. Choose one molecule to be the standard reference position and calculate all the pairs between this molecule as the target and the other(s) as the source model. If you have solved the structure with heavy atoms, these

3.8 NoncrystallographicSymmetry

173

B 0 127 0 . 1 1 1

!

,

!

1.22:

r y

g-

e

Q H

~

~ o

.

1

0

9

. 1

Z3

Z = 0.528;0.537;0.546;0.556;0.565;0.574;0.583;0.593;0.602;0.611;0.620;0.630; FIG.3.27 (continued). pairs could be heavy-atom pairs. If you are able to fit only a portion of the unaveraged map, then these pairs can be partial models. Thus, if you have a dimer of A and B, there will be two operators A onto A, and B onto A. For four molecules A, B, C, and D, the four operators will be A onto A, B onto A, C onto A, and D onto A. One big difference of noncrystallographic symmetry is its local nature. A crystallographic symmetry operator can be applied to any point in the unit cell to find an equivalent point elsewhere. A local symmetry operation only relates a specific volume to another specific volume. There will be points in


the unit cell where the local operation does not hold. Therefore, a knowledge of this volume is required to use noncrystallographic symmetry. Usually this volume is one of the polypeptide chains, and the local symmetry operation describes how that chain is related to another. This description is useful in refinement programs. Before a model exists, this description is less useful. It is often desirable to use local symmetry to average out noise in electron density maps. If there are three or more copies, this method becomes very powerful. An approximation may be useful where a box (or sphere) serves as the bounds and the operator relates this box to another, equivalent box. The box describes one of the equivalent molecules as can best be determined from the present map, and then there are one or more operators that relate how a point in other volumes (i.e., molecules) relates to this point. It is possible that the edges and corners of the box are not involved in local symmetry at all. This will cause a blurring. In most cases this blurring will be outside the bounds of the molecule and will not matter. Since every one of the equivalent molecules is packed differently, the crystal contact points will tend to be blurred. If you are using the averaging to help with fitting, this will not be a problem. If you are using it for phasing purposes, this will be more of a problem. Another important difference of noncrystallographic symmetry is that it is not necessarily exact. Some parts of the molecule may be more similar than others. Usually the core and main chain of the molecule agree with the noncrystallographic operator to a close tolerance, but the surface side chains, which are more flexible, find themselves in different environments in each of the molecules and adopt different conformations (Fig. 3.28). When these multiple conformations are superimposed on one another with noncrystallographic symmetry operators, the density for them becomes blurred and confused. 
Outside loops of the protein may also adopt somewhat different or entirely different conformations. Still, noncrystallographic symmetry can be invaluable in tracing the main chain of most of the protein. With better phases you can then fit all the molecules independently to improved maps to find the differences. When you are using noncrystallographic symmetry during refinement, make some provision to allow for the different tolerances of fit. Since at the beginning of the refinement it is not known whether divergence in the model is real or is an error you want to refine away, the main chain is usually tightly constrained but the side chains are not. A periodic check should be made to see whether constraints should be loosened for a particular region. The constraints should permit the nonequivalent regions to drift apart. One strategy is to start the refinement with everything tightly constrained until convergence is reached. The constraints can then be removed and the model refined further. Regions of the model that have drifted apart can then be loosely


FIG. 3.28 Superposition of bovine superoxide dismutase (SOD) monomers: a narrow slab through the molecule. Note that in the center the four models superimpose closely, while at the solvent edges the side chains are in widely differing positions. The models of all four monomers in the asymmetric unit of refined bovine SOD are superimposed for comparison.

constrained, or not at all, and the portions that have stayed close can be tightly constrained.

Self-Rotation Function

In order to discover noncrystallographic symmetry, a self-rotation function is helpful. This is similar to the rotation method used previously, except that the probe amplitudes and the unknown amplitudes are the same. By looking at the sections of the outermost nested angle (usually κ), the position of these noncrystallographic symmetry elements can be found. For example, the section κ = 180° contains 2-folds. That is, when the second copy of the data is rotated 180° with respect to the first, near the direction of the 2-fold it will superimpose and give a peak in the function (Fig. 3.23). The position of the peak on the section gives the two angles that define the direction of the 2-fold. Similarly, peaks on the section κ = 120° are due to 3-folds; κ = 90°, 4-folds; κ = 72°, 5-folds; and κ = 60°, 6-folds. All translation information is lost, so that if the symmetry element is a combination of a rotation and translation, the self-rotation function will indicate the rotational portion only. The other difficulty with the self-rotation function occurs when the


noncrystallographic symmetry is parallel to, but not coincident with, a crystallographic symmetry axis. In this case, the crystallographic symmetry will hide the noncrystallographic peak. For example, in the case of the Chromatium vinosum cytochrome c' structure discussed in Chapter 5, the direction of the 2-fold relating the halves of the dimer was very close to the c direction and could not be distinguished from the peak due to the crystallographic 2-fold screw along c. Also note that 4-folds are equivalent to two overlapping 2-folds 90° apart. Therefore, if you have a peak on the κ = 90° section, there will also be a peak on the κ = 180° section in the same direction. A similar relationship holds for 3-folds and 6-folds.
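The relationship between an n-fold axis and the κ sections on which it should appear can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `kappa_sections` is hypothetical, and it assumes that an n-fold axis gives a peak at every multiple of 360°/n up to 180°.

```python
def kappa_sections(n_fold):
    """Return the kappa sections (degrees, 0 < kappa <= 180) on which an
    n-fold rotation axis is expected to give a self-rotation peak."""
    if n_fold < 2:
        raise ValueError("n_fold must be 2 or greater")
    step = 360.0 / n_fold
    sections = []
    for m in range(1, n_fold):
        kappa = m * step
        # A rotation by kappa about an axis is the same as a rotation by
        # 360 - kappa about the opposite direction, so kappa <= 180 suffices.
        if kappa <= 180.0 + 1e-9:
            sections.append(round(kappa, 3))
    return sections


for n in (2, 3, 4, 5, 6):
    print(n, kappa_sections(n))
```

Under this assumption a 4-fold is listed on both the 90° and 180° sections, and a 6-fold on the 60°, 120°, and 180° sections, which matches the overlap described in the text.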

3.9 DENSITY MODIFICATION

There are as many ways to handle density modification as there are programs written for it. They all seek to improve the phases by imposing restrictions on the density in real space and then using the phases of the modified map to filter or replace the experimental phases.¹⁸ The weakness of this method is that there is no a priori way to judge the correctness of a density and, even if one were certain that a density is incorrect, determining the correct value is difficult. The success of the method may be highly problem and operator dependent. One assumption often made is nonnegativity, which is valid only at very high resolutions (better than about 1.5 Å), even with the F000 term included. Because lower-resolution maps have series termination effects, they will never be completely positive. A more rational approach is to assume that the most negative density is probably incorrect and that the largest peaks are also incorrect.

Solvent Flattening

Solvent flattening is a more straightforward technique and usually more successful. This method assumes that any density in the solvent region of the protein arises from noise fluctuations and that the solvent density should be flat throughout. (Actually, examinations of high-resolution maps with accurate phases still show some fluctuation in the solvent region even though there is no solvent model included. Still, the solvent is much closer to flat than the rest of the map.) The trick in this method is to find the boundary between the solvent and the protein. B. C. Wang has invented an averaging

¹⁸Tulinsky, A. (1985). "Phase Refinement/Extension by Density Modification," in Methods in Enzymology, Vol. 115, pp. 77-90. Academic Press, San Diego.


technique that has been successful at coming close to the correct boundary even in very noisy maps and has led to the development of automated solvent-flattening programs.¹⁹ The algorithm is equivalent to a low-pass filtering of the data in reciprocal space. The lowest points in this smoothed map are then taken to be the solvent and the remainder protein. Because the technique is not completely accurate, it may be wise to tell the program that the percent solvent of your crystal is about 5-10% lower than it really is. Another method is to consider this automated mask a first approximation and to edit the mask by hand to increase its accuracy. If a completely automated mask is used, it should be periodically recomputed as the phases improve. This mask is used to flatten the solvent regions of the map to produce a modified map. Other filtering may also be done at this point, such as removing excessively negative or positive density. This filtered map is inverted through a reverse Fourier transform to produce new structure factors. The new structure factors are then combined with the original MIR phase data to produce filtered structure factors. The filtered structure factors are used to produce a new map, and the cycle is repeated until it converges. Convergence is reached when the phase change for a cycle is only a few degrees from the start of the cycle. These procedures have been implemented in a package of programs: PHASES.²⁰ In favorable cases the maps can show considerable improvement (Fig. 3.29).
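The masking and flattening steps can be illustrated on a toy problem. The sketch below is not the PHASES implementation: it works on a one-dimensional periodic grid rather than a three-dimensional map, the function names are hypothetical, and the linear distance weighting of positive density is an assumption made in the spirit of an averaging mask.

```python
def solvent_mask_1d(density, radius, solvent_fraction):
    """Toy 1D analogue of an averaging mask on a periodic grid.

    density: list of map values; radius: averaging radius in grid points;
    solvent_fraction: fraction of grid points to assign to solvent.
    Returns a list of booleans, True where the point is taken as solvent."""
    n = len(density)
    smoothed = []
    for i in range(n):
        total = 0.0
        for d in range(-radius, radius + 1):
            rho = density[(i + d) % n]
            # Only positive density contributes, with a weight that falls
            # off linearly with distance from the central point.
            if rho > 0.0:
                total += (1.0 - abs(d) / (radius + 1)) * rho
        smoothed.append(total)
    n_solvent = int(round(solvent_fraction * n))
    # The lowest points of the smoothed map are taken to be solvent.
    order = sorted(range(n), key=lambda i: smoothed[i])
    mask = [False] * n
    for i in order[:n_solvent]:
        mask[i] = True
    return mask


def flatten_solvent(density, mask):
    """Replace every solvent point by the mean solvent density."""
    solvent = [rho for rho, is_solv in zip(density, mask) if is_solv]
    if not solvent:
        return list(density)
    mean = sum(solvent) / len(solvent)
    return [mean if is_solv else rho for rho, is_solv in zip(density, mask)]


toy_map = [5, 6, 7, 6, 5, 0.2, -0.3, 0.1, -0.2, 0.3, 0.1, -0.1]
toy_mask = solvent_mask_1d(toy_map, radius=1, solvent_fraction=0.5)
print(flatten_solvent(toy_map, toy_mask))
```

In a real calculation the flattened map would then be back-transformed and the resulting phases combined with the experimental ones, as described above; that step is omitted here.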

Calculating Percent Solvent

There are two formulas for calculating the percentage of solvent in your crystal, given one of two pieces of information: the molecular weight MW in daltons, or the number of residues nR. The other information needed is Z, the number of equivalent positions in your space group (look it up in the International Tables if unsure); V, the volume of the unit cell in cubic angstroms; and n, the number of molecules in the asymmetric unit. The formula using MW is

$$\%\,\text{protein} = \frac{MW \times Z \times n \times 1.2035 \times 100}{V}$$

and

$$\%\,\text{solvent} = 100 - \%\,\text{protein}.$$

The factor 1.2035 comes from dividing the partial specific volume of a typical protein molecule (converted from cm³ to Å³) by Avogadro's number, and the 100 converts the results to percent. To use the number of residues, we make the approximation that the average protein residue has a volume of 140 Å³:

$$\%\,\text{protein} = \frac{n_R \times 140 \times Z \times n \times 100}{V}$$

and

$$\%\,\text{solvent} = 100 - \%\,\text{protein}.$$

Finally, the general formula for V is

$$V = abc\sqrt{1 - \cos^2\alpha - \cos^2\beta - \cos^2\gamma + 2\cos\alpha\cos\beta\cos\gamma}.$$

¹⁹Wang, B. C. (1985). "Resolution of Phase Ambiguity in Macromolecular Crystallography," in Methods in Enzymology, Vol. 115, pp. 90-112. Academic Press, San Diego.

²⁰The PHASES package is available from its author, Dr. William Furey, University of Pittsburgh. You can also use the CCP4 package and xfit V4-0.
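The percent-solvent arithmetic is easy to script. The following is a minimal sketch that assumes the formulas read as percent protein = MW × Z × n × 1.2035 × 100 / V (or nR × 140 × Z × n × 100 / V) with the general triclinic cell volume; the function names are hypothetical.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic angstroms; cell angles are in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )


def percent_solvent_from_mw(mw, z, n, volume):
    """Percent solvent from the molecular weight in daltons."""
    percent_protein = mw * z * n * 1.2035 * 100.0 / volume
    return 100.0 - percent_protein


def percent_solvent_from_residues(n_res, z, n, volume):
    """Percent solvent assuming 140 cubic angstroms per residue."""
    percent_protein = n_res * 140.0 * z * n * 100.0 / volume
    return 100.0 - percent_protein


# Made-up example: orthorhombic cell, Z = 4, one molecule per asymmetric unit.
v = cell_volume(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
print(percent_solvent_from_mw(20000.0, 4, 1, v))
print(percent_solvent_from_residues(180, 4, 1, v))
```

If the result is to be fed to an automated solvent-flattening program, the value can then be reduced by the 5-10% safety margin suggested earlier.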

Histogram Modification

A commonly used method for two-dimensional image processing is to modify the histogram of the image to match the expected histogram. The histogram in this case is the number of times a particular density value occurs. The densities are grouped into bins, and the number of densities that fall into each bin is recorded. This histogram is then compared to the expected histogram. In the case of a photograph, the expected histogram is computed from a similar image that has the characteristics desired. The histogram of the first image is then modified by multiplying each value in the image by a factor such that the final histogram is the same as the one expected. In the case of a protein, the expected histogram is that of the electron density map of a well-refined protein. It is important that the electron density maps whose histograms are being compared have the same resolution range and similar solvent contents. Tests have shown that it is not important that the proteins have the same fold. The histogram of mostly helical human hemoglobin is identical to that of mostly sheet bovine copper-zinc superoxide dismutase (Fig. 3.30). The presence of a metal is not a problem as long as it represents only a small portion of the total electron density. The method for modifying the histogram is the same as for a two-dimensional image. The electron density values are binned by size, and the densities of each bin are scaled so that the final distribution is the same as that of the well-refined

FIG. 3.29 Phase improvement by solvent flattening. A section of helix and a section of turn in a four-derivative MIR map before and after solvent flattening and with the final refined phases at the same resolution. (A) Helix with starting MIR phases. (B) After solvent flattening. (C) Calculated phases from refined model for comparison. (D) The turn region using MIR phases. (E) After solvent flattening. (F) With refined phases. Note that the contrast of both maps is improved and some details are clearer after solvent flattening. In the turn region a false connection has become overemphasized and could lead to a false assignment. The original MIR map can be used to resolve the ambiguity. Compare the helix section with Fig. 3.20.


standard protein. This map will be easier to interpret, mostly because the contrast will be increased. Histogram modification can also be used as a density modification procedure to refine the protein phases. In this method the histogram-modified map is inverted to produce calculated phases that are combined with the original phases to produce a new phase set. The procedure is then repeated until the phases converge, which happens slowly. In our tests we have used 20-40 cycles to improve MIR phases. The difference between the expected histogram and the one observed can be used as a guide to the progress of the method.
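The idea of forcing one distribution onto another can be sketched in a few lines. Note that this example swaps in rank-based (quantile) matching in place of the bin-by-bin scaling described above, because it is simpler to write down; the function name `match_histogram` is hypothetical and the inputs are flat lists of density values.

```python
def match_histogram(values, reference):
    """Give `values` the distribution of `reference` by rank.

    Each value is replaced by the reference value found at the same
    quantile, so the order of the densities in the map is preserved
    while the histogram takes on the reference shape."""
    n = len(values)
    ref_sorted = sorted(reference)
    m = len(ref_sorted)
    order = sorted(range(n), key=lambda i: values[i])
    matched = [0.0] * n
    for rank, i in enumerate(order):
        # Quantile of this grid point within the working map.
        q = (rank + 0.5) / n
        j = min(int(q * m), m - 1)
        matched[i] = ref_sorted[j]
    return matched


print(match_histogram([3, 1, 2], [10, 20, 30]))
```

As the text stresses, the reference values should come from a map at the same resolution range and with a similar solvent content as the map being modified.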

3.10 MULTIWAVELENGTH DATA WITH ANOMALOUS SCATTERING

Recently a number of structures have been solved by the multiple-wavelength method, which takes advantage of the tunability of synchrotron radiation X-ray sources and the presence of anomalous scatterers that have absorption edges in the wavelength range of 0.6-1.8 Å. For the method to work, there must be such anomalous scatterers in your molecule. Proteins often have these naturally in the form of metal cofactors. The method may also be used to solve a protein with a single derivative that contains an anomalous scatterer; most of the metals that are used for protein derivatives also have usable anomalous signals. In this case, isomorphism is not a consideration, since the structure that is solved is the structure with the scatterer included. If the protein can be produced in E. coli, there is the possibility of substituting methionine with selenomethionine, which gives a

FIG. 3.30 Protein histogram examples from two proteins of radically different structure at 2.0 Å resolution. (A) Superoxide dismutase is an eight-stranded β-barrel. (B) Human hemoglobin (HHB) is an all-helical protein. Both proteins were crystallized in cells with nearly identical solvent contents but with different space groups. Nevertheless, the histograms are nearly identical. The total histogram (solid lines) has been divided into the contributions from the protein (dotted lines) and the solvent (dashed lines). The protein portion contains a long tail of larger density values. These are the values we contour in order to interpret the structure. The negative values are smaller than positive ones but cluster more at an intermediate value. The solvent clusters tightly around 0 but is never completely 0. This is because there is always some weak density in the solvent region due to weak scattering of the solvent molecules, even after accounting for bound water molecules. The electron density maps used for producing the histograms were calculated without including the F000 term. Including F000 would shift the histograms to the positive side by the amplitude of F000 but would not change the shape. (C) and (D) show the effect of resolution on the histograms of copper-zinc superoxide dismutase and hemoglobin, respectively.

[FIG. 3.30, panels A-D: the vertical axis is N(rho) and the horizontal axis is the size of rho, running from -250 to 250. In panels C (SOD) and D (HHB), curves 1 through 8 correspond to 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 7.0, and 10.0 Å resolution, respectively.]

usable anomalous signal. The most complete discussion of the method can be found in Helliwell.²¹ Small blocks of data are collected, with each block being measured at three to four wavelengths. The wavelengths are picked to maximize the differences in the real and imaginary components of the anomalous scattering. This is done by picking wavelengths near the absorption edge of the anomalous scatterer. If possible, it is desirable to collect Bijvoet pairs close together in time. Often this can be arranged by aligning the crystal so that a mirror plane is present on the face of the detector. Otherwise, alternate blocks of data can be collected such that the Bijvoets are collected in alternate blocks. Scaling is complicated by the necessity of keeping these pairs of data together. The phases are then computed by least-squares fitting of the terms to the multiwavelength data. For this to work, the data must be of exceptional quality and accuracy, as the dispersive terms involved are usually small effects, on the order of 3-5%. Unlike the case of isomorphous phasing, the phasing power actually increases at higher resolutions. The scattering factor for the protein atoms falls off rapidly with resolution, while the scattering factor for anomalous dispersion stays almost constant with resolution, so that the percentage difference increases. All data can be measured from a single crystal, which alleviates scaling problems and means that there is no isomorphism problem. The size of the expected differences can be estimated from formulas similar to those given for isomorphous differences (Eq. 3.13). For the Bijvoet differences the equation is

$$\frac{0.77\,\Delta f''\sqrt{N_A}}{\sqrt{MW}}, \tag{3.27}$$

and for the dispersive difference the equation is

$$\frac{0.39\,(f'_{\lambda_1} - f'_{\lambda_2})\sqrt{N_A}}{\sqrt{MW}}, \tag{3.28}$$

where MW is the molecular weight of the protein and N_A is the number of anomalous scatterers.
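A quick estimate of the expected signal helps decide whether a MAD experiment is feasible before beam time is requested. The sketch below assumes Eqs. 3.27 and 3.28 read as 0.77 f'' √N_A / √MW and 0.39 |f'(λ1) − f'(λ2)| √N_A / √MW, with the results interpreted as fractional differences; the function names and the example numbers are hypothetical.

```python
import math


def bijvoet_ratio(f_double_prime, n_anom, mw):
    """Expected fractional Bijvoet difference (assumed form of Eq. 3.27)."""
    return 0.77 * f_double_prime * math.sqrt(n_anom) / math.sqrt(mw)


def dispersive_ratio(f_prime_1, f_prime_2, n_anom, mw):
    """Expected fractional dispersive difference (assumed form of Eq. 3.28)."""
    return 0.39 * abs(f_prime_1 - f_prime_2) * math.sqrt(n_anom) / math.sqrt(mw)


# Illustrative values only: one scatterer with f'' = 4 e in a 10-kDa protein,
# and four scatterers whose f' changes from -10 e to -4 e between wavelengths.
print(bijvoet_ratio(4.0, 1, 10000.0))
print(dispersive_ratio(-10.0, -4.0, 4, 10000.0))
```

Multiply the returned fractions by 100 to compare them with the few-percent effects quoted in the text.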

Choice of Wavelengths

At least three wavelengths are recorded for each reflection. It is necessary to find the absorption edge of the crystal using an X-ray fluorescence

²¹Helliwell, J. R., Macromolecular Crystallography with Synchrotron Radiation, pp. 338-382. Computer programs for this method were written by Wayne Hendrickson, Columbia University, New York.


scan (XAS). The wavelength is scanned through the range expected for the edge, and the X-ray fluorescence is measured with a scintillation counter placed near the crystal. It is not sufficient to look up this value in a table, since the chemical environment of the anomalous scatterer shifts the edge (Fig. 3.31). The spectrum is then separated into the real and imaginary components to find Δf' and Δf'' as a function of wavelength. The wavelengths chosen are at the f'' maximum and the f' minimum; these are found at the absorption peak and edge, respectively (Fig. 3.32). A third, remote, point is

[FIG. 3.31 plots: f' (upper panel) and f'' (lower panel) versus energy (eV) from 11540 to 11580 eV, comparing PtCl4 in solution with PtCl4 bound in the protein crystal.]

FIG. 3.31 XAS scan of platinum edge in bound and free forms: plots of f' and f" of PtCl4 free in solution and bound to the protein. The binding to the protein significantly alters the platinum edge, illustrating the necessity for a fluorescence scan to determine the exact edge for each sample used in MAD phasing.

[FIG. 3.32 plots: (A) f'' and f' versus energy (eV) from 6800 to 8000 eV; (B) f'' plotted against f'.]

FIG. 3.32 Choice of wavelength for MAD phasing of an Fe4S4 protein. (A) Graph of energy versus absorption for a sulfite reductase crystal. The upper line represents the f'' component and the lower line is the f' component. The thick lines are the observed values and the thin lines are fitted. The change in absorption is caused by the anomalous scattering of the Fe4S4 cluster in the protein, which causes a step in f'' and a dip in the f' scattering components. The positions of the three wavelengths chosen for data collection are indicated by the vertical lines. If a fourth wavelength were chosen, it would be before the edge to the left. (Scan done at the Stanford Synchrotron Radiation Laboratories by Brian Crane and Henry Bellamy.) (B) Another representation of the same data, except that f'' is plotted against f'. The three wavelengths should be chosen to maximize the area of the triangle, which is proportional to the phasing power.

chosen to maximize the difference in f', $f'_{\lambda_1} - f'_{\lambda_2}$. In some cases a fourth point before the edge is chosen where f'' is at a minimum. This adds little phasing information but may help by increasing redundancy.
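The selection rules above can be applied to a tabulated fluorescence scan once it has been separated into f' and f''. The following sketch assumes the scan is available as a list of (energy in eV, f', f'') tuples; the function names and the sample values are hypothetical. The triangle area in the (f', f'') plane is included because Fig. 3.32B relates it to the phasing power.

```python
def choose_mad_wavelengths(scan):
    """Pick peak, inflection, and remote points from a processed scan.

    scan: list of (energy_eV, f_prime, f_double_prime) tuples."""
    peak = max(scan, key=lambda p: p[2])        # maximum f''
    inflection = min(scan, key=lambda p: p[1])  # minimum f'
    # Remote point: largest difference in f' from the inflection point.
    remote = max(scan, key=lambda p: abs(p[1] - inflection[1]))
    return {"PK": peak, "PI": inflection, "RE": remote}


def triangle_area(p1, p2, p3):
    """Area of the triangle spanned by three points in the (f', f'') plane."""
    (x1, y1), (x2, y2), (x3, y3) = [(p[1], p[2]) for p in (p1, p2, p3)]
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))


def wavelength_angstrom(energy_ev):
    """Convert energy in eV to wavelength in angstroms (12.398 / E in keV)."""
    return 12398.0 / energy_ev


# Made-up scan values, for illustration only.
scan = [(7100, -4.0, 0.5), (7120, -8.0, 2.0), (7130, -6.0, 4.0), (7500, -1.5, 3.0)]
picked = choose_mad_wavelengths(scan)
for label, point in picked.items():
    print(label, point[0], "eV", round(wavelength_angstrom(point[0]), 4), "A")
print("area", triangle_area(picked["PK"], picked["PI"], picked["RE"]))
```

A real scan is much more finely sampled, and the choice should still be checked by eye against the plotted spectrum.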


Collection of Data

To minimize effects of crystal decay and machine drift, the data for each wavelength should be collected as close together in time as possible. This requires a computer-driven monochromator that can be accurately positioned many times. The monochromator should be calibrated frequently to check for drift by scanning the absorption edge of a piece of metal foil with an edge close to the wavelengths being collected. Ideally, if there is mirror symmetry in the data, Bijvoet pairs should be arranged to occur on the same frame of data by aligning the crystal such that the mirror plane bisects the detector. Alternatively, two short runs can be done, one at a given χ and φ, and the next at −χ and φ + 180°, to collect the Bijvoets close to each other in time. The counts in each peak must be high in order to get precise and well-determined data. The data should be carefully processed. Scaling can be done with local scaling because the expected differences are usually small. Bijvoet pairs are usually kept together in the same run and not averaged across multiple runs. Each pair can be treated separately in subsequent steps. Even for the dispersive difference measurements between two wavelengths, the Bijvoet pairs are needed because the dispersive difference is the difference between pairs at different wavelengths,

$$\frac{{}^{\lambda_1}F^{+} + {}^{\lambda_1}F^{-}}{2} - \frac{{}^{\lambda_2}F^{+} + {}^{\lambda_2}F^{-}}{2}, \tag{3.29}$$

and all four terms are needed to measure the difference, so data completeness is important.

Location of Anomalous Scatterers

The scatterers can be located with ΔF Pattersons, as is standard for heavy-atom work. A better method is to use all the wavelengths at once in a least-squares procedure to find the best value of FA, the anomalous scatterer(s) structure factor.²² In this way the method is different from most because quite accurate values of FA can be found directly from the data rather than by approximations as in the isomorphous acentric case. Pattersons made with these values should be much cleaner and thus easier to interpret. If the Patterson is too complicated to solve, the multiwavelength-fit FA values could also provide the basis set for the direct methods commonly used for small molecules. The positions of the anomalous scatterers are refined

²²The least-squares procedure is described in detail in Hendrickson, W. A., et al. (1988). Proteins: Struct. Funct. Genet. 4, 77-88.

against the FA values, as is done for isomorphous heavy atoms. If the Patterson is clean, this refinement should be straightforward.

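Phasing of Data

The phase equations for MAD phasing are

$$|{}^{\lambda}F(\pm h)|^2 = |{}^{0}F_T|^2 + a(\lambda)\,|{}^{0}F_A|^2 + b(\lambda)\,|{}^{0}F_T|\,|{}^{0}F_A| \cos({}^{0}\phi_T - {}^{0}\phi_A) + s(h)\,c(\lambda)\,|{}^{0}F_T|\,|{}^{0}F_A| \sin({}^{0}\phi_T - {}^{0}\phi_A), \tag{3.30}$$

with coefficients a, b, c defined as follows:

$$a(\lambda) = \frac{f'(\lambda)^2 + f''(\lambda)^2}{f_0^2}, \qquad b(\lambda) = \frac{2 f'(\lambda)}{f_0}, \qquad c(\lambda) = \frac{2 f''(\lambda)}{f_0}. \tag{3.31}$$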
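To get a feel for the sizes of the terms, the equations can be evaluated numerically. This sketch assumes Eq. 3.30 has the form |F(±h)|² = |F_T|² + a|F_A|² + b|F_T||F_A| cos Δφ ± c|F_T||F_A| sin Δφ, with a, b, c as in Eq. 3.31; the function names are hypothetical, and the sign argument plays the role of s(h).

```python
import math


def mad_coefficients(f_prime, f_double_prime, f0):
    """Wavelength-dependent coefficients a, b, c of Eq. 3.31."""
    a = (f_prime ** 2 + f_double_prime ** 2) / f0 ** 2
    b = 2.0 * f_prime / f0
    c = 2.0 * f_double_prime / f0
    return a, b, c


def mad_intensity(ft, fa, delta_phi_deg, f_prime, f_double_prime, f0, sign):
    """|F(+h)|^2 (sign = +1) or |F(-h)|^2 (sign = -1) at one wavelength.

    ft and fa are the wavelength-independent amplitudes of the total
    structure and of the anomalous scatterers; delta_phi_deg is the
    phase difference between them in degrees."""
    a, b, c = mad_coefficients(f_prime, f_double_prime, f0)
    dphi = math.radians(delta_phi_deg)
    return (ft ** 2 + a * fa ** 2
            + b * ft * fa * math.cos(dphi)
            + sign * c * ft * fa * math.sin(dphi))


# Illustrative numbers only.
plus = mad_intensity(100.0, 10.0, 90.0, -5.0, 5.0, 10.0, +1)
minus = mad_intensity(100.0, 10.0, 90.0, -5.0, 5.0, 10.0, -1)
print(plus, minus, plus - minus)
```

The difference between the two Bijvoet mates comes entirely from the sine term, which is why the f'' signal vanishes when the phase difference is 0 or 180°.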

The subscript T designates the total structure, and the subscript A designates terms from the anomalous scatterers alone. The superscript 0 means the wavelength-independent value. The factor s(h) is a sign and is positive for F+ and negative for F−. Once the locations of the anomalous scatterers are known, the phase $\phi_A$ can be calculated from these positions. The protein phase is then given by $\phi_T = \Delta\phi + \phi_A$, where $\Delta\phi = (\phi_T - \phi_A)$ is from the multiwavelength fitting procedure. Equations have been derived by Hendrickson and co-workers²³ for calculating Hendrickson-Lattman coefficients $A_{MAD}$, $B_{MAD}$, $C_{MAD}$, $D_{MAD}$ so that the phases can be combined with other phase sources such as an isomorphous derivative, if so desired. Of course, the correct hand must be used, as for other anomalous dispersion techniques. This is done by trying both possibilities and examining the two maps. One should have clear solvent channels and the other should be less clear. Alternatively, the correct map could be selected by comparing the electron density histograms (see above). The other terms needed for the equations are the f'' and f' values. The values can be extracted from the X-ray fluorescence data. The values are often anisotropic along different crystal directions, which introduces a small error if a single value is assumed. Another method is to refine these values

²³Pahler, A., et al. (1990). Acta Crystallogr. A46, 537-540.

and treat them as variables. Weis et al.²⁴ have reported a method for doing this in the solution of the calcium-dependent lectin domain from a rat mannose-binding protein by MAD phasing using the L_III edge of holmium. The use of holmium gave large diffraction differences between Bijvoets and between different wavelengths, ranging from 10% to 27%, comparable to multiple isomorphous replacement, and the maps produced were of excellent quality.

MAD as a Special Case of MIR

An alternative formulation of the MAD phasing problem treats MAD as a special case of MIR. In this formalism, the anomalous pair data at each wavelength are considered equivalent to native anomalous scattering where differences are due to the imaginary, f'', component. This is identical to the treatment of the Friedel pairs in the SAS (single anomalous scatterer) case of MIR, where the difference in F+ and F− for the acentric reflections is used for phasing. The change in f' between wavelengths, Δf' = f'_1 − f'_2, can be considered equivalent to an isomorphous difference. Although this change in f', typically 3-10 e−, is small compared to 80 e− for a single-site mercury derivative, there are two major advantages. First, isomorphism is guaranteed, and second, all the measurements can be made from a single crystal in an identical manner, minimizing errors in the measurements. Furthermore, since the signal from f'' is orthogonal to the signal from Δf', even though the data are from the same scattering center, the two signals provide completely independent phasing information, and when they are combined the phase ambiguity is completely eliminated (assuming perfect data). For centric reflections there is no f'' signal; however, as in the isomorphous case, the Δf' signal unambiguously defines the phase as either plus or minus. When considering pairs of wavelengths it is important to treat the wavelength with the minimum in f' as the "native" and the other wavelength with the larger f' as the "derivative."
This will keep things consistent with the derivation of the isomorphous phasing equations. Otherwise the calculated phase will be opposite in sign to that of the native protein. To simplify the discussion, we will introduce some notation to identify the wavelengths used in a standard three-wavelength MAD experiment (Fig. 3.32B). The point of inflection from the XAS scan will be called PI and is the minimum in the f' signal. The maximum in the XAS scan, also called the white line, is called PK for peak. The wavelength above the edge²⁵ will be

²⁴Weis, W., et al. (1991). Science 254, 1608-1615.

²⁵"Above" in this case means higher energy. Since wavelength is inversely proportional to energy, higher energies have shorter wavelengths. The relationship is λ = 12.398/E, where λ is the wavelength in Å and E is the energy in kiloelectron-volts.

called RE for remote. If more wavelengths are collected above or below the edge they can be labeled RE2, RE3, and so forth. Collecting wavelengths below the edge is the least useful because they have no anomalous signal, as f'' is 0 below the edge. Thus if a fourth wavelength is collected it is probably best collected above the edge. It can be advantageous to collect a remote wavelength far above the edge at 1.0 Å for edges that are at long wavelengths (>1.3 Å), since absorption is minimized and this can provide a good reference wavelength for scaling out the errors due to X-ray absorption, as well as a good "native" data set for refinement. The data sets (i.e., .fin files) for the heavy-atom phasing program are thus:

PI native anomalous
PK native anomalous
RE native anomalous
PI-PK isomorphous (PI merged with PK)
PI-RE isomorphous (PI merged with RE)
PK-RE isomorphous (PK merged with RE)

In this notation, PK will have the maximal size of f'' and provides the largest anomalous signal. PI-RE will have the largest dispersive difference, or Δf'. To check the usefulness of each of the six possible data sets, the Bijvoet difference Patterson map can be checked for each wavelength and the isomorphous difference Patterson map for each wavelength pair. If the Patterson map is noisy and the peaks due to the heavy atom are small (<3σ), the data set will contribute little to the phasing and probably should be left out. To prevent overestimating the figure of merit, each data set should be weighted 1/3, since PI, PK, and RE all have the same phasing information and PI-PK, PI-RE, and PK-RE all have the same phasing information. To locate the heavy-atom sites, the FA coefficients described in the preceding section can be used in a Patterson map or as coefficients in a heavy-atom position finding program such as SHELXS, Shake-and-Bake, or xhercules.
However, usually the Patterson map of the PK or the PI-RE pair will be sufficient for solving the heavy-atom positions, especially when an endogenous metal atom is being used for the phasing. Another good map to use is the isomorphous + anomalous Patterson map, where the map coefficients are $\sqrt{\Delta_{iso}^2 + \Delta_{ano}^2}$ for the PI-RE pair, which has the largest dispersive difference together with the next largest anomalous signal. For difficult cases, such as multiple selenomethionine sites, the FA coefficients can be given to an ab initio phasing program such as Shake-and-Bake to solve the heavy-atom positions. As in the MIR case, it is necessary to have the heavy-atom positions to calculate the protein phases. However, also as in the MIR case, a partial solution with, say, half the sites may be good enough to calculate

protein phases with enough accuracy to make difference Fouriers: either Bijvoet difference Fouriers of PK or isomorphous difference Fouriers of PI-RE, which can be used to find the additional sites. You can also make (FA, φ_MIR) Fourier maps to find the sites. Finally, you may have some other phasing information, such as an SIR data set from a heavy-atom derivative, which can be used to make difference or FA Fouriers to find the MAD sites. One real advantage of the MAD as MIR method is the ease of treating the case of multiple types of anomalous scatterer in a single crystal. In this case you have two metals that have different edges, and the metal that has the edge at higher energy will have some of the f'' signal from the other metal mixed in. The amount of f' mixed in will be much smaller, since this signal dies off more rapidly as the energy is increased (see Fig. 3.32A). Since each data set is treated separately and independently refined in the MAD as MIR method, this causes no difficulties in the phasing as it does in the Hendrickson MAD method, where a single solution is used for the phasing calculation. In the MAD as MIR method, the solution can be different for each wavelength. In practice, for Fe MAD experiments multiple anomalous scatterers are always present because sulfur has a small but measurable signal at the Fe edge. This shows up in the anomalous data sets between Bijvoet pairs, but not in the dispersive differences between wavelengths. To find the sulfurs, first phase using just the Fe positions. Use these phases to make Bijvoet difference Fouriers of the PK and/or RE data sets at around 4 Å resolution. The sulfurs should show up as peaks about 1/6 as large as the Fe peak(s). These can then be treated as minor sites and refined as usual in the MIR method. In fact, it is worth searching for minor sites in all the data sets; it is not necessary to know their source, just as in the MIR case.
Any large peaks (say above 5σ) in the difference Fouriers can be added to the solution for the data set and refined as minor sites. Besides sulfur atoms in methionines and cysteines, it is not uncommon for protein crystals to have bound Zn²⁺ or other cations, which can show up in the MAD data. All the sites can be treated as any convenient atom, such as Pt, and the heavy-atom refinement program will refine the occupancy and B appropriately. Of course, after you have solved the structure you should go back and find the source of these peaks. The sulfur Bijvoet difference signal is especially useful for verifying the chain trace, as these peaks are markers for cysteine and methionine. Since the Patterson maps are ambiguous as to the absolute configuration (or hand) of the heavy-atom positions, it is necessary to try both configurations. The other configuration can be calculated by inverting each position through the center by changing the sign of each coordinate (x, y, z → −x, −y, −z) and recalculating the phases. One map will show protein and the other nonsense (see Evaluating Map Quality, in Sec. 3.12). The MAD as MIR method can be adapted to all the heavy-atom phasing programs in common use such as PHASES, XtalView/xheavy, and

CCP4/MLPHARE and SHARP. A variation used in Terwilliger's SOLVE program is to merge first the Bijvoet pairs and then the dispersive differences into two data sets representing f'' and f', respectively, and then phase the problem as SIRAS.²⁶ SOLVE also completely automates the process in favorable cases, starting with just the raw data.

Multiple MAD (MMAD) Phasing

Using the MAD as MIR method it is easy to merge information from two or more MAD experiments on the same protein. For instance, you may have a native anomalous scatterer such as Fe and a heavy-atom derivative such as Hg. In this case you can collect MAD data at both the Fe and Hg edges and combine this phasing information by including all 12 data sets, six from each three-wavelength MAD experiment (PI_Fe, PK_Fe, RE_Fe, PI_Fe − PK_Fe, PI_Fe − RE_Fe, PK_Fe − RE_Fe, PI_Hg, PK_Hg, RE_Hg, PI_Hg − PK_Hg, PI_Hg − RE_Hg, PK_Hg − RE_Hg), as well as the F_P − F_PH isomorphous data set (RE_Fe − RE_Hg), in one large run of the heavy-atom phasing program. Using MMAD you maximize the amount of phasing information you can obtain compared with the conventional MIR experiment. Of course, you have to have access to a suitable beam line at a synchrotron radiation source for MMAD, whereas MIR can be done in your lab using a conventional X-ray source.
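The bookkeeping for the MAD as MIR data sets is simple enough to generate automatically. The sketch below is only an illustration: the function name, the dictionary layout, and the f' values are hypothetical. It lists the three anomalous and three isomorphous data sets for one three-wavelength experiment, orders each pair so that the wavelength with the lower f' acts as the "native," and attaches the 1/3 weight suggested in the text.

```python
from itertools import combinations


def mad_as_mir_datasets(f_prime_by_label, weight=1.0 / 3.0):
    """List (name, kind, weight) for a three-wavelength MAD experiment.

    f_prime_by_label: dict such as {"PI": -9.0, "PK": -6.5, "RE": -2.0}."""
    labels = list(f_prime_by_label)
    datasets = [(lab, "native anomalous", weight) for lab in labels]
    for first, second in combinations(labels, 2):
        # The wavelength with the smaller f' plays the role of the native.
        native, derivative = sorted(
            (first, second), key=lambda lab: f_prime_by_label[lab]
        )
        datasets.append((native + "-" + derivative, "isomorphous", weight))
    return datasets


fe = mad_as_mir_datasets({"PI": -9.0, "PK": -6.5, "RE": -2.0})
hg = mad_as_mir_datasets({"PI": -18.0, "PK": -12.0, "RE": -8.0})
for name, kind, weight in fe:
    print(name, kind, round(weight, 3))
print(len(fe) + len(hg), "data sets from two MAD experiments")
```

For an MMAD run the two lists would be combined, with the element added to each label and the cross-crystal isomorphous data set appended by hand.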

3.11 REFINEMENT OF COORDINATES

Available Software

The three most commonly quoted refinement programs are PROLSQ, XPLOR, and TNT. All three have been used with success, as have several others. PROLSQ was written originally by Hendrickson and Konnert and is a reciprocal-space, least-squares refinement program. TNT is a similar program from Ten Eyck and Tronrud that uses a fast Fourier transform and is faster than PROLSQ (although FFT versions of PROLSQ are available). XPLOR, written by Axel Brünger, is a very versatile program already mentioned in the molecular replacement section. It can do conventional least-squares refinement as well as simulated annealing refinement, in which the protein is simultaneously subjected to molecular dynamics and crystallographic refinement to increase the radius of convergence.

26 Terwilliger, T. C. (1997). "Multiwavelength Anomalous Diffraction Phasing of Macromolecular Structures: Analysis of MAD Data as Single Isomorphous Replacement with Anomalous Scattering Data Using the MADMRG Program," in Methods in Enzymology, Vol. 276, pp. 530-537. Academic Press, San Diego.

Rigid-Body Refinement

Rigid-body refinement is the process of refining the positions of rigid groups of atoms against the observed amplitudes. The protein model is divided into one or more groups. For each group it is possible to refine up to six parameters, three rotations and three translations. The shifts in position for each group are calculated by combining the shifts for all the atoms in the group. If large shifts are desired, rigid-body refinement should start in the lowest resolution range that still contains enough reflections. The maximum size of a shift is limited to less than the smallest d-spacing included in the refinement. If too high a resolution is used, it is likely that the groups will be caught in a local minimum. On the other hand, the resolution must be high enough to ensure that the refinement is overdetermined by a large margin. The degree of overdetermination is calculated by counting six parameters for each group and then dividing the number of reflections in the resolution range being considered by the total number of parameters. In practice, this ratio should be greater than approximately 10. The main trouble with rigid-body refinement by least squares occurs when a local minimum is found before the correct minimum. Otherwise, rigid-body refinement is a robust technique because it is possible to overdetermine it greatly. Rigid-body refinement should be the first step for molecular replacement models.
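The overdetermination check described above is simple arithmetic, and a minimal Python sketch may make it concrete. The function name and the example reflection counts are hypothetical and chosen only for illustration; the six parameters per group and the threshold of about 10 come from the text.

```python
def rigid_body_overdetermination(n_reflections, n_groups, params_per_group=6):
    """Ratio of reflections to refined rigid-body parameters.

    Each rigid group contributes up to six parameters
    (three rotations and three translations).
    """
    if n_groups <= 0 or params_per_group <= 0:
        raise ValueError("need at least one group and one parameter per group")
    return n_reflections / (n_groups * params_per_group)


# Hypothetical example: 900 reflections in the chosen resolution range,
# model divided into 3 rigid groups (18 parameters in total).
ratio = rigid_body_overdetermination(n_reflections=900, n_groups=3)
well_overdetermined = ratio > 10  # rule of thumb quoted in the text
print(ratio, well_overdetermined)
```

If the ratio falls below about 10, the sketch suggests either extending the resolution range to include more reflections or using fewer rigid groups.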

R-Factor or Correlation Search for Rigid Groups

A method related to rigid-body refinement by least squares is the R-factor or correlation search. The protein is divided into one or more rigid groups that are moved as a whole. These groups are then moved over a large range, such as the entire asymmetric unit, on a grid of about one-third the minimum resolution, and at each point either an R-factor or a correlation is calculated. The correlation has the advantage that it is independent of the scale factor, so it is more accurate if the model is incomplete. The obvious advantage of this method is that it can find the lowest minimum for the model because it checks all positions and does not get trapped in local minima. It is usually possible to do a three-parameter search over either the three translation axes or the three rotation axes; but for a full six-parameter search, where all rotations are tried at every translation, the amount of computer time needed becomes impractical. If packing is taken into consideration, then some of the positions will be clearly impossible because of overlaps in the model, and these positions may be skipped. It may then become feasible to try all the remaining positions. Since a rotation solution usually is known, instead of trying all six parameters, all the translations are tried, and the rotations are done over a small range at each translation to adjust for small angular errors in the rotation solution.

The other limiting factor, and a far more serious one, is inaccuracy in the search model. If you already knew the correct model, you would be unlikely to be doing the search. Inaccuracy in your model can make the correct minimum only slightly lower than other minima, or perhaps not even the lowest one. To get around this, it is often desirable to examine the lowest 10 or so minima, especially if no minimum is clearly lower than the rest. At each minimum further refinement may be tried to see whether there is a clear winner. If the model is known to be close to the correct one, then even if a huge amount of computer time is needed, it is clearly worth doing a large search to find the correct position.
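To illustrate why the correlation is independent of the scale factor while the R-factor is not, here is a small Python sketch. The amplitudes are made-up numbers, and the function names are hypothetical; the R-factor is written in the usual form, the sum of |Fo - Fc| divided by the sum of Fo, and the correlation is the ordinary linear correlation coefficient.

```python
import math


def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum|Fo - Fc| / sum(Fo)."""
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def correlation(f_obs, f_calc):
    """Linear correlation coefficient between Fo and Fc."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((fo - mean_o) * (fc - mean_c) for fo, fc in zip(f_obs, f_calc))
    var_o = sum((fo - mean_o) ** 2 for fo in f_obs)
    var_c = sum((fc - mean_c) ** 2 for fc in f_calc)
    return cov / math.sqrt(var_o * var_c)


# Made-up amplitudes for five reflections.
f_obs = [120.0, 45.0, 80.0, 15.0, 60.0]
f_calc = [110.0, 50.0, 70.0, 20.0, 65.0]
# The same model with a badly wrong scale factor applied.
f_calc_misscaled = [2.0 * fc for fc in f_calc]

print(r_factor(f_obs, f_calc), r_factor(f_obs, f_calc_misscaled))
print(correlation(f_obs, f_calc), correlation(f_obs, f_calc_misscaled))
```

The R-factor changes sharply when the calculated amplitudes are misscaled, whereas the correlation is unchanged, which is the property that makes it attractive when the model is incomplete and the scale is uncertain.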

Protein Refinement

Unless one is working at very high resolution, it is necessary to use stereochemically restrained refinement of proteins. This is because the ratio of observations (Fo) to refinable parameters (x, y, z, B) is below 1.0 and the problem is underdetermined (see Table 3.1). Even at high resolution, the errors in the data due to the relatively weak scattering of protein crystals, especially at higher resolutions, make restrained refinement a good idea. Different programs have different schemes for incorporating the stereochemical information, but for ease of conception we will use the energy model to explain the process. The different terms can be thought of as energy terms in an equation combining all the information:

$$
E_{\text{total}} = w_A \sum E_{\text{crystallographic}} + w_B \sum E_{\text{bond distances}} + w_C \sum E_{\text{bond angles}} + w_D \sum E_{\text{torsion angles}} + w_E \sum E_{\text{nonbonded contacts}} + w_F \sum E_{\text{planar groups}} + w_G \sum E_{\text{chiral volumes}}, \tag{3.32}
$$

where the w terms are used for the relative weighting of the different terms. The energy for the crystallographic term comes from the difference between Fo and Fc, and the energy for the stereochemical terms is evaluated by considering the difference between the actual value and the ideal value in such a way that, as the atoms deviate more from ideal geometry, the energy increases. The ideal values have been tabulated for proteins by examining highly accurate small-molecule crystallographic structures of amino acids, peptides, and other similar compounds. The refinement program is set up to minimize the overall energy by calculating the shifts in coordinates that will give the lowest energy. The equations for the derivatives necessary for this minimization are nonlinear. This means that the equations cannot be solved exactly but are, instead, approximated, and a nonlinear least-squares procedure is used to minimize the total energy. For this, the equations are evaluated and approximate shifts are calculated that should bring the total energy lower. The equations are then reevaluated and more shifts applied. If the energy is lower in each cycle, then the process will eventually converge. However, because the nonlinear equations are only approximated, the point of convergence may be a local minimum.

Some programs, such as XPLOR, also add an electrostatic term for the attraction of charged groups. Since the starting models for refinement are generally inaccurate, the use of the electrostatic term is dubious, and it should probably be turned off or at least downweighted. It is also common to reduce the nonbonded weights at the beginning of refinement to allow the groups to move large distances while "slipping" past each other. Proper weighting is crucial to balance the stereochemical restraints against the crystallographic shifts. Otherwise, a low R-factor could be reached at the expense of distorted geometry, trapping the refinement in an incorrect minimum. Weighting the stereochemistry too high will slow the refinement and may even freeze it by not allowing positional shifts.

Isotropic temperature factors, or B-values, are also restrained during refinement (see discussion of B-values in Section 3.1). Atoms that are bonded to each other influence each other's motion, so that if one atom is undergoing large displacements, atoms next to it must also be undergoing large displacements. B-values are restrained in such a manner that the average difference in B-values of bonded atoms is kept to a target value. That is, the B-values should vary smoothly along the protein chain and within a side chain.
The usual target restraint for adjacent bonded main chain atoms is Δ = (B1 - B2) = 1.0, where B1 and B2 are the B-values for two bonded atoms, and for side chains the target Δ value is raised to 1.5 to account for the higher gradient of side chains due to their having one end free. In a similar manner, the B-values can be restrained for the first and third atoms of a bond angle (the 1-3 pairs). For main chain angles, the target Δ is usually 1.5 and for side chain angles Δ is 2.0.
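As a rough illustration of how such a restraint behaves, the Python sketch below scores bonded pairs by the squared B-value difference divided by the squared target Δ. This functional form is an assumption made for illustration, not the expression used by any particular program; the function name and the B-values are hypothetical, while the targets 1.0 (main chain) and 1.5 (side chain) come from the text.

```python
def b_restraint_penalty(bonded_pairs, target_delta):
    """Sum of ((B1 - B2) / target)^2 over bonded atom pairs.

    Assumed illustrative form: pairs whose B-values differ by about the
    target contribute roughly 1; smoother pairs contribute less.
    """
    return sum(((b1 - b2) / target_delta) ** 2 for b1, b2 in bonded_pairs)


# Hypothetical B-values (in square angstroms) for bonded pairs.
main_chain_pairs = [(12.0, 12.5), (12.5, 13.5), (13.5, 13.0)]
side_chain_pairs = [(14.0, 15.5), (15.5, 18.5)]

main_chain_penalty = b_restraint_penalty(main_chain_pairs, target_delta=1.0)
side_chain_penalty = b_restraint_penalty(side_chain_pairs, target_delta=1.5)
print(main_chain_penalty, side_chain_penalty)
```

In this sketch the jump from 15.5 to 18.5 along the side chain dominates the penalty, which is the kind of abrupt change the restraint is meant to discourage.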

Strategies

Unfortunately, a well-refined structure cannot be achieved by simply running a refinement with the highest resolution data until it converges. The refinement is almost certain to hang up in a local minimum. Since the starting model may be fairly far from the final model, start the refinement at a medium resolution and gradually add data by increasing the resolution. Refine at each stage until the R-value converges. Start with a single B-factor for the entire model. Keep adding data if the R-value is reasonable. If the R-value is high, then a round of fitting is needed to fix errors so that refinement can continue. Use difference maps to find these errors and carefully examine them with omit maps, omitting the residues in question and their near neighbors. If you have MIR phases, continue to use them throughout the refinement process in evaluating possible fittings. After refitting, the R-value will be higher, but on further refinement it should drop to a value lower than before.

At about 2.5- to 2.1-Å resolution there are enough data to allow individual B-factors to be refined. Examine the B-values carefully to detect portions of the model with errors. If the B-values are either very low (<2) or very high (>35), there may be an error in this region of the model. The program should restrain B-values among neighboring atoms. In a well-refined structure, these will vary smoothly from atom to atom. Deviations from this pattern may give clues to possible errors.

At 2.0 Å with the R-value below 0.25 it is worthwhile to start adding water molecules. One common method is to use a difference map and put waters in all the peaks in the difference map that do not collide with existing atoms. This practice is dangerous because it models the error with water molecules and can lead to serious phase bias problems. While the R-value will drop as a result of the presence of more parameters, this will not help in the long run if the waters are in noise peaks. Instead, examine the neighborhood of each proposed water peak and see if it has reasonable geometry for water. Is it in a crevice on the surface of the protein? Are there hydrogen-bonding partners at reasonable distances? If there really is a water molecule in the peak under consideration, then adding it to the model will be beneficial, so it is worth being certain that the peak represents a water and not spurious noise. More waters will become apparent as the refinement continues. Weak density will often become stronger, and a water can always be added later when it is justified by the density.

At some point the R-value will converge and difference maps will not reveal any features that indicate either refitting or solvent. If the model is well refined, then a plot of R-value versus amplitude should reveal good agreement for the strong, well-measured data and the poorest agreement for the weak, poorly measured data. Check a phi-psi plot for agreement with allowed values. There should be no, or only a few, outliers on this plot. Carefully check any outliers for errors.
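The B-value screen suggested above is easy to automate. The following Python sketch flags atoms whose B-values fall outside the 2 to 35 range quoted in the text; the function name, the atom labels, and the B-values are hypothetical, and a flagged atom is only a candidate for inspection, not proof of an error.

```python
def flag_b_outliers(atoms, low=2.0, high=35.0):
    """Return (label, B) for atoms with B below `low` or above `high`."""
    return [(label, b) for label, b in atoms if b < low or b > high]


# Hypothetical atoms as (label, B-value) pairs.
atoms = [
    ("ALA 12 CA", 14.2),
    ("LYS 45 NZ", 52.7),
    ("CYS 8 SG", 1.3),
    ("GLY 30 N", 22.0),
]

suspects = flag_b_outliers(atoms)
print(suspects)
```

Each flagged region would then be examined in difference and omit maps before deciding whether to refit.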

Evaluating Errors

Stereochemistry

The examination of high-resolution, accurate crystallographic structures of peptides has provided the proper stereochemistry of protein groups. These ideal bond distances, bond angles, dihedral angles, and van der Waals contact distances can then be compared to a protein model to calculate its deviation from ideal stereochemistry. The average protein model does not approach ideal stereochemistry nearly as well as the average small-molecule structure. First, the resolution of the maps is not as great. Even so, it should be possible to refine the protein structure to produce the correct stereochemistry. The interdependency of all the linked groups in the protein then becomes apparent. If you push on one bond, it pulls on several others; all the parameters are interdependent. A second, more subtle, reason may be that proteins are not static structures even when constrained by crystal contacts. The crystallographic map is a time average of all structures in the crystal, and if the protein is in motion, the average may not be a true structure. Many cases are known of multiple positions of side chains, and sometimes of small loops of protein, where the multiple positions can be seen in the electron density map. Many cases of multiple positions cannot be distinguished in the electron density because they are too subtle. They may show up as small errors in the stereochemistry. In any case, the amount of error expected in a protein model can be estimated from past experience with refinement. The errors in bond lengths should be approximately 0.03-0.05 Å, and the error in bond angles about 3° on average.

Phi-psi (also known as Ramachandran) plots are especially good indicators of the accuracy of the protein model. The main chain dihedral angles in a protein can take on only a limited range of values. Glycines, not having a side chain, are the exception. The phi-psi values of the other residues should all fall within the allowed values (Fig. 3.33). If there are a considerable number of outliers, then the structure is probably in need of improvement.

R-Factors

The R-factor, or the agreement between the observed and calculated amplitudes, is given by Eq. 3.2. Examination of the equation shows that as the agreement increases, this number becomes smaller until it reaches 0.0. This number can be taken as a guide to the progress of your model; it gives an estimate of the accuracy of the model. The R-factor must be taken with a grain of salt and not used as the sole criterion for accuracy. It is not possible to tell exactly how good your model is from this one number. The number is also dependent upon the completeness of the data, the resolution limits used, and the accuracy of the observed data.

This said, I will attempt to give some tentative guidelines for what the R-factor should do as the refinement progresses. Assume you have just fit an MIR map or solved a molecular replacement problem and that you are refining, starting at 10- to 3-Å resolution with an overall B-factor. At the start of the refinement, the R-factor is usually around 0.40-0.45. This should fall rapidly as refinement progresses and then slow down until it stops falling at 0.25-0.35, depending on the quality of the starting model. Usually, the model must then be manually adjusted by examining difference maps to indicate regions that need adjusting. After refitting, the model is refined again, and the R-factor will then slowly improve. As the R-factor improves, the resolution is increased. When the resolution is around 2.1 Å and the R-factor is below 0.25, it is time to refine individual B's for each atom. This will immediately improve the R-factor. The model is then refit and refined until the R-factor converges again. Waters are then added to the model and refinement continues. If the resolution is better than 2.0 Å and the R-factor is below 0.20, the structure is probably essentially correct. Well-refined structures can have R-factors less than 0.17. An R-factor below 0.12 is exceptional. And just to make you feel inadequate as a crystallographer, it should be noted that small molecules routinely refine to R-factors between 0.03 and 0.05. Why proteins cannot be refined to such low values is a topic of considerable debate.

It is also helpful to look at the R-factor as a function of resolution and amplitude. The R-factor should start out high at very low resolution, around 10 Å, where the solvent dominates the structure factors, then be lowest between 5 and 2.5 Å, and rise steadily as the resolution increases further. The last shell of resolution often has an R-value in the high twenties (percent). In a plot of R-factor versus amplitude, the R-factor should decrease steadily as the amplitudes increase. For the weakest data, the R-factor will be very high, say around 0.40, and for the strongest data it is usually below 0.10. If there is no correlation of R-factor with amplitude, or the strongest data do not have the lowest R-factor, there are probably still some errors in the model.
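A plot of R-factor versus amplitude can be produced by sorting the reflections by Fo, dividing them into bins, and computing the R-factor in each bin. The Python sketch below assumes the usual definition, the sum of |Fo - Fc| divided by the sum of Fo; the function names and the six (Fo, Fc) pairs are hypothetical and serve only to show the expected trend.

```python
def r_factor(pairs):
    """R-factor for a list of (Fo, Fc) pairs: sum|Fo - Fc| / sum(Fo)."""
    return sum(abs(fo - fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)


def r_by_amplitude(pairs, n_bins):
    """R-factor in bins of increasing Fo (equal counts; last bin takes the remainder)."""
    if not 0 < n_bins <= len(pairs):
        raise ValueError("n_bins must be between 1 and the number of reflections")
    ordered = sorted(pairs, key=lambda p: p[0])
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(ordered[(n_bins - 1) * size:])
    return [r_factor(b) for b in bins]


# Made-up (Fo, Fc) pairs, from weak to strong reflections.
pairs = [(10.0, 14.0), (12.0, 8.0), (50.0, 45.0),
         (60.0, 66.0), (200.0, 190.0), (250.0, 260.0)]

r_bins = r_by_amplitude(pairs, n_bins=3)
print(r_bins)
```

For these made-up data the R-factor falls from the weakest bin to the strongest, which is the behavior expected of a well-refined model. The same binning applied to d-spacing instead of Fo gives the R-factor as a function of resolution.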
After B-value refinement has been started, analyzing these values can provide an important clue to incorrect parts of the structure. Places with very low B-values (<2) are probably in need of attention, as are regions with very high B-values (>35). You might wonder how a B-value can go below 2, since such low values are probably meaningless. An extremely low B likely arises from an atom that finds itself in density that is greater than can be accounted for by its scattering power. This can happen if the atom is in the density of a larger atom or if phase errors cause noise that makes the density at that atom higher than it should be. This also points out the problem with relying too heavily on the B-value to evaluate errors, since inaccurate phases due to errors elsewhere in the structure may be the actual cause of the bad B-value and not necessarily a displacement of the atom in question.

FIG. 3.33 Phi-psi (Ramachandran) plots for mostly helical protein (A) and mostly sheet protein (B). Nonglycines are shown as plus signs. Glycine residues are shown as solid circles and, having only a hydrogen for a side chain, can be found outside the allowed regions shown by the dashed lines. (C) Plot for a "poor" structure fit to an MIR map but not completely refined. While most residues are in allowed regions, there are many in forbidden regions that need to be tweaked before the final model. (D) Plot for a completely incorrect structure: the points fall everywhere on the graph. (E) Plot resulting when the structure in (D) was corrected.

Cross-Validation with R-Free

A statistical parameter, R-free, can be used to validate the accuracy of refinements and to justify adding various parameters. The idea is to leave a small portion of the data (5-10%) in a separate pool that is never refined against but is instead used to calculate R-free; otherwise, calculations are done in the same manner used for the R-factor. It is very important that the data be split into the working set (95%) and the free set (5%) at the very beginning of the structure solution. The split should be done right after data collection, before any refinement has occurred. Usually, this is done by flagging the data in the file in some manner, but it can also be done by splitting the files. It is important to have an even distribution of reflections in the free set with regard to resolution, intensity, and centricity. All the data, working and free, should be used in the calculation of maps to avoid ripples due to the omission of terms.

As the refinement proceeds, the R-free should drop; it will lag behind the R-factor and always be higher (usually by 5-15%). However, the gap between the R-factor and the R-free is less important than whether R-free drops for a given operation. For example, if you refit a loop in the structure and R-free goes up after the new model has been refined, the new loop conformation is likely to be wrong. Typically, the presence of noncrystallographic symmetry also lowers R-free, because the extra symmetry relates reflection intensities in the working set to those in the free set. As the next section demonstrates, R-free can be used to validate the addition of more parameters to the model. For instance, if you turn on anisotropic scale factors or add a solvent model for the intensity data, the R-free should drop.

If new data are collected, the R-free flags should be transferred to the new data for the reflections with the same indices. Not doing so will artificially lower your R-free and invalidate the principle of independence between the two data sets. This is most easily done using CCP4 by merging the R-free flag column into the new data file.
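The bookkeeping behind the free set can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure of CCP4 or any other package: the function names are hypothetical, the free reflections are chosen by simple random sampling (which only approximates an even distribution in resolution, intensity, and centricity), and the handling of reflections that appear only in the new data set is an assumption.

```python
import random


def assign_free_flags(indices, fraction=0.05, seed=0):
    """Flag a random `fraction` of reflections (h, k, l) as the free set."""
    rng = random.Random(seed)
    n_free = max(1, round(len(indices) * fraction))
    free = set(rng.sample(sorted(indices), n_free))
    return {hkl: (hkl in free) for hkl in indices}


def transfer_free_flags(old_flags, new_indices, fraction=0.05, seed=1):
    """Carry flags over to a new data set by matching indices.

    Reflections absent from the old data get a fresh random flag
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    new_flags = {}
    for hkl in new_indices:
        if hkl in old_flags:
            new_flags[hkl] = old_flags[hkl]
        else:
            new_flags[hkl] = rng.random() < fraction
    return new_flags


# Hypothetical data set of 200 reflections.
old_indices = [(h, k, l) for h in range(10) for k in range(10) for l in range(2)]
old_flags = assign_free_flags(old_indices)

# A new, higher-resolution data set sharing most indices plus some new ones.
new_indices = old_indices + [(h, 0, 2) for h in range(10)]
new_flags = transfer_free_flags(old_flags, new_indices)
print(sum(old_flags.values()), sum(new_flags.values()))
```

The important property is that every reflection already flagged as free in the old data keeps its flag in the new data, so the free set never becomes contaminated by reflections the model has been refined against.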

Very-High-Resolution Refinement with SHELX-97

At very high resolution (better than 1.5 Å), the refinement strategies can be improved to take advantage of the increased number of data available. By going to higher resolution, we can add many more parameters to the model, including thermal anisotropy, split side chains, and riding hydrogens. The validity of these has been well known for many years in small-molecule crystallography, where R-factors of 1% are obtainable. The program to be used here is SHELX-97, the latest version of SHELX written by George Sheldrick.27 SHELX has not been widely applied to proteins in the past, partly because converting the large number of macromolecular coordinates and residue structures to small-molecule conventions was troublesome. However, the new version has features that make these conversions unnecessary or automatic. It also builds the geometry restraints needed for refining structures at less than atomic resolution. SHELX-97 has several advantages for ultra-high-resolution refinements:

1. The structure factor calculation is more accurate than in XPLOR or TNT. At resolutions above about 1.8 Å, XPLOR and TNT show significant errors that are due to the use of an FFT approximation, and by 1.0 Å these errors have become a significant portion of the R-factor.28
2. Anisotropic B's can be used if there are enough data. Proteins typically exhibit considerable anisotropy, and including this model increases the accuracy at the expense of more parameters.
3. SHELX includes generation of fixed or riding hydrogens. These hydrogens move with the atom they are bonded to and effectively make the structure factor calculation more accurate.
4. Partial structure is easily generated and refined. SHELX has free variables that allow correct refinement of the occupancies of the split parts.

27 Sheldrick, G. M. (1998). "High Resolution Structure Refinement," in Crystallographic Computing 7 (K. Watenpaugh and P. E. Bourne, eds.). Oxford University Press, Oxford.

28 In an FFT from model to structure factors, the model coordinates themselves are not transformed, but an approximation of the equivalent electron density built on a grid. In theory, an accurate enough representation can be computed, but in current practice it usually introduces a small error.

One of the chief drawbacks is that the program is considerably slower than the above-mentioned refinement software. However, with the increased speed available on today's computers this is no longer a serious impediment. The quality of the density of very-high-resolution structures is greatly increased (Plate 7).

Features of SHELX

Use of Intensity Data

Considerable experimentation by Sheldrick and others has shown that it is best to refine directly against the original intensity data (I) rather than |F|, and to include all the data, even the negative observations. The argument for this is that the intensity data, consisting of real observations, have errors that give an expected distribution which for the weakest data has a negative component; that is, because of the experimental error, some of the weak data will be negative in intensity, as they are measured to be less than the surrounding background. For example, consider a reflection that is exactly 0 in intensity. Half the time it is measured it will be greater than 0 and half the time less than 0. If we threw out the negative measurements and then averaged, we would find that the reflection now has a positive intensity. Leaving out negative intensities raises the mean of the weak highest-resolution data and thus affects the weighting scheme. This change to the weighting leads to slower convergence. Since it is impossible to take the square root of a negative number, when |F|'s are used negative observations are removed from the data set, leading directly to this problem. As discussed later, the inclusion of negative intensities is also needed for correct uncertainty calculations.
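The bias described above can be reproduced with a small simulation. The Python sketch below assumes, purely for illustration, that repeated measurements of a reflection with a true intensity of exactly 0 are scattered by Gaussian noise of unit standard deviation; the variable names and the noise model are not taken from any program.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


rng = random.Random(42)  # fixed seed so the run is repeatable

# 10,000 simulated measurements of a reflection whose true intensity is 0.
measurements = [rng.gauss(0.0, 1.0) for _ in range(10000)]

# Averaging everything, negatives included, recovers a value near 0.
mean_all = mean(measurements)

# Discarding the negative observations first biases the average upward.
mean_positive_only = mean([m for m in measurements if m > 0])

print(mean_all, mean_positive_only)
```

With all measurements included the average stays close to the true value of 0, whereas rejecting the negative observations yields a clearly positive intensity for a reflection that is actually absent.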

Riding Hydrogens

Riding hydrogens are hydrogens added to the model in a fixed geometry relative to the heavy atoms. Thus, they are not free to refine but move with the heavier atom to which they are attached. Their chief effect is to make the structure factor calculation slightly more accurate by accounting for their small, but measurable, contribution to the scattering in the crystal. Indeed, if they are left out, the heavy atom will move slightly in the direction of the left-out hydrogen to account for the missing density. Given accurate data, riding hydrogens become significant somewhere around 1.5-Å resolution. An R-free cross-validation test can be used to verify that riding hydrogens are valid. Typically a 1% drop in R-free is found when hydrogens are added.

Anisotropic Thermal Parameters

At lower resolutions a single isotropic thermal factor, B, is used to represent the thermal vibrational motion of the atom, as well as static disorder within the crystal. A more complete model of this motion uses six parameters: three to describe the axes of motion of the vibration, and three to describe the magnitude in each direction. In reciprocal space, these motions are described by a symmetrical matrix with the terms Uij, which are the actual parameters refined. We have added a feature to the latest version of xfit to view these thermal parameters in a manner similar to the familiar program ORTEP, but in real time. Examples of this are shown in Fig. 3.34.

The main obstacle to using anisotropic thermal parameters at lower resolution is the lack of sufficient data to justify adding five more parameters per atom. As can be seen in Table 3.5, increasing the resolution from 1.9 Å to 1.35 Å results in nearly three times as much data being available. If anisotropic thermal parameters are added at 1.9 Å, the ratio of observations to parameters becomes about 1 : 1 (10,741 : 9211), considerably less than the 2 : 1 minimum required for a well-conditioned least-squares minimization. The exact point at which this crossover occurs for a given protein crystal depends on the solvent content. If the solvent content is higher than 50%, then there are fewer protein atoms for the same volume of unit cell and the crossover is at a lower resolution. For low-solvent-content crystals, the 2 : 1 crossover happens at a higher resolution.
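The data-to-parameter argument can be checked directly with the numbers quoted from Table 3.5. In the Python sketch below, the function name is hypothetical; the observation and parameter counts are those given in the text and table, and the 2 : 1 threshold is the minimum quoted above.

```python
def data_to_parameter_ratio(n_obs, n_par):
    """Ratio of observations to refined parameters."""
    if n_par <= 0:
        raise ValueError("number of parameters must be positive")
    return n_obs / n_par


N_PAR_ANISOTROPIC = 9211  # parameters in the final anisotropic model (Table 3.5)

ratio_1_9 = data_to_parameter_ratio(10741, N_PAR_ANISOTROPIC)   # data to 1.9 angstroms
ratio_1_35 = data_to_parameter_ratio(29336, N_PAR_ANISOTROPIC)  # data to 1.35 angstroms

MIN_RATIO = 2.0  # minimum quoted for a well-conditioned least-squares problem
print(round(ratio_1_9, 2), ratio_1_9 >= MIN_RATIO)
print(round(ratio_1_35, 2), ratio_1_35 >= MIN_RATIO)
```

With the 1.9-Å data the ratio is only a little above 1, whereas with the 1.35-Å data it exceeds 3, so the anisotropic model is justified only for the higher-resolution data set.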

Solvent Model

SHELX includes a bulk solvent model that is based on work by Moews and Kretsinger29 and is being used increasingly in all refinements. Most refinement protocols in the past have excluded from the calculations the so-called solvent region (roughly, reflections between infinity and 7-5 Å). Using the bulk solvent correction, which includes just two adjustable parameters (K, a scale factor, and B, a thermal parameter), it is possible to include all the data from infinity on up, with a substantial drop in R-value for the low-resolution data. This has been done in the refinement example in Table 3.5. A further advantage is that maps calculated without the solvent region can be subject to ripple as a result of series termination errors (see Sec. 3.12, Resolution Cutoffs), and the inclusion of the low-resolution terms removes that problem.

29 Moews, P. C., and Kretsinger, R. H. (1975). Refinement of the structure of carp muscle calcium-binding parvalbumin by model building and difference Fourier analysis. J. Mol. Biol. 91, 201-228.

FIG. 3.34 Stereo figures of the Fe3S4 cluster and the Sγ ligands at 1.35 Å illustrating the density and the corresponding thermal ellipsoids. (A) 2Fo - Fc σA-weighted electron density map contoured at 5σ. (B) Thermal ellipsoids showing the 25% probability surface (i.e., an atom can be found within this surface 25% of the time) for the cluster and 50% probability lines for the major axes. Note how all the atoms move in similar directions. The long axis is along the crystallographic c axis and, since all atoms in the crystal show this same elongation, it is probably

Split Side Chains

In high-resolution structures, especially in frozen high-resolution structures, discrete disorder of the protein can often be detected. For instance, a side chain on the surface may have two equally populated conformers. At any given moment half the side chains in the crystal will be in one orientation and half in the other. Both parts can be refined simultaneously in SHELX, and the relative occupancies of the two parts can be tied together and also

TABLE 3.5 Examole Refinement of 7-Fe Ferredoxin (FD) Using the Methods Outlinedd

Start (5FD1)

1.9-z

868

0

30

3547

10741

44.6

48.1

43.9

47.0

1 Refine model as is

1.9-1

868

0

30

3547

10741

27.7

33.1

27.5

32.6

2 Refit

1.35-1

989

0

30

3547

29336

27.2

30.5

25.6

28.9

3 Added waters

1.35-1

958

0

188

4096

29336

19.8

23.9

18.5

22.7

4 Pruned waters

1.35-1

1013

0

157

4087

29336

20.0

24.3

18.7

23.0

5 Anisotropic B's, refit

1.35-7-

1013

0

157

9177

29336

16.1

22.3

14.8

21.0

6 Added hydrogens

1.35-x

1013

776

157

9177

29336

15.4

21.3

14.1

20.1

7 Diagonal block

1.35-x

1013

776

157

9177

29336

15.4

21.5

14.0

20.3

8 Added waters, refit

1.35-x

1011.5

776

162

921 1

29336

15.3

21.3

9 Final refinement (6FDI)

1.35-x

1011.5

776

162

921 1

30880

15.0

-

13.9 13.8

20.1 -

. z N , 3occupancy , sum of asymmetric unit; N,, number of hydrogen atoms; N,,number of waters; N,,,, number of refinement parameters; Noh,,number of observations; R = 1 [ ( F , , - F L ) / EF<>]100; R,,,, calculated for 5% of data not used in refinement; R (>40) including only data for which F / a ( F ) > 4.0.

3.11 Refinementof Coordinates

209

refined. We have added a feature to xfit that simplifies splitting the side chains and fitting the two halves. Depending on whether one side of the split is clicked on or the root is clicked, xfit will fit half of the residue or the entire residue. Besides inspection of the electron density maps, split atoms can be found by an examination of the anisotropic thermal parameters. SHELX detects possibly split atoms and prints a list for further inspection.

Cross-Validation

To check the validity of each step and the overall refinement, the statistical parameter R-free is used. Five percent of the data is held in a separate pool and kept out of the least-squares refinement. Thus, if R-free decreases it must be because the model has become better, while for the other 95% there is the danger that the R-value decreased because of overfitting of the free parameters of the model, leading to a false minimum. When parameters are added to the model, the validity can be checked by an expected drop in R-free. Another use is to check for the best value of refinement parameters. For example, the sigma applied to bond lengths can be varied in a series of refinements. As this is done, it is found that the R-value slowly decreases as the restraint is removed. This is expected because removing the restraint allows the minimizer to achieve a closer fit. However, R-free shows a shallow minimum: tightening the restraint causes an increase in R-free, while relaxing it shows no improvement in R-free and eventually increases R-free. Thus, R-free can be used to find the correct target sigmas for bond lengths, bond angles, and thermal parameters. In the sample refinement of Table 3.5, note how the R-free drops 2% when the thermal model is changed to anisotropic, and 1% when riding hydrogens are added. As a test of R-free we tried refining a model with anisotropic thermal values at a lower resolution where the ratio of data to parameters was about 1 : 1. Although the R-value dropped 3%, the R-free actually increased 1%, which justified our confidence that by following R-free we can avoid adding too many parameters too soon.
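The bookkeeping behind R-free can be sketched in a few lines. This is a minimal illustration, assuming that the calculated amplitudes are already on the same scale as the observed ones; the function names and the toy amplitudes are hypothetical, and SHELX itself flags the free reflections in the .hkl file rather than by index.

```python
import random


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Pick a random subset of reflection indices to hold out as the free set."""
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * free_fraction))
    return sorted(rng.sample(range(n_reflections), n_free))


def r_value(f_obs, f_calc):
    """R = 100 * sum|Fo - Fc| / sum(Fo), in percent."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return 100.0 * numerator / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_indices):
    """Return (R, R-free): R over the working set, R-free over the held-out set."""
    free = set(free_indices)
    work = [i for i in range(len(f_obs)) if i not in free]
    held = [i for i in range(len(f_obs)) if i in free]
    r_work = r_value([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_value([f_obs[i] for i in held], [f_calc[i] for i in held])
    return r_work, r_free


# Toy amplitudes (hypothetical); reflection 3 is treated as the free set.
f_obs = [100.0, 50.0, 80.0, 20.0]
f_calc = [90.0, 55.0, 80.0, 25.0]
print(r_work_and_free(f_obs, f_calc, [3]))
print(len(split_free_set(1000)))
```

The point of the separation is that nothing computed from the held-out reflections ever feeds back into the minimizer, so a drop in R-free cannot be produced by overfitting alone.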

Checking the Refinement

Besides following the R-value and R-free and visually inspecting maps, the resulting structure can be checked by looking at the final polypeptide geometry. For this we use PROCHECK. 30 SHELX does not restrain torsion angles, and so these may be used in a manner analogous to R-free: because the torsions were not restrained to target values, their distribution is an independent check on the model. In the 7-Fe ferredoxin example of Table 3.5, we found that the torsions of the main chain and side chains fell well within the PROCHECK plots. In fact we were delighted to find that the bond length and angle distributions were below our restraints in SHELX and even tighter than the Engh and Huber geometry 31 in PROCHECK, clear evidence that the geometry was not overly determined by the restraints but was instead determined by the extra resolution of the data.

30 Laskowski, R. A., MacArthur, M. W., Moss, D. S., and Thornton, J. M. (1993). PROCHECK: A program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26, 283-291.

Positional Uncertainty Analysis

Positional uncertainty analysis involves calculating the standard deviations of the positional parameters or, as these are known in crystallographic jargon, standard uncertainties. Traditional small-molecule methods of estimating positional uncertainties 32 involve normal matrix inversion. (The standard uncertainty was known as the estimated standard deviation until the International Union of Crystallography recommended changing the terminology.) Calculating standard uncertainties is quite distinct from refinement, but, because it is usually done by the same software, the processes are often confused. Traditional uncertainty analysis in proteins has consisted of little more than a Luzzati plot of R-value versus resolution (which, as Cruickshank 33 points out, is a misuse of Luzzati's method), or a σA calculation. 34 In either case these methods lump the entire structure into one average uncertainty. Some parts of the structure will be much worse and some much better. For something as plastic as a protein, these make very poor methods of uncertainty analysis. A better method proposed by Cruickshank allows for estimating uncertainties based on the atom type and its B-value. Uncertainties have been shown to correlate well to Cruickshank's equation by comparing uncertainties calculated by full-matrix inversion with those derived from the equation:

$$\sigma(x_i) = k \left( \frac{\sum_{j=1}^{n_{\mathrm{atoms}}} Z_j^2 / Z_i^2}{n_{\mathrm{observations}} - n_{\mathrm{parameters}}} \right)^{1/2} \left( \frac{1 + 0.04 B_i + 0.003 B_i^2}{1 + 0.04 B_w + 0.003 B_w^2} \right) C^{-1/3} \, d_{\min} \, R,$$

where $k$ is about 1.0, $Z$ is the atomic number, $B_w$ is the Wilson B for the structure, $B_i$ is the B-value of the atom in question, $C$ is the fractional completeness of the data to $d_{\min}$, and $R$ is the crystallographic R-factor. Note that as the number of observations increases, the uncertainties will decrease, and if the number of observations falls below the number of parameters, the equation becomes invalid. In this formula, the uncertainty for a given atom in a given structure depends mostly on the B-value of the atom, $B_i$, and the scattering factor of the atom, $Z_i$, which together determine the atom's contribution to the total scattering. Atoms with low B-values have lower uncertainties in their positions, and atoms with higher atomic number, and thus more electrons, will also have lower uncertainty. For a typical protein it becomes possible to estimate individual atom positional uncertainties somewhere beyond 2.0-Å resolution, with the uncertainty strongly dependent on resolution. 35 Cruickshank gives examples where a 2.0-Å resolution protein structure has an average uncertainty on coordinates of 0.32 Å (Cruickshank, 1996: see note 33). At 1.6 Å resolution the average uncertainty drops to about 0.13 Å, and at 1.0 Å resolution the typical uncertainty is about 0.03 Å.

31 Engh, R. A., and Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr. A47, 392-400.
32 Schwarzenbach, D., Abrahams, S. C., Flack, H. D., Gonschorek, W., Hahn, T., Huml, K., Marsh, R. E., Prince, E., Robertson, B. E., Rollett, J. S., and Wilson, A. J. C. (1989). Statistical descriptors in crystallography. Report of the International Union of Crystallography Subcommittee on Statistical Descriptors. Acta Crystallogr. A45, 63-75; Schwarzenbach, D., Abrahams, S. C., Flack, H. D., and Wilson, A. J. C. (1995). Statistical descriptors in crystallography. II. Report of a Working Group on Expression of Uncertainty in Measurement. Acta Crystallogr. A51, 565-569.
33 Cruickshank, D. W. J. (1996). Protein precision re-examined: Luzzati plots do not estimate final errors. In "Macromolecular Refinement, Proceedings of the CCP4 Study Weekend, January 1996" (Dodson et al., eds.).
34 Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr. A42, 140-149.
35 The uncertainties are actually dependent on the ratio of observations to refined parameters, but since the number of observed data increases with increasing resolution, for a given protein this is an accurate statement. The resolution at which this ratio crosses 1.0 depends on each crystal's characteristics. Proteins with high solvent content will have fewer parameters to refine, and proteins that have more than one molecule per asymmetric unit that can be averaged will have fewer free parameters to adjust.

[Plot (a): atom positional esds including solvent (Å, 0.00 to 0.25) versus equivalent B-value (0 to 40)]

FIG. 3.35 Plot of positional uncertainties versus thermal parameter for carbon, nitrogen, and oxygen atoms from a 1.35-Å resolution structure. The upper, black line is for carbon, the middle line is for nitrogen, and the lower line is for oxygen. Note that as the atomic number for an atom increases (carbon = 6, nitrogen = 7, oxygen = 8), and thus its contribution to the total scattering, the positional uncertainty decreases. As the B increases, so does the positional uncertainty.

Calculating Standard Uncertainties

If the normal matrix of the least-squares minimization operation is inverted, the resulting matrix, after scaling, contains at each element $ij$ the quantity $c_{ij}\sigma_i\sigma_j$, where $c_{ij}$ is the correlation between the two parameters $i$ and $j$, and $\sigma_i$ and $\sigma_j$ are their standard deviations. On the diagonal of the matrix, where $i = j$, the correlation of a variable with itself is 1, so this becomes $\sigma_i^2$, and thus inverting the matrix gives us the standard deviation on any parameter. In crystallography these standard deviations are termed the standard uncertainties. Each atom coordinate and thermal parameter will thus have a standard uncertainty associated with it. An atom at x, y, z with uncertainties of $\sigma_x$, $\sigma_y$, $\sigma_z$ will have a radial uncertainty of $\sqrt{\sigma_x^2 + \sigma_y^2 + \sigma_z^2}$. Similarly, the standard uncertainty on a bond can be calculated by noting that for two uncorrelated quantities that are subtracted, the standard deviation of the resultant is given by $\sigma_{1-2}^2 = \sigma_1^2 + \sigma_2^2$. Thus if we calculate the length of a bond we can calculate the standard uncertainty on the bond, $\sigma_{\mathrm{bond}}$, with length $l$, from the positional uncertainties by the equation

$$\sigma_{\mathrm{bond}}^2 = (\sigma_{x_1}^2 + \sigma_{x_2}^2)\left(\frac{x_1 - x_2}{l}\right)^2 + (\sigma_{y_1}^2 + \sigma_{y_2}^2)\left(\frac{y_1 - y_2}{l}\right)^2 + (\sigma_{z_1}^2 + \sigma_{z_2}^2)\left(\frac{z_1 - z_2}{l}\right)^2,$$

which is the sum of the positional uncertainties projected onto the bond to account for the direction dependence of the bond. Similar considerations can be used to find the uncertainty on any quantities derived from the atomic coordinates. The foregoing treatment assumes that the atoms are uncorrelated. If the off-diagonal elements are nonzero, this indicates a correlation between parameters. The general equation is more complex and takes into account the off-diagonal correlations and the uncertainties in the unit-cell measurements. If two independent measurements of the same quantity are averaged, such as two bond lengths in a dimeric molecule, the standard uncertainty of the average is $\sigma_{\bar{x}} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}\,/\,2$. The general equation for $n$ measurements is $\sigma_{\bar{x}} = \sqrt{\sum_{i=1}^{n} \sigma_{x_i}^2}\,/\,n$.

The standard uncertainties for proteins can be calculated using the program SHELX. First the structure must be refined to convergence, and then a cycle is done using the full matrix with zero Marquardt damping and all restraints switched off (including the restraints will give artificially low values). All reflection data are used; there are no deletions due to weakness and no exclusions by resolution, even of low-resolution data often excluded as the "solvent region." SHELX then calculates the standard uncertainties using the full covariance matrix and the estimated uncertainties in the unit cell. The standard uncertainties in the derived parameters, bond lengths and angles, are then calculated from the positional uncertainties. The relevant SHELX commands are

    L.S. 1
    DAMP 0 0
    BOND

Remove the commands DFIX, DANG, FLAT, and other restraints, since these will artificially lower the standard uncertainties.
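The estimates discussed in this section lend themselves to a short numerical sketch: Cruickshank's formula for an individual atom, the radial uncertainty, and the projection of positional uncertainties onto a bond. The code below assumes the formula has the form k (N_i/p)^(1/2) [g(B_i)/g(B_w)] C^(-1/3) d_min R with g(B) = 1 + 0.04B + 0.003B^2, takes R as a fraction rather than a percentage, and treats coordinates as uncorrelated. The atom composition, Wilson B, completeness, and R used in the example are hypothetical; only the observation and parameter counts come from Table 3.5.

```python
import math


def b_factor_term(b):
    """g(B) = 1 + 0.04 B + 0.003 B^2, the B-dependent factor in the formula."""
    return 1.0 + 0.04 * b + 0.003 * b * b


def cruickshank_sigma(z_i, b_i, all_z, b_wilson, n_obs, n_params,
                      completeness, d_min, r_factor, k=1.0):
    """Estimated coordinate uncertainty (in Å) for one atom; r_factor is a fraction."""
    if n_obs <= n_params:
        raise ValueError("formula is invalid unless n_obs > n_params")
    n_i = sum(z * z for z in all_z) / (z_i * z_i)
    return (k * math.sqrt(n_i / (n_obs - n_params))
            * b_factor_term(b_i) / b_factor_term(b_wilson)
            * completeness ** (-1.0 / 3.0) * d_min * r_factor)


def radial_sigma(sx, sy, sz):
    """Radial uncertainty from the three coordinate uncertainties."""
    return math.sqrt(sx * sx + sy * sy + sz * sz)


def bond_sigma(pos1, sig1, pos2, sig2):
    """Return (length, sigma) for a bond between two uncorrelated atoms."""
    delta = [a - b for a, b in zip(pos1, pos2)]
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0.0:
        raise ValueError("atoms coincide; bond direction is undefined")
    variance = sum((s1 * s1 + s2 * s2) * (d / length) ** 2
                   for s1, s2, d in zip(sig1, sig2, delta))
    return length, math.sqrt(variance)


# Hypothetical composition: atomic numbers of every atom in the model.
all_z = [6] * 600 + [7] * 170 + [8] * 190 + [16] * 10
for name, z in (("C", 6), ("N", 7), ("O", 8)):
    sigma = cruickshank_sigma(z, 15.0, all_z, b_wilson=15.0, n_obs=29336,
                              n_params=9211, completeness=0.95,
                              d_min=1.35, r_factor=0.15)
    print(f"{name}: {sigma:.3f} A")
```

With all other inputs fixed, the estimate scales as 1/Z and grows with B, which is the ordering of the carbon, nitrogen, and oxygen curves described for Fig. 3.35.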

Block Diagonal Calculations

For several reasons, only a few proteins thus far have been analyzed by matrix inversion for the uncertainty in atomic positions. The main reason has been that until recently, most X-ray structures had not been determined to high enough resolution to have a big enough ratio of data to parameters to allow meaningful calculation of standard uncertainties. Second, even when the resolution of the data was high enough, the full least-squares matrix to be inverted was too large to fit in the memories of the available computers. For a given protein the amount of memory required to hold the full least-squares matrix needed for the calculation of uncertainty is given by the equation [n(n + 1)/2] × 4 bytes, where n is the number of parameters. 36 For a typical small protein with 1000 atoms and 200 solvent molecules, the number of parameters including x, y, and z, and six thermal parameters per atom is then 1200 atoms × 9 parameters/atom = 10,800 parameters, which would require 222 megabytes of memory plus an overhead of about 10 megabytes. For a medium-sized protein of 3000 atoms, the amount of memory for the matrix increases to 1.4 gigabytes! Because of the manner in which the memory is accessed during the analysis, it is not possible to set up the problem to use virtual memory efficiently. 37 This means that unless the computer has enough main memory to hold the entire matrix, continuous swapping of memory pages causes the program to slow to about 1% of full speed, which also renders the computer useless to others on multiuser systems. Recently, computers with very large memories have become available. The smaller proteins can now easily be done on a lab workstation with one gigabyte of memory.

36 Since the matrix is symmetric about the diagonal, the calculation requires the diagonal plus the upper half of the matrix, which is n(n + 1)/2 matrix elements, and each element requires 4 bytes of memory.
37 To evaluate a single parameter, the column and the row containing the parameter need to be accessed. Thus if the matrix is arranged to put columns sequentially, rows will be far apart, and vice versa. If the program starts to swap, a very large number of pages are continuously moved in and out of memory and the CPU grinds to a near halt.

A useful approximation to the full-matrix calculation is a block-diagonal calculation, where portions of the full matrix are extracted into smaller matrices along the diagonal. A good approximation is to use a block matrix retaining the three (x, y, z) positional parameters for all atoms without the thermal parameters, which will require 3²/9², or 1/9, as much memory. This gives a considerably smaller matrix of 154 megabytes for the 3000-atom example. Since the thermal parameters contain very little information about the positions of the atoms, this is a good approximation. We have done tests showing that there is only a 1% difference between calculations done with and without thermal parameters. The SHELX commands for doing the block-diagonal standard uncertainty analysis on the positional terms only are:

    L.S. 1
    DAMP 0 0
    BLOC 1
    BOND

The memory requirements can be further reduced by breaking up the positional parameters into overlapping blocks. For example, the protein could be cut into several pieces with an overlap of two residues. If we cut the 3000-atom example into three overlapping pieces of 1020 atoms each (an overlap of 20 atoms, giving 3060 parameters per block), the memory requirement for each block drops to 18 megabytes. Similarly, the time to calculate the standard uncertainties is reduced. The drawback is that the standard uncertainties will be underestimated because some parts of the matrix have been omitted. The amount by which the standard uncertainty is underestimated can be estimated, however, and compensated for. Figure 3.36 shows the percentage by which the error is underestimated when the positional parameters are divided into blocks. As shown in the figure, when 12 overlapping blocks are used, the standard uncertainty is underestimated by only 16%. For large proteins, this is an excellent approximation.
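The memory figures quoted in this section follow directly from the n(n + 1)/2 × 4 byte rule and can be reproduced with a few lines. The sketch below assumes that a megabyte is taken as 2^20 bytes, which is the convention that matches the quoted numbers; the function names are illustrative.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def matrix_bytes(n_params, bytes_per_element=4):
    """Bytes needed for the diagonal plus upper half of a symmetric n x n matrix."""
    return n_params * (n_params + 1) // 2 * bytes_per_element


def n_parameters(n_atoms, params_per_atom):
    """Nine per atom for x, y, z plus six thermal terms; three for positions only."""
    return n_atoms * params_per_atom


small_full = matrix_bytes(n_parameters(1200, 9))    # 1000 atoms + 200 solvent
medium_full = matrix_bytes(n_parameters(3000, 9))   # full matrix, 3000 atoms
medium_xyz = matrix_bytes(n_parameters(3000, 3))    # positional block only
one_block = matrix_bytes(n_parameters(1020, 3))     # one of three overlapping blocks

print(f"small protein, full matrix:   {small_full / MIB:.0f} MB")
print(f"3000 atoms, full matrix:      {medium_full / GIB:.1f} GB")
print(f"3000 atoms, positions only:   {medium_xyz / MIB:.0f} MB")
print(f"1020-atom positional block:   {one_block / MIB:.0f} MB")
```

Because the requirement grows as n², cutting the parameter count by a factor of three (positions only) reduces the memory by about a factor of nine, and cutting the atoms into three blocks reduces each block by roughly another factor of nine.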

[Plot: percentage underestimation of positional uncertainty (0 to -20) versus number of blocks (0 to 14)]

FIG. 3.36 Effect of block size on the underestimation of positional uncertainty parameters.

SHELX Refinement Strategy

As a typical example of how to use SHELX in protein refinement, we'll assume that a data set has been collected on a crystal using synchrotron radiation and cryocrystallography, and that a very-high-resolution data set now exists for a protein that was previously solved at room temperature with 2.0-Å resolution data. This is a typical application of SHELX. The following example is for SHELX-97 or later versions. See Appendix B, Useful Web Sites, for the SHELX site that offers information about obtaining SHELX. Start with a PDB format file with the coordinates to be refined and run SHELXPRO with the I option, .ins from PDB file, which creates an .ins file from the PDB coordinates. The program will prompt you for the unit cell and space group. Name this file something like prot.1.ins. Also prepare your data and convert them to SHELX .hkl format. You can do this with xprepfin. The data must be divided into a working set and a free set for R-free cross-validation. This is done with SHELXPRO and the V command, which marks the R-free reflections with a -1. For the example, this file will be prot.hkl. To turn on the R-free calculations, edit the .ins file and add the -1 flag to the CGLS command:

    CGLS 20 -1

When SHELX runs it will look for the reflection data in a file that has the same name as the .ins file but with the extension changed to .hkl. Rather than copying or moving the data file, we can create a soft link to the file. Thus we need to enter:

    ln -s prot.hkl prot.1.hkl

Now we can run SHELX with the command:

    shelxl prot.1 >& prot.1.log &

This starts SHELX in the background and puts the output into a log file. We can follow the log file to monitor progress with the command

    tail -f prot.1.log

which continuously monitors the file for new lines being added as SHELX runs. As the program runs it will create four other output files: a CIF format reflection file with an .fcf extension, a longer log file with the .lst extension, a new instruction file with the extension .res (for restart file), and a new PDB format file with the extension .pdg. When SHELX stops you can fit the output with xfit using the command

    xfit prot.1.pdg prot.1.fcf

After fitting the model, write the output PDB file from xfit into prot.1.fit.pdb. With SHELXPRO you can update the .res file with the PDB file using the U command. Write the output from this into prot.2.ins. Make another soft link to prot.hkl and run the new .ins file. Repeat this process until further fitting seems to be unnecessary, adding or subtracting waters as needed (see Chapter 4, Sec. 4.4, Editing Waters).


If your data extend beyond 1.5-Å resolution, you can make the model anisotropic by adding the command

    ANISO_* $C $N $O $S

to change the thermal model from isotropic to anisotropic. Run this refinement and look for a drop in R-free of about 2%. After another manual fitting step, again if at very high resolution, add riding hydrogens by removing the REM comments from the HFIX commands in the .ins file. This should give a drop in R-free of about 1%. Continue refinement if necessary to convergence. For the last cycle, remove the -1 flag from the CGLS command to refine the final model against all the data. The R-factor should be in the teens and R-free 5-10% greater than the R-factor.

3.12 FITTING OF MAPS

Calculating Electron Density Maps

Resolution Cutoffs

Normally, you can specify a minimum and a maximum resolution cutoff when calculating an electron density map (Fig. 3.37). Remember that the map is being calculated using a Fourier synthesis and that leaving out the higher-resolution terms causes series termination errors. These series termination errors show up as ripples in the electron density with periods close to the highest resolution used in the map. The effect of leaving out low-resolution terms is to add low-frequency ripples that make the map look "choppy." Low-resolution terms are often left out of refinements and, thus, they often end up left out of map calculations. How much effect this has on the interpretability of the map depends upon how high the other limit is. For instance, a 4.0- to 3.0-Å map is hard to interpret even with perfect phases, while a 4.0- to 2.0-Å map is relatively straightforward. As a rule of thumb, a 3-Å map should include data from 10 to 3 Å.
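The effect of the two cutoffs can be seen in a one-dimensional toy Fourier synthesis. The sketch below is an illustration only: the cell length, atom positions, and B-factor are hypothetical, the "atoms" are identical point scatterers, and perfect phases are used. It builds a map from only those terms whose spacing d lies between the chosen limits.

```python
import cmath
import math


def structure_factor(h, atoms_frac, cell, b_factor=15.0):
    """F(h) for identical point atoms on a 1-D cell, damped by a B-factor."""
    s = h / cell                                   # 1/d, in reciprocal Å
    damping = math.exp(-b_factor * s * s / 4.0)
    return damping * sum(cmath.exp(2j * math.pi * h * x) for x in atoms_frac)


def fourier_map(atoms_frac, cell, d_max, d_min, n_grid=120):
    """Density on n_grid points using only terms with d_min <= d <= d_max.

    Pass d_max=math.inf to include every low-resolution term and F(000).
    """
    h_low = math.ceil(cell / d_max)                # 0 when d_max is infinite
    h_high = math.floor(cell / d_min)
    density = []
    for k in range(n_grid):
        x = k / n_grid
        rho = 0.0
        for h in range(h_low, h_high + 1):
            term = (structure_factor(h, atoms_frac, cell)
                    * cmath.exp(-2j * math.pi * h * x))
            # h = 0 is counted once; h > 0 stands for the Friedel pair (h, -h).
            rho += term.real if h == 0 else 2.0 * term.real
        density.append(rho / cell)
    return density


atoms = [0.20, 0.25, 0.60]     # fractional positions (hypothetical)
cell = 30.0                    # cell edge in Å (hypothetical)
full = fourier_map(atoms, cell, math.inf, 3.0)    # infinity to 3.0 Å
band = fourier_map(atoms, cell, 4.0, 3.0)         # 4.0 to 3.0 Å only

print("mean density, full map:", sum(full) / len(full))
print("mean density, 4.0-3.0 map:", sum(band) / len(band))
print("minimum density, 4.0-3.0 map:", min(band))
```

With the low-resolution terms removed, the map averages to zero and must swing negative between the peaks, which is one way to see why a narrow 4.0- to 3.0-Å band is so hard to interpret even with perfect phases.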

Nonorthogonal Coordinates

The main problem with nonorthogonal coordinates is making certain that the map and the model are in the same coordinate frame, as discussed previously under Coordinate Systems (see Sec. 3.1). It is also common to discover after a model has been fit that the cell transformation is incorrectly specified for the refinement program, and the only indication this has happened may be a high R-factor. The XtalView system allows the user to specify a 3 × 3 matrix for the Cartesian-to-fractional coordinate transformation (for fractional-to-Cartesian the inverse matrix is computed), so that if another program is being used, the matrix from this other program can be entered. In XtalView the matrix used is the same as in FRODO and XPLOR in the cases tested so far.
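As an illustration of what such a 3 × 3 matrix contains, the sketch below builds a fractional-to-Cartesian matrix from the cell constants and inverts it to obtain the Cartesian-to-fractional matrix. It assumes one common convention (a along x, b in the xy plane); this is not necessarily the convention used by XtalView, FRODO, or XPLOR, which is exactly why the matrix should be checked against the other program. The function names are illustrative.

```python
import math


def frac_to_cart_matrix(a, b, c, alpha, beta, gamma):
    """Fractional-to-Cartesian matrix, a along x and b in the xy plane (degrees)."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c.
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]


def invert3(m):
    """Inverse of a 3 x 3 matrix by cofactors."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply(m, v):
    """Multiply a 3 x 3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


# Hypothetical monoclinic cell.
to_cart = frac_to_cart_matrix(10.0, 20.0, 30.0, 90.0, 100.0, 90.0)
to_frac = invert3(to_cart)
xyz = apply(to_cart, [0.1, 0.2, 0.3])
print(xyz)
print(apply(to_frac, xyz))
```

A quick round trip such as the one printed here, or a comparison of a few transformed atoms with the output of the refinement program, catches a mismatched convention before it shows up as a high R-factor.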

Map Boundaries

Many programs, such as FRODO or O, can display only as much of the unit cell as is stored in the precalculated map file. This means that the user must decide beforehand on the map boundaries that will cover an entire molecule. A mini-map can be used to determine these boundaries. In xfit there is no need for this because the program is smart enough to know that the density at a fractional coordinate of 1.1 is the same as at 0.1, and the maps always contain a full unit cell.
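The periodicity argument amounts to taking grid indices modulo the number of grid points. A minimal sketch, assuming a map stored for exactly one unit cell as a nested list indexed grid[ix][iy][iz] (a hypothetical layout, not xfit's internal format):

```python
def wrap_index(frac, n_grid):
    """Grid index in [0, n_grid) for a fractional coordinate, using periodicity."""
    return round(frac * n_grid) % n_grid


def density_at(grid, fx, fy, fz):
    """Nearest-grid-point density from a full-unit-cell map grid[ix][iy][iz]."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    return grid[wrap_index(fx, nx)][wrap_index(fy, ny)][wrap_index(fz, nz)]


# A 10 x 1 x 1 toy map whose density equals its x index.
toy_map = [[[float(ix)]] for ix in range(10)]
print(density_at(toy_map, 0.1, 0.0, 0.0))
print(density_at(toy_map, 1.1, 0.0, 0.0))
print(density_at(toy_map, -0.9, 0.0, 0.0))
```

All three lookups land on the same grid point, so a map covering one full unit cell is enough to display density anywhere in the crystal.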

FIG. 3.37 Effect of different resolution cutoffs. The maps use the same data but differ in the resolution limits used. Thick lines are the model used to calculate the phases; thin lines represent the electron density contoured at 1σ. (A) 37-5.0, (B) 37-4.5, (C) 37-4.0, (D) 37-3.7, (E) 37-3.3, (F) 37-3.0, (G) 37-2.0. The turns of the helix become apparent at 3.7, and the carbonyl bulges are apparent at 3.0. However, these maps are made with refined phases, and starting maps will appear to have lower resolution because of phase errors. Map (H), 5.0-3.7, shows the deleterious effect of truncating the low-resolution data too severely. Compare the densities in (H) and (D).

Combined Phase Coefficients

Phase bias is a serious problem in the early stages of fitting and refinement. One way to avoid phase bias is to use only the MIR phases to calculate the maps; this guarantees that the maps are unbiased with respect to the model being fit. However, MIR phases are noisy and usually limited in resolution. Several methods have been developed for combining calculated model phases with experimental phases to allow information from both to be used. This approach can be used to increase the resolution, to allow the use of partial models, and to help reduce model bias. All methods rely on the difference between Fo and Fc to weight the amount of calculated phase contribution. Combined phase coefficients allow the inclusion of the low-resolution MIR phases, which are usually accurate, with calculated phases from higher resolutions. The MIR phases alone should be used in the resolution range infinity to 5.0 Å, where model phases are inaccurate. At other resolutions both phases are used in a weighted manner. The phase combination program puts out a combined figure of merit that is used to weight the map, as in the MIR case. The combined maps are smoother, especially at resolutions between 3 and 2.5 Å, than maps made from an incomplete model for phasing. The most common phase combination procedure is Bricogne's adaptation of Sim's weighting scheme. 38 Two phase probability distributions are multiplied together by the following procedure. The phase probability for the MIR phases has been previously stored as the four Hendrickson-Lattman coefficients, A_MIR, B_MIR, C_MIR, D_MIR (see note 11). The phase probabilities for the calculated phases are calculated from the equation:

$$P_c(\phi) = \exp\left[ \frac{2\,|F_o|\,|F_c|}{\langle F_o^2 - F_c^2 \rangle} \cos(\phi - \phi_c) \right], \tag{3.35}$$

where $\langle F_o^2 - F_c^2 \rangle$ is the root-mean-square difference in intensities and is calculated in bins of resolution. It can be seen that when the R-factor is large, that is, when Fo and Fc do not agree, the contribution from the calculated phase is lowered because the denominator will be large. As the R-factor decreases, the calculated phase contribution will increase. This probability is expressed in terms of A and B, where A is the cosine part of the phase, B is the sine part of the phase at the maximum value of the probability distribution, and $\sqrt{A^2 + B^2} = mF_o$, where m is the figure of merit. These are then added to A_MIR and B_MIR with the relative weights set by W_MIR and W_calc:

$$A = W_{\mathrm{MIR}} \times A_{\mathrm{MIR}} + W_{\mathrm{calc}} \times A_{\mathrm{calc}}, \tag{3.36}$$

with a similar equation for B. The new phase is found by evaluating:

$$P_j(\phi) = \exp(A\cos\phi + B\sin\phi + C\cos 2\phi + D\sin 2\phi), \tag{3.37}$$

which gives a new most-probable phase and figure of merit using Eqs. 3.20 and 3.21. The success of this process can partially be judged by an increase in the figure of merit. The weights should be adjusted so that the new phases are, on average, between the MIR and the calculated phases. Low-resolution phases will be closer to the MIR phases, and the higher-resolution phases will be closer to the calculated phases. The swing point, where more of the calculated phase is included than the MIR phase, should be located between 4.0 and 3.5 Å, based upon past experience. This seems to give maps that contain new information from the filtering effect of the calculated phases but are not so overwhelmed by them that no information from the MIR phases is left.
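The combination step can be sketched numerically. The code below is a minimal illustration under several assumptions: the calculated-phase distribution is exp[X cos(φ - φc)] with X = 2|Fo||Fc| divided by the binned mean-square difference term, so that A_calc = X cos φc and B_calc = X sin φc; the C and D coefficients are carried over from the MIR coefficients unchanged; and the phase and figure of merit are taken as the centroid of the combined distribution, which may differ in detail from Eqs. 3.20 and 3.21. All function names and numbers are hypothetical.

```python
import cmath
import math


def calc_hl(f_obs, f_calc, phi_calc_deg, mean_sq_diff):
    """A and B coefficients for the calculated-phase distribution."""
    x = 2.0 * f_obs * f_calc / mean_sq_diff
    phi = math.radians(phi_calc_deg)
    return (x * math.cos(phi), x * math.sin(phi))


def combine_hl(mir, calc, w_mir=1.0, w_calc=1.0):
    """Weighted sum of A and B; C and D are taken from the MIR coefficients."""
    a = w_mir * mir[0] + w_calc * calc[0]
    b = w_mir * mir[1] + w_calc * calc[1]
    return (a, b, mir[2], mir[3])


def best_phase_and_fom(a, b, c, d, step_deg=1):
    """Centroid phase (degrees) and figure of merit of exp(A cos + B sin + ...)."""
    angles = [math.radians(k) for k in range(0, 360, step_deg)]
    exponents = [a * math.cos(p) + b * math.sin(p)
                 + c * math.cos(2.0 * p) + d * math.sin(2.0 * p) for p in angles]
    largest = max(exponents)            # subtracted to avoid overflow in exp()
    total = 0.0
    centroid = 0j
    for p, e in zip(angles, exponents):
        weight = math.exp(e - largest)
        total += weight
        centroid += weight * cmath.exp(1j * p)
    centroid /= total
    return math.degrees(cmath.phase(centroid)) % 360.0, abs(centroid)


# Hypothetical reflection: MIR says roughly 60 degrees, the model says 90 degrees.
mir = (1.0, 1.7, 0.1, -0.2)
calc = calc_hl(f_obs=100.0, f_calc=90.0, phi_calc_deg=90.0, mean_sq_diff=6000.0)
combined = combine_hl(mir, calc, w_mir=1.0, w_calc=1.0)
print(best_phase_and_fom(*mir))
print(best_phase_and_fom(*combined))
```

Lowering w_calc, or using a larger difference term in a poorly fitting resolution bin, pulls the combined phase back toward the MIR value, which is how the swing point described above is moved.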

38 Bricogne, G. (1976). Acta Crystallogr. A32, 832.

Evaluating Map Quality

An experienced crystallographer can usually quickly assess the quality of a map by inspection. It is especially important for new crystallographers to try to learn this technique so that a lot of time is not wasted trying to fit poor maps.

Judging Electron Density

First look at the map on a large scale, where both protein and solvent should be visible. There should be a large contrast difference between the solvent and the protein. If the map is contoured at 1σ, there should be few long connected regions in the solvent region (look at several sections). The protein region of the map should have connected densities that are cleanly separated. The heights of these ridges should be consistent over the protein region. Excessive "peakiness" is a bad sign. It might help at this point to consider that a protein is a long polypeptide composed mostly of carbon, nitrogen, and oxygen, which all have about the same electron density. The exception is a small number of sulfur atoms. Any error in the phases will cause some volumes to have too little density, and another volume of the map will be correspondingly too high. With a practiced eye, you can quickly discern this. Another feature to look for in judging the quality of the map is the contrast between protein and solvent regions. For this purpose, a slab of electron density over a large area is needed (Fig. 3.38). There should be a clear difference in level between the protein and solvent. The solvent should comprise a few large areas of low-level peaks rarely rising above 1-2 times the root-mean-square value (σ) of the map.

Electron Density Histograms

The previous statements can be restated more precisely. In a correctly phased map, the distribution of the densities will be that for a protein molecule, which is independent of its fold or space group, and will have a characteristic histogram. In fact, the histogram can be used to compute the probable amount of phase error by comparing the histogram of an unknown with that of correctly phased maps of known structures (Fig. 3.39). The histogram is dependent upon the resolution range used and also on the percentage of solvent in the crystal. XtalView comes with a program, xedh (Fig. 3.40), that computes the histogram of a map and can be used to compare it with known histograms. This is easily enough done, except that, of course, there is a large gray area where a map may be interpretable in spite of the phase errors. However, the histogram comparison method gives an objective method of estimating phase error that, with experience, will give a guide to the "interpretability" of a set of phases. The histogram method requires no model, and it does not matter how the phases are derived.
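The idea of comparing histograms can be sketched as follows. This is not the algorithm used by xedh, whose details are not given here; it is a minimal illustration in which densities are expressed in units of the map's σ, binned over a fixed range, and two histograms are compared by half the sum of absolute differences (0 for identical histograms, 1 for histograms with no overlap). The bin range, bin count, and sample values are arbitrary choices.

```python
import math


def density_histogram(values, n_bins=18, low=-3.0, high=6.0):
    """Fraction of grid points per bin, with density in units of the map sigma."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0.0:
        raise ValueError("map is flat; sigma is zero")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        z = (v - mean) / sigma
        index = int((z - low) // width)
        index = min(max(index, 0), n_bins - 1)   # clip outliers into end bins
        counts[index] += 1
    return [count / n for count in counts]


def histogram_distance(hist_a, hist_b):
    """Half the summed absolute difference between two normalized histograms."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum(abs(p - q) for p, q in zip(hist_a, hist_b))


# Hypothetical density samples from a reference map and an unknown map.
reference = [0.0, 0.1, 0.1, 0.2, 0.3, 0.4, 1.5, 2.5, 3.0]
unknown = [0.0, 0.4, 0.8, 1.0, 1.2, 1.4, 1.6, 2.0, 2.4]
print(histogram_distance(density_histogram(reference), density_histogram(unknown)))
```

As the text notes, the histogram depends on resolution range and solvent content, so a comparison of this kind is only meaningful against reference maps calculated with matching limits.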

[Electron density sections, z = -0.139 to -0.014]

FIG. 3.38 Effect of increasing phase error on the electron density map. (A) A section of bovine superoxide dismutase density at 2.0 Å resolution, using the final refined phases. (B) An average of 22° of random error has been added to each phase. The solvent is becoming noisy but the map is still easily interpreted. (C) An average of 45° phase error has been added to the model. The density is noisy but still interpretable; it is still possible to tell the solvent from protein. (D) An average of 67° of phase error has been added, making the map virtually uninterpretable; some regions of solvent have as much density as the protein.

Fitting and Stereochemistry

Fitting is the process of making the model match the map while preserving the proper stereochemistry. This is done using an interactive computer graphics program, such as the XtalView program xfit (Fig. 3.41). If all structures could be fit to 2.0-Å maps, there would be little difficulty in doing this.


COMPUTATIONAL TECHNIQUES


Most structures are first fit in the 3.0- to 2.5-Å resolution range. Stereochemistry is not as obvious at this resolution because the individual atoms cannot be resolved. For instance, carbonyls either are small bumps or are not definable at 3.0 Å, and side chains often merge with the main chain for the smaller amino acids.

FIG. 3.39 Effect of phase errors on protein histograms. Note that although the two molecules have completely different structures and space groups, the histograms respond nearly identically to added phase errors. (A) Histograms of the maps in Fig. 3.38. The histogram without any error added is the solid curve. The histogram takes on a more Gaussian shape as the error is added and the height decreases (dashed lines). The tail of points at higher rho values disappears, and the peak of 0 values found in the solvent decreases as noise enters the solvent region. (B) The same comparison for phase error added to human hemoglobin. [Axes: N(rho) versus size of rho.]

FIG. 3.41 Xfit, a program used to fit protein models to electron density maps. The density can be represented as both contours and ridgelines. The program can also be used to compare models and to edit them. Using a macro language, complicated figures can be built (such as the ones in this book) and printed to any Postscript device. Most of the options available can be accessed through menus, buttons, or graphical sliders.

Amino Acid Stereochemistry Amino acids have a fixed stereochemistry, and this must be kept in mind while fitting. The allowed bond distances and bond angles are fixed, and all the degrees of freedom are in rotations about bonds. Furthermore, the peptide bond is constrained to be planar.³⁹ Also, the dihedral angles of the main chain, phi and psi, have a limited range of allowed conformations (see above), and, if the positions of the residues before and after a given residue are taken into consideration, there is only a very small range left. In fitting, given the path of the polypeptide, and knowing the directions of the side chains, you do not have many choices left to make. Side chains are less restricted than the main chain. Side chains have preferred rotamer positions that should be kept in mind. Methylene sp3 carbons, such as in the side chain of lysine, prefer a staggered conformation, with dihedral angles near 60°, 180°, and -60° (that is, 120° apart). The guanidinium group at the end of arginine is planar. Until the maps are of high quality, the two flipped conformations of the guanidinium head group are so similar in shape that it is difficult to distinguish between them. Rings, except prolines, are always tightly constrained to be planar. This planarity extends to contain the attachment-point atom; that is, in phenylalanine the Cβ atom is coplanar with the phenyl ring. The carboxylate end groups of aspartate and glutamate also form a four-membered planar group including, in the case of aspartate, Cβ, Cγ, Oδ1, and Oδ2. The same holds true for the end groups of asparagine and glutamine.

³⁹ A thorough discussion of peptide bond structure can be found in Chapter 2 of Schulz, G. E., and Schirmer, R. H. (1979). Principles of Protein Structure. Springer-Verlag, New York.
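Because all the freedom in a residue lies in rotations about bonds, it is often useful to measure the dihedral angle defined by four consecutive atoms (phi, psi, or a side chain torsion) while fitting. The following is a minimal sketch of the standard vector formula; the function name and argument layout are illustrative and are not taken from any XtalView program.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, about the p1-p2 bond.

    The result lies in the range -180 to 180; 0 is the eclipsed (cis)
    arrangement of p0 and p3, and +/-180 is trans.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)           # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1           # component of b0 perpendicular to the bond
    w = b2 - np.dot(b2, b1) * b1           # component of b2 perpendicular to the bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

For a lysine side chain, for example, the four atoms N, Cα, Cβ, and Cγ give the first side chain torsion, and a staggered rotamer should return a value near 60°, 180°, or -60°.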

Chain Tracing⁴⁰

Mini-maps A mini-map is "mini" relative to the large maps that formerly were used in a Richards box.⁴¹ The easiest way to make a mini-map is to plot out sections and copy them onto transparent sheets. The transparencies are then stacked up, separated by pieces of Plexiglas (Fig. 3.42). The mini-maps can still outperform the best graphics systems in terms of number of lines drawn, but they are awkward to manipulate. One great virtue of the mini-map is the ability to mark directly on the map with colored markers. Since the map is real and not a virtual image on the computer screen, it is easier to keep track of one's position. Symmetry is more easily recognized in a mini-map, and symmetry axes can be marked directly on the map. A lot of people like to keep a mini-map at hand as they fit on the graphics system. The mini-maps serve as a road map that can be viewed in detail on the graphics. Two final virtues of the mini-map: it is portable, and you do not have to sign up to use it.

Ridgelines The traditional basket-weave representation of density requires too many lines to represent large portions of density, and it also has a "can't see the forest for the trees" problem. An alternative method of viewing maps is to use ridgelines that are drawn along the three-dimensional ridges in the electron density map⁴² (Fig. 3.43). This is a method of simplifying the map, or "skeletonizing" it. It requires far fewer lines to represent the density, so a much larger piece of map can be viewed on the graphics. Ridgelines also more closely resemble the stick figures commonly used to represent chemical models. The program GRINCH (graphical ridge lines from Chapel Hill) is meant to be used to perform the initial chain trace of a map and to give an initial model. In this program, the first step in the interpretation is to color the density as represented by ridgelines, according to your interpretation. The map starts out all white to represent "unknown." As the map is interpreted, it is colored green for main chain, purple for side chains, and red for carbonyl oxygens. Density features considered to be false connections are colored brown, and two colors, yellow and cyan, are left for special purposes such as cofactors. A special feature of the program is built-in heuristics that can do much of the coloring and fitting for you. For instance, to color a large stretch of chain green, you need only color one end green and then go to the other end and choose a ridgeline; the program finds the shortest path back to the first marked segment and colors the entire path green. Similarly, side chains are colored by picking one segment in the side chain: all the other segments connected to that one are colored until the main chain or another color is found. Building a model is just as easy. A residue to be built is chosen, and two segments, one in the side chain and one in the main chain, are picked, and the program builds a residue to best fit the ridgelines. The chief drawback to the program is that a localized residue-by-residue fitting does not build good secondary structure. Fitting in larger pieces of three or four is more successful. As a quick way to trace a chain and to build a starting model, GRINCH is very fast. I have built a 120-residue protein in a single sitting using the system.

⁴⁰ An excellent discussion of practical issues involved in fitting to real maps is provided by Richardson, J. S., and Richardson, D. C. (1985). "Interpretation of Electron Density Maps," in Methods in Enzymology, Vol. 115, pp. 189-206. Academic Press, San Diego.

⁴¹ Richards, F. M. (1985). "Optical Matching of Physical Models and Electron Density Maps: Early Developments," in Methods in Enzymology, Vol. 115, pp. 145-154. Academic Press, San Diego.

⁴² The original author of GRINCH is Thomas V. Williams. His 1982 doctoral dissertation (University of North Carolina, Chapel Hill) contains a concise and thorough discussion of ridgelines. Another good discussion of an alternate method is: Greer, J. (1985). "Computer Skeletonization and Automatic Electron Density Map Analysis," in Methods in Enzymology, Vol. 115, pp. 206-224. Academic Press, San Diego.

FIG. 3.42 Electron density mini-map.

FIG. 3.43 Ridgeline representations of maps. (A) Conventional contour representation of electron density (thin lines) and superimposed stick-figure model (thick lines). (B) The same electron density was "skeletonized" to produce ridgelines and is compared with the model. Note that ridgelines look more like a stick-figure model, which can cause some confusion. However, they depict the density with fewer lines, which means that more map can be drawn on a graphics station before performance is slowed, and it is easier to see through foreground objects to the background. (C) Helix density with contours and ridgelines. The combination of the two is superior to either alone. (D) Helix density with ridgelines and model. (E) Looking down helix density represented by ridgelines. Note that a hole can be seen down the middle with side chains coming out radially, making helices particularly easy to identify with ridgelines. (F) Ridgelines are especially useful for looking at large views of the map. It is easy to make out the large solvent holes between protein molecules. If this much map were contoured, the figure would have been black and the graphics system would have ground to a virtual halt.

Recognizing Secondary Structure

Looking for large pieces of secondary structure can greatly speed map fitting. If the secondary structure is known, many of the breaks in the density can be confidently identified as false because the constraints of the secondary structure dictate that the chain must bridge the gap. Helices are the easiest to recognize, as long tubes of density disconnected from the rest of the structure. The side chains on a helix "point" to the N-terminal end (Fig. 3.44) so that it is possible to discover the chain direction from a helix even if carbonyls are not visible. Helices are obvious in 5.5-Å maps. It is worthwhile searching a 5.5-Å map for long tubes of density. The trick to recognizing β-sheet structure is to find the correct viewpoint (Fig. 3.45). Sheets have a twist, and the easiest way to recognize a sheet is to sight down the length and to look for this twist. Then, turning 90° to this orientation, you can pick out the individual strands. Sheets often have regions with weaker density and gaps as well as false connections across the strands, usually at the position of the cross-strand hydrogen bonds. Sheets tend to break up at low resolution and are difficult to recognize at any resolution lower than 3.0 Å. It is difficult to decide on the chain direction in a sheet. If the sheet has reverse turns (Fig. 3.46), the direction can be found by examining the reverse turn. If the turn is traced in the wrong direction, the side chains will stick out on the wrong side. The correct direction can be determined by looking at which side the side chains come off the reverse turn (Fig. 3.47). Once as much secondary structure as possible has been located, it only remains to connect these pieces. When all the obvious connections have been made, a process of elimination may lead to correct decisions.

FIG. 3.44 Identifying chain direction in helices on a 2.7-Å map on which the final refined model is superimposed. The direction of helices can usually be determined from the direction in which the side chains "point." Most of the side chains point toward the N-terminal end of the helix, although some bend back and appear to point the other direction, especially in lower-resolution maps (see, e.g., the Phe side chain in the lower left).

FIG. 3.45 Identifying sheet secondary structure in medium-resolution maps. Features of electron density of sheet in an MIR map are shown as contours with the derived ridgelines (thick lines, not model!): (A) sheet front, (B) sheet top, (C) sheet side, and (D) thin slab of sheet from the side showing a single strand. The strands of a sheet run parallel, but the entire sheet is twisted. Note in the top view (B) that the top and bottom strands are seen more edge-on, while the middle strand is seen end-on because of sheet twist. The side chains of the sheet alternate back and front. Adjacent strands are in register so that all the side chains stick forward or back in rows across the sheet. In MIR maps, breaks in the chain are common, and false crossovers often form at hydrogen bond positions. Another problem is that the rows of side chains often "melt" together, indicating a false connection and making side chain assignment more difficult.


FIG. 3.46 Identifying tight-turn secondary structure in medium-resolution maps. It is common for the side chains of turns to be detached from the main chain density.


FIG. 3.47 Identifying direction by examining β-turns. The direction of the chain through a tight turn can be determined from the direction in which the side chains point. Rotate the density for the turn, as shown, so that the turn is upright and the side chains point to the right. The chain now runs from front to back.

Sequence Identification Another milestone in structure determination is finding the first match of sequence to the map. Since it is not possible to tell all the side chains apart from their density, especially at lower resolutions, the matching of sequence depends on finding a pattern of large and small side chains that is unique. Examples of side chains in medium-resolution electron density maps, where most initial chain tracing is done, are shown for an MIR map (Fig. 3.48) and for an Fo, αcalc map (Fig. 3.49). To illustrate how the density for side chains becomes clearer as the structural solution of a protein progresses, all tryptophans from aconitase are shown in Fig. 3.50 at three stages of its solution. Tryptophan can often be recognized because it is so much larger than all the other amino acids. Reliably telling a tyrosine from a phenylalanine and an aspartate from a leucine is difficult, if not impossible, in the maps most of us have to work with. Keep in mind that hydrophilic side chains are often disordered, and their density ends prematurely at low resolution. It is rare, however, for a hydrophobic side chain to be so disordered that its density is shortened. Thus, for example, looking for the pattern Lys-Arg-Glu is less likely to be successful than looking for Phe-Tyr-Leu. There are computer programs designed to look for matches, and I have tried them on occasion. In the end it seems that you can find a match more reliably and more quickly by scanning a sequence than by entering the data into the computer. The true test of a sequence match is to continue the match in both directions to see if it holds up. After you have found several more amino acids in both directions, you can be confident that it is not an accident. Once one match has been found, it limits the sequence that must be searched for other matches. After a while, the entire sequence can be found by building in both directions from matches and by a process of elimination.
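For readers who do want to try the computer, the idea of matching a pattern of large and small side chains against the sequence can be sketched as follows. This is a toy illustration only: the three size classes and the assignment of residues to them are assumptions made for the example, not an established scale, and the function name is hypothetical.

```python
# Assumed size classes for the 20 amino acids (one-letter codes):
# S = small, M = medium, L = large. The grouping is illustrative only.
SIZE_CLASS = {
    **dict.fromkeys("GASCTPVDN", "S"),
    **dict.fromkeys("LIMEQKH", "M"),
    **dict.fromkeys("FYWR", "L"),
}

def find_size_pattern(sequence, pattern):
    """Return the 0-based start positions where pattern fits the sequence.

    sequence is a one-letter amino acid string; pattern is a string of
    S, M, and L as judged from the density, with "?" for a residue whose
    density is too poor to classify.
    """
    classes = [SIZE_CLASS[aa] for aa in sequence.upper()]
    hits = []
    for start in range(len(classes) - len(pattern) + 1):
        window = classes[start:start + len(pattern)]
        if all(p == "?" or p == c for p, c in zip(pattern, window)):
            hits.append(start)
    return hits
```

A pattern is useful only when it returns a single hit; as noted above, the real test is whether the match continues to hold when extended in both directions.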

Fitting by "Pieces" Rather than fitting a model residue by residue, it is quicker and more accurate to add large pieces of secondary structure and then trim and tweak them to match the map. It is difficult to build an α-helix accurately one residue at a time. Instead, a polyalanine helix made from another structure by truncating the side chains at the Cβ atom can be grafted into the map. This can be done by loading the new molecule in and then "flying" it into the correct position. With xfit an alternative method is to load the new piece in and then least-squares-fit it to some marker residues. The marker residues are just fake residues: they have a Cα and Cβ that are placed where you think the helix residues should go. Then the helix can be least-squares-fit to the marker residues. It is not necessary for the markers to be as long as the entire helix; a turn or two can be built, and then the new helix can be longer than this on either or both ends. The helix position can be further refined by real-space refinement. A similar method can be followed for β-sheets.
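The least-squares fit of a helix onto marker residues can be illustrated with the widely used Kabsch superposition. The sketch below is not the xfit code; it assumes the marker atoms and the corresponding helix atoms are supplied as matching N x 3 coordinate arrays, and the function names are hypothetical.

```python
import numpy as np

def superpose(mobile, target):
    """Rotation R and translation t that best map mobile onto target.

    Minimizes the sum of squared distances between R @ m + t and the
    corresponding target point (Kabsch algorithm). Both inputs are
    N x 3 arrays listing equivalent atoms in the same order.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (a mirror image).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation

def apply_transform(coords, rotation, translation):
    """Apply the fitted transformation to any set of coordinates."""
    return np.asarray(coords, dtype=float) @ rotation.T + translation
```

The transformation is computed from only the atoms that have markers (a turn or two of the helix) and then applied to every atom of the longer helix, which is why the markers need not span the whole piece.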

General Fitting Like much of protein crystallography, fitting is an art that is learned by practice. While modern fitting programs and good maps can go a long way toward simplifying the task of fitting, or making the model match the map, there is no substitute for a practiced eye and good knowledge of protein structure. The crystallographer must take into account several factors while fitting an electron density map: resolution, noise level, protein structure, and phase bias.

Resolution Resolution has a large effect on the shape of the density and how closely the density matches the stick figures used to represent protein atoms and bonds. The higher the resolution, the more the map represents the individual atoms. At medium resolutions the atom densities overlap and the map increasingly looks like a tube (the main chain) with blobs of various size sticking out (the side chains). At resolutions lower than about 3.5 Å, the densities of adjacent main chains start to overlap and fitting of individual amino acids is no longer possible.

FIG. 3.48 Examples of all 20 residues in MIR maps. These four-derivative MIR maps have an overall figure of merit of 0.7 at 3.0-Å resolution.


FIG. 3.49 Examples of all 20 residues in Fo, αcalc maps. These 3.0-Å resolution maps use Fo, αcalc structure factors from a refined model.

Noise Level The ease of fitting is highly dependent on the noise level. In a noisy map the densities tend to break up in some places and blend together in others. This noise needs to be taken into account. In an MIR map it is typically necessary every once in a while to put the main chain across a break with no density. If the map is noisy this jump may be justified; this should never be necessary for a noise-free map. Similarly, there will be spots in a noisy map where two features run together (e.g., a false bridge between two β-sheets, two side chain densities joining together). It can be very helpful to study the MIR map of a solved structure and compare it to the final model.

FIG. 3.50 Tryptophan's progress. The density of all the tryptophans in aconitase at three stages of the structure determination: MIR, MIR combined with a mostly complete model, and the final refined density. (Figure courtesy of Drs. C. D. Stout and A. H. Robbins.) [Panel labels: residues 140, 196, 215, 347, 429, 548, 577, 631, and 739; rows 3.0 Å MIR, 3.0 Å Combine, and 2.1 Å 2Fo - Fc.]

Protein Structure

Proteins have a surprisingly limited number of conformations that they can adopt, considering the large number of folds they can adopt and the incredible variety of functions they can perform. For the most part, protein residues adopt three conformations: helix, sheet, and turns, with successive residues located 3.8 Å apart (Fig. 3.51). A random coil consists of short (as short as one residue) lengths of repeating elements of these three structures, whereas regular secondary structure consists of a stretch of residues in the same conformation. Sheet adopts a zigzag main chain structure with successive side chains sticking out on opposite sides at the points of the zigzag. Thus, you can predict the position of the next residue in a sheet by going forward 3.8 Å and placing the next side chain on the other side of the main chain at the next zag. The carbonyl atoms stick out perpendicularly to the side chains and also alternate sides at each successive peptide bond. Helices have an interesting structure and comprise the most regular of protein structures. Each residue in an α-helix makes a near-90° corner; when viewed down the axis, the helix looks nearly square. Each successive residue rises along the helix, with the peptide plane parallel to the helix axis. The carbonyls point toward the C-terminal end and the N-H bond points back to the N-terminus. The O atom of each residue hydrogen bonds to the N atom of the i + 4 residue further up the helix (i.e., if the residue with the O atom is 5, it bonds to the N atom of residue 9). The Cβ atom slopes back toward the N-terminus. Turn residues are similar to helical residues in that they turn a square corner, but the peptide planes on each side of the residue are at 90° to each other. In a tight turn, formed by two turn residues in a row, the peptide plane goes from horizontal to vertical and back to horizontal as viewed from the side (Fig. 3.51F). Both side chains in a tight turn are on the same side of the turn, sticking out at about 45° from the plane of the turn. Protein chain cannot bend sharper than 90° between successive Cα atoms. When looking down the Cα-Cα vector, the side chains come out at angles ranging from 45° in the sharp turns of helices to 180° when the chain direction goes straight in sheet structure. Helices are very regular; sheets tend to twist.
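Two of the rules above, the 3.8-Å spacing of successive Cα atoms and the limit of about 90° on how sharply the chain can bend, make a convenient sanity check on a hand-placed Cα trace. The sketch below is an illustration under assumed tolerances (0.3 Å on the distance, and a bend limit relaxed to 85° to allow for placement error); the function name and the tolerances are hypothetical.

```python
import numpy as np

def check_ca_trace(ca, dist=3.8, dist_tol=0.3, min_angle=85.0):
    """Flag suspicious steps and bends in an ordered list of CA positions.

    Returns a list of (index, kind, value) tuples, where kind is
    "distance" (step from residue index to index + 1, in angstroms) or
    "angle" (bend at residue index, in degrees).
    """
    ca = np.asarray(ca, dtype=float)
    problems = []
    steps = np.linalg.norm(ca[1:] - ca[:-1], axis=1)
    for i, d in enumerate(steps):
        if abs(d - dist) > dist_tol:
            problems.append((i, "distance", float(d)))
    for i in range(1, len(ca) - 1):
        u = ca[i - 1] - ca[i]
        v = ca[i + 1] - ca[i]
        cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
        if angle < min_angle:
            problems.append((i, "angle", float(angle)))
    return problems
```

An empty list does not mean the trace is right, only that it does not violate these two simple constraints.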

Phase Bias It is necessary to have some idea of the level of phase bias in the map. No matter how cleverly a map is calculated, there is always some phase bias if a model has been used to calculate the phases. Experimental maps are invaluable because they are free of phase bias, even if they are noisy. A common mistake is to abandon the noisy experimental maps in favor of 2Fo - Fc maps too early in the refinement process when the R-factor is still high (0.25-0.40) because the 2Fo - Fc maps look very convincing. Phase bias can be easily detected. If your R-factor is high and the map with calculated phases looks just like the model, you are looking at phase bias. We will discuss methods for lessening phase bias later. Unfortunately, no technique exists for eliminating phase bias completely.
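Since the warning above hinges on the value of the R-factor, a short reminder of how that number is computed may help. The sketch assumes the conventional definition, R = sum of | |Fo| - k|Fc| | divided by the sum of |Fo|, with a single overall scale factor k taken as the ratio of the summed amplitudes; refinement programs generally use more elaborate scaling, so this is an approximation for illustration only, and the function name is hypothetical.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor for matched amplitude lists.

    f_obs and f_calc hold the observed and calculated structure factor
    amplitudes for the same reflections, in the same order. A single
    overall scale factor (an assumed, simplistic choice) puts the
    calculated amplitudes on the scale of the observed ones.
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    scale = f_obs.sum() / f_calc.sum()
    return float(np.abs(f_obs - scale * f_calc).sum() / f_obs.sum())
```

A value in the 0.25-0.40 range from such a calculation is the situation described above, in which a convincing-looking 2Fo - Fc map should be treated with suspicion.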

Fitting Main Chain Fitting main chain is the trickiest part of fitting. The conformation is dependent on the previous and the next residue as well as the side chain (Fig. 3.52). Errors made in fitting one residue may propagate down the chain. Additionally, main chain atoms are highly constrained. Knowing the position of three successive α-carbons highly constrains the phi-psi angles of the middle residue. In fact, proteins have only a few conformations available to the main chain. If the main chain is in an extended conformation, then the side chains alternate direction down the chain. In an extended chain it is not possible for two adjacent side chains to point in the same direction. In helices, the side chain positions are so highly constrained that you can accurately predict the main chain and Cβ atom positions with a refined α-helix from another protein. Tight turns are also highly constrained. Memorization of the geometry of helices (α and 3₁₀), extended chains (antiparallel and parallel), and tight turns will allow fitting 90% of the structure. The so-called random-coil portion of the protein is made up of short interlocked sections of helices, extended chain, and turns. Glycines and prolines are the complicating factors. Glycine is very flexible owing to the absence of a side chain, and proline is often found in "kinks."

FIG. 3.51 Protein backbone conformation. (A) Helix viewed with the helix axis vertical. The peptide planes are shown by the gray rectangles and the direction of the chain is indicated by the smaller arrows. The large arrows represent the direction of the helix axis running from N- to C-terminal. (B) Looking down the helix axis. The successive residues form an angle slightly less than 90°. (C) Hydrogen bonds are formed in helices between the carbonyl of the i and the N of the i + 4 residue. (D) Sheet with the strand direction vertical on the plane. Note how the carbonyls stick out on opposite sides of the strand. (E) Sheet viewed from the side, showing how the strand zigzags with successive side chains sticking out on opposite sides. (F) Tight-turn geometry. (G) Viewed from the top, tight turns form an angle slightly greater than 90°.

FIG. 3.52 Building model from scratch. These figures illustrate building a short section of helix to its density using best-fit pentamer peptides from well-refined structures. The model building was done using the XtalView program xfit. (A) The electron density (light gray) is contoured at 1σ. (B) Ridgelines (dark gray) have been added. (C) The electron density has been turned off for clarity in these figures, but on a color display it is not necessary. A marker residue consisting of a single Cα atom has been placed at the position of the first residue in the helix (black cross). (D) More Cα positions (black lines) have been added to build the protein backbone. (E) A search of the pentamer database is made for the best fit to the first five Cα positions and the pentamer thus found is shown in thin black lines. (F) The middle three residues of the pentamer have been inserted into the model to replace the marker residues. The ridgelines have been turned off for clarity. (G) The pentamer-fitting process is continued. The second pentamer overlaps by one residue because the first and last residues of the pentamer are not used. (H) A model built for the entire helix except for the first and last residues. The side chains now need to be replaced with the correct sequence and fit to the electron density. The final model is compared to the ridgelines (I) and to the contours (J). The model includes polarizable hydrogens.

Because of the constraints on the main chain, if the path of the main chain is marked out, a reasonably accurate model can be built by finding a matching piece of chain from previously solved structures that follows that same path. In this manner, the entire main chain model can be built by finding overlapping best-match polypeptide pentamers from well-refined, high-resolution structures (Fig. 3.52). The match is performed by making a list of the difference vectors between the Cα atoms and comparing these to a previously computed vector list for a database of solved structures. Five Cα atoms define ten pairwise vectors, but since the four vectors between adjacent residues are always approximately 3.8 Å long, there are only six unique vectors to consider. After the best match has been found, the pentamer is loaded and least-squares-fit to the marker residues. Because the first and last residues are not well determined, only the middle three are kept. This process can be repeated until the entire structure has been built.
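The pentamer lookup can be sketched as follows. This is a simplified illustration, not the xfit algorithm: it assumes that the lengths of the six non-adjacent Cα-Cα vectors are used as the signature (lengths alone cannot tell a fragment from its mirror image, so the subsequent least-squares fit remains the real test), and the function names and the dictionary-style database are hypothetical.

```python
import numpy as np
from itertools import combinations

# The six CA pairs in a pentamer that are not adjacent in sequence.
NONADJACENT = [(i, j) for i, j in combinations(range(5), 2) if j - i > 1]

def pentamer_signature(ca5):
    """Lengths of the six non-adjacent CA-CA vectors of a pentamer."""
    ca5 = np.asarray(ca5, dtype=float)
    return np.array([np.linalg.norm(ca5[j] - ca5[i]) for i, j in NONADJACENT])

def best_pentamer(query_ca5, database):
    """Find the database pentamer whose signature is closest to the query.

    database maps a fragment name to a 5 x 3 array of CA coordinates.
    Returns (name, rms difference of the six distances in angstroms).
    """
    query = pentamer_signature(query_ca5)
    best = None
    for name, ca5 in database.items():
        deviation = float(np.sqrt(np.mean((pentamer_signature(ca5) - query) ** 2)))
        if best is None or deviation < best[1]:
            best = (name, deviation)
    return best
```

In practice the signatures for the database would be computed once and stored, and the winning pentamer would then be least-squares-fit to the marker residues, with only its middle three residues kept.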


FIG. 3.53 Fitting a side chain by torsioning in xfit. (A) A tyrosine residue that does not fit into its density. The bond to be torsioned is selected by picking the Cα atom followed by the Cβ atom and then choosing the Torsion menu item. This allows all the atoms connected to the Cβ atom to be rotated as a group about the vector defined by the Cα-Cβ bond. (B) The residue is torsioned toward its density. The old position still shows as a guide. (C) After the bond was torsioned into position, the Apply Fit button was pushed and the residue was fixed into its new position. Alternatively, if you choose Refine.Torsion Search, the program will find the best position by trying all 360 degrees and choosing the position that maximizes the electron density at each atom.

Remember that with all the structures known to date, someone has already fit the conformation in front of you. If you find a new conformation never seen before, you will be hard-pressed to make anyone believe you. These novel conformations seem to disappear with further refinement, never to be heard from again.

Fitting Side Chains Side chains have more degrees of freedom than the main chain. If you have fit the main chain accurately, then the Cβ atom is also accurately positioned. If at all possible, avoid breaking the side chain off from the main chain and fitting it separately. Instead, try to position the side chain by torsioning about the twistable bonds (Fig. 3.53). This is best done iteratively. First, try to get a rough position, then adjust the torsion angles to fit the side chain more accurately. At medium resolutions the density is only a rough guide to the positions of the side chain atoms. The density is smoothed and does not closely follow the path of the side chain. It can be difficult to tell the flat face of ring systems, since they appear as a blob. The density at the Cβ atom of many side chains is weak because it lies at the minimum of the ripple from the main chain and the larger side chains. In general, the Cβ atom will not be centered in its density. In fitting medium-resolution maps, it is easy to make the mistake of insisting on centering the Cβ atom in its density. This will cause an error in the main chain because of the constraints on the angles around the Cα atom. Instead, let the Cβ be defined by the main chain, and recognize that it will probably be somewhat off-center from its density. Small side chains, such as alanine and valine, tend to "melt" into the main chain and may appear as large bumps off the main chain. Because of the impossibility of accurately positioning the side chains, fitting is necessarily an iterative process: the residues are roughed in and then further refined as more of the structure becomes apparent. In all fitting it is necessary to keep good stereochemistry in mind. If you fit with correct stereochemistry, the structure refinement will go much faster. When the position of a side chain is ambiguous, build it in a low-energy conformation. Again, novel conformations do occur, but only if there is some reason.
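Torsioning about a bond, and an automatic search over all 360 degrees that keeps the angle with the most density at the atoms, can be sketched as below. This is a hedged illustration of the idea, not the xfit routine: the rotation uses the standard Rodrigues formula, and density_at is a hypothetical callable that returns the map value at a point (in a real program it would interpolate the electron density grid).

```python
import numpy as np

def rotate_about_axis(points, origin, axis_end, angle_deg):
    """Rotate points by angle_deg about the axis from origin to axis_end."""
    points = np.asarray(points, dtype=float)
    origin = np.asarray(origin, dtype=float)
    k = np.asarray(axis_end, dtype=float) - origin
    k = k / np.linalg.norm(k)                      # unit vector along the bond
    theta = np.radians(angle_deg)
    p = points - origin
    rotated = (p * np.cos(theta)
               + np.cross(k, p) * np.sin(theta)
               + np.outer(p @ k, k) * (1.0 - np.cos(theta)))
    return rotated + origin

def torsion_search(moving, ca, cb, density_at, step=1.0):
    """Try every angle about the CA-CB bond and keep the best-scoring one.

    moving holds the coordinates of the atoms beyond CB; the score is
    the sum of density_at(xyz) over those atoms. Returns
    (best angle in degrees, best score).
    """
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(0.0, 360.0, step):
        trial = rotate_about_axis(moving, ca, cb, angle)
        score = sum(density_at(xyz) for xyz in trial)
        if score > best_score:
            best_angle, best_score = float(angle), float(score)
    return best_angle, best_score
```

Because the Cβ density is often weak and off-center at medium resolution, a search of this kind is best scored on the atoms beyond Cβ, leaving the Cβ position to be fixed by the main chain.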

Multiple Conformations Side chains on the surface and, to a lesser extent, buried residues are likely to be in multiple conformations. The density for two (or more) conformations may be apparent in these cases if both have significant occupancies (Fig. 3.55). In some cases the conformations are close enough that densities overlap and cause a confusing widening of the density that no single conformation can adequately explain. In extreme cases a side chain just disappears after a certain point because it has many conformations, all of which have density below the noise level.


FIG. 3.54 Beware of weak Cβ density. At medium resolution the density at the Cβ atom is weak and pinched off. (This is due to series termination errors.) Three examples of the final refined model are superimposed on an excellent quality MIR map: (A) the density is almost gone at the Cβ and positioned off to the side of the correct position; (B) the density is more even, but still the Cβ would not be positioned in the center of the density; and (C) the side chain density is melting into the main chain density because of the incomplete resolution. When fitting such maps, you must take into account the positions of adjacent residues and not force the Cβ atom into the middle of its apparent density.


FIG. 3.55 Example of multiple conformations of side chains. This arginine residue from a β4 hemoglobin shows two equally occupied positions hinged at Cβ. The map is a 1.8-Å resolution Fo - Fc omit map contoured at 2σ. (Figure kindly supplied by Dr. Gloria Borgstahl.)

Phase Bias The single biggest problem with fitting maps is avoiding phase bias, which results from the use of αcalc for calculating electron density Fourier transforms. The phase portion of the Fourier transform tends to dominate the synthesis, such that whatever model you used to calculate the phases is the one seen in the map, even if it is incorrect. Phase bias is especially a problem in partially refined structures with higher R-values. If your R-factor is above 0.25, it is doubtful that a 2Fo - Fc map will give a true picture of the protein. Difference maps, Fo - Fc, are less prone to phase bias and are more reliable. It is wise to keep referring to the MIR map if you have one. The MIR map is probably noisy but is unbiased with respect to any given model and can be used to decide between models and conformations. If you have used molecular replacement, overcoming phase bias can be a real problem. In this case there is no unbiased phase set to work with, and it can be a long uphill struggle to get the R-factor down to values low enough to produce reliable maps. There are two useful methods of lowering phase bias: omit maps and phase combination, where the experimental phases are combined with the model phases (discussed previously).

Omit Maps A major strategy for overcoming phase bias is to use a map made by removing the residues of interest from the model while calculating the phases: that is, an omit map (Fig. 3.56). In theory, this will allow the phases


FIG. 3.56 Omit map phase bias example. A phenylalanine and an isoleucine were purposely misfit and then refined at 1.8-Å resolution for several cycles with the crystallographic terms weighted heavily over the geometry constraints. This had the effect of distorting the molecule and also building in some phase bias for the incorrect structure. (A) Map of the protein after refinement: the thick lines are the incorrect model of the protein after refinement and the thin lines are the correct model. (B) The partial structure factors for the misfit Phe and Ile were calculated and subtracted using the omit current atoms option in xfit and an Fo - Fc omit map was calculated. Now only density for the correct residues shows up. (C) The omitted phases were used to calculate an Fo omit map. Note that in both types of omit map the correct position shows up. In the Fo - Fc map the residues are cleaner. The Fo omit map is noisier, but it has the advantage of being independent of the relative scale of Fo to Fc and so can be advantageous if the scale is uncertain.

calculated from the rest of the model to phase the area of interest with no bias from the model left out. The method takes advantage of the Fourier transform property that every point in real space is influenced by every point in reciprocal space, and vice versa. If the rest of the model is mostly correct, then the phases calculated for this portion will be close to the true phase and will produce a mostly correct image of the portion left out. About 10% of


the total model can be left out without unduly affecting the phase accuracy. Omit maps can be calculated on the fly in xfit if you have read in the map phases. Unfortunately, phase bias can be spread throughout the entire model by the least-squares minimizer used in protein refinement. If the incorrect model has been included in the refinement, the minimizer will adjust the coordinates of the entire model by small amounts to fit the data to the model in question. If a portion of the model is in doubt, it is better not to include it at all than to put in a trial model. After refinement, the incorrect model will be "remembered" by the rest of the structure and will come back even in omit maps. This result can be overcome partly by removing the piece in question and refining without it. XPLOR has a facility for doing molecular-dynamics-coupled refinement on the rest of the model to "shake it up" and remove this memory. A computationally less expensive method that achieves the same result is to add a small random number to all the coordinates. A random number between 0 and 0.25 Å seems to work well. Omit maps made using the phases from the "shaken" coordinates show less bias in incorrect areas (see Chapter 5, Fig. 5.20). Another difficulty encountered with omit maps is calculation of the proper scale factor. This can best be seen by writing out the coefficients used in a 2Fo - Fc map more fully to include the scale factor between Fo and Fc:

$$(2|F_{\mathrm{obs}}| - s|F_{\mathrm{calc}}|)\,e^{-i\alpha_{\mathrm{calc}}} \tag{3.37}$$

The scale factor, s, is calculated to minimize the difference in the sums of Fobs and Fcalc. If a portion of Fcalc has been left out, this scale factor will be incorrect: the scale factor s will be too large. This has the effect of subtracting too much of the calculated structure factors from the map. Now consider that the calculated coefficients have the omitted model subtracted from them. If the scale factor is too large, then we will add to the map -(s - s_correct)(-Fomit), which is a positive image of the model we were trying to omit! If the scale factor is too small, then a negative image will be added to the map. Thus, making an omit map with 2Fo - Fc or Fo - Fc coefficients, where the scale factor is calculated in the normal way, will always bring back a calculated image of the omitted model. This outcome is less than useful, and its effects can be partially mitigated by first calculating the scale factor with the entire model and then using this scale factor instead of the one calculated for the incomplete model. Of course, your model may be incomplete without your realizing it. At early stages, solvent is not included and, usually, neither is an account of the disordered solvent. This raises the scale factor s, which, in turn, causes 2Fo - Fc omit maps to be biased unduly toward the model omitted. An easy way to avoid this is to use Fo coefficients in omit maps. The omitted portion comes back at about half the weight of the rest of the model but is not biased by scale factor problems.
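The scale-factor argument can be checked numerically with a small sketch. The example below uses synthetic structure factors (random complex numbers standing in for a real data set) and a simple ratio-of-sums scale. Both are assumptions made for illustration and are not the scaling actually performed by xfit or XPLOR.

```python
import cmath
import random

def linear_scale(f_obs, f_calc):
    """Scale s that makes the sum of s*|Fcalc| equal to the sum of |Fobs|."""
    return sum(f_obs) / sum(abs(fc) for fc in f_calc)

def map_terms(f_obs, f_calc, s, n_obs=2.0, n_calc=1.0):
    """Return (amplitude, alpha_calc) pairs for n_obs|Fobs| - n_calc*s|Fcalc|.
    n_obs=2, n_calc=1 gives 2Fo - Fc; 1, 1 gives Fo - Fc; 1, 0 gives Fo."""
    return [(n_obs * fo - n_calc * s * abs(fc), cmath.phase(fc))
            for fo, fc in zip(f_obs, f_calc)]

def synthetic_reflections(n=2000, seed=0):
    """Hypothetical data: the 'rest' of the model plus a small omitted part."""
    rng = random.Random(seed)
    rest = [complex(rng.gauss(0, 10), rng.gauss(0, 10)) for _ in range(n)]
    omit = [complex(rng.gauss(0, 3), rng.gauss(0, 3)) for _ in range(n)]
    full = [r + o for r, o in zip(rest, omit)]
    f_obs = [abs(f) for f in full]      # pretend the full model is exact
    return f_obs, full, rest

f_obs, f_full, f_rest = synthetic_reflections()
s_full = linear_scale(f_obs, f_full)      # scale from the complete model
s_partial = linear_scale(f_obs, f_rest)   # scale from the incomplete model
# Omit-map terms that reuse the full-model scale, as suggested in the text
omit_terms = map_terms(f_obs, f_rest, s_full)
```

With these synthetic data `s_partial` comes out larger than `s_full`, which is the effect described above. Passing `n_obs=1.0, n_calc=0.0` to `map_terms` gives the Fo coefficients, which do not depend on `s` at all.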


Adding Waters and Substrates A difference Fourier, Fo - Fc, will show unaccounted density. Often this density can be explained by water molecules. Protein molecules commonly have water bound in crevices and near hydrogen-bonding groups on the surface. If the potential water molecule does not make a bad contact with the protein and there are hydrogen-bonding groups at the correct distance, then a water molecule is warranted. Start adding water molecules at the largest peaks in the difference map and work your way down. Symmetry atoms should be generated periodically to ensure that a water position has not been accounted for already. It is important not only to avoid adding too many waters too quickly but to avoid adding them where they are not justified. Because of the low number of constraints on water coordinates (i.e., they are not connected directly to the rest of the structure), they can model phase bias better than the highly constrained protein. It is dangerous to put waters in by fitting all the peaks in a difference map: the practice can essentially freeze the refinement by modeling the noise with a large number of free parameters. This will allow the minimizer to fit the noise and lower the R-factor, but not necessarily with any justification. On the other hand, adding water in correct positions will help the refinement. When the waters are being added to density whose peak value is below 2σ or so in a 2Fo - Fc electron density map, it may be worth checking their validity by protein refinement. Refine the B-values of the water molecules. If the B-values of the water molecules go above a conservative value of, say, 50.0, it is probable that the peaks are actually just noise. A difficulty often encountered in adding water is partial disorder. In a crevice, for example, there is often a tube of density that represents the averaged position of several waters (see Fig. 3.15B).
Most refinement programs will not let atoms get closer than their van der Waals contact distance, so it may not be possible to model the water with discrete atoms. Such density can be modeled with special water molecules that have no, or very small, van der Waals radii. Several waters, each with a partial occupancy, can be placed in the density. Finally, not all unexplained density in protein maps is necessarily water. Buffer molecules and/or salt may also be bound by the protein. If a piece of density that needs several waters to explain it is encountered, it is worth carefully considering the exact contents of the crystal mother liquor. It is common for such density to be fit as water in early stages and then later, as the refinement progresses and the phases improve, to find the density revealing that another molecule is weakly bound there. It is worth reexamining all the waters after the R-value has decreased to find such features. Try to explain


the density in terms of chemistry as well as shape. If you have three negative side chains pointing toward one water molecule, it is worth considering that a cation might be bound there.
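The water-picking criteria described in this section can be sketched as a simple screen. This is a minimal illustration and not a real water-picking program. The 2.4-Å bad-contact cutoff is an assumed value, the 2.7 to 3.3 Å hydrogen-bond range is the one quoted under Hydrogen Bonding in Section 3.13, and symmetry-related atoms are assumed to have been added to the atom list beforehand.

```python
import math

def accept_water(peak_xyz, atoms, bad_contact=2.4, hbond=(2.7, 3.3)):
    """atoms: list of (element, (x, y, z)). Accept a peak only if nothing is
    too close and at least one N or O lies at hydrogen-bonding distance.
    bad_contact is an assumed cutoff, not a value from the text."""
    has_partner = False
    for element, xyz in atoms:
        d = math.dist(peak_xyz, xyz)
        if d < bad_contact:
            return False
        if element in ("N", "O") and hbond[0] <= d <= hbond[1]:
            has_partner = True
    return has_partner

def pick_waters(peaks, atoms):
    """peaks: list of (height_in_sigma, (x, y, z)). Work down from the
    largest peak; accepted waters join the atom list so that later peaks
    are also checked against them."""
    current = list(atoms)
    accepted = []
    for height, xyz in sorted(peaks, key=lambda p: p[0], reverse=True):
        if accept_water(xyz, current):
            accepted.append(xyz)
            current.append(("O", xyz))
    return accepted

def prune_by_b(waters, b_max=50.0):
    """waters: list of ((x, y, z), refined_B). Drop waters whose B-value
    rose above the conservative limit suggested in the text."""
    return [w for w in waters if w[1] <= b_max]
```

A screen of this kind only proposes candidates. Each accepted position should still be inspected in the map, and density that needs several waters to explain it should be checked against the contents of the mother liquor as described above.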

3.13 ANALYSIS OF COORDINATES

Lattice Packing There are several reasons for looking at the lattice contacts. First, the contact regions will have lowered B-values because of the increased resistance to movement. If you are doing a B-value analysis, this has to be taken into account. Another reason is to look for protein-protein interactions. In many cases, interactions that are important in the protein functioning are revealed in the lattice packing. For example, the protein that acts as a dimer in solution may have crystallized in such a way that the dimer was coincident with a crystallographic axis. The asymmetric unit would be a monomer, and lattice packing would reveal the dimer. Many molecular modeling programs have the ability to generate symmetry-related molecules, given the space group operators (Fig. 3.57). If not, they can always be generated in a separate step and read into the program.

FIG. 3.57 Lattice packing of endonuclease III. Endonuclease III crystallizes in space group P2₁2₁2₁ (three perpendicular 2₁ screw axes) with one molecule per asymmetric unit. The box outlines one unit cell. The backbone of a single protein molecule is shown in thick black lines. The three other molecules that make up the unit cell are shown in shades of gray and are generated by application of the symmetry operators.


When generating contacts, do not forget that translation by a cell constant in any of the three directions is a valid operator. That is, the operators listed in the International Tables assume x ± 1.0, y ± 1.0, z ± 1.0, since this is the fundamental relationship of crystals. To generate all possible contacts, generate a single cell by symmetry operators and then generate a 3 × 3 × 3 block of unit cells around the molecule of interest by translating the single cell 0, +1.0, -1.0 in all three directions. Then search to see whether any atom within the molecule of interest is within some cutoff distance of each of the symmetry-related molecules. The atoms at the contacts cannot come any closer than van der Waals packing distance. If they do, something is wrong. Either the structure is incorrect, the unit cell is incorrect, or there is an error in one or more of the symmetry operators. Obviously, two atoms cannot exist in the same space.
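The contact search just described can be sketched in a few lines. The example below assumes fractional coordinates and a cell with orthogonal axes, so that converting to angstroms is a simple multiplication by a, b, and c. A general cell would need the full orthogonalization matrix. The operator list is the standard set for P2₁2₁2₁, and the function name and the 4-Å default cutoff are choices made for this illustration.

```python
import itertools
import math

# The four symmetry operators of P2(1)2(1)2(1), acting on fractional coordinates
P212121 = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (0.5 - x, -y, 0.5 + z),
    lambda x, y, z: (-x, 0.5 + y, 0.5 - z),
    lambda x, y, z: (0.5 + x, 0.5 - y, -z),
]

def lattice_contacts(frac_atoms, cell, operators, cutoff=4.0):
    """Return (operator index, translation, atom i, atom j, distance) for every
    pair closer than `cutoff`, where atom i is in the reference molecule and
    atom j is in a symmetry mate within the 3 x 3 x 3 block of cells.
    Assumes an orthogonal cell (a, b, c) in angstroms."""
    a, b, c = cell

    def to_cart(f):
        return (f[0] * a, f[1] * b, f[2] * c)

    reference = [to_cart(f) for f in frac_atoms]
    contacts = []
    for op_index, op in enumerate(operators):
        mates = [op(*f) for f in frac_atoms]
        for t in itertools.product((-1, 0, 1), repeat=3):
            if op_index == 0 and t == (0, 0, 0):
                continue                      # this is the molecule itself
            for j, m in enumerate(mates):
                mate = to_cart((m[0] + t[0], m[1] + t[1], m[2] + t[2]))
                for i, ref in enumerate(reference):
                    d = math.dist(ref, mate)
                    if d <= cutoff:
                        contacts.append((op_index, t, i, j, d))
    return contacts
```

Any contact reported at much less than van der Waals distance is the warning sign discussed above: the structure, the unit cell, or one of the operators is in error. For a whole protein the double loop over atoms is slow, and a real program would use a grid or cell list to speed up the search.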

Hydrogen Bonding Hydrogen bonds are found in protein crystallography indirectly. If a proper hydrogen bond acceptor-donor pair is within the correct distance, the bond is taken to be a hydrogen bond. This distance is generally considered to be from 2.7 to 3.3 Å, with 3.0 Å being the most common value for protein and water hydrogen bonds.⁴³ The angle the bond forms is also important in determining the strength of the hydrogen bond. The closer the hydrogen bond is to correct geometry, the stronger the bond. Hydrogen bonds often occur in networks, frequently with water mediating. Water is especially facile at hydrogen bonding because it is both an acceptor and a donor. Histidines can have various protonation states, and an analysis of the hydrogen bonds can allow a determination of the most likely protonation state according to whether hydrogen is bonded to a donor or to an acceptor. In a similar manner, the orientations of histidines, threonine, glutamine, and asparagine are ambiguous in protein maps where the slight density difference between a carbon, oxygen, or nitrogen atom cannot be safely distinguished. An analysis of hydrogen bonding can give important clues in determining the correct orientation of an ambiguous side chain (Fig. 3.58).
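As a rough illustration of how hydrogen-bonding patterns can settle an ambiguous amide orientation, the sketch below scores the two possible assignments of the terminal O and N of an Asn or Gln side chain. It uses only the 2.7 to 3.3 Å distance criterion from the text and ignores bond angles. The scoring scheme and function names are assumptions made for this example, not an established algorithm.

```python
import math

def hbond_score(side_chain, neighbours, lo=2.7, hi=3.3):
    """side_chain: list of (role, xyz) with role 'donor' or 'acceptor'.
    neighbours: list of (role, xyz); role 'both' is used for water.
    Each pair within range scores +1 if a donor faces an acceptor and
    -1 if two donors or two acceptors face each other."""
    score = 0
    for role, xyz in side_chain:
        for n_role, n_xyz in neighbours:
            if lo <= math.dist(xyz, n_xyz) <= hi:
                compatible = n_role == "both" or n_role != role
                score += 1 if compatible else -1
    return score

def best_amide_orientation(o_xyz, n_xyz, neighbours):
    """Compare the side chain as built (O is the acceptor, N is the donor)
    with the flipped assignment in which the two atoms trade places."""
    as_built = [("acceptor", o_xyz), ("donor", n_xyz)]
    flipped = [("acceptor", n_xyz), ("donor", o_xyz)]
    s_built = hbond_score(as_built, neighbours)
    s_flip = hbond_score(flipped, neighbours)
    if s_built == s_flip:
        return "ambiguous"
    return "as built" if s_built > s_flip else "flipped"
```

A neighbouring water scores the same for either orientation, since it can act as donor or acceptor. This is why water-mediated contacts alone often leave the assignment ambiguous.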

Solvent-Accessible Surfaces Ultimately, it is the shape and chemical characteristics of the solvent-accessible surface that determine a protein's interaction with other proteins and substrates (Fig. 3.59). A solvent-accessible surface can be calculated and

⁴³ Stickle, D. F., Presta, L. G., Dill, K. A., and Rose, G. D. (1992). J. Mol. Biol. 226, 1143-1159.


FIG. 3.58 Using H-bonding patterns to show Gln, Asn orientation. By examining the pattern of hydrogen bond acceptors and donors, it is often possible to assign the N and O atoms of glutamine and asparagine side chains.

color-coded by atom type to give some idea of the nature of this surface and of the solvent accessibility of surface groups.⁴⁴ Such a surface assumes a static view of the structure, which is an incorrect assumption. Another problem is dealing with hydrogens. Explicitly placed hydrogens are only part of the answer, since they can be rapidly rotating. To overcome this, an average radius is often added to atoms containing hydrogens, and the hydrogens are left out of the model. For some residues the degree of protonation can also be ambiguous. Despite these limitations, such surfaces are widely used in looking for substrate-binding sites and at protein interfaces. Before calculating the surface, choose the probe size. A water molecule is generally taken to be a sphere of radius 1.4 Å for calculating solvent-accessible surfaces. In many cases, a larger radius of 1.6 Å is desirable. This helps account for small differences in atom positions and for thermal motion

⁴⁴ Richards, F. M. (1985). "Calculation of Molecular Volumes and Areas for Structures of Known Geometry," in Methods in Enzymology, Vol. 115, pp. 440-464. Academic Press, San Diego.

FIG. 3.59 Three common uses of solvent-accessible surfaces. (A) The surface of an active site. Because it is bound to the Fe, the water, HOH, can be closer to the solvent-accessible surface. (B) A slice through a protein molecule, showing the shape of the outer surface and the presence of cavities in the interior of the protein. The active site of the protein is located at right center as a deep cleft in the protein. This is a common feature for active sites, and such a surface is a useful clue to the position of an active site. (C) The interface at a protein-protein contact. On color displays it is useful to color the surface according to charge to facilitate the search for electrostatic interactions and hydrophobic contact surface.


of side chains. The density of points to use in calculating the surface is also needed. A density of 10 points per square angstrom is common, while a lower density of 4 may be used to generate sparse surfaces. The higher density should be used if an accurate measure of the surface area is needed; the lower density will generate fewer points, which will rotate faster when viewed with a graphics program. The surface is then calculated by running MS⁴⁵ (or another suitable program) with the chosen probe size. The output surface file can be viewed with xfit by running a simple awk command that extracts the x, y, z coordinates in the output surface file. It is also possible to use information in the surface file to color surfaces based on different parameters.

⁴⁵ Connolly, M. L. (1983). Science 221, 709-713; J. Appl. Crystallogr. 16, 548-558.
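To show how the probe size and the point density enter such a calculation, here is a minimal dot-surface sketch. It is not the MS program and does not implement Connolly's algorithm. Instead it uses a simple point-sampling approach in which dots are placed on each atom's sphere, expanded by the probe radius, and a dot is kept only if it lies outside every other expanded sphere. The function names and the golden-spiral point generator are choices made for this example.

```python
import math

def sphere_points(n):
    """n roughly evenly spaced unit vectors (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = i * golden
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def accessible_surface(atoms, probe=1.4, density=10.0):
    """atoms: list of ((x, y, z), vdw_radius). probe is the probe radius in
    angstroms and density is dots per square angstrom.
    Returns (list of surface dots, accessible area in square angstroms)."""
    dots, area = [], 0.0
    for i, (ci, ri) in enumerate(atoms):
        big_r = ri + probe                       # expanded sphere radius
        sphere_area = 4.0 * math.pi * big_r ** 2
        n = max(1, int(round(density * sphere_area)))
        per_dot = sphere_area / n                # area represented by one dot
        for ux, uy, uz in sphere_points(n):
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            buried = any(j != i and math.dist(p, cj) < rj + probe
                         for j, (cj, rj) in enumerate(atoms))
            if not buried:
                dots.append(p)
                area += per_dot
    return dots, area
```

Calling `accessible_surface(atoms, probe=1.4, density=10.0)` gives the finer sampling recommended for area measurements, while `density=4.0` produces a sparser dot set that is lighter to display. The test against every other atom makes this sketch quadratic in the number of atoms, which is acceptable for an active site but slow for a whole protein.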


4 XtalView TUTORIALS

XtalView, a crystallographic software package that is designed to be interactive and visual,¹ runs on systems with UNIX and X11. The programs were written at the Scripps Research Institute and are distributed by the Computational Center for Macromolecular Structures (CCMS, http://www.sdsc.edu/CCMS/, academic only) at the San Diego Supercomputer Center, and commercially by MSI (Molecular Simulations, Inc., http://www.msi.com). The main uses of XtalView are heavy-atom phasing, map contouring, map fitting, and molecular modeling.

4.1 INSTALLATION

Before continuing with this chapter you should install XtalView on your computer. It's easy to do, and you can follow along on the tutorials as you read them. Send an e-mail to the CCMS listserver to get the current download instructions.

¹ McRee, D. E. (1999). XtalView/Xfit - A versatile program for manipulating atomic coordinates and electron density. J. Struct. Biol. 125, 156-165. See also the Scripps XtalView web site at http://www.scripps.edu/pub/dem-web.

E-mail: [email protected]
Subject: [leave blank]
Message: get xtalview

In a few minutes you should get a reply e-mail with the download instructions. Follow the instructions to the ftp site. Note: The directories just above the XtalView download are hidden so hackers can't see them, so if you are using a browser to ftp you need to type the entire ftp path on the URL line rather than following the path down to it. As explained in the message, you need to get two files, the core XtalView and the one specific to your machine. Also get install.sh. Place all three files in one directory and then from the command line type "sh install.sh" and follow the prompts. You don't need to be superuser to install XtalView unless you want to place it in a system area. The directory where you place the XtalView directory tree will become XTALVIEWHOME. To start XtalView you type "source xtalvu_dir/XtalView.env" (or add it to your .cshrc), where xtalvu_dir is the directory with XtalView, and then type "xtalmgr" or "fonts; xtalmgr" if you are on an SGI or DEC workstation. The fonts command is needed to put the XtalView fonts into the font path on some systems. For example data, see $XTALVIEWHOME/examples. More example data may be found at the book web site: http://ppcII.scripps.edu.

4.2 OBTAINING HELP

Help can be reached via e-mail by sending a message to [email protected]. Try to be as specific as you can about your problem. Cut and paste any error messages directly into the e-mail. You may also look at the book web site for late-breaking information and FAQs, http://ppcII.scripps.edu.

4.3 XView

To achieve portability, we decided to use the XView toolkit (hence the name), which is in the public domain and available for a large range of platforms. XView also had the advantage that it had one of the few graphical editors available in 1991, when the package was started.

[Fig. 4.1 panel labels: (a) "Do It" button; (b) "Menu" menu button; (c) "More..." button.]

FIG. 4.1 (a) Clicking on this button initiates the action specified. The button becomes shaded and remains so for as long as the actions take to finish. (b) Menu buttons have pulldown menus associated with them. If the button is clicked on with the Select button, the default menu item is actuated. If the Menu button is pressed and held, a menu appears. (c) Clicking this button with Select pops up a window with more information/controls.

XView Widgets The XView widgets used by XtalView and their usage are shown in Figs. 4.1-4.5.

Saving Textpane Messages Every XtalView application has a textpane where various messages are sent. This effectively forms a log of each session. It can be saved if the user wishes by selecting the File/Save As... command on the Text Pane menu that appears when the right mouse button is pressed anywhere in the textpane. XtalView is organized around the concept of a crystal and a project. A crystal is a file containing unit cell, space group, and noncrystallographic symmetry information about a crystal type. Usually, there will be many data sets that share this information. To supply this information to the programs, the crystal name is passed. A project is a directory and default crystal. Typically a project uses one crystal, but sometimes several may be used. XtalView also creates history files with the extension .hist to keep track

[Fig. 4.2 callouts: "Drag here to change value" (slider labeled "Value: 47", with limits 1 min and 100 max); "Click here to enter value from keyboard".]

FIG. 4.2 The slider is used to change numerical values; that is, you can adjust to integral values between the minimum and maximum values shown. When a floating-point value is desired, the slider value is usually 100 times the actual value used, and this will be indicated in the legend. If you enter the value from the keyboard, as is often desirable when a precise value is needed, be sure to hit return to enter the value.

[Fig. 4.3 callouts: "Raise Value by 1"; "Lower Value by 1"; "Click here to enter value from keyboard".]

FIG. 4.3 Number fields are used to enter or display integer values. If you enter the value with the keyboard, be sure to hit return to register the new value. If you click on the arrow boxes, the number is incremented or decremented by 1. If the arrow box is grayed, the number is at its limit and cannot be changed in that direction.

of the extensive data path. These files are all compact to take up minimal disk space and should not be deleted (until the paper is published). They allow backtracking to find out where things went wrong and also allow reproducing large files that may have been deleted to save disk space. The user interface to XtalView has as its aim to provide a visual interface to what was traditionally a punched-card and line-printer-oriented computing field, namely macromolecular computing. The user should be able to find program options by browsing the screen and to see output in a graphical manner when appropriate. The user may not know what a field does by simple inspection, but browsing does give a starting point for using help. One of the most glaring flaws in XtalView is the lack of context-sensitive help. Unfortunately this proved to be very clumsy to implement with XView across multiple platforms. There is, however, online help in HTML manual pages in the $XTALVIEWHOME/doc directory, and on the World Wide Web at CCMS. Table 4.1 lists the XtalView programs. An XtalView session starts with xtalmgr (Fig. 4.6), which is used to launch the other applications with the appropriate files.

The xtalmgr Note: Items that appear on the xfit windows are printed in bold. The first operation is to set a project by selecting one with the menu button or, if none exists, editing a new one by selecting the Edit button next to the Crystal field.

[Fig. 4.4 callouts: "Label:" followed by alphanumeric text (here "10.0 A"); "Click here to enter text from keyboard".]

FIG. 4.4 Text fields are used to enter or display alphanumeric values.

[Fig. 4.5 panel labels: (a) "Select an Option:" with choices Fire, Air, Water; (b) two check boxes labeled "Option:"; (c) "Element:" abbreviated menu button showing Fire.]

FIG. 4.5 (a) Setting: Multiple choices can be made. In (a), Air has been chosen, as shown by its highlighted box. To select another option, click anywhere within the other box. (b) Check box. The upper check box is not selected, and the lower one has the option selected or turned on. To toggle the selection, click on the square box. (c) Abbreviated menu button. This is similar to a setting, except that the options are accessed by clicking on the button with the downward-pointing arrow, which pops up a menu of setting buttons. This saves space when many options are needed. The currently selected option is displayed on the right, while the label for the button is on the left.

TABLE 4.1 XtalView Applications

Application    Description
-----------    -----------
xcontur        Density map contouring and printing
xdf            Disk-free meter
xedh           Electron density histograms
xfft           Fast Fourier transform of electron density and Patterson maps
xfit           Fitting and model building; display of electron density
xheavy         Refinement of MIR data and calculation of phases
xmerge         Merge and scale data sets
xmergephs      Phase mutant data with native phases; cross-Fouriers; Bijvoet difference Fourier
xpatpred       Predict Patterson peaks from trial solutions; display solution with xcontur
xprepfin       Import/export data to XtalView, reduce indices, fix various possible problems
xresflt        Resolution filter
xrspace        Reciprocal space viewer to check data completeness, symmetries, etc.
xtalmgr        Top level of XtalView: manages database, files, and projects; launches applications
xhercules      Automated Patterson map solving
stfact         Calculate structure factors from PDB files


FIG. 4.6 Xtalmgr is used to control the other applications and manage files.

In this window (Fig. 4.7) one can edit the parameters of a crystal. In particular, all 230 space groups are known and can be found by entering the symbol on the Space Group line and hitting return or by setting the space group number field and selecting Find Space Group by Number. Each crystal is given a unique keyword in the Crystal field. When the information has been updated, select Update This Crystal to store the information in the database. The directory in xtalmgr is usually set by setting the project. However, one can also manually set the directory by typing it in or using the Browse... button. The Browse window also allows deleting files. When the crystal and directory have been set, either by choosing a project or by entering them specifically, an application needs to be chosen. Use the Applications menu button to bring up a list of applications, as shown in Fig. 4.8. Drag the mouse over, and select one of the applications by choosing its icon. This will cause xtalmgr to list the files that can be used by that application² and to start building a command line. To finish the command line

² The application list is stored in a plain text file, $XTALVIEWHOME/data/Applications. This file can be edited to add or delete applications. You can also customize the file extensions used by XtalView here.


FIG. 4.7 Xtalmgr Crystal Editor is used to enter crystallographic parameters.

you need to choose one or two input files from the lists by clicking on them. If you need an output file, you can either enter a name or use the Auto Name Output button to make up a name from the input file[s]. The filenames are then added with the Add Args button, and the command launched with Run Command.

The following gives short descriptions of common applications of XtalView to give a flavor of what the programs can do. The best way to learn the programs is to plunge in and start playing. Be sure to explore all the menus and options. See $XTALVIEWHOME/examples for example data.

Preparing Data Xprepfin (Fig. 4.9) is used to convert data file formats to and from .fin files. The main file format used in XtalView is the .fin file. A .fin file has the following format:

h k l F1 sigma(F1) F2 sigma(F2)


FIG. 4.8 Application menu button from xtalmgr. Right-click on the button to open the Applications menu.

The files are free-format, where hkl are integers and the rest are reals. If either F1 or F2 is missing, then it should be entered as 0.0 with the sigma 9999.0. Xprepfin contains a number of routines for converting popular formats. You can also add a new one by editing and following the instructions in $XTALVIEWHOME/data/OtherFormats. To do this, a filter program is needed that takes as standard input (unit 5 in FORTRAN) the format to be converted and outputs to standard out (unit 6 in FORTRAN) in .fin format. In a typical .fin file F1 and F2 represent the amplitudes of Bijvoet pairs, F+ and F-. The other common format is to have two merged data sets such as native data and heavy-atom data: F_native and F_heavy, or F_wildtype and F_mutant. The main difference is in how centric reflections are handled. With Bijvoet pair data, centrics do not have a Bijvoet pair, and this is indicated by having the missing F set to 0.0 and its sigma set to 9999.0. To enforce this rule set the Data Are switch to Bijvoet pairs. One can also reduce the indices of incoming data and put them into a single unique volume in reciprocal space. This modification may be needed for later steps when merging reflections.
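The .fin record layout and the convention for missing observations can be captured in a short sketch. The reader and writer below are illustrations written for this section and are not part of XtalView. The function names and the two-decimal output are assumptions, and only the seven-field layout and the 0.0/9999.0 convention come from the text.

```python
MISSING_F = 0.0
MISSING_SIGMA = 9999.0

def parse_fin_line(line):
    """Split one free-format .fin record into (hkl, obs1, obs2).
    Each observation is an (F, sigma) pair, or None if flagged as missing."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    h, k, l = (int(v) for v in fields[:3])
    f1, sig1, f2, sig2 = (float(v) for v in fields[3:])

    def observation(f, sig):
        missing = f == MISSING_F and sig >= MISSING_SIGMA
        return None if missing else (f, sig)

    return (h, k, l), observation(f1, sig1), observation(f2, sig2)

def format_fin_line(hkl, obs1, obs2):
    """Write a record, using 0.0 and 9999.0 for a missing observation."""
    def pair(obs):
        f, sig = obs if obs is not None else (MISSING_F, MISSING_SIGMA)
        return f"{f:.2f} {sig:.2f}"
    return f"{hkl[0]} {hkl[1]} {hkl[2]} {pair(obs1)} {pair(obs2)}"
```

For Bijvoet-pair data, a centric reflection would come back from `parse_fin_line` with one observation set to `None`, which makes it easy to count or check such reflections before merging.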


FIG. 4.9 Xprepfin data preparation program.

Xprepfin can also be used to export data from a .fin file to XPLOR, CCP4, and other formats and to convert between XtalView file formats.

Other Formats A mechanism exists for importing virtually any reflection file into xprepfin. A filter is needed that can convert the input file on the UNIX stdin and output a free-format list of reflections on stdout in the order h,k,l,F1,s(F1),F2,s(F2). F2 and s(F2) can be replaced by "0.0 9999.0" to indicate a missing observation. In this manner xprepfin can be used to reduce indices, identify Bijvoet pairs, merge duplicate reflections, and otherwise ensure that the file is in proper XtalView .fin format. The filter can be a small program or a shell file with an awk script or any other command. Mark the file as executable (chmod +x file) and place it in your path. Then edit the file $XTALVIEWHOME/data/OtherFormats. Follow the instructions in the file and place an entry at the end. There are already a number of standard XtalView filters in the file. The next time you start xprepfin it will read this file and make the command available on the Other Formats menu button.
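A filter of the kind described here can be very small. The sketch below assumes a hypothetical input format with one reflection per line as "h k l F sigF" and with "#" marking comment lines. It writes each reflection in the h,k,l,F1,s(F1),F2,s(F2) order with the second observation flagged as missing. The input layout and the function names are assumptions for illustration, and a real filter must match the file actually being imported.

```python
import sys

def convert(lines):
    """Yield .fin records from lines of a hypothetical 'h k l F sigF' file.
    Blank lines and lines starting with '#' are skipped."""
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        h, k, l, f, sig = text.split()[:5]
        # F2 and s(F2) are written as "0.0 9999.0" to mark them as missing
        yield (f"{int(h)} {int(k)} {int(l)} "
               f"{float(f):.2f} {float(sig):.2f} 0.0 9999.0")

def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read the foreign format on stdin and write .fin records to stdout."""
    for record in convert(stdin):
        stdout.write(record + "\n")

# To use this as a filter, save it as an executable script and
# end the file with a call to main().
```

Once such a script is executable, on your path, and has a call to `main()` at the end, it can be tried from the command line by redirecting a data file into it before an entry for it is added to the OtherFormats file.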


Here is an example shell script that will convert a CCP4 file:

    #!/bin/csh -f
    # usage: scalamtz2fin < scala.mtz > out.fin
    set infile=infile.tmp
    set outfile=outfile.tmp

    # this complains about an inappropriate operation but it works
    cat > $infile

    # output of mtz2various is saved in mtz2various.log
    mtz2various hklin $infile hklout $outfile << eof > mtz2various.log
    labin I(+)=I(+) SIGI(+)=SIGI(+) I(-)=I(-) SIGI(-)=SIGI(-)
    OUTPUT USER '(3I5,4F12.2)'
    END
    eof

    # place outfile on stdout
    cat $outfile
    rm $infile
    rm $outfile

Using this as a guide, you can easily convert any other CCP4 file by changing the labin line to the appropriate labels for your file. Of course you must have CCP4 executables installed on your computer and in your path for this to work. If you are just going to run it every once in a while, you can make a simpler script and load the output into xprepfin using the xtalmgr or on the command line. Here's a similar script that leaves the output in for_xprepfin.fin:

    #!/bin/csh -f
    mtz2various hklin my_file.mtz hklout for_xprepfin.fin << eof
    labin I(+)=I(+) SIGI(+)=SIGI(+) I(-)=I(-) SIGI(-)=SIGI(-)
    OUTPUT USER '(3I5,4F12.2)'
    END
    eof

See the CCP4 manual if you need more information. The CCP4 URL is listed in Appendix B, Useful Web Sites. Output to CCP4 is much more straightforward: just select the option. Xfit will place data in a new mtz file with the labels H K L F1 SIGF1 F2 SIGF2. Again, you must have CCP4 executables installed on your computer and in your path for this to work.

Merging Heavy-Atom Data

First run your native and heavy-atom data sets through xprepfin to prepare your data and put it in the .fin format, if needed, as described in the preceding section. If needed, reduce the indices on both sets of files with xprepfin. This step ensures that both data sets have their reflections indexed in the same asymmetric unit. Run xmerge: specify the native file first and the heavy-atom file second. Xmerge scales the second file to the first, which it leaves unchanged. The program has two file options for the output: another .fin file or a double-fin file, .df. The double-fin file preserves the Bijvoet differences of both the native and the derivative, and the .fin file merges the Bijvoets to a single F for the native and another single F for the derivative. Both files have advantages and disadvantages; however, a .df can be converted to a .fin with xprepfin, whereas the opposite is not true, so .df is the default.

There are two scaling options that can be set: the number of bins of resolution and isotropic or anisotropic scaling (Fig. 4.10). The number of bins should be set such that enough reflections are included in each bin. Too fine a bin will overscale the data and reduce the signal in the differences. A good bin size will give about 100 reflections in the smallest bin. Anisotropic scaling uses six scale parameters per bin, so the number of reflections per bin should be higher, say 500 reflections for anisotropic scaling. After the data have been scaled, xmerge displays several graphs, which can be used to determine the quality of a heavy-atom derivative. For more information on how to use these graphs see Chapter 3, Heavy-Atom Statistics, in Sec. 3.3. If you are using anisotropic scaling, look at the Message window to see if the scale factors for a shell go up after the anisotropy from the single scale factor has been applied. If they do, try using fewer bins. If this doesn't work, use single scaling.

FIG. 4.10 Xmerge data merging program. The graph shows the merging statistics as a function of increasing resolution; the solid line is the R-factor and the dotted line is the size of the differences.
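The idea of scaling the second data set to the first in resolution bins can be illustrated with a short sketch. It is not xmerge's actual algorithm, which is not given here: the least-squares scale per bin, the equal-population bins, and the names bin_scale and r_factor are all assumptions made for illustration.

```python
def bin_scale(refls, n_bins):
    """Scale derivative amplitudes to native amplitudes in resolution bins.

    refls is a list of (resolution_in_angstroms, f_native, f_derivative).
    Returns the scaled reflection list and the scale factor used in each bin.
    """
    # Sort from low resolution (large d) to high resolution (small d).
    ordered = sorted(refls, key=lambda r: -r[0])
    # Reflections per bin (ceiling division), never less than one.
    size = max(1, -(-len(ordered) // n_bins))
    scaled, scales = [], []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        num = sum(fn * fd for _, fn, fd in chunk)
        den = sum(fd * fd for _, _, fd in chunk)
        k = num / den if den > 0 else 1.0  # least-squares scale for this bin
        scales.append(k)
        scaled.extend((d, fn, k * fd) for d, fn, fd in chunk)
    return scaled, scales


def r_factor(scaled):
    """Sum of |F_native - F_derivative| over the sum of F_native."""
    return (sum(abs(fn - fd) for _, fn, fd in scaled)
            / sum(fn for _, fn, _ in scaled))


# Made-up data: the derivative is weaker by a resolution-dependent factor.
data = [(10.0, 100.0, 50.0), (8.0, 80.0, 40.0), (3.0, 60.0, 15.0), (2.5, 40.0, 10.0)]
scaled, scales = bin_scale(data, n_bins=2)
print(scales, r_factor(scaled))
```

With very few reflections per bin the per-bin scale can absorb the real differences, which is the overscaling the text warns about.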

Patterson Solutions

XtalView includes a program, xhercules,[3] designed to automatically solve Patterson maps by using a correlation function computed at every position in the unique volume of the unit cell. The method starts with one atom and then adds a second atom and so forth until the Patterson vectors have been fulfilled. To start the procedure, a Patterson map is first calculated with xfft, using the differences between either isomorphous pairs or anomalous scattering pairs. The outlier filter should be set to about 100 to filter out ridiculous differences, usually caused by scatter off the beam stop or by the beam stop being slightly different between the two crystals. This map is then contoured with xcontur and left open for later comparisons with the output of the automated heavy-atom solution procedure.

Xhercules (Fig. 4.11) works in both the anomalous and isomorphous cases: simply use the appropriate .fin file with Bijvoet pairs in the anomalous case or merged heavy-atom data in the isomorphous case. If you made a .df file when using xmerge to merge the native and derivative data, then use xprepfin with the df(1 + 2,3 + 4) option to make a .fin file. For the first position both the absolute configuration (hand) and the origin are arbitrary, so that for an orthorhombic crystal only the volume 0-0.25, 0-0.25, 0-0.25 need be searched.[4] Start xhercules and enter in this volume. Leave all the file fields blank, except for the .fin file, which will have the name of your scaled and merged derivative difference data.

To verify sites found in xhercules, use xpatpred to see if the predicted vectors for that site agree with the Patterson map as detailed below. You can also inspect the solution(s) against the correlation map by pressing View Correlation Map. This starts xcontur with your map and loads the solution numbers as labels. Usually the highest peak is chosen as the correct site, but occasionally a bogus solution crops up with one or more of the coordinates exactly 0.0 or 0.5. These can be found by comparison to the Patterson map. Having decided on a first site, you save this into a solution file with xpatpred and then give xhercules this file as input. This time, when xhercules is run, it will keep the site(s) in the solution file fixed and search for a second site. Having one fixed site is equivalent to making an origin choice, and the unique volume thus increases in an orthorhombic space group to 0-0.5, 0-0.5, 0-0.5. Fixing two sites fixes the absolute configuration (hand), and thus one needs to search the entire asymmetric unit for a third site, or 0-1, 0-0.5, 0-0.5 in the orthorhombic case.

To see if a site is correct, it is important to compare the predicted vectors with the actual ones in the Patterson map. For this use xpatpred. Put up the Patterson map with xcontur and also start xpatpred. Enter the heavy-atom site into xpatpred by typing in its coordinates from xhercules and selecting Insert. Enter a filename for the Predictions File (I usually use the filename "pred"). Now select the Predict button, which writes a labels file for xcontur. In xcontur, select the Files... button, enter "pred" in the Labels: field, and press Load Labels. The file will be loaded as labels filling the volume 0-1, 0-1, 0-1. You can then look at the Harker sections to see if the self-vectors are there, and, if there is more than one site, look through the Patterson to see if the cross-vectors fall on density. If another program is used to solve for heavy-atom positions or if you figure them by hand, you can still use xpatpred and xcontur to check the positions against the Patterson map. It often happens that two single sites are evident from the Patterson map, and the problem becomes that of fixing the relative origin of one site to another. In this case, one can use xpatpred to cycle one of the sites through all the origin choices and simply look in xcontur to see which pair gives the best match to the cross-vectors.

FIG. 4.11 Xhercules heavy-atom searching program.

[3] For X heavy-atom reciprocal-space correlation search, with some poetic license.

[4] The asymmetric unit for an orthorhombic crystal (with no C centering) is one-fourth of the cell. Looking in the International Tables, you will find that the volume 0.0-1, 0.0-1/2, 0.0-1/2 describes one possible asymmetric unit. Since the hand is ambiguous, this is the same as adding an inversion center, which gives us the Patterson symmetry of Pmmm, which has the asymmetric volume of 0.0-1/2, 0.0-1/2, 0.0-1/2, or 1/8. For the first site the origin is arbitrary, and this makes 0.0 and 0.5 equivalent points in all three directions, which in turn means that we have to search only 0.0-1/4, 0.0-1/4, 0.0-1/4, or 1/64. If you can't follow this reasoning, just use the asymmetric unit in the International Tables and ignore the extra hits.
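The vector prediction that xpatpred performs can be sketched as follows. This is a minimal illustration, not xpatpred's code: it assumes the space group P222, whose operators can be written as sign changes only, and the function names are hypothetical. Self-vectors are differences between symmetry mates of one site; cross-vectors are differences between mates of two different sites.

```python
# Symmetry operators of P222 as (sign_x, sign_y, sign_z); an assumption for illustration.
P222 = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]


def equivalents(site, ops=P222):
    """All symmetry-equivalent positions of a fractional coordinate."""
    return [tuple((s * c) % 1.0 for s, c in zip(op, site)) for op in ops]


def wrap(vector):
    """Bring a vector into the cell 0-1 and round away floating-point noise."""
    return tuple(round(c % 1.0, 6) % 1.0 for c in vector)


def self_vectors(site, ops=P222):
    """Patterson vectors between symmetry mates of a single site."""
    eq = equivalents(site, ops)
    return {wrap(tuple(a - b for a, b in zip(eq[i], eq[j])))
            for i in range(len(eq)) for j in range(len(eq)) if i != j}


def cross_vectors(site1, site2, ops=P222):
    """Patterson vectors between all symmetry mates of two different sites."""
    return {wrap(tuple(a - b for a, b in zip(p, q)))
            for p in equivalents(site1, ops) for q in equivalents(site2, ops)}


harker = self_vectors((0.1, 0.2, 0.3))
for vec in sorted(harker):
    print(vec)
```

For this space group every self-vector has one zero coordinate, so the predicted peaks fall on the Harker sections u = 0, v = 0, and w = 0.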

Bijvoet Difference Pattersons

In XtalView a Bijvoet difference Patterson is treated exactly like an isomorphous Patterson except that the data are already merged for you by the data reduction program. The .fin file is loaded directly into xfft and a Patterson made using the .fin file input type on the Pattersons Only submenu of the File Type menu. To ensure that the centrics are handled properly, see the section above on xprepfin on preparing Bijvoet pair data.

Difference Fouriers

Difference Fouriers are implemented with xmergephs, which takes as inputs a .fin file and a .phs file. The differences in the .fin file end up being phased with the input .phs file. When making an isomorphous difference Fourier, swap F1 and F2 so that the coefficients to the Fourier map are Fheavy - Fnative. Otherwise the peaks will be negative. For a Bijvoet difference Fourier, the phase needs to be shifted by 90 degrees, so set this button on as well. The output is another .phs file, which is then run through xfft (or xfit) with the map type set to Fo - Fc, which in this case is Fheavy - Fnative. The difference map can be displayed in xcontur. To find the peaks, set the slab to be one asymmetric unit thick and adjust the contour level such that only the large peaks show. Clicking on a peak with the mouse then gives the fractional coordinates, which can be entered into the heavy-atom solution in xpatpred (see above) and verified against the Patterson map, which is very important!

In a similar manner a mutant difference Fourier can be made by merging with the native phases. First merge the wild-type and mutant data with xmerge, putting the wild type as the first file. The order of the F's is then swapped in xmergephs, so that in the resulting difference map the positive density will indicate new atoms in the mutant and the negative will indicate the missing atoms.
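The coefficients of the difference Fouriers described above can be written as an amplitude difference multiplied by a phase factor. The sketch below is an illustration only; the function name is hypothetical, and the sign of the 90-degree shift for the Bijvoet case (taken here as -90) is an assumption, since the text states only that the phase is shifted by 90 degrees.

```python
import cmath
import math


def map_coefficient(delta_f, phi_deg, shift_deg=0.0):
    """Complex Fourier coefficient: delta_f * exp(i * (phi + shift))."""
    return delta_f * cmath.exp(1j * math.radians(phi_deg + shift_deg))


# Isomorphous case: delta_f = Fheavy - Fnative, phased with the input phase.
iso = map_coefficient(120.0 - 100.0, 0.0)

# Bijvoet case: delta_f = F+ - F-, with the phase shifted by 90 degrees
# (the sign of the shift is an assumption of this sketch).
ano = map_coefficient(105.0 - 100.0, 90.0, shift_deg=-90.0)

print(iso, ano)
```

If the order of the two amplitudes is reversed, every coefficient changes sign, which is why the heavy-atom peaks come out negative unless F1 and F2 are swapped.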

Heavy-Atom Refinement and Phase Calculations

Xheavy is used to refine heavy-atom coordinates and to calculate phases from these coordinates and the merged heavy-atom data. A new version, 3.0, is available that has several improvements in the phase calculations and refinements. Xheavy consists of a main control window (Fig. 4.12A) and windows for editing derivatives (Fig. 4.12B) and for options (Fig. 4.12C). An example solution file contains the following keywords and values:

    Keywords: DERIVATIVE, NAME, CRYSTAL, FILTER, ANOFILTER, SIGMACUT, RMIN, RMAX, ANORMAX, WEIGHT, ANOWEIGHT, PHSTYPE, NSITES, ATOM, ANORMIN, FILE
    Values: sml, 99.000000, 1.000000, 1000.000000, 1.000000, 1.000000, 1 1, p82n3n4_2sml.df, 0.300000, 2.000000, 3.000000, p82, 1.000000, A, 1, 0.172497, 18.750000, 0.000000, 0.118157, SM+, 1.000000

Xheavy uses a very robust refinement algorithm that is based on a correlation search. It gives very accurate positional parameters. The refinement is done by searching for the maximum of the correlation

$$ C = \frac{\sum \Delta \, f_h}{\sqrt{\sum \Delta^2 \, \sum f_h^2}} $$

where Δ is the difference between the native and the derivative and f_h is the calculated heavy-atom model. The advantage of using the correlation is that the scale factor drops out of the equation. In the early stages of heavy-atom refinement, the scale factor is unknown because the degree of partiality of the model is uncertain. The disadvantage is that the maximum must be searched for by trying many positions and keeping track of the behavior. For this search xheavy uses an algorithm that first chooses a grid based on the upper resolution limit and searches a small area. If a better match is found at the edge of this area, a coarser grid is used. If the shift is small, a finer grid is then tried. If there is no movement, a grid in between is used, and then a smaller grid is used again until changing the grid has no effect. This allows for a large radius of convergence and yet ends with a fine grid. Xheavy loops repeatedly through the sites until no more movements are found. In the Options window you can choose Remove Origin for the refinement. In marginal cases this may improve the refinement.

FIG. 4.12 Heavy-atom refinement is done with xheavy. (A) The main window for xheavy. (B) The derivative editing window used to set options and enter data for individual derivatives. (C) The Options window is used to enter global options. (D) The Method menu located on the main window. Generally the order of operations is to enter the derivative data in the editor, refine the derivatives, and then calculate the phases.
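Two points from the refinement description above can be illustrated with a short sketch: a correlation of this kind does not depend on the overall scale, and a search grid that coarsens after a successful move and becomes finer otherwise combines a large radius of convergence with a fine final grid. The code is a simplified one-dimensional stand-in, not xheavy's algorithm; the function names, the doubling and halving rule, and the toy score function are assumptions.

```python
import math


def correlation(delta, f_calc):
    """Correlation between observed differences and calculated heavy-atom amplitudes."""
    num = sum(d * f for d, f in zip(delta, f_calc))
    den = math.sqrt(sum(d * d for d in delta) * sum(f * f for f in f_calc))
    return num / den if den > 0 else 0.0


def refine_1d(score, x0, step, min_step=1e-4, max_iter=200):
    """Maximize score(x), coarsening the grid after a move and refining it otherwise."""
    x, best = x0, score(x0)
    for _ in range(max_iter):
        top, x_new = max((score(x + s), x + s) for s in (-step, step))
        if top > best:
            x, best = x_new, top
            step *= 2.0  # the best point was at the edge: try a coarser grid
        else:
            step *= 0.5  # no movement: try a finer grid
            if step < min_step:
                break
    return x, best


# The correlation is unchanged when the model is multiplied by a scale factor.
delta = [12.0, 3.0, 7.5, 0.5]
model = [10.0, 4.0, 6.0, 1.0]
c1 = correlation(delta, model)
c2 = correlation(delta, [3.0 * f for f in model])

# Toy one-dimensional search with a single maximum at x = 0.3.
x_best, _ = refine_1d(lambda x: -(x - 0.3) ** 2, x0=0.0, step=0.05)
print(round(c1, 4), round(c2, 4), round(x_best, 3))
```

In a real refinement the score would be the correlation recomputed for each trial position of a site, and the search would run over x, y, and z.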

The heavy-atom model can be incomplete, and the correct results will still be obtained. For example, in a multisite derivative a single site can be refined alone. Refining each site independently allows for more accurate cross-vectors to be calculated when the relative origins of the sites have not been determined.

The correlation function roughly indicates the quality of the derivative as follows (to be taken with a grain of salt: these numbers depend on resolution, isomorphism, and mood). In general, 0.5-0.6 can be obtained with wrong sites, 0.6-0.68 indicates something good is happening, and 0.68-0.77 indicates good solutions that can provide real phases. A function above 0.77 indicates an excellent solution and isomorphism. If the origin has been removed, these numbers will be about half the size.

Starting with xheavy 3.0, several features have been added to improve the refinement and phase calculations. The popularity of cryo-data collection means that the unit cells of the derivatives can be significantly different, inasmuch as freezing can change the unit-cell dimensions. To account for this, lack-of-isomorphism parameters have been added.[5] These dampen the differences with resolution. In addition, anisotropic scaling has been added to account for crystals with different shapes or crystals that, for whatever reason, diffract differently from native. Local nonisomorphism for individual sites is taken into account with anisotropic thermal parameters for each atom. All these parameters are highly correlated, so that the refinement of them must be done slowly and iteratively. This explains why refinement times are longer now than with earlier versions. However, the features lead to real improvements in the maps. To turn off these new features for a derivative, turn off Refine B's on the Edit window.

[5] A better strategy would be to collect several native data sets and pair them with the derivative that has the closest unit cell. Also, you may be better off staying at room temperature. Sometimes in our lab derivatives at cryo temperatures failed to produce good phases but the same derivatives at room temperatures worked.

You can end refinement at any time by hitting Abort; the last parameters will be saved. Important: Save the solution with Save Derivative.

If previous protein phases are found by the program, either by putting a filename on the Input Phases line or from a previous calculation, the program will use the phases in the refinement to refine against the correlation of Fp - FpH and Fp - FpHcalc. This should be done only for phase sets that are at least reasonable and never against single-site phase sets. Also, if you have not found the correct absolute configuration (i.e., hand), do not do this until the correct configuration has been established.

Calculate protein phases by setting the method to Calculate Protein Phases and clicking on Apply. The program will print a lot of statistics that are also being saved to a log file (the filename is printed in the message window when the program starts, see Fig. 4.12A). The phases are calculated in two passes. First the individual SIR or single anomalous scattering (SAS) phases are calculated for each derivative and then these are combined. The combined phases are then used in a second step to reestimate the error parameters to calculate the final phases. The phases are written into the output file in the XtalView phase format: h, k, l, Fo, figure-of-merit, phi (degrees). These can be directly read into xfit to view the map, or a map can be calculated with xfft and viewed in xcontur. The latter is handy for finding molecular boundaries.
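As an illustration of how a phase file in the order h, k, l, Fo, figure-of-merit, phi (degrees) turns into map coefficients, the sketch below parses one record and forms the conventional figure-of-merit-weighted coefficient m Fo exp(i phi). The whitespace-separated layout, the function names, and the sample record are assumptions; whether xfit or xfft applies exactly this weighting is not stated here.

```python
import cmath
import math


def parse_phs_line(line):
    """Split an assumed whitespace-separated record: h k l Fo fom phi(degrees)."""
    h, k, l, fo, fom, phi = line.split()[:6]
    return (int(h), int(k), int(l)), float(fo), float(fom), float(phi)


def weighted_coefficient(fo, fom, phi_deg):
    """Figure-of-merit-weighted map coefficient: fom * Fo * exp(i * phi)."""
    return fom * fo * cmath.exp(1j * math.radians(phi_deg))


# Made-up record for illustration.
hkl, fo, fom, phi = parse_phs_line("1 2 3 250.0 0.80 90.0")
coefficient = weighted_coefficient(fo, fom, phi)
print(hkl, coefficient)
```

A reflection with a figure-of-merit near zero contributes almost nothing to the map, so poorly determined phases are weighted down rather than discarded.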

Absolute Configuration (Hand)

The phases must be calculated in the two possible heavy-atom configurations or hands. One will be correct and the other incorrect (see also the section entitled Choosing the Absolute Heavy-Atom Configuration in Sec. 3.6). To calculate the phases in the alternate configuration, open the Options window and choose the (-1) option for the Heavy-Atom Configuration (hand) widget. This will invert the heavy-atom configuration through the center. Enter a different name for the output phases, such as a name with the extension .h2.phs. The exception to this rule occurs when you have a single site in a single derivative and are using SIRAS (single isomorphous plus anomalous) in a polar space group. In this case, and this case only, the ambiguity in the output phases is broken by the anomalous data. You can then use these SIRAS phases to cross-phase the other derivatives and find the correct configuration. In all other cases, there is no way to find the configuration a priori: you must try both and look at the resulting electron density maps (see Choosing the Absolute Heavy-Atom Configuration, in Sec. 3.6).
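Inverting the heavy-atom configuration amounts to replacing each fractional coordinate by its negative. The sketch below assumes inversion through the origin and wraps the result back into the cell; the function name is hypothetical, and the example coordinates are made up for illustration.

```python
def invert_hand(sites):
    """Invert fractional coordinates through the origin and wrap them into 0-1."""
    return [tuple(round((-c) % 1.0, 6) % 1.0 for c in site) for site in sites]


# One made-up site in fractional coordinates.
sites = [(0.172497, 0.0, 0.118157)]
other_hand = invert_hand(sites)
print(other_hand)
```

Applying the inversion twice returns the original coordinates, which is a quick check that the two phase sets really correspond to opposite hands.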


Enantiomorphic Space Groups

In some space groups there is an ambiguity regarding the correct enantiomorph, as in space groups P41/P43, P61/P65, P62/P64, P4122/P4322, and P41212/P43212. In this case both space groups must be tried, as well as both absolute configurations in each of the space groups, for four separate phase sets. To do this you need a crystal file for each of the two space groups (prepare them with xtalmgr). The easiest way to change the crystal is to edit the solution file, do a global replace on the crystal name, and then restart xheavy.

Exporting Data

Although one may use another program to calculate the final phases, xheavy is very useful for calculating difference Fouriers for cross-phasing other derivatives and establishing the correct heavy-atom configurations. A phase file for the SIR or SAS phases can be used with xmergephs to produce a difference Fourier with another merged heavy-atom data set. We sometimes use other phasing programs when the maps seem noisy, but most often this just seems to be proof that the derivatives are no better than they really are. You will have to enter the heavy-atom positions into the new program by typing them in. The next step is usually to solvent-flatten the phases. You can use xfit for this purpose or write the phases out in CCP4 format and use CCP4 programs (e.g., DM or Solomon).

Xfit

Figure 4.13 shows the xfit main window and canvas.

FIG. 4.13 (A) The xfit main window. (B) The xfit canvas.

The Mouse

The main input device is the mouse (Table 4.2). Since a mouse has only three buttons on most UNIX systems, some compromises had to be made. The leftmost button controls the rotation about the center of the screen. The rightmost button brings up a menu of commonly used options.[6] This menu overcomes the back-and-forth motion of other systems that is needed to select options from a menu. The middle mouse button is made to do multiple duty. The right mouse menu is used to change the middle mouse function, and the cursor changes to tell the user what mode is current. The middle mouse button defaults to dragging the screen center. In early tests it was found that a trackball model of mouse motion was confusing because the user could never be sure precisely which way the model would move, a serious problem in a fitting program. This left us with a need to implement a z-rotation about the center of the screen. To do this xfit uses the top inch or so of the screen for z operations. Again, the cursor changes to let the user know where this region is. To zoom in and out, hold down the left and middle buttons simultaneously and drag up and down. On two-button mice this operation is not possible because it is used to emulate the middle button. In the two-button case hold down the shift key while dragging with the left to zoom in and out.

TABLE 4.2 Mouse Operations

    Mouse action     Modifier         Operation
    Left-click                        Pick atom
    Left-drag                         Rotate x-y
    Left-drag        Top of screen    Rotate around z
    Left-drag        Shift            Zoom/scale
    Left-drag        Ctrl             Rotate around z
    Left + middle                     Zoom/scale
    Middle-drag      Normal mode      Pan view
    Middle-drag      Top of screen    Move slab in z
    Middle-drag      Fitting mode     Rotate, translate, torsion depending on operation selected from the Canvas menu
    Right-click                       Canvas menu: use to change middle mouse function

[6] On the Control window there is an option to use the MSI Insight mouse conventions.

Center on an Atom

Left-click the atom and then with the right mouse select Center @ on the Canvas menu. To center an atom not visible, bring up the model window, select the residue in the list, select the atom, and then use the Canvas menu to Center @.

Atom Stack

Xfit uses an atom stack for many operations. All operations use a postfix (operands-first) notation. First, the operands are selected and put on the stack, and then the command is given. For example, to calculate a distance, first click on two atoms to put them on the stack and then click on Distance on the main window. Atoms are pushed onto the stack in two ways: by picking on the screen by left-clicking, and by picking from the atom list in the Model window.
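The operands-first convention of the atom stack can be sketched as a small class: atoms are pushed as they are picked, and a command such as Distance then acts on the most recent entries. The class and method names are hypothetical and the coordinates are made up; this is not xfit's implementation.

```python
import math


class AtomStack:
    """A minimal stack of picked atoms; commands act on the most recent entries."""

    def __init__(self):
        self._atoms = []

    def push(self, name, xyz):
        """Record a picked atom as (name, (x, y, z))."""
        self._atoms.append((name, tuple(xyz)))

    def clear(self):
        self._atoms.clear()

    def distance(self):
        """Distance in angstroms between the two most recently picked atoms."""
        if len(self._atoms) < 2:
            raise ValueError("pick two atoms before asking for a distance")
        (_, a), (_, b) = self._atoms[-2:]
        return math.dist(a, b)


stack = AtomStack()
stack.push("CA 100", (0.0, 0.0, 0.0))   # first pick
stack.push("CA 101", (3.0, 4.0, 0.0))   # second pick
print(stack.distance())                  # command given after the operands
```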

Fitting with the Mouse

Select a residue. First select the residue to be fit by left-clicking it. Now with the right mouse pull up the canvas menu and select RESIDUE. The residue you clicked will turn green to indicate that it is currently selected.

Translate. To translate the residue, select TRANSLATE on the Canvas menu and drag the middle mouse button. Dragging near the top of the screen translates the residue on z.

Rotate. To rotate, select ROTATE on the Canvas menu and drag with the middle mouse. Dragging near the top of the screen rotates the residue around z. Note that the rotation is about the residue you picked before activating it.

Torsions. For torsions (twistable bonds), a vector around which to torsion is first picked by left-clicks. First, pick the tail of the vector (bond), then the head, and then select TORSION in the menu. All the atoms bonded to the head and currently selected are included in the torsion group. Dragging ADJUST left and right torsions the group about the selected vector. This method works for ANY bond regardless of whether it's protein, DNA, or a new ligand. For protein residues a shortcut is available for the chi angles. Typing a "1" while the mouse is in the canvas selects chi1 and puts the middle mouse in torsion mode. Similarly, typing a "2" activates chi2. By keeping one hand on the mouse and the other on the numbers, you can quickly torsion side chains into density.

Applying the fit. When done, type ";" to accept the new position. There are three buttons on the main window that also affect fits: Reset puts the atoms back in their starting position and leaves them active; Cancel puts the atoms back in their fitting positions and cancels the fitting operation; Apply Fit cancels the fitting operation and accepts the new position.

Single atoms. Click on an atom and select ATOM on the Canvas menu. Since the only possible operation with a single atom is to translate it, the middle mouse is automatically put in this mode, so you can immediately begin dragging it.

Undoing coordinate changes. The program doesn't have a specific undo button. Instead, the last 20 coordinate changes are saved on a stack and can be reversed at any time. On the main window, use the Select Save Set menu button to select the residue that was changed. Then click Swap to swap the old coordinates with the new. Pressing Swap again swaps them again. By repeatedly pressing Swap, you can compare the old and new positions.
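The torsion operation described under Torsions above is geometrically a rotation of the selected atoms about the axis running from the tail atom to the head atom. A minimal sketch using Rodrigues' rotation formula is shown below; the function name and the coordinates are illustrative assumptions, not xfit's code.

```python
import math


def rotate_about_bond(points, tail, head, angle_deg):
    """Rotate points by angle_deg about the axis from tail to head."""
    axis = [h - t for h, t in zip(head, tail)]
    norm = math.sqrt(sum(c * c for c in axis))
    kx, ky, kz = (c / norm for c in axis)
    theta = math.radians(angle_deg)
    cos_a, sin_a = math.cos(theta), math.sin(theta)
    rotated = []
    for p in points:
        # Work relative to the head atom, which lies on the rotation axis.
        vx, vy, vz = (pc - hc for pc, hc in zip(p, head))
        dot = kx * vx + ky * vy + kz * vz
        cx, cy, cz = ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx
        # Rodrigues: v' = v cos(a) + (k x v) sin(a) + k (k . v)(1 - cos(a))
        rot = (vx * cos_a + cx * sin_a + kx * dot * (1.0 - cos_a),
               vy * cos_a + cy * sin_a + ky * dot * (1.0 - cos_a),
               vz * cos_a + cz * sin_a + kz * dot * (1.0 - cos_a))
        rotated.append(tuple(r + hc for r, hc in zip(rot, head)))
    return rotated


# Rotate one atom by 90 degrees about a bond lying along z.
moved = rotate_about_bond([(1.0, 0.0, 1.0)], tail=(0.0, 0.0, 0.0),
                          head=(0.0, 0.0, 1.0), angle_deg=90.0)
print([tuple(round(c, 6) for c in p) for p in moved])
```

Atoms on the axis itself, such as the tail and head, are left where they are, so bond lengths and angles within the rotated group are preserved.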


Save. Save often. Open the Model window and type in a filename in the Output PDB field. You can now save with the File/Save Model button or the Save button on the main window, or with Toolbar/Expert Tools/Save model being fit. If you are going to overwrite a file, the program prompts you to determine whether this is OK. If you save over the same file that was input, the program warns you (because it can't then copy the remarks and other records to the new file from the old file). However, such saving doesn't affect the coordinates.

Autosave. If you place the flag -save on the xfit command line, xfit will save the model into a file every time the coordinates are changed. The name of the file, which is echoed in the message window, will be xfit_autosave_pid.pdb, where pid is the xfit process ID (a unique number assigned by the system to each process).

Fragment Fitting. To fit several residues, or fragments, as a group, clear the stack on the main window, select one atom in all the residues to be fit, and select RESIDUES from the Canvas menu. The group can then be fit as for a single residue. The only difference is that the number key torsion shortcuts don't work because this use is ambiguous with more than one residue. If all the residues are in sequence, you can fit the residues simply by clicking on the last and the first and then using Expand Top 2 on the main window followed by RESIDUES. Hint: The center of rotation will be about the last atom clicked on, so before issuing the RESIDUES command, click the atom you want to rotate about.

Refine the coordinates. After fitting a residue you can repair the geometry by using Toolbar/Auto Fit/Refine Region. The last residue selected is colored cyan, and the geometry is refined against the dictionary. Each time you press the space bar, another round of refinement is done. Use the ";" key to end the refinement. To refine a range of residues, click the first and the last and then use Toolbar/Auto Fit/Refine Region. Again the residues will be highlighted with a cyan color, the space bar is used to refine more, and ";" ends the process. You can leave the refinement going and fit residues and atoms at the same time. For example, you can fit a single atom as above, use the space bar to refine the geometry, drag the atom again, and so forth until the model is in position. Use ";" when done. There is also a Fit While Refine Range function that automates this process so that when you click on an atom, it automatically is activated. In the meantime, the model is continuously refined in the background. However, this function also takes into account the map, and it doesn't work too well at low resolution, where the map doesn't look much like protein atoms but is lumpier. At high resolutions Fit While Refine Range works very well.

Flip peptide plane. The quickest method of flipping a peptide plane is to use the single-atom mode to drag the C and N atoms to the opposite sides and then use Refine Region to anneal the geometry. If it's not right, redrag the O atom and press the space bar. You will find that if you exaggerate the movement of the O atom, you can pull the rest of the structure around.

Additional Xfit Functions

Go To a Specific Residue

Open the Model window, type in the residue number in the Name field, or select it from the scrolling list and click on Go To.

Maps and Phases

Xfit has a built-in fast Fourier transform (FFT) capable of going both from phases to maps and from models to phases. This allows a lot of flexibility in the use of the program. For instance, the resolution of the map can be changed at any time and the type of map can be changed. To encourage the use of phase files instead of maps, these components are treated in the same way in the program. Except for experimental phases such as MAD and MIR, crystallographic phases are derived from the model. Xfit can calculate these phases if given a reflection list and the PDB file. The reflection list is prepared with xprepfin (as a fake phase file, perhaps better called an empty phase file). This allows updating the phases at any time during the fitting. A partial structure factor calculator also allows making omit maps on the fly.

Spline Maps

Normally maps are calculated in a grid that is convenient for the FFT algorithm. The map is sampled discretely, and if points in between the grid points need to be calculated, they are interpolated. This interpolation can give errors up to about 20% in the worst cases. Spline maps avoid this by using a spectral B-spline description of the map that is accurate to 0.1% at any arbitrary point. This gives better-looking contours and more accurate refinement and averaging. To use the spline-based maps, set Splines to quadratic or cubic in the FFT window. The Use Grid option in the Contour window can then be set to Orthogonal, allowing contour grid spacing to be any desired value in angstroms, independent of the FFT grid.

Placing Ligands and Fragments

The Large Translation Search button in the Refine window will translate the user's current fragment to anywhere in the visible contoured density, by superimposing the fragment's center of mass on the center of the density after the density of the atoms not in the fragment has been subtracted. Because this center is only an approximation of the desired position, the program follows the large translation search with a standard translation search, which does a brute-force search of nearby positions for the best density correlation.
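The first step of the large translation search, moving the fragment's center of mass onto the center of the remaining density, can be sketched as below. The sketch assumes that the density of the atoms outside the fragment has already been subtracted and is supplied as weighted points; the function names and the numbers are illustrative, and the follow-up brute-force search is not shown.

```python
def weighted_center(items):
    """Weighted mean position of a list of (weight, (x, y, z)) entries."""
    total = sum(w for w, _ in items)
    return tuple(sum(w * p[i] for w, p in items) / total for i in range(3))


def large_translation(fragment, residual_density):
    """Shift the fragment so its center of mass sits on the density center.

    fragment is a list of (mass, (x, y, z));
    residual_density is a list of (density_value, (x, y, z)).
    """
    com = weighted_center(fragment)
    target = weighted_center(residual_density)
    shift = tuple(t - c for t, c in zip(target, com))
    moved = [(m, tuple(c + s for c, s in zip(p, shift))) for m, p in fragment]
    return shift, moved


# Two equal-mass atoms and two density points (made-up values).
fragment = [(12.0, (0.0, 0.0, 0.0)), (12.0, (2.0, 0.0, 0.0))]
density = [(1.0, (10.0, 10.0, 10.0)), (3.0, (14.0, 10.0, 10.0))]
shift, moved = large_translation(fragment, density)
print(shift, moved)
```

Because only the centers are matched, the orientation of the fragment is untouched, which is why a finer local search has to follow.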

Starting a New Model de Novo

To start a new model, click the model number on the main window to an empty position. Then open the Model window and select the residue type. Enter in a number for the residue name. If you are not sure what it should be, just set it to something like 100. Now use the menu button next to the Type field to select a residue type. Turn on the Autonumber function. Now select New Model from the Insert menu button. The new residue will be placed at the center of the screen, so make sure that the density of the first residue is centered on the cross. At this point you can continue inserting more residues by centering their density and using Insert After or Insert Before (the autonumber will add or subtract in naming the residue), or you can use the new autofit functions as described above.

Fixing Main Chain

One of the trickiest parts of fitting is getting the main chain peptide planes correct with good phi-psi's. In xfit the strategy is to use the positions of the Cα and Cβ atoms (shown as CA and CB on the screen) to define the main chain geometry rather than trying to move the peptide plane itself. Five pentamer poly-Ala fragments are then fit over the part to be fixed and the geometry of the main chain is replaced with the new geometry. The functions for this are on the Pentamer menu, a pinnable pull-right menu on the Model window under the Insert menu. To fix a stretch of main chain, select the residue that is two in front of the one you want to fix. Now select Find Best Pentamers. Several pentamers will be superimposed, loaded, and matched to the main chain. You can use Cycle Pentamer to cycle through them. When you find one that matches well, select Replace Mainchain Middle 3. If you do not find a good one, try moving the starting point forward or back one residue. If nothing shows up that looks good, you have a very unusual pattern of Cα-Cβ vectors. For instance, if you are trying to build β-sheet with two adjacent side chains on the same side of the main chain, you won't find a good match because this geometry is impossible in proteins.

4.3 XView


Semiautomated Fitting

Tracing the Main Chain

Xfit version 3.5 has a new menu, "Auto Fit," that incorporates features for automated fitting procedures that used to be done one step at a time. The new menu greatly accelerates the fitting of the parts of the map that are clear, allowing you to concentrate on the difficult bits without having to do the repetitive fitting tasks on most of the map. By "clear" we mean that the main chain density should be mostly connected at positive contour levels and the side chains are visible. The Auto Fit menu, shown in Fig. 4.14, can be accessed from the toolbar (see Fig. 4.15). Note that some menu items have a character in single quotes, indicating a keyboard shortcut for the operation. After you have become accustomed to the keys, you will find these keystrokes much quicker than using the menu. Before we explain how to use the menu we need to discuss some notation.

MRK. An MRK is a residue with a single Cα atom used to mark the position of the Cα atom of a residue. MRKs are part of the standard xfit dictionary found in $XTALVIEWHOME/data/dict.pdb (see footnote 7). If you don't have a dictionary being automatically loaded, read the beginning of the xfit main page before continuing. You must have a dictionary before automated fitting can proceed.

Footnote 7: There are three dictionaries available in the distribution. They differ in the treatment of hydrogen atoms. One has no hydrogens, one has polar hydrogens only (i.e., hydrogens involved in hydrogen bonding), and the third has all hydrogens. The default is the polar H dictionary. You can change the dictionary type in the Model window, or you can have a different dictionary load by inserting this line in .cshrc after you source XtalView.env:

    setenv XFITDICT $XTALVIEWHOME/data/dict.noh.pdb

to load the no-hydrogen dictionary. To load the all-hydrogen dictionary, change dict.noh.pdb to dict.allh.pdb. The polar hydrogen dictionary is dict.polarh.pdb.

Fragment/Chain. The basic unit of fitting is a string of residues connected together. The criterion for connectedness for MRK residues (i.e., Cα's) is that two successive ones be within 4.5 Å of each other. The Model window automatically numbers each fragment and marks the chain termini. The Sequence menu button on the Model window has a command Fix Chain Numbers that can be used to update the chain numbers.

Focus residue. This is the last picked residue either on the Canvas or on the Model window. It shows up as highlighted in the Model window residue list and on the lower right footer of the canvas. All the commands discussed


XtalViewTUTORIALS

FIG. 4.14 The Auto Fit menu (CA = Cα).

here start with the focus residue and use it to identify the chain to be acted upon. Current map. The map used in the Auto Fit commands is the one enumerated in the Refine window in the Map Number field. The default is map 1. If another is desired it must be changed here before the Auto Fit commands can be used.
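The connectedness criterion for MRK residues can be sketched in a few lines of Python. This is only an illustration of the 4.5 Å rule quoted in the text, not xfit's own code; the function name and the tuple representation of Cα positions are assumptions made for the example.

```python
import math

def split_into_fragments(ca_positions, max_gap=4.5):
    """Group successive C-alpha positions into fragments.

    A new fragment starts whenever two successive positions are more than
    max_gap angstroms apart (4.5 A is the criterion quoted in the text).
    """
    fragments = []
    for pos in ca_positions:
        if fragments and math.dist(fragments[-1][-1], pos) <= max_gap:
            fragments[-1].append(pos)   # still connected to the previous residue
        else:
            fragments.append([pos])     # gap too large: start a new fragment
    return fragments

# Hypothetical trace: three connected residues, a break, then two more.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
         (20.0, 0.0, 0.0), (23.8, 0.0, 0.0)]
print([len(f) for f in split_into_fragments(trace)])
```

With this made-up trace the residues fall into two fragments, which is the kind of bookkeeping the Model window does for you when it numbers fragments.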

Auto Fit Menu Commands

Add CA (MRK) After. Adds an MRK residue to the C-terminal end of the current focus residue (see above). The ">" key is a shortcut for this operation. If there is already a residue here, the user is warned and given the opportunity to cancel. If there is more than one residue in the fragment, the vector between the last two residues and the density of the current map (see above) are used to find the position of the residue. The five most likely positions are saved and can be cycled through by pressing the space bar or by the Cycle CA option on the menu. For small adjustments, or if none of the positions looks correct, the middle mouse button moves the Cα atom in Refine While Fitting mode, which keeps the distance to the focus residue at 3.8 Å. The MRK residue is automatically numbered one greater than the focus residue. Last, the command makes the added residue the focus residue so that the next invocation of this command will add another residue to the end of the chain after the new one. In maps with clear main chain density, the command finds the next Cα position and no further fitting is needed. Occasionally the side chain density is stronger than the main chain, and in this case pressing the space bar will find the correct choice.

Add CA (MRK) Before. Similar to Add CA (MRK) After except that it adds to the N-terminal end of the chain or fragment starting at the focus residue. The "<" key is the shortcut.

Next CA Choice. Cycles through the Cα choices for the Add MRK commands. The space bar is a shortcut to this command.

Cancel Baton mode. Accepts the Cα position and ends the refinement mode. The ";" key is the keyboard shortcut. It is good practice to always issue this command upon finishing additions to the ends of a fragment; otherwise a later press of the space bar may have surprising results. Note that this command and the ";" key will accept any fitting operation and any refinement mode, so it can be used anytime after a fitting or refinement operation to accept the results.

Go to Fragment Nter. Moves to the N-terminal end of the fragment or chain containing the focus residue and makes it the new focus residue. Use it as a precursor to the Add MRK Before command.

Go to Fragment Cter. Moves to the C-terminal end of the fragment or chain containing the focus residue and makes it the new focus residue. Use it as a precursor to the Add MRK After command.

Poly-Ala Fragment. Uses overlapping pentamer commands to turn all the MRK or VEC residues in the fragment or chain containing the focus residue into alanines. If the best pentamer contains either a glycine or proline it is kept as such, instead of as an alanine, to prevent the insertion of alanines in bad positions.

FIG. 4.15 The xfit toolbar provides quick access to often-used menus.

Auto Fit Strategy

Starting a New Model

After you have located a stretch of density, center the position of a Cα atom on the cross in the center of the screen and open the Model window. Set the active model to an unused number and the Type to MRK, enter a likely residue number (e.g., 100), and select the New Model command on the Insert Res menu. A new residue will appear in the center of the screen. Select this residue to make it the focus residue by clicking on it with the mouse, and then use the ">" key to add a second MRK residue. Since there is ambiguity with respect to which direction the chain should go, the program leaves it up to you to move it. If you have some indication of the chain direction (see Figs. 3.44 and 3.47) then you can place the residue in that direction. Otherwise, just guess and reverse the chain direction later if needed. To place the Cα atom, use the middle mouse button and drag the residue until it is in the desired position.

Use the ">" key again to place the third and subsequent residues automatically. The program finds the position of the next residue by considering the map density and the direction between the two preceding residues. In considering the map density, the program tries to eliminate strong side chain paths by considering a second move and adds the two probabilities together to find the final probability. Thus side chains are eliminated because they will not provide a good position for the second move. If the program still finds a side chain, press the space bar to cycle through the other four choices. Pressing the space bar five times brings you back to the original choice. You can also move the Cα position with the middle mouse button, as you did for the second position. Since it is not possible to precisely determine the Cα position in a medium-resolution map, concentrate instead on getting the path right and use the pentamer fitting followed by real-space refinement to better position the fragment.

Save frequently! You never know when there is going to be an accident. Open the File window, enter a filename in the Output PDB field, and click on Save just below the field. You can also use the Save button on the main window to write out the file.

To add residues to the N-terminal end of the fragment, hit the "[" key to move to the N-terminus and then type "<" to add residues at the end. When you get to a spot where the map is not clear, it is usually best to move on to another spot and fill in another chain fragment. You'll also want to press Symm Atoms on the main window occasionally to look for symmetry-related residues to make sure you don't trace the same density twice. As you add residues, you may find that a fragment has the wrong chain direction. It can be easily reversed by selecting a residue in the fragment and then, in the Model window, issuing the Reverse Chain Containing Selection command on the Sequence menu button.
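The two-move idea described above (score a candidate Cα position by its own density plus the best density reachable by one further 3.8 Å step) can be sketched as follows. This is a toy illustration under stated assumptions, not xfit's algorithm: the coarse direction sampling, the rule that a move may not fold back along the chain, and the use of raw density values in place of probabilities are all choices made for the example, and `density` is a hypothetical caller-supplied function.

```python
import math

CA_CA = 3.8  # C-alpha to C-alpha distance in angstroms

def sphere_directions(n_theta=12, n_phi=24):
    """Coarse grid of unit vectors (the sampling density is an assumption)."""
    dirs = []
    for i in range(1, n_theta):
        theta = math.pi * i / n_theta
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            dirs.append((math.sin(theta) * math.cos(phi),
                         math.sin(theta) * math.sin(phi),
                         math.cos(theta)))
    return dirs

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def move(p, d):
    """Step one C-alpha distance from point p along unit vector d."""
    return tuple(x + CA_CA * y for x, y in zip(p, d))

def rank_next_ca(prev, cur, density, n_keep=5):
    """Return the n_keep best next positions: first-move score + best second move."""
    forward = tuple(c - p for c, p in zip(cur, prev))
    dirs = sphere_directions()
    scored = []
    for d1 in dirs:
        if dot(d1, forward) <= 0:
            continue  # assumed filter: do not fold back along the chain
        p1 = move(cur, d1)
        # A side chain is a dead end, so its best second move scores poorly.
        second = max(density(move(p1, d2)) for d2 in dirs if dot(d1, d2) > 0)
        scored.append((density(p1) + second, p1))
    scored.sort(reverse=True)
    return [p for _, p in scored[:n_keep]]
```

Keeping the five best candidates mirrors the five choices you cycle through with the space bar; the first entry of the returned list plays the role of the program's initial pick.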

Putting fragments into order. In the best case you can find the entire main chain with one fragment. However, this is rarely the case. Since you added the fragments arbitrarily, they will appear in the Model window in no particular order. In general this is no problem unless you want two fragments to join and they won't because they are in the wrong order. To fix this you can renumber one of the fragments. In the Model window, click on the first residue in the fragment to be renumbered, enter the new number in the Name field for the start of the chain, and issue a Replace Name command from the Sequence menu button followed by a Renumber Chain. After you have your fragments numbered as you like, you can issue a Sort by Sequence Number command and re-sort the fragments into ascending order by number.

You can also reorder fragments by cutting and pasting if for some reason you don't want to change the numbering system you are using. To do this, first select the Non-Exclusive check box. This lets you select more than one residue at a time. Highlight all the residues in the fragment in the list by clicking on them, and then use Cut selected into buffer on the Insert Res menu button to put the fragment into the residue buffer. Then select the residue at the end of the fragment you want to paste in front of and use Paste buffer after to put the residues back in the list. You can also Paste buffer before by selecting the appropriate residue.

Note: The cut-and-paste commands can be very useful during model building to shuttle residues back and forth between models. By changing the Model # field between the cut/copy and the paste, you can cut or copy residues between models as well as within models. For instance, you can cut a fragment out of a related protein and save a lot of fitting effort. You can then rename the first residue to the correct sequence number and renumber the chain to put it in the correct sequence order.
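The renumber-then-sort bookkeeping described above amounts to the following. This is a minimal sketch with hypothetical data structures (a fragment as a list of (number, type) pairs); it only mimics the effect of Replace Name, Renumber Chain, and Sort by Sequence Number, and says nothing about how xfit stores its model.

```python
def renumber_fragment(fragment, start):
    """Give the residues of one fragment consecutive numbers beginning at start."""
    return [(start + i, res_type) for i, (_, res_type) in enumerate(fragment)]

def sort_fragments(fragments):
    """Order fragments by the number of their first residue."""
    return sorted(fragments, key=lambda frag: frag[0][0])

frag_a = [(100, "MRK"), (101, "MRK"), (102, "MRK")]
frag_b = [(100, "MRK"), (101, "MRK")]   # traced later, but belongs earlier in the chain
frag_b = renumber_fragment(frag_b, 50)  # like Replace Name followed by Renumber Chain
ordered = sort_fragments([frag_a, frag_b])
print([frag[0][0] for frag in ordered])
```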

Reversing the chain. If you get a chain in reverse order, it's simple to reverse it. Open the Model window, select a residue in the fragment, and then issue a Reverse Chain Containing Selection command from the Sequence menu button. This command works very well at the MRK stage, but later it will have the N and C termini of all your residues facing the wrong way.

After the trace is complete (or if you get stuck and can't proceed). The fragment is then poly-Ala'ed to form a backbone. The poly-Ala command builds overlapping pentamer fragments from the N-terminal end of the fragment to the C-terminal end. It is best to get the trace fairly complete before you do this step. You can't undo it, so save the MRK trace first. Now select a residue and, from the Auto Fit menu button on the toolbar, issue a Poly-Ala Fragment command and the program takes over. After it is finished the MRKs will be replaced with a poly-Ala backbone. You are now ready for the next step: assigning sequence.


Assigning the Sequence

Sequence File

Before you can assign the sequence you need a sequence. You can almost always find your sequence on the World Wide Web via one of the sequence databases and then use your browser to cut and paste your find into a file. Give the file the extension .seq and edit the file to check the format. Xfit wants the following format: the first line is the title; subsequent lines should contain the sequence in single-letter code in uppercase; numbers and punctuation will be ignored. That's it. Read this file into xfit using the File window. Put the name in the Sequence File field and push Load Sequence. In later sessions, you can put the sequence file on the xfit command line, and as long as it has the .seq extension it will be loaded as a sequence file.

Hint: Since there isn't any way to enter sequence numbers in the sequence file, xfit numbers the sequence starting at the first residue. If you want to change the numbering you can insert some dummy residues (e.g., A) at the beginning of the sequence or in a gap.
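If you paste sequences from a database often, a small script can tidy them into the format just described. The sketch below is an assumption-laden convenience, not part of XtalView: the function names and the 60-letter line width are made up, and the reader function only illustrates the stated rule that numbers and punctuation are ignored.

```python
def format_seq_file(title, raw_sequence, width=60):
    """Build .seq text: a title line, then uppercase one-letter codes."""
    letters = "".join(ch for ch in raw_sequence.upper() if ch.isalpha())
    lines = [title.strip()]
    lines += [letters[i:i + width] for i in range(0, len(letters), width)]
    return "\n".join(lines) + "\n"

def read_seq_file(text):
    """Skip the title line and keep only letters, dropping digits and punctuation."""
    body = text.splitlines()[1:]
    return "".join(ch for line in body for ch in line if ch.isalpha())

# A database entry pasted with residue numbers and spaces still cleans up.
text = format_seq_file("my protein", "1 mkv lay 10\nggw")
print(text, end="")
```

Write the returned text to a file ending in .seq and load it through the Sequence File field as described above.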

Identify Sequence

Now you have to find at least one residue that you can identify in the sequence. Sometimes you can identify one by a special position such as binding a prosthetic group, for instance, the histidine ligand to a heme group. Otherwise you need to look for a pattern of residues that you can identify by shape and size. Usually, the easiest to identify are the large residues such as tryptophan. The core is the most reliable place to assign sequence, since surface residues are often disordered or given a haircut by solvent flattening. Look for a repeating pattern and then try to identify that pattern in your sequence. Remember that the chain might be reversed, so try the match in both directions. If you are still stumped or aren't sure about what a residue should look like anyway, just try some guesses. It's very easy in xfit to change the sequence around, so you can try different possibilities.

Assign Sequence

When you have a candidate residue, click on it to select it. Open the Sequence window using Toolbar/Auto Fit/Set Sequence... and scroll down the sequence (see Fig. 4.16) to find the corresponding amino acid in the list; click it to select it. Now click on Change and Fit to Density and stand back. All the residues in the fragment will be converted according to the sequence, and then all the rotamers will be tried to find the best match to density. You can do this as many times as you like. Just remember to save any models you want to keep. This way you could, for example, try out all the tryptophans in the sequence and then look along the model to see how it matches the sequence.

FIG. 4.16 The xfit Sequence window used for setting the sequence of a chain.

4.4 A TYPICAL MANUAL FITTING SESSION WITH Xfit

Source the $XTALVIEWHOME/XtalView.env file if necessary. Type xtalmgr to start. Make sure you have the correct crystal chosen. See the xtalmgr section above if you need to create a new crystal. Select xfit on the Applications menu button and then choose a model and a phase file in the file lists. Push Add Args to build the command line, followed by Run Command to start xfit. Xfit will load the files on the command line as it starts. You can also start xfit from the command line by typing xfit. You can then set the crystal after starting using the Crystal Field menu button (the little button with the triangle on it; click on it with the right mouse button). The crystal contains the unit-cell and symmetry information.

To load a model, open the Files window. Click on the Windows button with the right mouse button and select Files... In the Files window (Fig. 4.17) you will see a list of all the files in the directory. If you did not start in the correct directory, you can use the directory list, below the file list, to change directories by clicking on the list, or enter the directory on the Directory line and press enter. Click on your model in the window and then click on the Load Model button. You can load several models at once. The slot the model is loaded into is controlled by the As Number field. This number is automatically incremented after you load each model. If you want to overwrite a model, set this field back to the model number you want to overwrite and then click Load Model. The program will ask you to confirm the overwrite.

FIG. 4.17 The xfit Files window. At the top right is a list of files in the current directory that match the extensions in the File Filter field, and on the lower left is a list of directories. Click on a directory in the list to change directory.

Loading the Map

Although there is a binary map format that xfit reads, it is much better to use xfit's FFT capabilities: read in a phase file and FFT it to a map. It actually takes less time to do this than to load a binary map from a file. There are also other advantages:

• You can change the resolution at will (FFT window).
• You can change the map coefficients at will (FFT window).
• You can update the phases after changing the model (SfCalc window).
• You can make omit maps at will (SfCalc window).

Note: Phases can be calculated only if you are using isotropic B's and the resolution is lower than about 1.5 Å. This is because xfit uses an FFT that cannot handle anisotropic U's and, like all FFTs, it becomes inaccurate at very high resolutions. However, at these higher resolutions, σA maps usually are all that is needed; if you are still making decisions about where to place side chains, you are almost certainly not at very high resolution.

Load the map by clicking on its name and then clicking on the Load Map button. After the phases have been read in, the FFT window (Fig. 4.18) will pop up. Use the right mouse button on the Coefficients menu button and choose the coefficients you want. If the file contains Fo and Fc, I highly recommend using the σA map types. At this time a map will appear in the canvas. If you want to change resolution ranges or coefficients later, you can do so by bringing up the FFT window again.

Note: If you are one of those who like to use both a difference map and a 2Fo - Fc map at the same time, just load the map twice, changing the coefficients each time.
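For readers who want to see what "changing the coefficients" means numerically, here is a small sketch of the amplitudes behind a few textbook map types. These are the standard definitions, not a description of xfit's Coefficients menu; the function name and the type labels are assumptions for the example, and the σA-weighted types are left out because they need additional weights that are not discussed here.

```python
def map_coefficient(fo, fc, kind="2fo-fc"):
    """Amplitude that is paired with the phase for a few common map types."""
    if kind == "fo":
        return fo              # plain observed-amplitude map
    if kind == "fo-fc":
        return fo - fc         # difference map
    if kind == "2fo-fc":
        return 2.0 * fo - fc   # the familiar 2Fo - Fc map
    raise ValueError(f"unknown map type: {kind}")

# One reflection with made-up amplitudes.
print(map_coefficient(100.0, 80.0, "2fo-fc"), map_coefficient(100.0, 80.0, "fo-fc"))
```

Loading the same phase file twice with different coefficients, as suggested in the note above, corresponds to evaluating two of these types over the same reflections.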

Contouring the Map

You can change the contour levels and box size in the contouring window. If you are going to inspect your map at the residue level, leave the box size at 5, as in Fig. 4.19, and leave Auto Contour On Scroll on. As you scroll through the map or step through the model by pressing the space bar, the program will automatically recontour. The first level is set to 1 sigma of the map, the second to 2 sigma, and so forth. When the map was FFTed, the sigma was scaled to 50. In the Messages window the scale was printed to allow calculation of the equivalent unscaled map values later (see footnote 8). Try sliding the contour level. You will notice that the contouring is in real time and interactive. Use the Styles menu button to quickly choose some preset contouring options. The default is the BlueMap shown in Fig. 4.19. Another useful one is DiffMap.

The canvas has a gnomon in the upper left-hand corner that shows the direction of the Cartesian axes. At the bottom is a scale bar. At the lower right is a shameless advertisement. In the center is a white cross showing the rotation center. All these features are shown in Fig. 4.20. To rotate the picture use the left mouse button. To recenter, use the middle mouse button. To bring up a shortcut menu, use the right mouse button. The middle mouse button changes function as you choose different fitting options. The default model color is yellow for carbons, with other atoms colored by atom type. You can change the coloring with the Color window. Nearer the top of the screen the cursor changes to a Z to indicate the z-rotation area where the rotation operations rotate in the plane of the screen and translations are in and out of the screen.

Footnote 8: If you use the xfit structure factor calculator, your data will be placed on an absolute scale. You can get electrons per cubic angstrom by dividing the contour level by the scale value. Most refinement programs scale Fc to fit Fo, which are usually on an arbitrary scale. Thus the scale for imported Fc data is also arbitrary. To get around this, either put the Fo data on an absolute scale, or use the Sfcalc/Calculate All and Scale options to recalculate the scale.

FIG. 4.18 The Map Type menu is being used to set the map type just before the map is FFTed.

FIG. 4.19 The xfit Contouring Method window.
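The conversion mentioned in footnote 8 (contour level divided by the scale value gives electrons per cubic angstrom) is a one-liner. The sketch below is only arithmetic; the scale value used is hypothetical, and the result is meaningful only when the data are on an absolute scale as the footnote explains.

```python
def contour_to_electrons(contour_level, scale):
    """Convert a scaled contour level to e/A^3 by dividing by the map scale."""
    if scale == 0:
        raise ValueError("scale must be nonzero")
    return contour_level / scale

# The 1-sigma level sits at 50 after scaling; suppose the Messages window
# reported a scale of 125 (a made-up number for illustration).
print(contour_to_electrons(50.0, 125.0))
```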

Combining Phases

At times it is desirable to put two maps together by combining the phases. For example, you may want to combine your MIR phases with the latest refined model phases. Such combinations can be very powerful for overcoming phase bias in the model phases on the one hand and filtering noise in the MIR map on the other, so in general the combined map is better than either alone. To do this use the Phase Combination window (Fig. 4.21) found on the PhaseMod menu button on the main window and in the Windows menu on the toolbar. Set the map numbers of Phase Set 1 and Phase Set 2 to the two phase sets you want to combine and then choose a third number for the output phase set. You can alter the relative weights with the sliders. It is often useful to downweight the model phases to 50% to allow more MIR information to come through. Press Combine Phases to combine the phases. The program will use σA weighting to calculate figures of merit for phase sets with Fo and Fc and no figure of merit.

FIG. 4.20 A blank xfit canvas. The gnomon in the upper left shows the direction of the axes, the white cross marks the center, and the ruler gives the scale in angstroms. Use the "+" key to change the cross to a larger cross for measuring.

Improving Phases

Phases can often be improved by solvent flattening and/or histogram matching. To do this from within xfit, click on the PhaseMod/Phase Improvement menu item to bring up the Phase Improvement window (Fig. 4.22). The first step is to enter the map numbers of the input and output phase sets. Then calculate a solvent mask with step 1. You can figure the percentage of the solvent using the Calculate... button. After it has calculated the mask, xfit will load the mask. If you are not centered near a boundary between the protein and the solvent, you may want to move the center of the viewpoint to look at the mask. The percentage of solvent can be adjusted if the mask seems too tight or loose around the protein density. Next, start the cycling to apply the mask and filter the phases. As each step proceeds, xfit will display the results. (Note: We are designing this window at the time of this writing: it may look somewhat different in your version of xfit.)

FIG. 4.21 The xfit Phase Combination window is used for combining two phase sets.

FIG. 4.22 The xfit Phase Improvement window is used for solvent flattening and histogram matching to reduce map noise.
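To make the role of the solvent percentage concrete, here is a deliberately simplified sketch of masking and flattening on a list of density values. It is not xfit's procedure, which is not described in the text: practical methods smooth the density before thresholding, and the function names and flat-list representation are assumptions for the example.

```python
def solvent_mask(density, solvent_fraction):
    """Flag the lowest-density grid points as solvent.

    density is a flat list of map values; solvent_fraction is between 0 and 1.
    Returns a list of booleans, True where the point is treated as solvent.
    """
    n_solvent = round(len(density) * solvent_fraction)
    order = sorted(range(len(density)), key=lambda i: density[i])
    mask = [False] * len(density)
    for i in order[:n_solvent]:
        mask[i] = True
    return mask

def flatten_solvent(density, mask):
    """Replace every solvent point by the mean solvent density."""
    solvent = [d for d, m in zip(density, mask) if m]
    if not solvent:
        return list(density)
    mean = sum(solvent) / len(solvent)
    return [mean if m else d for d, m in zip(density, mask)]

grid = [5.0, 0.0, 1.0, 9.0]
mask = solvent_mask(grid, 0.5)
print(mask, flatten_solvent(grid, mask))
```

Raising the solvent fraction marks more grid points as solvent, which is the numerical counterpart of a mask that hugs the protein more tightly.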

Saving Phases

If you calculate the phases from within xfit and want to save them for the future or for another program, you can use the Map File menu button on the Files window. Set the As Number field (Fig. 4.23) to the number of the map to be saved, type a new filename in the Map File field, and then pull down the menu with the right mouse button and choose Save Phases (*.phs).

FIG. 4.23 Menu used to load and save phases.

Some Additional Xfit Features

The Shortcut Menu

The Shortcut or Canvas menu is reached by using the right mouse button. It comes up wherever the mouse is positioned in the Canvas window. This saves a lot of back-and-forth mouse movement. The same menu can be reached from the toolbar as the Canvas menu. The items on the menu (moving down the columns from the left) are as follows:

• Center at: Centers at the last pick
• Center: middle mouse mode
• Translate: middle mouse mode
• Rotate: middle mouse mode
• Torsion: middle mouse mode
• Contour maps
• Atom: Make last atom picked fittable
• Atoms: Make all atoms on stack fittable
• Residue: Make the last residue picked fittable
• Residues: Make all the residues on the stack fittable
• Group: Make the group containing the last pick fittable
• Molecule: Make the entire molecule fittable
• X + 90: Rotate plus 90 degrees about the x (horizontal) axis
• X - 90: Rotate minus 90 degrees about the x (horizontal) axis
• Y + 90: Rotate plus 90 degrees about the y (vertical) axis
• Y - 90: Rotate minus 90 degrees about the y (vertical) axis
• Z + 90: Rotate plus 90 degrees about the z axis (perpendicular to the screen)
• Z - 90: Rotate minus 90 degrees about the z axis (perpendicular to the screen)
• Toggle Stereo mode
• Swap regular menu and expert menu (useful in hardware stereo mode)

Figure 4.24 shows the Canvas menu in Xfit version 3.7.

Turning the Map On and Off

Use the Show window to turn objects on and off. In Fig. 4.25 we have two maps that we can either alternate between or show at the same time by clicking on the entry.

Labeling the Residues

Open the Label window, set it up as shown in Fig. 4.26, and click Label Every. This will facilitate identification during fitting.

Going to the N-Terminus

To get to the N-terminus, click on the chain whose N-terminus you want to go to and then press the "[" key. The C-terminus is reached by using the "]" key. (You can use the Auto Fit menu if you forget the key.)

Moving along the Chain

To move to the next residue, hit the space bar. This pops you to the next residue and the map will recontour (unless the Auto Contour on Scroll option is off on the Contour window). By hitting the space bar you can quickly inspect every residue in the model.


FIG. 4.24 The Canvas menu can be accessed by right-clicking in the canvas.

Fitting a Residue

In Fig. 4.27 we see that Val 3 is in the wrong conformer and doesn't match the density. We'll discuss manually moving the residue. This will seem complicated as it is described, but after you do it a few times it will be simple. Practice fitting a residue: it is easy to undo any changes you make, even later after several other residues have been fit. First click on any atom in Val 3 to place it on the top of the stack. Then use the Residue option either on the Shortcut menu or on the main window to make the residue fittable. It will turn green to indicate that it is in fitting mode. Now you can move it with the mouse. If you bring up the Shortcut menu with the right button, you can click on either Rotate or Translate. These connect the middle mouse button to the selected operation. The left mouse button will continue to rotate the viewpoint. Select Translate and drag with the middle mouse button. The green residue will move; the old, yellow version stays in place so that you know where it was. Move the residue a lot and you will notice that the main chain bonds have become red to let you know they are ridiculously long. Move it back and they will become green again. Move it a long way for now. Now choose Rotate and then rotate the residue with the middle mouse button again. Note that the residue rotates around the atom originally picked. (If you need to move the viewpoint center, you can choose Center on the Shortcut menu to restore the middle mouse button to its default mode.) To put the residue back at the starting position, choose Reset on the main window.

FIG. 4.25 The xfit Show and Hide Objects window. Click on an object in the lists to toggle it on and off.

Torsions

To torsion a bond, pick the atoms at the ends, choosing the end that will rotate second. Then choose Torsion from the Shortcut menu. Now as you drag the middle mouse button horizontally, the bond will twist. For the chi angles of protein side chains there is a quicker way: just type "1" on the keyboard to set up chi1, "2" to set up chi2, and so forth. Remember that if the input focus is not in the Canvas window, the keystroke will be sent to the window with input focus. By using the number keys to set up the chi angles and dragging with the mouse, you can quickly position a side chain. When you are finished fitting, use either Apply (to keep the changes) or Cancel (to return the residue to its starting position) on the main window. The semicolon key ";" can be used to apply the fitting. This key, which means "Stop everything," will also halt the Refinement option we will talk about later.

FIG. 4.26 The Label Every nth Residue function on the Label window.

FIG. 4.27 Fitting a residue. (A) The side chain of valine is out of the density. (B) The valine selected and ready to fit. Type "1" and use the middle mouse button to torsion the residue back into the density. (C) The side chain is in position.

Save

Save your model by opening the Files window, entering a name in the Output PDB field, and pressing Save. Save often to avoid losing your work. You can also start xfit with the -auto flag to automatically save the file every time you do an operation that changes it.

Adding a Prosthetic Group or Ligand

To add a prosthetic group or ligand, you first need the new residue type in a PDB file. You can go to the PDB Web site and download a PDB file with the prosthetic group you are looking for. Now read this file in as a dictionary and append it to the already loaded dictionary. You do this on the File window. Type in the name of the PDB file, pull down the menu, and select Append (Fig. 4.28). Any new residue types in the PDB file will be added to the end of the dictionary. Now when you go back to the Model window and right-click on the Type menu button, the new type should appear at the end of the list. To put the residue into its density, first center on the density. In the Model window select the insertion point for the ligand, and then issue the Insert After Selection command on the Insert Res menu button. The new group will appear in the center of the screen. You can fit this just like any amino acid.

Real-Space Refinement

While the manual stuff is fun, after a while you just want the computer to do it. This is what the Refine window is for. All the functions shown in Fig. 4.29 act on the fittable atoms. We can also move our valine into density by choosing Rotomer search (see footnote 9). This takes the current residue and systematically rotates through all of the chi angles and puts the side chain in the position with the best fit to the density. The R-density function option controls the trade-off between accuracy and time. Atom Centers is the fastest mode but least accurate, since it looks only at the density at the atom centers. The other two functions calculate the density at each search position and compare this value to the map (after subtracting out the density for atoms that are fixed). In these modes the program looks at all the density in the contour box, so it's very slow if the contour box size is large. Usually the Atom Centers function is sufficient to get things close enough for the next fitting cycle.

Geometry Refinement

Earlier we discussed simple geometry refinements that are easy to do using the toolbar menus. If you wish, you can exercise a lot more control over the refinement with the controls on the Geometry Refinement window (Fig. 4.30). You can change weighting schemes and add restraints and constraints between atoms, or even to arbitrary points in space. In the middle of the Refine window are the geometry refinement options (Fig. 4.31A). To use them, click on the two ends of a fragment you want to refine. Now click on the Set Up Refinement Residues menu button. The selected residue range will turn cyan. Clicking on Refine will perform a cycle of geometry refinement. When finished, click on Clear refinement residues.

Footnote 9: Yes, yes, it's spelled wrong. No, it's not worth sending an e-mail to report. I was thinking about the Roto-Rooter plumbing service when I edited this button. Rotamers have a few preferred chi positions, usually 2 or 3. Xfit rotamers are calculated every 10 degrees about each movable chi value. Using only a few positions becomes too restrictive and assumes that the Cα and Cβ atoms are placed perfectly.

FIG. 4.28 The xfit Dictionary functions on the File window. Any PDB file can be appended to the dictionary.

FIG. 4.29 The option for real-space refinement in the Refine window.

FIG. 4.30 Besides the quick refine functions on the Auto Fit menu, you can set many Refine functions using the Geometry Refinement controls.

Constraints Menu If you want to get an atom to a specific point in space, you can center that point on the white cross in the center of the screen, click on the atom, and then issue Constrain atom on stack to screen center (Fig. 4.31B). Now, as you refine, the atom will be pulled over and held at that spot. You can use

FIG. 4.31 The menus on the (A) Refine and (B) Constraints windows.


Constrain C-alpha atoms to prevent the chain from drifting from its present position. This option is especially needed if you are going to try to flip the peptide planes or move side chains large distances, which will severely strain the geometry for a time. Restrain Ends is useful for preventing problems with the bonds to the rest of the model. The constraints can be weighted by using the Sigma field. The smaller this number, the more severely constrained the distance will be. The units are angstroms. If the difference between the expected distance and the actual distance is below the sigma, the weight will be lowered accordingly, and if the difference is above sigma the weight will be increased. The actual formula is:

$$W = \frac{(\mathrm{ideal} - \mathrm{actual})^2}{\sigma^2}$$

The constraints you add persist between refinements. To clear them, you need to use Clear All Constraints or Clear constraints on atom on stack.
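As a rough illustration of the weighting formula given for the Sigma field, the following Python sketch evaluates W for a single distance constraint. The function name and the example distances are hypothetical and are not part of xfit; only the formula itself is taken from the text.

```python
def constraint_weight(ideal, actual, sigma):
    """Return W = (ideal - actual)^2 / sigma^2 for one distance constraint.

    All arguments are in angstroms. Illustrative helper only, not an xfit call.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (ideal - actual) ** 2 / sigma ** 2


# Hypothetical case: an atom sits 0.2 A away from its constrained position.
loose = constraint_weight(ideal=0.0, actual=0.2, sigma=0.4)  # difference below sigma
tight = constraint_weight(ideal=0.0, actual=0.2, sigma=0.1)  # difference above sigma
print(loose, tight)
```

With the same 0.2 Å deviation, the smaller sigma gives the larger weight, which matches the statement that a smaller Sigma constrains the distance more severely.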

Fitting While Refining Fit while refining is one of the most powerful modes to fit in: it combines fitting with real-space and geometry refinement simultaneously. It allows one to drag the model around as if it were a mechanical model built of sticks and ball joints. Set up a range of residues (it can be a single residue by clicking the same residue twice) and then also choose the options Always, Make Atom Draggable When Picked, and Restrain Ends on the Constraints menu. The fragment will start geometry refining in the background. Click on an atom and wait about a second for it to turn green. You can now use the middle mouse to drag the atom. As you do, the geometry will refine, allowing you to drag the rest of the side chain around with the atom. You can also turn on the Refine Against Map option to perform real-space refinement at the same time. This is especially effective for atomic resolution maps. Once you get the atoms close to their density, they will be sucked into the density. Being a real-space technique, this is nicely complementary to the reciprocal space refinement of SHELXL.

Viewing Thermal Parameters A lot of information can be gleaned from the thermal parameters of the model. You can use the new options on the View window to view either isotropic B's or anisotropic U's as 50% probability surfaces. Click on a range


FIG. 4.32 Controls on the View window that are used to Make Vu objects.

of residues and choose the Pop button next to First Res and Last Res (or type in the residue numbers) to set up a range (see Fig. 4.32). (You can do the whole model, but it may be slow.) Then choose Thermal Ellipses and click on Make Vu. The thermal ellipsoids of an Fe4S4 center are shown in Plate xx.

Control Window A number of miscellaneous options have ended up on the Control window (Fig. 4.33). On slower machines the dynamic sliders may be too slow. In this case switch Dynamic Sliders off. Also useful for model building is the Draw Contacts option, which will show all contacts within the distance specified. These contacts are dynamic and update as you fit residues.

PostScript Plots The contents of the screen can be sent to a PostScript printer from the Postscript Plot window (Fig. 4.34). The PostScript can also be saved into a file. To make a stereo plot, first turn on side-by-side stereo from the View window. Most of the figures in this book were made this way.

Raster 3D With the help of Dr. Ethan Merritt at the University of Washington, we have developed a Raster3D interface (see Fig. 4.35). It will be necessary to


FIG. 4.33 The Control window has a number of useful controls. Note especially the Model Pick function, which lets the user pick nonactive models or the active model only.

obtain Raster3D from Dr. Merritt via his Web page (see note 4, Chapter 3). With this option you can make pictures of cover quality composed of lighted cylinders and spheres.

Model Window There are a number of options on the Model window (Fig. 4.36), but the one that is especially important to point out to SHELX users is the Split Residue menu item on the Insert Res menu button. To use this, first set up a


FIG. 4.34 The xfit Postscript Plot window.

residue to be fit (click on it and choose Residue on the Shortcut menu), and then activate the torsion you want to split the side chain about. Now choose Split Residue, and the residue will be divided about the torsion and the two halves labeled A and B and given occupancies of 0.5. One half will become active and you can move it into the density with the middle mouse. The other menus available on the Model window are shown in Fig. 4.37.


FIG. 4.35 The xfit Raster 3D interface.

SfCalc Window The SfCalc window (Fig. 4.38) is used to calculate structure factors from the model (i.e., Fc, phi). At resolutions lower than 1.5 Å and with isotropic B models, this window can be used to calculate the structure factors for a model. For other refinement programs we usually keep an empty phase file


FIG. 4.36 The xfit Active Model window is used to replace, add, name, and delete residues.

around, calculate the structure factors from the model, and then FFT the map. This procedure has the advantage of producing structure factors that are correctly scaled, permitting the omit map and update options to be used. To make an omit map, you need to do Calculate All and Scale once after the phase data have been read in. You may want to turn the Shake option on, as this can help substantially in reducing phase bias. This option works by adding a small number to each atom coordinate (±1/6 dmin) before the FFT (but does not change the model). The effect of this is to jiggle atoms that have been moved a small amount to satisfy the error residual of a wrong piece of structure. To make an omit map, choose one or more residues and make them current (green). Now choose Omit Current Atoms. The partial structure factors for the current atoms are calculated and then subtracted from the structure factors of the whole model, and the omit map is FFTed and contoured. At higher resolutions (better than 1.5 Å), and especially if the B's are anisotropic, the program cannot do an accurate structure factor calculation because it


FIG. 4.38 The xfit SFCalc window is used to calculate structure factors from the model and can be used for making omit maps. Use the Shake option to reduce phase bias.


uses an isotropic reverse FFT algorithm. However, if you are at the point of making your B's anisotropic and are still wondering where side chains are located, you may want to rethink your fitting strategy!
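The Shake option described above can be pictured with a short Python sketch. It perturbs a copy of the coordinates by at most 1/6 of dmin along each axis and leaves the original model untouched. The uniform random distribution, the function name, and the sample coordinates are assumptions made for illustration; the text gives only the size of the shift, not how xfit draws it.

```python
import random


def shake(coords, d_min, seed=None):
    """Return shaken copies of (x, y, z) coordinates in angstroms.

    Each coordinate is shifted by a random amount in [-d_min/6, +d_min/6].
    The uniform distribution is an assumption; the input list is not modified.
    """
    rng = random.Random(seed)
    limit = d_min / 6.0
    return [tuple(c + rng.uniform(-limit, limit) for c in atom) for atom in coords]


# Hypothetical two-atom fragment and a 2.0 A data set.
model = [(10.0, 5.0, 2.5), (11.2, 5.9, 3.1)]
shaken = shake(model, d_min=2.0, seed=0)
for before, after in zip(model, shaken):
    print(before, "->", tuple(round(c, 3) for c in after))
```

The shaken copy would be used only for the structure factor calculation that precedes the FFT, while the saved model keeps its original coordinates.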

Finding Geometry Errors The Error window (Fig. 4.39) is used to find geometry errors in the model. Bring up the Error window and click on Analyze Active Model. The geometry is analyzed and the results put into the error list so you can click through it. Also, the phi-psi plot is popped up. Residues with bad phi-psi's are marked on the phi-psi plot (and put in the error list). As you click on each error or click on Go To Next Error, the model is moved to center the residue with the error. When you click on Fit Error Res the residue is activated so you can repair it. Use Delete Error Res to send the residue to oblivion.

Editing Waters Two shortcut keys have been added specifically to make editing the water list faster, thus encouraging users to actually look at the waters. Note

FIG. 4.39 The Geometry Errors window is used to analyze the geometry of the model. Click on an error to go to it.


that in the majority of cases editing the water list and removing bad waters will cause your R-free to drop even if there is a very small increase or no change in the overall R-factor. A bad water is one in which there is no density, the density is not shaped like a water, or the B-value is higher than, say, 80.0. For high-B-value waters with sensible density, you can try halving the occupancy.

• Shift + W. Adds a water where the cursor is and at the center of the slab. The new water is in fitting mode, and its position can be adjusted using the middle mouse button. To quickly position the water, leave the Refine window up in a corner and, after you have added the water, click on the Translation button to real-space-refine the water into the center of the peak. Use a thin slab (5 Å) to get the out-of-screen direction close enough for the translation to pull it in. Use the semicolon key to end the water fitting, or go on to the next one and use the semicolon key at the end of the water-adding sequence.

• Shift + D. Deletes the last picked water/residue. Click on a water and then issue this command. If it's a water, the program deletes it. If it's a residue, the program asks for confirmation of the delete. You can undelete a water by going to the Model window, finding the deleted water, and deleting it again to toggle the DEL flag.

• Automated water addition. First make a difference map. The resolution should be better than 2.3 Å, and the map should show clear water peaks. Bring up the Waters... popup (Fig. 4.40) and set the minimum density to be the lowest you will accept for a water. Press Add Water and wait. The new waters are added to the Error popup, not because they are errors, but because this gives you a quick way to navigate through the list of added waters to see if you agree with them. Be sure to save the model.
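The rules of thumb for bad waters can be written down as a small triage function. This is only a sketch: the function, its arguments, and the sample values are hypothetical, the density threshold is whatever minimum you would accept for a water, and the "not shaped like a water" criterion is left out because it needs a visual check of the map.

```python
def triage_water(b_value, peak_density, min_density, b_cutoff=80.0):
    """Suggest an action for one water from its B-value and map density.

    Illustrative only; the shape of the density still has to be inspected by eye.
    """
    if peak_density < min_density:
        return "delete"            # no density at the water position
    if b_value > b_cutoff:
        return "halve occupancy"   # sensible density but a very high B-value
    return "keep"


# Hypothetical waters: (name, B-value, density at the water, in sigma units)
waters = [("HOH 301", 25.0, 4.2), ("HOH 302", 95.0, 3.6), ("HOH 303", 60.0, 0.8)]
for name, b, rho in waters:
    print(name, triage_water(b, rho, min_density=3.0))
```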

4.5 INTERFACING TO OTHER PROGRAMS

This section gives specific information for interfacing XtalView to popular software.

Molscript. The xfit script commands Rotation and Translation can be used to set the viewpoint in Molscript. The translation is the inverse of Molscript's, but other than this simple change in sign, the commands can be cut and pasted into a Molscript control file. When a good view is found in xfit, use View/Make Script/Save and edit the resulting file to get the rotation and translation commands. Paste these into the Molscript file, reverse the translation, and run Molscript.


FIG. 4.40 Add water molecules with this window. Waters can also be renamed to match the nearest atom in another structure.

SHELXL. Using SHELX with XtalView is straightforward. Xprepfin writes out an hkl file and can be used to convert from various input formats. You can then mark this file for R-free tests using SHELXPRO. For the first refinement cycle use SHELXPRO with the I option to create the .ins file. After refinement read this and the .fcf file directly into xfit. After fitting, use SHELXPRO to update the .res file with the fit .pdb file to make the next .ins file. SHELXPRO has an XtalView command (X) to write a .phs file, but this is not needed with versions of XtalView after 3.1. Xfit reads and understands the ANISOU cards and thus keeps the anisotropic U's correct between cycles.

DENZO. Xprepfin can prepare input from the scalepack output .SCA file by choosing Other as the input file type and DENZO I's from the Other menu.

CCP4. The preferred way to use XtalView with CCP4 map files is to use the phase files as input and not map files. This will be faster, as well as saving disk space and allowing you to make omit maps on the fly. Starting with XtalView 4.0, xfit can read the structure factors directly from .mtz files as long as you use the standard CCP4 labels for FO, FCALC and PHIB.


However, there is a CCP4 map converter available from CCMS that was kindly provided by John Irwin. Send e-mail to ccms-help@sdsc.edu and ask. You can also convert CCP4 map files by FFTing them into structure factors and reading these into xfit. You can convert .mtz files with xprepfin or by means of the CCP4 program mtz2various. Several XtalView programs can write .mtz files, including xprepfin, xheavy, and xfit.

PHASES. Xheavy writes an input file for PHASES that will get you most of the way to running PHASES. Look at the menu on the menu button for saving a phase file. In this way you can use xheavy for the heavy-atom location and refinement and then switch to PHASES. There is not much difference in the actual phases produced.

XPLOR. Xprepfin can be used to generate an input file for XPLOR Fobs data. If your native data has Bijvoet pairs, be sure to use the Average F1 and F2 option in xprepfin. You can switch the segment ID with the chain ID in xfit by using the options on the Files... window. Since the chain ID field in a PDB file is only one character, the first character in the seg ID is used. This allows one to get around the fact that XPLOR loses the chain ID.

XPLOR, TNT, PROLSQ, and Other Refinement Programs To make maps, first prepare your native data into an empty phase file with xprepfin by reading it in and setting the Fake Phs output option. This creates a file with 0.0 for the phase. Now run xfit, loading the empty phase file and your latest refined PDB file. The FFT window will pop up, but don't hit the Apply button. Instead just set the map type you want (e.g., 2mFo - DFc); then go to the SfCalc window and choose the Calculate All and Scale button. Xfit will calculate Fc and the phase, scale Fo to Fc (to put on an absolute scale), and then FFT your density. If you want to look at an Fo - Fc map as a second map, just reload the phases with the File window and repeat the procedure, setting the FFT type to Fo - Fc.

5 PROTEIN CRYSTALLOGRAPHY COOKBOOK

5.1 MULTIPLE ISOMORPHOUS REPLACEMENT

Multiple isomorphous replacement (MIR) is the oldest method of phasing proteins and is still very successful. The basic method has changed little since myoglobin was solved, although the detailed implementation has. The basic cycle of MIR phasing is diagrammed in Fig. 5.1.

1. Soak the crystals in heavy-atom solutions to scan for possible derivatives. Crystals that survive the soaking are tested to see if they still diffract. Those that do are scanned to see if they produce any changes in X-ray intensity.

2. If intensity changes are observed, a data set is collected. The first data should be collected quickly. If possible, a nearly complete data set to at least 5 Å should be collected within 24 h or even faster. Many heavy-atom derivatives are unstable in the X-ray beam, and either the crystals quickly degrade or they change with time. Often the best data are from the first run, and even though the crystal still diffracts strongly, later data are found to have significantly lower phasing power. Continue collecting data if the crystal has not degraded. Frozen crystals are more stable (see Chapter 6), but freezing can also change the unit cell, leading to significant nonisomorphism.

FIG. 5.1 Heavy-atom phasing scheme. Flowchart box labels, in order: soak crystal in heavy-atom solution; collect partial data set; evaluate differences (above 10%? otherwise toss); finish data collection; merge and scale with native; evaluate statistics (isomorphous? otherwise toss); if phases available, cross Fourier, or else make Patterson; Patterson solvable?; refine and compute SIR phases; cross Fourier other derivatives to solve and put on same origin; refine solutions and compute MIR phases; map interpretable?

3. Evaluate the heavy-atom statistics. By looking at the statistics of intensity changes it is possible to tell whether the crystal is likely to be a good derivative. The overall percentage difference should be above 10-12%. While you may solve and refine a derivative with weaker changes, it will have little phasing power; try resoaking to see if you can raise the percentage difference with longer soaks and/or higher soaking concentrations. The centric data should have larger differences than the acentric zones. The root-mean-square magnitude of the differences should fall off with resolution in roughly the same proportion as the scattering factor for the heavy atom (including the temperature factor). If the differences do not fall off, this indicates noise or nonisomorphism.

4. Solve for the positions of heavy atoms. It is necessary to solve at least one derivative's Patterson map. This may not be the Patterson map of the first derivative you find. After solving one Patterson map, you can solve the others by cross-phasing with the single isomorphous replacement (SIR) phases of the first. This also puts all the heavy atoms on the same origin.

5. Refine each heavy-atom solution. Look for additional sites. Be conservative about adding sites at this point.

6. With two or more derivatives you can co-refine the derivatives to improve the phases (if the derivatives share common sites this should be done cautiously). This gives you the first set of protein phases.

7. Make difference Fourier transforms of each derivative, preferably leaving this derivative out of the protein phase calculation to reduce bias. Look for new heavy-atom sites and confirm old ones.

8. Refine the updated solutions and co-refine to produce better protein phases.

9. Reiterate if necessary.

10. Calculate the protein electron density map and evaluate its quality. If the map is difficult to interpret, go back to step 1 and look for more derivatives or work to improve the ones that you already have. Be objective: you will not divine the structure from poor protein phases without considerable luck.
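The screening criterion in step 3 can be sketched in a few lines of Python. The sketch takes the overall percentage difference to be 100 Σ|F_PH - F_P| / Σ F_P over matched reflections, which is one common definition and an assumption here; the function name and the amplitudes are made up for illustration.

```python
def percent_difference(f_native, f_deriv):
    """Overall isomorphous difference in percent for matched amplitude lists.

    Assumes the definition 100 * sum|F_PH - F_P| / sum(F_P) and that the
    derivative has already been scaled to the native.
    """
    if len(f_native) != len(f_deriv):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(fd - fn) for fn, fd in zip(f_native, f_deriv))
    return 100.0 * numerator / sum(f_native)


# Hypothetical scaled amplitudes for four reflections.
native = [100.0, 200.0, 150.0, 80.0]
derivative = [115.0, 180.0, 160.0, 90.0]
diff = percent_difference(native, derivative)
print(f"{diff:.1f}% difference; worth pursuing: {diff >= 10.0}")
```

The same sum can be taken separately over centric and acentric reflections, or in resolution shells, to check the other two expectations in step 3.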

Example 1: Patterson from Endonuclease III Escherichia coli endonuclease III (Table 5.1) was solved at the Scripps Research Institute by MIR techniques.¹ The solution of a single-site derivative Patterson map is presented here. Endonuclease III crystals were soaked in thiomersal, an organomercurial, at 1 mM, for 2 days. Data were collected on an area detector and then merged with the native data. The isomorphous difference Patterson is shown in Fig. 5.2. The symmetry of the Patterson is mmm, orthogonal mirrors at 0 and 1/2 in all three directions. In space group P2₁2₁2₁, there are three Harker planes arising from the three 2₁ screw axes (Table 5.1) at x = 1/2, y = 1/2, and z = 1/2. A heavy-atom site will give rise to three unique self-vectors on these Harker sections (plus all the peaks related by Patterson symmetry). Note in Table 5.1 that if there is a vector at 2x on one section it will be at 1/2 - 2x on the other. Thus, if we line up the Harker sections so that the common axes run in opposite directions and 0.0 is opposite 0.5, then the self-vectors on the two sections will line up (Fig. 5.2). On the Harker sections for this derivative there are just three

¹ Kuo, C. F., McRee, D. E., Fisher, C. L., O'Handley, S. F., Cunningham, R. P., and Tainer, J. A. (1992). Science 258, 434-440.

FIG. 5.2 Thiomersal derivative Patterson from endonuclease III. The three Harker sections of an isomorphous derivative Patterson map at 5.0-Å resolution are shown for a thiomersal derivative of E. coli endonuclease III. The positions of three Harker vectors are marked with "X." The dashed lines illustrate how these peaks line up when the sections are aligned as shown. Notice that the peak on z = 0.5 overlaps onto the section x = 0.5. This Patterson shows no significant features on the non-Harker sections.

unique large peaks, and they line up as shown by the dashed lines. The peak on z = 1/2 overlaps onto x = 1/2, making it appear as though there are two peaks on x = 1/2, but only one belongs solely to this section.


TABLE 5.1 Endonuclease III Facts

Protein:             E. coli endonuclease III
Unit cell:           48.5, 65.8, 86.8, 90.0, 90.0, 90.0
Space group:         P2₁2₁2₁
Patterson symmetry:  mmm
Harker planes:       1/2, 1/2 ± 2y, ±2z
                     ±2x, 1/2, 1/2 ± 2z
                     1/2 ± 2x, ±2y, 1/2

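This Patterson can be explained by a single major site. The peak at (1/2, 0.25, 0.05) is solved as follows²:

$$\tfrac{1}{2} - 2y = v \quad\Rightarrow\quad y = \frac{\tfrac{1}{2} - v}{2} = \frac{0.5 - 0.25}{2} = 0.125$$

$$2z = w \quad\Rightarrow\quad z = \frac{w}{2} = \frac{0.05}{2} = 0.025$$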

This gives us (--, 0.125, 0.025) from this section. From the Harker peak at (0.48, 0.25, 1/2) we can solve (0.01, 0.125, --). Combining these gives us a heavy-atom site at (0.01, 0.125, 0.025). The y = 1/2 section can be used to confirm the other two. From the mmm symmetry of the Patterson, it follows that peaks occurring on the Harker sections lie on a mirror plane and will be doubly weighted, being the superposition of the peak and its mirror image. Peaks that occur on the edges are at the intersection of two mirror planes and will be weighted four times. Peaks in the corner are at the intersection of three mirrors and will be weighted eight times. This explains why the peak on x = 1/2 is about half the height of the peaks on the other two sections, which both occur at the intersection of two planes and are weighted four times versus two times. A single site thus accounts for most of this Patterson map, and the other features may be due to minor sites or perhaps just noise from the approximations used to derive the Patterson coefficients. In any case, this single-site solution was good enough to cross-phase other derivatives for this protein. Minor sites can be picked up more easily and accurately later in the analysis, when other derivatives can be used to produce rough protein phases that can be used in a difference Fourier to give the positions of these minor sites. However, this particular derivative proved to be a true single-site derivative. In the solution of endonuclease III, this was the only Patterson that needed to be solved. All subsequent steps were done using difference Fouriers starting with the SIR phases of this derivative.

² Throughout this book the coordinates x, y, z indicate the three directions in a Patterson. Traditionally the indices u, v, w have been used to indicate the three axes, to distinguish them from their real-space counterparts, just as h, k, l are used for reciprocal space. I have used x, y, z mainly because the software makes no distinction between Patterson maps and other kinds. However, for the example of how to solve these equations it is useful to use the notation u, v, w for the Patterson coordinates, and x, y, z for real space to make the equations clearer.
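The arithmetic of Example 1 can be checked with a short Python sketch. It uses the Harker plane expressions of Table 5.1 with one choice of sign each (the remaining peaks follow from the mmm Patterson symmetry). The function names are hypothetical and this is not the xpatpred program, only an illustration of the same bookkeeping.

```python
def harker_vectors_p212121(x, y, z):
    """One representative Harker vector (u, v, w) on each section of P2(1)2(1)2(1)."""
    return {
        "x=1/2": (0.5, round(0.5 - 2 * y, 3), round(2 * z, 3)),
        "y=1/2": (round(2 * x, 3), 0.5, round(0.5 - 2 * z, 3)),
        "z=1/2": (round(0.5 - 2 * x, 3), round(2 * y, 3), 0.5),
    }


def site_from_harker_peaks(peak_x_half, peak_z_half):
    """Recover (x, y, z) from the peaks on the x = 1/2 and z = 1/2 sections."""
    _, v, w = peak_x_half
    u, _, _ = peak_z_half
    return (round((0.5 - u) / 2, 3), round((0.5 - v) / 2, 3), round(w / 2, 3))


# Peaks read off the thiomersal Patterson in the text.
site = site_from_harker_peaks((0.5, 0.25, 0.05), (0.48, 0.25, 0.5))
predicted = harker_vectors_p212121(*site)
print(site)
print(predicted)
```

The predicted vector on the y = 1/2 section is the one the text suggests using to confirm the site obtained from the other two sections.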

Example 2: Single-Site Patterson from Photoactive Yellow Protein Photoactive yellow protein is a 15-kDa protein that crystallizes in hexagonal space group P6₃ from ammonium sulfate with cell constants of a = b = 66.9, c = 40.8, 90, 90, 120. A PtCl4 derivative can be made by soaking the crystals 7 h in a 0.1 mM solution of K2PtCl4 followed by back-soaking the crystals in artificial mother liquor without any heavy-atom reagent. If the crystals are soaked longer, they begin to deteriorate. The data for the derivative were collected on an area detector. The data were merged and scaled to the native data using the anisotropic scaling method. A Patterson map was calculated using the coefficients (F_PtCl4 - F_native)² at 5.0-Å resolution and throwing out large differences where |F1 - F2|/[(F1 + F2)/2] is greater than 1.0. A Patterson in space group P6₃ has two unique Harker sections at z = 0 and z = 1/2, which are shown in Fig. 5.3. The section z = 1/2 is especially useful to look at because peaks on this section occur at x, y, 1/2 and 2x, 2y, 1/2 (Table 5.2). The peak at x, y, 1/2 is doubly weighted over the peak at 2x, 2y because it occurs twice as often in the vector table. This Patterson can be solved by simple inspection. The largest peak on the section z = 0.5 is at (0.25, 0.08, 0.5), and a corresponding smaller peak is at (0.5, 0.16, 0.5). We can, therefore, assign a heavy-atom position at (0.25, 0.08, 0.0). In this space group z is arbitrary because there is no symmetry element perpendicular to z to fix the origin, so z is arbitrarily assigned to 0.0. If a second derivative is found, the relative z will need to be found by other means, usually by cross-Fouriers. We can confirm our solution by looking at the section z = 0.0. The peak here arises from the vector x + y, 2y - x, and a cross labeled A-A shows the calculated position where our site will fall on this section.
Most of the other peaks on the sections are related by the 6-fold to the peaks we have already used. However, an astute observer will notice that there are still some peaks of heights two contour levels and less unaccounted for. No other site

FIG. 5.3 Photoactive yellow protein PtCl4 derivative Patterson map. Two sections are shown from the Patterson map of the PtCl4 derivative of photoactive yellow protein, space group P6₃, a single 6₃ screw axis parallel to z: (A) section z = 0.0 and (B) section z = 0.5. On each section the unique vectors are marked A-A. The rest of the peaks are generated by 6-fold symmetry about the z axis.

TABLE 5.2 Vector table for space group P6₃ (each entry is the row position minus the column position)

P6₃               x, y, z           -y, x-y, z        y-x, -x, z        -x, -y, 1/2+z     y, y-x, 1/2+z     x-y, x, 1/2+z
x, y, z           0, 0, 0           x+y, 2y-x, 0      2x-y, x+y, 0      2x, 2y, -1/2      x-y, x, -1/2      y, y-x, -1/2
-y, x-y, z        -x-y, x-2y, 0     0, 0, 0           x-2y, 2x-y, 0     x-y, x, -1/2      -2y, 2x-2y, -1/2  -x, -y, -1/2
y-x, -x, z        y-2x, -x-y, 0     2y-x, y-2x, 0     0, 0, 0           y, y-x, -1/2      -x, -y, -1/2      2y-2x, -2x, -1/2
-x, -y, 1/2+z     -2x, -2y, 1/2     y-x, -x, 1/2      -y, x-y, 1/2      0, 0, 0           -x-y, x-2y, 0     y-2x, -x-y, 0
y, y-x, 1/2+z     y-x, -x, 1/2      2y, 2y-2x, 1/2    x, y, 1/2         x+y, 2y-x, 0      0, 0, 0           2y-x, y-2x, 0
x-y, x, 1/2+z     -y, x-y, 1/2      x, y, 1/2         2x-2y, 2x, 1/2    2x-y, x+y, 0      x-2y, 2x-y, 0     0, 0, 0


could be found to account for these peaks, and they probably arise from one of three considerations:

1. The synthesis was artificially terminated at 5.0 Å. In this small cell there are not very many reflections, and we may have series termination errors. To test this, the Patterson can be made at a higher resolution to see if it becomes less noisy.

2. An approximation to the heavy-atom vector was used and not the true heavy-atom vector, which invariably causes some noise in the Patterson.

3. Inaccuracies in the data sets, scaling errors, and nonisomorphism also contribute.
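The Patterson coefficients used in Example 2, including the rejection of large differences, can be sketched as follows. The function name, the tuple layout, and the amplitudes are hypothetical; only the coefficient (F_PH - F_P)² and the cutoff |F1 - F2|/[(F1 + F2)/2] > 1.0 come from the text.

```python
def patterson_coefficients(reflections, max_frac_diff=1.0):
    """Return [(hkl, (F_deriv - F_native)^2)] for reflections that pass the cutoff.

    `reflections` is a list of (hkl, F_native, F_deriv) tuples with the
    derivative already scaled to the native. A reflection is rejected when
    |F1 - F2| / [(F1 + F2) / 2] exceeds max_frac_diff.
    """
    coefficients = []
    for hkl, f_nat, f_der in reflections:
        mean = (f_nat + f_der) / 2.0
        if mean <= 0.0:
            continue  # nothing sensible to compare against
        if abs(f_der - f_nat) / mean > max_frac_diff:
            continue  # difference too large to be a believable heavy-atom signal
        coefficients.append((hkl, (f_der - f_nat) ** 2))
    return coefficients


# Hypothetical merged data: the second reflection has a fractional difference of 1.2.
data = [((1, 0, 0), 100.0, 120.0), ((0, 1, 0), 50.0, 200.0), ((0, 0, 2), 80.0, 70.0)]
coeffs = patterson_coefficients(data)
print(coeffs)
```

Rejecting the few very large differences matters because a squared coefficient lets a single bad measurement dominate the map.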

Example 3: Two-Site Patterson from Photoactive Yellow Protein In another heavy-atom trial, a photoactive yellow protein crystal was soaked for several days in 50 mM GdSO4 and then in 0.1 mM K2PtCl4 for 7 h to produce a double derivative. It was already known that both GdSO4 and K2PtCl4 produce single-site derivatives, so a double-site derivative was expected. After collecting the data and merging with the native (Fig. 5.4), we produced the Patterson in Fig. 5.5. On the Harker section z = 0.5 there are the vectors for both the platinum (Pt) and gadolinium (Gd) sites. By chance, on the z = 0 Harker section both sites overlap. As was shown in Example 2, we can solve the peaks on the z = 0.5 section by noting that the peaks fall in pairs at (x, y, --) and (2x, 2y, --). The position of the Pt peak is already known as (0.25, 0.08, 0.0), so the other two pairs of peaks are from the Gd site. We can assign the Gd site as (0.50, 0.10, ?) since we cannot assign both sites at z = 0.0. Also, we do not know the relative origin of the second site. A just-as-valid single-site solution for the Gd site would be at the Patterson symmetry-related position (y, x, z) or (0.10, 0.50, ?). To solve this dilemma, we must find a cross-peak between the two sites and use it to define the relative z and the origin. Since we found the Pt first and its peak is larger, we will arbitrarily decide that this is our origin and that it is at z = 0.0. That is the end of our arbitrary choices; now we must find the corresponding Gd origin and z (actually we still have a hand choice by placing Gd at +z or -z). Scanning through the sections, we find peaks on both sections z = 0.133 and z = 0.367. We note that 0.367 is 0.5 - 0.133. This is the expected pattern for cross-peaks in P6₃: for every cross-peak on z we know that there is one on 1.0 - z, 0.5 - z, and 0.5 + z.
This can be worked out from algebraic manipulations of the symmetry operators, or we can try a few hypothetical test cases with xpatpred and then look at the patterns of the predictions. Taking the first likely cross-peak on the Patterson, we find a broad peak at (0.25, 0.03, 0.133) (see Fig. 5.5). We assume this is a vector Gd - Pt, so to generate the Gd coordinates from this site we add Pt + cross-peak or (0.24,

FIG. 5.4 Photoactive yellow protein double-derivative statistics. Graphs show the statistics for a double soak of K2PtCl4 and GdSO4 merged with native data, plotted against resolution in angstroms for all 2989 reflections, the 271 centric reflections, and the 2718 acentric reflections. (A) |F_P - F_PH| versus resolution. If the derivative is isomorphous, this graph should steadily decline with increasing resolution. The derivative seems to be isomorphous to about 3.2 Å. Also, the centric differences should be larger than the acentric differences. (B) R-factor versus resolution. This gives the degree of substitution, which is fairly low for this derivative, being only about 0.13 for the average reflection.

0.08, 0.0) + (0.25, 0.03, 0.133) to get (0.49, 0.11, 0.133). The Harker vector for this site would be (0.49, 0.11, 0.5), and we see a nice Harker peak on the z = 0.5 section at this point, marked Gd-Gd in Fig. 5.5, that partially overlaps with the Pt vector. This site is then put into xpatpred, and the Harker vectors and cross-peaks all fall in a pattern on or close to a peak in the Patterson map, so we conclude that this is a valid solution. A little fiddling with the coordinates to center the Harker vectors for the Gd site gives a better match at (0.505, 0.105, 0.133). We now have two sites: Pt at (0.24, 0.08, 0.0) and Gd at (0.50, 0.10, 0.133), which should be confirmed. One way to confirm them is with a cross-Fourier if we have another derivative. However, in this case, since the derivative already contains the Pt site, a cross-Fourier with the Pt site will not be very informative because ghost peaks from the Pt will also give us a peak if the solution is incorrect. In this case, we went on and made a 5.0-Å protein map that looked quite good, and so we continued

FIG. 5.4 (continued). Panel B: |F1 - F2|/F1 versus resolution (Å) for all 2989 reflections, the 271 centric reflections, and the 2718 acentric reflections.

the solution with confidence. The structure was subsequently solved and has been published.³

³ McRee, D. E., Tainer, J. A., Meyer, T. E., van Beeumen, J., Cusanovich, M. A., and Getzoff, E. D. (1989). Proc. Natl. Acad. Sci. U.S.A. 86, 6533-6537.

Example 4: Complete Solution of Chromatium vinosum Cytochrome c'

The protein cytochrome c' from C. vinosum was solved by Zhong Ren, Susan Redford, and myself using four derivatives in the final phasing out of seven potential ones. The entire screening, data collection, and phasing were mostly done in 2 months of effort using a Bruker area detector to scan the derivatives.

Crystal Growth

Several forms of crystals were grown. The first crystals out of ammonium sulfate proved to be useless, as they were always multiples. Two crystal forms grew out of PEG, sometimes in the same drop: a monoclinic form with two dimers in the asymmetric unit and an orthorhombic form with a single

FIG. 5.5 Four sections of a photoactive yellow protein double-derivative Patterson map. The crosses indicate the positions of the heavy-atom vectors. A Pt self-vector is labeled Pt-Pt, a Gd self-vector is Gd-Gd, and a cross-vector between the Pt and Gd site is labeled Pt-Gd.

dimer in the asymmetric unit.⁴ It was possible to tell them apart by examining the angles between crystal faces. The orthorhombic form had two edges that met at an angle of 90°, while in the monoclinic form there were no edges exactly 90° apart. By macroseeding, we could grow large crystals of the orthorhombic form reproducibly.

⁴ McRee, D. E., Redford, S. M., Meyer, T. E., and Cusanovich, M. A. (1990). J. Biol. Chem. 265, 5364-5365.

Heavy-Atom Soaks

An artificial mother liquor with a higher PEG concentration was necessary for soaking, or the crystals cracked and dissolved. If the crystals were transferred to a solution at growth conditions, they dissolved, presumably because of the lowered protein concentration. The crystals were stable in a PEG concentration 2% higher than that used for growth. Once it was established that crystals were stable for long periods of time, they were soaked in 1 mM solutions of our favorite heavy-atom compounds. Each drop had two or three crystals, the idea being that they would be mounted and scanned at different time intervals. The first crystal was usually mounted after 24 hours and the rest followed an uncertain schedule depending on the progress of the other scans.

The basic scanning strategy used was as follows. The crystal was mounted on the area detector and aligned so that one axis was roughly parallel to the incident beam. Data were collected such that 100° of data could be scanned in about 12 h. The swing angle was set to collect 2.7-Å data, a decision that was later regretted. Higher-resolution data could have been easily collected with a larger swing angle. After about 50 frames had been collected, they were integrated and scaled to the native data that had been collected earlier (Table 5.3). If the Rmerge of the shell infinity to 5 Å was below 0.10, the crystal was removed and another put on (the usual case); otherwise a full data set was collected. Of the full data sets, it was often found that the Rmerge decreased with more data, but the Rmerge never increased with a fuller data set. Therefore, we did not make the mistake of removing a potential derivative by scaling a partial data set, but some beam time might have been wasted on poor derivatives.

Patterson Maps

As each data set was integrated, it was merged and scaled to the native data. A Patterson map was made for each one at 100- to 5-Å resolution. The first Patterson maps all looked the same and were not interpretable. They had large diffuse peaks. By listing and sorting the differences, we discovered

TABLE 5.3 Summary of Native Diffraction for C. vinosum Cytochrome c'

                  ------------ Reflections ------------
Resolution (Å)    No. possible   No. collected   Percent completed   No. observations   I/σ(I)
2.9               6498           5750            88.49               24,205             22.6
2.3               6291           6290            99.98               28,117              9.8
2.0               6214           6165            99.21               24,069              4.3
1.8               6162           5931            96.25               21,758              1.5
1.7               6140           5622            91.56               18,547              0.8
1.6               6158           3786            61.48                9,645              0.5


that the largest differences were due to very-low-resolution reflections that had been clipped to varying degrees by the beam stop. These were eliminated in future Patterson maps by applying a filter cutoff so that any reflection with a difference greater than 100% of the average of the two observations was deleted. These Patterson maps were now interpretable.

The first Patterson map solved, that of gold cyanide, had two major sites. This entire Patterson map sectioned in z and the three Harker sections are shown in Fig. 5.6. The solution was slightly tricky: there are two major heavy-atom sites, but both share the same Harker peaks. (In P2₁2₁2₁ there are three Harker sections, x = 0.5, y = 0.5, and z = 0.5; see Table 5.1.) The clue is that there are large cross-peaks on non-Harker sections, and we also knew that the protein was a dimer and, therefore, there were probably at least two sites. It is possible, if a heavy site is on the dimer axis, for there to be a single site (this was, indeed, the case for the major site in the iridium chloride, IrCl6, derivative). A single site was first assumed, and the Harker vectors were solved to produce the site (0.125, 0.125, 0.125). Now if there is a second site and it shares the same Harker peaks, it could be thought of as being an origin shift relative to the first. In this space group the origin can be either at 0.0 or 0.5 in all three directions and produce the same Patterson. In addition, the sign of a site can be changed, and it will have no effect on the Harker vectors. However, if there are two sites, then doing these operations, adding 0.5 and changing the sign, will have an effect on the cross-peaks between the two sites. The pattern of the cross-peaks will change with each combination of origins. The solution is entered into xpatpred and displayed on the Patterson map by writing out the predictions and reading them into xcontur as a labels file.
Xpatpred predicts the Patterson vectors, given a possible solution and the space-group operators. Alternate origins can be chosen for each site. By changing the origin on the second site and displaying the resulting vectors, one can determine the correct position of the second site relative to the first when the vectors best match the pattern of cross-peaks. When the solution is satisfactory, it is saved in a solution file.
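The origin and sign ambiguity described above can be made concrete with a short Python sketch. It is not the xpatpred code; it only assumes the standard P2₁2₁2₁ operators, and the function names and the second site (0.1, 0.2, 0.3) are hypothetical, chosen for illustration. It shows that the self-vectors (Harker vectors) of a site are unchanged by a sign change or a shift of 0 or 1/2 along any axis, while the cross-vectors to another site are not.

```python
from itertools import product

# Symmetry operators of P2(1)2(1)2(1): (sign per axis, translation per axis).
P212121 = [
    ((1, 1, 1), (0.0, 0.0, 0.0)),     # x, y, z
    ((-1, -1, 1), (0.5, 0.0, 0.5)),   # 1/2-x, -y, 1/2+z
    ((-1, 1, -1), (0.0, 0.5, 0.5)),   # -x, 1/2+y, 1/2-z
    ((1, -1, -1), (0.5, 0.5, 0.0)),   # 1/2+x, 1/2-y, -z
]


def wrap(vec):
    """Reduce fractional coordinates to [0, 1), rounded to 4 places."""
    return tuple(round(c % 1.0, 4) % 1.0 for c in vec)


def equivalents(site):
    """Symmetry-equivalent positions of one site."""
    return [wrap(tuple(s * c + t for s, c, t in zip(signs, site, trans)))
            for signs, trans in P212121]


def self_vectors(site):
    """Patterson vectors between the symmetry mates of one site."""
    eq = equivalents(site)
    return sorted({wrap(tuple(a - b for a, b in zip(p, q)))
                   for p in eq for q in eq if p != q})


def cross_vectors(site1, site2):
    """Patterson vectors (both directions) between mates of two sites."""
    vectors = set()
    for p in equivalents(site1):
        for q in equivalents(site2):
            vectors.add(wrap(tuple(a - b for a, b in zip(p, q))))
            vectors.add(wrap(tuple(b - a for a, b in zip(p, q))))
    return sorted(vectors)


def alternate_settings(site):
    """Sign changes and origin shifts of 0 or 1/2 along each axis."""
    return [wrap(tuple(sign * c + s for c, s in zip(site, shift)))
            for sign in (1, -1)
            for shift in product((0.0, 0.5), repeat=3)]


first = (0.125, 0.125, 0.125)   # site quoted in the text
second = (0.1, 0.2, 0.3)        # hypothetical second site, illustration only
for candidate in alternate_settings(second):
    # Each candidate predicts a different cross-peak pattern to compare
    # against the observed Patterson map.
    print(candidate, len(cross_vectors(first, candidate)))
```

In practice each candidate's predicted cross-vectors would be overlaid on the map, and the setting whose predictions land on observed cross-peaks is kept.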

Initial Phasing

The output of xpatpred feeds directly into xheavy. The name of the data set containing the merged heavy-atom data is added, using the edit-solution window (Fig. 5.7). Now the solution can be refined against the differences. A two-site model was used that refined quickly to an Rc of 0.55. A set of SIR phases was calculated and used to make difference Fouriers of the other derivatives. At this time, everything is still done at 5-Å resolution. This is a good resolution for heavy-atom searches: higher resolutions often add more noise, and the contrast is lowered, making the search more difficult. At 5 Å there are enough reflections to overdetermine the x, y, z positions of a few heavy-atom sites; too low a resolution will not have enough reflections to determine the heavy-atom sites accurately or to allow them to be refined. Experience has shown that 5 Å is usually a good compromise, and we solve all the derivatives at this resolution first and then raise the resolution later to calculate a protein map.
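The differences being refined against are the filtered ones: the cutoff quoted above removes any reflection whose difference exceeds 100% of the average of the two observations. A minimal Python sketch of that filter follows. The function name and the data layout (pairs of native and derivative amplitudes) are assumptions made for illustration, not the xheavy implementation.

```python
def delta_over_average_filter(pairs, cutoff_percent=100.0):
    """Keep (native, derivative) amplitude pairs whose difference is not
    greater than cutoff_percent of the average of the two observations."""
    kept = []
    for f_native, f_deriv in pairs:
        average = 0.5 * (f_native + f_deriv)
        if average <= 0.0:
            continue  # cannot form a ratio; drop the reflection
        percent = abs(f_deriv - f_native) / average * 100.0
        if percent <= cutoff_percent:
            kept.append((f_native, f_deriv))
    return kept


# Hypothetical amplitudes: the second pair differs by 120% of its average,
# as a reflection clipped by the beam stop might, and is rejected.
pairs = [(100.0, 110.0), (10.0, 40.0), (50.0, 150.0)]
print(delta_over_average_filter(pairs))
```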

Search for Minor Sites

Even though the SIR phases are very noisy, they produced interpretable heavy-atom difference Fourier maps (|FP - FPH|, αSIR). There are many more observed differences than there are heavy atoms bound and, while noise tends to add everywhere in the Fourier, the signal builds up in the correct position. The two-site refined solution for the Au derivative was used to make a difference Fourier to look for minor sites. The highest peaks will, of course, be the original Au sites. The next-highest peaks may represent minor sites. To check this, their positions were entered into xpatpred and the predictions compared against the Patterson map. The minor peaks superimposed on peaks in the Patterson map in most positions. For minor sites, it is best to look at the cross-peaks for two reasons: (1) the Harker sections are usually noisier, and lower peaks here are less reliable; and (2) the height of a peak is proportional to the product of the scattering powers of the two atoms that give rise to that vector. So if the major site has a scattering power of 4 and the minor site 1, then the self-vectors of the minor site will have a height of 1 and the cross-peaks will be 4. Thus, the self-vectors may actually be below the noise level, while the cross-peaks are still visible. The minor sites accounted for some of the unexplained density in the Patterson map. However, as we went to smaller and smaller peaks on the difference Fourier, they matched smaller peaks in the Patterson, and at some point most of the cross-peaks will be missing a peak in the Patterson. At this point, it is usually worth stopping, since adding these peaks will contribute little to the phasing power. In fact, a problem with adding too many minor peaks is that we may start modeling noise and not real sites, which will actually interfere with phasing.
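The arithmetic behind reason (2) can be written out as a small Python sketch. The function name and the site labels are hypothetical; the scattering powers 4 and 1 are the ones used in the text.

```python
def patterson_peak_heights(powers):
    """Relative Patterson peak heights as products of scattering powers.

    powers maps a site label to its relative scattering power; the result
    maps each pair of labels (including a site with itself) to a height.
    """
    labels = sorted(powers)
    return {(a, b): powers[a] * powers[b]
            for i, a in enumerate(labels)
            for b in labels[i:]}


heights = patterson_peak_heights({"major": 4, "minor": 1})
for pair, height in heights.items():
    print(pair, height)
```

With these numbers the major self-vector has relative height 16, the major-minor cross-vector 4, and the minor self-vector 1, which is why the cross-peaks are the better place to look for a weak site.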

Cross-Phasing Other Derivatives

We could go on to solve all the derivatives by examining the Patterson maps, but now that we have solved one derivative we can use the SIR phases to make difference Fouriers for all the other derivative data sets. In order to put all the derivatives on the same origin, we will need to cross-phase

FIG. 5.6 Complete gold derivative Patterson map of C. vinosum cytochrome c' (CVCC). The panels show sections from y = 0.000 to y = 0.500, followed by the x = 0.500 and z = 0.500 Harker sections.

anyway. The difference Fouriers are made using xmergephs. The merged and scaled derivative data set is used as the input, and the SIR phases from the solved Au derivative serve as the input phases. We then make a map of the

FIG. 5.7 Xheavy edit window with Au1 solution. The window shows derivative au1 (data file ccpniau1.fin, file type fin, phase type isomorphous, weight 100, resolution 1000.00 to 5.00, sigma cut 3, delta/average filter 100 percent) with two sites:

Label   X        Y        Z        Atom   Occupancy   B
AU1     0.6184   0.1471   0.1040   AU+3   0.1150      17.0
AU2     0.3630   0.1281   0.3801   AU+3   0.1810      17.0

type Fo-Fc with xfft and display it with xcontur. It is necessary to be careful that the Fo corresponds to FPH and Fc corresponds to FP so that the peaks will be positive in the difference Fourier. The heavy-atom positions can be picked from the difference Fourier and put in a solution file using the xheavy Edit Solution function. The cross-Fourier of derivative Ir1 using the SIR phases from Au1 is illustrated in Fig. 5.8. It is important, though, to check this solution against the Patterson of the new derivative to be sure that it is real (Fig. 5.9). Also, the possibility of ghost peaks exists. These are peaks that are due to using SIR phases instead of the real protein phases. Ghost peaks

FIG. 5.8 Cross-Fourier of Ir1 with Au1 SIR phases (sections y = 0.000 to 0.222). Crosses and labels show the positions of two iridium sites.

are found at the position of the heavy-atom sites in the derivative used to calculate the phases. So a peak found at this position in the difference Fourier might be a ghost peak, or it really may be a common site. To decide this, the Patterson of the new derivative should be checked to see if it has the correct vectors to explain the site.
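The sign convention for the cross-Fourier (FPH in the role of Fo, FP in the role of Fc) can be sketched in Python as follows. This is not what xmergephs or xfft do internally; the function name and the record layout (h, k, l, FP, FPH, phase in degrees) are assumptions made for illustration.

```python
import math


def difference_fourier_coefficients(reflections):
    """Build (FPH - FP) * exp(i * phase) for each reflection.

    reflections: iterable of (h, k, l, f_p, f_ph, phase_deg), where
    phase_deg is the SIR phase borrowed from the solved derivative.
    Returns a dict mapping (h, k, l) to a complex coefficient.
    """
    coefficients = {}
    for h, k, l, f_p, f_ph, phase_deg in reflections:
        amplitude = f_ph - f_p          # derivative minus native
        phase = math.radians(phase_deg)
        coefficients[(h, k, l)] = complex(amplitude * math.cos(phase),
                                          amplitude * math.sin(phase))
    return coefficients


# Hypothetical reflections, for illustration only.
data = [(1, 0, 0, 100.0, 120.0, 0.0), (0, 1, 0, 100.0, 90.0, 90.0)]
for hkl, coefficient in difference_fourier_coefficients(data).items():
    print(hkl, coefficient)
```

Swapping the two amplitudes negates every coefficient, so the heavy-atom sites would appear as holes rather than peaks in the map.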

Refine the New Site

The new derivative can be refined at the same time as the first one. With two derivatives the phases will be improved, and if both derivatives are of good quality, we may even be able to make a recognizable protein map. Before rushing off to make a map, however, it is advisable to improve the solutions and to tune the derivatives further. Now that we have better phases, it is worthwhile redoing the difference Fouriers for both derivatives. This can be done with xheavy by selecting each derivative in turn and then writing out

FIG. 5.9 Ir1-derivative Patterson map.

a difference Fourier phase file, running this through xfft, and then examining the output map with xcontur. Look for new peaks that are at two or more times σ. Be careful not to pick peaks that are symmetry-related to the ones you already have in your solution.
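The two checks just described (a height of at least 2σ, and not symmetry-related to a site already in the solution) can be sketched in Python. The operators are the standard P2₁2₁2₁ ones; the function names, the peak-list layout, and the matching tolerance of 0.02 in fractional coordinates are assumptions made for illustration. The known site is AU1 from Fig. 5.7; the candidate peaks are hypothetical.

```python
# Symmetry operators of P2(1)2(1)2(1): (sign per axis, translation per axis).
P212121 = [
    ((1, 1, 1), (0.0, 0.0, 0.0)),
    ((-1, -1, 1), (0.5, 0.0, 0.5)),
    ((-1, 1, -1), (0.0, 0.5, 0.5)),
    ((1, -1, -1), (0.5, 0.5, 0.0)),
]


def equivalents(site):
    """Symmetry-equivalent positions of a site, reduced to [0, 1)."""
    return [tuple((s * c + t) % 1.0 for s, c, t in zip(signs, site, trans))
            for signs, trans in P212121]


def same_position(p, q, tol):
    """True if p and q agree within tol along every axis, modulo 1."""
    for a, b in zip(p, q):
        d = abs(a - b) % 1.0
        if min(d, 1.0 - d) > tol:
            return False
    return True


def pick_new_peaks(peaks, known_sites, sigma, n_sigma=2.0, tol=0.02):
    """Return peak positions at or above n_sigma * sigma that are not
    symmetry mates of known sites or of peaks already accepted."""
    accepted = []
    for x, y, z, height in sorted(peaks, key=lambda p: -p[3]):
        if height < n_sigma * sigma:
            continue
        position = (x, y, z)
        occupied = list(known_sites) + accepted
        if any(same_position(position, mate, tol)
               for site in occupied for mate in equivalents(site)):
            continue
        accepted.append(position)
    return accepted


known = [(0.6184, 0.1471, 0.1040)]
peaks = [
    (0.8816, 0.8529, 0.6040, 9.0),  # symmetry mate of the known site
    (0.3630, 0.1281, 0.3801, 6.0),  # a genuinely new position
    (0.2000, 0.3000, 0.4000, 1.0),  # below 2 sigma
]
print(pick_new_peaks(peaks, known, sigma=1.0))
```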

Adding More Derivatives

It is simple now to add new derivatives. Just cross-phase the differences with the latest protein phases at 5.0 Å and pick the peaks from the map, check them against the Patterson, and refine. With several derivatives, it pays to become critical and to determine if a derivative is worth including. For C. vinosum cytochrome c' (CVCC) we found seven derivatives for which we could explain the major peaks on the Patterson map. Of these, three were found to be inferior because they had low phasing power and high Rc values. Protein maps that included these derivatives were judged to be inferior to maps that left them out, so that in the final analysis only the best four out of the seven (Table 5.4) were used in the final protein map. The Pattersons of the two platinum derivatives are shown in Figs. 5.10 and 5.11.

The First Protein Map

The first protein map is always an exciting moment. The proper map to make is an Fo times figure-of-merit (f.o.m.) map at the best (centroid) phase. Xheavy writes out a file of records containing h, k, l, Fo, f.o.m., and αbest. Use xfft on this file with the Fo*f.o.m option to make the map. It is worth first making a 5-Å map and studying it to see if there are clear solvent boundaries and if a single molecule can be picked out. This will make it easier to interpret the higher-resolution maps. At 5 Å the solvent boundaries are usually obvious. There may be tight contacts that make it difficult to decide where the boundaries of a single molecule are. By symmetry considerations, it is sometimes possible to resolve these by noting that two bits of density are equivalent and, therefore, the boundary must be between them somewhere. A section of the 5-Å map of CVCC phased by the four derivatives is shown in Fig. 5.12. You can see clear solvent boundaries, and it is possible to see the dimer axis in this map and the positions of the hemes. We knew beforehand that the protein should be a four-helix bundle and a dimer. All these features could be easily picked out. If a helix can be picked out at this stage, note its position for later reference. It will be useful to look at helices at higher resolution because they have a definite structure, are good guides of map quality, and are a good way to check the handedness.
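Written out, the Fo times figure-of-merit map at the best phase is the Fourier synthesis below. The symbols m (figure of merit), V (unit-cell volume), and the subscripted names are notation introduced here for illustration; the text itself names only Fo, f.o.m., and αbest.

```latex
\rho_{\mathrm{best}}(x, y, z) =
  \frac{1}{V} \sum_{hkl} m_{hkl}\, \lvert F_o(hkl) \rvert\,
  e^{\, i \alpha_{\mathrm{best}}(hkl)}\,
  e^{-2\pi i (hx + ky + lz)}
```

A reflection with a poorly determined phase has a figure of merit near zero and so contributes little to the map, while a well-phased reflection enters at nearly its full observed amplitude.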

TABLE 5.4 Heavy-Atom Derivative Parameters for C. vinosum Cytochrome c'

Compound        Site no.   Relative occupancy   X         Y         Z         B-value   R-centric   Resolution (Å)
KAu(CN)2        1          0.11                 -0.6184   -0.1471   -0.1040   17.0      0.57        2.7ᵃ
                2          0.18                 -0.3630   -0.1281   -0.3801   17.0
                3          0.02                 -0.8749   -0.0912   -0.3513   17.0
                4          0.02                 -0.1140   -0.0954   -0.0954   17.0
                5          0.02                 -0.6392    0.0088   -0.4134   17.0
                6          0.01                 -0.3950   -0.0983   -0.8210   17.0
K3IrCl6         1          0.23                  0.3771    0.0229    0.4698   17.0      0.60        5.0
                2          0.08                  0.1487    0.2211    0.0734   17.0
                3          0.08                  0.6288    0.2359    0.0142   17.0
K2PtCl?(CN)?    1          0.22                  0.6339    0.505     0.1129   17.0      0.62        2.7ᵃ
                2          0.13                  0.6105    0.232     0.8781   17.0
                3          0.23                  0.7703    0.050     0.1043   17.0
                4          0.17                  0.0215    0.230     0.4174   17.0
                5          0.11                  0.1190    0.069     0.3753   17.0
                6          0.03                  0.1047    0.033     0.9053   17.0
                7          0.21                  0.8730    0.215     0.1349   17.0
                8          0.15                  0.5094    0.150     0.5795   17.0
                9          0.04                  0.7478    0.020     0.1164   17.0
                10         0.05                  0.7457    0.159     0.8713   17.0
K2PtCl4         1          0.13                  0.7399   -0.0188    0.1227   17.0      0.63        3.3
                2          0.10                  0.1085    0.0321    0.8975   17.0
                3          0.10                  0.5125    0.1910    0.5674   17.0
                4          0.05                  0.4796    0.1334    0.0273   17.0
                5          0.14                  0.0289    0.2428    0.4030   17.0
                6          0.07                  0.7466   -0.0095    0.5904   17.0
                7          0.12                  0.7626    0.0432    0.1125   17.0

ᵃ Only centric reflections were included past 3.3-Å resolution.

FIG. 5.10 Pt3-derivative Patterson map.

FIG. 5.11 Pt1-derivative Patterson map.

Medium-Resolution Protein Map

After examining the 5-Å map, refine the derivatives to a medium resolution of about 3 to 2.7 Å. It is important to look at the phasing power of the derivatives with increasing resolution. Using a derivative to too high a resolution may actually degrade the quality of the map and not provide any useful information. Some derivatives will be better than others. Most derivatives

FIG. 5.12 Slab from a 5-Å MIR map of CVCC. The dimer axis is at the position of the "X" in the center of the figure. It is a few degrees from being coincident with z, so the two halves of the dimer are not exactly the same in this figure. The two hemes are marked with an "H" and appear as round blobs at this resolution. Running diagonally on either side are helices, which appear as rough tubes of density at this resolution and are being clipped so that they are incomplete. There are clear solvent regions surrounding the dimer. The first contour level is at 1.5 times the sigma of the map.

fall off above 3 Å. This seems to be due in part to the decreasing amplitude of structure factors at higher resolution but, also, it can be due to nonisomorphism. For a heavy atom to bind, it must disturb the structure to some

degree. The size of this movement relative to the resolution will determine whether the assumption of isomorphism holds. In the past it seemed that most heavy atoms so disturbed the structure that above 2.8 Å or so the derivative was not usable. In the case of CVCC, the derivatives were used to different resolutions determined by the quality of the statistics. When the phasing power fell below 1.0, the derivative was cut off. The final MIR map is shown in Fig. 5.13. This map also was used to illustrate Figs. 3.18, 3.20, 3.27, 3.29, 3.37, 3.43, 3.44, 3.52, and 3.53. At this point it is worthwhile using a three-dimensional graphics program such as xfit. Xfit can fast-Fourier transform maps directly from phase files so that it is not necessary to run xfft beforehand. The turns of the helices were clear, and the side chains could be picked out. It was also evident upon inspection that the helices were left-handed. In reality α-helices are right-handed, as Linus Pauling demonstrated many years ago. When we chose our first heavy-atom solution, there was no way to choose the correct hand; both hands worked equally well. The Patterson map contains both hands

FIG. 5.13 Section of 2.7-Å CVCC map showing dimer axis. Dimer axis is vertical near center of page. Sections cut diagonally through helices can be seen.

because of the extra inversion center symmetry that the Patterson function adds. There was a 50:50 chance that we would get it correct, and we got it wrong. It is simple to fix, however. Just multiply all the heavy-atom coordinates by (-1, -1, -1) to put the solution on the other side of the inversion center, which changes the hand. Now recompute the phases and make a new map and the helix (now at the new, inverted position) will be right-handed. α-Helices make this easy; it is harder to decide with all-β proteins, although tight turns provide an important clue (see Sect. 3.11 and Fig. 3.46). The other feature of the helices that we can discern is that the side chains point toward one end of the helix. If you look at a model of an α-helix, it is apparent that the side chains point toward the N-terminal end of the helix (Fig. 3.44). This then gives us the chain direction.
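The hand change is a one-line operation on the coordinates, sketched here in Python. The function name and the optional wrapping into the unit cell are assumptions made for illustration. The two input sites are the AU1 and AU2 values shown in the xheavy window of Fig. 5.7; their negatives appear to be the values listed for the first two gold sites in Table 5.4.

```python
def invert_hand(sites, wrap=False):
    """Multiply every fractional coordinate by -1 to switch the hand.

    With wrap=True the result is moved back into the range [0, 1),
    which describes the same positions.
    """
    inverted = []
    for x, y, z in sites:
        new = (-x, -y, -z)
        if wrap:
            new = tuple(round(c % 1.0, 4) % 1.0 for c in new)
        inverted.append(new)
    return inverted


gold_sites = [(0.6184, 0.1471, 0.1040), (0.3630, 0.1281, 0.3801)]
print(invert_hand(gold_sites))
print(invert_hand(gold_sites, wrap=True))
```

Every derivative must be inverted together, since they share one origin; the phases are then recomputed from the inverted sites.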

Fine-Tuning the Map

If your map looks perfect, then go ahead and fit it. In the case of CVCC, while the map was obviously that of a protein molecule, it was still noisy and could use improvement. The first task was to fine-tune the heavy-atom parameters to see if the map could be improved. Up to now we have not made use of the anomalous scattering. However, in the case of CVCC, this turned out to be a much smaller signal than anticipated. At first this was puzzling, since we could think of no way that the differences would become smaller. Adding noise would have made the overall differences larger. The answer lay in the position of the noncrystallographic dimer axis. It goes through the unit cell at 0.25 in x and 0.25 in y and is parallel to z. It turned out that our heavy-atom positions, when reflected by the 2-fold, formed a pseudocentric array. The same is true for the Fe positions in the two hemes. Further, a test was made using a program supplied by Siemens with our four-circle area detector that tests for centric versus acentric data based on the distributions of the amplitudes. This is a common practice for small-molecule structures. The CVCC amplitudes test as centric even though the protein is clearly acentric. This result is due to the presence and position of the 2-fold in the unit cell; at low resolution the protein is pseudocentric. We tried including the anomalous data anyway but the maps were not improved; in fact it seemed that noise was added.

Solvent Flattening

Another successful technique is solvent flattening. In this method, the solvent is assumed to be flat and featureless; therefore any features in the solvent are noise and should be removed. The first trick is to find the solvent portion of the map. For CVCC we did this using the method developed by B. C. Wang as implemented by William Furey in the PHASES package. Furey's programs are easy to use and come with a good manual. We edited the files to add our unit cell and left all the other parameters at the default values. We ran doall.sh, and after about half an hour, the cycling was finished and we looked at the resulting map. The map appeared to be improved, with higher contrast and, of course, flatter solvent regions (Fig. 3.29). Because of the automatic solvent masking, we were aware that some side chains may have been clipped. Lysines and glutamines, which are often quite weak toward the ends anyway, are susceptible to this phenomenon. In fitting, you may want to use both the flattened and unflattened maps at the same time to check this out. It is usually easier to follow the path of the main chain in the flattened map, and the exterior side chains can be checked with the unflattened map to see if they have been clipped by the solvent mask.

"Chasing the Train"

Chasing the train is a spoonerism for tracing the chain. This is clearly the trickiest and most difficult part of solving proteins. The more experience one has, the easier this is. It is difficult for the beginner to visualize the possible paths of the main chain and to see these possibilities in the map. You must plunge in and just try it at first. Do not be afraid to throw out your first attempts at interpretation and try again. As you progress, this process will become easier. In the case of CVCC, the chase was simplified because the structure of the Rhodospirillum molischianum cytochrome c' was already available from the Brookhaven Protein Data Bank. Although the R. molischianum structure is not similar enough for a molecular replacement solution, it has the same basic fold and can be used as a guide to fitting our map.
The largest peaks in our MIR map were taken to be the Fe positions in the heme. The R. molischianum structure is a dimer of four-helix bundles with a heme bound in the center of each bundle. Searching between pairs of highest peaks, we found two peaks for the dimer pair at the expected distance of 25 Å that were not crystallographically related. We suspected from the heavy-atom work that the dimer axis was nearly parallel to z, and looking at slabs of the MIR map in z confirmed this. The helices that form the dimer interface were apparent, and we zoomed in to look at these. Then one can use xfit to superimpose the model on the map so that the Fe atoms fit onto the heme peaks. This fixes the translation and leaves one degree of freedom to be decided, a rotation about the Fe-Fe vector. This can be roughly accomplished by putting the Fe-Fe vector horizontal and in the plane of the screen and then rotating the model about x. This gives an approximate superimposition. In our structure determination, we tried rigid-body refinement with XPLOR to try to improve the fit. However, while the R-factor improved, the fit drifted away from the best superposition. It was obvious that we needed a better model. In hindsight, the failure of the rigid-body refinement is due to some large differences between the two proteins. The loops are of different size and shape, and the helices, while they follow the same general directions, do not superimpose well. The dimer angle is also different by approximately 10° in the two structures. The superimposed model was still quite useful because it provided raw material roughly placed with which to begin fitting. An individual helix can be placed within its density by rotating about the helix axis until the side chains align. The side chains are then "mutated" to the correct ones by inserting a new residue of the proper sequence and then least-squares-fitting this residue to the old model with the Cα, C, Cβ, and N atoms. The side chain can then be positioned into its density by torsions about the proper bonds. The roughly fit model was then subjected to rigid-body refinement with XPLOR. Each monomer was refined separately so that we had two groups, and then each dimer was broken into separate helices and refined. The R-factor at this stage was quite high, about 0.45 in the resolution range 20 to 5 Å. However, this is not unexpected for a model manually fit to an MIR map. We then started refinement at 3.0 Å, allowing the individual coordinates of each atom to refine (subject to geometrical constraints) but keeping the B-values fixed at 15.0. This is the method in the XPLOR example file, positional.inp. This quickly dropped the R-factor to 0.34 from a starting R-factor of 0.51 (see Table 5.5).
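The R-factor quoted at each stage is the conventional crystallographic residual, the sum of |Fo - k Fc| over all reflections divided by the sum of Fo. A minimal Python sketch follows. The function name and the simple overall scale k = ΣFo/ΣFc are assumptions made for illustration; XPLOR's own scaling is not reproduced here.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor for matched lists of amplitudes.

    f_calc is first put on the scale of f_obs with one overall factor,
    k = sum(f_obs) / sum(f_calc) (an assumed, simple scaling scheme).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    scale = sum(f_obs) / sum(f_calc)
    residual = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return residual / sum(f_obs)


# Hypothetical amplitudes, for illustration only.
observed = [100.0, 50.0, 25.0]
calculated = [90.0, 60.0, 25.0]
print(round(r_factor(observed, calculated), 4))
```

A model that reproduced the observed amplitudes exactly would give 0.0; a random arrangement of atoms in an acentric cell gives a value near 0.59, which puts the starting value of 0.51 in perspective.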
After additional fitting to the MIR maps, the R-factor dropped to around 0.27. At this point we tried the simulated annealing refinement of XPLOR. The first round lowered the R-factor a little, but the next round raised it. Then we decided we had to dig in and just carefully fit the MIR maps, giving up the hope of an automated solution. From that point, the path followed is less clear, since we sometimes backed up, tried fitting to combined coefficient maps, and so on. In any case, we changed the resolution to 2.3 Å, which gave us more information to fit to, and gradually we discovered errors in the geometry and fixed them. We used a large number of omit maps and compared between the two dimers, which were fit independently. In the end, we got the R-factor down to 0.22 at 1.8-Å resolution. The maps at this point looked quite good. We suspected that some of the error might actually lie in the data set. It was collected from two crystals and merged, neither data set being complete, and we had found problems earlier with merging data from different crystals. So we completely re-collected the

TABLE 5.5 Progress of CVCC Refinement

Cycle                dmin (Å)   R-factor   Δφ from final model (°)
0 (Start)            3.0        0.51       60.9
1                    3.0        0.34       50.0
2                    3.0        0.27       37.5
3 (SA refinement)    3.0        0.25       37.8
4 (SA refinement)    3.0        0.27       38.31
5                    2.3        0.25       31.0
6                    2.3        0.24       30.7
7                    2.3        0.26       30.8
8                    2.3        0.24       30.7
9                    2.3        0.23       32.6
10                   1.8        0.22       10.9
11                   1.8        0.19        0.0

data from a single large crystal. This dropped the R-factor to 0.19, and we identified one last problem with the model at Gly 53 that had its peptide plane flipped (see Fig. 5.14). Figures 5.15 to 5.17 show the final structure after refinement. The R-factor statistics are shown in Tables 5.6 and 5.7. At this point, the structure is fairly accurate. However, the R-factor could be even lower as judged by other structures in the literature. We cannot now find anything in the structure that needs to be fixed, but this does not mean that we have found everything.

Hindsight Is 20-20

For the purposes of this book, I thought it would be interesting to look back at the structures and compare them with the final model. The last column in Table 5.5 lists the difference in phase between different stages of the refinement and the final model. The difference was calculated for all the data between 6.0- and 1.8-Å resolution even when the model was refined at lower resolution. A separate analysis of the MIR phases reveals them to be about 45° from the final phase in the resolution range 34 to 3 Å. Figures 5.18 and 5.19 compare the backbone of several of the structures in Table 5.5 to the


FIG. 5.14 Map showing area around Gly 53 in CVCC. The map is made with 2Fo - Fc coefficients from the phases of cycle 10 in Table 5.5. The model at this stage is shown in thick lines, and the final model is shown in thin lines. Note that the hydrogen is included on the peptide nitrogen atom; the electron density at hydrogen is so low in X-ray maps that it cannot be seen. This map clearly shows that the peptide plane needs to be flipped.

final structure. It can be seen that we had our biggest problems with the loops and with the unusual conformation found for residues 52-59, which is an extended strand with a 3-10 helix in the middle. Figure 5.20 shows a map of the region around Glu 106, which was one of the last errors found, made with the phases from cycle 6 (Table 5.5). This error was fixed later, but I was curious to see if it could have been caught at an earlier point. The map is clearly ambiguous here. It does not match well to any protein structure, and there is a break in the chain. The thin lines show the final structure in this region, and there is no density for Ala 107, although there must have been some at an earlier stage because a water has been built into the density for the Cβ atom that has now completely disappeared. I had begun to consider seriously a sequencing error in the protein, and we had shortened Glu 106 to an alanine because we had nowhere to put the side chain. We had made heavy use of omit maps, so I made an omit map of the region shown in

FIG. 5.15 A 1.8-Å 2Fo - Fc map showing the heme and its surrounding area.

FIG. 5.16 The final fitted model of CVCC with the 1.8-Å 2Fo - Fc map.

FIG. 5.17 Ribbon diagram of CVCC showing overall fold. This figure was produced using MOLSCRIPT. See Kraulis, P. J. (1991). J. Appl. Crystallogr. 24, 946-950.

TABLE 5.6 CVCC R-Factors by Resolution for Final Model

Resolution range (Å)    No. reflections    Shell R-factor    Accumulated R-factor
3.74-10.00              2339               0.1718            0.1718
3.00-3.74               2766               0.1551            0.1638
2.62-3.00               2706               0.1935            0.1709
2.39-2.62               2627               0.2008            0.1758
2.22-2.39               2548               0.2140            0.1804
2.09-2.22               2404               0.2328            0.1850
1.99-2.09               2217               0.2581            0.1896
1.90-1.99               1956               0.2673            0.1930
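The shell and accumulated R-factors in a table like this can be computed along the following lines. This is only a minimal sketch: the function name `r_factor` and the amplitudes are made up for illustration, and it assumes the calculated amplitudes have already been scaled to the observed ones.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(||Fo| - |Fc||) / sum(|Fo|); assumes Fc is already scaled to Fo."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))

# Made-up amplitudes, ordered from low to high resolution (illustration only).
f_obs = [100.0, 50.0, 20.0, 10.0]
f_calc = [90.0, 55.0, 25.0, 8.0]

shell_1 = r_factor(f_obs[:2], f_calc[:2])   # first (low-resolution) shell alone
shell_2 = r_factor(f_obs[2:], f_calc[2:])   # second shell alone
accumulated = r_factor(f_obs, f_calc)       # all reflections up to this shell

print(shell_1, shell_2, accumulated)
```

Because the accumulated value is weighted by the sum of the amplitudes, strong low-resolution reflections dominate it, which is why it changes slowly as weak high-resolution shells are added.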

TABLE 5.7 CVCC R-Factors by Amplitude for Final Model

Amplitude range        No. reflections    Shell R-factor    Accumulated R-factor
18.102-291.406         16,933             0.2327            0.2327
291.406-564.709        2,218              0.1322            0.2039
564.709-838.013        363                0.0860            0.1951
838.013-1111.316       42                 0.0537            0.1934
1111.316-1384.620      6                  0.0666            0.1931
1384.620-2204.530      1                  0.0334            0.1930

Fig. 5.20B, omitting residues 104-108. This map is a little better, but the break is still there, the density is too broad, and it cannot distinguish between the right and wrong structures shown. Clearly, we had a severe phase bias problem here: all the refinement had subtly altered the rest of the model to bring density back near the incorrect structure as well as some for the correct structure. To remove this bias, I tried "shaking" the structure by adding a small random number between -0.25 and 0.25 to all the atoms. This resulted in an average 0.17-Å movement, which was quite small compared to the 2.3-Å resolution of the data, and raised the R-factor from 0.23 to 0.28. However, as can be seen in Fig. 5.20C, all the phase bias was removed. The structure became quite clear and, in fact, it appeared that the final structure could use some small adjustments to fit this density better. This method is quite simple and powerful, and we plan to use it more in the future to produce maps with reduced phase bias.
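The shaking step can be sketched in a few lines. This is not the code that was actually used: the function name `shake`, the placeholder coordinates, and the choice of an independent uniform shift on each of x, y, and z are all assumptions. The average movement obtained depends on exactly how the random shift is applied.

```python
import numpy as np

def shake(coords, max_shift=0.25, seed=0):
    """Add an independent uniform random shift in [-max_shift, max_shift]
    (in Å) to every x, y, and z coordinate. Assumed scheme, for illustration."""
    rng = np.random.default_rng(seed)
    shifts = rng.uniform(-max_shift, max_shift, size=coords.shape)
    return coords + shifts

# Placeholder coordinates: 1000 atoms at the origin (hypothetical model).
coords = np.zeros((1000, 3))
shaken = shake(coords)

# Mean distance each atom moved, in Å.
mean_move = np.linalg.norm(shaken - coords, axis=1).mean()
print(f"mean displacement: {mean_move:.2f} A")
```

After shaking, structure factors would be recalculated from the shaken model and a new map made with those phases.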

5.2 MUTANT STUDIES

A very powerful method of exploring the residues involved in function is to mutate positions and study the effect. To interpret many of these mutations, it is necessary to have a structure. Solving the structure of a mutant once the wild-type structure is known is usually much simpler, since the wild-type structure provides good starting phases for the mutant. To interpret mutants best, there is more that can be done than simply examining a 2Fo - Fc map. The wild-type phases include the old structure but not the new one, so some caution is advised. Examination of an omit map is suggested, where the residues of interest are omitted from the phase calculation. This greatly lowers the amount of phase bias, although some can still exist (see Sec. 3.12 under Phase Bias). In the following example of a mutant of cytochrome c peroxidase, we illustrate the procedure for solving a mutant in which an Asp has been replaced by a glutamate, resulting in some unusually large changes.

4 The form of cytochrome c peroxidase we used as the wild type was actually a recombinant form called MKT; it has three mutations already added that do not affect its functions.
5 Goodin, D. B., and McRee, D. E. (1993). Biochemistry 32, 3313-3323.

FIG. 5.18 Models of CVCC at various stages of refinement. The main chain of the final model is shown in thick lines, with the various models at different stages in the refinement shown in thin lines: (A) cycle 0, starting model; (B) cycle 1; (C) cycle 2; (D) cycle 3; (E) cycle 4; (F) cycle 5; (G) cycle 10.


FIG. 5.19 CVCC with arrows showing model differences: an alternative way of showing the differences between the final model and the model from cycle 1. The backbone of the final model is shown with arrows drawn along the vector between the two models. The arrows are scaled by a factor of 2 to exaggerate the differences. Notice that most of the long arrows are in the loops between the helices.

Example 1: MKT D235E Mutant

The mutant MKT D235E crystallizes under the same conditions as the wild type and in the same form. Data were collected on our four-circle area detector and reduced with XENGEN. The crystals were not as large as those used to collect the 1.7-Å resolution native data, and so data were about 80% complete to 2.1-Å resolution. Difference maps are fairly insensitive to missing data, so the incompleteness was not a problem. The wild-type and mutant data are combined with xmerge (Fig. 5.21). This file is then "phased" by adding the phase information from the wild-type phase file with xmergephs to produce a file containing records of h, k, l, FD235E, Fwild-type, αwild-type. The first map made was an Fo - Fc map, which in this case is the map FD235E - Fwild-type. Peaks in this map that are positive represent density that was not in the wild-type structure, and negative peaks are places where density was in the wild type but not in the mutant structure. In the simplest case, this map will reflect the atoms that were changed to make the mutant structure. Often, there are also rearrangements to accommodate the new structure. It is important to examine the difference map over a large area to see if there are any changes that are far removed from the mutant site; do not assume that all the changes will be local. In this case the changes were far greater than expected from just the single carbon atom difference in the side chain of Glu versus Asp (Fig. 5.22). In fact, the entire helix that includes position 235 has been disturbed.

Clearly, interpreting this difference map is confusing. There are large amounts of positive and negative density. One tool that xfit provides to help interpret the difference map is the ability to calculate the gradient of the difference map at an atom position. This gives the direction in which the atom needs to move according to the difference density at that atom. This direction is displayed as an arrow, where the length of the arrow indicates the steepness of the gradient on an arbitrary scale (Fig. 5.23). In the case of the helix, there is a net movement away from the side chain of the Asp to Glu mutation. The direction of the vectors indicates the direction of the gradient and the length indicates the relative magnitude, but the absolute magnitudes are not correct.

An omit map is used to refit the positions of the helix. Because the data are not on an absolute scale, the omit map must use the coefficients Fo and αomit. Now that it is known from the difference map which atoms move significantly, they can be selected in xfit and omitted. The Omit Selected Atoms option in the SFCalc window is used to calculate the phases for these residues and then to delete them from the protein phases. This map, shown in Fig. 5.24, can be used for fitting the mutant structure. After fitting, XPLOR was used to refine the model. The final refined structure is shown in Fig. 5.25.
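The idea of taking the gradient of a difference map at an atom position can be illustrated with a central finite difference on a gridded map. This is a sketch only, not xfit's implementation: the function name `gradient_at`, the orthogonal grid with its origin at (0, 0, 0), and the toy density are assumptions, and no handling of map edges or interpolation between grid points is included.

```python
import numpy as np

def gradient_at(density, spacing, position):
    """Central-difference gradient of a gridded map at the grid point nearest
    to `position` (orthogonal grid with origin at 0,0,0 assumed)."""
    idx = np.rint(np.asarray(position) / spacing).astype(int)
    grad = np.zeros(3)
    for axis in range(3):
        step = np.zeros(3, dtype=int)
        step[axis] = 1
        plus = density[tuple(idx + step)]    # density one grid step forward
        minus = density[tuple(idx - step)]   # density one grid step back
        grad[axis] = (plus - minus) / (2.0 * spacing)
    return grad

# Toy difference map: density rises linearly along +x (illustration only).
spacing = 0.5
x = np.arange(20) * spacing
density = np.broadcast_to(x[:, None, None], (20, 20, 20)).copy()

g = gradient_at(density, spacing, position=(5.0, 5.0, 5.0))
print(g)  # points along +x, the direction of increasing difference density
```

An arrow drawn along such a vector points toward positive difference density, which is the direction the atom would need to move. As noted in the text, only the direction and relative length are meaningful.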

Example 2: SOD C6A Mutant

As another example of a mutant study, we will consider a mutant of bovine copper-zinc superoxide dismutase (SOD). We wished to make SOD more stable to long-term heating. This had proved useful in purifying the protein, as most of the other proteins in the cell can be destroyed by heating at 70 °C, while the SOD Cys 6 → Ala (C6A) mutant is relatively unaffected. To investigate the structural effects of this mutation, we did a crystallographic study.6

Crystallization and Data Collection

The mutant was crystallized, as was the native protein, from 2-methyl-2,4-pentanediol solutions. The data were collected on our Bruker three-circle area detector to 2.1-Å resolution using three crystals. Most of the data were from a single crystal; the other two were used mainly to fill in missing data. Because the crystals are monoclinic and a quarter of a sphere of data is needed, it is difficult to get all the data from a single orientation. The unit cell was 93 × 90 × 72 Å, β = 95.1°, and we used a detector distance of 12 cm and a swing of 22.0°. The crystal was oscillated in 0.25° frames from ω = -50° to 10°, then φ was incremented by 60°, and another run of ω was collected. The data were reduced and scaled with XENGEN.

6 McRee, D. E., Redford, S. M., Getzoff, E. D., Lepock, J. R., Hallewell, R. A., and Tainer, J. A. (1990). J. Biol. Chem. 265, 14,234-14,241.

Difference Map

The data were merged and scaled with the native data using anisotropic scaling (xmerge) to produce a .fin file with h, k, l, Fwild-type, σwild-type, FC6A, σC6A. The differences were fairly small, but they consistently fell off with resolution, indicating an isomorphous structure with small differences. Since the mutation removes a single sulfur atom by changing residue 6 from a cysteine to an alanine, this was to be expected. The phases of the refined wild-type structure were merged with the data (xmergephs), and a difference map with coefficients Fmutant - Fwild-type, αwild-type,calc was made. The largest differences in this map were at the site of the missing sulfur atom, where a large hole was found (Fig. 5.26). The asymmetric unit contains two dimers of SOD for a total of four monomers. There were differences around residue 6, which indicated small changes in the main chain. Another feature was a pair of negative and positive peaks on either side of Gly 145 O, which indicates a shift of this atom toward the hole left by the removed Cys 6 SG. Since it is difficult to assess the correct distance to move atoms in a difference map, an omit map was made. To calculate the phases for this map, residue 6 and the residues near it with peaks in the difference map were left out of the calculation. These phases were then used to calculate an omit map. To increase the signal further, the omit map was averaged using the noncrystallographic symmetry operators known from the refinement of the native protein. This map was remarkably clean and clearly showed the movements needed to fit the structure. Thus, even though the movements were subtle, they could be accurately modeled from the 4-fold averaged omit map. The model was refit using this map and refined. The final refined structure is shown superimposed on the wild type in Fig. 5.27.

FIG. 5.20 Electron density at Glu 106 in the middle of the refinement. The model at cycle 9 is shown in thick lines, and the final model is shown in thin lines. The 2.3-Å electron density map is shown in gray lines with coefficients as follows: (A) 2Fo - Fc, (B) residues 105-108 2Fo - Fc omit map, (C) 2Fo - Fc map with residues "shaken." See text for explanation.

FIG. 5.21 Merging statistics of D235E. [Two plots versus resolution (Å), from 14.4 to 2.31 Å: |F1 - F2|/F1 and |F1 - F2|. All: 14,586 reflections; centric: 1,595 reflections; acentric: 12,991 reflections.]

FIG. 5.22 D235E 2.1-Å difference map: Asp 235 to Glu mutant Fmutant - Fwild-type electron density map. Black contours are at +3σ, and gray contours are at -3σ. Note the movement of the entire helix.

FIG. 5.23 D235E Fmutant - Fwild-type gradient vectors; arrows show the difference map gradient at the main chain atoms. The arrows show the direction that the atoms move in going from the wild type to the mutant structure. Their length is exaggerated. The thick arrow in the center shows the average vector.
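The coefficients behind a mutant-minus-wild-type difference map, as used in both mutant examples, can be written out explicitly. This is a minimal sketch under stated assumptions: the function name `difference_coefficients` and the three reflections are invented for illustration, the two amplitude sets are taken to be already on a common scale, and the Fourier transform that turns coefficients into a map is not shown.

```python
import numpy as np

def difference_coefficients(f_mutant, f_wild, phase_wild_deg):
    """Complex coefficients (Fmutant - Fwild-type) * exp(i * alpha_wild-type).
    Assumes both amplitude sets are already scaled to each other."""
    delta = np.asarray(f_mutant, dtype=float) - np.asarray(f_wild, dtype=float)
    phase = np.deg2rad(np.asarray(phase_wild_deg, dtype=float))
    return delta * np.exp(1j * phase)

# Three made-up reflections (amplitudes in arbitrary units, phases in degrees).
f_mut = [120.0, 80.0, 45.0]
f_wt = [100.0, 85.0, 45.0]
alpha_wt = [0.0, 90.0, 180.0]

coeffs = difference_coefficients(f_mut, f_wt, alpha_wt)
print(coeffs)
```

A reflection whose amplitude is unchanged by the mutation contributes nothing to the map, which is one way to see why small, well-scaled differences give clean difference maps.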

Refinement

In a parallel study we were interested in what would happen if we refined the structure without any refitting. We tried three different refinement methods: PROLSQ refinement, XPLOR positional-only refinement, and XPLOR with molecular dynamics. None of the three moved the main chain around residue 6 to the correct position, even though we did 2000 cycles of molecular dynamics with XPLOR at 2000 °C. With hindsight, we think this is because the native model has been extensively refined and the model satisfies the geometry constraints very well. To move the main chain, all the atoms need to be moved in concert, as a group. If one atom or a few atoms move in the right direction, the geometry will be distorted, and the next cycle will move the atoms back to fix the geometry. This is an example of being stuck in a local minimum. On the other hand, the refit model stayed where it had been fit and refined to a lower R-value, indicating that it is in a lower minimum. The lesson of this structure is not to rely on refinement programs to fix your mutant structure. It would seem that if any structure could have been done solely by refinement, it would have been this one.

FIG. 5.24 D235E 2.1-Å omit map. (A) The atoms in the helix containing Asp 235 were omitted from the structure factors, and a 2Fo - Fc map was calculated. The model shown is the wild type. Because of the large number of atoms omitted, the map is fairly noisy, but the movements of the residues can be clearly seen. (B) Close-up of residues around the mutation. The rotation of His 175 can be seen, and the new conformation of Glu 235 can be seen.

FIG. 5.25 Comparison of final structures. The wild-type model is shown in thin lines, and the D235E mutant is shown in thick lines.

FIG. 5.26 SOD C6A mutant 2.1-Å difference map. The coefficients are FC6A - Fwild-type, contoured at +3σ (black) and -3σ (gray). The model is wild type, showing three β-strands in the area of the mutation.

FIG. 5.27 Superposition of SOD wild type (thick lines) and final refined C6A mutant (thin lines) structures.

Analysis

The mutant, while more stable to prolonged heating, has a melting temperature (Tm) that is slightly lower than that of the native structure. The increased stability to long-term heating could be explained by the lack of the free sulfhydryl group on the cysteine. A cysteine would be subject to oxidation at the elevated temperatures, could form incorrect disulfides with other SOD molecules, or could undergo a β-elimination reaction, leaving a kink in the main chain that could prevent refolding. The difference in the melting temperature was not enough to explain the loss of the energy due to the removal of the buried hydrophobic surface of the sulfur. We compared the packing of the native and mutant structures using Mike Connolly's surfacing program MS. The mutant was actually packed better than the native: the native has a small hole accessible to a 0.7-Å probe, while the mutant has no hole large enough for a probe of this size. This tighter packing may explain the better-than-expected stability of the mutant. The loss of the folding energy due to the sulfur-buried area is partially compensated by the better packing of the interior.

5.3 SUBSTRATE-ANALOG EXAMPLE

Substrate-binding studies are an especially powerful technique in protein crystallography, revealing details about the bound state of substrates and their interactions with protein groups. However, they have several difficulties. First, the substrate must bind tightly and have a high occupancy if it is to be seen in the electron density. The substrate must be stable for a relatively long time, as it takes several days to collect the data. In the case of enzymes, this means that the substrate is usually an inhibitor that cannot undergo turnover. While the chosen substrate may bind well to the protein in solution, this may not happen in the crystal. For example, there are many cases of a substrate that causes a large change in the conformation of the protein upon binding but cannot be accommodated by the crystal packing. In such cases, the crystal may crack or the substrate may not bind. Alternatively, a crystal contact may block the binding site. In these cases, it may be possible to cocrystallize the protein and substrate (i.e., grow the crystal in the presence of the substrate). These crystals may not be isomorphous with the nonsubstrate form, necessitating a solution by molecular replacement. An example of such a case is aconitase.

Example: Isocitrate-Aconitase

Aconitase is an enzyme containing an Fe4S4 cluster that catalyzes the stereospecific interconversion of citrate to isocitrate via cis-aconitate. The structure of the 83-kDa molecule was solved at Scripps by Robbins and Stout.7 To probe the mechanism of the enzyme, Lauble and Stout8 studied several crystal structures of inhibitors of the enzyme, and an example of their work is given next to illustrate substrate-binding studies.

7 Robbins, A. H., and Stout, C. D. (1989). Proteins 5, 289-312; Proc. Natl. Acad. Sci. U.S.A. 86, 3639-3643.
8 Lauble, H., Kennedy, M. C., Beinert, H., and Stout, C. D. (1992). Biochemistry 31, 2735-2748.

Crystal Growth

The original aconitase structure was solved in an orthorhombic space group, P21212. The active form of the enzyme is oxygen sensitive, and so all the substrate experiments were carried out in an anaerobic hood. In the first attempts, large crystals of the orthorhombic form were soaked with substrate solutions. These crystals invariably cracked and disintegrated when the substrate was added. To overcome this, it was decided to cocrystallize the protein in the presence of substrate. The crystals were grown by means of the vapor-diffusion method from hanging drops, with ammonium sulfate as the precipitant. Each drop was a three-part mixture of protein, inhibitor/substrate, and precipitant. To start crystal growth and to encourage isomorphous crystals, seeding was used. Upon mounting and taking some test shots, it was found that the crystals had a tendency to twin and that the space group was now monoclinic. In a second round of seeding, where these new monoclinic crystals were used, it was possible to get large, single crystals in the monoclinic form suitable for data collection. A variety of inhibitors were tried, and they all invariably yielded the monoclinic form. The new space group was C2. To facilitate comparisons, it was decided to reindex the data as B2, an alternate setting of C2 with the 2-fold parallel to c instead of b, so that the 2-fold of the monoclinic cell coincided with the 2-fold along c in the orthorhombic P21212 space group. The unit cell of the orthorhombic space group was a = 173.6, b = 72.0, and c = 72.7 Å, while the monoclinic cell was a = 185.5, b = 72.0, c = 73.0 Å, and γ = 77.7°.
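One quick way to compare the two cells is to compute their volumes from the cell constants quoted above. The sketch below uses the general triclinic volume formula; the function name `cell_volume` is an assumption. Note that cell volume alone says nothing about lattice centering or the number of molecules per cell, so it is only a rough comparison.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic Å from cell lengths (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

# Cell constants quoted in the text.
v_ortho = cell_volume(173.6, 72.0, 72.7)             # orthorhombic P21212
v_mono = cell_volume(185.5, 72.0, 73.0, gamma=77.7)  # monoclinic, B2 setting

print(f"orthorhombic: {v_ortho:.0f} A^3, monoclinic: {v_mono:.0f} A^3")
print(f"ratio: {v_mono / v_ortho:.3f}")
```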

Data Collection and Phasing

X-ray diffraction data were collected on a Bruker area detector. The resulting data had a dmin of about 2.1 Å. Data were collected on crystals grown in the presence of cis-aconitate, the reaction intermediate. The resulting data were 90% complete, had an average I/σ(I) of 18.9, with the last shell at a ratio of 3.8, and had an Rsymm of 6.8% on F. Since the crystals were not isomorphous with the form that had been solved previously, a molecular replacement solution was used. The Fe-S cluster was removed from the starting model. Since its contribution to the total scattering power of this 83-kDa protein was minimal, it made little difference to the molecular replacement solution. However, it did provide a good check on the final solution (i.e., one could see whether it showed up again after the solution was found). The rotation search gave a hit at 17.0σ. This is a very high hit and not surprising, since the protein molecules in both crystal forms are identical. The translation search also produced a large peak at 23.4σ. The rotated and translated model was then refined with XPLOR with rigid-body refinement. The starting R-factor was 0.33, which was refined to a final value of 0.22 at convergence. A 2Fo - Fc map at 3.0 Å showed clear density for the Fe4S4 cluster, as well as new density for the substrate.

Refinement and Interpretation

The structure of the protein in the monoclinic, cis-aconitate cocrystallized crystals was refined to an R-factor of 0.23 at 2.1-Å resolution, at which point water molecules were added to the structure and the conformation of the side chains was adjusted as needed against 2Fo - Fc maps. Individual, isotropic B-factors for each atom were refined to give an R-factor of 0.19. A final round of simulated annealing refinement lowered the R-factor to 0.18. At this point, it was decided to interpret the density of the substrate, and a surprise was again found. Attempts to model cis-aconitate into the density gave poor fits. Instead, it was found that isocitrate gave the best fit (Fig. 5.28). Since aconitase interconverts citrate to isocitrate through the cis-aconitate intermediate, citrate was also modeled into the density, but without success. If the protein was active during crystallization, a mixture of isocitrate, cis-aconitate, and citrate should be found in the crystal. The enzyme was able to convert the cis-aconitate to isocitrate during crystallization, indicating that it was active. That only isocitrate is in the crystal was confirmed by means of Mössbauer spectroscopy and crystals of 57Fe-substituted enzyme. This spectrum showed only one form in the crystal and was the same form as observed after mixing aconitase with isocitrate and rapidly freezing the sample. Thus, the crystal has somehow trapped the protein in a conformation where it binds isocitrate only and thereby shifts the reaction equilibrium.

After the model had been refined with the isocitrate included, another Fo - Fc map was made to check for unaccounted density (Fig. 5.29). This map showed a peak consistent with a water ligand in a sixth position to the unique Fe at the corner of the Fe4S4 cluster. ENDOR (electron-nuclear double resonance) spectroscopy indicated the presence of a water molecule bound to the Fe-S cluster in addition to the substrate, confirming the choice of a water molecule. An overall view of the aconitase-substrate complex is shown in Fig. 5.30.

In summary, there were two points that make this solution interesting from a crystallographic point of view. First, the crystals in the presence of substrate grow in a new space group. Interestingly, the crystals can be seeded with crystals of the other space group. Comparison of the cells shows a similarity in cell size and the direction of the 2-fold present in both space groups. The second point is that the crystal selects a form of the enzyme that preferentially binds one of three possible substrates present in solution. This could be explained if it were found that the protein undergoes a conformational change during the reaction, and the crystal traps this form over other conformations.

FIG. 5.28 2Fo - Fc map of aconitase with isocitrate bound. The aconitase Fe4S4 cluster is at the right, with the bound isocitrate at the left.

FIG. 5.29 Aconitase-isocitrate difference density. This figure is rotated about 90° downward about the horizontal from Fig. 5.28. The difference density is interpreted as a water molecule bound in the sixth position to the Fe of the Fe4S4 cluster.


FIG. 5.30 Ribbon drawing of aconitase illustrating the large size of the molecule. The arrows represent β-sheet structure, and the helical ribbons indicate α-helices. In the center of the protein, in a deep cleft, is the isocitrate bound to the Fe4S4 cluster.

5.4 MOLECULAR REPLACEMENT

The following steps are needed for molecular replacement:

1. Collect a native data set of the unknown crystal structure.
2. Run rotation function with probe molecule.
3. Refine rotation solution (optional).
4. Run translation function.
5. Carry out rigid-body refinement.
6. Fit sequence into difference maps.
7. Refine new structure.

Example: Yeast Copper-Zinc Superoxide Dismutase

As an example of a molecular replacement solution, we present the crystal structure of yeast copper-zinc superoxide dismutase (ySOD) by Drs. Hans Parge and John Tainer. When this structure was undertaken, two Cu-Zn superoxide dismutases had been solved: bovine (bSOD) and human (hSOD). The structures are very similar, so either one could probably serve as a probe. The human SOD structure was chosen as the probe molecule, and the molecular replacement proceeded to a straightforward solution. SODs are dimeric molecules, and the dimer of hSOD was used as a probe. The first step after good-quality crystals had been obtained was to collect a native data set. This was done using a Bruker area detector, and the data statistics are shown in Table 5.8. The data are complete in the range used by molecular replacement, 15-3.5 Å; moreover, they are bright (high I/σ(I)) and have a low Rsymm. The data are 100% complete in this range, which is important for the rotation function, and were collected with a high redundancy, 6.5 for the first shell, which helps with scaling out absorption and noise. XPLOR was used for the rotation function. The example input files provided with XPLOR were used (only the space group and filenames had to be changed). The rotation function yielded one hit that was clearly above the others (Table 5.9). The top 50 hits were then put through Patterson

TABLE 5.8 Native Data Statistics for ySOD a

                  Observations
Resolution (Å)    Total no.    No. unique    Percent complete    Rsymm on F (%)    I/σ(I)
4.0               12,189       1,867         99                  3.4               75.8
3.2               11,450       1,800         100                 5.8               41.3
2.7               10,281       1,787         100                 10.9              18.7
2.5               7,925        1,781         100                 21.2              6.2
2.3               7,925        1,781         100                 21.2              6.2
2.2               3,409        1,264         72                  24.0              3.7
Total             54,844       10,296        95                  8.7               27.4

a Yeast superoxide dismutase; space group R32 (hexagonal indexing): a = b = 119.3, c = 75.2 Å, α = β = 90°, γ = 120°.
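The redundancy quoted in the text for the first shell follows directly from the observation counts in Table 5.8: it is the total number of observations divided by the number of unique reflections. A minimal sketch, with the function name `redundancy` chosen here for illustration and only three of the table's rows entered:

```python
def redundancy(total_observations, unique_reflections):
    """Average number of times each unique reflection was measured."""
    return total_observations / unique_reflections

# (label, total no. observations, no. unique), taken from Table 5.8.
rows = [
    ("4.0", 12189, 1867),
    ("2.2", 3409, 1264),
    ("Total", 54844, 10296),
]

for label, total, unique in rows:
    print(f"{label:>5}: redundancy {redundancy(total, unique):.1f}")

first_shell = redundancy(12189, 1867)
```

The drop in redundancy toward the last shell goes together with the drop in completeness there.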

TABLE 5.9 Molecular Replacement Solution of ySOD a

          Rotation function                       PC refinement
Index     θ1        θ2       θ3        RF         θ1        θ2       θ3        PC
1         9.79      87.50    322.52    1.115      8.23      86.73    323.68    0.181
2         8.51      82.50    331.49    0.826      8.85      86.70    323.47    0.181
3         10.05     87.50    311.87    0.764      8.82      86.57    323.91    0.181
4         9.35      85.00    269.14    0.740      7.05      85.86    268.94    0.066
5         9.00      77.50    321.00    0.735      8.97      86.48    323.74    0.180
6         9.82      85.00    258.48    0.721      6.55      84.78    259.83    0.057
7         192.62    82.50    100.71    0.718      189.55    94.07    94.48     0.164
8         196.38    87.50    90.93     0.695      189.40    93.40    96.06     0.178
9         32.39     90.0     323.48    0.688      30.36     90.78    325.17    0.044
10        11.02     90.0     337.75    0.684      9.80      89.71    337.39    0.037

Translation: translation function value (TF) = 0.607; [0.373, 0.305, 0.267]. Initial R = 0.48; 8.0-3.0 Å.
Rigid-body refinement: R = 0.42.

a The top 10 rotation hits are listed, showing the value of the angles and the rotation function (RF). The same 10 hits are also shown after Patterson correlation refinement (PC). Notice that 1-3 and 5 all converge to the same value. The values of θ angles were applied to the model, and the translation function produced a clean hit at the value shown. The model was then rigid-body-refined. This model was then used to fit the structure manually to 2Fo - Fc maps.

correlation refinement. Hits 1-3 and 5 refined to the same position with the highest correlation. The next-highest correlation was for solution 8, which is only slightly lower. The top hit was then used in a translation search (Table 5.9). The top hit from the translation function was then rigid-body refined from a starting R-value of 0.48 to 0.42 in the range 8-3 Å. This R-value is what would be expected considering that hSOD has many changes from ySOD. When the model was built using the dimer positioned by the molecular replacement solution and the crystallographically related molecules were generated to check contacts, it was found that the dimer was on a crystallographic 2-fold. This causes two crystallographically related molecules to superimpose. It was clear then that the asymmetric unit of the crystal was a monomer of ySOD, not a dimer, and that the dimer axis fell on a crystallographic 2-fold. Using the dimer had no effect on the solution in this case: it essentially amounted to a doubling in the scale factor, since the extra molecule exactly overlapped another molecule. If a packing function had been run, though, the correct hit would have had one of the worst packings using the dimer!
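The convergence of several rotation hits onto one orientation after PC refinement can be checked mechanically. The sketch below takes the PC-refined angles from Table 5.9 and groups the hits that lie within a small tolerance of the hit with the highest correlation. The 1° tolerance and the direct angle-by-angle comparison are assumptions made for illustration: this crude test ignores symmetry-equivalent orientations.

```python
# PC-refined hits from Table 5.9: index -> (theta1, theta2, theta3, PC)
hits = {
    1: (8.23, 86.73, 323.68, 0.181),
    2: (8.85, 86.70, 323.47, 0.181),
    3: (8.82, 86.57, 323.91, 0.181),
    4: (7.05, 85.86, 268.94, 0.066),
    5: (8.97, 86.48, 323.74, 0.180),
    6: (6.55, 84.78, 259.83, 0.057),
    7: (189.55, 94.07, 94.48, 0.164),
    8: (189.40, 93.40, 96.06, 0.178),
    9: (30.36, 90.78, 325.17, 0.044),
    10: (9.80, 89.71, 337.39, 0.037),
}

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def same_orientation(h1, h2, tol=1.0):
    """Crude test: all three angles agree within tol degrees (assumed cutoff)."""
    return all(angle_diff(x, y) <= tol for x, y in zip(h1[:3], h2[:3]))

best = max(hits, key=lambda k: hits[k][3])  # hit with the highest PC value
cluster = sorted(k for k in hits if same_orientation(hits[k], hits[best]))
print(cluster)
```

With these numbers the cluster contains hits 1, 2, 3, and 5, matching the observation in the text.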


It is useful at this point to find some independent proof that the solution is correct. For this purpose the Cu and Zn were left out of the original molecular replacement probe. If the solution is correct, there should be density at the metal sites due to the approximately correct phases. This was, indeed, the case: there was density at the Cu and Zn. An even more striking proof of the correctness of the molecular replacement phases was found at the site of Tyr 33 (Fig. 5.31). This tyrosine had replaced a glycine in the human structure, and the lack of a guiding Cβ atom had resulted in its side chain being built in a rather improbable conformation. After the first round of refinement, the 2Fo - Fc map showed very clear density for a tyrosine. No information for this density was included in the molecular replacement solution, and its presence could be due only to a correct molecular replacement solution.

FIG. 5.31 2Fo - Fc map of yeast Cu-Zn SOD at Tyr 33. Shown in thick lines is the structure of the yeast SOD model after the first refinement following the molecular replacement solution. In gray is the 2.5-Å resolution 2Fo - Fc map, showing very clear new density for the side chain in a position completely different from that included in the molecular replacement model. The thin gray structure shows the final model and demonstrates the excellent fit of the density to a tyrosine.

5.5 MULTIWAVELENGTH ANOMALOUS DISPERSION (MAD) PHASING

MAD phasing is very powerful and an excellent way to solve proteins that have an intrinsic anomalous scatterer such as Fe, Zn, or Cu. If no intrinsic anomalous scatterer is in the crystal, one can be introduced by soaking in a heavy atom such as Hg or Pt or by replacing the methionines with Se-methionine if the protein can be produced recombinantly in E. coli. Se-methionine is introduced by producing the protein in a Met- strain such as DL-41 and substituting Se-methionine in the growth medium for methionine [9]. Once a crystal with one or more suitable anomalous scatterers has been obtained, the following steps are needed for MAD phasing:

1. Perform an X-ray absorption spectrometry (XAS) scan on the crystal to determine the absorption edge of the anomalous scatterer.
2. Collect data at the inflection point, at the peak of the edge, and at a remote wavelength complete with Bijvoet pairs.
3. Find the positions of the anomalous scatterer with Patterson maps.
4. Calculate phases in both absolute configurations.
5. Fit the map or solvent-flatten to improve it if needed.

Example: CuA Subunit from T. thermophilus Cytochrome c Oxidase [10]

Cytochrome c oxidase is a multisubunit membrane-associated enzyme that catalyzes the final step in the respiratory electron transfer chain from foodstuffs to oxygen. Cytochrome c oxidase contains several domains with three metal centers, one of which, subunit II, is mostly extracellular, having an N-terminal anchor of two membrane-spanning helices and containing the CuA center. Electrons pass from cytochrome c to the CuA center and then flow into subunit I to the CuB site. CuA centers are unique in that they contain two coppers only 2.5 Å apart bridged by two thiolate sulfur atoms. Although two complete structures of cytochrome c oxidase exist, one bovine and the other for Paracoccus, they are at medium resolution and do not fully reveal the nature of the CuA geometry. In particular, these medium-resolution structures, while amazing feats in their own right, do not reveal the copper-copper distance in the CuA center, nor the geometry of the ligands around the copper. To answer these questions, we, in collaboration with James Fee at the University of California, San Diego, expressed the extracellular domain of Thermus thermophilus cytochrome c oxidase containing the CuA center in E. coli as a soluble fragment by removing the N-terminal helices. We were able to grow bright purple crystals that showed the characteristic CuA spectrum both optically (using single-crystal spectroscopy) and by EXAFS (using a suspension of crystals). The latter were important because we were expressing a fragment and reconstituting it in E. coli. The structure was determined at beamline 1.5 at the Stanford Synchrotron Radiation Laboratory at 100 K. CuA crystallized in space group P21 with unit cell constants a = 34.9 Å, b = 70.6 Å, c = 53.5 Å, and β = 98.12°. Comparing the subunit molecular weight to the volume of the unit cell, we concluded there were two molecules in the asymmetric unit.

[9] Doublié, S. (1997). "Preparation of Selenomethionyl Proteins for Phase Determination," Methods in Enzymology, vol. 276, pp. 523-530. Academic Press, San Diego.
[10] To follow along with the solution for this example using XtalView and CCP4, you can download the data and step-by-step instructions from the Web site at http://ppcII.scripps.edu. There are also instructions at this site for obtaining the software if needed.
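The comparison of molecular weight with cell volume mentioned above is usually done with the Matthews coefficient, Vm = V / (M x Z x n). The sketch below uses the cell constants quoted in the text; the molecular weight is a hypothetical round number chosen only for illustration, since the text does not state it, and the function names are likewise illustrative.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit cell volume in cubic angstroms for a monoclinic cell:
    V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume, mol_weight_da, n_asu, n_per_asu):
    """Vm in cubic angstroms per dalton: cell volume divided by the
    total protein mass in the cell."""
    return volume / (mol_weight_da * n_asu * n_per_asu)


# Cell constants from the text; P21 has two asymmetric units per cell.
volume = monoclinic_cell_volume(34.9, 70.6, 53.5, 98.12)

# ASSUMPTION: the fragment's molecular weight is not given in the text;
# 15,000 Da is a placeholder for illustration.
ASSUMED_MW = 15000.0

print(f"cell volume = {volume:.0f} cubic angstroms")
for n in (1, 2, 3):
    vm = matthews_coefficient(volume, ASSUMED_MW, n_asu=2, n_per_asu=n)
    print(f"{n} molecule(s) per asymmetric unit: Vm = {vm:.2f}")
```

Protein crystals commonly have Vm between roughly 1.7 and 3.5 cubic angstroms per dalton, so the candidate whose Vm falls in that range is the likely content of the asymmetric unit.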

XAS Scan

The XAS scan results are shown in Fig. 5.32. The exact method of doing these scans will be beam-line dependent, but in general, an X-ray detector (e.g., a Geiger counter with energy discrimination) is put as close to the crystal as possible to measure the X-ray fluorescence at 90° to the direct beam. The energy is then scanned by moving the beam-line double-focusing monochromator, and a scan is made of the change in the fluorescence relative to the primary beam intensity. The ratio is used because the emission characteristics of the beam line cause the beam intensity to change with wavelength.
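As a rough sketch of the normalization just described, the fluorescence reading can be divided by the incident-beam reading at each energy, after which the inflection point (steepest rise) and the peak can be picked off. The scan below is synthetic and the function names are illustrative; a real beam line supplies its own scan software, and deriving f' and f" needs a further transformation that is not shown here.

```python
import math


def normalize_scan(fluorescence, incident):
    """Divide each fluorescence reading by the incident-beam reading."""
    return [f / i0 for f, i0 in zip(fluorescence, incident)]


def edge_points(energies, signal):
    """Return (inflection_energy, peak_energy) for a normalized scan."""
    # Central differences approximate the slope at interior points.
    slopes = [
        (signal[i + 1] - signal[i - 1]) / (energies[i + 1] - energies[i - 1])
        for i in range(1, len(signal) - 1)
    ]
    i_inflection = 1 + max(range(len(slopes)), key=slopes.__getitem__)
    i_peak = max(range(len(signal)), key=signal.__getitem__)
    return energies[i_inflection], energies[i_peak]


# Synthetic scan (illustrative only): an arctangent edge centred at
# 8985 eV plus a small "white line" bump at 8997 eV, recorded with a
# beam whose intensity falls steadily as the energy increases.
energies = list(range(8950, 9051))
ideal = [
    0.5 + math.atan((e - 8985) / 3.0) / math.pi
    + 0.3 * math.exp(-((e - 8997) / 4.0) ** 2)
    for e in energies
]
incident = [1000.0 - 2.0 * (e - 8950) for e in energies]
fluorescence = [s * i0 for s, i0 in zip(ideal, incident)]

normalized = normalize_scan(fluorescence, incident)
inflection, peak = edge_points(energies, normalized)
print("inflection (eV):", inflection, " peak (eV):", peak)
```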

Data Collection

From the XAS scans we determined four data collection wavelengths, as listed in Table 5.10. We also collected Bijvoet pairs at PI, PK, and RE by inverse beam geometry. That is, after each run we did a second run at the same wavelength with goniostat angles χ = -χ, φ = φ + 180°. Since χ was 0, however, we needed only to change φ by adding 180°. All wavelengths were collected in exactly the same way, except that we did not do the inverse beam geometry run for RE2 because it has no anomalous signal. This was mainly to save time: collecting the inverse beam geometry would have increased redundancy and averaged out any absorption differences. Since the crystal was frozen at 100 K, we didn't need to worry about crystal decay.
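The inverse beam setting described above amounts to a one-line transformation of the goniostat angles. A minimal sketch follows; the function name and the 0 to 360 degree wrapping convention are assumptions for illustration.

```python
def inverse_beam_setting(phi_deg, chi_deg):
    """Angles for the inverse-beam run: chi -> -chi, phi -> phi + 180,
    with phi wrapped into the range [0, 360)."""
    return (phi_deg + 180.0) % 360.0, 0.0 - chi_deg


# With chi = 0, as in the text, only phi changes.
for phi_start in (0.0, 30.0, 270.0):
    phi_inv, chi_inv = inverse_beam_setting(phi_start, 0.0)
    print(f"direct phi = {phi_start:5.1f}  ->  inverse phi = {phi_inv:5.1f}, chi = {chi_inv:.1f}")
```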


FIG. 5.32 EXAFS scans of the CuA crystal. (A) The X-ray fluorescence output of the XAS scan. An absorption edge at about 9000 eV is the characteristic absorption of the copper K edge. The scan converted into plots of f' (B) and f" (C). The f' signal is at least 5 electrons and the f" signal is about 3.5 electrons. The points at which MAD data should be collected are indicated by arrows.

Table 5.11 shows the size of the signal within and between wavelengths. On the diagonal are the differences within a wavelength, rms(F+ - F-)/rms(F), or the anomalous signal, and off the diagonal are the signals between wavelengths, rms(F(λ1) - F(λ2))/rms(F), or the dispersive differences. Notice that the signal for the anomalous differences, which is proportional to f" in Fig. 5.32C, is largest for the peak PK and smallest for the inflection point PI. The smallest dispersive difference is the PI-PK pair, as expected. The largest is for the PI-RE2 pair and the second largest for the PI-RE pair. After the MAD data had been collected, we changed the wavelength to 1.0 Å to shrink the pattern and minimize absorption effects, swung the detector out, and collected a high-resolution run. This run was complete to 1.6 Å and also included incomplete data to 1.45 Å from the corners of the square detector. Thus, from a single crystal we collected the MAD phasing data to solve the structure and the high-resolution data needed for refinement. This was very fortunate, as it was very difficult to grow single crystals.
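The two ratios tabulated in Table 5.11 can be sketched as follows. The amplitudes are invented for illustration, and taking rms(F) over the mean of the two amplitudes being compared is an assumption; the text does not say exactly which F was used in the denominator.

```python
import math


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def signal_ratio(f_first, f_second):
    """rms(difference) / rms(mean amplitude) for matched reflections."""
    diffs = [a - b for a, b in zip(f_first, f_second)]
    means = [(a + b) / 2.0 for a, b in zip(f_first, f_second)]
    return rms(diffs) / rms(means)


# Anomalous signal: F+ versus F- measured at one wavelength.
f_plus = [100.0, 200.0]
f_minus = [96.0, 204.0]
anomalous = signal_ratio(f_plus, f_minus)

# Dispersive signal: the same reflections at two different wavelengths.
f_lambda1 = [98.0, 202.0]
f_lambda2 = [104.0, 196.0]
dispersive = signal_ratio(f_lambda1, f_lambda2)

print(f"anomalous signal  = {anomalous:.3f}")
print(f"dispersive signal = {dispersive:.3f}")
```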


TABLE 5.10 Data Collection Wavelengths

Data    Wavelength (Å)    Description
PI      1.3799            Point of inflection, f' minimum
PK      1.3780            White-line peak, f" maximum
RE      1.3050            Remote edge above, strong f", f' maximum
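The wavelengths in Table 5.10 can be related to the energy axis of the XAS scan with E = hc/λ. The short sketch below does the conversion; the constant is the standard value of hc in eV·Å, and the function name is illustrative.

```python
HC_EV_ANGSTROM = 12398.42  # Planck constant times speed of light, in eV * angstrom


def wavelength_to_energy_ev(wavelength_angstrom):
    """Photon energy in eV for a wavelength given in angstroms."""
    return HC_EV_ANGSTROM / wavelength_angstrom


for label, wavelength in (("PI", 1.3799), ("PK", 1.3780), ("RE", 1.3050)):
    energy = wavelength_to_energy_ev(wavelength)
    print(f"{label}: {wavelength:.4f} A  ->  {energy:.0f} eV")
```

The PI and PK wavelengths fall within a few electron volts of 9000 eV, consistent with the copper K edge seen in the scan, while RE lies several hundred electron volts above it.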

TABLE 5.11 Signal within and between Wavelengths

        RE2             PI       PK       RE
RE2     Not measured    0.061    0.038    0.036
PI                      0.048    0.025    0.049
PK                               0.068    0.048
RE                                        0.055

Patterson Maps

Bijvoet difference Patterson maps for the RE wavelength data are shown in Fig. 5.33. We also made the Bijvoet Patterson map of the PK wavelength data because it has the largest anomalous signal (largest value of f"). By experimenting with resolution and outlier cutoffs, we found that the cleanest map was made at 4.5 Å with a 35% outlier filter in xfft. This map, shown in Fig. 5.34, had four large peaks on the Harker section at y = 0.5. However, since there is a 2-fold in the center of this section due to crystal symmetry (Patterson space group P2/m), only two of the peaks are unique. This was as expected for the two copper sites in the crystal. A search of the map revealed cross-peaks on sections y = 0.286 and y = 0.214, which are related by y2 = 0.5 - y1 and thus represent the same cross-peak repeated by symmetry. In this space group a Harker peak is located on the section y = 0.5 at (2x, 1/2, 2z), where y is arbitrary and can be set to 0. The first peak is at (0.40, 0.50, 0.20), which solves to (0.20, 0.0, 0.10), and the second is at (0.56, 0.50, 0.35), which solves to (0.27, 0.0, 0.17). Since there was a difference peak at y = 0.21, the second site was set to (0.27, 0.21, 0.17). This solution was put into xpatpred, and the positions of the cross-peaks are shown in Figs. 5.34D and 5.34E. As can be seen, they didn't match.

FIG. 5.33 Bijvoet difference Patterson maps for the RE wavelength data. (A) Harker section at y = 0.5 with four large peaks of which two are unique. (B) Section y = 0.286 showing two cross-peaks with one unique. (C) The cross-peak related by y' = 0.5 - y on y = 0.214.

FIG. 5.34 Bijvoet difference Patterson maps for PK wavelength data with the predicted solution marked. Every other level has been omitted to make the labels easier to read. (A)-(C) as in Fig. 5.33. (D) and (E) show the cross-peaks with the second site at an incorrect origin choice, as evidenced by the crosses not falling on the cross-peaks.

We then tried adding various origin shifts and found a perfect match by adding 0.5 to the z coordinate of the second site. This gave us our starting solution of A = (0.20, 0.0, 0.1), B = (0.27, 0.21, 0.67). We knew, though, that the CuA site has two coppers, and there was an obvious elongation to the sites in the Patterson map. Thus, the two sites were split along the direction of the elongation into four sites. This gave us the prediction shown in Fig. 5.35. We put this solution into xheavy and calculated phases. We tried both possible absolute configurations (or hands) by inverting every site to (-x, -y, -z) and found that the correct configuration was the first one tried (Fig. 5.36); it shows some β-sheet at the bottom of the map and clear solvent channels (with no solvent contours above 2σ). We could also discern a small difference in the electron density histograms, indicating that the correct solution was the first one; the histogram of the first (Fig. 5.37, dotted line) is narrower in the central solvent region (-80 to 80) and has a longer tail in the positive protein region above 100. The difference is subtle but significant. The final solution file looked like this:

    DERIVATIVE
    NAME cua-PI-RE
    CRYSTAL cua
    FILTER 1.000000
    ANOFILTER 0.300000
    SIGMACUT 2.000000
    RMIN 3.000000
    RMAX 1000.000000
    ANORMIN 1.000000
    ANORMAX 1000.000000
    WEIGHT 1.000000
    ANOWEIGHT 1.000000
    PHSTYPE 0 0
    BREFINE 0
    FILE PI.RE.fin
    NSITES 4
    ATOM A1 0.272582 0.00 0.200364 pt+2 1.00 15.00
    ATOM A2 0.2725 0.0106 0.1528 pt+2 1.44 15.0
    ATOM B1 0.1636 0.2277 0.6039 pt+2 1.27 15.0
    ATOM B2 0.2305 0.2038 0.5991 pt+2 1.32 15.0
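The Patterson arithmetic used in this section (solving a P21 Harker peak for a site, predicting cross-peaks between two sites, and generating the opposite hand) can be sketched in a few lines. The function names are illustrative and are not part of XtalView; the symmetry operators are the two general positions of P21 with b as the unique axis.

```python
def wrap(x):
    """Map a fractional coordinate into the range [0, 1)."""
    return x % 1.0


def site_from_harker(u, w):
    """P21 Harker peak at (2x, 1/2, 2z) -> site (x, 0, z); y is arbitrary
    for the first site and is set to 0."""
    return (u / 2.0, 0.0, w / 2.0)


def p21_mates(site):
    """The two symmetry copies of a site in P21: (x, y, z), (-x, y+1/2, -z)."""
    x, y, z = site
    return [(wrap(x), wrap(y), wrap(z)), (wrap(-x), wrap(y + 0.5), wrap(-z))]


def cross_vectors(site_a, site_b):
    """Patterson vectors from each symmetry copy of site B to site A."""
    ax, ay, az = site_a
    return [(wrap(ax - bx), wrap(ay - by), wrap(az - bz))
            for bx, by, bz in p21_mates(site_b)]


def invert_hand(sites):
    """Opposite absolute configuration: every site goes to (-x, -y, -z)."""
    return [(wrap(-x), wrap(-y), wrap(-z)) for x, y, z in sites]


site_a = site_from_harker(0.40, 0.20)   # first Harker peak from the text
site_b = (0.27, 0.21, 0.67)             # second site after the z + 0.5 shift

for u, v, w in cross_vectors(site_a, site_b):
    print(f"cross-peak at ({u:.2f}, {v:.2f}, {w:.2f})")
print("other hand:", invert_hand([site_a, site_b]))
```

The Patterson mirror perpendicular to y makes a section at v equivalent to the one at 1 - v, so the two predicted vectors fall on the pair of sections that are related by y' = 0.5 - y, close to the 0.214 and 0.286 sections where cross-peaks were observed.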


FIG. 5.35 Same Patterson maps shown in Figs. 5.33 and 5.34, giving the final solution after each site had been split into two coppers, for a total of four coppers.

To do the MAD phasing, we combined the data sets as outlined in Sect. 3.10 under MAD as a Special Case of MIR. We then calculated phases with xheavy to produce the MIR map shown in Plate 12A. This map is perfectly interpretable, but we ran it through density modification with the CCP4 program dm, written by Kevin Cowtan, to improve it even more. First we needed the noncrystallographic symmetry matrix and vector that describe the relationship between the two molecules. To do this we fit just enough of the two molecules to get out a fairly good symmetry operation using the LSQ Fit option of xfit. We converted the phases to CCP4 format using the CCP4 utility f2mtz and the script:

    f2mtz hklin cua.phs hklout cua.mtz <<eof
    cell 34.9 70.60 53.50 90 98.12 90
    symmetry 4
    labout H K L FP FOM PHIB
    CTYPOUT H H H F W P
    FORMAT '(3F4.0,2F9.2,F7.2)'
    END
    eof

We ran dm with the input file:

    #This part adds the high resolution F's to the mtz
    #file created before
    mtzutils hklin1 hires_F.mtz \
        hklin2 cua.mtz \
        hklout hires_cua.mtz <<eof
    symm p21
    INCLUDE 1 F
    INCLUDE 2 FOM PHIB
    RUN
    eof
    #
    #Now we run DM with solvent masking, histogram match-
    #ing and symmetry averaging.
    dm HKLIN hires_cua.mtz hklout dm_hires_aver.mtz \
        histlib /nfs/image/usr/local/ccp4_3.3/lib/data/hist.lib <<eof
    solc 0.45
    mode solv hist aver
    ncsmask nmer 2
    wang 8 0
    aver refi
    # the identity matrix for the first molecule
    rota matrix 1.00 0.00 0.00 \
                0.00 1.00 0.00 \
                0.00 0.00 1.00
    trans 0.0 0.0 0.0
    aver refi
    #the matrix and vector from xfit's LSQ Fit function
    #for the second molecule
    rota matrix 0.051060 -0.364196 -0.929970 \
                0.164497 -0.915385 0.367570 \
                -0.985096 -0.171713 0.013137
    trans 10.930595 11.009571 39.851776
    ncycle 20
    combine free
    LABIN FP=F FOMO=FOM PHIO=PHIB
    LABOUT PHIDM=PHIDM FOMDM=FOMDM
    eof

dm has the nice feature that it calculates all the needed masks automatically, including the noncrystallographic symmetry mask. At this point we put the phases back into xfit using the script:

    mtz2various hklin dm_hires_aver.mtz hklout dm_hires_aver.phs <<eof
    labin FP=FP FOM=FOMDM PHIB=PHIDM
    OUTPUT USER '(3I4,x,F7.2,3x,F7.2,3x,F7.2)'
    END
    eof
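Before a noncrystallographic symmetry operator is handed to an averaging program, it is worth checking that the nine numbers really form a proper rotation (orthonormal rows, determinant of +1), since a transposed or mistyped matrix is an easy mistake to make. The sketch below applies that check to the matrix listed in the dm input; the helper names and the tolerance are illustrative choices, not part of CCP4 or XtalView.

```python
import math

# Rotation matrix for the second molecule, as listed in the dm input.
NCS_ROTATION = [
    [0.051060, -0.364196, -0.929970],
    [0.164497, -0.915385, 0.367570],
    [-0.985096, -0.171713, 0.013137],
]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def determinant(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_rotation(m, tol=1e-3):
    """True if the rows are orthonormal and the determinant is +1,
    both within the tolerance."""
    for i in range(3):
        for j in range(3):
            dot = sum(m[i][k] * m[j][k] for k in range(3))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    return abs(determinant(m) - 1.0) <= tol


def rotation_angle_deg(m):
    """Rotation angle from the trace: cos(theta) = (trace - 1) / 2."""
    cos_theta = (m[0][0] + m[1][1] + m[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


print("proper rotation:", is_rotation(NCS_ROTATION))
print(f"rotation angle: {rotation_angle_deg(NCS_ROTATION):.1f} degrees")
```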


FIG. 5.36 Fo-figure-of-merit maps made of the two alternate absolute configurations for the heavy-atom solution. (A) The first configuration tried was the correct one. Note the β-strands in the lower half of the figure and the solvent channels on the right-hand side. (B) The configuration made by putting each heavy atom at (-x, -y, -z) shows no proteinlike features and no large solvent channels and is thus incorrect.

FIG. 5.37 The electron density histograms of the two solutions compared: dotted line shows the histogram of the first configuration (Fig. 5.36A); solid line shows the second one (Fig. 5.36B). See text for explanation.

The final map is shown in Plate 12B. We finished fitting the model and refined it starting with XPLOR. Between molecules in the unit cell, we found a large peak that looked like a cation. Since we had added zinc to the crystallization medium, this was our first guess. A crystal was back-soaked in zinc-free mother liquor, and when the XAS scan was run in the zinc region, a zinc absorption edge appeared. Therefore, two zincs were added to the model. At this point we switched to SHELX and used all the data to 1.4-Å resolution for refinement and model building. The metal center was refined anisotropically, and this cleaned up the density around the copper site substantially. Another reason for switching to SHELX was to allow us to refine the copper center without any constraints, thus without prejudicing the geometry we finally found. A ribbon drawing of the final structure is shown in Plate 13. A standard uncertainty analysis was performed on the copper center with SHELX on the final model; the block-diagonal matrix used contained all the positional parameters, and we calculated a copper-copper distance of 2.51 ± 0.03 Å. A lot of work for one distance!
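For comparison with the refined value, an interatomic distance can be computed directly from fractional coordinates and the monoclinic cell. The sketch below uses the cell constants from the text and the four site coordinates as read from the heavy-atom solution file listing; those listed coordinates come from low-resolution phasing, so the result is only a rough starting estimate, and the function name is illustrative.

```python
import math


def monoclinic_distance(p, q, a, b, c, beta_deg):
    """Distance in angstroms between two fractional points in a monoclinic
    cell (alpha = gamma = 90 degrees):
    d^2 = (a dx)^2 + (b dy)^2 + (c dz)^2 + 2 (a dx)(c dz) cos(beta)."""
    dx = (p[0] - q[0]) * a
    dy = (p[1] - q[1]) * b
    dz = (p[2] - q[2]) * c
    cos_beta = math.cos(math.radians(beta_deg))
    return math.sqrt(dx * dx + dy * dy + dz * dz + 2.0 * dx * dz * cos_beta)


CELL = dict(a=34.9, b=70.6, c=53.5, beta_deg=98.12)

# Site coordinates as listed in the heavy-atom solution file.
A1 = (0.272582, 0.00, 0.200364)
A2 = (0.2725, 0.0106, 0.1528)
B1 = (0.1636, 0.2277, 0.6039)
B2 = (0.2305, 0.2038, 0.5991)

d_a = monoclinic_distance(A1, A2, **CELL)
d_b = monoclinic_distance(B1, B2, **CELL)
print(f"A1-A2 separation: {d_a:.2f} angstroms")
print(f"B1-B2 separation: {d_b:.2f} angstroms")
```

The printed separations can be set against the 2.51 Å obtained after high-resolution refinement; the difference shows why the unrestrained refinement, rather than the phasing sites, was needed to answer the original question.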

6 CRYOCRYSTALLOGRAPHY: BASIC THEORY AND METHODS

Peter R. David

The techniques of cryocrystallography, collecting data with frozen crystals, are revolutionizing the field of macromolecular crystallography. A dramatically increasing fraction of the crystal structures currently being solved use cryocrystallography. This is especially true for the complex data collections at synchrotrons. Yet, cryocrystallography has perhaps the most to offer users of rotating anodes and sealed-tube sources. While the technique can appear complicated and more art than science at first, there is a method and there are scientific reasons for the success of cryocrystallography. Once the basic equipment has been made or purchased and simple techniques as described in this chapter mastered, cryocrystallographic data collections tend to outnumber room temperature data collections. Once learned, the technique is straightforward, although there will always be problem cases. This chapter is a brief overview of crystallography at temperatures below 273 K (0°C), especially those techniques important to successful work below 130 K (-143°C). After a brief background on some of the theory, we present details of some useful equipment, followed by a discussion of several strategies for success and a protocol for freezing crystals.

6.1 OVERVIEW

Cryocrystallography, defined as cold crystallography, has been in existence almost as long as crystallography. Early experiments used sealed cryostats, which were essentially dewars with beryllium walls transparent to the scattered X-rays. This method is still used today for experiments that must be performed below liquid nitrogen temperatures, 77 K. Later experiments used chillers to cool air to as low as 200 K; however, problems associated with dehumidifying the air and removing carbon dioxide set this as a lower practical limit. One manufacturer has introduced a chiller using room temperature air as input to cool below liquid nitrogen temperatures. Modern cryocrystallography is the general methodology by which the crystal of interest is placed into a protective solution (a cryoprotectant solution) and frozen with minimal damage, resulting in data collection at cryogenic temperatures. Cold, but not solid state, crystallography has a number of special advantages. These include the reduction of thermal motion in the crystal, the ability to slow down enzymatic reactions, and the ability to continually diffuse in fresh substrate to enzymes in crystals. What are the advantages of cryocrystallography for macromolecular crystallographers? The data collection process can be conducted with fewer worries about crystal decay, data with high signal-to-noise ratios can be collected, and optimal crystals can be more readily selected. How is this accomplished? For the most part, through changes in the mounting of crystals and cryogenic effects on the crystals and the solutions surrounding the crystals. Although the solvent remains liquid, the mean free path of free radicals is greatly shortened, thereby reducing radiation damage and increasing both data quality and resolution. As a result, it is possible to collect multiple data sets from a single crystal, greatly reducing the problems of data processing. 
Data quality is often improved for flash-frozen crystals, for several reasons. Since most crystals are mounted with minimal support and very little solvent, the background of diffuse X-ray scatter is reduced. This reduction in solvent scattering, in turn, improves the quality of data between 6 and 3.5 Å. Furthermore, the ordering of loops in the protein or nucleic acid both lowers the mosaicity of the crystals and gives better quality data, extending to higher resolution. Finally, exposures can be longer, allowing the experimentalist to obtain better signal-to-noise ratios, and statistics are improved because the crystals are not decaying rapidly. This chapter concentrates on cryocrystallography below 130 K, generally in the range of 80-95 K (observed temperatures depend upon how and where the temperature is measured). The choice of 130 K is not arbitrary; it is the temperature below which vitreous water does not convert to a crystalline material, ice. Solid water that has no long-range order is said to be vitreous water, or vitreous ice. The formation of crystalline ice is often associated with the disruption of the macromolecular crystal lattice, and since ice strongly diffracts X-rays, ice formation is considered to be a bad thing. It is important to keep crystals well below 130 K.

6.2 THEORY

Water is an unusual molecule; other compounds of similar composition and molecular weight have very different properties (see Table 6.1). Many of the differences can be attributed to the hydrogen-bonding patterns. Water donates two hydrogen bonds and receives two hydrogen bonds in an approximately tetrahedral arrangement. This has several results: first, ice (crystalline water) is quite stable; second, liquid water retains a high degree of local order; and third, liquid water appears to be made up of rapidly rearranging lattices of ice. Thus, when water is slowly cooled, it readily nucleates and forms ice at 273 K. Initially, water forms hexagonal ice, which upon further cooling can convert to a cubic form of ice. These are the thermodynamically stable forms. If, however, the water is rapidly cooled below 130 K before any of its molecules nucleate to form ice, the water will exist as a vitreous, or frozen, form with no long-range order. The required cooling rate is thousands of degrees per second, which typically requires very small samples. This technique is widely used in electron microscopy to flash-freeze samples. Since current technology does not permit such flash-freezing for X-ray diffraction with typical crystals of macromolecules, we must use other means, such as the use of cryoprotectants.

TABLE 6.1 Properties of Water versus Other Small Molecules

Compound        Molecular weight (Da)   Dipole moment (D)   Melting point (K)   Boiling point (K)
Methane, CH4    16                      0                   90                  112
Ammonia, NH3    17                      1.5                 195                 240
Water, OH2      18                      1.8                 273                 373


If the sample is kept below 130 K once frozen as vitreous water, the water will not crystallize because the molecules do not have enough energy to alter their arrangement and re-form in a new one. If vitreous water is warmed above 130 K, the water will begin to crystallize as cubic ice. The exact temperatures of these conversions are dependent upon the history of the particular sample and the environment (i.e., pressure). Cubic ice is stable below 130 K, so if a flash-frozen crystal, with vitreous water, is warmed above 130 K, even for a few moments, it can form crystalline water. Therefore, it is important to ensure that frozen crystals of macromolecules stay well below 130 K at all times. There are several common strategies that can be used to prevent ice formation when cooling a sample. First, one can use freezing-point depression by adding any chemical to depress the freezing point of water; but the limit to this method is only about 230 K. Second, the actual water content can be reduced by the addition of nonaqueous solvents, such as DMSO or ethanol. This is the method of choice for those who do cryocrystallography in the range of 200-273 K. These two methods function by thermodynamic inhibition of crystallization. No matter how slowly one cools the sample, the solvent remains liquid. The third option is to reduce the local order of water by adding a substance, termed a cryoprotectant, and then cool the sample very quickly to below 130 K. Cryoprotectants are functionally defined: if a substance is effective at reducing the formation of ice, it is a cryoprotectant. Cryoprotectants seem to function by disrupting the local order of water, thereby preventing the nucleation of water that is the critical first step in the formation of ice. Most cryoprotectants are kinetic inhibitors of crystallization. This means that the fast cooling of the sample, your crystal, not only is important but is critical to successful cryocrystallography.
Although both water and ice have locally ordered water molecules, only ice has water molecules that are ordered over a long range (i.e., as crystals). The degree of local order is readily observed from the X-ray diffraction patterns by looking for diffraction from the ice crystals (see Plate 7). Increasing concentrations of the cryoprotectant progressively disrupt the ice from large single crystals to microcrystals and, finally, to vitreous water. Generally, there are two major groups of cryoprotectants: bifunctional molecules (i.e., molecules that are both hydrophobic and hydrophilic) and salts. The salts coordinate clusters of water to help solvate their charges and thus reduce the amount of free water. The same effect is used to force macromolecules out of solution to grow crystals, or to form precipitates. The bifunctional molecules disrupt the local water order but do not necessarily coordinate large numbers of water molecules. Some of the most versatile cryoprotectants (e.g., glycerol) belong to the latter group.

6.3 ROOM TEMPERATURE VERSUS LOW-TEMPERATURE CRYSTALLOGRAPHY

Crystals of macromolecules contain a repeating motif of solvent channels around a macromolecule that touches its neighbors via crystal contacts at relatively few points. At room temperature, the solvent and solvated ions are in rapid motion and are not well resolved on the time scale of crystallography. This motion is reflected in the diffusion rates of small molecules, typically diffusing fractions of a millimeter per second. These diffusion rates are also important in the equilibration of cryoprotectants into macromolecular crystals. The amino acids or nucleic acids in a crystal are also in motion, vibrating and sampling different conformations, but to a reduced extent. The result of this motion is often observed in short loops that are present in multiple conformations in an X-ray crystal structure. If the parts of the molecule are in constant movement, this is dynamic disorder, while the term "static disorder" describes the case of a number of different molecular conformations within a crystal. As crystals are cooled, the motion of the atoms slows and the dynamic disorder decreases. Cooling, per se, does not alter the static disorder, although the addition of cryoprotectants may reduce the number of molecular conformations within the crystal, thereby improving the order of the crystal. Both the cooling and the freezing of crystals reduce the atomic vibration, the atomic diffusion, and the reactivity of free radicals. When macromolecular crystals are exposed to X-rays, free radicals are produced at a rate of approximately several hundred free radicals per absorbed photon (at 1.54 Å). While the exact nature of the composition of free radicals and their production mechanism is not conclusively known, the results of X-ray exposure have been extensively studied experimentally.
The free radicals are thought to consist of hydroxy, peroxy, and oxygen radicals and to affect crystals by oxidation. Free radicals can diffuse significant distances in the crystal. These free radicals can then react with the macromolecules, degrading the macromolecules by breaking bonds between atoms. Normally, this oxidation is hard to observe in an electron density map because it occurs at many sites on many macromolecules; but if some atoms are particularly susceptible, it can be seen. If the crystal contacts are degraded, the long-range order of the crystal is reduced and will manifest itself in an increase in mosaicity, a decrease in resolution, or both. After extensive exposure to X-rays, most crystals begin literally to disintegrate because the crystal contacts have been thoroughly disrupted.


In solids, and specifically in water at normal cryogenic temperatures (85 K), there is virtually no atomic diffusion and there is greatly reduced atomic motion and vibrations. With a very high probability, free radicals that are produced in a vitreous matrix will self-quench and re-form the starting molecule, even if the free radicals have a relatively long lifetime. Similarly, bonds that have been broken are very likely to re-form, since the atoms are trapped in the exact starting positions. Even if a critical bond is broken, such as one at a crystal contact, the matrix of solid solvent will retain the position of the macromolecule, and thus the long-range order of the crystal. These effects manifest themselves on the macroscopic scale with greatly reduced or eliminated crystal decay and lower atomic vibration factors. For most macromolecules, data collection at cryogenic temperatures will result in very little decay, even at synchrotrons. For weakly diffracting crystals, especially crystals with large unit cells, there may be some observable decay. In addition to the microscopic effects of atomic motion and crystal decay, certain processes may go on as the crystal and macromolecule cool from 298 K to 85 K: there may be selection of one conformer of the macromolecule from the population that exists at room temperature; there are changes in the dielectric constant of the solvent; the amount of free water decreases; and there may be pH and ionization shifts as the temperature drops. All these can affect the quality of the resulting crystals and their diffraction. The structure of the protein you are solving may be changed, as well. In one case, a high-spin metal cluster switched to a low-spin state when frozen. On the time scale of atomic vibrations and movement, freezing is a long process, no matter how quickly the experimentalist manages to cool the crystal. 
While the crystal freezes, the macromolecules may preferentially adopt a reduced set of conformations or a single conformation. This is often referred to as annealing, by analogy to metallurgy. Even at room temperature, crystals occasionally undergo an annealing process as they are transferred from the mother liquor they were grown in to the cryoprotectant solution (the mixture of cryoprotectant and sample buffer, usually including all the salts and additives required to grow the crystals). Whether annealing occurs depends on the particular macromolecule and the conditions used in handling it. Since proteins are easily denatured, at any given moment some molecules will possess enough energy to sample other alternative conformations. Most macromolecules undergo substantial molecular motion at room temperature. As the macromolecules are frozen, however, there is a competition between the rate at which the molecules can convert to a thermodynamically more stable conformer and the rate at which the crystal is being cooled. If the rate of cooling is faster than the interconversion rate, there will be a number of kinetically trapped conformers, or dynamic disorder; conversely, the process of freezing may select a single, thermodynamically favored conformer. This is why going


to higher levels of cryoprotectants often yields better diffraction at low temperature: the cryoprotectants are altering the thermodynamics of the solvent and thus the final equilibrium (i.e., favoring a single conformer).

Changes in rates and magnitudes of atomic motion are not the only effects that are observed as crystals are being cooled. Many compounds, amines especially, have activities that are strongly affected by temperature. The common buffer Tris shifts 0.05 pH units per degree. The dielectric constant of water changes with temperature and during the addition of cryoprotectants. The change in the dielectric constant can be accompanied by shifts in the ionization of the macromolecule, and this can affect both molecular rigidity and intermolecular crystal contacts. If the macromolecule of interest is sensitive to shifts in pH, it is worthwhile finding a suitable buffer to protect the macromolecule while it is being frozen. An alternative way to visualize the changing dielectric constant of water is to think of warm water as being more hydrophobic and colder water as more hydrophilic. Crystallizations that are dependent upon changes in hydrophobicity are likely to be affected by the cooling process and are likely to require changes in the cryoprotectant-buffer mixture.

In extreme cases, it may be necessary to cross-link the crystals with a cross-linking reagent such as glutaraldehyde. Typical concentrations are 0.002-0.2%, and the cross-linking must be done in a solution free of primary and secondary amines (e.g., Tris). There are also some exotic methods of cooling that may be necessary under certain circumstances, such as forming xenon derivatives for high-pressure studies, light-sensitive crystals, and anaerobic forms.
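The temperature coefficient quoted above for Tris can be turned into a rough estimate of how far the pH of a buffer drifts when a solution is chilled. The following is only a minimal sketch: the function name, the starting pH, and the temperatures are hypothetical, the coefficient is the figure given in the text, and a linear coefficient is only meaningful over the liquid range, so it should not be extrapolated to cryogenic temperatures.

```python
def buffer_ph_after_cooling(ph_start, t_start_c, t_end_c, shift_per_degree=0.05):
    """Linear estimate of buffer pH after a temperature change.

    shift_per_degree is the magnitude of the pH change per degree
    (0.05 is the figure quoted in the text for Tris). Tris becomes more
    alkaline as it cools, so cooling raises the estimated pH.
    Assumption: the coefficient is constant over the whole interval.
    """
    degrees_cooled = t_start_c - t_end_c
    return ph_start + shift_per_degree * degrees_cooled


# Hypothetical example: Tris adjusted to pH 7.5 at 25 C, then moved to 4 C.
estimated_ph = buffer_ph_after_cooling(7.5, 25.0, 4.0)
print(f"Estimated pH at 4 C: {estimated_ph:.2f}")
```

A shift of this size is a reminder to adjust the buffer at the temperature at which the crystals will actually be handled, or to choose a buffer with a smaller temperature coefficient.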

6.4 CRYOGENIC SAFETY

Cryogenic apparatus is inherently dangerous. All experimenters should become familiar with the potential hazards and proper safety procedures for work with cryogenic materials. Cryogenic liquids can readily damage skin and eyes. Protective clothing should be worn at all times. An engineer skilled in cryogenic materials should be consulted before any new equipment is designed and used. Many materials undergo a dramatic shift in their properties at cryogenic temperatures; for example, rubber and plastics shatter, and many common glues fail catastrophically. Glass dewars fail by implosion, which rapidly becomes an explosion. Glass dewars are likely to fail when undergoing large temperature changes, being knocked while cold, or if scratched. Never, ever, let a dewar warm up to room temperature with liquid nitrogen in it. Liquid nitrogen can condense


oxygen from the air, making a cocktail that has been known to detonate.

Always properly dispose of any remaining liquid nitrogen (e.g., by outside disposal). Do not let liquid nitrogen evaporate in a dewar! When working with cryogenic liquids, it is prudent to wear closed-toed shoes. It is also important to be very conscious of the location of one's fingertips when working close to a cryogenic liquid or cold solid. Safety glasses are absolutely required. Nitrogen is not a life-supporting gas. It is important to use liquid nitrogen in rooms that are well ventilated. Working in a cold room with cryogenic liquids is extremely hazardous to your survival.
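The warning about ventilation can be made concrete with a back-of-the-envelope estimate of how much a given volume of evaporated liquid nitrogen dilutes the oxygen in a closed room. This is a sketch under stated assumptions, not a safety calculation: the expansion ratio (roughly 700 volumes of gas per volume of liquid) and the oxygen content of air are commonly quoted approximate values, the room is assumed to be unventilated and perfectly mixed, and the function name and example room are hypothetical.

```python
LN2_EXPANSION_RATIO = 694.0  # assumed: liters of gas near room temperature per liter of liquid
O2_FRACTION_AIR = 0.209      # assumed: oxygen fraction of normal air


def oxygen_fraction_after_evaporation(room_volume_l, liquid_n2_l):
    """Estimate the oxygen fraction after liquid nitrogen evaporates in a closed room.

    Simple displacement model: the nitrogen gas pushes an equal volume of
    air out of the room, and the remainder mixes uniformly.
    """
    nitrogen_gas_l = liquid_n2_l * LN2_EXPANSION_RATIO
    remaining_air_l = max(room_volume_l - nitrogen_gas_l, 0.0)
    return O2_FRACTION_AIR * remaining_air_l / room_volume_l


# Hypothetical cold room of 2 m x 3 m x 2.5 m (15,000 liters) and a 4-liter dewar.
fraction = oxygen_fraction_after_evaporation(15000.0, 4.0)
print(f"Oxygen after evaporation: {fraction * 100:.1f}%")
```

Even a small benchtop dewar removes a noticeable share of the oxygen from a small sealed room in this model, which is why cold rooms are singled out above.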

6.5 EQUIPMENT

The equipment needed to begin cryocrystallography is minimal: a device capable of cooling a crystal to 85 K, a mount for the crystal, and all the normal items such as goniometers and X-ray sources. If a little effort is taken at the outset, an easy-to-use, versatile system can be set up.

A variety of systems have been used to cool crystals in the past, some with sealed sample holders and others with the crystal open to the surrounding environment. Although older systems used closed cryostats, the problems of maintaining dewars with beryllium windows are significant and, except in the limited number of cases in which it is important to reach temperatures below 78 K, the boiling point of liquid nitrogen, closed cryostats are not worthwhile. Ultra-low-temperature experiments often require double- or triple-layer dewars to achieve enough insulation from the environment. Finally, below 78 K, one must use liquid helium or helium refrigerators, which are very expensive. In 1999 liquid nitrogen cost the same as milk, while liquid helium had approximately the same cost as fine French cognac. Therefore, unless you have a good reason for going colder, such as a phase change or the need to trap a very unstable intermediate, use an open cryostat with nitrogen cooling.

Most modern cooling systems are open cryostats. A continuous stream of cold gas, usually nitrogen, is blown onto a crystal to keep it cool. The simplest devices use a small heater to boil liquid nitrogen in a sealed dewar. The resulting cold gas is piped through a vacuum-insulated transfer tube to cool the crystal. While having the benefits of being cheap and simple, these cryocoolers tend not to maintain a stable temperature, in part because of the variations in the flow rate of the gas caused by the boiling liquid nitrogen. The greatest drawback is that once the cryocooler has been started, the dewar is almost impossible to refill. More complex arrangements have used two dewars in series or pressure regulators to solve both the problem of refills and of temperature stability. Such equipment is available from a number of vendors.

There are two other approaches. The Oxford Cryostream, made by Oxford Cryosystems, uses a room-temperature gas to maintain a constant flow of gas. In this device, liquid nitrogen is sucked up a vacuum-insulated transfer tube almost to the crystal; then it passes through a heat exchanger and is converted into room-temperature gas that is pumped via a flow regulator back through the heat exchanger and onto the crystal. Since the heat exchanger is very close to the crystal and has significant thermal mass, the Cryostream is able to maintain exceptional temperature stability. Because this arrangement is almost totally enclosed in vacuum and uses only two small heaters, the system has very low liquid nitrogen consumption rates (0.6 liter of liquid nitrogen/h).

An alternative approach, and one of the more complicated crystal coolers, is the Xstream by MSC, which has the advantages of requiring no external liquid nitrogen and being able to achieve temperatures below 78 K if helium gas is externally supplied. In normal use, almost pure nitrogen gas is extracted from laboratory air by a set of prefilters. The nitrogen is cooled via a high-efficiency cooling engine and piped out via a vacuum-insulated transfer arm to cool the crystals. Disadvantages include a significant size and a somewhat higher noise level.

All cryocoolers are only as good as their liquid nitrogen supply; care should be taken not to allow water and air to enter the liquid nitrogen dewars used by the cryocoolers and, in the case of the Xstream, to ensure that the filters and dryers are changed regularly. The dewars should be emptied, cleaned, and dried out periodically to preserve the life and success of the cryocooler.
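The consumption rate quoted above for the Cryostream (0.6 liter of liquid nitrogen per hour) gives a quick way to estimate how much liquid nitrogen a continuously running cryocooler needs. The sketch below is illustrative only: the function names are hypothetical, and the price per liter is a placeholder to be replaced with a local quotation.

```python
def liquid_nitrogen_used_l(rate_l_per_h, hours):
    """Liters of liquid nitrogen consumed at a constant rate."""
    return rate_l_per_h * hours


def running_cost(rate_l_per_h, hours, price_per_l):
    """Cost of the liquid nitrogen consumed, in whatever currency price_per_l uses."""
    return liquid_nitrogen_used_l(rate_l_per_h, hours) * price_per_l


HOURS_PER_WEEK = 24 * 7
CRYOSTREAM_RATE = 0.6       # liters/h, the figure quoted in the text
HYPOTHETICAL_PRICE = 0.50   # assumed price per liter; substitute a real quote

weekly_l = liquid_nitrogen_used_l(CRYOSTREAM_RATE, HOURS_PER_WEEK)
weekly_cost = running_cost(CRYOSTREAM_RATE, HOURS_PER_WEEK, HYPOTHETICAL_PRICE)
print(f"{weekly_l:.1f} liters per week, costing {weekly_cost:.2f} at the assumed price")
```

Running the same calculation with the consumption figures of other cryocoolers makes it easy to compare them on an equal footing.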
If the laboratory expects to be actively involved in cryocrystallography, it will be worthwhile examining the running costs of cryocrystallography, since some cryocoolers have lower operating costs than others. Considerable price reductions can be achieved on bulk purchases of liquid nitrogen, often at 25% of the normal price. In some areas it may be more cost-effective to install a liquid nitrogen generator on site. Having chosen a cryocooler, it is important to attend to the details of its installation. The region around the crystal in an X-ray beam is usually physically cramped. If the location of the cold nitrogen stream is readily adjustable, working with the cryocooler will be much easier. Often it is useful to accurately raise and lower the cryocooler a centimeter or two to temporarily enlarge the working space around the crystal. Be sure to align the nozzle as close to the crystal as possible without shadowing the detector, and allow enough room to mount crystals easily. On some systems, it may be easiest to raise the cryocooler nozzle when first attaching a crystal and lower


the nozzle as soon as the crystal has been mounted. It is also important to determine the lowest practical operating temperature: that is, the lowest temperature that the system can produce stably and reproducibly. Most cryocoolers can achieve colder settings without requiring more liquid nitrogen. The lower limit is usually determined by residual heat leaks. If the minimum stable temperature of the cryocooler begins rising, it is probably time to have the vacuum in the transfer arms checked and have the vacuum-insulated transfer arms pumped down to a good vacuum again. Having a cold cryocooler is of no use if it does not cool your crystal. It is best to align the cold gas stream on an empty loop of the appropriate height before each use. Figure 6.1 shows a loop mounted on a bendable wire in a mounting pin. This pin has been mounted in a magnetizable base, termed a magnetic base, for ease of mounting. A small magnet attached to the goniometer holds on the magnetic base. Eventually, all these components will be used to mount the crystals, and the magnetic base will play an important role in storing frozen crystals.

FIG. 6.1 A typical mounting for frozen crystals on goniometers.

Crystal Supports

While early attempts at cryocrystallography of macromolecules used capillary-mounted crystals, there are severe constraints to this method, notably the slow cooling rates due to the high thermal mass of the capillary. Ideally, a mounting for crystals is transparent to X-rays, is of low mass, and adds no additional X-ray scattering. While no single system is perfect, some are excellent. There are a number of different systems available commercially and a number of home-developed systems.

As is common in small-molecule cryocrystallography, crystals can be mounted on straight glass fibers, an approach that is simple, but often useful only for robust needles, when data can be collected on an overhanging portion of the crystal. Very thin glass sheets made by rapidly blowing a bubble in a closed glass have been used as crystal-holding spatulas. The glass is a very strong attenuator of X-rays, however, and the resulting strong anisotropies in the diffraction data constitute a disadvantage. The glass spatula method can sometimes be useful for fragile crystals of platelike morphology that cannot be mounted in any other way.

The most successful method has been to use loops of a thin fiber. The crystal is either held in the meniscus of the liquid in the loop or supported entirely by a loop about half the size of the crystal. The former practice is less likely to stress the crystal, while in the latter case there is less solvent on the crystal. Many fibers have been used, and the ideal fiber would be X-ray transparent, infinitely stiff and strong, amorphous, and hydrophilic. While the ideal fiber does not exist, several come close and several have some real problems. Wire has severe diffraction problems. If the choice is limited to the elements found in proteins, several fibers are possible candidates: nylon, silk, wool, and rayon. Nylon is strong but, being hydrophobic, is hard to work with. At low resolution nylon suffers from polymeric order and leaves regions of locally high X-ray scattering. Silk and wool are also polymers of regular repeating units and can show fiber diffraction patterns. Raw silk and mohair wool have been used successfully by a number of groups. Before using these fibers, clean them thoroughly with solvents to ensure that the loops will be as hydrophilic as possible. The fiber of choice for most groups is rayon, which is composed of amorphous cellulose and therefore attenuates X-rays at approximately the same rate as solvent and proteins. Rayon and mohair show the least amount of fiber diffraction.
A few samples of rayon that have been stretched or otherwise modified can show some fiber diffraction; if this happens, try using smaller fibers or a different source. Experiment to see what works well for you and your crystals. Once the materials and the apparatus for mounting the crystal have been chosen, the crystal still must be mounted. There are three simple rules to crystal mounting: (1) handle the crystals as little and as gently as possible; (2) other than the crystal itself, minimize the mass of material in the X-ray beam; and (3) work very quickly once the crystal is in the loop. Rule 3 is most important because the crystal has an extremely high ratio of surface area to volume, and the crystal can rapidly lose solvents and change dramatically in osmolarity. It is a great help to practice the necessary crystal manipulations with unimportant samples until you can mount the crystals consistently. Any additional materials that are present in the X-ray beam will add to the background scattering and lower the quality of the data by reducing the signal-to-noise ratio. Correctly using loops to mount crystals will minimize the amount of additional x-ray scattering.


Necessary Materials for Loop Making and Mounting
• Fine fibers or premade loops
• Brass pins, 13 mm x 3 mm (1/8 in.) diameter (Huber type, or equivalent)
• Glass pipettes, 1-5 µl (borosilicate)
• Platinum wire, 0.125 mm (0.005 in.) diameter to fit inside pipette
• Cyanoacrylate glue ("Superglue," Loctite, etc.)
• Epoxy glue, quick-setting (5 min)
• Modeling clay (Plasticine) or a rack to hold pins while the glue sets
• A small needle (26 ga. or smaller) or stiff steel wire
• A mixing dish for the epoxy, preferably polyethylene
• Applicator stick to mix the epoxy
• Glass cutter
• Scissors
• Several hemostats, or locking tweezers
• Fine tweezers to manipulate the platinum wire
• Fine scalpel (#15), or razor blade

Helpful Additions
• A stereo microscope for delicate work and checking details
• A desk light or other work light

Making Loops

Both premade loops and loop-making machines are available. If one needs a custom-sized loop or a loop in a hurry, here is a quick method. Rayon is easily obtained from a number of sources, either as threads or in bandages (e.g., the Kling® bandages sold by Johnson & Johnson, the fibers of which are long and very thin). One can twist several thin fibers into a fine thread only 3-10 µm in diameter. Whatever material is used for loops, it should have very fine fibers. Pull several of the fibers from the main group and roughly align the ends. Hold one end between the thumb and forefinger of your left hand, the other end in the same way in your right. Then by moving one thumb across the corresponding forefinger, you can roll the fibers into a thread. If you roll the fibers too much, they will twist back upon themselves and form a loop. This is a very simple loop, but it is nicer to be able to control the diameter of the loop. Loops can be made by twisting the fibers around a small needle or steel wire. Glue the ends with Superglue when done. Make a variety of different diameters so that you can select the one most appropriate for each of your crystals.

When you have made a stock of loops, they need to be mounted. Figure 6.2A shows a normal mount in a cutaway view. Figure 6.2B shows the same mount after the wire has been bent to allow for the collection of the blind region. In general, robust cubes can be mounted in loops about half to two-thirds their size; thin plates will need loops larger than the plate (Fig. 6.2D), while needles generally benefit from a creased loop about the length of the needle (Fig. 6.2C). Robust needles can be mounted in a creased loop with up to half of the needle projecting beyond the loop (Fig. 6.2B).

FIG. 6.2 Typical crystal mounts, showing the component parts and demonstrating various methods of mounting crystals. [Labels in the original figure: Loop, Glass Pipette, Epoxy, Brass Pin; panels A-D.]

Users of premade mounts that include a mount point for the loop and terminate either in their own base or in a standard 3-mm pin can skip the next paragraph.

The mounts shown in Fig. 6.2 and described below consist of a twisted platinum wire held in a glass support within a hollow brass pin. While the platinum wire can be directly glued to the pin, borosilicate glass and platinum have similar coefficients of expansion and make a much more durable unit. Begin by cutting the glass pipettes into 15- to 17-mm lengths. Then mix a little bit of the epoxy. Pick up a piece of the glass pipette and dip it in the epoxy until a ball of epoxy about 0.5 mm thick forms around one end of the pipette. Press this end into the brass pin until the glass protrudes about 3 mm from the far end of the brass pin, then slowly pull back until the glass is flush with the base. If the amount of epoxy is sufficient and the glass is inserted and removed correctly, the result is a nice aerodynamic taper, flaring out gently from the glass pipette to the brass pin. If any epoxy spilled over the edge,


clean it up now and check it later. Under no circumstances should the epoxy bulge out beyond the diameter of the brass pin. Repeat this procedure until you have enough pins for your experiment.

Next, twist the platinum wire. Fine platinum wire is available from electron microscopy supply houses and from inorganic suppliers, such as Alfa. Cut a piece of wire about 3 cm long. Bend it in half around the needle (a U shape) and clamp the free ends together in a hemostat. Now, by twisting the hemostat around its long axis and holding the needle fixed (here a second hemostat is quite useful), you can rapidly and uniformly twist the wire. Then cut off the platinum wire close to the hemostat, but in a well-twisted region, and insert the wire into the glass pipette until only 3-4 mm is protruding. This step is easier to do under a stereomicroscope, if one is available. The wire loop around the needle is cut off. A drop of Superglue will hold the wire to the glass. Epoxy can be used, but one must have steady hands and work rapidly. The twists in the wire help hold the wire in the glass and will help the loops adhere to the wire.

Loops can now be glued to the wire, taking care to ensure that the base of the oval portion of the loop is at least 1-200 µm above the wire. First, the wire is wetted with Superglue. Then the loop is picked up on a small piece of wire or needle and the tail of the loop is dipped in a puddle of Superglue and touched to the wet wire. The height is adjusted by raising or lowering the loop and then the needle is gently slid out of the loop. Check for any glue on the loop and remove it if necessary. Allow the Superglue to dry for several hours. Anaerobic areas or UV light will speed up the curing process.
Check the mounts for excess glue; any epoxy over the edge of the brass pin will stick to the transfer tongs, and any excess glue in the loop will add to the background intensities during data collection and produce turbulence in the gas stream, causing ice formation. The epoxy can be carefully removed with a sharp needle or scalpel while soft, and the Superglue can be trimmed carefully with a small scalpel or razor blade. It is easiest to correct these little imperfections while the glue is not yet fully set. With practice, you will get it right the first time.

The tool shown in Fig. 6.3 is recommended for bending the wire. A robust crystal is shown resting in a loop about half its size. If the X-ray beam hits the part of the crystal beyond the loop, the amount of noncrystalline mass in the X-ray beam is minimized, resulting in data with higher signal-to-noise ratios. An elongated loop (Fig. 6.2C) is used to hold a narrow needle, while the wire mounting technique (Fig. 6.2D) is applied to a tapered pin. The twisted wire is inserted in the end of the pin, with enough wire protruding to allow the wire to be bent as shown in Fig. 6.2B. The thin plate crystal shown in Fig. 6.2D illustrates how the crystal is just slightly smaller than the loop.


FIG. 6.3 Wire mount bending tool (not to scale). [Dimension labels in the original figure: ~2 cm, 4 mm, ~2 cm, ~4 cm, 2 mm.]

Optimally, all the mounts should be exactly the same height; this will simplify transferring frozen crystals into and out of the cold stream. The dimensions given here are the largest limits appropriate for use with a magnetic base and magnetic mount on a standard goniometer head on a Huber four-circle diffractometer, one of the instruments with the greatest restrictions on sample height. R-axis goniometers have an even smaller adjustment range, and the necessary dimensions of the loop mounting will depend greatly on the exact details of the cryocooler, magnetic base, and magnetic mount used.
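Because the dimensions given earlier are ranges (glass cut to 15-17 mm, wire protruding 3-4 mm), mounts built to those tolerances can differ in height by several millimeters. A minimal sketch for checking a batch is shown below. It assumes the glass is flush with the base of the pin, as described above, so that the height of the loop center is the glass length plus the protruding wire, the gap between wire and loop, and half the loop diameter. The function name and the example measurements are hypothetical.

```python
def loop_center_height_mm(glass_length_mm, wire_protrusion_mm, loop_gap_mm, loop_diameter_mm):
    """Height of the loop center above the base of the pin.

    Assumes the glass support is flush with the pin base and that the
    wire, the gap, and the loop are stacked along the pin axis.
    """
    return glass_length_mm + wire_protrusion_mm + loop_gap_mm + loop_diameter_mm / 2.0


# Hypothetical batch: (glass length, wire protrusion, loop gap, loop diameter), all in mm.
batch = [
    (15.0, 3.0, 0.2, 0.3),
    (17.0, 4.0, 0.2, 0.3),
]
heights = [loop_center_height_mm(*mount) for mount in batch]
spread_mm = max(heights) - min(heights)
print(f"Heights: {heights}, spread: {spread_mm:.2f} mm")
```

Cutting the glass and trimming the wire to single fixed values, rather than anywhere within the allowed ranges, removes most of this spread.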

6.6 CRYSTAL LOOP MOUNTING TECHNIQUES

Avoid handling the crystals more than necessary, and always handle them gently. These points cannot be overemphasized. Fragile crystals usually do best in loops at least 50-150 µm larger than the longest dimension of the crystal. If possible, use a wick to remove some of the excess solvent. Remember that the solvents will be evaporating from the loop at a significant rate. If getting the crystals drier takes too long, the crystals will be damaged by the change in ionic strength and possibly in the content of the cryoprotectant. If the crystals are needles, crease the loop at one end to make an elongated oval loop; this configuration will retain less mother liquor. The crystals may show improved diffraction because there is less noncrystal mass in the X-ray beam and because the crystals are less mechanically stressed. For extremely fragile plates, it may be necessary while making the loops to wind several turns of fiber side by side around the needle to make a taller loop. These loops are somewhat stiffer, but they can be made to match the thickness of the crystal and minimize stresses on the crystal. For delicate crystals, try to pull the crystal through the meniscus along the longest edge of the crystal and with the loop perpendicular to the meniscus. This minimizes the stresses on the crystal and will reduce the amount of cryoprotectant solution adhering to the crystal.


Robust crystals are best frozen in a loop half to two-thirds the size of the average dimension or at least half the longest dimension. Since so little solvent clings to the crystal, solvent removal is neither necessary nor possible with this method in most cases. If the relationship between the crystal morphology and the crystal cell axis is known, a great deal of time can be saved by aligning the crystal with respect to the pin before freezing. Often, a small crease in the loop will help orient the long axis of the crystal along the loop, assisting in alignment. To enable collecting the most complete data set in the minimum rotation, it is best to align the crystal with its long axis about 10-15 degrees away from the axis of the mount. Like many techniques in protein crystallography, the mounts or the methods may have to be modified to suit particular crystals. By using the rationale for the materials and methods just given, it will be easy to modify these methods as required for any particular crystal.
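The loop-sizing guidelines in this section can be collected into a small helper for choosing a loop from a stock of different diameters. This is a sketch of one reading of the rules, not a definitive procedure: the function name is hypothetical, and the robust-crystal rule is interpreted here as "half to two-thirds of the average dimension, but never less than half the longest dimension."

```python
def suggest_loop_size_um(dimensions_um, fragile):
    """Return a (minimum, maximum) suggested loop size in micrometers.

    dimensions_um: the crystal dimensions, e.g. (length, width, thickness).
    fragile: True for fragile crystals, False for robust ones.
    """
    longest = max(dimensions_um)
    average = sum(dimensions_um) / len(dimensions_um)
    if fragile:
        # Fragile crystals: loop 50-150 um larger than the longest dimension.
        return (longest + 50.0, longest + 150.0)
    # Robust crystals: half to two-thirds of the average dimension,
    # raised if necessary to at least half the longest dimension.
    floor = longest / 2.0
    return (max(average / 2.0, floor), max(2.0 * average / 3.0, floor))


# Hypothetical crystals, dimensions in micrometers.
print(suggest_loop_size_um((300, 300, 300), fragile=False))  # robust cube
print(suggest_loop_size_um((400, 300, 20), fragile=True))    # fragile plate
```

The result is only a starting point; as noted above, the mounts and methods may have to be modified to suit particular crystals.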

The Use of Platinum Wire

The use of platinum wire in the crystal mounts gives an advantage during data collection. If the desired axis for data collection is not coincident with the axis of the pin, bending the wire with the tool shown earlier (Fig. 6.3) can align the crystal. This method allows one to bend the wire supporting the crystal and thereby collect the blind region(s), even on a single-axis goniostat. This is important for high completeness of data, whether for heavy-atom derivatives or for high-resolution data.

The wire mount bending tool is made from a piece of plastic, preferably Delrin® or, failing that, nylon, about twice the diameter of a pencil, tapering to a 1.5-mm diameter shaft with a 2-mm deep notch on the end. The shaft is tapered at the tip to make the plastic on either side of the notch as thin as possible to prevent turbulence and heat transfer both from the tip and from turbulent inclusion of warm air. The critical features of this tool are as follows:
• The base should be large enough to turn comfortably in one's fingers.
• The stem should be long enough to allow easy access to the crystal from a warm area outside the cold stream.
• The shaft should be thin.
• The notch in the end should be only about twice as thick as the twisted wires.

If the tool is inserted slowly from outside the cold stream and downstream of the crystal, the wire can slowly be bent by up to 90°. This allows the collection of blind regions and the alignment of crystals that are too far from the axis of the pin to align with the goniometer arcs. Fine adjustments are best made on the goniometer arcs after bending. Unless the magnetic base is exceptionally weak, it is not necessary to hold the base while bending. Practice this first with a crystal-free loop. Check to make sure that the wire will not be in the beam during the data collection, as the diffraction rings will be very intense. Translate the crystal vertically, if necessary. DO NOT BEND THE CRYSTAL OUT OF THE COLD STREAM! It may be necessary to create a 90° bend in a series of small steps: bend, recenter, and then bend, recenter, until the desired bend is achieved. The closer the wire is bent to the crystal, the less recentering will be necessary.

Attaching Pins to Goniometer Heads

Pins can be placed directly into standard goniometer heads after the cold stream has been diverted with a simple baffle, such as a ruler or paper card. Alternatively, hold the pin with a hemostat and plunge the loop rapidly into the cold stream. Then the pin can be manipulated into the goniometer while the loop is kept centered in the cold stream. For ease of use or subsequent storage of the crystals, however, some type of base is desirable. The most popular and easiest mounts to use are magnetic bases. These bases are designed to fit in the tops of standard cryovials, to allow the crystals to be stored in storage dewars before and after data collections. The bases can be made out of any magnetic material, but magnetic stainless steel or nickel is the material of choice. A typical base is shown in Fig. 6.4. The central hole is chosen to fit the mounting pins. The set screw holds the pin in place and allows height adjustment, if necessary. M2 × 2 or #2-56 set screws match common goniometer head keys. The lip at the bottom of the magnetic base increases the area for magnetic attachment and matches the outer diameter of the cryovials, while the tall section of the base is slightly smaller than the inside of the cryovials. The angle of the bevel is not critical; 30° has been found to help direct the cold nitrogen stream away from the goniometer, reducing icing problems. A rounded edge, like an inverted bowl, can also be used. Bases are available from a number of vendors, but purchase carefully: some are awkward to use, some are more prone to icing than others, and the bases vary greatly in price.

FIG. 6.4 Magnetic base. [Labels in the original figure: Top View; Matches outside of cryovial; 3.25 mm; Slightly smaller than the inside of cryovial.]

A magnetic mount for the goniometer head can be made by the simple method of attaching some type of magnet (e.g., thick magnetic rubber tape or a button magnet) to the top of the goniometer. Commercial adapters are sold for every type of goniometer. There are three main variations:
1. The strong magnet adapter, typically using a samarium-cobalt high-power magnet.
2. An adapter that holds the thick magnetic rubber tape with a screw, which creates a protrusion used to align the center of the magnetic base to the goniometer.
3. An adapter with the thick magnetic rubber tape having a central hole that accommodates the mounting pin if it protrudes beyond the magnetic base.

The first option creates a strong magnetic bond to hold the crystal on the goniometer, but some users have difficulties aligning the crystals against the strong force and occasionally lose crystals. The second allows rapid centering by means of the center protrusion. Unlike the first two options, the third option allows the height of the pin to be adjusted over a large range, even while the crystal is on the goniometer. The third option accomplishes the centering function of the second option, since if the mounting pins protrude slightly below the level of the base, the protrusion will index on the center hole. Thus, the user can select centering or no centering on a case-by-case basis. Any protrusions between the magnetic mount and the base will result in a slight misalignment that greatly weakens the attraction between these


magnetized units. This can cause the crystal to fall off the goniometer, especially if the spindle axis is horizontal.

6.7 CRYSTAL STORAGE OVERVIEW

Eventually, you will want to save your crystals, and the crystals will need to be stored. Perhaps there is an especially nice crystal, or a failure with the X-ray source or a failure of the detection equipment. Since it is now possible to remove a frozen crystal and put it back later, the problem of removing a good crystal in search of a better one, never to find it, is no longer an issue. Crystals can be prescreened and only the best crystals selected for data collection. If necessary, they can also be removed to allow for equipment repair or realignment, or for checking the precision of the detector. The crystals can also be stored for later data collection at synchrotrons or to accommodate the scheduling needs of other users on the X-ray set. Crystals are easily frozen on the laboratory bench, and no time is needed on the X-ray set until the experimenter is ready to analyze the crystals and collect data. This is a tremendous time-saver and greatly improves the efficiency of the X-ray sets that are used to collect data. Very little is needed to make the transition from freezing a single crystal to saving and storing crystals to make optimal use of your crystals and X-ray set. Since the specific technologies of crystal handling are in flux, the rationale for analyzing the utility of the tools is relevant. Even if the specific tools are not readily available, you should be able to easily make something from the typical tools in a biochemistry laboratory. To begin, the laboratory will need a minimum of a small transfer dewar, a storage dewar, a system for storing and retrieving crystals from the storage dewar, and a magnetic base that can both hold the crystal on the goniostat and protect it while in storage. The items needed are listed shortly. Initially, a small pair of hemostats or reverse-acting forceps will be enough to grasp the pin, lift the crystal from the cryosolvent, and add it to the waiting cryocooler. 
Whether reverse-acting or direct-acting tools are chosen is purely a matter of personal preference: direct-acting hemostats clamp tightly and stay closed, but can require some effort to lock; reverse-acting forceps always require effort to open, but are easier to close. Practical tip: gently bend the locking hooks slightly apart to make locking and unlocking easier. The process will be much easier if the laboratory standardizes on a single system and obtains a few additional tools: several pairs of cryotongs, a large supply of magnetic bases, and several additional transfer dewars.

Lists of Materials

Essential Starting Materials (No Crystal Storage)
1. One cryocooler
2. One loop and a mount for it
3. Two pairs of hemostats or reverse-acting tweezers
4. A simple heat shield, preferably heated (the rounded dome of an aluminum can base will suffice)

Minimal Materials to Freeze and Store Crystals
1. One cryocooler
2. One pair of cryotongs
3. A 500-ml narrow cylindrical transfer dewar
4. One heat shield, preferably heated
5. A storage dewar, such as a 35-liter, large-mouth dewar (e.g., 35HC, or its equivalent), which easily stores 150 canes
6. Cryovials and enough cryocanes to fill the dewar
7. Enough loops and bases to fill 150% of the dewar, so that extra bases with correctly sized loops are always ready
8. Magnetic mount for a goniometer
9. A magnetic transfer tool: a magnet on a plastic rod or magnetic tape on a loop of thin insulated wire

Materials for a Laboratory with Many Users
1. One cryocooler
2. Several pairs of cryotongs
3. Additional hemostats or tweezers
4. A heated heat shield, preferably with a temperature regulator (to help prevent burnouts)
5. Two additional 500-ml transfer dewars, narrow top
6. Two 1000-ml working dewars
7. A 35-liter, high-capacity storage dewar (available in up to 150-cane, 750-crystal capacity)
8. Enough loops and bases to fill 120% of the dewar

Optionally, the following may be helpful to any laboratory:
1. Additional magnetic tops for goniometers
2. A blow-dryer for deicing cryotongs or cryotweezers between uses
3. A dry shipper for shipping crystals (e.g., to synchrotrons)
4. A second cryocooler, if the usage warrants it
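The stocking figures in the lists above (150 canes, 750 crystals, and 150% or 120% of the dewar capacity in loops and bases) reduce to simple arithmetic. The following is only a rough sketch: the function name is hypothetical, and the value of five cryovials per cane is inferred from the quoted 750-crystal, 150-cane capacity rather than stated directly.

```python
import math

def bases_needed(canes: int, vials_per_cane: int, fill_percent: int) -> int:
    """Loop/base assemblies to keep on hand for one storage dewar.

    fill_percent is the stocking level quoted in the lists (150 or 120).
    """
    capacity = canes * vials_per_cane
    return math.ceil(capacity * fill_percent / 100)

# Assumption: 5 vials per cane, inferred from 750 crystals / 150 canes.
capacity = 150 * 5
minimal_lab = bases_needed(150, 5, 150)
multi_user_lab = bases_needed(150, 5, 120)
print(capacity, minimal_lab, multi_user_lab)
```

A smaller dewar or a different cane layout only changes the first two arguments.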

Additional Practical Considerations

Depending on the local air humidity and the geometry of the cold stream and goniostat, icing of the goniometer and goniostat may be a problem. In such cases, a small heat shield over the goniometer can make an enormous difference. On some goniostats, the cold stream can cool the drive motors so much that the motors lock up or fail to drive to the desired angles.

A heated shield for a goniometer can easily be made from a variable transformer (e.g., a toy train transformer), a piece of aluminum sheeting, some nichrome wire (preferably glass-fiber-insulated), and a little bakable ceramic [e.g., automobile muffler (silencer) cement]. Make a hole for the height-adjusting top of the goniometer in the aluminum sheet and bend it into a dome or a conical shape. Put a thin layer of muffler cement on the inside of the aluminum and press the nichrome wire in a serpentine pattern into the wet cement. Typically, try to place enough nichrome wire to produce 6-10 W when the assembly is coupled to the transformer. (To select the right length of wire, remember R = P/I², where R is in ohms, P = VI is in watts, V is in volts, and I is in amperes. Measure out the correct length using an ohmmeter.) Add a second ceramic layer with muffler cement 2-3 mm in thickness. Allow this to dry and then bake it according to the instructions. Using crimp connectors on the nichrome wire, connect to the toy train transformer and adjust the output so that the heat shield is neither hot nor condensing water when on the goniometer head with the cold stream operating. If you are making a heat shield with a temperature regulator, do not forget to embed a temperature-measuring device, such as a thermocouple, resistance temperature detector (RTD), or thermistor.

The shield is best fastened to the goniometer by removing the height-adjusting base, or the pin holder, and inserting the heat shield between the goniometer and the height-adjusting base. This will deflect the cold stream away from all the small projections that cause turbulence and icing on a goniometer. Ice forms on any cold surface that has access to room temperature air. Having a coaxial stream of warm air or heated nitrogen, and minimizing drafts and turbulence, will greatly reduce icing problems.
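The wire-length estimate for the heated shield can be sketched numerically from the two relations quoted above, P = VI and R = P/I². This is only an illustration: the function name, the 12 V setting, and the 10 ohm/m wire value are assumptions, not figures from the text. The real resistance per length should be measured with an ohmmeter, as the text advises.

```python
def heater_design(power_w: float, supply_v: float, ohm_per_m: float):
    """Return (current in A, resistance in ohm, wire length in m)."""
    current_a = power_w / supply_v                # from P = V * I
    resistance_ohm = power_w / current_a ** 2     # R = P / I^2
    length_m = resistance_ohm / ohm_per_m         # wire needed for that resistance
    return current_a, resistance_ohm, length_m

# Hypothetical inputs: 8 W target (middle of the 6-10 W range),
# a 12 V transformer setting, and wire measured at 10 ohm per meter.
current_a, resistance_ohm, length_m = heater_design(8.0, 12.0, 10.0)
print(f"I = {current_a:.2f} A, R = {resistance_ohm:.1f} ohm, length = {length_m:.2f} m")
```

Because the transformer is variable, the computed length is only a starting point; the final output is still adjusted by hand until the shield is neither hot nor condensing water.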

Manipulating and Storing Frozen Crystals

To store frozen crystals, you need a supply of magnetic bases, cryovials, and cryocanes. The latter two items are readily available from most biological suppliers. Several crystallographic suppliers now sell magnetic bases, or you can have your own made. The design in Fig. 6.4 has been shown to work well.

Cryotongs

Cryotongs are used to pick up and transfer loop-mounted crystals to and from cold nitrogen streams and dewars. The cryotongs usually are used with magnetic bases in one form or another. In the simplest form, two small blocks of stainless steel have matching hemispherical grooves cut in their surfaces. The groove is just large enough to clamp around the loop mounting pin. The blocks are attached either to hemostats or reverse-acting forceps. More elaborate models grasp both the mounting pin and the base simultaneously.

Storing Crystals for Later Use

Crystals must be mounted in magnetic bases before they can be stored; if this has not been done, you will have some very trying manipulations to put the base on under liquid nitrogen. The maneuver is possible but is not recommended. All the following operations are done under liquid nitrogen, in a small (500-1000 ml) dewar as full as possible. A full dewar is easier to work in, and fingertips are less likely to freeze from the cold nitrogen above the liquid nitrogen surface. The liquid nitrogen should always be fresh and free of any snow or frost. If frost or snow forms, dispose of the snowy liquid and continue with fresh liquid nitrogen. This is important because snow in the liquid nitrogen will settle on crystals and form an icy frost when the crystals are transferred back to the cryocooler. If this happens, gently rinse the crystal on the cryocooler with a little fresh liquid nitrogen until the frost has been washed away. It is equally important to use and maintain clean liquid nitrogen in storage dewars; use only dry liquid nitrogen.

Frozen crystals are usually stored on magnetic bases in 2-ml cryovials on cryocanes. These are available from many biological supply houses. Unless there is a ventilation hole in the magnetic base itself, punch a hole in the cryovial, just below the level of the base, as indicated by the arrow in Fig. 6.5A. This hole will allow nitrogen gas to escape from the cryovial. Large-bore hypodermic needles are very effective. It is prudent to puncture a hole in the opposite side of the cryovial to allow gas to escape if one hole is accidentally covered by the cryocane. In storage, the cryovials are pushed up against the aluminum tab of the cryocane to help prevent the magnetic base from coming loose in handling.

If the crystal is already frozen and is on the goniometer head in the cold stream, it must be transferred back to a transfer dewar filled with liquid nitrogen. To do this, cool a pair of cryotongs under liquid nitrogen until the rapid boiling ceases. Immerse only the metal blocks at the end. Always try to keep the rest of the cryotong warm to prevent the joints from sticking or freezing. Bring the cryotongs immersed in liquid nitrogen in a small transfer dewar as close as possible to the cold gas stream on the X-ray set. When the cryotongs are cold, rapidly remove them, in their closed position, from the liquid nitrogen. Approach the cold gas stream carefully to prevent drafts, open the tongs, rapidly grab the crystal from the cold gas stream, and rapidly immerse it in liquid nitrogen. Once the crystal is safely returned to a liquid nitrogen environment, it can be transferred back to the cryovial.

Crystals frozen directly under liquid nitrogen and crystals frozen under the cold gas stream now follow the same procedure. First have all your tools ready. Have the magnetic transfer tool, hemostats or forceps, labeled cryovials, labeled cryocanes, and your dewars full of liquid nitrogen ready and easily available. Once cold, cryocanes and cryovials are almost impossible to label. If this is your first time, having a coworker nearby can be very helpful.

FIG. 6.5 Crystal removal and storage (panels A-E).

Ensure that you have a labeled cryovial with ventilation holes mounted on a cryocane and that the cryovial is angled away from the cryocane by 35-50°. Then, under liquid nitrogen, attach the magnetic base to the magnetic transfer tool (a simple magnet on a plastic rod or a piece of magnetic rubber tape attached to a thin insulated wire). As shown in Fig. 6.5A, gently release the cryotongs, being very careful not to touch the crystal-containing loop. Your loop should now be pointing downward in liquid nitrogen, with the magnetic base near the surface. Now, as far
away from the crystal as possible, slowly immerse the cryocane with the attached cryovial into the liquid nitrogen. If the cryovial is somewhat horizontal as it is submerged, it cools faster and the boiling is less turbulent. When the cryovial and cryocane are cold, raise the cryovial slowly up to the magnetic base holding the crystal, as shown in Fig. 6.5B. Figure 6.5C shows the cryovial being gently snapped into the cryocane. The cryovial is raised up to the aluminum tab as shown in Fig. 6.5D. The crystal is now ready for storage (Fig. 6.5E). While the cryocane can be stored for a short time in a tall (1000-ml) transfer dewar, it should be placed into a storage dewar as soon as possible. To prevent excessive frosting, the cryocanes should be stored until actually used in tall, 1-liter dewars full of liquid nitrogen. Keep the insulating lids on at all times. If the cryocanes are going to be unused for more than 15 min or so, leave them in the storage dewar. By using the tall dewars, it is possible to stack multiple crystals on a single cryocane. Add the crystals sequentially from the bottom of the cryocane upward and remove them sequentially from the top down.

6.8 CRYSTAL REMOVAL AND STORAGE

When you are putting cryocanes into storage dewars, have the receptacle in the transfer dewar in the neck of the dewar before removing the cryocane from under liquid nitrogen. To reduce the amount of time that the cryocane is not under liquid nitrogen, bring the transfer dewar as close as possible to the storage dewar.

To remove the crystal from the cryovial, gently push the cryovial down the cryocane. Often this is most easily done by rapidly removing the cryocane from liquid nitrogen, placing the end of the cryocane on a countertop, and pushing gently down on the magnetic base. With practice, you can have the cryovial back in liquid nitrogen within 5 seconds. This action is shown in Fig. 6.6B. The next operations are easiest if the dewar is full to the brim and the operations are carried out just below the surface of the liquid nitrogen. As shown in Fig. 6.6C, the top of the cryovial is then gently pried out far enough to allow a magnetic tool to attract the base. The tool can be as simple as a bit of magnetic tape on a wire holder or a magnet on a handle. Ensure that the handle is not a good conductor of heat! Next, the cryocane is gently lowered away from the magnetic base and crystal, as shown in Fig. 6.6D. Be careful not to lift the crystal out of the liquid nitrogen. Finally, the cryotongs can be put into the liquid nitrogen and cooled until the liquid nitrogen has ceased to boil rapidly. During the cooling, be careful to keep the cryotongs away from the crystal. Relative to the crystal, the cryotongs are hotter than an oven for baking bread. When the cryotongs are cold, they can be clamped around the mounting pin for the crystal. The crystal is now ready to be transferred to the cryocooler. Try to keep the magnetic base, the crystal, and the end of the cryotongs under liquid nitrogen at all times, but avoid immersing the cryotongs as deeply as the joint between the handles, which is prone to freezing solid. By keeping the cryotongs dry between transfers and by keeping the joint out of liquid nitrogen, you can prevent frozen cryotongs.

FIG. 6.6 Frozen crystal retrieval (panels A-E).

Before transferring crystals into the cold nitrogen stream on the goniometer, check that the cryocooler is at temperature, the heat shield is functioning, the gas stream is centered on the goniometer, and the goniometer is centered and at the correct height for your loops. Also ensure that all the tools that are required are at hand. When you are ready, try to mount an empty loop. Practice mounting the empty loop until you are comfortable with the process. Then, and only then, proceed to mount a crystal. To transfer the crystal to the cryocooler, bring the small transfer dewar with the cryotongs immersed under liquid nitrogen close to the cryocooler. Remove the cryotongs from the liquid nitrogen, align the magnetic base with the magnetic mount of the goniometer, quickly open the cryotongs, and smoothly remove them. It is wise to practice the operation with an empty loop until it can be performed smoothly, quickly, and easily. Typically, the cryotongs can keep a crystal cold for 20-30 s. If the magnetic base and the magnetic mount of the goniometer are exactly the same size, it is very easy to
check the alignment of the base with the goniometer top before the cryotongs are removed. Now check the crystal alignment and then center the crystal, starting with the greatest misalignment. Since some microscopes invert the image, be careful when translating the crystal to ensure that it is not accidentally translated out of the cold nitrogen stream.

In general, avoid large or rapid motions around the cryocooler. Quick, jerky movements are likely to cause drafts that can ice and thaw the crystal. Also, be careful of your breathing, especially when near the cryocooler: talking or exhaling near the cooler can often waft warm air onto the crystal. On some X-ray sets, the air cooling of the anode drive motor may blow onto the crystal. Air currents from the room heating or cooling systems can momentarily blow warm air onto the crystal. Even if the crystal remains frozen, large air currents dramatically increase the rate at which ice forms on the goniometer and goniostat. Until the magnitude of the problem is known and a permanent solution made, temporary tents can be made from large-width polyethylene or Mylar films, shrink wrap, clamp stands, coat hangers, and modeling clay. All these can be readily obtained and used.

Depending on how often the storage dewar is opened, empty it out every 3-6 months to dry out the inside, and refill with clean, dry nitrogen. The old, wet nitrogen can often be used in liquid nitrogen freezers. Obviously, the storage dewar is best defrosted when its contents can be accommodated in other dewars, or when the crystal stocks have been depleted (e.g., after a synchrotron run). As the number of frozen crystals grows, it will be necessary to purchase more dewars to hold all the crystals. This will simplify the defrosting process, as crystals can be shuttled from one dewar to the other while the dewars are sequentially defrosted.

Freezing Away from the Cold Stream

Freezing crystals away from the actual X-ray set allows data collection to proceed while users are preparing crystals. Some crystals will freeze better in the cold stream, while others will freeze better when frozen directly into liquid nitrogen at the bench. Some crystals have been most successfully frozen from 4 °C, while in a cold room. If you choose to freeze in an enclosed area, ensure that there are always sufficient quantities of fresh air and oxygen for breathing. To freeze crystals into liquid nitrogen at the laboratory bench, you will need at least one, preferably two dewars, labeled cryocanes, labeled cryovials, and a selection of magnetic bases with loops. If two dewars are used, one can be a short (500-ml) dewar; otherwise all operations can be performed in
a single dewar. Begin by equilibrating the crystals in cryoprotectant. Have prepared a selection of magnetic bases with appropriately sized loops. Attach the magnetic base either to a hemostat or to a magnetic transfer tool. Fill both dewars, and when they are cool, top off the small one until it is as full of liquid nitrogen as possible. The next step should be done as quickly as possible to minimize evaporation from the crystal. Put the crystal in the loop, drying it off if necessary, and plunge it loop first into the liquid nitrogen. When the boiling has subsided and only a few or no bubbles are rising from the crystal, you can load it into a cryovial as shown earlier (Fig. 6.5B). It is probably easiest if the magnetic base is held loop down, close to the surface of the liquid nitrogen, and the cryovial raised slowly from below. If there are many crystals to be frozen, the freezing should be carried out in a series of small batches. After several crystals have been frozen, it is best to transfer them to a larger storage dewar to prevent frosting from water condensing from the air. The small transfer dewars can then be emptied, dried, and refilled with fresh, dry liquid nitrogen.

Selecting a Cryoprotectant

The actual choice of cryoprotectant will depend upon the particular crystal and the particular macromolecule. The more you understand about the stability of the crystal and the macromolecule, the easier the search will be. While there are no fixed rules, there are clear patterns for the selection of cryoprotectants. Crystals grown from high-molecular-weight PEGs are often compatible with low-molecular-weight PEGs and alcohols, while crystals grown from high salt often show better diffraction with polyalcohols, such as glycerol. In many cases, glycerol alone will work. In many cases, the diffraction will be improved by minimizing the change in osmolarity between the mother liquor and the cryoprotectant solution. Osmolarities can be computed from tabulated values in references such as the CRC Handbook of Chemistry and Physics. A broad survey of cryoprotectant conditions should be undertaken, and conditions beyond the minimally sufficient conditions should be explored. It is not uncommon for the best diffraction to occur with more cryoprotectant than is minimally sufficient. The best diffraction may be that with lowest mosaicity or highest resolution, often simultaneously.

Testing Cryoprotectant Solutions

The same method is used to test all putative cryoprotectant solutions. The following equipment is needed:

Necessary Items
1. Cryocooler, operating below 100 K, preferably 85 K
2. A source of X-rays and a means of detecting them, preferably an area detector
3. An alignment microscope
4. A large loop on a suitable mount
5. A pair of hemostats, preferably locking, or reverse-acting tweezers
6. Distilled water for rinsing the loop (in some cases, alcohol)
7. Paper tissues for drying the loop

Helpful Additions
1. Magnetic mounting system for crystals
2. Three to five large loops on magnetic bases
3. A pipette capable of dispensing volumes less than 1 µl
4. A bright flashlight or fiber-optic light
5. A heat shield
6. A microscope to check loops after freezing

Setup

First, the X-ray set should be brought to normal operating power and the cryocooler turned on, brought to its operating temperature, and allowed to stabilize, which often can take several hours. Select a loop larger than the largest crystal expected. Put the loop on the X-ray set to confirm that the entire setup is aligned. Remove the loop, and defrost it by warming it in warm air or rinsing in alcohol. Dry the loop off by dabbing it with a thin piece of paper tissue. To begin the search for a cryoprotectant solution, first identify a starting solution (see Fig. 6.7). In general, this will be the mother liquor or, in some cases, the harvest buffer. This solution alone should be frozen to determine its adequacy as a cryoprotectant solution. The following method will be used to test all cryoprotectants and cryoprotectant solutions. Note: Adding the cryoprotectant to the mother liquor obviously dilutes the mother liquor and may affect the crystals. Ensure that everything required by your crystals is in the cryoprotectant solution, including that 1 mM unusual cation that proved critical for crystallization. A common practice is to make up a concentrated solution of the mother liquor that can be readily diluted with cryoprotectants and water.

Obtain new crystals. Begin search for cryocooling conditions.

1. Is the crystallization condition suitable for direct freezing?
   Yes: Try it alone, then optimize conditions.
   No: Go to question 2.
2. Are cryoprotectants known for similar conditions?
   Yes: Try those, then optimize conditions.
   No: Go to question 3.
3. Is the primary precipitant high MW PEG?
   Yes: 1) Add glycerol 2) Add PEG 400/600 3) Add a glycol 4) Crosslink; try 1-3 above
   No: Go to question 4.
4. Is the primary precipitant low MW PEG?
   Yes: 1) Raise PEG concentration 2) Add glycerol/glycol 3) Add an alcohol 4) Add a sugar
   No: Go to question 5.
5. Is the primary precipitant high ionic strength?
   Yes: 1) Add glycerol 2) Add a glycol 3) Add 2,3,R,R-butanediol 4) Switch to an organic salt
   No: Go to question 6.
6. Is the primary precipitant an alcohol or sugar?
   Yes: 1) Increase the concentration 2) Try a different alcohol or sugar
   No: Go to question 7.
7. Is the primary precipitant low ionic strength?
   Yes: 1) Try to add glycerol 2) Add 2,3,R,R-butanediol 3) Crosslink and repeat
   No: Pick the condition above that is closest.

FIG. 6.7 Flowchart for initial cryoprotectant selection.
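The decision sequence of Fig. 6.7 can also be written as a small lookup, which some laboratories may find convenient for keeping screening notes consistent. This is a sketch only: the function and dictionary-key names are hypothetical, and the pairing of each suggestion list with its question follows one reading of the figure.

```python
# Suggestions transcribed from the branches of Fig. 6.7.
# The dictionary keys are hypothetical labels invented for this sketch.
FIG_6_7 = {
    "high_mw_peg": ["add glycerol", "add PEG 400/600", "add a glycol",
                    "crosslink, then retry the first three"],
    "low_mw_peg": ["raise PEG concentration", "add glycerol/glycol",
                   "add an alcohol", "add a sugar"],
    "high_ionic_strength": ["add glycerol", "add a glycol",
                            "add 2,3,R,R-butanediol", "switch to an organic salt"],
    "alcohol_or_sugar": ["increase the concentration",
                         "try a different alcohol or sugar"],
    "low_ionic_strength": ["try to add glycerol", "add 2,3,R,R-butanediol",
                           "crosslink and repeat"],
}

def first_trials(precipitant, direct_freezing_ok=False, known_cryoprotectants=()):
    """Walk the questions of Fig. 6.7 in order and return the trials to run."""
    if direct_freezing_ok:
        return ["freeze in mother liquor alone, then optimize"]
    if known_cryoprotectants:
        return [f"try {name}, then optimize" for name in known_cryoprotectants]
    if precipitant in FIG_6_7:
        return list(FIG_6_7[precipitant])
    return ["pick the closest precipitant class in Fig. 6.7"]

print(first_trials("high_ionic_strength"))
```

The order of each list mirrors the order in the figure, so the first entry is the first condition to try.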

Method

First vortex, shake, or otherwise mix the cryoprotectant solution vigorously. Glycerol and salt or PEG and salt can separate out to some extent.
Using the micropipettor, or by picking up a small droplet, fill the loop with a spherical drop of the cryoprotectant solution to be tested. The loop containing the drop is transferred rapidly to the cryocooler and frozen. Next use the alignment microscope on the X-ray set to visually check the droplet for any signs of ice. Note that the strongest ice rings appear at 3.5, 3.7, and 3.9 Å; if the detector is swung out in 2θ so that you are collecting 0.8-Å data, or if the detector is too far away, you may miss the lower-resolution ice rings. If necessary, vary the illumination of the loop so that you can clearly see the details of the liquid. If the droplet is opaque, or shows one or two crystals of ice, increase the concentration of cryoprotectant by at least 10%. To find single ice crystals, look for flat reflective planes on the surface of the droplet; if crossed polarizing filters are available, examining the loop for crystals of both ice and protein is greatly facilitated.

Solutions containing high salt concentrations and an organic cryoprotectant may become cloudy after freezing. This separation into two phases is not a failure; the two solutions are in equilibrium with each other. After the solution has been made and thoroughly mixed, centrifuge it to separate the organic layer from the aqueous layer. The aqueous layer represents the highest concentration of cryoprotectant that can be used at the given salt concentration. Usually, the aqueous layer alone will freeze by itself as vitreous water. If not, the crystal can be moved to the organic layer and the organic cryoprotectant concentration can be increased until the solution freezes without forming ice. In general, biphasic cryoprotectant solutions are harder to work with and prone to more variability due to errors in mixing.

Frost on the surface of the droplet and loop is an indication that the cold nitrogen stream was deflected by a draft after the loop was frozen. Thaw the loop, rinse and dry it, and repeat the process, but be more careful of drafts, exhalations, and fast movements near the loop.

Assuming that no obvious signs of ice can be seen, it is time to check with X-rays, by taking as short an exposure as possible. If diffraction from ice is seen (Plate 7A), increase the cryoprotectant concentration by 10% and try again. If only some ice is seen (Plate 7B), increase the cryoprotectant concentration by an additional 5% and try again. If a well-ordered solvent ring is seen (Plate 7C), try increasing the cryoprotectant concentration by an additional 5%. Plate 7F shows an acceptable solvent ring, but for the best data, try to achieve a diffraction pattern in which there is no longer any local order to the water and therefore no solvent ring (Plate 7G). Save these conditions as a starting point for optimizing cryoprotectant solution conditions as shown in Fig. 6.8.
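The X-ray test loop described above amounts to a simple update rule. The sketch below is illustrative only: the observation labels and function name are hypothetical, and the increments are treated as percentage points of cryoprotectant, which is one reasonable reading of "increase by 10%".

```python
# Percentage-point increments taken from the method text; the observation
# labels are hypothetical names chosen for this sketch.
INCREMENT = {
    "ice_rings": 10.0,             # clear ice diffraction (Plate 7A)
    "some_ice": 5.0,               # only some ice (Plate 7B)
    "ordered_solvent_ring": 5.0,   # well-ordered solvent ring (Plate 7C)
    "acceptable": 0.0,             # acceptable or absent solvent ring (Plates 7F, 7G)
}

def next_concentration(current_percent: float, observation: str) -> float:
    """Return the cryoprotectant concentration to try on the next test drop."""
    if observation not in INCREMENT:
        raise ValueError(f"unknown observation: {observation}")
    return min(current_percent + INCREMENT[observation], 100.0)

# Hypothetical screening run starting from 10% cryoprotectant.
history = [10.0]
for seen in ["ice_rings", "some_ice", "ordered_solvent_ring", "acceptable"]:
    history.append(next_concentration(history[-1], seen))
print(history)
```

The final value in such a run is only the starting point for the optimization stage, not the finished condition.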

FIG. 6.8 Second-stage cryoprotection flowchart. Steps shown: initial test of cryoprotectant conditions; add new cryoprotectant to mother liquor; fill a large loop with a drop as large as the largest crystal; freeze; if ice is seen, try increasing cryoprotectant by at least 10%; if diffuse ice rings or sharp solvent rings are seen, try increasing cryoprotectant by 5%; on first-round success, ask whether there are other putative cryoprotectants; test the cryoprotectant by overnight equilibration of the cryosolution with a small crystal in a drop of mother liquor; if the crystal is damaged, try a quick dip and/or serial dip, or adjust buffer and osmolality; add cryosolution incrementally (0.1 µl) to the drop containing the crystal; if schlieren patterns appear, equilibrate the drop longer; remove most of the drop and add cryosolution, repeating twice; if damage persists, check buffer pH and osmolality and repeat the additions more slowly or after a longer equilibration time, and if the ionic strength is low, consider increasing it; retest the cryosolution; repeat with at least two additional crystals; analyze the diffraction limit, FWHM peak width, and mosaicity.

Rationale

The method just described works well for many crystals. To ensure success, here are some additional points to consider. The colder the cryocooler is, the faster the crystals will freeze, but it is also important that the temperature be stable. In the long run, it will be easier if a slightly warmer, but stable temperature is selected. The temperature should be no higher than 100 K for devices that measure the temperature within 10 cm of the crystal. When selecting a loop, it is important that the volume of the test drop be at least as large as the largest crystal you expect to freeze. By selecting a test drop that is similar in thermal mass to your crystal, you will have more successes when real crystals
are frozen. Since the cooling rate of the crystal is a function of both its total mass and its ratio of surface area per unit volume, you may have better success with smaller and thinner crystals. Do not allow the loop and droplet to stand after the loop has been filled, for there will be evaporative losses, and the freezing results will be erratic. The first exposure of the frozen crystal to X-rays should be short, to minimize the possibility that a diffraction spot from a large ice crystal will overload and possibly damage the detector.

Cryoprotectant Optimization

Once diffraction has been obtained with the cryoprotectant and mother liquor alone, the experiment must be repeated with crystals. There are a number of methods for transferring crystals to cryoprotectants: cocrystallization, dipping, serial dipping, serial additions, and diffusion. If the modifications to the mother liquor are slight, you can try to grow the crystals in the presence of the cryoprotectant, but many cryoprotectants suppress or otherwise alter crystal growth. Sometimes higher precipitant concentrations can compensate, but not always.

The simplest method is to transfer the crystal directly into cryoprotectant. This can be done with a pipettor, a mounting capillary, or a loop. Then allow the crystal in the solution to equilibrate for between one second and overnight. A good starting point is 1-3 min. Some crystals that crack upon direct transfer will anneal if soaked overnight. Others will slowly crumble into dust if allowed to sit in the cryoprotectant. If equilibration proceeds under a microscope, you can observe how the crystal reacts to the change.

If fast dips cause problems, you can serially add 0.2 µl of cryoprotectant to the crystallization drop and then remove 0.2 µl of the droplet until the crystallization drop has been fully equilibrated. If this proves too stressful for your crystals, you can transfer the whole coverslip containing the crystals over a new reservoir containing the cryoprotectant solution. Typically, in 24-48 h the crystallization drops will have equilibrated. Some precipitants, such as PEGs, will require serial additions of cryoprotectants after the equilibration period because they have very small effects on the vapor pressure of water and take a very long time to equilibrate.

These conditions should be repeated with several crystals, as there is often a crystal-to-crystal variation. It is important to know that your crystals diffract at room temperature: if several cryoprotectant solutions yield no diffraction, verify that the crystals can still diffract X-rays at room temperature.
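The serial-addition scheme (add 0.2 µl of cryoprotectant, remove 0.2 µl of the mixed drop, repeat) approaches full exchange only gradually. A rough estimate of the number of cycles can be sketched as follows, assuming ideal mixing after each addition and a negligible crystal volume. The 2-µl drop size and the 95% target are hypothetical values chosen for illustration.

```python
import math

def fraction_exchanged(n_cycles: int, drop_ul: float, step_ul: float) -> float:
    """Fraction of the drop that is cryoprotectant solution after n add/remove cycles."""
    keep = drop_ul / (drop_ul + step_ul)   # share of old solution kept per cycle
    return 1.0 - keep ** n_cycles

def cycles_needed(target: float, drop_ul: float, step_ul: float) -> int:
    """Smallest number of cycles that reaches the target exchange fraction."""
    keep = drop_ul / (drop_ul + step_ul)
    return math.ceil(math.log(1.0 - target) / math.log(keep))

# Hypothetical case: 2-ul drop, 0.2-ul additions, 95% exchange wanted.
n = cycles_needed(0.95, 2.0, 0.2)
print(n, round(fraction_exchanged(n, 2.0, 0.2), 3))
```

Under these assumptions the exchange is asymptotic, which is one way to see why larger volumes are often added toward the end of the procedure.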

6.9 CRYSTAL HANDLING

Most crystals are better off if they are never handled. However, except for the few scientists who can grow their crystals in mounting capillaries, most experimentalists have to handle their crystals in some way. Some use a micropipettor to transfer crystals from mother liquor; others use a drawn-out Pasteur pipette or a mounting capillary attached to a small rubber bulb or syringe. Others use the loops described earlier to move the crystals. If possible, once a crystal has been harvested from mother liquor, any further changes in buffers should be done by moving the buffer, rather than by moving the crystal. A crystal in buffer can be readily transferred to a fresh coverslip, slide, or microwell plate. As a final note, when the crystal-containing loop is being withdrawn from the cryoprotectant solution, try to keep the loop perpendicular to the meniscus. This will minimize both the quantity of additional solution adhering to the crystal in the loop and the strain on the crystal.

Osmotic Effects

When making up the final cryoprotectant solutions, be careful not to dilute the mother liquor concentrations. In general, you should make up a concentrated form of the mother liquor and then add cryoprotectant to it. Often you can make a starting buffer that is twice the concentration of the final one and then add the required cryoprotectant and sufficient water to generate the correct cryoprotectant solution. This method produces a cryoprotectant solution that is 100% of the original components, plus the cryoprotectant. Although the method works very well, often better results can be achieved by taking special care with the handling of the crystals and their buffers. The foregoing method has worked quite well for us; but in all the cases that we have tried, we were able to either increase the resolution, decrease the full width at half-maximum (FWHM) of the reflections, or both by following the procedure described next. In any case, I do not advocate merely diluting the mother liquor with cryoprotectant for use with crystals.

The osmotic pressure comes into force, literally, when you transfer the crystal from one solution to the next. Differences in osmotic pressure can compress or impress the crystal, causing cracks, FWHM changes, or resolution degradation. Often cryoprotection is a competition between the cryoprotectant that denatures the protein and the osmotic shock that cracks the crystal. Quick dips sacrifice the latter to prevent the former. Often you can have your cake and eat it too by watching the osmotic potentials of the solutions.
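The concentrated-stock recipe described above (a mother liquor stock at twice the final concentration, plus cryoprotectant, plus water) is a short volume calculation. The sketch below is an illustration only: the function name and example numbers are hypothetical, and the cryoprotectant is assumed to be dispensed as a neat liquid with percentages taken as v/v.

```python
def cryo_recipe(final_ul: float, cryo_percent: float,
                stock_fold: float = 2.0, cryo_stock_percent: float = 100.0):
    """Return (stock, cryoprotectant, water) volumes in microliters.

    The mother liquor components end up at 1x; the cryoprotectant is additional.
    """
    stock_ul = final_ul / stock_fold
    cryo_ul = final_ul * cryo_percent / cryo_stock_percent
    water_ul = final_ul - stock_ul - cryo_ul
    if water_ul < 0:
        raise ValueError("stock is not concentrated enough for this recipe")
    return stock_ul, cryo_ul, water_ul

# Hypothetical example: 1000 ul of 20% cryoprotectant from a 2x stock.
stock_ul, cryo_ul, water_ul = cryo_recipe(1000.0, 20.0)
print(stock_ul, cryo_ul, water_ul)
```

If the water volume comes out negative, a more concentrated stock (for example, 4x) is needed for that cryoprotectant level.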


The essence of the technique is to determine the osmotic strengths of both the mother liquor and the cryoprotectant solution (i.e., the solution containing the desired concentration of cryoprotectant). Ideally, one then reduces the ionic strength of the cryoprotectant solution until the osmolalities (Os/kg) or osmolarities (Os/liter) match, while maintaining the concentration of the cryoprotectant. This is not always possible if the starting ionic strength was low, the amount of required cryoprotectant high, or both.

For illustration, suppose that a simple salt solution (2.0 M NaCl, 50 mM Tris/HCl at pH 7.8) has recently been used to grow nice hypothetical crystals of a protein, bongkrekic acid decarboxylase. By checking the section on osmolarities and osmolalities of The CRC Handbook of Chemistry and Physics, we determine that the osmolality of this solution is about 3.95 Os/kg. Let us further suppose that, given Fig. 6.7, we have some reason to believe that glycerol will be a suitable cryoprotectant for this protein crystal. To reach the vitreous form of this solution using glycerol (i.e., "freezing"), we follow Fig. 6.8 and perform a series of freezing tests, and we find that approximately 20% glycerol is required to cryoprotect this solution. Consulting the CRC Handbook again, we learn that 20% glycerol alone is 2.90 Os/kg. Since our starting mother liquor is 3.95 Os/kg, the difference is

3.95 Os/kg - 2.90 Os/kg = 1.05 Os/kg

So we have to make up a difference of 1.05 Os/kg in osmolality by adding sodium chloride to the glycerol solution. From the CRC table, 1.05 Os/kg of NaCl is 0.55 M. Thus, as an initial condition, we try 20% glycerol, 0.55 M NaCl, and 50 mM Tris/HCl at pH 7.8. Moreover, to minimize the osmotic shock to the crystal, we test this by first vapor-equilibrating the crystal overnight or for a day or two against this new buffer. Then we either quick-dip or add the solution serially to the crystals.

Caveats: In some solutions, the cryoprotectant will have a much higher osmolality than the starting solution. Get as close as you can, and vapor-equilibrate the rest of the way.

Tips:

1. To minimize osmotic shock, a ring of 0.1-µl droplets of the cryoprotectant solution can be placed around a drop containing the crystal on a coverslip. This setup is allowed to equilibrate overnight over a reservoir containing the cryoprotectant solution. The small droplets are then added serially to the crystal drop and equal volumes drawn off. Eventually, larger volumes of the reservoir solution can be added and the crystal thoroughly washed.

2. You can often substitute NH4OOCCH3 (ammonium acetate) for ammonium sulfate. This vapor-equilibrates very quickly, but remember that NH3 is more volatile than acetate, so the pH may shift during the equilibration process. Therefore, the crystal-containing solution may need a higher concentration of buffer.

3. Do not lower the ionic strength below 0.2 M if you can help it, and preferably maintain the total salt concentration above 0.5 M. We have noticed many problems with crystals when the ionic strength is low. The probable cause is that many residues that are ionically screened in the high-ionic-strength crystals are no longer screened in the low-salt cryoprotectant solution.
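The osmolality bookkeeping in the worked example above can be sketched in a few lines of Python. This is only an illustration: the function name `osmolality_deficit` is hypothetical, the two input values are the ones quoted in the text, and converting the resulting deficit into a salt molarity still requires a lookup in the CRC tables (no table is reproduced here).

```python
def osmolality_deficit(mother_liquor_osm, cryoprotectant_osm):
    """Osmolality (Os/kg) that added salt must supply so that the
    cryoprotectant solution matches the mother liquor."""
    deficit = mother_liquor_osm - cryoprotectant_osm
    # A negative value means the cryoprotectant alone already exceeds the
    # mother liquor (the situation described under Caveats); no salt is added.
    return max(deficit, 0.0)


# Values quoted in the text: 2.0 M NaCl mother liquor and 20% glycerol.
deficit = osmolality_deficit(3.95, 2.90)
print(round(deficit, 2))  # prints 1.05
```

The 1.05 Os/kg result is the number that is then looked up in the CRC table to obtain the NaCl concentration.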


APPENDIX A Crystallographic Equations in Computer Code

Program 1

Read in a reflection file and filter based on F > 2σ.

c     copy a record only when both F values are at least 2 sigma
10    read(5,*,end=100) ih, ik, il, f1, s1, f2, s2
      if (f1 .lt. 2.0*s1 .or. f2 .lt. 2.0*s2) goto 10
      write(6,20) ih, ik, il, f1, s1, f2, s2
20    format(3i4,4f8.2)
      goto 10
100   continue
      end

Program 2

Transform coordinates x, y, z by matrix(3,4) to new coordinates xp, yp, zp. Columns 1-3 of each row hold the rotation and column 4 holds the translation for that row.

      real matrix(3,4)
      xp = x*matrix(1,1) + y*matrix(1,2) + z*matrix(1,3) + matrix(1,4)
      yp = x*matrix(2,1) + y*matrix(2,2) + z*matrix(2,3) + matrix(2,4)
      zp = x*matrix(3,1) + y*matrix(3,2) + z*matrix(3,3) + matrix(3,4)
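For readers who prefer to check the arithmetic interactively, the same 3 x 4 transformation can be sketched in Python. The function name `transform` and the example matrix are assumptions made for illustration; the matrix layout (rotation in the first three columns, translation in the fourth) follows Program 2.

```python
def transform(point, matrix):
    """Apply a 3x4 matrix to (x, y, z): each row holds three rotation
    terms followed by the translation for that output coordinate."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3]
                 for row in matrix)


# Hypothetical operator: rotate 90 degrees about z, then shift by (1, 2, 3).
rot_z_shift = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
]
print(transform((1.0, 0.0, 0.0), rot_z_shift))  # prints (1.0, 3.0, 3.0)
```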

Program 3

C program to strip hydrogens from .pdb file.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[200];

        while (fgets(buf, sizeof buf, stdin) != NULL) {
            /* if not an ATOM or HETATM record, write and continue */
            if (strncmp(buf, "ATOM", 4) != 0 &&
                strncmp(buf, "HETATM", 6) != 0) {
                fputs(buf, stdout);
                continue;
            }
            /* skip atoms whose name has H in column 13 or 14 */
            if (strlen(buf) > 13 && (buf[12] == 'H' || buf[13] == 'H'))
                continue;
            fputs(buf, stdout);
        }
        return 0;
    }

Program 4

FORTRAN77 program to calculate resolution from indices--space group independent.

      integer ih, ik, il
      write(*,*) 'Enter unit cell in angstroms and degrees'
      read(*,*) a, b, c, alpha, beta, gamma
c     call recip once after cell entered to initialize
      call recip(a, b, c, alpha, beta, gamma)
      write(*,*) 'Enter indices - enter 0,0,0 to stop'
10    read(*,*) ih, ik, il
      if (ih .eq. 0 .and. ik .eq. 0 .and. il .eq. 0) stop
      write(*,*) 'resolution = ', resolution(ih, ik, il)
      goto 10
      end

      real function resolution(ih, ik, il)
      common /recell/ raa, rbb, rcc, abg, cab, bca
      integer ih, ik, il
      real th, tk, tl
      th = ih
      tk = ik
      tl = il
c     sqsthl is (sin(theta)/lambda)**2
      sqsthl = th*th*raa + tk*tk*rbb + tl*tl*rcc + th*tk*abg
     $       + tl*th*cab + tk*tl*bca
      sthol = sqrt(sqsthl)
      resolution = 0.5/sthol
      end

      subroutine recip(a, b, c, alpha, beta, gamma)
      common /recell/ raa, rbb, rcc, abg, cab, bca
      write(*,*) 'Initializing reciprocal space constants'
c
c     transform to reciprocal space, Stout and Jensen p. 32
      salp = sin(alpha/57.29578)
      sbet = sin(beta/57.29578)
      sgam = sin(gamma/57.29578)
      cbet = cos(beta/57.29578)
      calp = cos(alpha/57.29578)
      cgam = cos(gamma/57.29578)
      root = sqrt(1.0 - calp**2 - cbet**2 - cgam**2
     $     + 2.0*calp*cbet*cgam)
      vol = a*b*c*root
      ra = b*c*salp/vol
      rb = a*c*sbet/vol
      rc = a*b*sgam/vol
      cosra = (cbet*cgam - calp)/(sbet*sgam)
      cosrb = (calp*cgam - cbet)/(salp*sgam)
      cosrg = (calp*cbet - cgam)/(salp*sbet)
      raa = ra*ra/4.0
      rbb = rb*rb/4.0
      rcc = rc*rc/4.0
      abg = ra*rb*cosrg/2.0
      cab = rc*ra*cosrb/2.0
      bca = rb*rc*cosra/2.0
      return
      end
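As a cross-check on Program 4, the same general (triclinic) d-spacing formula can be written compactly in Python. This is a sketch under stated assumptions: the function name `d_spacing` is hypothetical, angles are given in degrees, and the index (0, 0, 0) is not handled (it would divide by zero, which is why Program 4 uses it as the stop signal).

```python
import math


def d_spacing(h, k, l, a, b, c, alpha, beta, gamma):
    """Resolution (d, in the units of a, b, c) of reflection hkl for a
    general unit cell; angles in degrees. Not valid for h = k = l = 0."""
    al, be, ga = (math.radians(v) for v in (alpha, beta, gamma))
    ca, cb, cg = math.cos(al), math.cos(be), math.cos(ga)
    sa, sb, sg = math.sin(al), math.sin(be), math.sin(ga)
    vol = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg
                                + 2 * ca * cb * cg)
    # reciprocal cell lengths and cosines of the reciprocal angles
    ra, rb, rc = b * c * sa / vol, a * c * sb / vol, a * b * sg / vol
    cos_ra = (cb * cg - ca) / (sb * sg)
    cos_rb = (ca * cg - cb) / (sa * sg)
    cos_rg = (ca * cb - cg) / (sa * sb)
    inv_d2 = (h * h * ra * ra + k * k * rb * rb + l * l * rc * rc
              + 2 * h * k * ra * rb * cos_rg
              + 2 * h * l * ra * rc * cos_rb
              + 2 * k * l * rb * rc * cos_ra)
    return 1.0 / math.sqrt(inv_d2)


# Hypothetical cubic cell, a = 10 angstroms.
print(round(d_spacing(1, 1, 1, 10, 10, 10, 90, 90, 90), 3))  # prints 5.774
```

For a cubic cell the result reduces to a/sqrt(h² + k² + l²), which gives a quick sanity check on any implementation.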

Program 5

Subroutine to calculate structure factors by Fourier transform.

$$F(hkl) = \sum_{j=1}^{N} f_j \exp[2\pi i(hx_j + ky_j + lz_j)]$$

where f_j is the scattering factor for the jth atom, whose coordinates x_j, y_j, z_j are expressed as fractions of the unit cell a, b, c. In the following code, the matrix fs(3,3,nops) contains the rotation part of the symmetry operators, and ts(3,nops) contains the translation component of the symmetry operators. The reflection indices are stored in the arrays ah, ak, al; Fo is in fo; and sthol holds the value of (sin θ)/λ (0.5/resolution in angstroms). The coordinates are in x, y, z. Note that instead of expanding the atoms to account for crystallographic symmetry by multiplying by the symmetry operators, the indices are expanded by multiplying by the transpose of the symmetry operators. The form factors as a function of (sin θ)/λ are stored in the table ftable(32,nforms), where successive entries are spaced 0.05 apart in (sin θ)/λ (these can be found in the International Tables for Crystallography, Vol. 4). The code that follows is quite slow compared to the Fast Fourier Transform method but serves to illustrate the Fourier transform.

      subroutine sfcalc(natoms, fs, ts, neqiv, ah, ak, al, fo,
     $     bvalue, occupancy, nrefl, ntype, x, y, z, f, ftable, nf)
c *** data used in common with resolution
      common /recell/ raa, rbb, rcc, abg, cab, bca
      dimension ts(3,24), fs(3,3,24)
      dimension ah(nrefl), ak(nrefl), al(nrefl), fo(nrefl)
      dimension f(32), ftable(32,10)
      dimension ntype(natoms), x(natoms), y(natoms), z(natoms)
      dimension bvalue(natoms), occupancy(natoms)
      real*8 acalc, bcalc
      twopi = 6.283185
c ======== structure factor calculating loop == do 500 ir = 1, nrefl
      sumfofc = 0.0
      sumfo = 0.0
      sumfc = 0.0
      do 500 ir = 1, nrefl
      th = ah(ir)
      tk = ak(ir)
      tl = al(ir)
c #########################
      sthol = 0.5/resolution(nint(th), nint(tk), nint(tl))
c *** table entry it corresponds to sthol = (it-1)*0.05
      it = int(sthol/.05) + 1
c *** precalculate the form factors for each atom type
c *** by interpolating into table containing form factors
c *** as a function of resolution
      do 53 jt = 1, nf
      f(jt) = ftable(it,jt) + (ftable(it+1,jt) - ftable(it,jt))
     $      *(sthol - (it-1)*0.05)/.05
53    continue
c *** zero sums
      acalc = 0.0
      bcalc = 0.0
      xpart = 0
      ypart = 0
      zpart = 0
      trans = 0
      do 200 j = 1, neqiv
c *** calculate translation component of symmetry operator
      trans = th*ts(1,j) + tk*ts(2,j) + tl*ts(3,j)
c *** calculate rotation by multiplying indices by transpose
c *** (switch rows with columns) of
c *** crystallographic symmetry operator
      xpart = fs(1,1,j)*th + fs(2,1,j)*tk + fs(3,1,j)*tl
      ypart = fs(1,2,j)*th + fs(2,2,j)*tk + fs(3,2,j)*tl
      zpart = fs(1,3,j)*th + fs(2,3,j)*tk + fs(3,3,j)*tl
c *** Loop over all atoms at each symmetry operator
      do 200 i = 1, natoms
c *** calculate phase and amplitude
      phase = twopi*(xpart*x(i) + ypart*y(i) + zpart*z(i) + trans)
      scftmp = f(ntype(i))*exp(-bvalue(i)*sthol*sthol)*occupancy(i)
      acalc = acalc + scftmp*cos(phase)
      bcalc = bcalc + scftmp*sin(phase)
200   continue
c *** the calculated amplitude is
      fcalc = sqrt(acalc*acalc + bcalc*bcalc)
      phi = atan2(bcalc, acalc)
c *** convert phase from radians to degrees
      phidegree = phi*360.0/twopi
      write(6,*) th, tk, tl, fo(ir), fcalc, phidegree
500   continue
      return
      end
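The direct summation in Program 5 can be tried out on a toy model with the Python sketch below. It is deliberately simplified and rests on several assumptions: space group P1 only (no symmetry expansion), a constant scattering factor per atom instead of the interpolated form-factor table, and a caller-supplied value of (sin θ)/λ. The function name `structure_factor` and the tuple layout are hypothetical.

```python
import cmath
import math


def structure_factor(h, k, l, atoms, sthol=0.0):
    """Return (amplitude, phase in degrees) of F(hkl) for a P1 model.

    atoms: iterable of (f, occupancy, bvalue, x, y, z) with fractional
    coordinates; f is treated as constant (an assumption, see lead-in).
    sthol: sin(theta)/lambda, used only for the B-value term.
    """
    total = 0j
    for f, occ, b, x, y, z in atoms:
        amplitude = f * occ * math.exp(-b * sthol * sthol)
        total += amplitude * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return abs(total), math.degrees(cmath.phase(total))


# One carbon-like atom (f = 6) at the origin: F is real and positive.
print(structure_factor(1, 2, 3, [(6.0, 1.0, 0.0, 0.0, 0.0, 0.0)]))  # (6.0, 0.0)
```

Two equal atoms related by a center of symmetry at the origin give a phase of 0 or 180 degrees, which is a convenient test of the phase convention.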

Program 6

To calculate electron density given structure factors, the equation is

$$\rho(x,y,z) = \sum_{j=1}^{N} |F_j| \cos[2\pi(xh_j + yk_j + zl_j) - \alpha_j]$$

where |F| is the amplitude and α is the phase component of the structure factor. The arrays are as above. The electron density (rsum) at a given x, y, z in fractional coordinates is

      rsum = 0.0
      do 200 j = 1, neqiv
c *** expand x,y,z with crystallographic symmetry operators
      xpart = fs(1,1,j)*x + fs(2,1,j)*y + fs(3,1,j)*z + ts(1,j)
      ypart = fs(1,2,j)*x + fs(2,2,j)*y + fs(3,2,j)*z + ts(2,j)
      zpart = fs(1,3,j)*x + fs(2,3,j)*y + fs(3,3,j)*z + ts(3,j)
c *** Loop over all reflections at each symmetry operator
      do 200 ir = 1, nrefls
c *** sum each reflection's contribution to electron density
c *** (phi must be in radians here)
      rsum = rsum + fo(ir)*
     $     cos(twopi*(ah(ir)*xpart + ak(ir)*ypart + al(ir)*zpart)
     $     - phi(ir))
200   continue
      write(6,*) x, y, z, rsum
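The cosine summation for the electron density can likewise be sketched in Python for a single point. The assumptions here are that the reflection list already covers everything to be summed (no symmetry expansion), that phases are in radians, and that no scale factor such as 1/V is applied, so the result is on an arbitrary scale. The name `electron_density` is hypothetical.

```python
import math


def electron_density(x, y, z, reflections):
    """Sum |F| cos[2 pi (hx + ky + lz) - alpha] over the reflections.

    reflections: iterable of (h, k, l, amplitude, phase_in_radians);
    x, y, z are fractional coordinates. Arbitrary scale (no 1/V).
    """
    return sum(amp * math.cos(2 * math.pi * (h * x + k * y + l * z) - alpha)
               for h, k, l, amp, alpha in reflections)


# A single (1 0 0) reflection with zero phase peaks at x = 0.
print(electron_density(0.0, 0.0, 0.0, [(1, 0, 0, 2.0, 0.0)]))  # prints 2.0
```

Shifting the phase of a reflection shifts the position of its density wave, which is why a phase of π/2 on the same reflection moves the peak to x = 0.25.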

Program 7

Read a .pdb file into atom arrays (see PDB user's guide). This includes some lesser known but important PDB fields that are often ignored. serno is the serial number for each atom, which increases ordinally for each atom. The atomname field is the atom name, such as CA or OD1. Alternative locations for multiple conformations can be specified in altloc, such as OD1A and OD1B. The residue type is aatype (e.g., ALA, CYS). There is some confusion because this field is called the residue name in the PDB user guide, but a name should be unique, so I have changed this to type. The chain identifier, chain, is used to identify separate chains not connected to each other (e.g., A, B, C). The residue name, resname, is usually a sequence identifier, such as 1, 2, 100. The insertid field is for a single letter to identify insertions in the sequence such as 10A, 10B. The three orthogonal angstrom coordinates appear next as floating-point numbers. The occupancy and B-value are last.

      parameter (maxatm = 10000)
      real x(maxatm), y(maxatm), z(maxatm)
      real bvalue(maxatm), occupancy(maxatm)
      integer serno
      character*1 chain(maxatm)
      integer*4 resname(maxatm)
      character*4 atomname(maxatm)
      character*1 insertid(maxatm), altloc(maxatm)
      character*3 aatype(maxatm)
      character*80 buf
      i = 0
10    read(10,'(a80)',end=100) buf
c     skip every record that is not ATOM or HETATM
      if (.not. (buf(1:5) .eq. 'ATOM ' .or. buf(1:5) .eq. 'HETAT'))
     $     goto 10
      i = i + 1
      read(buf,20) serno, atomname(i), altloc(i), aatype(i),
     $     chain(i), resname(i), insertid(i), x(i), y(i), z(i),
     $     occupancy(i), bvalue(i)
20    format(6x, i5, 1x, a4, a1, a3, 1x, a1, i4, a1, 3x, 3f8.3, 2f6.2)
      goto 10
100   continue
      natoms = i
      end
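The fixed-column layout read by the format statement in Program 7 maps directly onto string slices. The Python sketch below parses one record using the same field widths; the function name `parse_atom_record` is hypothetical, the dictionary keys simply reuse the variable names from the text, and no error handling for malformed records is attempted.

```python
def parse_atom_record(line):
    """Parse one ATOM/HETATM record by fixed columns (same widths as the
    FORTRAN format in Program 7); return None for any other record."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serno": int(line[6:11]),
        "atomname": line[12:16],
        "altloc": line[16],
        "aatype": line[17:20],
        "chain": line[21],
        "resname": int(line[22:26]),   # residue sequence number
        "insertid": line[26],
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "bvalue": float(line[60:66]),
    }
```

Slicing by column rather than splitting on white space matters because adjacent PDB fields (for example, a four-character atom name followed by an alternative-location letter) are not always separated by blanks.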

Program 8

An awk script can be used to reorder data from one program for another. In this example we have data in the form h, k, l, FP, FPH and we want to put it in the .fin format. We will have to add a fake sigma field, as this information is not in the original:

    awk '{print $1, $2, $3, $4, " 0.01 ", $5, "0.01 "}' inputfile > outputfile

The symbol $1 means the first item on each record. Awk works on a series of records, repeating the command on each record. Items on records are separated by white space entered via the space bar or the tab key. The output will again be items separated by spaces. This is what is commonly called "free format." To put out formatted output where each item is in the same column, the command can be modified:

    awk '{printf("%4d %3d %3d %7.2f 0.01 %7.2f 0.01\n", $1, $2, $3, $4, $5)}' inputfile > outputfile

Note that the formatting statement is the same as that used in the C language. In general, any valid C expression can be used in awk. However, an awk program is much simpler to write and is interpreted rather than compiled.

Program 9

Awk can be used to do simple calculations. For instance, suppose we want to modify a phase file with records of h, k, l, Fo, Fc, phi so that it has records of h, k, l, 5Fo - 3Fc, Fc, phi, to make a 5Fo - 3Fc map. Fo is the fourth field and Fc the fifth, so:

    awk '{$4 = 5*$4 - 3*$5; print}' inputfile > outputfile


Program 10

Another use of awk is to make decisions with if statements. In this example we want to remove all data that is below 3σ from a file with records h, k, l, Fo, σ(Fo):

    awk '{if ($4 > 3.0 * $5) print}' inputfile > outputfile

Program 11

This awk script can be used to find the minimum and maximum of a stream of coordinates x, y, z. Save the awk script into a file called minmax.

    awk '
    BEGIN { xmin = 99999; ymin = 99999; zmin = 99999;
            xmax = -9999; ymax = -9999; zmax = -9999; }
    {
      if ($1 < xmin) xmin = $1;
      if ($1 > xmax) xmax = $1;
      if ($2 < ymin) ymin = $2;
      if ($2 > ymax) ymax = $2;
      if ($3 < zmin) zmin = $3;
      if ($3 > zmax) zmax = $3;
    }
    END { print xmin, xmax, ymin, ymax, zmin, zmax; }' $*

Now, if you needed to find the range of coordinates in a .pdb file, where x, y, z are fields 6, 7, 8, then you can use another awk command to print x, y, z and pipe this into minmax:

    awk '/ATOM/{print $6, $7, $8}' inputfile | minmax

The /ATOM/ pattern causes awk to ignore all records that do not match ATOM. In a .pdb file this will reject all non-atom records, which do not have any coordinate information.
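For comparison, the same range calculation can be sketched in Python without the sentinel starting values (99999 and -9999) that the awk script relies on. The function name `coordinate_range` is hypothetical, and the input is assumed to be a non-empty list of (x, y, z) tuples already extracted from the file.

```python
def coordinate_range(points):
    """Return ((xmin, xmax), (ymin, ymax), (zmin, zmax)) for a non-empty
    sequence of (x, y, z) tuples."""
    xs, ys, zs = zip(*points)
    return (min(xs), max(xs)), (min(ys), max(ys)), (min(zs), max(zs))


print(coordinate_range([(1.0, 2.0, 3.0), (-4.0, 5.0, 0.0)]))
# prints ((-4.0, 1.0), (2.0, 5.0), (0.0, 3.0))
```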

Program 12

Awk can call mathematical functions such as sqrt(), as in this script to find the distance between two atoms input as x1, y1, z1, x2, y2, z2:

    awk '{xd = $1 - $4; yd = $2 - $5; zd = $3 - $6;
          print sqrt(xd*xd + yd*yd + zd*zd)}' $*


Program 13

XPLOR requires that you split up your protein into separate files before you can enter the coordinates. This can be easily done with awk. In this example, the protein is split into three files containing the protein, the heme prosthetic group, and the water:

    awk '/SER|THR|ALA|CYS|ASP|GLU|PHE|GLY|HIS|ILE|LYS|LEU|MET|ASN|PRO|GLN|ARG|VAL|TRP|TYR/{print}' $1 > protein.pdb
    awk '/HEM/{print}' $1 > hetatom.pdb
    awk '/HOH/{print}' $1 > water.pdb

APPENDIX B Useful Web Sites

Practical Protein Crystallography II Web Site
The Scripps Research Institute
http://ppcII.scripps.edu
This site has example data, updates, and other information. It will be updated periodically, so look here for changes to information in the book. If any of the URLs in this appendix go out of date, you can look here for updated ones.

XtalView online manual
http://www.sdsc.edu/CCMS/Packages/XTALVIEW/XV1TOC.html

Software Web Sites

CCP4
http://www.dl.ac.uk/CCP/CCP4/main.html
The CCP4 program suite is an integrated set of programs for protein crystallography that includes a large number of very useful tools.

DENZO
http://www.hkl-xray.com/
The HKL suite is a package of programs intended for the analysis of X-ray diffraction data collected from single crystals.

DPS
http://bilbo.bio.purdue.edu/~viruswww/Rossmann_home/rstest.html
The Data Processing Suite (DPS) will be a complete package for data processing of crystallographic area detector data.

MAIN
http://omega.omrf.ouhsc.edu/doc/main/index.html
MAIN is an interactively driven computer program dealing with computational parts of macromolecular crystallography.

MIDAS
http://www.cgl.ucsf.edu/midasplus.html
MidasPlus is an advanced molecular modeling system developed by the Computer Graphics Laboratory (CGL) at the University of California, San Francisco.

Molscript
http://www.avatar.se/molscript/
MolScript is a program for displaying molecular structures, such as proteins, in both schematic and detailed three-dimensional representations.

MOSFLM
http://wserv1.dl.ac.uk/SRS/PX/jwc_external/abs_mosflm_suite.html
The MOSFLM suite of programs is designed to facilitate the processing of monochromatic X-ray diffraction rotation data.

O
http://imsb.au.dk/~mok/o/
O is a general-purpose macromolecular modeling environment. The program is aimed at scientists with a need to model, build, and display macromolecules.

PHASES
http://www.imsb.au.dk/~mok/phases/phases.html
Bill Furey's phasing and density modification package.

PROCHECK
http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
Checks the stereochemical quality of a protein structure, producing a number of PostScript plots analyzing its overall and residue-by-residue geometry.

PROTEIN
http://www.biochem.mpg.de/PROTEIN/
The PROTEIN program system is an integrated collection of crystallographic programs designed for the structure analysis of macromolecules.

Raster3D
http://brie.bmsc.washington.edu/raster3d/
Raster3D is a set of tools for generating high-quality raster images of proteins or other molecules.

REPLACE
http://como.bio.columbia.edu/tong/Public/Replace/replace.html
A suite of programs for molecular replacement calculations, REPLACE currently consists of two major programs, GLRF for rotation function calculations and TF for translation function calculations.

RIBBONS
http://www.cmc.uab.edu/ribbons/
A program for drawing publication-quality pictures of protein structures as a smooth ribbon with space-filling and ball-and-stick representations, dot and triangular surfaces, density map contours, and text.

Shake and Bake
http://www.hwi.buffalo.edu:80/SnB/
SnB is a computer program based on Shake-and-Bake, a direct-methods procedure for determining crystal structures.

SHARP and Buster
http://Lagrange.mrc-lmb.cam.ac.uk/
Programs for heavy-atom phasing and improving phases based on Bricogne's algorithms.

SHELX
http://linux.uni-ac.gwdg.de/SHELX/
Source of SHELX-97 software. The site contains tips and FAQs for running and using SHELX, as well as instructions for installation.

SOLVE
http://www.solve.lanl.gov/
Automated crystallographic structure solution for MIR and MAD.

wARP and ARP
http://den.nki.nl/~perrakis/arp.html
Automated map fitting and model building.

WHATIF
http://swift.embl-heidelberg.de/whatif/
WHATIF is a versatile protein structure analysis program that can be used for mutant prediction, structure verification, molecular graphics, etc.

XPLOR, CNS
http://atb.csb.yale.edu/
XPLOR is a widely used refinement program. CNS (crystallography and NMR system) is a package for phasing by MAD and MIR and refining structures with torsion refinement and maximum-likelihood targets.

XTAL
http://www-structure.bio.purdue.edu/~kvz/
The Purdue University XTAL Programs Library (PUXTAL) was developed as part of the macromolecular structure research effort. Since the 1960s, a series of crystallographic computing techniques has been developed at Purdue, and many of the Xtal programs have been extensively used in laboratories around the world.

XTALVIEW
http://www.scripps.edu/pub/dem-web/toc.html
XtalView is the software featured in this book; it is available for free download to academics and nonprofits.

Databases

Metalloprotein Database
http://metallo.scripps.edu

Protein Data Bank
http://www.rcsb.org

The Prosthetic Groups and Metal Ions in Protein Active Sites Database
http://bmbsgi11.leeds.ac.uk/bmbknd/promise/

Synchrotrons

• Advanced Light Source (ALS)
  http://www-als.lbl.gov/
• Advanced Photon Source
  http://epics.aps.anl.gov/welcome.html
• Cornell High Energy Synchrotron Source (CHESS)
  http://www.tn.cornell.edu/
• Daresbury Synchrotron Light Source
  http://www.dl.ac.uk/SRS/index.html
• European Synchrotron Radiation Facility (ESRF)
  http://www.esrf.fr/
• Hamburger SynchrotronstrahlungsLABoratorium HASYLAB (Hamburg)
  http://www-hasylab.desy.de/
• LURE
  http://www.lure.u-psud.fr/
• National Synchrotron Light Source (NSLS)
  http://www.nsls.bnl.gov/
• Photon Factory, Japan
  http://www.dl.ac.uk/SRS/index.html
• Stanford Synchrotron Radiation Laboratory (SSRL)
  http://ssrl.slac.stanford.edu/

Useful Information

mmCIF
http://ndbserver.rutgers.edu/NDB/mmcif/
Macromolecular Crystallographic Information File home page with CIF tools, description, HTML dictionary, and data definition language.

Kevin Cowtan's Book of Fourier
http://www.yorvic.york.ac.uk/~cowtan/fourier/fourier.html
Learn about the properties of Fourier transforms.

Heavy-Atom Information

• Bart Hazes' heavy-atom info page
  http://mycroft.mmid.ualberta.ca:8080/bart/derivatives/main.html
• Enrico Stura's heavy-atom page
  http://bmbsgi13.leeds.ac.uk/wwwprg/stura/heavy.html
• HEAVY-ATOM DATABANK
  http://bonsai.lif.icnet.uk/bmm/had/heavyatom.html

Crystallization

http://www.hamptonresearch.com/
Hampton Research Company home page, and crystallization resources and information.

http://bmbsgi13.leeds.ac.uk/wwwpgr/stura/cryst.html
Enrico Stura's crystallization techniques page.

X-Ray Anomalous Scattering

http://www.bmsc.washington.edu/scatter
This page has tables of f' and f" in a convenient periodic table format. Invaluable information for MAD experiments.

X-Ray Equipment Vendors

• Bruker Analytical X-ray Systems
  http://www.bruker-axs.com/index.html
• Charles Supper Company
  http://charles-supper.com/
• Molecular Structure Corporation
  http://www.msc.com/
• Oxford Cryosystems
  http://www.OxfordCryosystems.co.uk/
• Polycrystal Book Services
  http://www.dnaco.net/~polybook/

Crystallographic Associations

• American Crystallographic Association
  http://nexus.hwi.buffalo.edu/ACA/
• British Crystallography Association
  http://gordon.cryst.bbk.ac.uk/BCA/index.html
• Crystallography World Wide
  http://www.lmcp.jussieu.fr/cww-top/crystal.index.html
• International Union of Crystallography
  http://www.iucr.ac.uk/welcome.html
• SInCris Information Server for Crystallography
  http://www.lmcp.jussieu.fr/sincris/
• World Database of Crystallographers
  http://www.iucr.ac.uk/iucr-top/wdc/index.html

INDEX

Absolute configuration, 289 Absorbance, measuring, 2 Absorption, 90 correction, 76 Aconitase, 380 Adding, substrates, 264 water, 264 Additives, 435 if, see a l s o specific chemical Alcohols, 435 Alignment of crystals in loops, 421 microscope, 31, 32 optical, of crystal, 31 Amicon, 4, see also Centricon; Microcon Amino acid handedness, 159 stereochemistry, 226 Ammonium sulfate, 339 precipitation, 4 AmoRe, 113 CCP4, 113 Anaerobic apparatus, 21 crystal, 20 handling, 415 Analysis, mutants, 379 Anderson, D., 70 Anisotropic

B-value, definition of, 100 displacement parameter, 100 scaling, 117 thermal parameters, 205 ANOLSQ, 157 Anomalous difference Pattersons, 133 signal of, 133 Anomalous scatterer, 67 locating, 188 Anomalous scattering, 115,154 phasing with, 182, 189 Antimicrobial agents, 2 Area detector, 370, 66, 69 data collection, 76 ARP, 458 Arrows, for comparisons, 370 Artificial mother liquor, 342 Arvai, A. S., 60 ASCII files, 103 ASTRO, 67 Asymmetric unit, 57, 99 Atom field, PDB, 453 Auto fit, Xfit menu, 298 strategy, 299 Autoindex, 51, 70, 74 Automated solvent flattening, 176


Awk, script, 453 splitting files, 454

B-value, 196 definition of, 100 water, 264 Background, 43, 61 decreasing, 81 Bases, magnetic, 418,425-427 Batch method, crystallization, 11 Beam stop, 37 Beinert, H., 380 Bending wire, 422-423 Berthou, J., 171 Best phase, 150 Bijvoet data, 67 difference, 154 Fourier, 157 Patterson map, 133,284, 392 pair, 116, 119, 185, 188,389, 51, 76 expected differences, 185 Binary files, 103 Birefringence, 33 Blind region, 50 zone, 69 Block diagonal, 213 Blow, D. M., 147 Blow-Crick equations, 149 Blundell and Johnson, 34, .50 Bragg equation, 95 Bragg's Law, 60 Bricogne, G., 223 Brilliance, 38 Bruker, 77, 359 detector, 77, 81 Brunger, A. T., 193 Buffers, 2 Buster, 458

C-centering, 57 C6A mutant, 371 CCD detector, 188 CCMS, 271,275 CCP4, 104, 111,202 AmoRe, 113 crystallographic programs, 111

density modification, 112 DM, 112, 399 f2mtz, 400 FFT, 113 labels, 104 map calculation, 113 mLPHARE, 113, 193 model validation, 114 molecular replacement, 113 MTZ format, 104 mtz2various, 280 PDB files, 113 PROCHECK, 114 reflection files, 112 REFMAC, 113 SFALL, 113 SFCHECK, 114 SFTOOLS, 114 SIGMAA, 113 Solomon, 112 tips, 114 web site, 455 XtalView data, 290, 327 CNS, 459 CORN, 159 CRC handbook, 435, 442 CRYSTAL environment variable, 104 Calculation, of Pattersons, 127 corrections, film, 40 Canes, cryo-, 429, 431, 432, 433, 434, see also Cryocane Capillaries, 23, 418, 441 crystal-mounting, 24 quartz, 23 sealant, 24 source, 23 Cartesian coordinates, 218, 97 Center of symmetry, 125 Centering, 57 Centric phases, 145 R, 154 reflections, 70 Centrosymmetry, 145, 359 Chain direction, 359 tracing, 232 Characterization, crystals, 39 Chasing the train, 360

Check reflections, 74 Chillers, 410 Choice of, wavelength, 185 Chromatium vinosum cytochrome c', Patterson maps, 342 cis-aconitase, 382 Cocrystallization, 380, 440 cold stream, 416ff, see also Cryocooler Collimator, 36 Collodicon, 2 Combined-phase coefficients, 218 Compounds, heavy atom, 19 Computer code, 445 portability of, 103 files, 102 Concentration, samples, 4 Conformations, multiple, 259 Connolly, M. L., 269, 379 Control, of temperature, 7 Cooling, of crystals (versus freezing crystals), 412 Coordinate analysis, 265 refinement, 193 systems, definition of, 96 Copper, X-ray sources, 35 Correct hand, 172, 359 Correlation coefficient, 285 function, 168 search, 146, 169, 194 Counting, of statistics, 88 Cowtan, K., 399, 460 Cracked crystal, 64 Crane, B., 78 Crevices, and water, 264 Cross peaks, 344 Cross-fourier, 145, 344 Cross-linking crystals, 415 Cross-phasing, derivatives, 344 Cross-validation, 199, 209 Cruikshank, D., 210 Cryocane, 429, 431, 432, 433, 434 Cryocooler, 416ff Cryocrystallography, advantages of, 410ff equipment, 416 list of, 428ff Cryogenic safety, 415


Cryoprotectant, 410,411-415,423,435ff testing, 435 ff Cryoprotection, 411-415 Cryostat, 410, 416, see also Dewar Cryostream, 417 Cryosystem, 416ff Cryotongs, 427ff, 430 Cryotweezers, 428,430, see also Cryotongs Cryovials, 429ff Crystal aligning, 40 characterization, 39 determining if protein, 40 evaluating quality, 61 exposure time, 40, 64 grid screen, 8 growing, 7 large, 14 light and, 18, 33 in loops, 421 mounting, 7 multiple forms, 381 offsetting, 68 optically, 31 preparing, 23 quality of, 34 radiation damage, 37 resolution limit, 61 size needed, 64 slippage of, 30 twinned, 61 file, 104 growth, 339 mounting, 23, 24 avoiding crushing, 24 capillaries, 24, 30 drying, 24 illustrated, 28 filter paper, 24 illustrated, 26 loops, 422-424 loosening crystals, 24 resting, 29 simple rules for, 421,423-424 supplies, 23 quality, 61 sample purity, 16 storage, 18 trials, grid screen, 8 Crystalline precipitant, 4


Crystallization, X-ray quality, 14 batch method, 11 equilibrium, 18 by first precipitating, 13 impurities, 1 incomplete factorial, 11 new conditions, 18 nucleation, 14 varying conditions, 14 web sites, 461 Crystallographic associations, web sites, 461 equations, 445 .cshrc, 104 CuA subunit, 388 CuK, X-ray sources, 36, 35 Cunningham, R. P., 331 Cysteine, Oxidation of, 379 Cytochrome c oxidase, 388 peroxidase, 370 D235E mutant, 370 Cytochrome c', 339, 360

d-spacing, definition of, 95 D235E mutant, cytochrome c peroxidase, 370 DENZO, 91 web site, 456 and XtalView, 327 DPS, 456 Data collection, 23,370 area detector, 76 diffractometer, 76 image-plate, 82 MAD, 389 strategy, 67, 79 filtering of, 116 indexing, 70 merging, 117 re-indexing, 72 scaling, 117 reduction, 87, 115 to parameter ratio, 214 Defrosting dewars, 434 Deicing tongs (hair dryer), 428 Density modification, 112, 151,176 unexplained, 264 Dental wax, 24

Derivatives, absolute configuration, 159 cross-phasing, 344 fine-tuning, 161 Desalting, 2 Detector distance, determining, 77 Dewars, 411,415-417, 427-428,430ff, 435 Diagonal zone, 57 Dialysis, 2 Difference Fourier, 138,145,264, 284 maps, 138,370 mutant, 375 Diffractometer, 64, 67 data collection, 76 geometry, 76 four-circle, 74 Diffuse scattering, figure 6.7 Diffusion, of cryprotectants, 440 heavy atoms, 19 Dihedral angles, side chains, 227 Dill, K. A., 266 Diluting seeds, 16 Dipping method, 440, see also Vapor equilibration method Directory, 102 Disorder, 264 Disulfides, 379 Dithoionite capillaries, 21 DM, 112, 399

E, error estimate, 147 Electron density gradient, 370 evaluating, 224 histograms, 178,224 map, fitting of, 217 skelotonizing, 232 Electrostatic terms, 196 Ellipsoid, anisotropic, 100 Enantiomorphic space groups, 160, 290 Endonuclease III, 331 Engh, R., 210 Epoxy, 24 Equilibration, of crystals, 413,435ff, 439-440 Equipment, cryocrystallography, 416 Error estimate, E, 147 estimation, 88

R-factor, and, 198 evaluating, 197 Estimation, of standard deviation, 210 Eulerian angles, 163 identity operator, 165 Evaluation of crystal quality, 61 of errors, 197 maps, 223 Evaporation, from loops, 435 EXAFS, 186 scan, example, 391 Example, molecular replacement, 384 Expected differences, Bijvoet pairs, 185 Exposure time, 43, 51

FA coefficients, 191 Failing, see Trying Fast Fourier transform, 144 Fc, Fourier, 138 map, 138 Fe-S cluster, 382 Ferredoxin, 210 FFT, 113, 144 approximations of, 204 CCP4, 113 grid of, 144 FHLE coefficients, 135 Fiber loops, 419ff, see also Loops Fibers, 418-419, see also Mohair fibers; Nylon fibers; Rayon fibers; Silk fibers Fiducial mark, 74 Figure-of-merit, 144, 150, 151 File format .df, 106 .fin, 106 .hkt, 216 .map, 106 .phs, 106 .sol, 106 history file, 106 mmCIF file, 106 other formats, 106 PDB, 216 postscript, 106 SHELX, 106 TNT, 106 Xfit sequence, 302

names, 102 systems, 102 Film, 39, 68, 69 calculating corrections, 40 determining exposure, 43 measuring error, 43 developing, 74 marking, 39 reducing background, 43 scanning, 74 small-molecule, 40 X-ray, 73 Filter paper, 24 crystal-mounting, 24 Filtering, of data, 116, 127 Fine-tuning, of maps, 359 Fisher, C. L., 339 Fitting of electron density map, 217 general, 243 main chain, 250 and noise, 243 and phase bias, 250 and resolution, 243 side chain, 259 Fo, Fourier, 137 map, 137 Fo-Fc, fourier, 138,264 map, 138 2Fo-Fc map, 264 Fourier, 140 map, 140 omit map, 263 2mFo-DFc, Fourier, 142 map, 142 Focusing mirrors, 37 X-ray sources, 37 FORTRAN, 103, 104 Four-circle, diffractometer, 74 Four-fold (4-fold), 55 Fourier 2Fo-DFc, 142 Bijvoet difference, 157 difference, t38, 145,284 Fc, 138 Fo, 137 Fo-Fc, 138,264 of heavy atoms, 145 Kevin Cowtan's book of, 460


techniques, 137 transform, program, 448, 450 Fractional coordinates, 97, 218 Free radicals, 410, 413-414 Freeze thawing, 5 Freezing conditions, testing of, 436ff crystals, 427, 434ff of samples, 4 Friedel pair, 67 Friedel's Law, breakdown of, 116 FRODO, 218 Frost, method for removal from crystals, 430 Full-matrix, 210, 214 Furey, W., 176, 360

GRINCH, 233 Getzoff, E. D., 371 Ghost peaks, 145, 151, 161, 344 Glutaraldehyde, for crosslinking crystals, 415 Glycerol, 412, 435, 437, caption for plate 7, flowchart 6.7 Goniometer, 31, 39, 40, 416, 418, 423, 425ff Goniostat, 74, 78 Gradient electron density, 370 vectors, 376 Grid FFT, 144 screen, 8 Growing, crystals, 7

Hallewell, R. A., 371 Hamlin, detector, 77, 82 Hamlin, R., 70 Hampton Research, 13, 24 web site, 461 Hand choice of, 172, 359 Hanging drop, 8, 10 Harada, Y., 171 Harker peaks, 127 sections, 127, 133, 169, 331, 342, 344 vector, 59 Harris, M., 70 HASSP, 131

Heat-shield, 428, 429 Heavy atom, 64, 359 absolute configuration, 159, 289 area detector scanning, 64, 66 statistics, 66 strategy, 66 compounds, 19 soaks, 19 trials, 19 derivative, 65 finding more sites, 151 fine-tuning, 161 Fourier of, 145 handedness, 159, 289 merging, 281 Pattersons of, 128 phase calculation, 285 phasing statistics, 153 refinement, 285 safety, 20 scanning for, 76 storage, 20 suggestions, 20 toxicity, 20 sites, refining, 351 soaks, 341 statistics, 120 web sites, 460 Helices, recognizing, 237 Helium, 416, 417 box, 36 path, 81 Helix, 248 Helliwell, J. R., 36, 84, 185 Hemostat, 420, 422, 425, 428, 430, 431, 435, 436 Hendrickson and Konnert, 193 Hendrickson, W., 147, 157, 185, 223 Hendrickson-Lattman coefficients, 147, 158, 172, 223 Histogram electron-density, 224 modification, 178 History file, 106 Homologous structure, 161 Howard, A., 77 Humidity, 429 Hydrogen bonding, 266 riding, 205

Ikr, 115 Ice on crystals, 422, 434 properties and theory, 411-412 formation, common strategies to avoid, 429, 434, 435ff, see also Testing cryoprotectants Ideal geometry, 195 Image plate, 69 data collection, 82 dynamic range, 82 erasing, 83 MAR Research, 82 Incomplete factorial, crystallization, 11 Increasing brilliance, 38 Inhibitors, 380 Initial crystal trials, 10 Insulin, 1 Integration, of intensities, 87 Intensity change, 65 integration of, 87 International Tables for Crystallography, 53, 57, 67, 99, 266, 448 International Union of Crystallography, 210 INTREF, 168 Iron-binding protein, 60, 66 Isocitrate-aconitase, 380 Isomorphism, 65 Isomorphous phasing, 146 replacement, 116, 128, 145 signal, 120

Jolles, P., 171

130 K, 410-411, see also Vitreous point Kennedy, M. C., 380 Kevin Cowtan's book of Fourier, 460 Kling, 412, see also Rayon fibers Kraulis, P. J., 365 Kretsinger, R., 206 Kuo, C. F., 331

Labeling, samples, 5 Laboratory, X-ray sources, 34 Large crystals, 14 Lattice packing, 265 Lattman, E. E., 147, 223 Lauble, H., 380 Laue photography, 51 white-radiation, 82 Lepock, J. R., 371 Lifchitz, A., 171 Light, and crystals, 18, 33 Limiting axis, 49 Local minimum, 195, 379 scaling, 117, 119, 188 symmetry, 171 Location, of anomalous scatterer, 188 Locking hemostats, 420, 422, 425, 428, 430, 431, 435, 436, see also Hemostats .login, 104 Loops discussion of, 419 making of, 420ff Lorentz correction, 50, 89 Lunes, 45 Luzzati plot, 210

Macroseeding, 15, 339, 381 MAD, 182, 193 data collection, 389 example of, 388 as MIR, 190 phasing, 388 equations, 189 Magnetic bases, 425ff MAIN, 456 Main chain fitting, 250 Map bounds, choosing, 218 Map, 2Fo-Fc, 140 2mFo-DFc, 142 difference, 138 evaluating, 223 Fc, 138 fine-tuning, 359 Fo, 137 Fo-Fc, 138 omit, 142 sigmaA weighted, 142 MAR Research image plate, 82 Marker residues, 243

Mask and count, 87 Matrices, 100 Matthews coefficient, 57 Max-flux, mirrors, 39 McRee, D. E., 331, 371 Measuring, absorbance, 2 Melting temperature, 379 Merging data, 117 Metalloprotein database, 459 Microcon, 2, 4 Microscope, 20 alignment, 31-32 bases for, 7 dissecting, 24 fiber-optic lights, 7 source for, 7 Microseeding, 16 MIDAS, 456 Miller indices, 97 Min-max, program, 453 Mini-map, 218, 232 Minor sites, 151, 344 locating, 344 MIR, 223 map, 242, 261, 359, 360, 361 phases, 218 phasing, 329 Mirrors, 37 focusing, 37 increasing brilliance, 38 max-flux, 39 monochromating, 37 osmic, 39 Misindexing, 119 Missing data, 116, 370 MKT, 370 MLPHARE, 113, 193 mmCIF, 104, 106 dictionary, 110 examples, 108 syntax, 108 web site, 460 Model validation, 114 Moews, P., 206 Moffatt, K., 53 Mohair fibers, 419 Molecular dynamics, refinement, 376 packing, 170 replacement, 113, 161 example, 380, 384

solution verification, 387 steps in, 384 MOLSCRIPT, 326, 365, 456 MOLTAN, 133 Monochromator, 36, 37 Monoclinic, 70 Mosaic spread, 43, 410, 413, 435, figure 6.8, see also Mosaicity Mosaicity, 61, 410, 413, 435, figure 6.8 MOSFLM, 91, 456 Most probable phase, 150 Mother Liquor, 18 Mounting crystals, 7, figure 6.2, 422, 423-424, see also Crystal, mounting MRK residue, Xfit, 297 MS, 269, 379 MTZ format, CCP4, 104 Multiwavelength anomalous dispersion, 388 data collection, 188 phasing, 182 Multiple conformations, 259 isomorphous replacement, 145, 329 MAD phasing, 193 Multiple data set, scaling, 119 Mutant screening for solubility, 8 studies, 370

N-terminus, finding, 359 Needles, 24 crystal, loops for, 421,423 for making loops, 420,422 New conditions, crystallization, 18 Nichrome wire, 429 Nickel filter, 36 Nickel foil, 36 Nielson, C., 70 Nodal, 51 Noncrystallographic symmetry, 171,175,359 Nonorthogonal cells, 61 coordinates, 187 Normal matrix, 211 Novel conformations, 257 Nucleation crystallization, 14 of ice, 412 Nylon, loops, 419

O, web site, 457 O'Handley, S. F., 331 Omit map, 142, 261, 371, 377 Optical alignment, 31 analyzer, 31 removal, 287 space-group, 99 ORTEP, 206 Orthorhombic, 67 Oscillation method, 74 photography, 43 Osmic, 39 mirrors, 39 Osmolality, 441ff Osmolarity, 419, 441ff Osmotic shock, 441, 442 Other formats, XtalView, 106, 278 Oxidation, of cysteine, 379

P6, 68 PDB atom field, 453 file, program, 451 filter, program, 446 formats, 216 PEG, 339 4K, 18 20K, 4 and light, 18 PHASES, 176, 193, 328, 360, 457 and XtalView, 328 PROCHECK, 114, 209, 457 and CCP4, 114 PROLSQ, 193, 376 and XtalView, 328 PROTEIN, 457 Parge, H. E., 385 Partial spot, 66 Patterson coordinates, 97 map, 59, 125, 127, 151, 191, 344, 359 anomalous difference, 133 Bijvoet, 392 difference, 133, 284 Chromatium vinosum cytochrome c', 342 heavy atoms, 128 solving, 128, 282, 331


radius, 163 space, 163 symmetry, 125 synthesis, 124 Patterson-correlation, refinement, 385 Pauling, L., 359 PDBfit, 172 Pentamers, 257 Peptide bonds, 227 Percent solvent calculation, 177 pH, 414-415, 443 Phase bias, 142, 218, 261 and waters, 264 error examples, 225 file, 145 probabilities, 150 power, 153 Phi-psi angles, 227 plot, 198 and errors, 200 examples of, 200 Philips, W. C., 36 Rhodospirillum molischianum cytochrome c', 360 Photoactive yellow protein, 334 Pins, 421ff, 424, 425-426, 427, 429, 430 Plasticine, 24 Platinum wire, 424ff Polar space groups, 68 Polarization corrections, 89 of light, 33 Polyalanine, 243 Portability considerations, 103 Positional uncertainties, 210 Postscript, 106 Precession camera, 39, 40 corrections for, 42 screens for, 42 photo, 44, 48, 57, 58, 60, 61, 66 Precipitant, 8, 440 ammonium sulfate, 4 concentration by, 4 Preliminary characterization, 39 Preparation, of crystals, 23 Presta, L. G., 266 Profile fitting, 87


Program, Fourier transform, 448,450 PDB file, 451 filter, 446 awk, 452 if statement, 453 coordinate transform, 445 electron density calculation, 450 filter, 445 hydrogen removal, 446 minmax, 453 reformatting, 452 resolution calculation, 446 simple calculation, 452 splitting files, 454 structure-factor calculation, 448 Protein conformations, 248 refinement, 195,263 sample, 1 solubility, 8 Protein Data Bank, 451 Pseudocentric, 359 phases, 145 Pseudosymmetry, 59

Quality, of crystals, 34 Quartz, capillaries, 23

R crystallographic, 98 R-factor, 98, 198, 218, 261, 264, 361, 365, 385 and errors, 198 search, 169, 194 R-free, 199, 209 R-merge, 342 R-symm, 76, 115, 385 equation for, 90 Radiation damage, 61, 76, 89, 410, see also Free radicals Radicals, free, 410, 413-414 Ramachandran plot, 198 Random amplitudes, 137 phases, 137 R-factor, 98 Raster3D, 457 and Xfit, 319

Rayment, I., 30 Rayon fibers, 412,413 Real-space refinement, 243 Recrystallization, 5, 6 Redford, S. M., 371 Redissolving crystals, 6 Refinement checking, 209 coordinates, 193 cycles compared, 366 heavy atom, 285 sites, 351 molecular dynamics, 376 mutants, 376 rigid-body, 361 Patterson-correlation, 385 SHELX strategy, 215 simulated annealing, 376 software, 193 strategies, 196 very high resolution, 202 and waters, 264 weighting, 196 Reflection definition of, 93 sigma, 67 REFMAC, CCP4, 113 Ren, Z., 53,339 REPLACE, 457 Residual map, 151 Resolution bin scaling, 117 definition of, 95 limits, 217 Retrieval, of crystals, 427, 432ff Reversal, 65 Rhombic, 56 RIBBONS, 458 Ribbon diagram, 365,384 Richard's box, 232 Richards, E. M., 232 Richards, F. M., 267 Richardson, D. C., 232 Richardson, J. S., 232 Ridgelines, 232 Riding hydrogens, 205 Right-handed helices, 359 Rigid groups, 194 Rigid-body refinement, 194, 361 Ring planarity, 232 stereochemistry, 232

Rings, ice, plate 6.8, solvent, plate 6.8 Rini, J. M., 168 Robbins, A. H., 21, 380 Rose, G. D., 266 Rotating anode, X-ray sources, 34 Rotation axis, 51 camera, 68 function, 385 geometry, 50 image, 45 matrix, 100 method, 74, 162 photography, 43, 49, 66 search, 162, 385 choosing resolution, 163 refining, 165 RSPACE, 67 RTD, 429

Safety cryogenic, 415 heavy atoms, 20 Sample concentration, 2, 4 freezing of, 2, 4 handling, 2 labeling, 5 logging, 2 protein, 1 storage of, 4 Sandwich box, 29 Scale factor, 117, 263 anisotropic, 117 local, 119, 188 multiple data set, 119 Scaling data, 117 resolution-bin, 117 Scanning, heavy atom, 64 Schwarzenbach, D., 210 Scissors, 24 Screening cryoprotectant conditions, 435ff Screens for, precession camera, 42 Screw axis, 59 Script, awk, 453 Sealant, capillaries, 24 Sealed tube, X-ray sources, 34

Searching, for cryoprotectants, 435ff Secondary structure, recognizing, 237 Seeding, 14 diluting seeds, 15 illustration of, 17 macroseeding, 14 microseeding, 14 serial dilution, 16 streak seeding, 15 to reduce twinning, 16 Selenium-methionine, 182 Self-rotation function, 175 example, 164 Self-vectors, 127, 344 Sequence, identification, 242 matching, 242 Series-termination effects, 176 errors, 217 SFALL, 113 SFCalc Window, Xfit, 322 SFCHECK, 114 SFTOOLS, 114 Shake-and-bake, 191,458 Shaking coordinates, 263 SHARP, 193,458 Sheet, 248 recognizing, 237 Sheldrick, G., 203,205 SHELX, 203,213,215,327, 408 features, 205 file format, 106 strategy, refinement, 215 web site, 458 and XtalView, 327 SHELXPRO, 216, 327 SHELXS, 131,133,191 Shipping dewars ('dry' dewars), 429 Side-chains, identifying, 245 Siemens (Bruker) area detector, 339, 385 Sigma, 115 estimating, 88 reflection, 67 SIGMAA, 113,210 weighted map, 142 weighting, 142 Signal, of anomalous difference, 133 Signal-to-noise, 81, 84 Silk fibers, 419 Sim weighting, 223 Simulated annealing, refinement, 376


SIR phases, 344 SIRAS, 157, 193 Sites, in common, 151 Sitting drops, 10 Six-fold (6-fold), 55, 58 Skeletonizing, electron density map, 232 Soaking crystals, 19, 413, 435ff, 439-440, see also Equilibrating crystals Soaks, heavy atom, 19 SOD, 379 Software, refinement, 193 Solomon, 112 Solubility, grid screen, 8 Solutions, of heavy atoms, 19 SOLVE, 193 web site, 458 Solvent contrast, 224 flattening, 176, 359 mask, 360 model, 206 rings, plate 6.8 accessible surfaces, 266 Solving heavy atoms with Fourier, 145 Patterson map, 128 Source, capillaries, 23 Space group definition of, 99 mistakes in, 59 origin, 99 specificity, 103 table, 54 determination, 53 enantiomorphic, 160, 290 symmetry, 67 Split side chains, 206 Split spot, 63, 64 Spot profile, 63, 64 Standard uncertainty, averaging, 213 calculating, 211 definition, 210 memory requirements, 214 Stanford Synchrotron Radiation Laboratories, 78 STAR, 106 Statistical analysis, 115 Statistics heavy atoms, 66, 120 mutant, 374 Stereochemical restraints, 195

Stereochemistry, 197 amino acid, 226 and fitting, 225 peptide bond, 227 phi-psi angles, 227 ring planarity, 232 side chains, 227 Stfact, 135 Stickle, D. F., 266 Still photo, 39, 63 Storage of crystals, frozen, 427ff, 430ff heavy atoms, 20 rings, 84, see also Synchrotron of samples, 4 Stout, C. D., 380 Strategy, data collection, 67 Streak seeding, 16 Subdirectory, 102 Substrate adding, 264 example, 380 Suckers, 25 Suggestions, heavy atoms, 20 Superoxide dismutase, 21, 371 yeast, 385 Supplies, crystal-mounting, 23 Symmetry determination, 57, 59 noncrystallographic, 175 elements, 53 operators, 171 of Pattersons, 125 Synchrotron, 409, 414, 427 list, 459 radiation, 84 X-ray sources, 36 Systematic absence, 53, 57

TNT, 193,203,328 file format, 106 and XtalView, 328 Tainer, J. A., 331,371,385 Temperature, controlling, 7 Ten Eyck, L., 193 Terwilliger, T., 131, 193 Testing, of cryoprotectants, 435 ff freezing conditions, 435 ff Textpanes, 274

The Scripps Research Institute, 455 Thermistor, 429 Thermocouple, 429 Three-fold (3-fold), 55, 58 Time-resolved data collection, 86 Tongs, 427ff, 430, see also Cryotongs Torsioning, 259 Toxicity, heavy atom, 20 Tracing the chain, 360 Transfer pipette, 25 Translation function, 168 search, 162, 169, 386 Trials, heavy atom, 19 Tronrud, D., 193 Tulinsky, A., 176 Turbulence, 422, 429 Turns, 248 recognizing, 241 Tweezers, 23, 420, 428, 436 Twinning, 63 crystals, 61 image of, 63 Two-fold (2-fold), 55, 59

U, definition of, 100 Uij, 206 Ultrapurification, 5 Unique data, 116, 67 volume, 68, 70 Unit-cell determination, 59 UNIX, 104

Van der Waals packing, 266 radii, 264 Vapor equilibration method, 440, see also Dipping method Varying conditions, crystallization, 14 Very high resolution refinement, 202 Vitreous, water (ice), 411-412, 414, 438, 442

Wang, B. C., 176, 360 WARP, 458 Water, adding, 264 properties of, 411 when to add, 197


Web sites, 455 Weighting, refinement, 196 Weis, W., 190 WHATIF, 458 White-radiation, Laue, 51, 82, 85 Wild-type structure, 370 Wilson, B., 211 Wire, 420,422, 424ff bending tool, 423 Wyckoff, H. C., 76

X-ray film, 73 fluorescence, 172, 186 quality, crystallization, 14 sources

collimation, 36 copper, 35 CuK, 35, 36 film, 39 focusing, 37 increasing brilliance, 38 laboratory, 34 monochromation, 36 nickel-filtered, 36 reducing noise, 37 rotating anode, 34 sealed tube, 34 spectrum, 51 synchrotron radiation, 36 vendors, web sites, 461 XAS scan, 190, 388-389 examples, 398 Xcontur, 133,284, 343,344, 353 Xedh, 224,230 XENGEN, 59, 91,370 Xenon, 415 Xfft, 126, 127, 151,284, 344, 353,359 Xfit, 209, 216, 218, 231,263,267, 290, 359, 361,370 addition, of, constraints, 317 prosthetic groups, 315 assignment, of sequence, 302 atom stack, 292 Auto Fit menu, 298 strategy, 299 automated fitting, 297 automatic waters, 326

calculating structure factors, 323 center on atom, 292 chi angles, 313 combination, of phases, 307 contouring, of maps, 305 control window, 319 current map, 297 de novo model, 296, 299 density modification, 308 editing waters, 325 error window, 325 fit-while-refine, 318 fitting operations, 293 a residue, 312 fixing main chain, 296 focus residue, 297 fragments, 295 geometry errors, 325 refinement, 316 go to N-terminus, 311 residue, 295 hiding, of maps, 311 identification, of sequence, 302 improvement, of phases, 308 ligands, 295, 315 loading maps, 305 maps, and phases, 295 model window, 320 and Molscript, 326 mouse, 290 moving, along the chain, 311 MRK residue, 297 omit maps, 323 ordering fragments, 300 and other programs, 326 poly-Ala model, 301 popping chain, 311 postscript plots, 319 prosthetic groups, 315 and Raster3D, 319 real space refinement, 315 reversing chain, 300 SFCalc window, 322 saving model, 315 phases, 310 sequence file, 302

shake option, 323 shortcut menu, 311 showing maps, 311 solvent flattening, 308 spline maps, 295 splitting side chain, 321 starting new model, 296, 299 torsions, 313 tracing the main chain, 297 typical session, 303 use, of mouse to fit, 293 phase files, 305 viewing, of thermal parameters, 318 Xheavy, 148, 150, 328, 343 Xhercules, 132, 191, 282 Xmerge, 118, 281, 370 Xmergephs, 145, 157, 344, 375 Xpatpred, 133, 284, 343 XPLOR, 91, 165, 193, 196, 203, 218, 263, 328, 361, 376, 385, 405 splitting files, 454 web site, 459 XtalView and, 328 Xprepfin, 277, 328 XRSPACE, 67 illustration of, 71 XTAL, 459 XtalView, 91, 99, 106, 144, 145, 161, 271 CCP4 and, 327 crystal file, 104 data and CCP4, 290 example, 280 DENZO and, 327 downloading, 272 exporting data, 290 file formats, 104 help, 272 history file, 106 installation, 271 manual, 275 other formats, 278, 279 and PHASES, 328 and PROLSQ, 328 preparation, of data, 277 requesting, 272 and SHELX, 327 saving messages, 274 and TNT, 328 textpanes, 274 tutorials, 271

and XPLOR, 328 web site, 459 Xtalmgr, 92, 104, 275 XTALVIEWHOME, 272 Xuong, N. H., 70 XView toolkit, 272 widgets, 293

Yeast, superoxide dismutase, 385 Yeates, T. O., 168

Z number, 57 Zone, 40


PLATE 9. Example of thermal ellipsoids for side-chain extending into solvent (A) with weakening density toward the end (B). Notice how the ellipsoids gradually get larger along the length of the side-chain from top to bottom. At the top end is a water molecule (in red). Notice how it has roughly the same size as the end of the lysine, indicating a similar thermal parameter.

PLATE 10. (A) Nice density for a leucine side-chain at 1.4 Å resolution with thermal ellipsoids. (B) Density for another leucine with a split end.

PLATE 11. Xfit canvas. In the upper left is the gnomon showing the direction of the axes. In the center is a white cross indicating the center of rotation. Across the bottom of the canvas is a green ruler with the units in Angstroms. Across the very bottom is a status bar showing the last atom picked and its properties.

PLATE 12. A section of electron density for CuA MAD example (A) before and (B) after density modification and phase extension with DM. The MAD density in A could be easily fit and the modified map is stunning in quality.

PLATE 13. The final structure of the two molecules of CuA in the asymmetric unit rendered as a ribbon diagram. The coppers are shown as orange balls, the zincs are silver. The molecule is an 8-stranded β-barrel cupredoxin folding motif.

PLATE 14. (Chapter 6) Picture of a typical cryo setup. The crystal is mounted on the end of the goniometer in lower center at the end of a pin. The cold stream of nitrogen gas is blown out of the nozzle coming diagonally down from the upper right. The X-ray collimator is shown on the left. The microscope used to align the crystal is in the back. On the right is the X-ray detector.

PLATE 15. Figure 6-7: The Effects of Varying Cryoprotectant Concentration

a) 100% water / 0% glycerol: Large ice diffraction peaks
b) 95% water / 5% glycerol: Ice peaks broadening
c) 90% water / 10% glycerol: The beginnings of ice rings
d) 80% water / 20% glycerol: Strong ice rings
e) 70% water / 30% glycerol: Ice rings begin to broaden
f) 65% water / 35% glycerol: No ice rings, but strong band of scattering
g) 60% water / 40% glycerol: No ice rings and the band of diffraction is broad and diffuse

Note: Cryoprotectant solutions with high PEG concentrations will show diffraction rings due to the PEG at similar resolution to the ice rings. If these appear, try reducing the high molecular weight PEG concentration by adding a mixture of lower molecular weight PEGs. Figure modified from Figure 1 of Garman and Mitchell, J. Appl. Cryst. (1997) 29, 584-587, used with permission.

PLATE 1. Extended F map. (A) An Fsoaked-Fnative difference map after soaking in a ligand to displace a loop in cytochrome c peroxidase. Density for both conformations of the loop is present, but the density for the new, flipped-out loop position is weak, which makes it very difficult to fit. To isolate the density, an extended F map is made with the coefficients Fnative + (Fsoaked-Fnative)/(estimated partial occupancy). A range of estimated partial occupancies is tried until the map at the old position is essentially flat. This map (B) now shows only the new position and is easier to fit. To refine the partial occupancies, SHELX was used with the PART command and the relative occupancies tied to a free variable.
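The extended F coefficients described for Plate 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (extended_amplitudes, occupancy_trials) and the sample amplitudes are invented for the example, the two data sets are assumed to be already scaled together and listed in the same reflection order, and the phases, the Fourier calculation, and the inspection of the map at the old loop position are left to the usual programs.

```python
def extended_amplitudes(f_native, f_soaked, occupancy):
    """Return Fnative + (Fsoaked - Fnative) / occupancy for each reflection.

    f_native and f_soaked are equal-length sequences of amplitudes, assumed
    to be on a common scale and in the same reflection order. Results can go
    negative for small occupancies; they are returned unchanged here.
    """
    if not 0.0 < occupancy <= 1.0:
        raise ValueError("occupancy must be in the range (0, 1]")
    if len(f_native) != len(f_soaked):
        raise ValueError("amplitude lists must have the same length")
    return [fn + (fs - fn) / occupancy for fn, fs in zip(f_native, f_soaked)]


def occupancy_trials(f_native, f_soaked, occupancies):
    """Build one coefficient set per trial occupancy, keyed by occupancy."""
    return {q: extended_amplitudes(f_native, f_soaked, q) for q in occupancies}


if __name__ == "__main__":
    native = [100.0, 50.0, 75.0]  # made-up amplitudes, for illustration only
    soaked = [110.0, 40.0, 75.0]
    for q, coeffs in occupancy_trials(native, soaked, [1.0, 0.5, 0.25]).items():
        print(q, coeffs)
```

With an occupancy of 1.0 the coefficients reduce to Fsoaked. Smaller trial occupancies amplify the difference term, which is why a range of values is tried and each resulting map is checked at the old loop position.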

PLATE 2. Examples of typical electron density at 2.7 Å resolution in a region of β-sheet. (A) MIR map, (B) 2Fo-Fc map with final model phases. Note that in A many of the details, such as the carbonyl bulges, are obscured by noise. In fitting such details one must make use of correct peptide geometry and fit to the larger features.

PLATE 3. Bijvoet difference Fourier at 4.0 Å showing sulfur density superimposed on the final model. It is always worth collecting Bijvoet pairs for your native data to locate the sulfur atoms in your protein. In this map of a 53 kD protein with a single heme, about 2/3 of the sulfurs also have clear density in the Bijvoet difference Fourier. In the figure the heme Fe is in the upper left (labeled HEM 500 FE) and 2 sulfur peaks can be seen in the lower part of the figure on the methionine Sδ atoms (labeled MET 408 SD and MET 400 SD).

PLATE 4. Closeup of the density around an imidazole bound into a cavity in CCP at 2.1 Å resolution. The heme ligand, histidine, was removed and an imidazole molecule soaked into its place.

PLATE 5. 1.6 Å density around the Fe-binding site in ferric binding protein. The tyrosine rings are just getting holes in their density.

PLATE 6. Why go to very high resolution? In this example of the heme ligand in cytochrome c peroxidase at 1.4 Å resolution it was found that the peptide is significantly bent. In fact, the bend pulls the heme ligand in such a way as to keep the Fe below the heme plane, which will tend to keep the Fe five-coordinate and active in the resting state of the enzyme. This fact, which was hypothesized in 1979 by Joan Valentine, was missed in the 1.8 Å resolution structure.

PLATE 7. (A) Density at 1.35 Å for a threonine. Notice how the oxygen atoms (red) and nitrogen atoms (blue) have significantly higher density than the carbon atoms. (B) Same region showing the thermal ellipsoids.

PLATE 8. Example of using thermal ellipsoids to find bad parts of the structure. (A) The density in the N-terminal region of the map. (B) The same area with thermal ellipsoids drawn on the model. Note how the thermal ellipsoids become large and jumbled in the region with no density.