Crystal Simulation Tutorial: Picking an X-Ray Structure

System selection: Not all crystal structures are equal!

This section contains no directives regarding what to do with the programs that were unpacked in the previous section. If you are comfortable with the X-ray structure you have chosen to simulate, or if it is the only one of its kind, then you may proceed directly to the next section. For many proteins of interest, however, there may be multiple X-ray structures available, obtained by different experimental conditions or offereing different levels of resolution. This section is included to provide some narrative that may be helpful in distinguishing between several choices to select a particular structure to simulate.

There are a number of questions that one may wish to answer by using a crystal simulation. Primarily, the purpose of simulating an X-ray structure is to validate a biomolecular model (the force field and associated water model) against known X-ray data. More detailed questions include:

Can a particular behavior that is observed in a solution phase simulation be reconciled with known X-ray data?
Do the protein:ligand binding poses seen in a solution phase simulation deviate systematically from observed poses in X-ray structures?
Do crystal contacts between proteins in an X-ray structure affect the flexibility of protein loop regions?

In order to validate a biomolecular model, there must be a precise correspondence between the simulation and experimental conditions. Having the lattice itself in the simulation is a big step towards bridging the gap between an X-ray structure and a solution phase simulation, but a number of other factors must be considered. It is the experimental conditions, or more precisely the feasibility of simulating the experimental conditions, that may drive the selection of one X-ray structure over another.

The first thing to look for when selecting an X-ray structure is the space group, and the size of the unit cell. The space group can be found in the pdb file in a REMARK 290 record; the exact number of symmetry-related copies in the unit cell is a function of the space group, and can be found by looking for SMTRY tags in the REMARK 290 records:

REMARK 290 CRYSTALLOGRAPHIC SYMMETRY
REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 21 21 21
REMARK 290
REMARK 290      SYMOP   SYMMETRY
REMARK 290     NNNMMM   OPERATOR
REMARK 290       1555   X,Y,Z
REMARK 290       2555   -X+1/2,-Y,Z+1/2
REMARK 290       3555   -X,Y+1/2,-Z+1/2
REMARK 290       4555   X+1/2,-Y+1/2,-Z
REMARK 290
REMARK 290     WHERE NNN -> OPERATOR NUMBER
REMARK 290           MMM -> TRANSLATION VECTOR
REMARK 290
REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS
REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM
REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY
REMARK 290 RELATED MOLECULES.
REMARK 290   SMTRY1   1  1.000000  0.000000  0.000000        0.00000
REMARK 290   SMTRY2   1  0.000000  1.000000  0.000000        0.00000
REMARK 290   SMTRY3   1  0.000000  0.000000  1.000000        0.00000
REMARK 290   SMTRY1   2 -1.000000  0.000000  0.000000       22.95000
REMARK 290   SMTRY2   2  0.000000 -1.000000  0.000000        0.00000
REMARK 290   SMTRY3   2  0.000000  0.000000  1.000000       15.05000
REMARK 290   SMTRY1   3 -1.000000  0.000000  0.000000        0.00000
REMARK 290   SMTRY2   3  0.000000  1.000000  0.000000       20.35000
REMARK 290   SMTRY3   3  0.000000  0.000000 -1.000000       15.05000
REMARK 290   SMTRY1   4  1.000000  0.000000  0.000000       22.95000
REMARK 290   SMTRY2   4  0.000000 -1.000000  0.000000       20.35000
REMARK 290   SMTRY3   4  0.000000  0.000000 -1.000000        0.00000

In the example above (taken from PDB file 1AHO), there are four symmetry operations for the P2₁2₁2₁ space group, each found on three consecutive lines and enumerated by the numbers in the fourth column. Each set of three lines defines one 3 by 3 transformation matrix, which rotates the coordinates found in the PDB file in three dimensional space, and a translation vector (found in the column furthest to the right) which is applied once the coordinates are rotated. In general, the first symmetry operation will comprise the identity matrix and the zero vector, which will leave the atoms specified in the PDB file in place. Subsequent rotations and translations will fill out the unit cell, and in a well-fitted X-ray structure should not generate any clashes between symmetry-related copies of the atoms specified in the PDB file. Note that, to fill out the unit cell in a way that makes sense, the rotations and translations must be applied to the coordinates as they appear in the original PDB file. Applying the symmetry operations to coordinates with a different origin (as may be the case after tLEaP processes the coordinates) will result in misalignment of the symmetry-related subunits.

A more concise decription of the unit cell symmetry operations can be found in the CRYST1 record. In the example below, which comes from the same PDB file (1AHO, provided in this tutorial) as the example above, the first three numbers on the line represent the lengths of the three unit cell vectors (for a recilinear or orthorhombic unit cell, these are the x, y, and z dimensions). The next three numbers are the unit cell angles; in this case, they are all 90 degrees, but AMBER's simulation packages sander and pmemd can handle any space group, even triclinic unit cells in which all three angles differ from 90 degrees. Note that the six numbers in the CRYST1 record correspond exactly to the six numbers that should appear on the last line of the input coordinates file that will be used to begin a simulation of the crystal, and thus the overall size of the system can easily be estimated by simply looking at the CRYST1 record. There is also a correspondence between the numbers in the CRYST1 record and the translation vectors in the SMTRY tags.

CRYST1   45.900   40.700   30.100  90.00  90.00  90.00 P 21 21 21    4

Generally, it is most convenient to simulate crystals with orthorhombic space groups, as these unit cells can be simulated with anisotropic unit cell rescaling during constant pressure simulations, which provides the most realistic simulation conditions (imposes the least artificial symmetry on the system, allowing the model to explore conformational space most freely). The size of the unit cell is also important. Larger unit cells in more complex space groups will likely contain as many independent copies of the biomolecule, atom for atom, as smaller unit cells, but the simulation will be more expensive and the "overhead" of system equilibration will be larger in proportion to the total run time. However, some small proteins can have very small unit cells, and the periodic boundary conditions in molecular simulations may impose artificial symmetry on the system, keeping it too rigid. Very small unit cells can be propagated multiple times within the simulation cell, however, to ensure that all simulation cell dimensions are sufficiently large. It is recommended that no simulation cell dimension be less than 40 Angstroms; as will be seen in the tutorial, the unit cell in the example above contains roughly 6,000 atoms when fully solvated; including two unit cells in the simulation cell (propagating the unit cell in the z direction within the simulation cell) would result in a system of only 12,000 atoms, which can be simulated at several ns per day on a modern eight-processor computer.

The next important feature of an X-ray structure is the experimental conditions. These are generally given in REMARK 280 records in the PDB file, although some files do not list these conditions and the primary literature given in the REMARK 1 records must be consulted. In our experience, precipitating agents such as methyl-(2,4)-pentanediol (MPD) and ammonium sulfate can be difficult to model properly, particularly with regard to highly concentrated aqueous solutions of the compounds. However, it is also our experience that explicitly modelling the crystallization conditions leads to better results and better agreement between the simulations and the X-ray data. If possible, X-ray structures that contain precipitating agents such as high molecular weight polyethylene glycol (PEG, 4000 MW or larger) or mild crystallization conditions without large amounts of precipitating agents will likely be easier to simulate accurately. In the particular case of PEG reagents, if the PEG molecules are large enough that they do not seem to fit within the crystal lattice voids, they can be excluded from the simulation setup altogether.

The temperature at which the diffraction was performed to produce an X-ray structure is also of interest in simulations. This temperature can be found in a REMARK 200 record. Many of the more recent crystal structures in the Protein Data Bank are generated from diffraction data collected at cryogenic temperatures (after the crystals have been flash-frozen in liquid nitrogen at 100K). The purpose of the freezing is to improve the resolution of the data by reducing the thermal fluctuations of certain groups, and offering the crystals some protection from damage as they are bombarded with X-rays. However, the flash freezing process is known to introduce its own perturbations to the structure, particularly in terms of side chain conformations. Because there is no suitable method for modeling the flash freezing process or cryogenic temperatures in molecular simulations, it is highly preferable to simulate crystals whos structures were determined after room temperature diffraction. However, if no such structure is available, it is advisable to assume that the flash freezing process captured the exact state of the crystal in thermal equilibrium at the crystal growth temperature, and simulate at that temperature.

After these practical considerations have been addressed, the resolution of the crystal structure is finally of interest. While it is desirable to choose a structure with a high resolution to provide the most certainty in the experimental data, the errors generated during any classical molecular dynamics simulation are likely to be large enough that a 2 Angstrom resolution X-ray structure will be an adequate reference. The observation of water molecules, and other solvents, in the X-ray structure is probably helpful, as it reduces the guesswork of determining what solvents to add and where to place them. However, there can still be uncertainty as to the identities of various clumps in the experimentalists' electron density maps (ammonium, sodium, and water all have the same numbers of electrons and can be mistaken for one another), and as will be discussed later in the tutorial placement of solvent requires a number of assumptions even under the most favorable circumstances.

Proceed to the next section.

Return to the beginning of the tutorial.