NMR Refinement of DNA and RNA Duplexes

9.1 NMR Refinement of DNA and RNA Duplexes

created by Vickie Tsui; updated by Rhonda Torres

There is no single recipe to the "best NMR refinement." The route chosen for refinement of a particular system, in the end, is a mixture of common sense, trial and error, and personal taste. This tutorial introduces several possible routes which will help you in your refinements. Here , we set up and run an AMBER refinement of a 10-base pair duplex DNA, d(GCGTTAACGC)₂.

The topics covered are:

Building an AMBER parameter/topology file prmtop required for minimization and molecular dynamics simulated annealing. The program LEaP, which reads in the force field, topology and coordinate information and produces the prmtop file, will be introduced.

Generating NMR restraint files in the AMBER format. These include distance restraints, sugar pucker restraints, backbone torsion angle restraints, and Watson-Crick restraints.

Running NMR refinement with the sander program, which is the main minimization and molecular dynamics engine in AMBER. We will run minimization and molecular dynamics simulated annealing in vacuum, and discuss refinement in the generalized Born model, which mimics the effect of solvent via an electrostatic continuum.

To get a different perspective as well as more information, see Basic Workshop Tutorial 1 on setting up a standard decamer poly(A)-poly(T) DNA duplex in AMBER, created by Ross Walker.

Before we start...

We need to make sure the Unix shell environment variable that specifies where the AMBER home resides is set properly, and that the path is defined so as to access all the necessary executables. To check if AMBERHOME is set correctly, type:

echo $AMBERHOME

If you see:

AMBERHOME: Undefined variable

then AMBERHOME was not set properly, and you need to set it to the directory where AMBER resides. If the AMBER source is in /usr/local/src/AMBER/amber8, you would type at the prompt, or include in your .cshrc file:

setenv AMBERHOME /usr/local/src/AMBER/amber8

To add the AMBER binary directory to your path, type at the prompt or add to your .cshrc:

set path = ($path $AMBERHOME/exe)

Now we are ready to run the different programs!

Generating an AMBER parameter/topology file

Let's start with two available PDB files for the d(GCGTTAACGC)₂ DNA duplex as our starting structures: an A-form DNA (gcg_a.pdb) and a B-form DNA (gcg_b.pdb). These could be generated from a variety of programs, including NAB, or could be imported from a crystal structure. LEaP automatically adds hydrogen atoms and any missing heavy atoms to the PDB file. We will use the PDB file for the B-form DNA as a template for building the paramter/topology (prmtop) file.

Two executables of LEaP could be used: a text version (tleap ) and a graphical version (xleap). We will use the graphical version in this example, but all functions are the same between tleap and xleap; however, tleap lacks the capability to view molecular structures.

When you type

xleap

you will see something resembling the following screen:

The exact library files that are loaded depend on the specifications of the leaprc file which is sourced in the beginning. By default, LEaP first looks for a leaprc file in the current directory. You could specify another leaprc file by using the "-s -f" flags, i.e. xleap -s -f leaprc.myown . We will discuss more details about the leaprc file later. For this tutorial, we will use the Cornell et al. force field (parm94.dat ), which is loaded using leaprc.ff94.

To load the pdb into a unit (a residue or collection of residues) called gcg, type:

source leaprc.ff94

gcg = loadpdb gcg_b.pdb

What you will see is:

> gcg = loadpdb gcg_b.pdb
Loading PDB file: ./gcg_b.pdb
total atoms in file: 632

This simply means that there were 632 atoms in the PDB file, exactly the number that LEaP expects given this sequence, and all with the correct names. Unfortunately, life is not always this simple. Here we will digress a little to discuss the meaning of several frequently-occuring LEaP messages.

If your PDB file does not contain any hydrogen atoms, LEaP will add those hydrogens based on the geometry specified in the residue databases. In that case, the message output upon loading the PDB file may be something like:

Leap added 186 missing atoms according to residue templates:
186 H / lone pairs

Alternatively, you may encounter the following messages:

Creating new UNIT for residue: A3 sequence: 10
One sided connection. Residue: missing connect0 atom.

This is usually an indication that the necessary "TER" card has not been placed properly in the input PDB file. LEaP requires that molecules or residues not connected via covalent bonds be separated with a "TER" card. In other words, if no TER is present between the 3' end of strand 1 and 5' end of strand 2, LEaP will expect those two residues to be covalently linked and, hence be confused. So, if you see comments about missing connect0 or connect1, you may want to revise chain connectivity in the PDB file. Instead, if you get comments about "Unknown residue" or "Created new UNIT for residue..." or "Created a new atom...", this may refer to misnaming of residues or atoms from what LEaP expects.

Now that you have loaded in the PDB, you can look at it using

edit gcg

A graphical display of the molecule will appear.

Note that the LEFT mouse button allows atom selection (drag and drop), the MIDDLE mouse button rotates the molecule, and the RIGHT button translates it. To zoom in or out, simultaneously hold down the MIDDLE and RIGHT buttons. Since xleap only uses generic and primitive X-windows graphics, it will be useful to have available a GL ("Graphics Library")-based program that provides better depth cues, smooth lines, and has more display options. We will use some of those programs later in the analysis of results.

Now, we are ready to build the prmtop file. Close the graphical display by clicking the UNIT -> Close on the menu bar. Then type:

saveamberparm gcg prmtop gcg_b.x

This saves the unit gcg into an AMBER parameter/topology file name prmtop (which defines the connectivity and parameters for our current model), and also saves the coordinates of the PDB into a file in AMBER coordinate format called gcg_b.x.

Note, you will see:

> saveamberparm gcg prmtop gcg_b.x
Checking Unit.
WARNING: The unperturbed charge of the unit: -18.000000 is not zero.

 -- ignoring the warning.

Building topology.
Building atom parameters.
Building bond parameters.
Building angle parameters.
Building proper torsion parameters.
Building improper torsion parameters.
 total 116 improper torsions applied
Building H-Bond parameters.
Marking per-residue atom chain types.
 (no restraints)

The warning is fine, since we know our DNA has a net negative charge of -18. At this point the prmtop file is saved. If there were problems, such as missing parameters, LEaP would complain about those missing parameters, and would put out a message:

Parameter file was not saved.

You can quit xleap by simply typing quit. This will create a log file named leap.log in the current directory which you can look at later to remember what you did. There should also be two new files in your directory, prmtop and gcg_b.x. We could re-create a PDB file that has all the atom names and numbers corresponding to the prmtop file (the atom names and numbering would not actually change in this case, since LEaP liked our starting PDB files to begin with). We will use the executable ambpdb :

ambpdb -aatm -p prmtop < gcg_b.x > gcg_b.amb.pdb

Now take a look at gcg_b.amb.pdb. Suddenly, our residue names are no longer "GUA", "ADE", "THY" and "CYT". Instead, they are "DG5" (for 5' guanine), "DT" (for internal thymine), "DC3" (for 3' cytosine), etc. These are the actual names of the residues loaded up in the libraries. LEaP recognized the former residue names because of a residue map defined in the leaprc file.

#
# Define the PDB name map for the amino acids and DNA.
#
addPdbResMap {
   { 0 "GUA" "DG5" } { 1 "GUA" "DG3" } { "GUA" "DG" }
   { 0 "ADE" "DA5" } { 1 "ADE" "DA3" } { "ADE" "DA" }
   { 0 "THY" "DT5" } { 1 "THY" "DT3" } { "THY" "DT" }
   { 0 "CYT" "DC5" } { 1 "CYT" "DC3" } { "CYT" "DC" }
}

If you were working with RNA instead of DNA, you would want to source $AMBERHOME/dat/leap/cmd/leaprc.rna instead of the default $AMBERHOME/dat/leap/cmd/leaprc (which assumes the nucleic acid residues are DNA).

Because the B-form DNA and A-form DNA differ only in conformation and not topology, the A-form DNA should use the same prmtop file. We could re-start xleap or tleap, load in the A-form PDB file (gcg_a.pdb) as was done for the B-form DNA, and save it as an identical prmtop file and a gcg_a.x coordinate file. Let's use ambpdb again to create a PDB file (gcg_a.amb.pdb). Now, we have the basic files required for minimization and molecular dynamics simulations using sander.

Generating NMR restraints

To generate the NMR restraints, either the A or B-form PDB files could be used as a template for matching atom and residue names with the correct atom numbering. For the examples in this section, we will use the B-form DNA PDB file (gcg_b.amb.pdb). Note that, because our starting PDB files which we input into LEaP (gcg_b.pdb) has the same atom numbering as gcg_b.amb.pdb, we could use either to create the NMR restraints (the residue names actually do not matter). However, if your starting files were missing hydrogen atoms or had alternative atom numbering, you *must* use the PDB created by ambpdb to generate the NMR restraints in order to preserve the correct atom numbering. The NMR data used to build the restraints in this tutorial are from J.A. Smith, V. Tsui, W.J. Chazin & D.A. Case (unpublished).

1. Distance restraints

To obtain an input distance restraint file for the sander program in AMBER, we first need to put the restraints in a so-called 7-column file. This file format is very straightforward and similar to DIANA-like distance restraints. A 7-column file contains (as the name suggests) 7 columns on each line, with the following information for each restraint:

1^st_res#  1^st_res_name  1^st_atom_name  2^nd_res#  2^nd_res_name  2^nd_atom_name  upper_bound

Here's the 7-column file we will use, and we will call it 7col.dist.

The program used to convert a 7-column file to the much less intuitive AMBER restraint file is makeDIST_RST. Typing makeDIST_RST -help will give directions on how to run the program. You will also need to specify the template PDB file and the MAP file (this by default can be found in $AMBERHOME/dat/map.DG-AMBER), which maps distance geometry pseudostructure nomenclature onto all-atom nomenclature used in AMBER.

makeDIST_RST -upb 7col.dist -pdb gcg_b.amb.pdb -rst RST.dist

This will give you a couple of cautions that "Output RM6NOE and DGOMIN files are not specified", which are OK. The file rst.dist has been generated in AMBER format. Take a look at this file, and you will notice that r1 is defined as 1.3 A and r2 is defined as 1.8 A. The upper bround in the 7-column file is defined as r3 , and r4 is 0.5 A above the upper bound. Conventionally, the violation energy is a well with a square bottom between r2 and r3, with parabolic sides out to a defined distance (r1 and r4 for lower and upper bounds, respectively), and linear sides beyond that distance. This is shown in the following plot:

makeDIST_RST automatically uses a distance of 1.8 A for lower bounds. Alternatively, if you wanted to specify your own lower bounds, you could use an 8-column file, in which the 7^th column lists the lower bounds and the 8^th column lists the upper bounds. We will illustrate the use of 8-column files later with the Watson-Crick restraints.

The first restraint also specifies the rk2 and rk3 values. These are the force constants for the lower and upper bounds, respectively, and they maintain the values given throughout the following restraints until rk2 and rk3 are re-defined. The makeDIST_RST command uses the default values of 20.0 kcal/mol ^. A for rk2 and rk3. We will change rk2 to 0 kcal/mol ^.A and rk3 to 32 kcal/mol ^. A. Setting rk2 = 0 essentially means there is no lower bound between two atoms. This is reasonable with molecular mechanics refinement because the van der Waals parameters in the force field will not allow two non-bonded atoms to get closer than they should be.

Another variable in the RST.dist file is ialtd, which is set to 0. If ialtd were set to 1, the potential then "flattens out" beyond the distance r4, and there is no force for large violations. This allows for errors in constraint lists, but might tend to ignore constraints that should be included to pull a bad initial structure towards a more correct one. In this example, we will stick with ialtd = 0. Two more variables, ixpk and nxpk, are used to label peaks based on spectrum and peak numbers, but we will not be using the labels in this tutorial.

2. Torsion angle restraints

To make the torsion angle restraints, such as sugar pucker and other phosphate backbone torsion restraints, start with a 5-column file. The format is:

res#  res_name  torsion_name  lower_bound  upper_bound

where the lower and upper bounds are in degrees, and the torsion name can be any of the standard torsion angles in capital letters, such as ALPHA, BETA, GAMMA, DELTA, EPSLN, ZETA, and CHI. The torsion name "PPA" stands for the pseudorotation phase angle and is used for the sugar pucker restraints. We will make 2 separate 5-column files for the sugar pucker restraints (5col.sugar) and the phosphate backbone restraints (5col.bb).

To convert the 5-column file into an AMBER-format restraint file, use makeANG_RST. The usage of makeANG_RST is:

makeANG_RST -pdb pdb_file -con constraint -lib library_file [-les lesfile]

The library file defines which atoms are specified by a particular torsion angle name. The standard torsion angle library is provided in $AMBERHOME/src/nmr_aux/prepare_input/tordef.lib. You could also specify your own nomenclature or add extra angles by copying and modifying this file. For this example, we will use the standard library:

makeANG_RST -pdb gcg_b.amb.pdb -con 5col.sugar -lib $AMBERHOME/src/nmr_aux/prepare_input/tordef.lib > RST.sugar

and

makeANG_RST -pdb gcg_b.amb.pdb -con 5col.bb -lib $AMBERHOME/src/nmr_aux/prepare_input/tordef.lib > RST.bb

Now look at the rst.sugar file. You will notice that, for each pseudorotation phase angle restraint you input, 5 restraints (NU0, NU1, NU2, NU3 and NU4) for the component angles making up this pseudorotation angle were generated.

By default, the rk2 and rk3 force constants for torsion angles are set to 2.0 kcal/mol ^. rad. In this example, we will change rk2 and rk3 to 32 kcal/mol^. rad for the sugar pucker restraints the backbone torsion restraints (rst.bb). There are no set rules for what force constants to use, and they often depend upon the confidence in restraints or the tightness of the range of the restraints. One generally wants a large enough force constant to drive convergence and achieve generally good agreement with the restraints, but not too large such that the system is over-constrained which leads to large force field energies.

3. Watson-Crick restraints

In simulated annealing runs, we will be heating up the DNA to facilitate sampling of conformational space. Unless we include additional inter-strand restraints, the two strands may separate during the high-temperature regime and not be able to renature to the correct conformation upon cooling. In a standard duplex DNA, we assume that all base pairs are properly Watson-Crick bonded, and apply the inter-strand restraints as distance restraints to maintain the proper Watson-Crick hydrogen bonds. This becomes more complicated for non-standard nucleic acid structures, and the inclusion of additional restraints often needs to be supported with biochemical data. In general, one should include as many restraints as possible to specify all the known pieces of structural information. You will find that this may still involve much trial-and-error, and the types of restraints you need for obtaining converged structures will often be highly system-dependent.

We will include both lower and upper bounds for the Watson-Crick hydrogen bonds. This is specified in an 8-column file (8col.wc), in which the 7^th column is the lower bound and the 8^th column is the upper bound. The distances used here were obtained from Saenger's Principles of Nucleic Acid Structure (Springer, 1984), allowing 0.1 A movement from the equilibrium bond distance (either closer or farther).

Again, we will use makeDIST_RST to generate AMBER-format restraints. This time we will use the -ual flag (for 8-column files) instead of -upb (for 7-column files).

makeDIST_RST -ual 8col.wc -pdb gcg_b.amb.pdb -rst RST.wc

Now the Watson-Crick restraint file (rst.wc), the last set of restraints needed for the refinement, has been created. We will change the rk2 and rk3 values to 32 kcal/mol^. A for this particular refinement.

4. Putting all restraints together

Now we are ready to make the one big restraint file we will use for refinement. This is very easy, since we already have all the pieces. Simply put everything together:

cat RST.dist RST.sugar RST.bb RST.wc > RST

The resulting rst file is the final form of the restraint file used for AMBER refinement.

Running minimization and molecular dynamics simulated annealing

1. Minimization

The first thing we want to do is relax the positions of the atoms a little to get rid of bad contacts without making drastic changes in the structures. We will use sander to carry out minimization of the A- and B-form structures.

The basic usage for sander is as follows:

sander [-O] -i mdin -o mdout -p prmtop -c inpcrd -r restrt [-ref refc -x mdcrd -v mdvel -e mden -inf mdinfo]

Arguments in [ ]'s are optional...
If an argument is not specified, the default name will be used.
-O overwrite all the output files
-i the name of the input file (which describes sander options), mdin by default
-o the name of the output file, mdout by default
-p the parameter/topology file, prmtop by default
-c the set of initial coordinates for this run, inpcrd by default
-r the final set of coordinates from this MD or minimization run, restrt by default
-ref reference coordinates for positional restraints, if this option is specified in the input file, refc by default
-x the molecular dynamics trajectory (if running MD), mdcrd by default
-v the molecular dynamics velocities (if running MD), mdvel by default
-e a summary file of the energies (if running MD), mden by defaults
-inf a summary file written every time energy information is printed in the output file for the current step of the minimization or MD, mdinfo by default

Now that we have the prmtop and inpcrd (gcg_a.x and gcg_b.x) files, all we need to run sander is the mdin file to specify options for this run.

Here's what the input file looks like for our initial minimization run:

 energy minimization for DNA starting structures

 &cntrl

imin=1, maxcyc=100, ncyc=50, ntpr=20,
        ntb=0, 
 /

&ewald
        eedmeth=5,
 /

The section of the input file after the title are the "namelist" variables specified between &cntrl and /. Note the space or empty first column before specification of the namelist control variables and the actual "&cntrl" and "/": this is necessary. Here we have specified that we want to run 100 cycles (maxcyc=100) of minimization (imin=1), starting with 50 steps using steepest descent algorithm (ncyc=100). We want the results to be reported both in the mdout file and the mdinfo file every 100 steps (ntpr=100 ). Since sander assumes that the system is periodic, we need to turn off periodicity explicitly (ntb=0) and also turn on the distance dependent dielectric (eedmeth=5). Detailed explanations for these variables, plus many others, can be found in the Users' Manual.

Here is how we would run sander with this input file, which we will name min.in:

sander -O -i min.in -o gcg_b.min.out -c gcg_b.x \
          -p prmtop -r gcg_b.min.x

Input files: min.in, gcg_b.x, prmtop

Output files: gcg_b.min.out, gcg_b.min.x

We could also use the same min.in to minimize the A-form structure, creating gcg_a.min.out and gcg_a.min.x.

2. Molecular dynamics simulated annealing in vacuum

The main idea in molecular dynamics simulated annealing refinement is to heat up the system such that the molecule of interest has enough energy to explore a wide range of configurational space and get over barriers. Relatively large structural rearrangements are permitted at these high temperatures. As the temperature is cooled gradually, the structural changes proceed in smaller steps, continuing to descend toward the global energy minimum. However, one is not guaranteed to reach the global minimum at the end of a simulated annealing run, and the optimal annealing schedule may be dependent on the system or even the starting structures. In this example, we will use an annealing schedule shown (via trial and error) to produce fairly reasonable structures for this system. You may want to test out different protocols for your particular system by changing the length of the simulation, the temperature profile, non-bonded cutoff, etc.

The annealing schedule can be defined in a sander input file. We will call this anneal.in:

 simulated annealing protocol, 20 ps 
                                                                               
 &cntrl                                                                        
    nstlim=20000, pencut=-0.001, nmropt=1,
    ntpr=200, ntt=1, ntwx=200,
    cut=15.0, ntb=0, vlimit=10,
 /                                                                        
 &ewald
    eedmeth=5,
 /
#                                                                              
#Simple simulated annealing algorithm:                                         
#                                                                              
#from steps 0 to 5000: heat the system to 600K
#from steps 5001-18000: re-cool to low temperatures with long tautp
#from steps 18001-20000: final cooling with short tautp
#                                                                              
 &wt type='TEMP0', istep1=0,istep2=5000,value1=600.,
            value2=600.,    /
 &wt type='TEMP0', istep1=5001, istep2=18000, value1=600.0,
            value2=100.0,     /
 &wt type='TEMP0', istep1=18001, istep2=20000, value1=0.0,
            value2=0.0,     /
                                                                               
 &wt type='TAUTP', istep1=0,istep2=5000,value1=0.4,
            value2=0.4,     /
 &wt type='TAUTP', istep1=5001,istep2=18000,value1=4.0,
            value2=4.0,     /
 &wt type='TAUTP', istep1=18001,istep2=19000,value1=1.0,
            value2=1.0,     /
 &wt type='TAUTP', istep1=19001,istep2=20000,value1=0.1,
            value2=0.05,    /
                                                                               
 &wt type='REST', istep1=0,istep2=3000,value1=0.1,                             
            value2=1.0,  /                                                
 &wt type='REST', istep1=3001,istep2=20000,value1=1.0,                         
            value2=1.0,  /                                                  
                                                                               
 &wt type='END'  /
LISTOUT=POUT                                                                   
DISANG=RST

Again, as in the minimization input file, we define a list of namelist variables in the &cntrl section. Now we are running molecular dynamics (imin=0, default) instead of minimization. We will run 20000 steps (nstlim=20000), and since the default time step is 1 fs, this corresponds to 20 ps of molecular dynamics. We can include distance and angle restraints only if nmropt=1, which are read from the rst file we created above. All the restraints with deviations greater than pencut in the final step will be summarized in the output file. Since we set pencut to a negative number (pencut=-0.001), all of our restraints will be printed. We will update the output every 200 steps (ntpr=200), and we will save the snapshot every 200 steps into an AMBER trajectory file (ntwx=200 ). Constant temperature is used, with the Berendsen temperature coupling algorithm (ntt=1). A cutoff of 15 A is used for non-bonded interactions (cut=15). The variables igb and ntb were introduced earlier. The variable vlimit resets the velocity to the value of VLIMIT once it becomes greater that abs(VLIMIT). This can be used to avoid occasional instabilities in molecular dynamics runs, and is especially important for simulated annealing runs because of the high temperature. It should be set to some value between 10 and 20, which is well above the most probably velocity in a Maxwell-Boltzmann distribution at room temperature. A warning message will be printed whenever the velocities are modified.

When nmropt > 0, sander will continue to read the so called "weight change information". Each weight change information is included between &wt and /, and this namelist is read repeatedly until a namelist &wt statement is found with type='END' . See Input Section 2 of the sander manual for a list of all the types one could use. Here we will first start with changing the target temperature with type='TEMP0'. From step 0 to step 5000, the target temperature bath is maintained at 600 K. Since the initial temperature is 0 K (default), this means the temperature of the system will rise very quickly to ~ 600 K from 0 K, and stay there until ~ step 5000. We continue to define the target temperature, and from step 5001 to step 18000, we will cool the system gradually from 600 K to 100 K. The temperature bath is cooled linearly over this length of time, but that does not necessarily mean the actual temperature of the system follows the progression of the temperature bath exactly. How close the system follows the temperature bath is partly dependent on the variable TAUTP , which is discussed below. Finally, from step 18001 to 20000, the temperature bath is set at 0 K.

Next, we will vary the time constant for heat bath coupling using type='TAUTP' . The smaller the value, the tighter coupling to the heat bath, and hence a less natural trajectory (but follows the target temperature more closely). Generally, values for TAUTP should be in the range of 0.05-5.0 ps for simulated annealing. Here we start first with relatively short TAUTP (< 1 ps) to heat up the system from step 0 to step 5000. Then, we move to a longer TAUTP for the system to experience a more natural trajectory and sample the conformational space (step 5001 to 18000). Then we decrease the TAUTP again at lower temperatures to 1.0 ps from step 18001 to 19000. For the final 1000 steps, we use a much tighter coupling at the final cooling stage (0.1 to 0.05 ps) to bring the system to ~ 0 K.

Finally, we vary the weight of the restraints using type='REST' . This affects the force constants (rk2 and rk3) we had input in the restraint file, and the effective force constant is multiplied by the weight we give the restraints. We will gradually increase the weight from 0.1 to 1 between step 0 and step 3000. Starting with a smaller weight on the restraints ensures that large violations in the initial structure do not cause the system to blow up, and that the structure will quickly adjust in the beginnning of the simulation to get rid of the really high violations. For the rest of the run (step 3001 to 20000), we will keep the weight of the restraints at 1. Following that, we end the weight change section with type='END'.

Following the weight change section, sander reads in the "File redirection commands". See Section 3 of the sander manual for more details. In this example, we choose to output the listing of the restraints and their deviations from the target after the simulation is finished (LISTOUT=POUT ). These deviations will be printed in the normal output file, and how many are printed depends on the value of pencut in the control variable namelist. Also, we will read in the distance and angle restraint information from the specified filename: in this case, we have called our restraint file RST. Hence, DISANG=RST .

We can call sander the same way we did for minimzation. The input file is now anneal.in, and we start with both the minimized B-form structure (gcg_b.min.x) and the minimized A-form structure (gcg_a.min.x).

sander -O -i anneal.in -o gcg_b.ann.out -c gcg_b.min.x -p prmtop \
          -r gcg_b.ann.x -x gcg_b.ann.traj

The output (gcg_b.ann.out) and restart (gcg_b.ann.x) files are created. Here we also create a trajectory file gcg_b.ann.traj, in which snapshots of the simulation are saved every 200 steps (100 total snapshots in this file). Use the same input file starting from A-form to get the following files: gcg_a.ann.out, gcg_a.ann.x, and gcg_a.ann.traj.

During this run, we can continue to look at where the simulation is and how it is doing by viewing the mdinfo file. This is a temporary file that contains a report of the current snapshot of the simulation, and is overwritten and updated every ntpr steps (in our example, we specified ntpr=200). After you start running the simulation, type

more mdinfo

Something close to the following message will appear:

 NSTEP =    20000   TIME(PS) =      20.000  TEMP(K) =     1.75  PRESS =     0.0
 Etot   =     -1022.3375  EKtot   =         3.2893  EPtot      =     -1025.6268
 BOND   =        23.3073  ANGLE   =        96.8446  DIHED      =       303.5306
 1-4 NB =       155.8853  1-4 EEL =      -890.2717  VDWAALS    =      -309.2969
 EELEC  =      -417.5237  EHBOND  =         0.0000  RESTRAINT  =        11.8976
 EAMBER (non-restraint)  =     -1037.5244
 NMR restraints: Bond =    6.501   Angle =     0.000   Torsion =     5.396

You can see the current step number (NSTEP), temperature (TEMP(K)), force field energy (EAMBER (non-restraint)), and NMR violation energy (NMR restraints) for this step ( Bond and Torsion).

Once you have finished the simulated annealing runs, convert all these coordinate files to PDB files using ambpdb, which we will call gcg_a.ann.pdb and gcg_b.ann.pdb.

3. NMR refinement in solvent

The simulated annealing run we had carried out above used the distance dependent dielectric function. This was generally used to mimic the presence of a high dielectric solvent but has no real physical basis. The general idea in NMR refinement is, if given enough experimental data to completely define the structural features, there is no need for a sophisticated molecular mechanics force field. However, the experimental data are generally underdetermined, especially for nucleic acids that lack long-range NOEs specifying tertiary structures. Therefore, the force field can play a large role as well in the resulting structure.

The most reasonable and obvious "next step" of improvement from using distance dependent dielectrics for NMR refinement is to use the generalized Born (GB) implicit solvation model. In this model, the effects of solvent are represented via an electrostatic continuum, and the solvent electrostatic free energy is included as an additional term in the molecular mechanics energetics. See Tsui & Case, JACS 122, 2489, 2000 for more details. We will not actually carry out a GB run in this tutorial, but it is very straightforward should you decide to try it in the future. In the &cntrl section, set igb to 1 to use GB. Additional namelist variables you could add are:

saltcon=0.2,    # set the salt concentration.  in this case, it is set to 0.2 M
offset=0.13,    # a constant offset parameter for the GB radii. 0.13 gives 
                # closest results with Poisson-Boltzmann for nucleic acids

See the sander section in the manual for more options. One thing to definitely keep in mind is that GB simulations are substantially slower than vacuum simulations. For this system, running on one processor of the R10000 SGI Origin took approximately 25 minutes for each round of refinement, while the same simulated annealing run with GB took ~ 150 minutes. However, if you have access to parallel processor machines, running GB on many starting structures is a do-able task.

Of course, the most accurate physical representation of the solvent environment is to include all the solvent molecules explicitly. In this case, the solute is placed in a box of water molecules. Unfortunately, one cannot perform the usual "simulated annealing" because of basic problems with heating up the explicit water molecules. These simulations will be carried out close to room temperature, which means that the molecule may never achieve enough sampling to find the global minimum in a reasonble time scale. Furthermore, the addition of explicit water molecules makes the computation drastically more expensive. For a 10-base pair DNA, explicit solvent simulations can be approximately 50 times slower than vacuum simulations. Therefore, one would generally carry out NMR refinement in explicit solvent starting from structures fairly close to the correct structure (perhaps obtained from vacuum or GB refinement). The purpose of refining in explicit solvent would be to assess the (usually fairly small) structural adjustments in the presence of water molecules, or the behavior of water molecules in the presence of the solute (i.e. DNA minor groove hydration, water-mediated protein-DNA contacts, etc.).

Analyzing the Results

1. Calculating the RMSD and viewing the structures

Many programs and scripts exist to compute the root-mean-square deviation (RMSD) between structures. Here we will use a program suppose written by Jarrod Smith using NAB. suppose is very easy to use and performs most of the necessary tasks. Click here for a short manual.

First, try simply:

suppose gcg_a.ann.pdb gcg_b.ann.pdb

The output will be:

Molecule 1 = gcg_a.ann.pdb
Molecule 2 = gcg_b.ann.pdb

All atom fit.
All atom RMSDs.

RMS pairwise difference between structures = 1.742
RMS difference from mean structure = 0.871

The structures are superimposed on all atoms, and both the RMS pairwise difference between the two structures and the RMS difference from the mean structure (about 2 times smaller than the former value) are reported. Now try:

suppose -fit "::C*,N*,O*,P*" -calc "::C*,N*,O*,P*" -rot gcg_a.ann.pdb gcg_b.ann.pdb

Now we are fitting and calculating only on the heavy atoms. The atom specification (explained in the manual) matches residue numbers after the first colon (if nothing is specifies, all residues are matched) and atom names after the second colon. Here we are specifying all atoms whose names begin with the letters C, N, O or P. Furthermore, using the "-rot" flag, we have created two additional PDB files for the superimposed, rotated molecules: gcg_a.ann.rot.pdb and gcg_b.ann.rot.pdb. We can view these structures using one of many graphics programs. In this tutorial, we will use MidasPlus from UCSF.

At the UNIX prompt type

midas

A window will be brough up -- set the window size to any you like.

At the "command:" line in the midas window, type

open gcg_a.ann.rot.pdb

and

open gcg_b.ann.rot.pdb

Use the LEFT mouse button to rotate and the MIDDLE mouse button to translate. To zoom in and out, use the "sideview" box in the lower right corder. Try moving the little square box at the tip of the focal point closer and farther from the molecule. You could also move the vertical lines to create slab mode and depth cue effects. Click here for a more complete MidasPlus manual. In this tutorial, we will not go into other tricks one could do using midas.

2. Viewing the trajectory

In our simulated annealing runs, we had saved a snapshot at every 200 steps into the .traj trajectory files. We can look at this as a movie using a variety of programs. For this tutorial, we will introduce MOIL-View written by Carlos Simmerling for viewing the trajectory.

First, bring up the MOIL-View window by typing:

moil-view

Click the mouse button to continue. Now click and hold the RIGHT mouse button. You will see a menu. Go down to "File", and there will be another list of options displayed to the right. Select "Read 1st coordinates". Now you will be asked to select the format of the coordinate. For now, select PDB. Now you will be asked to choose from the list of files. Choose gcg_a.ann.pdb, and hit OK. It will ask you whether to use DISTANCE bonds. Since we have the prmtop file with the available connectivity information, choose NO. Then, choose YES for "Load connectivity file?" And for the connectivity file format, choose AMBER. Now, select prmtop from the list of files, and hit OK. A window pops up, saying "Using connectivity. Click to continue". Click. And now the DNA pops up. You will also see a "Toolbox" used to control the mouse mode and other graphical effects, like this one.

On the top of the toolbox is the mouse mode. If "R" is selected, the LEFT mouse button controls Rotation. If "T" is selected, the LEFT mouse button controls Translation. If "S" is selected, the LEFT mouse button controls Size.

Hit the RIGHT mouse button again to obtain the menu. Go down to "Dynamics", and then select "View Trajectory". It will ask you to choose trajectory file format. Choose "AMBER" (not "AMBER PBC" -- that is the periodic boundary condition). From the list of files, choose gcg_a.ann.traj and hit OK. Now it will ask you to input the "Dynamics options": the delay between frames, start and end frame number, number of frames to skip between each frame displayed, and whether you want to 'cycle' the frame, i.e. repeat the movie continuously. For this example, choose 3 for "Delay", 1 for "Start #", 100 for "End #", 0 for "Skip #" and n for "Cycle". Hit OK. You now have the option to save all frames as coordinate files. Choose NO. At this point, you should be watching a movie of the DNA wiggle into a stretched out conformation at high temperatures and gradually moving (more and more slowly) into the final conformation. Also view the gcg_b.ann.traj trajectory. Because you had loaded in a coordinate file already ( gcg_a.ann.pdb), you can go straight to Dynamics -> View Trajectory.

Finally, to quit the program, bring up the menu and choose "Quit". This will automatically create a file named save.current in your current directory. If you recall MOIL-View again from that directory, the data you had input previously will automatically be loaded up. Remove this file before running the program again if you don't want to restore the previous session. For a more detailed menu of MOIL-View, click here .

3. Examining the energies

All the important energy terms are printed in the output files gcg_a.ann.out and gcg_b.ann.out. You will see a summary of the energy terms for every 200 steps because of ntpr=200, similar to the following:

 NSTEP =    20000   TIME(PS) =      20.000  TEMP(K) =     1.75  PRESS =     0.0
 Etot   =     -1022.3375  EKtot   =         3.2893  EPtot      =     -1025.6268
 BOND   =        23.3073  ANGLE   =        96.8446  DIHED      =       303.5306
 1-4 NB =       155.8853  1-4 EEL =      -890.2717  VDWAALS    =      -309.2969
 EELEC  =      -417.5237  EHBOND  =         0.0000  RESTRAINT  =        11.8976
 EAMBER (non-restraint)  =     -1037.5244
 ------------------------------------------------------------------------------

Note that this is in the same format as the mdinfo file, which was explained earlier in this tutorial. After the last step is printed, the "AVERAGES OVER 20000 STEPS" and "RMS FLUCTUATIONS" are also printed. These numbers are basically useless in a simulated annealing run, and what you want to focus on are the numbers printed for the last step.

===============================================================================

      A V E R A G E S   O V E R   20000 S T E P S


 NSTEP =    20000   TIME(PS) =      20.000  TEMP(K) =   477.86  PRESS =     0.0
 Etot   =       776.8759  EKtot   =       897.3781  EPtot      =      -120.5022
 BOND   =       310.8069  ANGLE   =       452.4735  DIHED      =       434.8085
 1-4 NB =       182.2483  1-4 EEL =      -909.1729  VDWAALS    =      -269.1332
 EELEC  =      -356.8683  EHBOND  =         0.0000  RESTRAINT  =        34.3349
 EAMBER (non-restraint)  =      -154.8371
 ------------------------------------------------------------------------------

 NMR restraints: Bond =    6.501   Angle =     0.000   Torsion =     5.396
===============================================================================

      R M S  F L U C T U A T I O N S


 NSTEP =    20000   TIME(PS) =      20.000  TEMP(K) =   146.44  PRESS =     0.0
 Etot   =       556.9202  EKtot   =       275.0007  EPtot      =       284.8576
 BOND   =        97.3622  ANGLE   =       115.1058  DIHED      =        43.0748
 1-4 NB =        11.6590  1-4 EEL =        15.9889  VDWAALS    =        23.0807
 EELEC  =        29.4234  EHBOND  =         0.0000  RESTRAINT  =        12.1874
 EAMBER (non-restraint)  =       272.6703
 ------------------------------------------------------------------------------

After the sections of AVERAGES and RMS FLUCTUATIONS, the "NMR restraints on final step" are printed. All of the restraints are printed because we had set pencut to -0.001, in the format of:


 NMR restraints on final step:

 ------------------------------------------------------------------------------


 Final Restraint Analysis for coords: gcg_b.ann.x                             


 Restraints, deviations, and energy contributions:    pencut =    0.00

 ------------------------------------------------------------------------------
     First atom        Last atom    curr. value target deviation  penalty
 ------------------------------------------------------------------------------
  H1'  DG5    1 --  H3'  DG5    1:    3.739    4.600    0.000    0.000 d    0: 0
  H2'2 DG5    1 --  H2'1 DC     2:    4.348    6.000    0.000    0.000 d    0: 0
  H3'  DG5    1 --  H2'1 DG5    1:    2.361    2.900    0.000    0.000 d    0: 0
  H3'  DG5    1 --  H2'2 DG5    1:    2.727    3.200    0.000    0.000 d    0: 0
  H3'  DG5    1 --  H4'  DG5    1:    2.861    3.300    0.000    0.000 d    0: 0
  H4'  DG5    1 --  H2'2 DG5    1:    3.963    4.100    0.000    0.000 d    0: 0
  H8   DG5    1 --  H5   DC     2:    4.023    6.000    0.000    0.000 d    0: 0
  H1'  DC     2 --  H2'1 DC     2:    3.029    3.400    0.000    0.000 d    0: 0
  H1'  DC     2 --  H2'2 DC     2:    2.376    2.900    0.000    0.000 d    0: 0
  H1'  DC     2 --  H3'  DC     2:    3.920    4.200    0.000    0.000 d    0: 0
  H1'  DC     2 --  H4'  DC     2:    3.094    3.300    0.000    0.000 d    0: 0
  H1'  DC     2 --  H3'  DG     3:    5.663    6.000    0.000    0.000 d    0: 0
  H1'  DC     2 --  H4'  DG     3:    4.312    6.000    0.000    0.000 d    0: 0
  H2'2 DC     2 --  H2'1 DC     2:    1.772    1.800    0.028    0.000 d    0: 0
  H3'  DC     2 --  H2'1 DC     2:    2.180    2.900    0.000    0.000 d    0: 0
  H3'  DC     2 --  H4'  DC     2:    2.819    3.100    0.000    0.000 d    0: 0
  H4'  DC     2 --  H2'2 DG5    1:    5.545    6.000    0.000    0.000 d    0: 0
  H4'  DC     2 --  H2'1 DC     2:    3.754    3.600    0.154    0.763 d    0: 0

  ....
  H6   DC3   20 --  H2'2 DG    19:    2.484    3.300    0.000    0.000 d    0: 0
  H6   DC3   20 --  H3'  DG    19:    4.596    6.000    0.000    0.000 d    0: 0
  H6   DC3   20 --  H1'  DC3   20:    3.740    4.100    0.000    0.000 d    0: 0
  H6   DC3   20 --  H3'  DC3   20:    2.038    3.500    0.000    0.000 d    0: 0
  H6   DC3   20 --  H4'  DC3   20:    4.247    4.200    0.047    0.069 d    0: 0
  H6   DC3   20 --  H5   DC3   20:    2.458    2.700    0.000    0.000 d    0: 0
  O4'  DG5    1 --  C1'  DG5    1:  -38.620  -14.700    0.000    0.000 t
  C1'  DG5    1 --  C2'  DG5    1:   42.885   48.100    0.000    0.000 t
  C2'  DG5    1 --  C3'  DG5    1:  -30.864   -6.700    0.000    0.000 t
  C3'  DG5    1 --  C4'  DG5    1:    9.483   24.200    0.000    0.000 t
  C4'  DG5    1 --  O4'  DG5    1:   18.076   34.000    0.000    0.000 t
  O4'  DC     2 --  C1'  DC     2:  -22.605  -14.700    0.000    0.000 t
  C1'  DC     2 --  C2'  DC     2:   28.033   48.100    0.000    0.000 t
  C2'  DC     2 --  C3'  DC     2:  -21.585   -6.700    0.000    0.000 t
  C3'  DC     2 --  C4'  DC     2:    8.904   24.200    0.000    0.000 t
  C4'  DC     2 --  O4'  DC     2:    8.694   34.000    0.000    0.000 t
  O4'  DG     3 --  C1'  DG     3:  -41.592  -13.900    0.000    0.000 t
  C1'  DG     3 --  C2'  DG     3:   44.801   52.200    0.000    0.000 t
  C2'  DG     3 --  C3'  DG     3:  -30.606  -14.600    0.000    0.000 t
  C3'  DG     3 --  C4'  DG     3:    7.597   26.800    0.000    0.000 t
  C4'  DG     3 --  O4'  DG     3:   21.118   25.600    0.000    0.000 t
  O4'  DT     4 --  C1'  DT     4:  -30.235  -22.100    0.000    0.000 t
  C1'  DT     4 --  C2'  DT     4:   33.335   45.000    0.000    0.000 t
  C2'  DT     4 --  C3'  DT     4:  -23.083    2.600    0.000    0.000 t
  C3'  DT     4 --  C4'  DT     4:    6.081    5.000    1.081    0.011 t
  C4'  DT     4 --  O4'  DT     4:   15.173   43.500    0.000    0.000 t
  O4'  DT     5 --  C1'  DT     5:  -34.845  -22.100    0.000    0.000 t

  ...

  O5'  DA    17 --  C5'  DA    17:  214.595  240.000    0.000    0.000 t
  O5'  DC    18 --  C5'  DC    18:  167.379  240.000    0.000    0.000 t
  O5'  DG    19 --  C5'  DG    19:  171.789  240.000    0.000    0.000 t
  H1   DG5    1 --  N3   DC3   20:    1.847    2.040    0.000    0.000 d    0: 0
  H22  DG5    1 --  O2   DC3   20:    1.728    1.750    0.022    0.015 d    0: 0
  N1   DG5    1 --  N3   DC3   20:    2.852    3.050    0.000    0.000 d    0: 0
  O6   DG5    1 --  H42  DC3   20:    1.782    1.800    0.018    0.010 d    0: 0
  O6   DG5    1 --  N4   DC3   20:    2.802    2.810    0.008    0.002 d    0: 0
  H42  DC     2 --  O6   DG    19:    1.784    1.800    0.016    0.008 d    0: 0
  N3   DC     2 --  H1   DG    19:    1.831    1.840    0.009    0.002 d    0: 0
  N3   DC     2 --  N1   DG    19:    2.845    2.850    0.005    0.001 d    0: 0
  N4   DC     2 --  O6   DG    19:    2.806    2.810    0.004    0.001 d    0: 0

  ...

  N4   DC3   10 --  O6   DG5   11:    2.811    3.010    0.000    0.000 d    0: 0
  O2   DC3   10 --  H22  DG5   11:    1.725    1.750    0.025    0.020 d    0: 0
                                       Total distance penalty:      6.438
                                       Total torsion  penalty:      5.397
|                               RMS deviation from ideal bonds :      0.0090
|                               RMS deviation from ideal angles:      2.307
 ------------------------------------------------------------------------------

The current value, target value, and deviation are printed in Angstroms or degrees depending on the restraint, and the penalty is printed in kcal/mol. The "d" and "t" letters after the penalty refer to "distance" and "torsion", respectively. The "0: 0" at the end of distance restraints are used for peak labeling, which we are not using in this tutorial. At the end of the restraint list, the "Total distance penalty" and "Total torsion penalty" are printed in kcal/mol. Also, the "RMS deviation from ideal bonds" and "RMS deviation from ideal angles" are printed (only for single processor runs) in Angstroms and degrees, respectively. This provides us an idea of whether there are large structural distortions from the equilibrium bond lengths and angles upon adding the NMR restraints. We will discuss more about these problems in the next section.

Since we only have two starting structures, looking at the files individually is no problem. However, when there are a large number of output files to sort through, we will want a summary of all those results. AMBER comes with a perl script called sviol ($AMBERHOME/src/nmr_aux/sviol) to parse sander output restraint analyses, and prints out a nice summary. Simply list all the sander output files as arguments (or use the filename*.out option). For more information on sviol , check out the NMR refinement section of the sander manual.

4. Potential problems

The annealing protocol we are using has been optimized via trial and error for this case, and it works with many other systems and starting structures. However, it is in no way guaranteed to be the best protocol even for this system, and nor is it guaranteed to work for every other system you are studying. We will discuss some problems you may very likely encounter in a simulated annealing run, and what to try in those situations.

First of all, your system may blow up. That means, the relatively large changes made at a high temperature may have led to a structure with unusually high energy, causing an increase in the temperature. Normally, it will adjust back to the lower energy conformation, but sometimes that does not happen. As the structure tries harder to relax but is instead stuck in the high energy conformation, the temperature increases even more, and the move sets become even larger. You can tell this very easily by checking the temperature throughout the simulation (looking continuously at the mdinfo file or reading the output file), and the temperature jump will be extremely noticeable. In those situations, the temperature often never returns to close to 0 K. This could be caused by weird restraints which oppose each other or oppose the force field energy tremendously. One should always double check the set of input restraints. Also, double check to make sure you have set vlimit to somewhere between 10 and 20. The annealing schedule could also make a difference. For example, are you raising the weight of the restraints too quickly? Is the force constants for your restraints too high? Look at the output file to see where the temperature jump starts to occur, and correlate that with your annealing protocol. Does it occur at the step when you suddenly change TAUTP? You could also look at the trajectory to detect if anything suspicious is going on with the structural adjustment. Is the structure moving enough to climb over the barrier? Are you staying at a high temperature for too long or too short? Sometimes it also helps to increase the non-bonded cutoff (cut in the control variables).

Alternatively, one or several of your structures could result in high force field or violation energies. Sometimes this may not be as serious a problem as it sounds. It is often standard practice in NMR refinement to use many starting structures, and "throw out" a couple that give unusually high RMSD or energetics. Of course, you have to be fairly confident that the rest of your structures have converged and are representative of "reality" before you start throwing away structures. Sometimes you may consistently obtain higher force field energies once you put in the NMR restraints. Getting higher electrostatic energies is more justifiable, since we are not modeling the aqueous environment and salt conditions very accurately. However, if the angle or dihedral force field energies increase by a noticeable amount, this is an indication that the NMR restraints you have input are opposing the force field specifications and may be distorting the structures. Again, double check your restraints. You could also increase the weight of the ANGLE and TORSION terms. If you think there might be incorrect constraints, you could set ialtd to 1 (discussed earlier) so that they do not cause large structural distortions.

Running the refinement more than once can also result in improved structural convergence as well as lower energies. Ideally, one could repeat the simulated annealing several times until no improvement is found. Most of the papers on NMR structure determination state that two rounds of simulated annealing were carried out.

Finally, you might find that the structures are consistently distorted in a certain way although the experimental restraints are fairly well-satisfied. This is an indication that you have an underdetermined system, in which the experimental constraints are not enough to specify the exact structure of your system. One example is, your DNA may be unwound as to maximize phosphate-phosphate distances. This is probably because of the underestimation of solvent screening of phosphate-phosphate repulsion. Different groups have different solutions to this problem. One possibility is to reduce the charges of the phosphate groups in the force field (modify the force field libraries that are input into LEaP).

Another option is to use the GB model. You can look at the anneal.in script for GB and try this out in place of the distance-dependent dielectric results shown above. The calculations will take about 5 times as long, and you may need to play with the annealing parameters to get consistent structures. These sorts of explorations are a part of the general procedure for examining convergence in NMR structure calculations. (You can find a sample GB output file, starting from the B-form structure, here.)