(Note: These tutorials are meant to provide
illustrative examples of how to use the AMBER software suite to carry
out simulations that can be run on a simple workstation in a reasonable
period of time. They do not necessarily provide the optimal choice of
parameters or methods for the particular application area.)
Copyright Ross Walker 2015
Section 6: A Practical Example (A-DNA)
Updated for AMBER 15
In the final stage of this tutorial we will use the skills we have learned to run a simulation on a more practical example. Up to this point we have been using canonical B-DNA as our starting structure. An alternative structure for DNA is canonical A-DNA, compared below:
|
|
Which structure is the more stable, however? In this section we will try running a long MD simulation starting from canonical A form poly(A)-poly(T). We will create the input structure, charge, neutralize and solvate it in the same method as we learned earlier. We will then minimize and equilibrate the structure using the same approach as we did for the solvated B-DNA simulation and then finally we shall analyze the results and see what happened. For a more in-depth discussion of A-form and B-form DNA and MD simulations see Cheatham, T.E., Kollman, P.A. J. Mol. Biol., 1996, 259, 434-444.
Note: The simulations in this tutorial may take a considerable amount of time to run - the production MD simulations take about 57,000 s (15.8 hours) [31.6 s per ps] on eight 1.3GHz processors of a SGI ALTIX. While I urge you to set up and run these calculations, the output files at each stage are provided so that you can continue with the remainder of the tutorial without waiting for the simulation to finish running.
6.1) Creating the starting structure
The first stage of setting up our simulation is to obtain a starting structure. As before we shall use nab for this. Previously we instructed nab to construct our DNA 10-mer in right-handed B-form geometry with the keyword ABDNA. Changing this to ADNA will give us what we want. (See the AMBER manual for more details on nab usage). Here is the input file for nab:
m = fd_helix( "adna", "aaaaaaaaaa", "dna" );
putpdb( "a-dna.nab.pdb", m, "-wwpdb");
Run nab to create our pdb file:
./a.out
This should produce the following file: a-dna.nab.pdb
We should also take a quick look at the pdb structure nab created for us. Is it what we want?
Looks good - Right handed A-form, exactly what we want.
6.2) Creating the sander input files
The next step is to create the prmtop and rst7 files for sander. As before we will use xleap for this.
Run xleap:
As before, we must also load our water and ion parameters:
Load our pdb file into a new unit called adna. In the main xleap window enter:
NOTE: If you see all sorts of atom naming errors then you are using an older version of xleap which has not been updated for the new pdb naming conventions. For the purposes of this tutorial simply use the prmtop and rst7 files provided below.
Charge neutralize our system:
Now we will save a prmtop and rst7 file for our neutralized (but not solvated) system. We will use the prmtop file later when we visualize our trajectory.
Output files: a-dna_charges_only.prmtop, a-dna_charges_only.rst7
Next step, solvate our system with a truncated octahedral box containing an 8 angstrom buffer of TIP3P water:
This should have added about 3,683 water molecules. Edit the model to check everything looks ok:
Finally we save our solvated prmtop and rst7 files:
Output files: a-dna_wat.prmtop, a-dna_wat.rst7
6.3) Minimizing the system
The next stage is to minimize the system prior to performing MD in order to remove any bad contacts created by hydrogenation and solvation. For this we will use the same protocol as we used in the previous section of this tutorial.
Stage 1: 500 steps of steepest descent followed by 500 steps of conjugate gradient minimization, with a 500 kcal/mol restraint force on the DNA molecule (Res 1-20)
A-DNA 10-mer: initial minimization solvent + ions &cntrl imin = 1, maxcyc = 1000, ncyc = 500, ntb = 1, ntr = 1, cut = 10.0 / Hold the DNA fixed 500.0 RES 1 20 END END
Input files: a-dna_min1.in,
a-dna_wat.prmtop, a-dna_wat.rst7
Output files: a-dna_min1.out,
a-dna_min1.ncrst
This takes approximately 153 s (2 min 33 sec) on a 1.14 GHz Intel Core i5.
Stage 2: 1,000 steps of steepest descent followed by 1,500 steps of conjugate gradient minimization with no restraints
A-DNA 10-mer: initial minimization solvent + ions &cntrl imin = 1, maxcyc = 2500, ncyc = 1000, ntb = 1, ntr = 0, cut = 10.0 /
Input files: a-dna_min2.in,
a-dna_wat.prmtop, a-dna_min1.ncrst
Output files: a-dna_min2.out,
a-dna_min2.ncrst
This takes approximately 436.5 s (7 min 16.5 sec) on a 1.14 GHz Intel Core i5.
6.4) Running initial equilibration
The next stage is to start running molecular dynamics. We will initially run 20 ps of heating/equilibration at constant volume in the same fashion as we did for the B-DNA explicit water simulation. This will be done using the same protocols, i.e. 20 ps at constant volume periodic boundaries with weak (10 kcal/mol) restraints on the DNA. An initial temperature of 0 K, final temperature of 300 K, Langevin temperature controls, SHAKE constraints on hydrogen atoms and a 2 fs time step. To save disk space we will only write to our output and nc files every 250 steps (0.5 ps). We will also compress our nc file upon completion using gzip. This will give us a compression ratio of about 60 %. This is important because the next step will be to run 1.8 ns of MD which can create some very big mdcrd files:
Note: The timings given below are for running across multiple processors in a parallel computer. AMBER contains two main MD engines, sander and pmemd. It supports runs in parallel using the Message Passing Interface (MPI). Assuming you have MPI installed and have build the MPI executables (refer to the AMBER manual for instructions) you can run simulations across multiple cores and/or nodes by using the MPI versions of the MD engines (sander.MPI and pmemd.MPI). The command lines below are given for single CPU sander. To get better performance you can switch to PMEMD (if you have the full version of AMBER) which is significantly faster than sander and does not support all the functionality. If you wanted to run in parallel - say using 8 cores in a node - you would run with: mpirun -np 8 $AMBERHOME/bin/pmemd.MPI -O -i ...
A-DNA 10-mer: 20ps MD with res on DNA &cntrl imin = 0, irest = 0, ntx = 1, ntb = 1, cut = 10.0, ntr = 1, ntc = 2, ntf = 2, tempi = 0.0, temp0 = 300.0, ntt = 3, gamma_ln = 1.0, nstlim = 10000, dt = 0.002, ntpr = 250, ntwx = 250, ntwr = 10000 / Keep DNA fixed with weak restraints 10.0 RES 1 20 END END
gzip -9 -v a-dna_md1.nc
Input files: a-dna_md1.in,
a-dna_wat.prmtop, a-dna_min2.ncrst
Output files: a-dna_md1.out,
a-dna_md1.ncrst,
a-dna_md1.nc
This takes approximately 193 s (3 mins 13 s) on eight 3.6 GHz processors of a FX-8150 (using mpirun -np 8 $AMBERHOME/bin/sander.MPI)
Note: For this section of the tutorial I have not included links to the nc files since they are too big to reliably host on a webserver. Not having the nc files will, if you haven't run the simulations yourself, prevent you from interactively conducting some of the analysis below. This should not be a problem, however, since I provide links to the analysis output files. I will also provide links later on to smaller versions of these nc files that have the explicit water removed. These will be sufficient to watch the dynamics in VMD.
6.5) Running further equilibration and production MD
In the next stage of our simulation we are going to run a further 1.8 ns of MD. Rather than run a separate equilibration and production phase we will, for convenience, combine the two since our production conditions will be identical to our final equilibration conditions. Here we will use the same conditions as we did for the B-DNA explicit solvent equilibration. Namely we will run without restraints on the DNA. We shall maintain the temperature at 300 K using the Langevin thermostat, run with constant pressure (1 atm) periodic boundaries, use SHAKE constraints on the hydrogen atoms and a 2 fs time step. Since we will be writing 1.8 ns of trajectory we will only write to our output and nc files every 500 fs (250 steps). We shall also compress all of our files to save disk space as 1.8 ns of simulation will lead to some very big files.
Since this simulation will take a long time to run it is probably not a good idea to run all 1.8 ns in one shot. Thus we will actually run 9 x 200 ps simulations with each successive simulation continuing on from the previous one. In this way if a job crashes we only lose the progress of that job and can restart from where we got to. The input file will be the same in each case:
A-DNA 10-mer: 200ps of MD &cntrl imin = 0, irest = 1, ntx = 7, ntb = 2, pres0 = 1.0, ntp = 1, taup = 2.0, cut = 10.0, ntr = 0, ntc = 2, ntf = 2, tempi = 300.0, temp0 = 300.0, ntt = 3, gamma_ln = 1.0, nstlim = 100000, dt = 0.002, ntpr = 250, ntwx = 250, ntwr = 10000 /
As mentioned in Section 3, when running production simulations and especially when running multiple stages as we do here, with ntt=2/3 you should always change the random seed (ig) between restarts. If you are using AMBER 10 (bugfix.26 or later) or AMBER 11 or later then you can achieve this automatically by setting ig=-1 in the ctrl namelist. We do not do this here in the tutorial for reproducibility but it is generally recommended that you do this in your calculations.
We will run the 9 simulations one after the other using the restrt file from the previous run as the input for the next run. While you could run each of these jobs manually using a command line as below (with the relevant filenames incremented by one each time):
gzip -9 -v a-dna_md2.nc
It is probably best to write a script to run all of these jobs for us so that we can leave it running over night. Here is an example script:
#!/bin/csh set AMBERHOME="/usr/local/amber14" set MDSTARTJOB=2 set MDENDJOB=10 set MDCURRENTJOB=$MDSTARTJOB set MDINPUT=0 echo -n "Starting Script at: " date echo "" while ( $MDCURRENTJOB <= $MDENDJOB ) echo -n "Job $MDCURRENTJOB started at: " date @ MDINPUT = $MDCURRENTJOB - 1 $AMBERHOME/bin/sander -O -i a-dna_md_1800ps.in \ -o a-dna_md$MDCURRENTJOB.out \ -p a-dna_wat.prmtop \ -c a-dna_md$MDINPUT.ncrst \ -r a-dna_md$MDCURRENTJOB.ncrst \ -x a-dna_md$MDCURRENTJOB.nc gzip -9 -v a-dna_md$MDCURRENTJOB.nc echo -n "Job $MDCURRENTJOB finished at: " date @ MDCURRENTJOB = $MDCURRENTJOB + 1 end echo "ALL DONE"
In order to be able to run this script we have to ensure that it is executable:
Now we will run the script - we use nohup so that we can logout without killing the job. The ampersand '&' means run in background. We 'pipe' the output to a log file so that we can simply look at the log file to see the current progress of the job:
This will take a long time to run. Each 200ps stage takes approximately 2,260s (38 min) on eight 1.3 GHz processors of a SGI ALTIX.
Here are all of the output files:
Time Span (ps) | mdout | ncrstrt | nc |
20 - 220 | a-dna_md2.out | a-dna_md2.ncrst | a-dna_md2.nc |
220 - 420 | a-dna_md3.out | a-dna_md3.ncrst | a-dna_md3.nc |
420 - 620 | a-dna_md4.out | a-dna_md4.ncrst | a-dna_md4.nc |
620 - 820 | a-dna_md5.out | a-dna_md5.ncrst | a-dna_md5.nc |
820 - 1020 | a-dna_md6.out | a-dna_md6.ncrst | a-dna_md6.nc |
1020 - 1220 | a-dna_md7.out | a-dna_md7.ncrst | a-dna_md7.nc |
1220 - 1420 | a-dna_md8.out | a-dna_md8.ncrst | a-dna_md8.nc |
1420 - 1620 | a-dna_md9.out | a-dna_md9.ncrst | a-dna_md9.nc |
1620 - 1820 | a-dna_md10.out | a-dna_md10.ncrst | a-dna_md10.nc |
6.6) Analyzing the results
The next stage is to analyze our results and look for anything interesting. First of all lets process the output files:
cd summary
process_mdout.perl ../a-dna_md1.out ../a-dna_md2.out ../a-dna_md3.out ../a-dna_md4.out ../a-dna_md5.out ../a-dna_md6.out ../a-dna_md7.out ../a-dna_md8.out ../a-dna_md9.out ../a-dna_md10.out
Tar file containing the summary files: a-dna_output_summary.tar.gz
Take a look at the summary files and you will see that after initial relaxation the temperature, pressure, volume etc stay fairly constant. This indicates that there are no major problems with the simulation.
Lets calculate the RMSd of the backbone atoms and see what is happening to the DNA structure. While we are at it, we will also get cpptraj to re-image the whole trajectory and also remove the waters from the trajectory file. The reason we do this instead of just hiding the waters in VMD, is because our nc files are now considerably bigger than previously and loading everything into VMD may cause it to run very slowly. It is also fiddly to load 10 separate nc files into a single molecule. This is why we needed the a-dna_charges_only.prmtop file. Thus we will get cpptraj to strip the waters and append all of the nc files to a single file. Here is the input file for cpptraj (note, we don't need to unzip the files before use, cpptraj does this on the fly):
trajin a-dna_md1.nc trajin a-dna_md2.nc trajin a-dna_md3.nc trajin a-dna_md4.nc trajin a-dna_md5.nc trajin a-dna_md6.nc trajin a-dna_md7.nc trajin a-dna_md8.nc trajin a-dna_md9.nc trajin a-dna_md10.nc trajout a-dna_0-1820ps_no_wat.nc rms first out a-dna_0-1820ps.rmsfit @P,O3',O5',C3',C4',C5' time 0.50 center :1-20 image familiar strip :WAT
Output files: a-dna_0-1820ps.rmsfit, a-dna_0-1820ps_no_wat.nc [20MB]
First of all, lets plot the backbone atom RMSd fit:
Interesting... The final structure differs from the initial structure by over 5 angstroms. This is quite a large RMSd and would in some cases indicate a serious problem with the MD simulation. However, the RMSd increases slowly between 20 ps (when the restraints were switched off) and 500 ps. From 500 ps to end of the simulation the RMSd remains more or less constant at between 4.5 and 5.5 angstroms. This suggests that our starting structure changed slowly changed over the first 500 ps of our simulation into a more stable structure. This more stable structure then remains intact for the remainder of the simulation. Could it be that our A-form DNA converted to B-form? Lets visualize the trajectory and note what we see.
As already mentioned the trajectories including water are a little large to open comfortably in VMD so we shall use the stripped trajectory cpptraj created for us. For this we need to use the prmtop file we created before solvation a-dna_charges_only.prmtop.
Load the a-dna_charges_only.prmtop file as AMBER7 Parm. Then load the a-dna_0-1820ps_no_wat.nc file into this molecule as "AMBER Coordinates with Periodic Box" (Make sure you unzip it first). You should find that the trajectory loads just showing the DNA and charges (no water). Take a good look at the trajectory, have a look down the minor groove of the DNA and observe what happens to the structure:
0 ps Minor Groove (A-Form) |
0 ps Major Groove (A-Form) |
100 ps Minor Groove (Still A-Form) |
200 ps Minor Groove (Still A-Form) |
300 ps Minor Groove (A-Form) |
400 ps Minor Groove (Distorted A-Form) |
500 ps Minor Groove (B-Form) |
600 ps Minor Groove (B-Form) |
700 ps Minor Groove (B-Form) |
800 ps Minor Groove (B-Form) |
1820 ps Minor Groove (B-Form) |
1820 ps Major Groove (B-Form) |
As you can see somewhere between step 600 and 800 (300 ps and 400 ps) our structure begins to change from A-form DNA to B-form, you may need to rotate the structure in order for this to become apparent. By step 1,000 (500 ps) the conversion is complete. From this point on the structure of our DNA remains in B-form.
This demonstrates a practical use of such an MD simulation. We have been able to use molecular dynamics to move away from a local minima (our A-form DNA) to a lower minima (B-form DNA).
Finally, lets calculate the average structure over the 1,800 ps of our simulation in which the DNA was not restrained, nc files 2 to 10. Again, for this we will use cpptraj. Here is the input file to create an average pdb from all 3600 structures in our constant pressure nc trajectory file:
trajin a-dna_md2.nc trajin a-dna_md3.nc trajin a-dna_md4.nc trajin a-dna_md5.nc
trajin a-dna_md6.nc
trajin a-dna_md7.nc
trajin a-dna_md8.nc
trajin a-dna_md9.nc
trajin a-dna_md10.nc
average a-dna_20-1820ps_average.pdb pdb nowrap
Output file: a-dna_20-1820ps_average.pdb
And there you have it, the average structure over the entire constant pressure stage of our simulation.
You now have the necessary knowledge to set-up and run your own simulations. Have a go and see how you get on. To learn about other aspects of AMBER, including more complex simulations that involve creating your own residues please see the AMBER tutorials.
Feel free to use what you have learned to experiment with alternative starting geometries. Do they all change to B-DNA if run for long enough? What about in a mixture of 80% water, 20% ethanol?
If you have any comments on this tutorial please email me (Ross Walker). For queries concerning AMBER or MD simulations in general please post to the AMBER mailing list - see http://ambermd.org/ for details on how to subscribe.
(Note: These tutorials are meant to provide
illustrative examples of how to use the AMBER software suite to carry
out simulations that can be run on a simple workstation in a reasonable
period of time. They do not necessarily provide the optimal choice of
parameters or methods for the particular application area.)
Copyright Ross Walker 2015