Replica Exchange

(Note: These tutorials are meant to provide illustrative examples of how to use the AMBER software suite to carry out simulations that can be run on a simple workstation in a reasonable period of time. They do not necessarily provide the optimal choice of parameters or methods for the particular application area.)
HTML Pages Copyright Ross Walker 2008

Replica Exchange

Formerly known as TUTORIAL A7

By: Dan Roe, Asim Okur and Carlos Simmerling (SUNY Stony Brook) Adapted for AMBER Website by: Ross Walker (SDSC)

Please note: This tutorial will only work with AMBER v10 or later.

1) Before you begin

This tutorial is intended to be a basic guideline for running Replica Exchange (REMD) simulations with AMBER v10. This tutorial assumes that you have experience running MD simulations with AMBER and know about the terminology and typical variables used for standard MD simulations. If you are NOT familiar with setting up a system, equilibration and running MD simulations please consult the more basic tutorials BEFORE attempting to run REMD simulations. You should also be familiar with the REMD theory. Make sure to read the paper by Sugita and Okamoto ("Replica-exchange molecular dynamics method for protein folding." Chemical Physics Letters 314(1-2): 141-151, 1999) and the relevant sections in the AMBER manual to learn about REMD.

2) Setting up the system

In this section we will build the system via Leap and run minimization via Sander. Here a brief description of the system and the procedure used to generate the topology and coordinate files.

For this tutorial we will build the peptide Ala₁₀ in Leap. We will build the linear sequence, and we will use the ff99SB force field and the "mbondi2" radii that are appropriate for the igb=5 option in sander. Here is the input file for leap:

leap.in

source leaprc.ff99SB
set default PBradii mbondi2
m = sequence { ACE ALA ALA ALA ALA ALA ALA ALA ALA ALA ALA NHE }
saveamberparm m ala10.prmtop ala10.inpcrd
quit

Have tleap read this file with the following command: "tleap -f leap.in"

This should produce ala10.prmtop and ala10.inpcrd.

Make sure to check the leap output and log file for any errors.

Next we will run minimization to relax the input structure before we run molecular dynamics. The following input file should give a minimized structure suitable for starting MD.

min.in

energy minimization
&cntrl
imin=1, maxcyc=1000, ncyc=500,
ntwr = 1000, ntpr = 100,
cut = 999.0, rgbmax = 999.0,
ntb = 0, igb = 5, saltcon = 0.0,
/

$AMBERHOME/exe/sander -O -i min.in -o min.out -p ala10.prmtop -c ala10.inpcrd -r min.rst

3) Determining the number of replicas and temperatures for each replica

Before starting a replica exchange simulation we need to determine the number of replicas needed and the temperatures for each replica for the simulated system. To obtain reasonable exchange probabilities we need to have overlaps between the potential energy distributions of neighboring replicas. Closer temperatures would increase this overlap and exchange probabilities but result in more replicas for the given temperature range. Finding an optimum temperature distribution is difficult and there are many ways to determine the temperatures which are beyond the scope of this tutorial. You are advised to consult the literature for example on calculating optimum temperatures. Here we will just use an example set of temperatures so you understand the mechanics of running REMD simulations.

To determine the number of replicas and temperature distribution one typically needs to determine the number of atoms in the system. This information can be found in both the topology and coordinate files. The first number in a coordinate (or restart) file shows the number of atoms.

Alternatively in the topology file the system description again starts with the number of atoms:

For both cases the number of atoms is highlighted with the blue arrow.

After determining the number of atoms (in this case 109) for the system one can determine reasonable numbers of replicas to use and a reasonable initial temperature distribution. Typically the number of replicas would be related to the square root of the number of atoms and the temperature distribution would be chosen to be a geometric progression. There is much discussion in the literature about this and you are advised to do a thorough literature search.

Usually REMD simulations are run for a temperature range between 270 - 600K. Depending on your system a different temperature range may be required, however, an in-depth discussion of this is beyond the scope of this tutorial. We always need to have an even number of replicas since exchanges are always attempted pair-wise. For the purposed of this tutorial we will use 8 replicas and we will use the following temperature distribution.

269.5
300.0
334.0
371.8
413.9
460.7
512.9
570.9

You should copy the list of temperatures produced and copy them into a new file called "temperatures.dat" which will be used by the setup scripts later on.

4) Generating chirality restraints

To enhance conformational sampling REMD runs typically need to go to very high temperatures (ca. 500-600K). At such temperatures unwanted rotations around the peptide bond might occur leading to non-physical chiralities. To prevent this we need to use chirality restraints on the backbone. We can use the "makeCHIR_RST" script that is provided with AMBER 10 in the exe directory to do this.

To use this script we need a pdb file for our system. We can obtain the pdb file through the ptraj or ambpdb programs available with AMBER or alternatively we could have had leap create one during the system setup stage.

> $AMBERHOME/exe/ambpdb -p ala10.prmtop < ala10.inpcrd > ala10.pdb

Once we have the pdb file we can generate the restraints file using the following command:

> $AMBERHOME/exe/makeCHIR_RST ala10.pdb ala10_chir.dat

Take a look at this file and make sure you understand what it is doing, you can find more information about distance, angle and dihedral restraints in the AMBER manual.

5) Equilibration

Before starting the REMD simulations we will run short simulations with each replica to equilibrate them to their individual temperatures. This step is not strictly necessary since sander can assign velocities but this step will show how sander can be used to perform multiple independent (non-exchanging) MD runs. Running all replicas using the same inpcrd can also cause instability since they will initially have the same energies and rapidly exchange across temperatures, perhaps causing more distortion in the initial coordinates than is desired. Since we need to independently equilibrate each replica we have two choices. We can either run each equilibration individually, so running eight separate MD simulations or we can use Multisander to perform the task in parallel. Multisander is a mode of running sander that simply runs a single parallel MPI sander job that is generating multiple MD simulations rather than a single simulation. This works very much like running REMD, so setting it up and running it is a good introduction to the usage for REMD. This type of simulation will also run efficiently across a parallel cluster if you have access to one.

Generating the input files for multisander

Since we have 8 replicas with temperatures from 269.5 to 570.9 K we need 8 sets of input files for multisander. The script "setup_equilibrate_input.x" will generate the input files (mdin) for us. Take a look at this file and make sure you understand how it works. The purpose of this script is to read the temperatures from the temperatures.dat file and adjust temp0 of each mdin file to match. The script also ensures that the seed for the random number generator is different for each structure. This is very important, especially when running Langevin simulations. Without this all sorts of weird correlations can occur during the MD. This script also sets up the group file for us which will be discussed shortly.

(Note: when running multisander it is important that every file that sander writes to be defined with a unique name. If this is not done then performance can be very very poor. Hence we must specify unique names for mdinfo as well as mdout, rst and mdcrd.)

For this exercise we will run 200ps (nstlim=100000, dt=0.02) of simulation for each replica slowly heating them to their respective temperatures using a Langevin thermostat (ntt=3, gamma_ln=1.0). To read the chirality restraints we use "nmropt = 1" to the restraints on and "DISANG=ala10_chir.dat" defines the file to read the restraints from. Since all simulations will use the same restraints we do not need to duplicate this file. We start by making a single input file called mdin. This file is shown below. Note the " temp0=XXXXX" and "ig=RANDOM_NUMBER", these will be filled in automatically by the setup_equilibrate_input.x script.

equilibrate.mdin

Equilibration
 &cntrl
   irest=0, ntx=1, 
   nstlim=100000, dt=0.002,
   irest=0, ntt=3, gamma_ln=1.0,
   temp0=XXXXX, ig=RANDOM_NUMBER,
   ntc=2, ntf=2, nscm=1000,
   ntb=0, igb=5,
   cut=999.0, rgbmax=999.0,
   ntpr=500, ntwx=500, ntwr=100000,
   nmropt=1,
 /
 &wt TYPE='END'
 /
DISANG=ala10_chir.dat

Since we will run 8 independent simulations in one multisander run we need a "groupfile" to define the individual input and output files for each simulation. An explanation of the groupfile format can be found in the AMBER manual. Essentially it is just a set of nreplica lines specifying what we would normally have included on the command line for a regular sander run. The setup_equilibrate_input.x script will produce this for us automatically.

Once we have generated the generic mdin file we can then run the setup_equilibrate_input.x to produce the multiple input files and the groupfile.

> ./setup_equilibrate_input.x

This should produce the following output:

8
TEMPERATURE: 269.5 K ==> FILE: equilibrate.mdin.001
TEMPERATURE: 300.0 K ==> FILE: equilibrate.mdin.002
TEMPERATURE: 334.0 K ==> FILE: equilibrate.mdin.003
TEMPERATURE: 371.8 K ==> FILE: equilibrate.mdin.004
TEMPERATURE: 413.9 K ==> FILE: equilibrate.mdin.005
TEMPERATURE: 460.7 K ==> FILE: equilibrate.mdin.006
TEMPERATURE: 512.9 K ==> FILE: equilibrate.mdin.007
TEMPERATURE: 570.9 K ==> FILE: equilibrate.mdin.008
N REPLICAS  = 8
 Done.

Along with a groupfile that should be as follows:

equilibrate.groupfile

-O -rem 0 -i equilibrate.mdin.001 -o equilibrate.mdout.001 -c min.rst -r equilibrate.rst.001 -x equilibrate.mdcrd.001 -inf equilibra
te.mdinfo.001 -p ala10.prmtop
-O -rem 0 -i equilibrate.mdin.002 -o equilibrate.mdout.002 -c min.rst -r equilibrate.rst.002 -x equilibrate.mdcrd.002 -inf equilibra
te.mdinfo.002 -p ala10.prmtop
-O -rem 0 -i equilibrate.mdin.003 -o equilibrate.mdout.003 -c min.rst -r equilibrate.rst.003 -x equilibrate.mdcrd.003 -inf equilibra
te.mdinfo.003 -p ala10.prmtop
-O -rem 0 -i equilibrate.mdin.004 -o equilibrate.mdout.004 -c min.rst -r equilibrate.rst.004 -x equilibrate.mdcrd.004 -inf equilibra
te.mdinfo.004 -p ala10.prmtop
-O -rem 0 -i equilibrate.mdin.005 -o equilibrate.mdout.005 -c min.rst -r equilibrate.rst.005 -x equilibrate.mdcrd.005 -inf equilibra
te.mdinfo.005 -p ala10.prmtop
-O -rem 0 -i equilibrate.mdin.006 -o equilibrate.mdout.006 -c min.rst -r equilibrate.rst.006 -x equilibrate.mdcrd.006 -inf equilibra
te.mdinfo.006 -p ala10.prmtop
-O -rem 0 -i equilibrate.mdin.007 -o equilibrate.mdout.007 -c min.rst -r equilibrate.rst.007 -x equilibrate.mdcrd.007 -inf equilibra
te.mdinfo.007 -p ala10.prmtop
-O -rem 0 -i equilibrate.mdin.008 -o equilibrate.mdout.008 -c min.rst -r equilibrate.rst.008 -x equilibrate.mdcrd.008 -inf equilibra
te.mdinfo.008 -p ala10.prmtop

As you can see it resembles the regular sander command line options. The -rem 0 specifies that we are using multisander but that do not wish to do actual replica exchange. The script should also produce eight mdin files:

equilibrate.mdin.001	equilibrate.mdin.002	equilibrate.mdin.003	equilibrate.mdin.004
equilibrate.mdin.005	equilibrate.mdin.006	equilibrate.mdin.007	equilibrate.mdin.008

You should check that each mdin file produced has the correct temperature and that each contains a different value for the random number seed (ig=xxxx).

We are now ready to run the equilibration. When we want sander to run in multisander mode our command line options are somewhat different to a regular run. Firstly we have to use sander.MPI since multisander only runs in parallel. In addition the number of cpus we specify for the mpirun command must be a multiple of the number of replicas, so in this case a minimum of 8. While you can run such a run on a single processor desktop it will generally be much more efficient to run such a simulation on at least eight cores, either an 8 core desktop machine or over a parallel cluster. You can also specify more than eight cpus if more are available. For example specifying 16 cpus will use 2 cpus per replica. The command line I used was as follows (this may need adjusting based on your mpi implementation, queuing system, cluster architecture etc.)

mpirun -np 8 $AMBERHOME/exe/sander.MPI -ng 8 -groupfile equilibrate.groupfile

Here the -ng 8 option specifies that there are 8 groups and the -groupfile specifies the name of the groupfile to be read and indicates to sander.MPI that we want to run in multisander mode.

Depending on the speed of your system this could take anywhere from a couple of minutes to several hours. You can check the progress by opening a separate terminal and tailing one of the mdout files. When it is complete you should have 8 rst files, 8 mdcrd files and 8 mdout files. A tar of them is available here: equilibrate_output.tar.gz

You should check each output file to make sure that it read the restraints correctly, ran without problems, had the correct target temperature etc. You should also visualize the mdcrd files to make sure there are no problems.

Now we have equilibrated our initial structures we can move on to running the actual REMD simulations.

6) REMD Simulations

Running REMD is essentially the same as running multisander. During REMD each multisander job will periodically communicate and attempt an exchange. We need to use additional variables in the input files for each replica and the groupfile to be able to run REMD.

Input files for REMD

As with multisander we will need an input file for each replica. The input files will be the same except for the target temperature (temp0) and the random number seed (ig). There are also some other minor changes compared with non-REMD simulations. Here are the necessary changes and explanations:

nstlim = 500 During REMD this variable determines the number of MD steps between each exchange attempt. In this case we set it to 500 steps which with a 2fs timestep gives us 1ps between exchange attempts.
numexchg = 1000 This specifies the number of exchange attempts during the simulation. The total simulation length will be numexch * nstlim * dt which in this case yields 1ns of simulation.

Note that unlike normal MD, trajectory output can be written less frequently than nstlim steps. In other words we can set nstlim=500 and still use ntwx=1000. This allows us to write less frequently than once per exchange (this was not supported in AMBER 9 REMD). This also allows us to have very short exchange frequencies, say once every 10 steps without generating huge trajectory files.

We need to generate the input files again. First we create a generic mdin file:

remd.mdin

Equilibration
 &cntrl
   irest=0, ntx=1, 
   nstlim=500, dt=0.002,
   irest=0, ntt=3, gamma_ln=1.0,
   temp0=XXXXX, ig=RANDOM_NUMBER,
   ntc=2, ntf=2, nscm=1000,
   ntb=0, igb=5,
   cut=999.0, rgbmax=999.0,
   ntpr=100, ntwx=1000, ntwr=100000,
   nmropt=1,
   numexchg=1000,
 /
 &wt TYPE='END'
 /
DISANG=ala10_chir.dat

We then have a slightly modified version of the setup script above to build the multiple mdin files and the groupfile.

setup_remd_input.x

#!/bin/bash -f

if [ -f groupfile ];
then
  rm groupfile
fi

nrep=`wc temperatures.dat | awk '{print $1}'`
echo $nrep
count=0
for TEMP in `cat temperatures.dat`
do
  let COUNT+=1
  REP=`printf "%03d" $COUNT`
  echo "TEMPERATURE: $TEMP K ==> FILE: remd.mdin.$REP"
  sed "s/XXXXX/$TEMP/g" remd.mdin > temp
  sed "s/RANDOM_NUMBER/$RANDOM/g" temp > remd.mdin.$REP
  echo "-O -rem 1 -remlog rem.log -i remd.mdin.$REP -o remd.mdout.$REP -c equilibrate.rst.$REP
-r remd.rst.$REP -x remd.mdcrd.$REP -inf remd.mdinfo.$REP -p ala10.prmtop" >> remd.groupfile

  rm -f temp
done
echo "#" >> groupfile

echo "N REPLICAS  = $nrep"
echo " Done."

Note that we have made some changes to the groupfile generation part of this script:

-rem 1 Run REMD.
-remlog rem.log Save the log file into "rem.log" to keep track of each replica.
We are using the restart files from the equilibration as the input coordinates for each replica.

Running this script creates the necessary input files:

> ./setup_remd_input.x

remd.groupfile
remd.mdin.001	remd.mdin.002	remd.mdin.003	remd.mdin.004
remd.mdin.005	remd.mdin.006	remd.mdin.007	remd.mdin.008

We are now ready to run the replica exchange MD run. We run this just as we did for the equilibration:

> mpirun -np 8 $AMBERHOME/exe/sander.MPI -ng 8 -groupfile remd.groupfile

Depending on the speed of your system this could take anywhere from a couple of minutes to several hours. You can check the progress by opening a separate terminal and tailing one of the mdout files. When it is complete you should have 8 rst files, 8 mdcrd files, 8 mdout files and a rem.log file. A tar of them is available here: remd_output.tar.gz

You should note that the output here are for REPLICAS, not for temperatures. This is a change from AMBER 9 and allows each replica MD to write to the same output files, rather than closing and re-opening the file on each exchange. This greatly improves performance, especially for short exchange frequencies. However, this means that we need to carry out extra processing on the output files in order to obtain data for ensembles sampled at a specific temperature. This processing is described below.

An additional file "rem.log" is also created which has summary information about each replica and exchanges. The log files looks like:

# Replica Exchange log file
# numexchg is       1000
# REMD filenames:
#   remlog= rem.log
#   remtype= rem.type
# Rep#, Velocity Scaling, T, Eptot, Temp0, NewTemp0, Success rate (i,i+1), ResStruct#
# exchange        1
 1     -1.00      0.00     25.08    269.50    269.50      0.00      -1
 2     -1.00      0.00     16.21    300.00    300.00      0.00      -1
 3     -1.00      0.00     49.19    334.00    334.00      0.00      -1
 4      1.06      0.00     57.15    371.80    413.90      2.00      -1
 5      0.95      0.00     66.86    413.90    371.80      0.00      -1
 6     -1.00      0.00     60.58    460.70    460.70      0.00      -1
 7     -1.00      0.00     67.98    512.90    512.90      0.00      -1
 8     -1.00      0.00     93.67    570.90    570.90      0.00      -1
# exchange        2
 1     -1.00    244.55      0.95    269.50    269.50      0.00      -1
 2     -1.00    251.98     13.77    300.00    300.00      0.00      -1
 3      1.06    241.54     31.16    334.00    371.80      1.00      -1
 4      0.95    311.46     29.45    371.80    334.00      1.00      -1
 5      1.06    297.63     39.26    413.90    460.70      1.00      -1
 6      0.95    381.98     34.99    460.70    413.90      0.00      -1
 7     -1.00    430.62     54.54    512.90    512.90      0.00      -1
 8     -1.00    424.99     80.56    570.90    570.90      0.00      -1
# exchange        3
 1     -1.00    238.63     11.17    269.50    269.50      0.00      -1
 2     -1.00    291.79     18.81    300.00    300.00      0.00      -1
 3     -1.00    325.13     23.99    334.00    334.00      0.67      -1
 4     -1.00    371.31     41.80    371.80    371.80      0.67      -1
 5     -1.00    375.01     54.78    413.90    413.90      0.67      -1
 6     -1.00    364.77     64.48    460.70    460.70      0.00      -1
 7     -1.00    449.13     81.14    512.90    512.90      0.00      -1
 8     -1.00    599.45     92.80    570.90    570.90      0.00      -1
# exchange        4
 1     -1.00    252.84     12.33    269.50    269.50      0.00      -1
 2     -1.00    311.67     31.03    300.00    300.00      0.00      -1
 3     -1.00    302.79     19.00    334.00    334.00      0.50      -1
 4     -1.00    402.99     44.30    371.80    371.80      0.50      -1
 5      1.06    398.00     54.78    413.90    460.70      1.00      -1
 6      0.95    461.72     64.51    460.70    413.90      0.00      -1
 7     -1.00    449.58     72.89    512.90    512.90      0.00      -1
 8     -1.00    628.56    111.20    570.90    570.90      0.00      -1

The columns provide (in this order) the replica number (corresponding to the order of files listed in the groupfile), the velocity scaling factor if an exchange has occurred (-1 otherwise), the instantaneous system temperature (but note that the thermostat temperature is used for the exchange calculation), the potential energy, the thermostat bath temperature during the previous MD stage, the new bath temperature for the next MD stage, the overall running average success rate between the Temp0 temperature and the next higher temperature (0 to 10, but it can be greater than 1 at the beginning since we divide the # of successes by 0.5* the total attempts since each pair is attempted only every other exchange), and finally a last integer that is used only for reservoir REMD, and is -1 otherwise.

7) Analyzing REMD results

Mdout Files

The mdout file will have additional data in a REMD run that described the information being used for the exchange calculation. An example is shown below, where the normal sander energy output (specified by ntpr) is written, along with information about the replica number (REPNUM) and exchange attempt counter. This section is followed by additional information about the exchange attempt, such as the energy difference between replicas, the Metropolis probability and the random number that was generated for comparison. In this example, the exchange was successful.

===============================================================================

 NSTEP =      500   TIME(PS) =     205.000  TEMP(K) =   287.31  PRESS =     0.0
 Etot   =        96.6216  EKtot   =        77.6484  EPtot      =        18.9731
 BOND   =        23.5875  ANGLE   =        51.8775  DIHED      =        84.7745
 1-4 NB =        17.5760  1-4 EEL =       629.2285  VDWAALS    =       -18.4421
 EELEC  =      -687.3208  EGB     =       -82.4913  RESTRAINT  =         0.1832
 EAMBER (non-restraint)  =        18.7899
 TEMP0  =       269.5000  REPNUM  =              1  EXCHANGE#  =              5
 ------------------------------------------------------------------------------

 NMR restraints: Bond =    0.000   Angle =     0.000   Torsion =     0.183
===============================================================================
REMD: myEptot=         18.9731 myTargetTemp= 269.50 mytemp= 287.31
==========================REMD EXCHANGE CALCULATION==========================
Exch=          6 RREMD= 0
Replica          Temp= 269.50 Indx=  1 Rep#=  1 EPot=      18.97
Partner          Temp= 300.00 Indx=  2 Rep#=  2 EPot=      19.93
Not controlling exchange.
Rand=       0.274923E+00 MyScaling=       1.06 Success= T
========================END REMD EXCHANGE CALCULATION========================
REMD: checking to see if bath T has changed: 269.50->300.00
REMD: scaling velocities by    1.055 to match new bath T  300.000
|  # of SOLUTE  degrees of freedom (RNDFP):     272.
|  # of SOLVENT degrees of freedom (RNDFS):       0.
|  NDFMIN =     272.     NUM_NOSHAKE =      0     CORRECTED RNDFP =     272.
|  TOTAL # of degrees of freedom (RNDF) =     272.

Analyzing REMD trajectories

REMD will produce trajectories for each replica. Each trajectory represents the snapshots sampled in the continuous MD run, which changed in thermostat temperature during the run. Furthermore, this trajectory is not standard Amber format- it has additional information about the temperature at each snapshot so that ptraj can reconstruct the data for individual temperatures (or standard format trajectories for each replica). In the next section we provide scripts and explanations for each of these processes: extracting temperature trajectories in standard format from the entire set of REMD trajectories, and extracting a standard format replica trajectory from an individual REMD trajectory.

Here is an example of a replica format trajectory written by sander. Note the "REMD" lines. Instead of using ptraj to convert to a standard format replica trajectory, one could simply use "grep -v REMD" to remove the REM lines. However, if one wants to obtain temperature-based trajectories or data, then using ptraj to process the REMD format trajectories is required. The "REMD" line gives the following information: replica number, exchange number, MD step number, and thermostat temperature.

REMD         1        4     2000  269.500
   4.115  25.625  -7.082   5.088  25.200  -6.837   5.449  24.747  -7.761   4.926
  24.435  -6.078   5.977  26.261  -6.288   7.154  26.484  -6.594   5.485  26.960
  -5.243   4.582  26.747  -4.844   6.317  28.031  -4.632   6.687  28.768  -5.344
   5.345  28.815  -3.711   5.867  29.641  -3.227   4.515  29.203  -4.301   5.025
  28.096  -2.958   7.520  27.594  -3.850   8.407  28.412  -3.559   7.513  26.336
  -3.456   6.761  25.725  -3.739   8.736  25.654  -2.967   9.600  26.250  -3.263
   8.729  25.502  -1.433   8.610  26.505  -1.023   7.878  24.891  -1.131   9.687
  25.057  -1.165   8.913  24.288  -3.664   7.972  23.524  -3.695  10.115  24.052
  -4.154  10.822  24.713  -3.867  10.388  22.890  -5.039   9.462  22.692  -5.579
  11.441  23.331  -6.059  12.302  23.846  -5.632  11.821  22.459  -6.592  10.927
  23.890  -6.841  10.650  21.615  -4.156  10.722  21.588  -2.896  10.939  20.520
  -4.950  10.833  20.723  -5.933  11.205  19.203  -4.419  10.480  19.162  -3.606
  11.004  18.105  -5.433  11.437  17.171  -5.075   9.948  17.892  -5.602  11.563
  18.409  -6.318  12.652  19.093  -3.826  13.680  19.414  -4.445  12.888  18.699
  -2.555  12.098  18.336  -2.042  14.318  18.608  -2.045  14.882  19.440  -2.467
  14.297  18.746  -0.523  15.348  18.705  -0.239  13.856  19.711  -0.271  13.744
  17.964  -0.003  15.080  17.340  -2.475  14.441  16.387  -2.918  16.425  17.367
  -2.411  16.805  18.275  -2.185  17.266  16.394  -3.028  17.109  16.366  -4.107
  18.656  16.752  -2.761  18.874  17.149  -1.769  19.399  15.993  -3.007  18.955
  17.534  -3.459  17.015  14.967  -2.481  16.636  14.827  -1.306  17.252  13.910
  -3.289  17.711  14.148  -4.156  16.967  12.515  -2.978  15.880  12.436  -2.956
  17.435  11.759  -4.266  17.204  10.697  -4.187  17.015  12.244  -5.147  18.519
  11.816  -4.366  17.611  12.092  -1.694  16.953  11.377  -0.936  18.900  12.398
  -1.471  19.434  12.736  -2.259  19.663  12.239  -0.217  19.055  11.655   0.474
  20.932  11.483  -0.647  20.724  10.769  -1.444  21.824  12.038  -0.938  21.282
  10.963   0.245  19.785  13.621   0.542  20.875  13.975   0.968  18.624  14.306
   0.805  17.810  14.041   0.269  18.620  15.479   1.739  19.469  16.136   1.553
  17.397  16.330   1.455  17.506  17.269   1.999  17.341  16.447   0.373  16.510
  15.815   1.824  18.781  15.092   3.230  18.626  13.945   3.612  19.210  15.971
   4.095  19.435  16.883   3.724  19.343  15.881   5.539  18.743  15.108   6.019
  20.826  15.523   5.765  21.290  16.498   5.917  20.800  14.993   6.717  21.265
  14.962   4.940  19.116  17.267   6.188  18.864  18.323   5.605  19.189  17.409
   7.529  19.504  16.636   8.097  19.014  18.339   7.883
REMD         1        6     3000  269.500
   4.389  26.138  -9.505   5.070  25.434  -9.026   5.214  24.609  -9.724   4.602
  25.014  -8.137   6.383  26.047  -8.737   7.442  25.556  -9.136   6.354  27.293

Extracting replica trajectories

The following script can be used with ptraj to read in the REMD trajectories in the modified format written by sander and then write out standard trajectories that can be used in other programs that read the AMBER format. The script looks like a normal ptraj script, with the exception that it loops over all replicas, and ptraj will automatically recognize that the trajin uses the modified REMD format. If analysis of data from the replica trajectories is desired, this can be done as well. In the example below, a distance is calculated, and each numbered output file will have the time sequence of this distance for the corresponding replica. The trajectory format only needs to be converted for use with external programs- ptraj can directly perform analysis on the modified REMD format and it is not necessary to write out a new trajectory file. This script assumes that ptraj is in your PATH.

create_replica_trajs.x

#!/bin/csh

set i = "1"
while ($i < 8)
set ext=`printf "%3.3i" $i`

ptraj ala10.prmtop << EOF
trajin remd.mdcrd.$ext
trajout remd.reptraj.$ext nobox

distance d1 out dist.rep.$ext :2@CA :8@CA

go

EOF

@ i++
end
wait

Running this script will create the following files:

remd.reptraj.001	remd.reptraj.002	remd.reptraj.003	remd.reptraj.004
remd.reptraj.005	remd.reptraj.006	remd.reptraj.007	remd.reptraj.008
dist.rep.001	dist.rep.002	dist.rep.003	dist.rep.004
dist.rep.005	dist.rep.006	dist.rep.007	dist.rep.008

The output from running the above script should show that ptraj successfully recognized the new REMD format and is processing the data for only the single replica.

INPUT COORDINATE FILES
  File (remd.mdcrd.008) is an AMBER REMD (new format) trajectory, 0 files total (First index is 000),
  only frames from this file will be used (remdtraj not specified),  each file has 500 sets

OUTPUT COORDINATE FILE
  File (remd.reptraj.008) is an AMBER trajectory

ACTIONS
  1>  DISTANCE: between the atoms selections center of mass will be
      saved to array named d1

Processing AMBER REMD trajectory (new format) file remd.mdcrd.008

Set      1 .................................................
Set     50 .................................................
Set    100 .................................................
Set    150 .................................................
Set    200 .................................................
Set    250 .................................................
Set    300 .................................................
Set    350 .................................................
Set    400 .................................................
Set    450 .................................................
Set    500 

PTRAJ: Successfully read in 500 sets and processed 500 sets.

Extracting temperature trajectories

A typical use of REMD is to obtain thermodynamic ensembles at temperatures of interest (e.g. 300 K). Thus the replica-based trajectory files need to be converted to sets of snapshots for each temperature. This is done in ptraj by specifying the first replica trajectory in the trajin command, along with a command that a specific temperature should be analyzed. No matter which temperature is desired, the first trajectory in the sequence should be used for the trajin command. Ptraj will then open all of the replica trajectories and extract only those frames corresponding to the temperature that the user requested. For this to function properly, ptraj currently requires that the files are numbered in sequence with a 3 digit extension (e.g. file.000 to file.007). An example is shown below. Note that �remdtraj� is used to signal ptraj that we want a specific temperature analyzed, and �remdtrajtemp� followed by the actual temperature tells ptraj which temperature to use. Note also that the requested temperature must match one of those found in the REMD lines in the modified format (which were set in the temperatures.dat file described above). In this case we will extract data for 300 K, write out all frames at 300 K to a new trajectory, and also perform a distance analysis. As with the replica trajectories, the trajout command is optional and only needed if one wishes to perform analysis outside of ptraj.

extract_300k_traj.x

#!/bin/csh

ptraj ala10.prmtop << EOF
trajin remd.mdcrd.001 remdtraj remdtrajtemp 300.0
trajout remd.300K.mdcrd nobox

distance d1 out dist.300K.dat :2@CA :8@CA

go
EOF

This time the ptraj output should show that it is reading all of the REMD files and is processing the data for a single replica:

INPUT COORDINATE FILES
  File (remd.mdcrd.001) is an AMBER REMD (new format) trajectory, 8 files total (First index is 001),
  frames at 300.000000 K will be used,  each file has 500 sets

OUTPUT COORDINATE FILE
  File (remd.300K.mdcrd) is an AMBER trajectory

ACTIONS
  1>  DISTANCE: between the atoms selections center of mass will be
      saved to array named d1

Processing AMBER REMD trajectory (new format) files remd.mdcrd.001 -> remd.mdcrd.008
  Opening other REMD files.
    Opening remd.mdcrd.002
    Opening remd.mdcrd.003
    Opening remd.mdcrd.004
    Opening remd.mdcrd.005
    Opening remd.mdcrd.006
    Opening remd.mdcrd.007
    Opening remd.mdcrd.008
  Done.

Set      1 .................................................
Set     50 .................................................
Set    100 .................................................
Set    150 .................................................
Set    200 .................................................
Set    250 .................................................
Set    300 .................................................
Set    350 .................................................
Set    400 .................................................
Set    450 .................................................
Set    500 

PTRAJ: Successfully read in 500 sets and processed 500 sets.

This should produce the following output files: dist.300K.dat, remd.300K.mdcrd.

Feel free to visualize the trajectory file in a program of your choice.

It is also possible to automate this kind of analysis across all temperature. The following scrip will loop over all of the temperatures in the temperature.dat file and extract the temperature trajectories and run analysis:

analyze_all_temps.x

#!/bin/bash

TEMPLIST=`cat temperatures.dat`

for T in $TEMPLIST ; do

ptraj ala10.prmtop <<EOF
trajin remd.mdcrd.001 remdtraj remdtrajtemp $T
trajout remd.Ttraj.$T
go
EOF

ptraj ala10.prmtop <<EOF
trajin remd.Ttraj.$T
distance d1 out dist.$T.dat :2@CA :8@CA
go
EOF

done

This will produce the following files:

remd.Ttraj.269.5	remd.Ttraj.300.0	remd.Ttraj.334.0	remd.Ttraj.371.8
remd.Ttraj.413.9	remd.Ttraj.460.7	remd.Ttraj.512.9	remd.Ttraj.570.9
dist.269.5.dat	dist.300.0.dat	dist.334.0.dat	dist.371.8.dat
dist.413.9.dat	dist.460.7.dat	dist.512.9.dat	dist.570.9.dat

8) Consideration for production calculations

Especially when using supercomputing centers where wallclock time for individual jobs are limited (and charged for), the jobs must be frequently restarted for long simulations. It is important to check the restart files before restarting the REMD run, especially if the job was killed before it was completed. You should check the restart file headers and make sure that a complete set exists- meaning that there is a restart file for each temperature. It can happen that the restart files will not all be from the same time index.

Hopefully this tutorial has provided you with the basics of how to setup, run and postprocess REMD simulations. You are recommended to check the literature for example applications of replica exchange MD.