Setting up a DNA-Ligand System - Section 1

(Note: These tutorials are meant to provide illustrative examples of how to use the AMBER software suite to carry out simulations that can be run on a simple workstation in a reasonable period of time. They do not necessarily provide the optimal choice of parameters or methods for the particular application area.)
Copyright Ross Walker 2008

Setting up a DNA-Ligand System

By: Bryan Leland^a, David Paul^a, Brent Krueger^a & Ross Walker^b
^aDept. of Chemistry, Hope College^* ^bSan Diego Supercomputer Center, University of California, San Diego

SECTION 1

1) Charge Derivation for Non-standard residues

The AMBER force fields are built around the concept of accurate pair-wise charges. The underlying unit of charge is in the form of a residue such as an Amino acid. In AMBER each residue is constrained to have a unit charge (with the exception of certain DNA and RNA terminal residues). This makes the force field charge model transferable since individual residues can be used as building blocks to build larger protein and nucleic acid systems without the need to refit the charges for each model. Key to this approach is that the charge derivation scheme is an integral part of the force field parameterization. In AMBER the method used for charge derivation is called the Restrained Electrostatic Potential (RESP) method. For details and background of the RESP procedure see J. Phys. Chem, 1993, 97, 10269-10280.

The basis of this procedure is a multistage approach that ensures that rotationally degenerate atoms, such as methyl and methylene hydrogens have equivalent charges. An essential component of this procedure is that these pairwise charges are generated from gas phase QM calculations but are designed to reproduce solution phase charge interactions. For the FF94, 96, and 99 force field variants this is achieved by using an HF/6-31G* QM calculation to generate the electrostatic potential. Conveniently the error in the HF/6-31G* calculation is close to the difference between the charge distribution in gas phase and that in solution. The FF03 force fields uses a different but functionally equivalent approach. The main point here is that one should not arbitrarily change the level of QM theory when deriving charges.

The charge derivation is also designed around the concept of a residue in which a single residue is expected to have an integer charge. Ideally residues should be reasonably small so as to reduce the chances of artifacts occurring because the QM calculations or the RESP fits got stuck in high energy minima. Hence in our case we will split the dye and the linker into two separate residues. There are practical reasons for this as well. For example we may want to use a number of different dyes but all with the same linker.

Dye	Linker
Figure 1.1

There are several options for charge fitting in AMBER. This includes the original RESP implementation, Antechamber and R.E.D. For a standalone ligand one would typically use Antechamber (see http://www.ambermd.org/tutorials/basic/tutorial4b/index.php for an example of using Antechamber) however, it is not designed to handle bonded residues and while it can be adapted to do this it still has issues with regards to rotationally degenerate atoms. For most complex systems these days the best option is probably to use R.E.D. since this provides more reproducibility and is designed to automate multiple orientation fits etc and also makes it easier for others to reproduce the charge calculations.

Since our intention here is to reproduce what was used for this specific research project and highlight the actual procedure involved in a RESP fit we will go to the effort of carefully fitting the charges manually using RESP. One should note though that the procedure used here and the choice of options is specific to this research project. These procedures are not a step by step guide to 'standard' charge derivation in the AMBER force fields. This is merely designed to illustrate the typical steps involved. For more information on how the original force fields were fit and procedures being used for future force fields one is encouraged to consult the literature.

1.1) Charge Derivation for the Dye

The charge fit consists of 3 stages. First we need to optimize the geometry, then we need to generate an electrostatic potential and then finally we need to run the RESP procedure. However, before we can do this we need to deal with the bond that would normally link the dye to the linker. This needs to be capped before beginning the geometry optimization. In theory one could use any form of cap, however, it is important to remember that the cap will not be present in the actual simulation and yet the charges on the non-cap atoms must sum to the correct integer charge. For proteins, the cap procedure that was developed as part of the force field design, is to use ACE or NME residues:

ACE	NME
Figure 1.2 Adapted from Cornell et al. JACS 117, 1195, 5179

The above figure shows the two capping groups with the charges that they are assigned in the AMBER FF9X force fields. You should note that these sum to zero charge in each case. We will cap the broken covalent bond of the dye with NME and when we later carry out the RESP procedure we will constrain the NME atoms to have the same charges as they would in the FF9X force field.

i) Optimize Geometry

The first stage is to optimize our capped dye. This needs to be done with a QM method at a reasonably high level of theory. The original AMBER FF94 paper used MP2/6-31G* to optimize the geometry. Here in the interests of speed we will use B3LYP/6-31G* which should be similar to the MP2 result. You can do this optimization in a QM package of your choice. In this case we used Gaussian. The creation of Gaussian input files is beyond the scope of this tutorial but they are provided below for reference. Investigations showed that the dye has two minima that are in different conformations but equivalent in energy. We call these two conformations floF and floB and both are shown below:

floB.pdb

floF.pdb

Figure 1.3
Note the orientation of the Amide group.

To deal with this situation we will do a multiconformational resp fit. The first step is to optimize the two structures.

Here are the Gaussian input files: floB_opt.gin, floF_opt.gin

And here are the output files: floB_opt.gout, floF_opt.gout

We will also generate a pdb of the optimized floB structure for use later in the leap unit building. You can generate pdb's from Gaussian runs using a number of approaches. Newzmat which ships with Gaussian can convert a checkpoint file to a pdb file but it is extremely buggy and often crashes. Another alternative is Gaussview which is made by Gaussian but costs money. A free alternative is WebMO (http://www.webmo.net/) which can also be used to setup and run Gaussian calculations, you can also use Babel. For the purposes of this tutorial the pdb is provided here: floB_opt.pdb.

ii) Calculate Electrostatic Potential

The next stage is to calculate an ESP for each of the two optimized geometries that can ultimately be read by the RESP program. Key here is that we use HF/6-31G* as the level of theory due to its fortuitous cancellation of error as mentioned earlier. The Gaussian route line, for this project, needs to read:

#P HF/6-31G* SCF=Tight Pop=MK IOp(6/33=2,6/41=10,6/42=17)

IOp 6/33=2 makes Gaussian write out the potential points and the potentials - Do not change.
IOp 6/41=10 specifies that 10 concentric layers of points are used for each atom.
IOp 6/42=17 specifies the density points in each layer - 17 gives 2500 points per atom. For larger molecules it may be necessary to reduce this slightly. The maximum number of ESP points that RESP can utilize is 99,999. The value of 17 can be reduced to any integer value.

The input files are generated from the final structure produced during the optimization step. Note that the use of 6/41=10 and 6/42=17 is specific to this project in order to increase the number of grid points that Gaussian produces. For most 'standard' RESP fits these should not be included and just the IOp(6/33=2) is needed.

Note I took the checkpoint file from the optimization and copied them to floB_hf.chk & floF_hf.chk so that the Gaussian input file can just read geom=check without overwriting the original checkpoint file.

Here are the input files: floB_hf.gin, floF_hf.gin

And here are the output files: floB_hf.gout, floF_hf.gout

iii) Convert the Gaussian ESP data into the RESP format

RESP only reads in ESP data in a specific file format. To convert from the Gaussian output format to the RESP input format, we will use the following script:

esp.sh

#!/bin/csh
xlf /usr/local/apps/amber9/src/resp/readit.f
grep "Atomic Center " $1 > a
grep "ESP Fit" $1 > b
grep "Fit    " $1 > c
./a.out 
rm -f a b c a.out readit.o

Note: You will need to modify this for your machine. To use the correct compiler (e.g. g77) and the correct path to readit.f. If you are running this in Cygwin the file name may be called a.exe or a.out.exe so make sure you update the above script appropriately. Also note that readit.f was replaced in AMBER 10 with espgen but this tutorial has not yet been updated for this so if you are using amber10 you will need to download the readit.f file using this link.

In order to run this script we first need to determine how many atoms there are and how many ESP centers there are. This is the Gaussian output file just before the ESP data is produced. E.g.:

Atomic Center    1 is at   7.127640   .120744  -.032564
Atomic Center    2 is at   7.345002  -.099693 -1.079630
...
Atomic Center   42 is at  -4.823702  2.342669  -.413287
ESP Fit Center   43 is at   7.127640   .120744  2.067436
...
ESP Fit Center **** is at  -6.585767  1.848962 -2.848473
ESP Fit Center **** is at  -6.502128  1.613625 -2.848473

So the atom count is easy. This is 42. The ESP Fit center is a little more complicated since there are more than 9999 and Gaussian just prints *'s here. So you can just take the difference between the line numbers (e.g. using a utility such as nl 'nl floB_hf.gout > floB_hf.gout_line_numbers'). In this case for floB we have 94,681 ESP Fit centers and for floF we have 94,998.

We can now run the esp.sh:

>./esp.sh floB_hf.gout
enter natom,nesp:

So we enter 42, 94681.

This should give an esp.dat file which we will rename floB_esp.dat.

Then repeat this for floF and you should end up with two dat files: floB_esp.dat, floF_esp.dat

Finally we need to combine these two dat files since RESP needs all the data for a multi-conformational fit in a single file:

>cat floB_esp.dat floF_esp.dat > floBF_esp.dat

iv) Generate the input files for RESP

To do the RESP fit we will follow the same approach as was used in the original RESP paper by Cornell et al.

Cornell, W.D., Cieplak, P., Bayly, C.I., Kollman, P.A., JACS, 1993, 115, 9620-9631. (pdf)

On normally carries out a RESP fit in two stages. In the first stage all atoms are allowed to vary and then in a second stage all degenerate hydrogen atoms are constrained to have equal charge and refit. However in the case of the dye there are no methyl or methylene groups (other than the NME methyl group, which is already fixed) hence their is no need to carry out the second stage.

We also need to remember here that we want to do a multiconformational RESP fit since we have the two similar states.

Here is the input file for the first stage: floBF-resp.in

You should take a careful look at this file to make sure you understand it. The first stage consists of:

floBF-resp run #1
 &cntrl
 nmol=2,
 ihfree=1,
 qwt=0.0005,
 iqopt=2,
 /

nmol=2 tells RESP that this is a multiconformational fit over two 'molecules' (two conformations of the dye).
ihfree=1 tells RESP we only want the weak restraint on the heavy atoms.
qwt=0.0005 Strength of the restraint in A.U.
iqopt=2 tells RESP to read in initial charges from a qin file in the form of Lagrange constraints. This is so we can fix the NME cap charges.

The next section contains the two molecules:

...

The first value (1.0) is a weighting factor for the multi-molecule fit. In this case we set it to 1.0 for both since we want them weighted equally. The next line (floB) is just an identifier comment. Then we have a line specifying the total charge and the number of atoms (-2 42). Each following line specifies an individual atom listed in the same order as in the QM calculation. The first column is the atomic number and the second column is N.

N can either be -1, 0 or a positive integer. If N=-1 then it tells RESP that this atom's charge should be read from the qin file and constrained to this value. A 0 means that it is free to change during the fit and a positive integer allows you to specify that two atoms have the same charge. This is used in the second stage of the fit to constrain degenerate atoms to have the same charge. E.g. if we wanted atom 11 to have the same charge as atom 10 but for them to vary as a pair then atom 10 would have a 0 and atom 11 would have a 10.

The last section is specific to doing a multiconformational fit.

    2
    1    1    2    1
    2
    1    2    2    2
    2
    1    3    2    3
    2
    1    4    2    4
    2
    1    5    2    5
    2
    1    6    2    6
    2
    1    7    2    7
    2
...

Here you have sets of 2 lines for each atom. The first line just specifies the number of molecules, so in this case it is always 2. The second line has the format:

MOLECULE ATOM MOLECULE ATOM

and is used to specify the equivalences between the molecules. So in this case the first line reads: 1 1 2 1. Which implies that atom 1 of molecule 1 is equivalent to atom 1 of molecule 2. ('yeah we know this seems dumb in this example but it can make sense in some situations.')

Next we need a qin file to provide the charges for the cap.

floBF-resp.qin

 -0.149000  0.097600  0.097600  0.097600 -0.415700  0.271900  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000 -0.149000  0.097600  0.097600  0.097600 -0.415700  0.271900
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000

This has 8 entries per line specifying the charge of each atom in each molecule in the same order as the QM calculation. In this case we set all atoms, with the exception of the NME atoms to have zero initial charge. The NME atoms we set to have the charges from the FF99SB force field (which are the same as the FF99/94 force fields). Note for those entries that are not zero there must be an equivalent -1 in the RESP input file.

v) Run the multicomformational RESP fit for the dye.

>$AMBERHOME/exe/resp -O -i floBF-resp.in -o floBF-resp.out -p floBF-resp.pch
-t floBF-resp.chg -q floBF-resp.qin -e floBF_esp.dat

Input files: floBF-resp.in, floBF-resp.qin, floBF_esp.dat
Output files: floBF-resp.out, floBF-resp.pch, floBF-resp.chg

Inspection of the output file will show the charges for each atom which are as follows:

Figure 1.4

As mentioned above we don't need to do the second stage of the fit as there are no methyl or methylene groups here, other than those on the caps which we have kept fixed throughout. We also do not need to worry about degeneracy in the COO- group since a literature search here reveals that the barrier to rotation here is around 6 kCal/mol implying that they are not necessarily degenerate.

You should make a note of these charges as we will need them for later.

1.2) Charge Derivation for the Linker

We now repeat the process for the linker which is shown in figure 1.1. However, things are slightly trickier here as it is not obvious how to cap the DNA end of the linker. The dye end is easy, we just use ACE but the DNA is difficult. There is also the issue that the linker after the caps have been removed must have a charge of -0.3079. The reason for this is that normally the DNA would be terminated by 5' and 3' residues. In the AMBER FF99 force field standard 'internal' dna (and rna) residues have an integer charge of -1. However, terminal bases are parameterized up as a pair that when added together give an integer charge of -1. A 5' residue has a charge of -0.3079 while a 3' has a charge of -0.6921. In binding the linker to the DNA we will be replacing the original 5' residue with an 'internal' residue. Hence the linker is now effectively the 5' residue and so is responsible for making up the remaining fractional charge. Thus the DNA end cap on the linker needs to be constrained to have a charge of -0.6921. There is no existing residue in the AMBER force field that has this property so we need to 'invent' our own capping group. There are many different options here. Ideally, one would create several different caps and try them all to see how the resulting charge fit varies. Here we present just one possible, but by no means perfect, solution. Figure 1.5 (left) shows our linker capped by ACE on one end and connected to a phosphate and sugar on the other as it will be in our final DNA system. On the right is an approximation to this that we will use in our RESP fits.

Since the linker bonds to a phosphate on the DNA residue, it would seem appropriate to use a phosphate as a cap. This idea is discussed, along with the methyl cap for the phosphate in the manuscript describing the charge derivation for DNA in the AMBER force field:

Cieplak, P., Cornell, W.D., Bayly, C., Kollman, P.A., J.Comp.Chem., 1995, 16(11), 1357-1377 (pdf)


Figure 1.5

We need to constrain the atoms of the cap to sum to -0.6921 but do this in a way that as best as possible retains the charge field that would represent the DNA residue. We begin by performing a RESP fit as we did for the dye-NME construct with the ACE charges fixed to those of the FF99SB force field (see figure 1.2). Charges for the phosphorus, and three oxygen atoms of the capping group (highlighted in blue in figure 1.6 below), are similarly fixed to those of the FF99SB DNA charges (see figure 1.7). The capping group (PO₂-O-CH₃) was then constrained to give a total charge of -0.6921. The methyl group was allowed to be freely fit (within the group constraint) since there is no equivalent in the FF99SB force field set that we could fix them to. Since the ACE group is neutral and the total charge of the entire construct

is -1, our linker will end up with a charge -0.3079 as desired. For comparison RESP fits were performed with no constraints and the resulting charges were similar.


Figure 1.6	Figure 1.7 Adapted from Cornell et al. JACS 117, 1195, 5179

As before we first optimize the geometry and then calculate the electrostatic potential before proceeding with the RESP fit. The files are provided below.

i) Optimize Geometry

Gaussian Input File: ln5_opt.gin
Output Files: ln5_opt.gout

PDB File: ln5_opt.pdb

Copy the ln5_opt.chk produced in the optimization run to ln5_hf.chk before running the HF ESP calculation so that you can use geom=check.

ii) Calculate Electrostatic Potential

Gaussian Input File: ln5_hf.gin
Output Files: ln5_hf.gout

iii) Convert the Gaussian ESP fit into the RESP format

esp.sh

#!/bin/csh
xlf /usr/local/apps/amber9/src/resp/readit.f
grep "Atomic Center " $1 > a
grep "ESP Fit" $1 > b
grep "Fit    " $1 > c
./a.out 
rm -f a b c a.out readit.o

Again, remember to edit this for your machine as necessary.

Determine how many atoms there are and how many ESP fit centers there are: atoms = 35, ESP Fit centers = 79216.

Run the esp.sh:

>./esp.sh ln5_hf.gout
enter natom,nesp:

Enter 35, 79216.

This should give an esp.dat file which we will rename ln5_esp.dat.

Since we are only doing a single conformation fit here we don't need to cat any files together this time.

iv) Generate the input files for RESP

Next we generate the input files required for the RESP fit. These are similar to the those we used for the dye above but contain some subtle changes relating to doing only a single conformational fit and use of a group charge constraint.

Here is the input file for the first stage: linker_resp.in

ln5 RESP run #1
 &cntrl 
 ihfree=1, 
 qwt=0.0005, 
 iqopt=2  
 /

This time we don't specify nmol since we leave it at the default of 1 since we are no longer doing a multiconformational fit.

The next section contains just the one molecule where, as described above, we set the ACE and selected phosphate cap atoms to be fixed (-1) and read from qin file:

We don't need the final section from the previous file since we are not doing a multiconformational fit. However, we do need an additional section which defines the group constraint that was discussed above.

    8  -0.69210
    1   28    1   29    1   30    1   31    1   32    1   33    1   34    1   35

Here, 8 is the number of atoms in the constraint. -0.69210 is the sum of the charges that these atoms must sum to. The second line specifies the molecule number and atom number of each atom involved in the group. Note RESP expects no more than 16 values per line so if you had more than 8 atoms this group description would span multiple lines.

Next we need a qin file to provide the charges for the caps.

linker-resp1.qin

 -0.366200  0.112300  0.112300  0.112300  0.597200 -0.567900  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  1.165900 -0.776100 -0.776100 -0.495400  0.000000
  0.000000  0.000000  0.000000

v) Run Stage 1 of the RESP fit.

>$AMBERHOME/exe/resp -O -i linker_resp.in -o linker_resp.out -p linker_resp.pch
-t linker_resp.chg -q linker-resp1.qin -e ln5_esp.dat

Input files: linker_resp.in, linker-resp1.qin, ln5_esp.dat
Output files: linker_resp.out, linker_resp.pch, linker_resp.chg

First we can take a look at the output file (linker_resp.out) to check that the stage 1 optimized charges both look reasonable, that the constrained charges remained fixed and the group constraint was obeyed.

vi) Generate the input files for RESP

We now need to prepare for the second stage of the RESP fit (linker_resp2.in). Here we will keep all atom charges fixed except methyl and methylene groups. In the case of the linker this is almost the entire molecule, everything except the NH and the O. As well as specifying the additional atoms to be fixed we also make some minor changes to the input file. The first section now contains:

ln5 RESP run #2
 &cntrl 
 ihfree=1, 
 qwt=0.001, 
 iqopt=2  
 /

This is the same as the first run except that we change the weights (qwt) to be 0.001.

The second section of the file is slightly more complicated that the first stage. Here we set everything that is not a methyl or methylene hydrogen / carbon to be fixed (-1) and read from the qin file. The methyl/methylene carbons, and the first hydrogen of each group we set to be zero so that they can freely change. Addition hydrogen's in each group we set to have their charge constrained to that of the first hydrogen. This is accomplished by specifying the atom number of the first hydrogen in the second column for each additional hydrogen. To remain consistent we will also refit the methyl group of our DNA cap but keeping the group charge constraint of -0.69210.

    1.0 
link
   -1   35
    6   -1 
    1   -1
    1   -1 
    1   -1 
    6   -1
    8   -1 
    7   -1 
    1   -1 
    6    0
    1    0
    1   10
    6    0
    1    0
    1   13
    6    0
    1    0
    1   16
    6    0
    1    0
    1   19
    6    0
    1    0
    1   22
    6    0
    1    0
    1   25
    8   -1 
   15   -1 
    8   -1 
    8   -1 
    8   -1
    6    0
    1    0
    1   33
    1   33
    8  -0.69210
    1   28    1   29    1   30    1   31    1   32    1   33    1   34    1   35

vii) Run Stage 2 of the RESP fit.

We can now run the second stage of the RESP fit. Note that we use the linker-resp.chg file generated as output in stage 1 as the qin (-q) file for stage 2.

>$AMBERHOME/exe/resp -O -i linker_resp2.in -o linker_resp2.out -p linker_resp2.pch
-t linker_resp2.chg -q linker_resp.chg -e ln5_esp.dat

Input files: linker_resp2.in, linker_resp.chg, ln5_esp.dat
Output files: linker_resp2.out, linker_resp2.pch, linker_resp2.chg

We can now examine the output file generated from this second stage of RESP fit and hence label all the charges for all the atoms. This is summarized in the figure below although ultimately all we will require are the charges for the linker atoms:

Figure 1.8

We are now ready to proceed to step 2 where we will create the pdb file of the linker and dye combined.

*Funding and computational support for the creation of this tutorial was provided by NSF-CIEG (BDI0726924), NSF-REU, NSF-MRI, HHMI and ACS-PRF.