Amber masthead
Filler image AmberTools24 Amber24 Manuals Tutorials Force Fields Contacts History
Filler image

Useful links:

Amber Home
Download Amber
Installation
Amber Citations
GPU Support
Updates
Mailing Lists
For Educators
File Formats
Contributors
Workshops
Developing Nonstandard Parameters → Deriving Implicitly Polarized Charges in mdgx

Stage 1: The Molecule in Many Guises

The first step to deriving IPolQ charges is to have a set of poses of the molecule immersed in equilibrated baths of the solvent or environment of interest. In the present case, and for most applications in biochemical simulations, the solvent is water, although it could just as easily be methanol, a protein binding site, or more copies of the solute itself. (See the final notes for what happens in the special case of one molecule solvated in its own neat liquid.) The poses of the molecule should be representative of the conformations it will occupy in actual simulations: these may be difficult to know a priori, but intuition, a reasonable guess as to the molecule's parameters, or some combination of the two should be sufficient to create them. There are any number of ways to accomplish this step; in the papers high-temperature MD simlations were used to sample up to twenty conformations of each molecule, reliant on a good initial guess. One advantage of this method was that each solvated pose of the molecule could be described by the same topology file. In the method presented here, we will make more use of intuition and the mdgx &configs module to create new conformations in vacuo, then solvate them one by one. This will likely result in slightly different numbers of solvent molecules in each box, and while there are ways to prevent that from happening, it is of little consequence to our goal.

For glycerol, the most significant differences between conformations will likely be determined by rotation around the O1-C1-C2-O2 dihedral, or equivalently the O2-C2-C3-O3 dihedral. There are two reasonable states, one which puts the oxygens in trans, the other gauche. We can pull the molecule into each of these states for the O1-C1-C2-O2 dihedral, and let the other respond to keep the overall system energy as low as possible. However, the orientations of the polar hydrogens are also important for glycerol's electrostatic field, and in a simulation they can reasonably be assumed to flip around wildly depending on the orientations of nearby water molecules. However, orientations in which the hydroxyl groups of the glycerol point towards one another are not as desirable: if only one hydroxyl group is manipulated, the dipoles should naturally repel to point the other hydroxyls in different directions, but the O-H bonds can likewise line up head-to-tail if there is no solvent to compete for their attention. To keep the hydroxyls from making daisy chains, we can penalize the proximity during the conformation generation.

A starting representation of glycerol was created with standard antechamber commands, starting with the file Glycerol.pdb and the tleap input GenTopology.tleap. For review, the commands are:

antechamber -i Glycerol.pdb -fi pdb -o Glycerol.prepi -fo prepi -c bcc
tleap -f GenTopology.tleap

The details of the prepi file that was produced by antechamber may depend on the particular version of the Generalized Amber Force Field (GAFF) that was referenced. The library file result using Amber16 and GAFF 1.8 is here, and the topology is here. The following mdgx input will sample the conformations we want:

&files
  -p Glycerol.top
  -c Glycerol.crd
  -o GenConformers.out
&end

&configs
  GridSample :1@O1 :1@C1 :1@C2 :1@O2  { -180.0 180.0 }   Krst 32.0 fbhw 10.0
  GridSample :1@C1 :1@C2 :1@O2 :1@HO2 { -180.0 180.0 }   Krst 32.0 fbhw 30.0
  Set        :1@HO1 :1@O1 :1@O2       value 120.0   Krst 32.0 fbhw 60.0
  Set        :1@HO3 :1@O3 :1@O2       value 120.0   Krst 32.0 fbhw 60.0
  combine 1 2
  count 9
  verbose 1

  write   'pdb', 'inpcrd'
  outbase 'Conf', 'Conf'
  outsuff 'pdb', 'crd'
&end

More uses of the powerful &configs module can be found in another tutorial on generating data to fit bonded parameters. In the present context, we have simply used it to create nine conformations of the glycerol, sampling trans and gauche arrangements of the (1,2) vicinal hydroxyls while also sampling the orientations of the central hydroxyl. This was accomplished by the first two "GridSample" lines in the job file (the combine keyword ensures that both degress of freedom are sampled independently--without it the sampling would progress along both ranges in a line). The following two "Set" commands ensure that the hydroxyls on C1 and C3cannot point at the oxygen of the center hydroxyl by adding an energy penalty that begins to ramp up if the H–O • • • O angle ever gets below 60 degrees (it is unpenalized between 60 and 180, thanks to a restraint target at 120 with a flat bottom half width, fbhw, of 60 degrees). In addition to the report file (this would have been written to mdout by default), there are two sets of coordinate output, the same conformations in both .pdb and Amber .inpcrd format. The .pdb files are subsequent inputs for tleap, but they can be useful even to familiar users because their REMARK lines contain the NMR restraints that could be used to generate each configuration by structural energy minimization with sander. Looking at these restraints can be a valuable check to ensure that the various manipulations given to mdgx (GridSample, RandomSample, Set) did what was intended.

Now that we've got these nine conformations, we can throw out some of them for brevity (even in practice, six conformations is a pretty solid number for a molecule like glycerol). By inspecting the restraints in each PDB file, we see that including conformations 1, 2, 4, 5, 7, and 8 will keep two conformations at each of the O1-C1-C2-O2 dihedral positions and provide a good spread of the center hydroxyl orientations. It is important to note that, at least according to the GAFF force field, which should be a pretty good guess, these conformations have already been through conjugate gradient energy minimization in vacuo, so further energy minimization with sander subject to the same restraints won't do anything for us. The next step is, then, to keep these conformations rigid as we solvate them to prepare for the next stage of the IPolQ protocol.

We will use each of the six selected PDB files as inputs to create separate simulation boxes and immerse the conformations in water. Once the systems are set up, we will use tight positional restraints on the coordinates of the glycerol atoms to prevent them from moving as we relax the water and settle out the box volume under NPT. The following shell commands will do the trick:

#!/bin/bash

for CONF in 1 2 4 5 7 8 ; do

  # Make subdirectory then go to it
  mkdir Conf${CONF}/
  mv Conf${CONF}.pdb Conf${CONF}/
  cd Conf${CONF}/

  # Write the tleap input spcific for this case, immerse the glycerol
  echo "source leaprc.gaff" > immerse.tleap
  echo "source leaprc.protein.ff15ipq" >> immerse.tleap
  echo "source leaprc.water.tip3p" >> immerse.tleap
  echo "loadAmberPrep ../Glycerol.prepi" >> immerse.tleap
  echo "x = loadPdb \"Conf${CONF}.pdb\"" >> immerse.tleap
  echo "solvateOct x TIP3PBOX 12.0" >> immerse.tleap
  echo "saveAmberParm x Conf${CONF}.top Conf${CONF}.crd" >> immerse.tleap
  echo "quit" >> immerse.tleap
  tleap -f immerse.tleap > immerse.out

  # Run pmemd to minimize the glycerol + water system
  pmemd -O \
    -i ../min.in \
    -p Conf${CONF}.top \
    -c Conf${CONF}.crd \
    -ref Conf${CONF}.crd \
    -r Conf${CONF}.min \
    -o Conf${CONF}.min.out

  # Run pmemd to equilibrate the water and box volume
  pmemd -O \
    -i ../equil.in \
    -p Conf${CONF}.top \
    -c Conf${CONF}.min \
    -ref Conf${CONF}.crd \
    -r Conf${CONF}.rst \
    -o Conf${CONF}.eq.out

  # Back to the main tutorial directory
  cd ../
done

In the above shell commands, the minimization and equilibration files are nothing special; the key aspects are that ntr = 1, that restraintmask = ':1', and that restraint_wt be set to some significant value (in this case 100.0 kcal/mol-Å2). In practice, the equilibration time may need to go a bit longer for larger molecules that could have more nooks and crannies for water to settle into, but it's nowhere near the equilibration time needed for a flexible protein. The process can be accelerated with parallel pmemd, GPU pmemd, or by simply backgrounding the equilibration step so that the loop will iterate to the next conformation once the minimization is done (just don't have more conformations than CPUs). With the equilibrated systems ready, we can proceed to the next stage of IPolQ.

To summarize Stage 1: we created a handful of intuitively relevant conformations of our molecule, glycerol, using a good guess of a force field to get ourselves started, then immersed these conformations in water and relaxed the solvent structure around them. The route we took is relatively fast and puts full control in the hands of the investigator by using the mdgx &configs module to deliberately sample the important degrees of freedom and prevent uninteresting or unrealistic conformations from gaining representation in the future data set.

In practice: This is where the data set gets mapped out and is the most complex stage of IPolQ. I generally advise that a minimum of three conformations go into electrostatic potential fitting, even if the molecule is rigid (different flexions of a double or triple ring system can happen, and the slight changes between each conformation can help avoid fitting to noise, in any charge fitting calculation). For highly flexible molecules like Arginine dipeptide, I've used up to twenty conformations. The important thing is that each of them be distinct and that they be have some relevance at room temperature.

When improvising: This was but one route to accomplishing this objective and seasoned Amber users are at liberty to improvise as they need. A different approach might be to take a single trajectory of the solvated glycerol and select snapshots as seeds for the next stage of IPolQ. The latter approach might be advantageous if the solvent needed counterions (which can lengthen the equilibration time), and would have the cosmetic advantage of always drawing from a common topology file for the solvated system. However, the user would still have to take care that the solutes in each snapshot be energy-minimized even in the context of the surrounding water, to ensure that excessive strain that may exist instantaneously in a molecular mechanics simulation does not adversely affect the electron density that comes out of the subsequent quantum calculations. Whether the equilibrium bond and angle settings of the MM force field will perturb the electron density in a QM calculation is another important question. For this reason and others, the focus of my efforts in force field development has been to bring consonance between the detailed molecular mechanics and quantum mechanics energy surfaces.


Proceed to Stage 2, or go back.


"How's that for maxed out?"

Last modified: Jul 25, 2023