Stage 1: The Molecule in Many Guises
The first step to deriving IPolQ charges is to have a set of poses of the
molecule immersed in equilibrated baths of the solvent or environment of
interest. In the present case, and for most applications in biochemical
simulations, the solvent is water, although it could just as easily be
methanol, a protein binding site, or more copies of the solute itself. (See
the final notes for what happens in the special case of one molecule solvated
in its own neat liquid.) The poses of the molecule should be representative of
the conformations it will occupy in actual simulations: these may be difficult
to know a priori, but intuition, a reasonable guess as to the
molecule's parameters, or some combination of the two should be sufficient to
create them. There are any number of ways to accomplish this step; in the
papers high-temperature MD simlations were used to sample up to twenty
conformations of each molecule, reliant on a good initial guess. One advantage
of this method was that each solvated pose of the molecule could be described
by the same topology file. In the method presented here, we will make more use
of intuition and the mdgx &configs module to create new
conformations in vacuo, then solvate them one by one. This will likely result
in slightly different numbers of solvent molecules in each box, and while there
are ways to prevent that from happening, it is of little consequence to our
goal.
For glycerol, the most significant differences between conformations will
likely be determined by rotation around the O1-C1-C2-O2 dihedral, or
equivalently the O2-C2-C3-O3 dihedral. There are two reasonable states, one
which puts the oxygens in trans, the other gauche. We can pull
the molecule into each of these states for the O1-C1-C2-O2 dihedral, and let
the other respond to keep the overall system energy as low as possible.
However, the orientations of the polar hydrogens are also important for
glycerol's electrostatic field, and in a simulation they can reasonably be
assumed to flip around wildly depending on the orientations of nearby water
molecules. However, orientations in which the hydroxyl groups of the glycerol
point towards one another are not as desirable: if only one hydroxyl group is
manipulated, the dipoles should naturally repel to point the other hydroxyls in
different directions, but the O-H bonds can likewise line up head-to-tail if
there is no solvent to compete for their attention. To keep the hydroxyls from
making daisy chains, we can penalize the proximity during the conformation
generation.
A starting representation of glycerol was created with standard antechamber
commands, starting with the file
Glycerol.pdb and the tleap input
GenTopology.tleap. For review, the
commands are:
antechamber -i Glycerol.pdb -fi pdb -o Glycerol.prepi -fo prepi -c bcc
tleap -f GenTopology.tleap
The details of the prepi file that was produced by antechamber
may depend on the particular version of the Generalized Amber Force Field
(GAFF) that was referenced. The library file result using Amber16 and GAFF 1.8
is here, and the topology is
here. The following mdgx
input will sample the conformations we want:
&files
-p Glycerol.top
-c Glycerol.crd
-o GenConformers.out
&end
&configs
GridSample :1@O1 :1@C1 :1@C2 :1@O2 { -180.0 180.0 }
Krst 32.0 fbhw 10.0
GridSample :1@C1 :1@C2 :1@O2 :1@HO2 { -180.0 180.0 }
Krst 32.0 fbhw 30.0
Set :1@HO1 :1@O1 :1@O2
value 120.0
Krst 32.0 fbhw 60.0
Set :1@HO3 :1@O3 :1@O2
value 120.0
Krst 32.0 fbhw 60.0
combine 1 2
count 9
verbose 1
write 'pdb', 'inpcrd'
outbase 'Conf', 'Conf'
outsuff 'pdb', 'crd'
&end
More uses of the powerful &configs module can be found in another
tutorial on generating data to fit bonded parameters. In the present context,
we have simply used it to create nine conformations of the glycerol, sampling
trans and gauche arrangements of the (1,2) vicinal hydroxyls
while also sampling the orientations of the central hydroxyl. This was
accomplished by the first two "GridSample" lines in the job file (the
combine keyword ensures that both degress of freedom are sampled
independently--without it the sampling would progress along both ranges in a
line). The following two "Set" commands ensure that the hydroxyls on C1 and C3cannot point at the oxygen of the center hydroxyl by adding an energy penalty
that begins to ramp up if the
H–O • • • O angle ever gets below
60 degrees (it is unpenalized between 60 and 180, thanks to a restraint target
at 120 with a flat bottom half width, fbhw, of 60 degrees). In addition to the
report file (this would have been written to
mdout by default), there are two sets of coordinate output, the same
conformations in both .pdb and Amber
.inpcrd format. The .pdb files are subsequent
inputs for tleap, but they can be useful even to familiar users because their
REMARK lines contain the NMR restraints that could be used
to generate each configuration by structural energy minimization with
sander . Looking at these restraints can be a valuable check to
ensure that the various manipulations given to mdgx
(GridSample , RandomSample , Set ) did what
was intended.
Now that we've got these nine conformations, we can throw out some of them
for brevity (even in practice, six conformations is a pretty solid number for a
molecule like glycerol). By inspecting the restraints in each PDB file, we see
that including conformations 1, 2, 4, 5, 7, and 8 will keep two conformations
at each of the O1-C1-C2-O2 dihedral positions and provide a good spread of the
center hydroxyl orientations. It is important to note that, at least according
to the GAFF force field, which should be a pretty good guess, these
conformations have already been through conjugate gradient energy minimization
in vacuo, so further energy minimization with sander
subject to the same restraints won't do anything for us. The next step is,
then, to keep these conformations rigid as we solvate them to prepare for the
next stage of the IPolQ protocol.
We will use each of the six selected PDB files as inputs to create separate
simulation boxes and immerse the conformations in water. Once the systems are
set up, we will use tight positional restraints on the coordinates of the
glycerol atoms to prevent them from moving as we relax the water and settle
out the box volume under NPT. The following shell commands will do the trick:
#!/bin/bash
for CONF in 1 2 4 5 7 8 ; do
# Make subdirectory then go to it
mkdir Conf${CONF}/
mv Conf${CONF}.pdb Conf${CONF}/
cd Conf${CONF}/
# Write the tleap input spcific for this case, immerse the glycerol
echo "source leaprc.gaff" > immerse.tleap
echo "source leaprc.protein.ff15ipq" >> immerse.tleap
echo "source leaprc.water.tip3p" >> immerse.tleap
echo "loadAmberPrep ../Glycerol.prepi" >> immerse.tleap
echo "x = loadPdb \"Conf${CONF}.pdb\"" >> immerse.tleap
echo "solvateOct x TIP3PBOX 12.0" >> immerse.tleap
echo "saveAmberParm x Conf${CONF}.top Conf${CONF}.crd" >> immerse.tleap
echo "quit" >> immerse.tleap
tleap -f immerse.tleap > immerse.out
# Run pmemd to minimize the glycerol + water system
pmemd -O \
-i ../min.in \
-p Conf${CONF}.top \
-c Conf${CONF}.crd \
-ref Conf${CONF}.crd \
-r Conf${CONF}.min \
-o Conf${CONF}.min.out
# Run pmemd to equilibrate the water and box volume
pmemd -O \
-i ../equil.in \
-p Conf${CONF}.top \
-c Conf${CONF}.min \
-ref Conf${CONF}.crd \
-r Conf${CONF}.rst \
-o Conf${CONF}.eq.out
# Back to the main tutorial directory
cd ../
done
In the above shell commands, the minimization and
equilibration files are nothing special; the key aspects
are that ntr = 1 , that restraintmask = ':1' , and that
restraint_wt be set to some significant value (in this case 100.0
kcal/mol-Å2). In practice, the equilibration time may need to
go a bit longer for larger molecules that could have more nooks and crannies
for water to settle into, but it's nowhere near the equilibration time needed
for a flexible protein. The process can be accelerated with parallel pmemd,
GPU pmemd, or by simply backgrounding the equilibration step so that the loop
will iterate to the next conformation once the minimization is done (just don't
have more conformations than CPUs). With the equilibrated systems ready, we
can proceed to the next stage of IPolQ.
To summarize Stage 1: we created a handful of intuitively relevant
conformations of our molecule, glycerol, using a good guess of a force field to
get ourselves started, then immersed these conformations in water and relaxed
the solvent structure around them. The route we took is relatively fast and
puts full control in the hands of the investigator by using the
mdgx &configs module to deliberately sample the important
degrees of freedom and prevent uninteresting or unrealistic conformations from
gaining representation in the future data set.
In practice: This is where the data set gets mapped out and is the
most complex stage of IPolQ. I generally advise that a minimum of three
conformations go into electrostatic potential fitting, even if the molecule is
rigid (different flexions of a double or triple ring system can happen, and the
slight changes between each conformation can help avoid fitting to noise, in
any charge fitting calculation). For highly flexible molecules like Arginine
dipeptide, I've used up to twenty conformations. The important thing is that
each of them be distinct and that they be have some relevance at room
temperature.
When improvising: This was but one route to accomplishing this
objective and seasoned Amber users are at liberty to improvise as they need. A
different approach might be to take a single trajectory of the solvated
glycerol and select snapshots as seeds for the next stage of IPolQ. The latter
approach might be advantageous if the solvent needed counterions (which can
lengthen the equilibration time), and would have the cosmetic advantage of
always drawing from a common topology file for the solvated system. However,
the user would still have to take care that the solutes in each snapshot be
energy-minimized even in the context of the surrounding water, to ensure that
excessive strain that may exist instantaneously in a molecular mechanics
simulation does not adversely affect the electron density that comes out of the
subsequent quantum calculations. Whether the equilibrium bond and angle
settings of the MM force field will perturb the electron density in a QM
calculation is another important question. For this reason and others, the
focus of my efforts in force field development has been to bring consonance
between the detailed molecular mechanics and quantum mechanics energy surfaces.
Proceed to Stage 2, or
go back.
|