Stage 3: Iterate.
The iteration is not quite a repeat of Stages 1 and 2, but it's close. The
parameters were trained on data that thoroughly covers their space, but we
don't know quite how well the parameters will do when manipulating the molecule
on their own. Even though they are individually well sampled, they could
conspire to create artificial minima. We need to let them guide a new round of
conformational search, in the absence of restraints, and then see whether those
minima they find remain in agreement with our benchmark MP2 / cc-pvDZ
calculations. We first need to generate new topologies incorporating our new
.frcmod files. The following tleap input will work for MPD:
source /home/cerutti/amber/dat/leap/cmd/leaprc.gaff
loadAmberParams ../carbofluid.dat
loadAmberPrep MPD_Vac.prepi
x = loadPdb "MPD.pdb"
setBox x vdw 10.0
saveAmberParm x MPD_vac_iter1.top MPD.crd
quit
We can repeat that for glycerol, and then apply the following
mdgx input to sample conformations with the new force field:
&files
-p MPD_vac_iter1.top
-c coords.cdf
-o RegenConformers.out
&end
&configs
verbose 1
% Cull results at a tight energy convergence criterion
simEtol 0.01,
% Controls on the quantum mechanical operations
qmlev 'MP2',
basis 'cc-pvDZ',
% Output controls
outbase 'NConf', 'NConf'
write 'pdb', 'orca'
outsuff 'pdb', 'orca'
&end
A few important things are different here. First, the starting structure is
not a single structure, but a trajectory. As a result, we do not specify a
value for count . (Even if we did, it would be ignored as there are
more than one unique starting structures.) The input for -c is actually like
the input for data in the &speval module: it can
accept regular expressions and directories as long as those paths lead to files
containing valid coordinate sets according to the input topology. Another
major difference is that the minimizations are all free: beginning from each
of the divergent starting structures, we will optimize wherever the force field
wants to go. In the case of these simple molecules, this leads to a lot of
convergent results, but there are a number of minima on these energy surfaces
and we find more of them the more initial structures we have. To reduce the
chance of including redundant structures in the new data set, I have specified
the simEtol keyword. The settings above will cull structures if
their molecular mechanics energies (according to the input force field, here
GAFF, plus our fitted charges and first pass bonded parameters) differ by less
than 0.01 kcal/mol. For more complex structures, a position rmsd criterion
may also be added, but because of the symmetry in glycerol and the equivalent
hydrogens in MPD we are sticking to energy alone.
The results for glycerol, including conformations and QM outputs,
and the results for MPD
are can be added to our
input file for data fitting just as easily:
&files
-parm /home/cerutti/AmberReload/dat/leap/parm/gaff.dat
-d carbofluid_gen2.dat
-o fit.out
&end
¶m
System Glycerol/Glycerol_vac.top Glycerol/coords.cdf
Glycerol/energies.dat
System Glycerol/Glycerol_vac.top Glycerol/crd_gen2.cdf
Glycerol/nrg_gen2.dat
System MPD/MPD_vac.top
MPD/coords.cdf MPD/energies.dat
System MPD/MPD_vac.top
MPD/crd_gen2.cdf MPD/nrg_gen2.dat
ParmOutput frcmod
eunits hartree,
accrep report.m
verbose 1,
% Angle fitting input
fita c3 oh ho
fita c3 c3 oh
fita c3 c3 c3
FitAnglEq 1,
arst 0.0002,
arstcpl 1.0,
% Torsion fitting input
fith c3 c3 c3 c3
fith hc c3 c3 oh
fith hc c3 c3 c3
fith oh c3 c3 oh
fith h1 c3 c3 oh
fith X c3 oh X
fith c3 c3 oh ho
hrst 0.0002,
&end
The results may even look a little better. Whereas the first refinement of
the force field got us results of 1.86 and 2.18 kcal/mol RMS error for
glycerol and MPD, respectively, the combined data set shows 1.83 and 2.15
kcal/mol RMS errors for the two compounds--mostly because the handful of new
conformations we've added are much less strained and the parameters are
retreading what was already familiar territory. The .frcmod file
resulting from this is here. It's
generally safe at this point to stop refinements, unless you fixed something
catastrophic in the previous refinement (which will almost always stem from
poor sampling in certain parameters). I have a plan for bringing the
mdgx conformational search code to bear right after each fitting
run, before the parameters even get written out: the program will check for
undersampled regions of parameter space and then try to drag the system into
those regions to test whether they contain minima. (Undersampling will
exist in most systems--for instance, dihedrals that span planar groups will
not be well sampled for out-of-plane configurations, but it is safe because
other factors, including improper dihedrals and van-der Waals effects,
destabilize the nonplanar conformations.)
Go on to Part 4
|