Amber masthead
Filler image AmberTools23 Amber22 Manuals Tutorials Force Fields Contacts History
Filler image

Useful links:

Amber Home
Download Amber
Installation
Amber Citations
GPU Support
Updates
Mailing Lists
For Educators
File Formats
Contributors
Developing Nonstandard Parameters → Parameter Fitting with mdgx

Stage 3: Iterate.

The iteration is not quite a repeat of Stages 1 and 2, but it's close. The parameters were trained on data that thoroughly covers their space, but we don't know quite how well the parameters will do when manipulating the molecule on their own. Even though they are individually well sampled, they could conspire to create artificial minima. We need to let them guide a new round of conformational search, in the absence of restraints, and then see whether those minima they find remain in agreement with our benchmark MP2 / cc-pvDZ calculations. We first need to generate new topologies incorporating our new .frcmod files. The following tleap input will work for MPD:

source /home/cerutti/amber/dat/leap/cmd/leaprc.gaff
loadAmberParams ../carbofluid.dat
loadAmberPrep MPD_Vac.prepi
x = loadPdb "MPD.pdb"
setBox x vdw 10.0
saveAmberParm x MPD_vac_iter1.top MPD.crd
quit

We can repeat that for glycerol, and then apply the following mdgx input to sample conformations with the new force field:

&files
  -p    MPD_vac_iter1.top
  -c    coords.cdf
  -o    RegenConformers.out
&end

&configs
  verbose 1

  % Cull results at a tight energy convergence criterion
  simEtol   0.01,

  % Controls on the quantum mechanical operations
  qmlev    'MP2',
  basis    'cc-pvDZ',

  % Output controls
  outbase  'NConf', 'NConf'
  write    'pdb', 'orca'
  outsuff  'pdb', 'orca'
&end

A few important things are different here. First, the starting structure is not a single structure, but a trajectory. As a result, we do not specify a value for count. (Even if we did, it would be ignored as there are more than one unique starting structures.) The input for -c is actually like the input for data in the &speval module: it can accept regular expressions and directories as long as those paths lead to files containing valid coordinate sets according to the input topology. Another major difference is that the minimizations are all free: beginning from each of the divergent starting structures, we will optimize wherever the force field wants to go. In the case of these simple molecules, this leads to a lot of convergent results, but there are a number of minima on these energy surfaces and we find more of them the more initial structures we have. To reduce the chance of including redundant structures in the new data set, I have specified the simEtol keyword. The settings above will cull structures if their molecular mechanics energies (according to the input force field, here GAFF, plus our fitted charges and first pass bonded parameters) differ by less than 0.01 kcal/mol. For more complex structures, a position rmsd criterion may also be added, but because of the symmetry in glycerol and the equivalent hydrogens in MPD we are sticking to energy alone.

The results for glycerol, including conformations and QM outputs, and the results for MPD are can be added to our input file for data fitting just as easily:

&files
  -parm    /home/cerutti/AmberReload/dat/leap/parm/gaff.dat
  -d       carbofluid_gen2.dat
  -o       fit.out
&end

&param
  System  Glycerol/Glycerol_vac.top Glycerol/coords.cdf   Glycerol/energies.dat
  System  Glycerol/Glycerol_vac.top Glycerol/crd_gen2.cdf Glycerol/nrg_gen2.dat
  System  MPD/MPD_vac.top           MPD/coords.cdf        MPD/energies.dat
  System  MPD/MPD_vac.top           MPD/crd_gen2.cdf      MPD/nrg_gen2.dat
  ParmOutput frcmod
  eunits     hartree,
  accrep     report.m
  verbose    1,

  % Angle fitting input
  fita       c3 oh ho
  fita       c3 c3 oh
  fita       c3 c3 c3
  FitAnglEq  1,
  arst       0.0002,
  arstcpl    1.0,

  % Torsion fitting input
  fith       c3 c3 c3 c3
  fith       hc c3 c3 oh
  fith       hc c3 c3 c3
  fith       oh c3 c3 oh
  fith       h1 c3 c3 oh
  fith       X  c3 oh X
  fith       c3 c3 oh ho
  hrst       0.0002,
&end

The results may even look a little better. Whereas the first refinement of the force field got us results of 1.86 and 2.18 kcal/mol RMS error for glycerol and MPD, respectively, the combined data set shows 1.83 and 2.15 kcal/mol RMS errors for the two compounds--mostly because the handful of new conformations we've added are much less strained and the parameters are retreading what was already familiar territory. The .frcmod file resulting from this is here. It's generally safe at this point to stop refinements, unless you fixed something catastrophic in the previous refinement (which will almost always stem from poor sampling in certain parameters). I have a plan for bringing the mdgx conformational search code to bear right after each fitting run, before the parameters even get written out: the program will check for undersampled regions of parameter space and then try to drag the system into those regions to test whether they contain minima. (Undersampling will exist in most systems--for instance, dihedrals that span planar groups will not be well sampled for out-of-plane configurations, but it is safe because other factors, including improper dihedrals and van-der Waals effects, destabilize the nonplanar conformations.)

Go on to Part 4

"How's that for maxed out?"

Last modified: Jul 25, 2023