Stage 3: Fitting charges

The Restrained Electrostatic Potential (REsP) fitting approach (see here for a reference of particular relevance) is an established means for placing partial charges on atoms at the right intensities to mimic the electrostatic field outside of the molecular surface. One minor difference between the fitting method in mdgx and REsP is the use of harmonic, rather than hyperbolic, restraints to hold charges to small values in the absence of strong data to the contrary. (This is not because I don't like hyperbolic restraints--on the contrary, I plan to implement them soon.) More importantly, unless explicitly told to use restraints, mdgx is really just doing ElectroStatic Potential fitting, or ESP--in theory, with a very large data set, no restraints are needed. Both mdgx and IPolQ are designed to function with bigger data sets that new computers make possible.

Another significant thing that the IPolQ procedure changes is the target: rather than the common HF/6-31G* calculation, IPolQ takes higher-level QM calculations with inclusion of a time-averaged solvent density, and fits two sets of charges simultaneously with weak coupling between the two sets. REsP isn't linked to HF/6-31G*, but the extension for simultaneously fitting two charge sets is novel. (The motivation for this approach is described in the publications: briefly, these methods are prone to under-determined solutions, particularly when some atoms shield others deep within the molecular surface, and we want to select two sets of charges that make similar choices for under-determined parts of the system.)

Recall that in the previous stage we computed grids IPolQgrd<conformation>.vacu and IPolQgrd<conformation>.solv for six conformations. We will now fit charges for the molecule. The following mdgx &fitq input will fit charges, and like the &ipolq namelist in Stage 2 there are lots of parameter options to address details of a procedure that is conceptually simple. The basic question is "what set of atomic partial charges, if placed at the coordinates of atoms in the molecule, will best mimic the electrostatic potential that the molecule projects beyond its surface?" A simple least squares fitting procedure will give us charges, but for buried atoms the partial charges could come out quite large. We therefore need some mechanism of restraining the charges, which I touched on in the previous paragraph. Aside from that, there's an issue of what region of space we consider important: far away from the molecule, there'll be no trouble getting the potential right as long as the overall dipole is correct, but very close to the nucleus of any solute atoms a point monopole will do a very bad job of getting the electrostatics of the distributed quantum electron density. Our solution is to select points where solvent molecules could reasonably be expected to explore in a simulation, as judged by the van-der Waals clashes that an oxygen probe atom, with a tiny hydrogen attached, could approach until incurring some modest energy penalty. Even if the oxygen "probe" can only get so far, the attached hydrogen might be able to go about an Ångstrom further in, so points that could be reached by the hydrogen "probe arm" should be counted as well. Also important: don't take a big clump of points from just one area, spread them out. That's what the following input says:

&files -p Glycerol.top -o fit.out &end &fitq ipolq Conf1/IPolQgrd1.vacu Conf1/IPolQgrd1.solv Glycerol.top 1.0 ipolq Conf2/IPolQgrd2.vacu Conf2/IPolQgrd2.solv Glycerol.top 1.0 ipolq Conf4/IPolQgrd4.vacu Conf4/IPolQgrd4.solv Glycerol.top 1.0 ipolq Conf5/IPolQgrd5.vacu Conf5/IPolQgrd5.solv Glycerol.top 1.0 ipolq Conf7/IPolQgrd7.vacu Conf7/IPolQgrd7.solv Glycerol.top 1.0 ipolq Conf8/IPolQgrd8.vacu Conf8/IPolQgrd8.solv Glycerol.top 1.0 % General conditions for the REsP pnrg 0.0 flim 0.39 nfpt 3750 MaxMemory 4GB % Equivalencies on atoms go here equalq '@C1,C3' equalq '@O1,O3' equalq '@H11,H12,H31,H32' equalq '@HO1,HO3' minq '@C2' &end

The biggest block of input here is the section of ipolq lines. These provide the names of each grid for a particular conformation of the Glycerol.top system (note that the topology of the solute alone is given). The number at the end of each line is the weight assigned to each conformation. Use higher values to make one conformation more important in the fitting. Note that the coordinates are read from the grid files, and must match the order of atoms in the topology file.

In the input above, the keyword pnrg ("probe energy") tells mdgx to include points if they are accessible within 0.9572Å of points at which a TIP3P oxygen could be placed without incurring any van-der Waals penalty (the TIP3P-related values are defaults, but can be changed if the user wants--see the keywords parm / ProbeArm, psig / ProbeSig, and peps / ProbeEps). The keyword flim ("fitting limit") tells mdgx to search through each electrostatic potential grid for viable points but exclude new points from fitting if they are closer than 0.39Å to any of those already selected (this will not be an issue with the coarse grids we are using in this tutorial, but would demand that points be at least two spacings apart in a calculation with 0.2Å grids). The MaxMemory keyword in this namelist should not be confused with the eponymous keyword in the &ipolq namelist: here, it tells mdgx itself how much memory it can allocate for its matrix equations when solving the IPolQ least-squares problems.

Another important aspect of the input is the equalq keyword. This keyword may be given as many times as is necessary, each instance specifying a group of partial charges that should have the same value. mdgx will treat all partial charges named in each ambmask string (see the Amber manual for a description of the format) as a single variable in its fitting problem. The minq keyword is what applies harmonic restraints to atoms (again, in the absence of minq, mdgx is merely doing ElectroStatic Potential fitting).

The collection of grid files for this exercise can be found here. The output generated by mdgx is here. In section 2.) of that output the resulting charges are listed. The charges in the rightmost column, IPolQ(pert), are the ones we want (IPolQ charges expressed as a perturbation of the vacuum phase charges). Currently, these should be copied manually into the original library file Glycerol.prepi, a result which can be found here, although there is a plan to have mdgx print its own updated library files to save this tedium. In addition to the new file with the IPolQ charges, we may want to fit bonded parameters to vacuum phase data, in which case we would want to make a second library file containing the vacuum charges found in the second column of the mdgx output. In the final stage, we'll discuss iterations of the IPolQ procedure to get converged charge sets, but this set of vacuum phase charges will not change.

In practice: Let's consider the case of fitting charges for two molecules that should share the same charges on particular atoms. In this case we would include more ipolq lines with each system (separate topology files) and have more elaborate ambmask strings to specify atoms in equalq masks spanning both systems. This can work for any number of molecules, as long as your computer has enough memory to handle the big matrices it might generate. By default, mdgx will check each system's topology to know what total charge it needs to place on each system, but there are options to override that, too. Users can run $AMBERHOME/bin/mdgx -FITQ for a full list of the options in charge fitting.

Proceed to Stage 4, go back to Stage 2, or return to the introduction.

"How's that for maxed out?"

Last modified: Jul 25, 2023