Stage 2: QM with and without the time-averaged solvent density

With conformations of the molecule sampled and situated in equilibrated solvent configurations, the next step is to compute a time-averaged solvent charge density, or more accurately the time-averaged electrostatic potential due to the solvent. The latter definition is important when dealing with systems in which the solvent has counterions--in our present case it is pure water, and it would be OK to simply include a few hundred snapshots of all waters whose centers of mass come within perhaps 9Å of any atom of the glycerol. But, if there were counterions, some of those snapshot regions would have them and some would not, and there would be strong dipoles in the solvent charge distribution. What we really need is the electrostatic field due to the solvent, to understand how the solvent charge distribution polarizes the solute wave function. We could include a bigger region around the solute, but it would hardly solve the problem, and the more charges we need to include the more cumbersome it becomes for the QM program. If we could reach into the MD simulation, get into the Particle Mesh Ewald, subtract off the contributions of the solute charges, compute real space interactions and read off the mesh values for points inside the solute molecular volume, we would have the field we need, complete with the essential contributions of periodic images. Not so easy to hack into sander and get that, although the first implementation of IPolQ did make use of scripts and topology edits to achieve this in a very tedious manner.

Today, the mdgx program has an extension of its dynamics facility that collects the electrostatic potential due to the solvent (everywhere) at selected points throughout the solute volume. It takes the actual coordinates of solvent molecules in the first solvation shell, then figures out what more would be needed to replicate the entire solvent potential and creates layers of charges strewn above the molecular surface to replicate the field due to infinite periodic images in the context of the quantum calculation. The layers of charges are fitted by a linear least squares approach very similar to the one that will be taken in Stage 3 of this tutorial, but they only replicate the solvent electrostatic potential inside the solute volume. Outside the solute volume, who knows--but the solute wave function will never display any density there. In development of IPolQ, there was some concern that the wave function might mistake the small charges representing the time-averaged solvent density as "nuclei" onto which it could relocate, but this was not found to be an issue in practice. If the user is not reassured, mdgx can omit solvent coordinates completely and make only shells to recreate the solvent field.

All that must be done in this stage is to prepare an input file for each conformation of the molecule. The vast majority of the file will be the same for all conformations, but because we want to run each of them separately and probably concurrently, the quantum package will need scratch space on the disk and these working files should be kept in distinct places so that the calculations will not interfere with one another. An input file like this will work for a single conformation:

&files -p Conf<conformation>.top -c Conf<conformation>.crd -o ipolq.out &end &cntrl DoSETTLE = 1, ntpr = 100, nstlim = 100, nfistep = 0, dt = 0.003, ntt = 3, tempi = 300.0, temp0 = 300.0, gamma_ln = 3.0, &end &ipolq ntqs 50, nqframe 50, nsteqlim 1000, qshell1 3.0, qshell2 4.0, qshell3 5.0, nqphpt 200, minqwt 0.1, solute ':1', qmlev 'MP2', basis 'cc-pvDZ', modq ':WAT & @O' -1.0106, modq ':WAT & @H1,H2' 0.5053, verbose 0, checkex 0, rqminp 1, rqmout 1, rqmchk 1, scrdir "/scratch/<username>/ipqGOL<conformation>/", prepqm "<shell command needed to set up environment for ORCA>" prepqm "<another command>" prepqm "<another command>" qmpath "<ORCA bin directory>/orca", maxcore = 4096, uvpath "<ORCA bin directory>/orca_vpot", unx 21, uny 21, unz 21, uhx 1.2, uhy 1.2, uhz 1.2, GridFile 'IPolQgrd<conformation>', &end

Users will have to make a shell script for themselves that creates this input file relevant to each of their conformations; there are simply too many platform-specific details to provide a working example. The platform-specific details are in fact what the prepqm keyword deals with. This keyword may be given any number of times, and each input is, in effect, a shell command. Open a shell to run the ORCA executable on some example input; if you have to do anything like extend $LD_LIBRARY_PATH or source someone else's environment before running ORCA, those commands should go in prepqm inputs. When including these commands, make sure to use quotes (single or double is fine). Quotations will tell mdgx that what comes inside is a single phrase, not multiple keyword values. mdgx will essentially create its own shell environment in order to run the quantum program, and communicate with that program via files. It's clunky, but it gets the job done. For any shell commands that should be executed after the quantum calculations are done, use the postqm keyword.

The &cntrl namelist is included simply to supply certain values for output diagnostics (the stuff that sander writes into mdout), and to set basic conditions for the thermostat and timestep. The nfistep keyword is an oddity of mdgx which can be used to direct the output into multiple sequential files, something we are not going to do. The various keywords in the &ipolq namelist can be referenced either in the Amber manual or by running ${AMBERHOME}/bin/mdgx -IPOLQ. Within the context of an MD run whose length is determined not by nstlim (the value of 100 is simply put in &cntrl so that mdgx will allow ntpr = 100, otherwise the default value of nstlim is 1, which limits the value of ntpr. The keywords nsteqlim, ntqs, and nqframe control the equilibration time that will be taken to make sure the water is relaxed, the number of steps between snapshots, and the number of snapshots to be taken for measuring the time-averaged solvent density. The values given in this tutorial are all ten times smaller than settings I typically run with--the typical IPolQ run is about 750ps of MD for a pure water solvent with 500 snapshots.

The three qshell commands specify the minimum distances from any solute atom (that is, the surfaces) at which to locate shell charges which will carry the spirit of the solvent electrostatics in the simulation. In earlier IPolQ calculations we used the actual locations of solvent atoms; the closest approach of water oxygen atoms could be less than 2Å from solute protons, and the water protons themselves could likewise penetrate pretty close to solute heavy atoms. These points were not a problem, so far as we could tell, so the surfaces in this input file will be safe. The nqphpt value specifies the number of points per solute atom to include in a SASA-like calculation to put points on each surface (you get nqphpt points per atom, but only a certain number survive depending on how exposed the atom is). The setting in this tutorial is pretty reasonable. See the manual for more information and options to generate this surface, but bear in mind that the point is just to convey the infinite system's solvent electrostatics in the context of a quantum calculation: many of the options exist out of an abundance of caution, but they were not necessary in the end.

Notice the modq settings: these touch on an important aspect of the IPolQ procedure. One must view the solvent model itself as a reflection of the IPolQ approximation about the ideal non-polarizable charge set, with a dipole moment halfway between the gas-phase value and some "hyper-polarized" state that it could reach in theory (the reasoning is beyond the scope of this tutorial, but more information can be found in the papers). One must calculate the dipole moment of the solvent model itself, then subtract the known dipole moment of the solvent in the gas phase, and find the dipole of some state that is twice as polarized as the actual solvent model, relative to the gas phase value. For TIP3P water, the dipole moment is 2.347 Debye, compared to a the water gas phase dipole of 1.85 Debye, which implies a hyper-polarized dipole of 2.844 Debye that places a charge of -1.0106 proton units on the oxygen and 0.5053 on each hydrogen. The simulation is still running with the standard water molecule, but the electrostatic field needs to be calculated as if the water were in some hyper-polarized state. This principle would also apply to other solvents such as methanol, but not to monatomic ions which do not have dipole moments.

In addition to the ORCA program itself, there is also a secondary program needed to evaluate the electrostatic potential of the converged wave function (orca_vpot). For Gaussian, this auxiliary program is cubegen; in the case of ORCA, mdgx is written to launch orca_vpot and then translate the results into Gaussian cubegen format. Electrostatic potential grids of this type are the currency for the final stage of IPolQ fitting. Note that the number of grid points in this exercise is fairly small, so that it will run faster: it is not cheap to compute the electrostatic potentials due to the solved wave function, but more could be done on this aspect of the mdgx code to calculate potentials for only the points that will be relevant to fitting later. For real problems, a grid spacing of 0.2Å and a grid that encloses the entire molecule with 4-5Å buffer space on all sides is recommended.

These calculations will each take about ten minutes on a single CPU; the majority of the time will be spent on the QM calculations, but with production settings the balance would shift to have MD be the major time commitment for this small molecule. Using a big molecule (200+ electrons) would shift the balance again, making the QM component by far the most significant cost. The mdgx output from one of these runs can be found here. The two grid files for the first conformation are here (vacuum) and here (solvated).

To summarize Stage 2:The essential output of this stage is, for each conformation, a pair of grid files IPolQgrd<conformation>.vacu and IPolQgrd<conformation>.solv, containing the electrostatic potential due to the wave function in vacuo and in solution, respectively. Assuming that all such calculations run successfully, conformations 1, 2, 4, 5, 7, and 8 will get done because we decided to omit 3, 6, and 9 in Stage 1. That's all that we need from this stage: a collection of electrostatic potentials for a range of conformations to take into the final stage. When trying to do this with big molecules or fickle methods like MP2 with the vital ORCA KeepDens option (compute the MP2-correlation correction to the density), the calculations can sometimes fail. This is nearly always an issue with the quantum calculation. It can often be mitigated by providing more memory (see the mdgx maxcore keyword, which will cascade down to either of the supported QM packages). If, ultimately, most calculations succeed and a few fail, it is still possible to continue to the next stage with what data could be collected.

In practice:In this example, we specified quantum calculations at the MP2 level. This is really expensive for larger basis sets (which are appropriate for MP2) due mostly to the need to calculate the electron density, not just the energy, with the MP2 correlation correction. The cc-pvDZ basis set is fairly small for the purposes of this tutorial. Another thing to note is that, for larger molecules, some versions of ORCA can throw errors when running MP2 on a single processor. We're not sure why this happens, but running on multiple CPUs seems to fix the problem. If the parallel version mdgx.MPI is run on multiple processors, it will launch ORCA and Gaussian with an equal number of processors, so use mdgx.MPI for MP2-based IPolQ calculations on systems larger than ~110 electrons (about 200 Daltons). For larger systems, users should opt for DFT methods (anything that the QM program supports will run), but we have not done as much testing with these methods.

Proceed to Stage 3, go back to Stage 1, or return to the introduction.

"How's that for maxed out?"

Last modified: Jul 25, 2023