Stage 2: QM with and without the time-averaged solvent density
With conformations of the molecule sampled and situated in equilibrated
solvent configurations, the next step is to compute a time-averaged solvent
charge density, or more accurately the time-averaged electrostatic potential
due to the solvent. The latter definition is important when dealing
with systems in which the solvent has counterions--in our present case it is
pure water, and it would be OK to simply include a few hundred snapshots of all
waters whose centers of mass come within perhaps 9Å of any atom of the
glycerol. But, if there were counterions, some of those snapshot regions would
have them and some would not, and there would be strong dipoles in the solvent
charge distribution. What we really need is the electrostatic field due to the
solvent, to understand how the solvent charge distribution polarizes the solute
wave function. We could include a bigger region around the solute, but it
would hardly solve the problem, and the more charges we need to include the
more cumbersome it becomes for the QM program. If we could reach into the MD
simulation, get into the Particle Mesh Ewald, subtract off the contributions of
the solute charges, compute real space interactions and read off the mesh
values for points inside the solute molecular volume, we would have the field
we need, complete with the essential contributions of periodic images. Not so
easy to hack into sander and get that, although the first implementation of
IPolQ did make use of scripts and topology edits to achieve this in a very
tedious manner.
Today, the mdgx program has an extension of its dynamics
facility that collects the electrostatic potential due to the solvent
(everywhere) at selected points throughout the solute volume. It takes the
actual coordinates of solvent molecules in the first solvation shell, then
figures out what more would be needed to replicate the entire solvent potential
and creates layers of charges strewn above the molecular surface to replicate
the field due to infinite periodic images in the context of the quantum
calculation. The layers of charges are fitted by a linear least squares
approach very similar to the one that will be taken in Stage 3 of this
tutorial, but they only replicate the solvent electrostatic potential
inside the solute volume. Outside the solute volume, who knows--but the
solute wave function will never display any density there. In development of
IPolQ, there was some concern that the wave function might mistake the small
charges representing the time-averaged solvent density as "nuclei" onto which
it could relocate, but this was not found to be an issue in practice. If the
user is not reassured, mdgx can omit solvent coordinates
completely and make only shells to recreate the solvent field.
All that must be done in this stage is to prepare an input file for each
conformation of the molecule. The vast majority of the file will be the same
for all conformations, but because we want to run each of them separately and
probably concurrently, the quantum package will need scratch space on the disk
and these working files should be kept in distinct places so that the
calculations will not interfere with one another. An input file like this will
work for a single conformation:
&files
-p Conf<conformation>.top
-c Conf<conformation>.crd
-o ipolq.out
&end
&cntrl
DoSETTLE = 1,
ntpr = 100, nstlim = 100, nfistep = 0,
dt = 0.003,
ntt = 3, tempi = 300.0, temp0 = 300.0, gamma_ln = 3.0,
&end
&ipolq
ntqs 50, nqframe
50, nsteqlim 1000,
qshell1 3.0, qshell2 4.0,
qshell3 5.0,
nqphpt 200,
minqwt 0.1,
solute ':1',
qmlev 'MP2',
basis 'cc-pvDZ',
modq ':WAT & @O' -1.0106,
modq ':WAT & @H1,H2' 0.5053,
verbose 0, checkex 0,
rqminp 1, rqmout 1,
rqmchk 1,
scrdir "/scratch/<username>/ipqGOL<conformation>/",
prepqm "<shell command needed to set up environment for ORCA>"
prepqm "<another command>"
prepqm "<another command>"
qmpath "<ORCA bin directory>/orca",
maxcore = 4096,
uvpath "<ORCA bin directory>/orca_vpot",
unx 21, uny 21, unz 21,
uhx 1.2, uhy 1.2, uhz 1.2,
GridFile 'IPolQgrd<conformation>',
&end
Users will have to make a shell script for themselves that creates this
input file relevant to each of their conformations; there are simply too many
platform-specific details to provide a working example. The platform-specific
details are in fact what the prepqm keyword deals with. This
keyword may be given any number of times, and each input is, in effect, a shell
command. Open a shell to run the ORCA executable on some example
input; if you have to do anything like extend $LD_LIBRARY_PATH or
source someone else's environment before running
ORCA , those commands should go in prepqm inputs.
When including these commands, make sure to use quotes (single or double is
fine). Quotations will tell mdgx that what comes inside is a
single phrase, not multiple keyword values. mdgx will essentially
create its own shell environment in order to run the quantum program, and
communicate with that program via files. It's clunky, but it gets the job
done. For any shell commands that should be executed after the quantum
calculations are done, use the postqm keyword.
The &cntrl namelist is included simply to supply certain
values for output diagnostics (the stuff that sander writes into
mdout ), and to set basic conditions for the thermostat and
timestep. The nfistep keyword is an oddity of mdgx
which can be used to direct the output into multiple sequential files,
something we are not going to do. The various keywords in the
&ipolq namelist can be referenced either in the Amber manual
or by running ${AMBERHOME}/bin/mdgx -IPOLQ . Within the context of
an MD run whose length is determined not by nstlim (the value of
100 is simply put in &cntrl so that mdgx will
allow ntpr = 100 , otherwise the default value of
nstlim is 1, which limits the value of ntpr . The
keywords nsteqlim , ntqs , and nqframe
control the equilibration time that will be taken to make sure the water is
relaxed, the number of steps between snapshots, and the number of snapshots to
be taken for measuring the time-averaged solvent density. The values given in
this tutorial are all ten times smaller than settings I typically run with--the
typical IPolQ run is about 750ps of MD for a pure water solvent with 500
snapshots.
The three qshell commands specify the minimum distances from
any solute atom (that is, the surfaces) at which to locate shell charges which
will carry the spirit of the solvent electrostatics in the simulation. In
earlier IPolQ calculations we used the actual locations of solvent atoms; the
closest approach of water oxygen atoms could be less than 2Å from solute
protons, and the water protons themselves could likewise penetrate pretty close
to solute heavy atoms. These points were not a problem, so far as we could
tell, so the surfaces in this input file will be safe. The nqphpt
value specifies the number of points per solute atom to include in a SASA-like
calculation to put points on each surface (you get nqphpt points
per atom, but only a certain number survive depending on how exposed the atom
is). The setting in this tutorial is pretty reasonable. See the manual for
more information and options to generate this surface, but bear in mind that
the point is just to convey the infinite system's solvent electrostatics in the
context of a quantum calculation: many of the options exist out of an abundance
of caution, but they were not necessary in the end.
Notice the modq settings: these touch on an important aspect of
the IPolQ procedure. One must view the solvent model itself as a reflection of
the IPolQ approximation about the ideal non-polarizable charge set, with a
dipole moment halfway between the gas-phase value and some "hyper-polarized"
state that it could reach in theory (the reasoning is beyond the scope of this
tutorial, but more information can be found in the papers). One must calculate
the dipole moment of the solvent model itself, then subtract the known dipole
moment of the solvent in the gas phase, and find the dipole of some state that
is twice as polarized as the actual solvent model, relative to the gas phase
value. For TIP3P water, the dipole moment is 2.347 Debye, compared to a the
water gas phase dipole of 1.85 Debye, which implies a hyper-polarized dipole of
2.844 Debye that places a charge of -1.0106 proton units on the oxygen and
0.5053 on each hydrogen. The simulation is still running with the standard
water molecule, but the electrostatic field needs to be calculated as if the
water were in some hyper-polarized state. This principle would also apply to
other solvents such as methanol, but not to monatomic ions which do not have
dipole moments.
In addition to the ORCA program itself, there is also a
secondary program needed to evaluate the electrostatic potential of the
converged wave function (orca_vpot ). For Gaussian ,
this auxiliary program is cubegen ; in the case of
ORCA , mdgx is written to launch
orca_vpot and then translate the results into
Gaussian cubegen format. Electrostatic potential grids of
this type are the currency for the final stage of IPolQ fitting. Note that
the number of grid points in this exercise is fairly small, so that it will
run faster: it is not cheap to compute the electrostatic potentials due to
the solved wave function, but more could be done on this aspect of the
mdgx code to calculate potentials for only the points that will
be relevant to fitting later. For real problems, a grid spacing of 0.2Å
and a grid that encloses the entire molecule with 4-5Å buffer space on
all sides is recommended.
These calculations will each take about ten minutes on a single CPU; the
majority of the time will be spent on the QM calculations, but with production
settings the balance would shift to have MD be the major time commitment for
this small molecule. Using a big molecule (200+ electrons) would shift the
balance again, making the QM component by far the most significant cost. The
mdgx output from one of these runs can be found
here. The two grid files for the first
conformation are here (vacuum) and
here (solvated).
To summarize Stage 2:The essential output of this stage is, for each
conformation, a pair of grid files
IPolQgrd<conformation>.vacu and
IPolQgrd<conformation>.solv , containing the electrostatic
potential due to the wave function in vacuo and in solution,
respectively. Assuming that all such calculations run successfully,
conformations 1, 2, 4, 5, 7, and 8 will get done because we decided to omit 3,
6, and 9 in Stage 1. That's all that we need from this stage: a collection of
electrostatic potentials for a range of conformations to take into the final
stage. When trying to do this with big molecules or fickle methods like MP2
with the vital ORCA KeepDens option (compute the MP2-correlation
correction to the density), the calculations can sometimes fail. This is
nearly always an issue with the quantum calculation. It can often be
mitigated by providing more memory (see the mdgx maxcore keyword,
which will cascade down to either of the supported QM packages). If,
ultimately, most calculations succeed and a few fail, it is still possible to
continue to the next stage with what data could be collected.
In practice:In this example, we specified quantum calculations at the
MP2 level. This is really expensive for larger basis sets (which are
appropriate for MP2) due mostly to the need to calculate the electron density,
not just the energy, with the MP2 correlation correction. The cc-pvDZ basis
set is fairly small for the purposes of this tutorial. Another thing to note
is that, for larger molecules, some versions of ORCA can throw
errors when running MP2 on a single processor. We're not sure why this
happens, but running on multiple CPUs seems to fix the problem. If the
parallel version mdgx.MPI is run on multiple processors, it will
launch ORCA and Gaussian with an equal number of processors, so
use mdgx.MPI for MP2-based IPolQ calculations on systems larger
than ~110 electrons (about 200 Daltons). For larger systems, users should opt
for DFT methods (anything that the QM program supports will run), but we have
not done as much testing with these methods.
Proceed to Stage 3,
go back to Stage 1, or return to the
introduction.
|