GPU Accelerated Features of PMEMD in Amber

The GPU-accelerated version of pmemd is included in Amber but not in AmberTools. The code supports explicit solvent simulations with either PME or IPS in the NVE, NVT, and NPT ensembles, as well as implicit solvent Generalized Born simulations. It has been designed to support as many of the standard PMEMD features as possible. For a full list of features, see the pmemd section of the Amber manual.

Some Features

  • Thermodynamic Integration, FEP and MBAR support
  • Metropolis Monte Carlo constant pH
  • All-atom PME continuous constant pH MD
  • Constant redox MD
  • REMD (T, H, pH, redox, coupled redox-pH, multi-dimensional, reservoir)
  • Constant pressure REMD
  • Expanded umbrella sampling support
  • 12-6-4 LJ nonbonded potentials for metal ions
  • Gaussian accelerated molecular dynamics
  • Self-guided Langevin dynamics (SGLD)
  • Middle thermostat scheme
  • Gas phase simulations (through igb=6)
  • External electric fields
  • Support for the CHARMM VDW force switch
  • Semi-isotropic pressure scaling
  • Enhanced NMR restraints and R^6 averaging support (except NOESY volume restraints)

Alchemical Free Energy Calculations

The free energy methods implemented in the Amber GPU code build on the efficient Amber GPU MD code base (pmemd.cuda). These methods include thermodynamic integration (TI), free energy perturbation (FEP), and the multistate Bennett acceptance ratio (MBAR). See the Free Energy Tutorials for specific examples.

  • Input flags to run a TI calculation on a GPU are the same as for the CPU version (a minimal input sketch combining the TI and MBAR flags follows this list). Users need to:
    • Set icfe=1 to enable the free energy calculation
    • Define the perturbed regions in timask1 and timask2
    • Set ifsc=1 to enable the soft-core potentials
    • Define the soft-core regions in scmask1 and scmask2
    • Define the current alchemical progress variable lambda by setting clambda
    • A CPU-version tutorial is available, and its inputs can be run with the GPU version without any modification
  • FEP/MBAR: To generate the additional output needed for subsequent FEP/MBAR analysis:
    • First define the TI input flags as above
    • Enable the FEP/MBAR output with ifmbar=1
    • Define the number of MBAR states in mbar_states, e.g., mbar_states=11
    • Specify the lambda value of each MBAR state, e.g., mbar_lambda = 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
    • Define the MBAR output interval bar_intervall, e.g., bar_intervall=10 means Amber will output MBAR energies every 10 MD steps
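
As a minimal sketch of how these flags fit together in a single &cntrl namelist (the :L0 and :L1 mask strings are hypothetical placeholders for the perturbed regions, and the surrounding MD settings are ordinary illustrative choices, not requirements):

  TI/MBAR production, lambda = 0.5 (illustrative sketch)
   &cntrl
     imin = 0, nstlim = 500000, dt = 0.001,        ! ordinary MD controls
     ntt = 3, gamma_ln = 2.0, temp0 = 300.0,       ! Langevin thermostat
     icfe = 1, clambda = 0.5,                      ! enable TI at this lambda window
     timask1 = ':L0', timask2 = ':L1',             ! perturbed regions (hypothetical masks)
     ifsc = 1, scmask1 = ':L0', scmask2 = ':L1',   ! soft-core potentials and regions
     ifmbar = 1, mbar_states = 11,                 ! write MBAR energies for 11 states
     mbar_lambda = 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
     bar_intervall = 10,                           ! MBAR output every 10 MD steps
   /

One such input would be prepared for each lambda window, changing only clambda.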

Replica-Exchange Molecular Dynamics

Amber is capable of performing temperature, Hamiltonian, redox, pH, coupled redox-pH, and reservoir replica exchange simulations on the GPU. Multi-dimensional replica exchange simulations, in which exchanges are attempted along two or more conditions at the same time, are supported as well. Details of the input and control variables can be found in the Amber manual. The free energy methods described above can be performed in conjunction with Hamiltonian replica exchange so that different lambda windows can exchange their conformations. To enable such calculations, users need to:

  • Create input files for all lambda values.
  • Define Hamiltonian replica exchange input flags in each input file
    • numexchg: the number of exchange attempts that will be performed between replica pairs
    • nstlim: the number of MD steps that will be performed between exchange attempts
  • Define the Hamiltonian replica exchange group file (a sketch follows this list). Note that:
    • In the group file, the entries must be sorted according to their lambda values
    • Currently, the number of entries in the group file must equal the number of lambda windows
    • Currently, the number of lambda windows must be a multiple of the number of available GPUs, e.g., for 12 lambda windows users should allocate 1, 2, 3, 4, 6, or 12 GPUs, since an individual lambda window cannot span multiple GPUs but one GPU can run multiple windows provided sufficient GPU memory is available
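
As a rough sketch, assuming four lambda windows and hypothetical file names (each mdin.* differs only in its clambda and, if used, MBAR settings), the group file and launch command might look like:

  -O -i mdin.0 -o mdout.0 -p prmtop -c eq.0.rst7 -r md.0.rst7 -x md.0.nc
  -O -i mdin.1 -o mdout.1 -p prmtop -c eq.1.rst7 -r md.1.rst7 -x md.1.nc
  -O -i mdin.2 -o mdout.2 -p prmtop -c eq.2.rst7 -r md.2.rst7 -x md.2.nc
  -O -i mdin.3 -o mdout.3 -p prmtop -c eq.3.rst7 -r md.3.rst7 -x md.3.nc

  mpirun -np 4 pmemd.cuda.MPI -ng 4 -groupfile groupfile -rem 3 -remlog rem.log

Here -ng sets the number of replica groups, -rem 3 selects Hamiltonian replica exchange, and -remlog names the exchange log file; consult the manual for the authoritative option list.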

Constant pH Molecular Dynamics

Constant pH molecular dynamics simulations can be run with the Generalized Born implicit solvent model or in explicit solvent, as described in the manual and the online tutorial.
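
As a minimal sketch of a Generalized Born constant pH input, assuming the standard constant pH flags described in the manual (the specific values are illustrative, not recommendations):

  Constant pH MD in GB implicit solvent (illustrative sketch)
   &cntrl
     imin = 0, nstlim = 500000, dt = 0.002, ntc = 2, ntf = 2,
     ntt = 3, gamma_ln = 5.0, temp0 = 300.0,
     igb = 2, saltcon = 0.1,      ! Generalized Born implicit solvent
     icnstph = 1,                 ! constant pH in implicit solvent (2 = explicit solvent)
     ntcnstph = 100,              ! attempt protonation state changes every 100 steps
     solvph = 7.0,                ! solvent pH
   /

The titratable residue definitions are supplied through a cpin file on the command line (-cpin), with protonation state output written via -cpout; explicit solvent runs additionally use ntrelax to control solvent relaxation after a successful protonation change.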

Features not Supported on GPUs

The following options are NOT supported (as of Amber18):

  • ibelly != 0 : Simulations using belly-style constraints are not supported.
  • igb != 0 & cut < systemsize : GPU-accelerated implicit solvent GB simulations do not support a cutoff.
  • nmropt > 1 : Support is not currently available for nmropt > 1. In addition, for nmropt = 1, only features that do not change the underlying force field parameters are supported. For example, umbrella sampling restraints are supported, as are simulated annealing functions such as variation of temp0 with simulation step. However, varying the VDW parameters with step is NOT supported.
  • nrespa != 1 : Multiple time stepping is not supported.
  • vlimit != -1 : For performance reasons the vlimit function is not implemented on GPUs.
  • es_cutoff != vdw_cutoff : Independent cutoffs for electrostatics and van der Waals are not supported on GPUs (although support may be added in the future).
  • order != 4 : A PME interpolation order of 4 is the only option supported. Currently we do not see an advantage in trading mapping work for FFT reduction, nor in trading direct space electrostatics for reciprocal space work.
  • imin = 1 (in parallel) : Minimization is only supported in the serial GPU code, and it is wise to use the double-precision form of the code for it. Highly strained systems may need to be minimized using the CPU code.
  • emil_do_calc != 0 : EMIL is not supported on GPUs.
  • iemap > 0 : EMAP restraints are not supported on GPUs.
  • icfe > 0 & imin > 0 : Minimization is not supported for TI/MBAR on GPUs.

pmemd vs. sander

For the supported functionality, the input required and the output produced by PMEMD are intended to replicate sander. The agreement extends to the limits of machine roundoff for the CPU code, which performs essentially all of its arithmetic in 64-bit precision. Likewise, the GPU code offers a double-precision variant for quality assurance during code testing and after installation, but perfect agreement with CPU results is not guaranteed in cases where the GPU and CPU must generate their own random number sequences with different routines. The production GPU code, which performs most of its arithmetic in 32-bit precision, will necessarily diverge from the CPU code, but maintains a high degree of numerical reproducibility thanks to fixed-precision accumulation of forces and energies.

pmemd simply runs more rapidly, scales better in parallel using MPI, can make use of NVIDIA GPUs and Intel Xeon Phis for acceleration, and uses less resident memory than the more general sander engine. Dynamic memory allocation is used, so no memory configuration is required. Benchmark data are available on the Amber website, ambermd.org. Given the improvements in both serial and parallel performance, as well as the incredible performance offered by GPU acceleration, it is advisable to use pmemd in place of sander whenever the simulation requirements fall within the functionality envelope pmemd provides.
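
As an illustration of the engine variants discussed above (the file names are conventional placeholders), typical invocations look like:

  pmemd -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt -x mdcrd                     # serial CPU
  mpirun -np 8 pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt -x mdcrd    # parallel CPU (MPI)
  pmemd.cuda -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt -x mdcrd                # GPU, SPFP production precision
  pmemd.cuda_DPFP -O -i mdin -o mdout -p prmtop -c inpcrd -r restrt -x mdcrd           # GPU, double precision for validation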

Minor changes in the output after Amber16

There are some minor differences in the output format in Amber versions after Amber16. For example, the Ewald error estimate is NOT calculated when running on a GPU. We have updated the Amber outputs and test cases to reflect this fact: Amber16 and earlier versions printed the CPU-based Ewald error estimate, but this was a meaningless report. The error estimate coming out of the CPU pertains to the error in the spline approximation of the Ewald direct space force and energy, a spline-based approximation to terms based on erfc(). In Amber, the GPU also uses a spline-based approximation to obtain the Ewald direct space force between particles, but the splines are in fact more accurate than analytic computation in 32-bit floating point arithmetic, owing to the way we tweak the coefficients when fashioning them on the CPU for use by the kernels. We do not calculate the error due to this process, but rest assured that the direct sum tolerance and aliasing effects on the grid are much worse for the numerics than the spline will be.

"How's that for maxed out?"

Last modified: Jun 25, 2025