The pmemd.cuda GPU Implementation

This page provides background on running MD simulations in Amber (pmemd) with GPU acceleration. GPU-accelerated pmemd is distributed with Amber versions, which are generally released in even-numbered years. Other GPU-accelerated code related to Amber (e.g., the cpptraj or pbsa programs) is described in the chapters on those programs in the AmberTools Reference Manual; AmberTools is generally released annually.

The following pages give additional information about the GPU code. Links will persist on the navigation bar to the left when visiting the GPU section of the Amber site.

Update on AMD/HIP support in Amber24 and AmberTools25

We are pleased to announce the availability of support for AMD GPU hardware. Users should understand that this involves a lot of new code, and that we are looking for feedback on any problems that arise. Please start with short test runs, and check the outputs carefully before undertaking long simulations.

This new functionality is included in the distribution, so no additional patches are needed, but it is not installed by default. See Section 2.2.3 of the Amber 2024 Reference Manual for detailed instructions.

You should probably work in a fresh copy of the amber24_src folder, just to be sure nothing bad happens to your current Amber24 installation. Please report successes and failures to the Amber mailing list, so that your experience helps others.

History

The fastest academic GPU MD simulation engine, pmemd.cuda, is written and maintained by researchers in the Amber community; see literature references below. Principal current and past developers include:

David Cerutti, overseeing major code renovations, performance enhancements, and maintenance of the general MD engine
Taisung Lee, co-author of the thermodynamic integration and free energy feature extensions
Daniel Mermelstein, co-author of the thermodynamic integration and free energy feature extensions
Charles Lin, co-author of the GPU NMR restraint code, thermodynamic integration and free energy extensions
Perri Needham, co-author of the GPU NMR restraint code
Delaram Goreishi, author of Nudged Elastic Band methods in CUDA and Fortran
Ross Walker, project and QA lead, author of the first CUDA extensions for the original pmemd Fortran program and developer of the mixed precision models

The state of the code is also buoyed by the generous support of Ke Li, Peng Wang, Duncan Poole and Mark Berger (technology engineers and alliance managers) at NVIDIA Corporation, and Andrew Nelson, Nick Chen and Mike Chen at Exxact Corporation.

Since the advent of GPU accelerated simulations in Amber11, the engine has taken on new features, quality control mechanisms, and algorithms. While the inherently parallel GPU architecture does not permit the verbose error checking and reporting that the CPU code contains, we actively monitor user feedback and engage a set of built-in debugging functions to help us understand any issues that arise. Hundreds of labs and companies all over the world use the latest Amber22 GPU simulation engine.

The code supports serial as well as parallel GPU simulations, but from Pascal (2016) onward, the benefit of running a single simulation on two or more GPUs is marginal, with the exception of REMD-based simulations. On the latest Volta and Turing architectures our algorithms cannot scale to multiple GPUs. We therefore recommend executing independent simulations on separate GPUs in most cases.

A key design feature of the GPU code is that the entirety of the molecular dynamics calculation is performed on the GPU. This means that only one CPU core is needed to drive a simulation, and a server with four or eight GPUs can run one independent simulation per card without loss of performance, provided that at least as many free CPU cores are available as GPUs in use. (Most commodity CPU chips have at least four cores.) Because GPU performance is unaffected by CPU performance, any CPU compiler (the open source GNU C and Fortran compilers are adequate) will deliver comparable results with Amber's premier engine, which sets Amber apart from other molecular dynamics codes. This design choice also means that low-cost CPUs can be used, which, coupled with custom-designed precision models and the bitwise reproducibility used to validate consumer cards, gives AMBER unrivaled performance per dollar.
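The recommendation to run one independent simulation per GPU can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an official Amber script: it assumes pmemd.cuda is on the PATH, that each process is pinned to a card through the CUDA_VISIBLE_DEVICES environment variable, and that every run directory holds input files with the hypothetical names mdin, prmtop, and inpcrd.

```python
import os
import subprocess


def build_jobs(run_dirs, gpu_ids):
    """Pair each run directory with one GPU and build its command and environment."""
    if len(run_dirs) > len(gpu_ids):
        raise ValueError("need at least one GPU per independent simulation")
    jobs = []
    for run_dir, gpu in zip(run_dirs, gpu_ids):
        env = dict(os.environ)
        # Expose only one card to this process so the runs do not share a GPU.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
        # File names are hypothetical placeholders; adjust to your own inputs.
        cmd = [
            "pmemd.cuda", "-O",
            "-i", "mdin", "-o", "mdout",
            "-p", "prmtop", "-c", "inpcrd",
            "-r", "restrt", "-x", "mdcrd",
        ]
        jobs.append({"cwd": run_dir, "env": env, "cmd": cmd})
    return jobs


def launch(jobs):
    """Start all simulations concurrently and return their exit codes."""
    procs = [
        subprocess.Popen(job["cmd"], cwd=job["cwd"], env=job["env"])
        for job in jobs
    ]
    return [proc.wait() for proc in procs]
```

For example, launch(build_jobs(["run0", "run1"], [0, 1])) would start two simulations, one on each of the first two GPUs. Each process occupies a single CPU core, which is why the number of free cores should be at least the number of GPUs in use.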

References

The initial Amber implementation papers, covering implicit and explicit solvents:

Andreas W. Goetz; Mark J. Williamson; Dong Xu; Duncan Poole; Scott Le Grand; & Ross C. Walker* "Routine microsecond molecular dynamics simulations with AMBER - Part I: Generalized Born", J. Chem. Theory Comput., 2012, 8 (5), pp 1542-1555, DOI: 10.1021/ct200909j

Romelia Salomon-Ferrer; Andreas W. Goetz; Duncan Poole; Scott Le Grand; & Ross C. Walker* "Routine microsecond molecular dynamics simulations with AMBER - Part II: Particle Mesh Ewald", J. Chem. Theory Comput., 2013, 9 (9), pp 3878-3888. DOI: 10.1021/ct400314y

Scott Le Grand; Andreas W. Goetz; & Ross C. Walker* "SPFP: Speed without compromise - a mixed precision model for GPU accelerated molecular dynamics simulations", Comp. Phys. Comm., 2013, 184, pp 374-380, DOI: 10.1016/j.cpc.2012.09.022

Historical thermodynamic integration capabilities are described here:

Tai-Sung Lee, Dan Mermelstein, Charles Lin, Scott Le Grand, Timothy J. Giese, Adrian Roitberg, David A. Case, Ross C. Walker* & Darrin M. York*, "GPU-accelerated molecular dynamics and free energy methods in Amber18: performance enhancements and new features", J. Chem. Inf. Model., 2018, 58, pp 2043-2050, DOI: 10.1021/acs.jcim.8b00462

Tai-Sung Lee, Yuan Hu, Brad Sherborne, Zhuyan Guo, & Darrin M. York*, "Toward Fast and Accurate Binding Affinity Prediction with pmemdGTI: An Efficient Implementation of GPU-Accelerated Thermodynamic Integration", J. Chem. Theory Comput., 2017, 13, pp 3077–3084, DOI: 10.1021/acs.jctc.7b00102

Daniel J. Mermelstein, Charles Lin, Gard Nelson, Rachael Kretsch, J. Andrew McCammon, & Ross C. Walker*, "Fast, Flexible and Efficient GPU Accelerated Binding Free Energy Calculations within the AMBER Molecular Dynamics Package", J. Comp. Chem., 2018, DOI: 10.1002/jcc.25187

Amber20 SYCL version for Intel GPU Max Series

(Limited feature release patch created on February 5, 2024)

We are pleased to announce the release of a SYCL version of Amber20 pmemd for Intel Data Center GPU Max Series. This is a limited feature release that enables PME simulations with AMBER and CHARMM force fields, thermostats, barostats, and most NMR type restraints. Additional features are under development.

The code has been tested using Intel oneAPI 2024.0 on Intel Max Series Data Center GPUs (GPU Max 1100 and GPU Max 1550). The SYCL version involves a lot of new code, and we are looking for feedback on any issues that arise. Please start with short test runs and check the outputs carefully before undertaking long simulations.

The SYCL version can be used if you already have Amber20 and AmberTools21. To use the SYCL version, follow these steps:

Download https://drive.google.com/file/d/1O7AJ3hBkqoMh6zIMYiHTIro6ijZ54UBM/view?usp=share_link
Confirm the SHA256 sum of the tarball: 7dc1d29fdd071a91696fa8a6fc42fcf0ab1fa523f13f3a610b5981bc1205bd8a
Untar a fresh copy of AmberTools21 and Amber20.
Navigate to your amber20_src folder.
Untar the SYCL patch file you downloaded.
Follow the instructions in the file README_sycl.md to build and test the Amber20 SYCL version.
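The checksum confirmation in the steps above can be done with any SHA256 tool; the following is a minimal Python sketch using the standard hashlib module. The expected digest is the one quoted in the steps, while the tarball path is an assumption that you supply yourself, since the downloaded file name is not stated here.

```python
import hashlib

# SHA256 sum quoted in the installation steps for the SYCL patch tarball.
EXPECTED_SHA256 = "7dc1d29fdd071a91696fa8a6fc42fcf0ab1fa523f13f3a610b5981bc1205bd8a"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_expected with the path to the downloaded tarball should return True before you untar it; if it returns False, download the file again rather than proceeding.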

Feature completeness with respect to the CUDA implementation of pmemd and upgrade to the current Amber version are under development. Future SYCL code releases will also focus on support for Intel Arc series GPUs and portability across Intel, AMD, and Nvidia hardware.

Please send feedback to Andy Goetz (agoetz -at- sdsc.edu) and Guoquan Chen (guoquan.chen -at- intel.com) and the Amber mailing list. Thank you!
