PMEMD runs on many models of NVIDIA GPUs
GPU-accelerated PMEMD has been implemented using CUDA and thus will only run on NVIDIA
GPUs at present, although we are working on supporting AMD GPUs. Due to accuracy concerns with
pure single precision, the code uses a custom-designed hybrid single/double/fixed-precision
model termed SPFP. This requires that the GPU hardware support both double precision and
integer atomics, meaning only GPUs with hardware revision 3.0 or later can be used. Support
for hardware revisions 1.3 and 2.0 was present in previous versions of the code, but for
reasons of code complexity and maintenance it has been deprecated as of AMBER 18.
For price and performance reasons, at this time we generally recommend the GeForce cards over
the more expensive Tesla or Quadro variants.
In addition to the general information presented below, Ross Walker has prepared a page with
hardware details, covering both building your own GPU machine for Amber use and obtaining
certified machines from Exxact Corporation.
At the time of writing, the following cards are supported by AMBER 18:
- Hardware Version 7.0 (Volta V100)
  - Quadro GV100
- Hardware Version 6.1 (Pascal GP102/104)
  - Titan-XP [aka Pascal Titan-X]
  - GTX-1080TI / 1080 / 1070 / 1060
  - Quadro P6000 / P5000
  - P4 / P40
- Hardware Version 6.0 (Pascal P100/DGX-1)
  - Quadro GP100 (with optional NVLink)
  - P100 12GB / P100 16GB / DGX-1
- Hardware Version 5.0 / 5.5 (Maxwell)
  - M4, M40, M60
  - GTX970 / 980 / 980Ti
  - Quadro cards supporting SM5.0 or 5.5
- Hardware Version 3.0 / 3.5 (Kepler I / Kepler II)
  - Tesla K20 / K20X / K40 / K80
  - Tesla K10 / K8
  - GTX-Titan / GTX-Titan-Black / GTX-Titan-Z
  - GTX770 / 780
  - GTX670 / 680 / 690
  - Quadro cards supporting SM3.0 or 3.5
While we recommend CUDA 9.1 or 9.2 for the best speed of the resulting executables, the
following compiler revisions are the minimum requirements for the different tiers of hardware:
- Volta (V100 - SM_70) based cards require CUDA 9.0 or later.
- Pascal (GP102/104 - SM_61) based cards (GTX-1080TI / 1080 / 1070 / 1060 and
Titan-XP) require CUDA 8.0 or later.
- GTX-1080 cards require NVIDIA Driver version >= 367.27 for correct numerical results.
- GTX-Titan and GTX-780 cards require NVIDIA Driver version >= 319.60 for
correct numerical results.
- GTX-Titan-Black Edition cards require NVIDIA Driver version >= 337.09, or version
  331.79, for correct numerical results.
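As a quick sanity check, the driver minimums above can be verified programmatically. This is a minimal sketch with helper names of our own; the card names and version numbers come from the list above, and the GTX-Titan-Black special case reflects our reading that exactly 331.79 is also acceptable.

```python
# Sketch: compare an installed NVIDIA driver version string against the
# per-card minimums listed above. Helper names are ours, not AMBER's.

def parse_driver_version(version):
    """Turn a driver string such as '367.27' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

# Minimum driver versions taken from the list above.
MINIMUM_DRIVER = {
    "GTX-1080": "367.27",
    "GTX-Titan": "319.60",
    "GTX-780": "319.60",
    "GTX-Titan-Black": "337.09",
}

def driver_ok(card, installed):
    minimum = MINIMUM_DRIVER.get(card)
    if minimum is None:
        return True  # no specific requirement listed for this card
    # GTX-Titan-Black also accepts exactly 331.79 (our interpretation).
    if card == "GTX-Titan-Black" and installed == "331.79":
        return True
    return parse_driver_version(installed) >= parse_driver_version(minimum)

print(driver_ok("GTX-1080", "375.20"))  # True
print(driver_ok("GTX-Titan", "310.44"))  # False
```

The installed driver version itself can be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.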
Other cards not listed here may also be supported as long as they implement Hardware
Revision 3.0, 3.5, 5.0, 5.5, 6.0, 6.1, or 7.0 of the specification.
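Since the only hard requirement is hardware revision (compute capability) 3.0 or later, a simple check can be sketched as below. The capability string would typically come from the CUDA `deviceQuery` sample or a CUDA API call; here it is passed in directly, and the function name is our own.

```python
# Sketch: check whether a GPU's compute capability (hardware revision)
# meets the SM 3.0 floor required by the SPFP precision model.

SUPPORTED_FLOOR = (3, 0)  # hardware revision 3.0 or later

def capability_supported(cap_string):
    """cap_string is e.g. '6.1' for Pascal GP102/104 cards."""
    major, minor = (int(x) for x in cap_string.split("."))
    return (major, minor) >= SUPPORTED_FLOOR

print(capability_supported("6.1"))  # True  (Pascal GTX-1080 etc.)
print(capability_supported("2.0"))  # False (deprecated as of AMBER 18)
```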
Note that you should ensure that all GPUs on which you plan to run PMEMD are connected to
PCI-E 2.0 x16 lane slots or better. If this is not the case then you will likely see degraded
performance, although this effect is lessened in serial if you write to the mdout or mdcrd
files infrequently (e.g. every 2000 steps or so). Scaling over multiple GPUs within a single
node is no longer really feasible for PME calculations, since interconnect performance has
not kept pace with improvements in individual GPU performance. However, it is still possible
to get good multi-GPU scaling for implicit solvent calculations larger than 2500 atoms if all
GPUs are in x16 or better slots and can communicate peer-to-peer (i.e. are connected to the
same physical processor socket).
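The current PCI-E link generation and width of each GPU can be queried with `nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.width.current --format=csv,noheader` (real query fields). The sketch below parses output of that shape and flags cards sitting below PCI-E 2.0 x16; the sample text and values are illustrative only.

```python
# Sketch: flag GPUs in slots slower than PCI-E 2.0 x16 from nvidia-smi
# CSV output. The sample below is illustrative, not real query output.

sample = """\
GeForce GTX 1080 Ti, 3, 16
GeForce GTX 980, 2, 8
"""

def slow_slots(csv_text, min_gen=2, min_width=16):
    """Return the names of GPUs below the given link gen/width."""
    flagged = []
    for line in csv_text.strip().splitlines():
        name, gen, width = (field.strip() for field in line.split(","))
        if int(gen) < min_gen or int(width) < min_width:
            flagged.append(name)
    return flagged

print(slow_slots(sample))  # ['GeForce GTX 980'] -- the x8 card
```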
For a detailed writeup on PCI-E layouts in modern hardware and the variations in peer-to-peer
support, see the following write-up: [Exploring the complexities
of PCI-E connectivity]. It is also possible to run over multiple nodes, although you are
unlikely to see any performance benefit and thus it is not recommended except for loosely
coupled runs such as REMD. The main advantage of AMBER's approach to GPU implementation over
other implementations such as NAMD and GROMACS is that it is possible to run multiple
single-GPU runs on a single node with little or no slowdown. For example, a node with 4
Titan-XP [Pascal Titan-X] cards could run 4 individual AMBER DHFR 4fs NVE calculations all at
the same time without slowdown, providing an aggregate throughput in excess of 2500 ns/day.
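In practice, each independent run is pinned to one device with the CUDA_VISIBLE_DEVICES environment variable. The sketch below builds one shell command per GPU as a dry run; the input/output file names are placeholders, and in practice each job would get its own working directory or uniquely named files, as here.

```python
# Sketch: construct one independent single-GPU pmemd.cuda job per GPU,
# pinning each to a device via CUDA_VISIBLE_DEVICES. File names below
# are placeholders for illustration.

def make_commands(n_gpus, mdin="md.in", prmtop="prmtop", inpcrd="inpcrd"):
    commands = []
    for gpu in range(n_gpus):
        commands.append(
            f"CUDA_VISIBLE_DEVICES={gpu} pmemd.cuda -O -i {mdin} "
            f"-p {prmtop} -c {inpcrd} -o md_gpu{gpu}.out -r md_gpu{gpu}.rst &"
        )
    return commands

# Dry run: print the four commands for a 4-GPU node.
for cmd in make_commands(4):
    print(cmd)
```

The trailing `&` backgrounds each job so all four run concurrently; each pmemd.cuda process then sees only its assigned GPU as device 0.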