PMEMD runs on many models of NVIDIA GPUs

GPU-accelerated PMEMD is implemented in CUDA and therefore currently runs only on NVIDIA GPUs, although work on supporting AMD GPUs is underway. Because pure single precision raises accuracy concerns, the code uses a custom-designed hybrid single / double / fixed precision model termed SPFP. This requires GPU hardware that supports both double precision and integer atomics, so only GPUs with hardware revision 3.0 and later can be used. Support for hardware revisions 1.3 and 2.0 was present in previous versions of the code but has been deprecated as of AMBER 18 to reduce code complexity and maintenance burden. For price and performance reasons, at this time we generally recommend the GeForce cards over the more expensive Tesla or Quadro variants.
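
To make the precision model concrete, the sketch below illustrates the general idea behind SPFP-style accumulation; it is an illustration only, not Amber's actual source, and the scale factor and kernel layout are assumptions chosen for the example. Pairwise contributions are computed in single precision, converted to 64-bit fixed point, and summed with integer atomics, which gives deterministic accumulation and is why integer-atomic-capable hardware is required.

    /* Illustrative SPFP-style accumulation (an assumption for this example,
     * not Amber's actual source). Single-precision contributions are scaled
     * to 64-bit fixed point and summed with integer atomics. */
    #include <cstdio>
    #include <cuda_runtime.h>

    /* Example scale factor (~2^40), trading range against resolution. */
    #define FORCE_SCALE ((double)(1ULL << 40))

    __global__ void sum_forces(const float *contrib, int n,
                               unsigned long long *acc)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            /* Convert to signed 64-bit fixed point; two's-complement
             * wraparound lets the unsigned atomicAdd handle negatives. */
            long long fixed = (long long)((double)contrib[i] * FORCE_SCALE);
            atomicAdd(acc, (unsigned long long)fixed);
        }
    }

    int main()
    {
        const int n = 1 << 20;
        float *d_contrib;
        unsigned long long *d_acc, h_acc = 0ULL;
        cudaMalloc(&d_contrib, n * sizeof(float));
        cudaMalloc(&d_acc, sizeof(unsigned long long));
        cudaMemset(d_contrib, 0, n * sizeof(float));  /* dummy input */
        cudaMemcpy(d_acc, &h_acc, sizeof(h_acc), cudaMemcpyHostToDevice);

        sum_forces<<<(n + 255) / 256, 256>>>(d_contrib, n, d_acc);

        cudaMemcpy(&h_acc, d_acc, sizeof(h_acc), cudaMemcpyDeviceToHost);
        /* Convert the deterministic fixed-point total back to floating point. */
        printf("accumulated value: %f\n", (double)(long long)h_acc / FORCE_SCALE);
        cudaFree(d_contrib);
        cudaFree(d_acc);
        return 0;
    }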

In addition to the general information presented below, Ross Walker has prepared a page with hardware details, covering both building your own GPU machine for Amber use and obtaining certified machines from Exxact Corporation.

At the time of writing, the following cards are supported by AMBER 18:

  • Hardware Version 7.0 (Volta V100)
    • Titan-V
    • V100
    • Quadro GV100
  • Hardware Version 6.1 (Pascal GP102/104)
    • Titan-XP [aka Pascal Titan-X]
    • GTX-1080TI / 1080 / 1070 / 1060
    • Quadro P6000 / P5000
    • P4 / P40
  • Hardware Version 6.0 (Pascal P100/DGX-1)
    • Quadro GP100 (with optional NVLink)
    • P100 12GB / P100 16GB / DGX-1
  • Hardware Version 5.0 / 5.5 (Maxwell)
    • M4, M40, M60
    • GTX-Titan-X
    • GTX970 / 980 / 980Ti
    • Quadro cards supporting SM5.0 or 5.5
  • Hardware Version 3.0 / 3.5 (Kepler I / Kepler II)
    • Tesla K20 / K20X / K40 / K80
    • Tesla K10 / K8
    • GTX-Titan / GTX-Titan-Black / GTX-Titan-Z
    • GTX770 / 780
    • GTX670 / 680 / 690
    • Quadro cards supporting SM3.0 or 3.5

While we recommend CUDA 9.1 or 9.2 for the best speed of the resulting executables, the following CUDA toolkit and driver versions are the minimum requirements for different tiers of hardware:

  • Volta (V100 - SM_70) based cards require CUDA 9.0 or later.
  • Pascal (GP102/104 - SM_61) based cards (GTX-1080TI / 1080 / 1070 / 1060 and Titan-XP) require CUDA 8.0 or later.
  • GTX-1080 cards require NVIDIA Driver version >= 367.27 for reliable numerical results.
  • GTX-Titan and GTX-780 cards require NVIDIA Driver version >= 319.60 for correct numerical results.
  • GTX-Titan-Black Edition cards require NVIDIA Driver version >= 337.09 or 331.79 or later for correct numerical results.
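
As a quick sanity check before building, the sketch below queries the CUDA driver API version and the runtime (toolkit) version on the machine so they can be compared against the minima above; it is an illustrative helper, not part of the Amber build system. Note that it reports CUDA API versions, not the NVIDIA driver release number (e.g. 367.27), which is most easily read from nvidia-smi.

    /* Query the CUDA driver API and runtime (toolkit) versions; values are
     * encoded as 1000*major + 10*minor. Compile with nvcc and run on the
     * build host. */
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int driverVersion = 0, runtimeVersion = 0;
        cudaDriverGetVersion(&driverVersion);
        cudaRuntimeGetVersion(&runtimeVersion);
        printf("Driver supports CUDA API version %d.%d\n",
               driverVersion / 1000, (driverVersion % 1000) / 10);
        printf("Runtime (toolkit) version         %d.%d\n",
               runtimeVersion / 1000, (runtimeVersion % 1000) / 10);
        return 0;
    }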

Other cards not listed here may also be supported as long as they implement the Hardware Revision 3.0, 3.5, 5.0, 5.5, 6.0, 6.1, or 7.0 specifications.
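
If you are unsure which hardware revision a card implements, the following sketch (an assumed standalone helper, not something shipped with Amber) enumerates the visible GPUs and reports each one's compute capability so it can be checked against the list above.

    /* Enumerate visible GPUs and report each card's compute capability
     * (hardware revision) and memory. Illustrative only. */
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            printf("No CUDA-capable GPU detected.\n");
            return 1;
        }
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            int supported = (prop.major >= 3);  /* SM 3.0 or later */
            printf("GPU %d: %s, SM %d.%d, %.1f GiB, %s\n",
                   dev, prop.name, prop.major, prop.minor,
                   prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
                   supported ? "meets the SM >= 3.0 requirement"
                             : "below the SM 3.0 requirement");
        }
        return 0;
    }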

Note that you should ensure that all GPUs on which you plan to run PMEMD are connected to PCI-E 2.0 x16 lane slots or better. If this is not the case, you will likely see degraded performance, although for serial (single-GPU) runs the effect is lessened if you write to the mdout or mdcrd files infrequently (e.g. every 2000 steps or so). Scaling a single PME calculation over multiple GPUs within a node is no longer really feasible, since interconnect performance has not kept pace with improvements in individual GPU performance. However, it is still possible to get good multi-GPU scaling for implicit solvent calculations larger than 2500 atoms, provided all GPUs are in x16 or better slots and can communicate via peer to peer (i.e. they are connected to the same physical processor socket).
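
One way to verify the peer-to-peer condition described above is to ask the CUDA runtime which device pairs can access each other directly; the sketch below (an illustrative helper, not an Amber tool) prints that matrix. Pairs attached to different processor sockets will normally report that peer access is unavailable.

    /* Print which GPU pairs can reach each other via peer-to-peer access.
     * Pairs hanging off different processor sockets normally cannot. */
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int a = 0; a < count; ++a) {
            for (int b = 0; b < count; ++b) {
                if (a == b) continue;
                int canAccess = 0;
                cudaDeviceCanAccessPeer(&canAccess, a, b);
                printf("GPU %d -> GPU %d: peer access %s\n",
                       a, b, canAccess ? "available" : "NOT available");
            }
        }
        return 0;
    }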

For a detailed discussion of PCI-E layouts in modern hardware and the variations in peer-to-peer support, see the following write-up: [Exploring the complexities of PCI-E connectivity]. It is also possible to run over multiple nodes, although you are unlikely to see any performance benefit, so this is not recommended except for loosely coupled runs such as REMD. The main advantage of AMBER's approach to GPU acceleration over other implementations such as NAMD and GROMACS is that multiple single-GPU runs can share a node with little or no slowdown. For example, a node with 4 Titan-XP [Pascal Titan-X] cards can run 4 individual AMBER DHFR 4 fs NVE calculations simultaneously without slowdown, providing an aggregate throughput in excess of 2500 ns/day.

"Insert clever motto here."

Last modified: Aug 17, 2018