AMBER 12 NVIDIA GPU ACCELERATION SUPPORT


Benchmarks

Benchmark timings by Mike Wu and Ross Walker.

Download AMBER 12 Benchmark Suite

Please Note: The current benchmark timings here are for AMBER 12 up to and including Bugfix.14 (GPU support revision 12.2, Jan 10th 2013). These new benchmarks highlight the new Kepler I (K10/GTX680) and Kepler II (K20) performance patch.

A note on comparing these benchmarks with other codes: We have deliberately avoided comparisons with other codes here, since benchmarking tends to be more of a religion than a science. First and foremost, the benchmarks we provide here are based on typical research scenarios. We have NOT made any unreasonable (or at the very least questionable) simulation parameter choices purely to produce a better headline performance number. Examples of such practices are: running with hydrogen mass repartitioning and 4fs time steps (or angle constraints and 4fs time steps); multiple time-stepping with PME (such as reciprocal space every 4 or 8 fs); use of single precision throughout the code; very loose shake tolerances without testing the effect this has on simulation accuracy; and time steps that are right on the bleeding edge of stability (such as 2.5fs with shake).

If you want to compare AMBER performance with other codes, feel free to do so. However, you should avoid apples-to-oranges comparisons. For a true performance comparison you should attempt to run the real-world production examples here in the other MD codes using, as far as they are supported, the settings used here. This will ensure a fair and, more importantly, scientifically relevant comparison of performance.

Machine Specs

Machine 1
CPU = Dual x 8 Core Intel E5-2660
MPICH2 v1.5 - GNU v4.4.6
GPU = GTX580 (1.5GB) / GTX680 (4.0GB) / K10
nvcc v4.2
NVIDIA Driver Linux 64 - 304.54

Machine 2
CPU = Dual x 8 Core Intel E5-2687W @ 3.10 GHz
Motherboard = SuperMicro X9DR3-F Motherboard
GPU = K10 (2x4GB) / K20 (5GB) / K20X (6GB)
ECC = OFF
nvcc v4.2
NVIDIA Driver Linux 64 - 304.51

Machine 3 (SDSC Gordon)
CPU = Dual x 8 Core Intel E5-2670 @ 2.60GHz
MVAPICH2 v1.8a1p1
Intel Compilers v12.1.0
QDR IB Interconnect

K10 Note: The K10 naming is a little confusing. Each K10 card contains two GPUs, and in these plots we count K10s by the number of GPUs exposed to the operating system. Thus 2 x K10 is actually a single K10 card and 8 x K10 means 4 K10 cards.

Code Base = AMBER 12 Release + Bugfixes 1 to 14 - GPU code v12.2 (Jan 2013)

Precision Model = SPFP (GPU), Double Precision (CPU)

Benchmarks were run with ECC turned OFF on GTX/M2090/K10/K20 cards. If you see approximately 10% lower performance than the numbers here, run the following as root for each GPU:

nvidia-smi -g 0 --ecc-config=0    (repeat with -g x for each GPU ID)
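
On a multi-GPU node the command above has to be repeated once per GPU ID. A minimal sketch of that loop follows, assuming the GPU IDs run contiguously from 0 to N-1. The function name ecc_off_commands and its count argument are illustrative and not part of nvidia-smi. It only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print the ECC-off command from the text for GPU IDs 0 .. N-1.
# ecc_off_commands is a hypothetical helper; it assumes contiguous GPU IDs.
ecc_off_commands() (
    num_gpus="$1"
    id=0
    while [ "$id" -lt "$num_gpus" ]; do
        echo "nvidia-smi -g $id --ecc-config=0"
        id=$((id + 1))
    done
)

# Example: a node exposing 4 GPUs to the operating system.
ecc_off_commands 4
```

Once the printed commands look right, the output can be piped to a root shell to execute them.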

Segfaults in Parallel: If you find that runs across multiple nodes (i.e. using the InfiniBand adapter) segfault almost immediately, this is most likely caused by GPU Direct v2 (CUDA v4.2) not being properly supported by your hardware and driver installation. In most cases, setting the following environment variable on all nodes (put it in your .bashrc) will fix the problem:

export CUDA_NIC_INTEROP=1
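
Since the variable has to be set on every node, it can help to add it in a way that is safe to repeat. The sketch below appends the export line to a startup file only if it is not already there. The helper name add_nic_interop is hypothetical, and the startup file is passed as an argument rather than assumed.

```shell
#!/bin/sh
# Append the export line to a shell startup file unless it is already present.
# add_nic_interop is a hypothetical helper name.
add_nic_interop() (
    rc_file="$1"
    line='export CUDA_NIC_INTEROP=1'
    touch "$rc_file"
    # -x: match the whole line, -F: treat the pattern as a fixed string.
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
)

# Typical use, run once on every node:
#   add_nic_interop "$HOME/.bashrc"
```

Running it a second time leaves the file unchanged, so it can be included in a node setup script without duplicating the line.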

List of Benchmarks

Implicit Solvent (GB)

  1. TRPCage = 304 atoms
  2. Myoglobin = 2,492 atoms
  3. Nucleosome = 25,095 atoms

Explicit Solvent (PME)

  1. DHFR NVE = 23,558 atoms
  2. DHFR NPT = 23,558 atoms
  3. FactorIX NVE = 90,906 atoms
  4. FactorIX NPT = 90,906 atoms
  5. Cellulose NVE = 408,609 atoms
  6. Cellulose NPT = 408,609 atoms

You can download a tar file containing the input files for all these benchmarks here (50.3 MB).
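
As a rough sketch of how one of these benchmarks might be launched once the tar file is unpacked, the snippet below builds the command line for a single-GPU or multi-GPU run. The pmemd.cuda and pmemd.cuda.MPI executables and the -O, -i, -o, -p and -c flags are standard AMBER usage. The file names mdin, prmtop and inpcrd and the helper bench_cmd are assumptions, so check them against the files actually shipped in each benchmark directory.

```shell
#!/bin/sh
# Print a launch command for one benchmark directory.
# bench_cmd and the input file names are assumptions for illustration.
bench_cmd() (
    ngpu="$1"
    args="-O -i mdin -o mdout -p prmtop -c inpcrd"
    if [ "$ngpu" -le 1 ]; then
        # Single GPU: serial GPU executable.
        echo "pmemd.cuda $args"
    else
        # Multiple GPUs: one MPI rank per GPU.
        echo "mpirun -np $ngpu pmemd.cuda.MPI $args"
    fi
)

bench_cmd 1
bench_cmd 2
```

The command is printed instead of executed so that the MPI launcher and file names can be adjusted for the local installation first.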


Implicit Solvent GB Benchmarks

1) TRPCage = 304 atoms

&cntrl
  imin=0,irest=1,ntx=5,
  nstlim=100000,dt=0.002,ntb=0,
  ntf=2,ntc=2,tol=0.000001,
  ntpr=1000, ntwx=1000, ntwr=50000,
  cut=9999.0, rgbmax=15.0,
  igb=1,ntt=0,nscm=0,
/
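
The nstlim and dt values in these inputs fix how much simulated time each benchmark covers: for TRPCage, 100000 steps x 0.002 ps = 200 ps. A throughput figure in ns/day follows from that and the wall-clock time of the run. The sketch below shows the arithmetic; ns_per_day is a hypothetical helper, and the 60-second wall time is a made-up value for illustration, not a measured result.

```shell
#!/bin/sh
# Convert a run's step count, time step (ps) and wall-clock time (s) to ns/day.
# ns_per_day is a hypothetical helper name.
ns_per_day() (
    nstlim="$1"; dt_ps="$2"; wall_s="$3"
    awk -v n="$nstlim" -v dt="$dt_ps" -v t="$wall_s" \
        'BEGIN { printf "%.2f\n", (n * dt / 1000.0) * (86400.0 / t) }'
)

# TRPCage input: nstlim=100000, dt=0.002 ps; 60 s is an illustrative wall time.
ns_per_day 100000 0.002 60
```

With these illustrative numbers the run covers 0.2 ns in 60 s, which scales to 288.00 ns/day. The same function applies to the other inputs by substituting their nstlim values.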

 

 

2) Myoglobin = 2,492 atoms

&cntrl
  imin=0,irest=1,ntx=5,
  nstlim=10000,dt=0.002,ntb=0,
  ntf=2,ntc=2,tol=0.000001,
  ntpr=1000, ntwx=1000, ntwr=50000,
  cut=9999.0, rgbmax=15.0,
  igb=1,ntt=0,nscm=0,
/

 

 

3) Nucleosome = 25,095 atoms

&cntrl
  imin=0,irest=1,ntx=5,
  nstlim=1000,dt=0.002,ntb=0,
  ntf=2,ntc=2,tol=0.000001,
  ntpr=100, ntwx=100, ntwr=50000,
  cut=9999.0, rgbmax=15.0,
  igb=1,ntt=0,nscm=0,
/

 



Explicit Solvent PME Benchmarks

1) DHFR NVE = 23,558 atoms

 Typical Production MD NVE with
 GOOD energy conservation.
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, tol=0.000001,
   nstlim=10000, 
   ntpr=1000, ntwx=1000,
   ntwr=10000, 
   dt=0.002, cut=8.,
   ntt=0, ntb=1, ntp=0,
   ioutfm=1,
 /
 &ewald
  dsum_tol=0.000001,
 /
 

 

2) DHFR NPT = 23,558 atoms

Typical Production MD NPT
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, 
   nstlim=10000, 
   ntpr=1000, ntwx=1000,
   ntwr=10000, 
   dt=0.002, cut=8.,
   ntt=1, tautp=10.0,
   temp0=300.0,
   ntb=2, ntp=1, taup=10.0,
   ioutfm=1,
 /
 

 

3) FactorIX NVE = 90,906 atoms

 Typical Production MD NVE with
 GOOD energy conservation.
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, tol=0.000001,
   nstlim=10000, 
   ntpr=1000, ntwx=1000,
   ntwr=10000, 
   dt=0.002, cut=8.,
   ntt=0, ntb=1, ntp=0,
   ioutfm=1,
 /
 &ewald
  dsum_tol=0.000001,nfft1=128,nfft2=64,nfft3=64,
 /
 

 

4) FactorIX NPT = 90,906 atoms

Typical Production MD NPT
&cntrl
 ntx=5, irest=1,
 ntc=2, ntf=2, 
 nstlim=10000, 
 ntpr=1000, ntwx=1000,
 ntwr=10000, 
 dt=0.002, cut=8.,
 ntt=1, tautp=10.0,
 temp0=300.0,
 ntb=2, ntp=1, taup=10.0,
 ioutfm=1,
/
 

 

5) Cellulose NVE = 408,609 atoms

Typical Production MD NVE with
GOOD energy conservation.
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, tol=0.000001,
   nstlim=10000, 
   ntpr=1000, ntwx=1000,
   ntwr=10000, 
   dt=0.002, cut=8.,
   ntt=0, ntb=1, ntp=0,
   ioutfm=1,
 /
 &ewald
  dsum_tol=0.000001,
 /
 

 

6) Cellulose NPT = 408,609 atoms

Typical Production MD NPT
 &cntrl
  ntx=5, irest=1,
  ntc=2, ntf=2, 
  nstlim=10000, 
  ntpr=1000, ntwx=1000,
  ntwr=10000, 
  dt=0.002, cut=8.,
  ntt=1, tautp=10.0,
  temp0=300.0,
  ntb=2, ntp=1, taup=10.0,
  ioutfm=1,
 /
 

 
