AMBER 11 NVIDIA GPU
ACCELERATION SUPPORT


NEWS
Major Update Released Aug 18th 2011 (v2.2 [Amber 11 Bugfix.17])
Doubles Performance for PME Calculations on GPUs.

Benchmarks

Benchmark timings by Mike Wu and Ross Walker.

Machine Specs

Machine 1
CPU = Dual x Quad Core Intel E5462 2.80 GHz
Motherboard = SuperMicro X7DWA-N Motherboard
MPICH2.0 - 1.2.1p1
MKL 10.1.1.019
Ifort 11.1.069
GPU = Tesla C1060 /Tesla C2050 / GTX295
nvcc v3.2
NVIDIA Driver Linux 64 - 260.19.14

Machine 2
CPU = Dual x Hex Core Intel X5670 2.93 GHz
Motherboard = SuperMicro X8DTG-DF
MVAPICH2.0 - 1.5 (With GPU Direct v1.0)
MKL 10.1.1.019
Ifort 11.1.069
GPU = GTX295 (768MB) / GTX580 (1.5GB) / Tesla C(M)2070 (6.0 GB) / Tesla M2090 (6.0 GB)
nvcc v3.2
NVIDIA Driver Linux 64 - 256.47
(Configuration has 1 GPU per node + 1 Mellanox QDR IB in x16 slot, or 2 GPUs per node + 1 Mellanox QDR IB in x4 slot)

Code Base = AMBER 11 Release + Bugfixes 1 to 17 - GPU code v2.2 (Aug 18th 2011)

Precision Model = SPDP (GPU), Double Precision (CPU)

Benchmarks were run with ECC turned OFF on C2050/C2070/M2090 cards. If you see approximately 10% lower performance than the numbers reported here, run the following as root for each GPU:

nvidia-smi -g 0 --ecc-config=0    (repeat with -g x for each GPU ID)
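
On nodes with several GPUs, the per-GPU command above can be generated rather than typed by hand. The following is a minimal sketch, assuming GPU IDs are numbered from 0. The helper name `ecc_off_commands` and the GPU count are hypothetical, and the script only prints the commands (using the same `-g` and `--ecc-config=0` options shown above) so they can be reviewed before being run as root.

```python
def ecc_off_commands(gpu_ids):
    """Build one nvidia-smi command line per GPU ID.

    Only strings are produced; nothing is executed.
    """
    return [f"nvidia-smi -g {gpu} --ecc-config=0" for gpu in gpu_ids]


if __name__ == "__main__":
    # Assumption: a node with two GPUs, IDs 0 and 1.
    for command in ecc_off_commands(range(2)):
        print(command)
```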

Segfaults in Parallel: If runs across multiple nodes (i.e. using the InfiniBand adapter) segfault almost immediately, the most likely cause is that GPU Direct v2 (CUDA v4.0) is not properly supported by your hardware and driver installation. In most cases, setting the following environment variable on all nodes (put it in your .bashrc) fixes the problem:

export CUDA_NIC_INTEROP=1

List of Benchmarks

Implicit Solvent (GB)

  1. TRPCage = 304 atoms
  2. Myoglobin = 2,492 atoms
  3. Nucleosome = 25,095 atoms

Explicit Solvent (PME)

  1. DHFR NVE = 23,558 atoms
  2. DHFR NPT = 23,558 atoms
  3. FactorIX NVE = 90,906 atoms
  4. FactorIX NPT = 90,906 atoms
  5. Cellulose NVE = 408,609 atoms
  6. Cellulose NPT = 408,609 atoms

You can download a tar file containing the input files for all these benchmarks here (96.0 MB).



Implicit Solvent GB Benchmarks

1) TRPCage = 304 atoms

&cntrl
  imin=0,irest=1,ntx=5,
  nstlim=100000,dt=0.002,ntb=0,
  ntf=2,ntc=2,tol=0.000001,
  ntpr=1000, ntwx=1000, ntwr=50000,
  cut=9999.0, rgbmax=15.0,
  igb=1,ntt=0,nscm=0,
/

 

 

2) Myoglobin = 2,492 atoms

&cntrl
  imin=0,irest=1,ntx=5,
  nstlim=10000,dt=0.002,ntb=0,
  ntf=2,ntc=2,tol=0.000001,
  ntpr=1000, ntwx=1000, ntwr=50000,
  cut=9999.0, rgbmax=15.0,
  igb=1,ntt=0,nscm=0,
/

 

 

3) Nucleosome = 25,095 atoms

&cntrl
  imin=0,irest=1,ntx=5,
  nstlim=1000,dt=0.002,ntb=0,
  ntf=2,ntc=2,tol=0.000001,
  ntpr=100, ntwx=100, ntwr=50000,
  cut=9999.0, rgbmax=15.0,
  igb=1,ntt=0,nscm=0,
/
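
The three GB inputs use the same time step (dt=0.002 ps) but different step counts, so each run covers a different amount of simulated time: 200 ps, 20 ps, and 2 ps respectively. As a rough sketch, the simulated time and a ns/day throughput figure can be derived from `nstlim`, `dt`, and a measured wall-clock time. The function names and the wall-clock value below are illustrative assumptions, not part of the benchmark suite or a reported result.

```python
SECONDS_PER_DAY = 86400.0


def simulated_ps(nstlim, dt_ps):
    """Simulated time in picoseconds: number of MD steps times the time step."""
    return nstlim * dt_ps


def ns_per_day(nstlim, dt_ps, wall_seconds):
    """Throughput in ns/day for a run that took wall_seconds of real time."""
    simulated_ns = simulated_ps(nstlim, dt_ps) / 1000.0
    return simulated_ns * SECONDS_PER_DAY / wall_seconds


# nstlim values taken from the three GB inputs above; dt is 0.002 ps in each.
GB_BENCHMARKS = {"TRPCage": 100000, "Myoglobin": 10000, "Nucleosome": 1000}

if __name__ == "__main__":
    for name, nstlim in GB_BENCHMARKS.items():
        print(f"{name}: {simulated_ps(nstlim, 0.002):.0f} ps simulated")

    # Hypothetical wall-clock time of 60 s, for illustration only.
    print(f"{ns_per_day(10000, 0.002, 60.0):.1f} ns/day")
```

Because the step counts differ by factors of ten, raw wall-clock times for the three systems are not directly comparable; converting to ns/day puts them on a common scale.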

 



Explicit Solvent PME Benchmarks

1) DHFR NVE = 23,558 atoms

 Typical Production MD NVE with
 GOOD energy conservation.
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, tol=0.000001,
   nstlim=10000, 
   ntpr=1000, ntwx=1000,
   ntwr=10000, 
   dt=0.002, cut=8.,
   ntt=0, ntb=1, ntp=0,
   ioutfm=1,
 /
 &ewald
  dsum_tol=0.000001,
 /
 

 

2) DHFR NPT = 23,558 atoms

Typical Production MD NPT
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, 
   nstlim=10000, 
   ntpr=1000, ntwx=1000,
   ntwr=10000, 
   dt=0.002, cut=8.,
   ntt=1, tautp=10.0,
   temp0=300.0,
   ntb=2, ntp=1, taup=10.0,
   ioutfm=1,
 /
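
The NVE and NPT inputs share most of their settings, so it can help to see exactly which `&cntrl` entries differ. The sketch below compares the two DHFR `&cntrl` groups shown above. It is a minimal illustration, not a full Fortran namelist parser: it assumes simple `name=value` pairs separated by commas, and the helper names `parse_cntrl` and `diff_settings` are hypothetical.

```python
import re

# &cntrl groups copied from the DHFR NVE and DHFR NPT inputs above.
DHFR_NVE = """
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, tol=0.000001,
   nstlim=10000,
   ntpr=1000, ntwx=1000,
   ntwr=10000,
   dt=0.002, cut=8.,
   ntt=0, ntb=1, ntp=0,
   ioutfm=1,
 /
"""

DHFR_NPT = """
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2,
   nstlim=10000,
   ntpr=1000, ntwx=1000,
   ntwr=10000,
   dt=0.002, cut=8.,
   ntt=1, tautp=10.0,
   temp0=300.0,
   ntb=2, ntp=1, taup=10.0,
   ioutfm=1,
 /
"""


def parse_cntrl(text):
    """Return {name: value-as-string} for the &cntrl group in an mdin text."""
    # The group runs from "&cntrl" to the first line holding only "/".
    match = re.search(r"&cntrl(.*?)^\s*/", text, flags=re.S | re.M)
    if match is None:
        raise ValueError("no &cntrl namelist found")
    return dict(re.findall(r"(\w+)\s*=\s*([^,\s]+)", match.group(1)))


def diff_settings(a, b):
    """Map each differing name to (value in a, value in b); None if absent."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    changes = diff_settings(parse_cntrl(DHFR_NVE), parse_cntrl(DHFR_NPT))
    for name, (nve, npt) in changes.items():
        print(f"{name}: NVE={nve} NPT={npt}")
```

For these two inputs the differences are confined to `ntt`, `tautp`, `temp0`, `ntb`, `ntp`, `taup`, and the `tol` entry, which appears only in the NVE input. The sketch covers the `&cntrl` group only; the `&ewald` group with `dsum_tol` is also present only in the NVE input.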
 

 

3) FactorIX NVE = 90,906 atoms

 Typical Production MD NVE with
 GOOD energy conservation.
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, tol=0.000001,
   nstlim=10000, 
   ntpr=1000, ntwx=1000,
   ntwr=10000, 
   dt=0.002, cut=8.,
   ntt=0, ntb=1, ntp=0,
   ioutfm=1,
 /
 &ewald
  dsum_tol=0.000001,nfft1=128,nfft2=64,nfft3=64,
 /
 

 

4) FactorIX NPT = 90,906 atoms

Typical Production MD NPT
&cntrl
 ntx=5, irest=1,
 ntc=2, ntf=2, 
 nstlim=10000, 
 ntpr=1000, ntwx=1000,
 ntwr=10000, 
 dt=0.002, cut=8.,
 ntt=1, tautp=10.0,
 temp0=300.0,
 ntb=2, ntp=1, taup=10.0,
 ioutfm=1,
/
 

 

5) Cellulose NVE = 408,609 atoms

Typical Production MD NVE with
GOOD energy conservation.
 &cntrl
   ntx=5, irest=1,
   ntc=2, ntf=2, tol=0.000001,
   nstlim=10000, 
   ntpr=1000, ntwx=1000,
   ntwr=10000, 
   dt=0.002, cut=8.,
   ntt=0, ntb=1, ntp=0,
   ioutfm=1,
 /
 &ewald
  dsum_tol=0.000001,
 /
 

 

6) Cellulose NPT = 408,609 atoms

Typical Production MD NPT
 &cntrl
  ntx=5, irest=1,
  ntc=2, ntf=2, 
  nstlim=10000, 
  ntpr=1000, ntwx=1000,
  ntwr=10000, 
  dt=0.002, cut=8.,
  ntt=1, tautp=10.0,
  temp0=300.0,
  ntb=2, ntp=1, taup=10.0,
  ioutfm=1,
 /
 

 
