AMBER 12 NVIDIA GPU ACCELERATION SUPPORT
Benchmarks
Benchmark timings by Mike Wu and Ross Walker.
Download AMBER 12 Benchmark Suite
Please Note: The current benchmark timings here are for AMBER 12 up to
and including Bugfix.14 (GPU support revision 12.2, Jan 10th 2013). These
new benchmarks highlight the performance patch for Kepler I (K10/GTX680)
and Kepler II (K20) GPUs.
A note on comparing these benchmarks with other codes: We have
deliberately avoided comparisons with other codes here, since
benchmarking tends to be more of a religion than a science. First and
foremost, the benchmarks we provide here are based on typical research
scenarios. We have NOT made any unreasonable (or at the very least
questionable) simulation parameter choices purely to produce a better
headline performance number. Examples of such practices are:
- running with hydrogen mass repartitioning and 4 fs time steps (or
  angle constraints and 4 fs time steps)
- multiple time-stepping with PME (such as evaluating reciprocal space
  only every 4 or 8 fs)
- use of single precision throughout the code
- very loose SHAKE tolerances without testing the effect this has on
  simulation accuracy
- time steps that are right on the bleeding edge of stability (such as
  2.5 fs with SHAKE)
If you want to compare AMBER performance with other codes, feel free
to. However, you should avoid apples-to-oranges comparisons. For a true
performance comparison you should attempt to run the real-world
production examples here in the other MD codes using, as far as they
are supported, the settings used here. This will ensure a fair and, more
importantly, scientifically relevant comparison of performance.
Machine Specs
Machine 1
CPU = Dual x 8 Core Intel E5-2660
MPICH2 v1.5 - GNU v4.4.6
GPU = GTX580 (1.5GB) / GTX680 (4.0GB) / K10
nvcc v4.2
NVIDIA Driver Linux 64 - 304.54
Machine 2
CPU = Dual x 8 Core Intel E5-2687W @ 3.10 GHz
Motherboard = SuperMicro X9DR3-F Motherboard
GPU = K10 (2x4GB) / K20 (5GB) / K20X (6GB)
ECC = OFF
nvcc v4.2
NVIDIA Driver Linux 64 - 304.51
Machine 3 (SDSC Gordon)
CPU = Dual x 8 Core Intel E5-2670 @ 2.60GHz
MVAPICH2 v1.8a1p1
Intel Compilers v12.1.0
QDR IB Interconnect
K10 Note: The K10 naming is a little confusing. In these plots we refer
to K10s by the number of GPUs exposed to the operating system. Each K10
card carries two GPUs, so 2 x K10 is actually a single K10 card and
8 x K10 means 4 K10 cards.
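The labelling convention in the K10 note can be sketched in a few lines of Python. The function name `k10_cards` is hypothetical and used only for illustration; the sketch assumes every K10 card exposes exactly two GPUs, as described above.

```python
GPUS_PER_K10_CARD = 2  # assumption from the note above: one K10 card exposes two GPUs


def k10_cards(gpus_in_label: int) -> int:
    """Return how many physical K10 cards are needed for an 'N x K10' label."""
    if gpus_in_label <= 0:
        raise ValueError("the GPU count in the label must be positive")
    # Ceiling division: a run on a single GPU still occupies one card.
    return -(-gpus_in_label // GPUS_PER_K10_CARD)


for n in (2, 4, 8):
    print(f"{n} x K10 -> {k10_cards(n)} K10 card(s)")
```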
Code Base = AMBER 12 Release + Bugfixes 1 to 14 - GPU code v12.2 (Jan 2013)
Precision Model = SPFP (GPU), Double Precision (CPU)
Benchmarks were run with ECC turned OFF on GTX/M2090/K10/K20 cards. If
you see approximately 10% lower performance than the numbers here, run
the following as root for each GPU, replacing 0 with each GPU ID in turn:
nvidia-smi -g 0 --ecc-config=0
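Repeating the command for every GPU ID can be scripted. The following is a minimal Python sketch that reuses only the nvidia-smi flags quoted above; the helper names and the GPU count of 4 are hypothetical, and the default dry-run mode only prints the commands rather than executing them.

```python
import subprocess


def ecc_off_commands(gpu_ids):
    """Build one nvidia-smi command per GPU ID, using the flags quoted above."""
    return [["nvidia-smi", "-g", str(gpu_id), "--ecc-config=0"] for gpu_id in gpu_ids]


def disable_ecc(gpu_ids, dry_run=True):
    """Print the commands (dry run) or execute them; executing requires root."""
    for cmd in ecc_off_commands(gpu_ids):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical node with 4 GPUs (IDs 0 to 3); dry run only.
disable_ecc(range(4), dry_run=True)
```

Keeping the command construction separate from its execution lets you inspect the exact commands before running anything as root.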
Segfaults in Parallel: If you find that runs across multiple nodes
(i.e. using the InfiniBand adapter) segfault almost immediately, this is
most likely an issue with GPU Direct v2 (CUDA v4.2) not being properly
supported by your hardware and driver installation. In most cases
setting the following environment variable on all nodes (put it in your
.bashrc) will fix the problem:
export CUDA_NIC_INTEROP=1
List of Benchmarks
Implicit Solvent (GB)
- TRPCage = 304 atoms
- Myoglobin = 2,492 atoms
- Nucleosome = 25,095 atoms
Explicit Solvent (PME)
- DHFR NVE = 23,558 atoms
- DHFR NPT = 23,558 atoms
- FactorIX NVE = 90,906 atoms
- FactorIX NPT = 90,906 atoms
- Cellulose NVE = 408,609 atoms
- Cellulose NPT = 408,609 atoms
You can download a tar file containing the input files for all these
benchmarks here (50.3 MB).

Implicit Solvent GB Benchmarks
1) TRPCage = 304 atoms
&cntrl
imin=0,irest=1,ntx=5,
nstlim=100000,dt=0.002,ntb=0,
ntf=2,ntc=2,tol=0.000001, ntpr=1000, ntwx=1000,
ntwr=50000, cut=9999.0, rgbmax=15.0,
igb=1,ntt=0,nscm=0, /

2) Myoglobin = 2,492 atoms
&cntrl
imin=0,irest=1,ntx=5,
nstlim=10000,dt=0.002,ntb=0,
ntf=2,ntc=2,tol=0.000001, ntpr=1000, ntwx=1000,
ntwr=50000, cut=9999.0, rgbmax=15.0,
igb=1,ntt=0,nscm=0, /

3) Nucleosome = 25,095 atoms
&cntrl
imin=0,irest=1,ntx=5,
nstlim=1000,dt=0.002,ntb=0,
ntf=2,ntc=2,tol=0.000001, ntpr=100, ntwx=100,
ntwr=50000, cut=9999.0, rgbmax=15.0,
igb=1,ntt=0,nscm=0, /
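The three GB inputs above share dt=0.002 (AMBER time steps are given in ps) and differ mainly in nstlim, so each run covers a different amount of simulated time. The Python sketch below works through that arithmetic. The helper names are hypothetical, ns/day is assumed here as a convenient throughput unit, and the 60 s wall time is an invented illustration value, not a measured result.

```python
SECONDS_PER_DAY = 86400.0


def simulated_ps(nstlim: int, dt_ps: float) -> float:
    """Simulated time in picoseconds: number of steps times the time step."""
    return nstlim * dt_ps


def ns_per_day(nstlim: int, dt_ps: float, wall_seconds: float) -> float:
    """Throughput in ns/day for a run that took wall_seconds of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    simulated_ns = simulated_ps(nstlim, dt_ps) / 1000.0
    return simulated_ns * SECONDS_PER_DAY / wall_seconds


# nstlim values taken from the three GB inputs above; dt=0.002 in all of them.
gb_runs = {"TRPCage": 100000, "Myoglobin": 10000, "Nucleosome": 1000}
for name, nstlim in gb_runs.items():
    print(f"{name}: {simulated_ps(nstlim, 0.002):.1f} ps simulated")

# Hypothetical wall time of 60 s, for illustration only.
print(f"Example throughput: {ns_per_day(10000, 0.002, 60.0):.1f} ns/day")
```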

Explicit Solvent PME Benchmarks
1) DHFR NVE = 23,558 atoms
Typical Production MD NVE with
GOOD energy conservation.
&cntrl
ntx=5, irest=1,
ntc=2, ntf=2, tol=0.000001,
nstlim=10000,
ntpr=1000, ntwx=1000,
ntwr=10000,
dt=0.002, cut=8.,
ntt=0, ntb=1, ntp=0,
ioutfm=1,
/
&ewald
dsum_tol=0.000001,
/

2) DHFR NPT = 23,558 atoms
Typical Production MD NPT
&cntrl
ntx=5, irest=1,
ntc=2, ntf=2,
nstlim=10000,
ntpr=1000, ntwx=1000,
ntwr=10000,
dt=0.002, cut=8.,
ntt=1, tautp=10.0,
temp0=300.0,
ntb=2, ntp=1, taup=10.0,
ioutfm=1,
/
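To see exactly which &cntrl settings separate the DHFR NVE and NPT inputs above, the two namelists can be compared programmatically. The following Python sketch uses a deliberately simple regex-based reader that is only an assumption suited to inputs of this shape (it is not a general Fortran namelist parser), and the function names are hypothetical.

```python
import re

DHFR_NVE = """&cntrl
ntx=5, irest=1,
ntc=2, ntf=2, tol=0.000001,
nstlim=10000,
ntpr=1000, ntwx=1000,
ntwr=10000,
dt=0.002, cut=8.,
ntt=0, ntb=1, ntp=0,
ioutfm=1,
/
&ewald
dsum_tol=0.000001,
/
"""

DHFR_NPT = """&cntrl
ntx=5, irest=1,
ntc=2, ntf=2,
nstlim=10000,
ntpr=1000, ntwx=1000,
ntwr=10000,
dt=0.002, cut=8.,
ntt=1, tautp=10.0,
temp0=300.0,
ntb=2, ntp=1, taup=10.0,
ioutfm=1,
/
"""


def parse_namelist(text: str, group: str = "cntrl") -> dict:
    """Return key -> value (as strings) from the first &group ... / block."""
    match = re.search(rf"&{re.escape(group)}\b(.*?)/", text, flags=re.DOTALL)
    if match is None:
        return {}
    return dict(re.findall(r"(\w+)\s*=\s*([^,\s/]+)", match.group(1)))


def namelist_diff(a: dict, b: dict) -> dict:
    """Map each key whose value differs to a (value_in_a, value_in_b) pair."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


diff = namelist_diff(parse_namelist(DHFR_NVE), parse_namelist(DHFR_NPT))
for key, (nve_value, npt_value) in diff.items():
    print(f"{key}: NVE={nve_value} NPT={npt_value}")
```

A value of None means the key is absent from that input. The comparison shows that the NPT input switches on the thermostat and barostat settings (ntt, ntb, ntp and their coupling constants) and does not set tol explicitly.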

3) FactorIX NVE = 90,906 atoms
Typical Production MD NVE with
GOOD energy conservation.
&cntrl
ntx=5, irest=1,
ntc=2, ntf=2, tol=0.000001,
nstlim=10000,
ntpr=1000, ntwx=1000,
ntwr=10000,
dt=0.002, cut=8.,
ntt=0, ntb=1, ntp=0,
ioutfm=1,
/
&ewald
dsum_tol=0.000001,nfft1=128,nfft2=64,nfft3=64,
/

4) FactorIX NPT = 90,906 atoms
Typical Production MD NPT
&cntrl
ntx=5, irest=1,
ntc=2, ntf=2,
nstlim=10000,
ntpr=1000, ntwx=1000,
ntwr=10000,
dt=0.002, cut=8.,
ntt=1, tautp=10.0,
temp0=300.0,
ntb=2, ntp=1, taup=10.0,
ioutfm=1,
/

5) Cellulose NVE = 408,609 atoms
Typical Production MD NVE with
GOOD energy conservation.
&cntrl
ntx=5, irest=1,
ntc=2, ntf=2, tol=0.000001,
nstlim=10000,
ntpr=1000, ntwx=1000,
ntwr=10000,
dt=0.002, cut=8.,
ntt=0, ntb=1, ntp=0,
ioutfm=1,
/
&ewald
dsum_tol=0.000001,
/

6) Cellulose NPT = 408,609 atoms
Typical Production MD NPT
&cntrl
ntx=5, irest=1,
ntc=2, ntf=2,
nstlim=10000,
ntpr=1000, ntwx=1000,
ntwr=10000,
dt=0.002, cut=8.,
ntt=1, tautp=10.0,
temp0=300.0,
ntb=2, ntp=1, taup=10.0,
ioutfm=1,
/