AMBER PME Midpoint (Amber 18) Beta
Release Notes



Background

About the new Midpoint method implementation in AMBER 18 beta

The Midpoint method for Molecular Dynamics was developed by researchers at IBM as part of their BlueMatter project with BlueGene, and was subsequently expanded upon and described publicly by researchers at D. E. Shaw Research, LLC (see this 2006 paper published in The Journal of Chemical Physics). The Midpoint method is based on domain decomposition and provides an efficient approach to significantly reduce "data distribution" time as the node count increases. The San Diego Supercomputer Center (SDSC) at UC San Diego, in collaboration with Intel® Corporation, has developed a prototype implementation of the Midpoint method within the Amber PMEMD software.
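To make the decomposition rule concrete, the following is a minimal sketch in C (with illustrative names and a uniform orthorhombic box grid as assumptions; this is not the actual PMEMD code) of the midpoint assignment: each pairwise interaction is computed by the spatial domain that contains the midpoint of the two interacting particles, so a domain only needs to import particles lying within roughly half the cutoff of its boundary.

    /* Illustrative sketch of the midpoint assignment rule (not PMEMD code). */
    #include <math.h>

    typedef struct { double x, y, z; } Vec3;

    /* Map a point to a box index on a uniform nx x ny x nz grid over an
       orthorhombic cell with edge lengths Lx, Ly, Lz (with periodic wrap). */
    static int box_of_point(Vec3 p, int nx, int ny, int nz,
                            double Lx, double Ly, double Lz)
    {
        int ix = ((int)floor(p.x / Lx * nx) % nx + nx) % nx;
        int iy = ((int)floor(p.y / Ly * ny) % ny + ny) % ny;
        int iz = ((int)floor(p.z / Lz * nz) % nz + nz) % nz;
        return (iz * ny + iy) * nx + ix;
    }

    /* Midpoint rule: the interaction between particles a and b is owned by
       the box containing the point halfway between them (minimum-image
       handling of the midpoint is omitted for brevity). */
    int owner_box_of_pair(Vec3 a, Vec3 b, int nx, int ny, int nz,
                          double Lx, double Ly, double Lz)
    {
        Vec3 mid = { 0.5 * (a.x + b.x), 0.5 * (a.y + b.y), 0.5 * (a.z + b.z) };
        return box_of_point(mid, nx, ny, nz, Lx, Ly, Lz);
    }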

This implementation has been thoroughly tested with three key AMBER benchmarks and is being released in beta form to the broader Amber community for more extensive testing and feedback. The midpoint implementation will continue to be refined, and its performance improved, over the coming months, with a production release planned as part of the upcoming Amber 18.

Motivation for re-architecting AMBER 16 PMEMD CPU code

The aim of re-architecting AMBER 16's PMEMD code is to continue to improve CPU performance and cluster scalability. For example, the following figures show that the current atom-decomposition approach in AMBER 16 does not scale well for the Cellulose and STMV benchmarks (roughly 400K and 1 million atoms, respectively) beyond 8 nodes, each containing a single Intel® Xeon Phi™ 7250 (Knights Landing) processor. It was also observed that the data distribution time grew exponentially with increasing node count.

Figure 1: Scaling of the existing AMBER 16 PMEMD code on nodes equipped with Intel® Xeon Phi™ 7250 processors and connected with Intel® Omni-Path.

 

Authorship & Support

PMEMD Midpoint Implementations:

Ashraf Bhuiyan (Intel)
Charles Lin (SDSC)
Tareq Malas (Intel)
Ross C. Walker (SDSC)*

*Corresponding author.

Citing the Midpoint Code

If you make use of any of the AMBER midpoint code in your work, please include the following citation (in addition to the standard AMBER citation):

  • Charles Lin, Tareq Malas, Ashraf Bhuiyan, and Ross C. Walker*, "Scalable Amber Molecular Dynamics Implementations for Intel Architecture", 2018, in prep.


Performance and Accuracy

AMBER 18 midpoint beta Mixed-precision accuracy

In addition to the traditional full double precision implementation used in the PMEMD CPU code, the AMBER 18 midpoint beta also introduces a mixed-precision model, termed SPDP, first pioneered with the AMBER GPU implementation. This precision model uses single precision for each particle-particle interaction but sums the resulting forces into double-precision accumulators. This precision model was established as sufficient for accurate MD simulations and shown to conserve energy in the original AMBER GPU publications, and it has been tested to the same degree of tolerance as part of this new CPU optimization. The benefit of mixed precision is improved performance, both serial and parallel, since all modern Intel CPUs can perform two single-precision floating-point operations for the cost of a single double-precision operation. A minimal sketch of the SPDP idea is shown below, followed by the figures and table providing validation data for AMBER 18 DPDP and SPDP, with AMBER 16 DPDP as the reference.
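The sketch below (a toy one-dimensional example in C with an illustrative pair force, not the actual PMEMD nonbonded kernel) shows the essential pattern: each pairwise contribution is evaluated in single precision, while the per-atom force sum is accumulated in double precision.

    /* Toy illustration of SPDP: float pair terms, double accumulator. */
    #include <stdio.h>

    int main(void)
    {
        float  xi     = 0.0f;                        /* position of atom i           */
        float  xj[4]  = { 0.9f, 1.7f, -2.3f, 3.1f }; /* a few neighbour positions    */
        double fx_acc = 0.0;                         /* double-precision accumulator */

        for (int k = 0; k < 4; ++k) {
            float dx = xj[k] - xi;                   /* single-precision geometry    */
            float r2 = dx * dx;
            float f  = dx / (r2 * r2);               /* toy pair force term          */
            fx_acc  += (double)f;                    /* summed in double precision   */
        }
        printf("accumulated force on atom i: %.10f\n", fx_acc);
        return 0;
    }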

Figure 2: Cellulose

Figure 3: DHFR

Figure 4: Factor IX

Maximum and RMS deviations for AMBER 18 SPDP and DPDP relative to the AMBER 16 DPDP reference:

                     DHFR       Factor IX   Cellulose   STMV
  Max deviation
    A18 SPDP         8.6E-04    1.9E-03     3.8E-03     4.0E-03
    A18 DPDP         5.0E-08    4.7E-07     4.9E-07     5.0E-07
  RMS deviation
    A18 SPDP         5.4E-05    1.1E-04     1.4E-04     1.9E-04
    A18 DPDP         1.9E-08    2.0E-08     2.0E-08     2.0E-08

AMBER 18 midpoint beta performance results

Benchmark Downloads: STMV | Cellulose | Poliovirus

The new midpoint-based parallel version of PMEMD in the AMBER 18 beta is now faster and scales better than PMEMD from the AMBER 16 release. The following figures show that the AMBER 18 beta is consistently faster than AMBER 16 on the primary benchmarks (Cellulose and STMV) on current Intel processors [Intel® Xeon® Gold 6148 (Skylake) and Intel® Xeon Phi™ 7250 (Knights Landing)]. At larger node counts the performance gap between the AMBER 18 beta and AMBER 16 increases, as the former has better scaling efficiency. For example, the figure below shows that the AMBER 18 beta is 2.2x faster than the released AMBER 16 code when running the STMV benchmark on a single Intel® Xeon Phi™ 7250 node. Moreover, the AMBER 18 beta provides a 2.5x speedup over AMBER 16 on 8 nodes of Intel® Xeon Phi™ 7250 and a 3.1x speedup on 32 nodes of Intel® Xeon® Gold 6148 for the STMV benchmark. In addition to the traditional Cellulose and STMV benchmarks, we have also introduced a larger, 4-million-atom benchmark based on the poliovirus. This represents a large simulation and is thus a good stress test of the midpoint decomposition approach.

Hardware configuration: The Intel® Xeon Phi™ 7250 runs at 1.4 GHz with 96 GB of DRAM and 16 GB of MCDRAM in Quadrant/Cache mode. The Intel® Xeon® Gold 6148 processor runs at 2.4 GHz with 192 GB of DRAM. The multi-node results use the Intel® Omni-Path (OPA) fabric on the cluster. The compilers used are the Intel compiler version 2017 update 2 and Intel MPI version 5.1.3.


Currently Supported Features

The AMBER 18 midpoint beta release does not yet support the full PMEMD functionality. We plan to add the critical missing functionality in the coming months, prior to the production release in AMBER 18.

The AMBER 18 midpoint beta currently supports: NVE, NVT (Langevin thermostat), and SHAKE; a minimal example input for these settings is sketched below. The following features are expected to be supported in the AMBER 18 midpoint release version: NPT, NMR restraints, and TI (the latter likely to be delivered after the initial release via an update).
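The following is a minimal example mdin for an NVT run with a Langevin thermostat and SHAKE, illustrating the currently supported feature set; the specific parameter values are placeholders and should be adapted to your system.

    NVT run, Langevin thermostat, SHAKE on bonds to hydrogen
     &cntrl
       imin   = 0,           ! run MD rather than minimization
       irest  = 1, ntx = 5,  ! restart from coordinates and velocities
       nstlim = 10000,       ! number of MD steps
       dt     = 0.002,       ! 2 fs time step (requires SHAKE)
       ntc    = 2, ntf = 2,  ! SHAKE on bonds involving hydrogen
       ntb    = 1,           ! constant-volume periodic boundaries
       cut    = 8.0,         ! direct-space nonbonded cutoff (Angstroms)
       ntt    = 3,           ! Langevin thermostat
       gamma_ln = 2.0,       ! collision frequency (ps^-1)
       temp0  = 300.0,       ! target temperature (K)
       ntpr   = 500, ntwx = 500, ntwr = 5000,
     /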


Configure, Compile, and Build Instructions

In this beta release, we advise users to configure, compile, and run on the same computer. If you build on an Intel® Xeon® processor E7-4850 v2 (formerly Ivy Bridge), the build will use AVX instructions. If you build on an Intel® Xeon® processor E7-4850 v4 (formerly Broadwell), it will use AVX2 instructions. If you build on an Intel® Xeon Phi™ processor 7250, it will use AVX-512 instructions. Binaries compiled on the Intel® Xeon Phi™ processor 7250 will not run on Haswell or earlier generation processors. Currently the Intel compilers version 2017 update 2 and GCC version 6.2.0 are supported and have been extensively tested for this AMBER 18 midpoint beta release. A small check of which instruction sets the current host supports is sketched below.
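Because a binary is tied to the instruction set of the machine it was built on, it can be useful to verify what the host CPU supports before running. The following is a small illustrative check (not part of the AMBER build) that uses GCC's __builtin_cpu_supports():

    /* isa_check.c - illustrative only; compile with: gcc isa_check.c -o isa_check */
    #include <stdio.h>

    int main(void)
    {
        printf("AVX      : %s\n", __builtin_cpu_supports("avx")     ? "yes" : "no");
        printf("AVX2     : %s\n", __builtin_cpu_supports("avx2")    ? "yes" : "no");
        printf("AVX-512F : %s\n", __builtin_cpu_supports("avx512f") ? "yes" : "no");
        return 0;
    }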

Add download, update and patch instructions


  • Optimized code with mixed-precision (SPDP):

      ./configure -intelmpi -openmp -mic2_spdp intel
      cd $AMBERHOME/src/pmemd.midpoint/src
      make mic2
      Run in a bash shell:

          export KMP_BLOCKTIME=0       # threads sleep right away when idle (e.g., on barriers)
          export KMP_STACKSIZE=200M    # allocate 200 MB for each OpenMP private stack
          export I_MPI_PIN_DOMAIN=core # restrict the threads of each MPI rank to one physical core
          export OMP_NUM_THREADS=4     # use 4 for Xeon Phi, 2 for Xeon
         
          mpirun -np {cores} $AMBERHOME/bin/pmemd.midpoint \
          -O -i mdin -o mdout -p prmtop -c inpcrd 

          # note: {cores} is the number of physical cores, not hyperthreads


  • Optimized code with full double precision (DPDP):

        ./configure -intelmpi -openmp -mic2 intel
        cd $AMBERHOME/src/pmemd.midpoint/src
        make mic2
            Run in a bash shell:

            export KMP_BLOCKTIME=0       # threads sleep right away when idle (e.g., on barriers)
            export KMP_STACKSIZE=200M    # allocate 200 MB for each OpenMP private stack
            export I_MPI_PIN_DOMAIN=core # restrict the threads of each MPI rank to one physical core
            export OMP_NUM_THREADS=4     # use 4 for Xeon Phi, 2 for Xeon
       
            mpirun -np {cores} $AMBERHOME/bin/pmemd.midpoint \
            -O -i mdin -o mdout -p prmtop -c inpcrd

            # note: {cores} is the number of physical cores, not hyperthreads


  • Without hardware-specific or OpenMP optimizations:

          ./configure -intelmpi intel
          cd $AMBERHOME/src/pmemd.midpoint/src
          make parallel
              Run in a bash shell:

              mpirun -np {cores} $AMBERHOME/bin/pmemd.midpoint \
              -O -i mdin -o mdout -p prmtop -c inpcrd

              # note: {cores} is the number of physical cores, not hyperthreads


Additional Resources

The following additional resources may be useful.