Amber masthead
Filler image AmberTools24 Amber24 Manuals Tutorials Force Fields Contacts History
Filler image

Useful links:

Amber Home
Download Amber
Installation
Amber Citations
GPU Support
Updates
Mailing Lists
For Educators
File Formats
Contributors
Workshops
Free Energies
 

Advanced Thermodynamic Integration (TI) Methods to Mutate a Ligand Bound to a Protein

by Shi Zhanga
aDarrin York group, Rutgers University, Newark, NJ

Learning Outcomes

  • Understand recent Amber choices available for TI calculations.
  • Install the most up-to-date version of DDBoost.
  • Setup all initial files for a ligand mutation.
  •        -Prepare protein•ligand files
           -Use Antechamber to derive force field parameters for ligands
  • Set up an alchemical enhanced sampling (ACES) TI calculation using softcore potentials.
  • Run an ACES TI calculation with a single edge.
  • Use DDBoost to analyze the data from the TI calculation.

  • Implementation Notes:
    • DDBoost is a set of free energy tools developed by the York group, consisting of
      • FE_Workflow
      • FE_MDEngine
      • FE_Toolkit
    • FE_MDEngine is in Amber22.
    • FE_Workflow and FE_Toolkit not a part of Amber22, but files are compatible with Amber22, and available on the York gitlab page.

    Introduction

    This tutorial will guide you through calculating the relative binding free energies (ΔΔGbinding) between two ligands bound to the same protein. The system will be the human Thr160-phospho CDK2/cyclin A protein bound to two different inhibitors (NU6094 - PDBID: 1H1Q and NU6086 - PDBID: 1H1R).[1] This is a relatively conservative chemical change from H to Cl with only one “edge”. An edge is one alchemical transformation between two molecules. If you have more ligands or multiple function group changes between two ligands, you may need to perform multiple alchemical changes, in which case you should use MCS to define the softcore atoms for each edge.[2]

    Figure 1. Ligands NU6094 (1H1Q) and NU6086 (1H1R) bound to the the human Thr160-phospho CDK2/cyclin A protein.

    This tutorial assumes you know general Amber usage such as protein system setup, choices when relaxing a system and running production run molecular dynamics simulations. As well, it also assumes you know the fundamental issues surrounding free energy calculations. If you need more guidance, please see these tutorials:

    Advanced TI Calculations

    Free energies will be computed with thermodynamic integration (TI) with softcore potentials, a method used to ensure smooth free energy curves, as implemented in pmemd.cuda which runs on a GPU.[3] This implementation is a little more sophisticated than the standard TI. It will use FE_Workflow to create a number of inputs that will run GPU-driven Hamiltonian replica exchange molecular dynamics (HREMD) TI calculation method known as alchemical enhanced sampling (ACES)[4] with recent flags available in Amber, including smoothstep softcore potentials and enhanced sampling methods.

    In this tutorial, we’re going to use FE_Workflow, part of DDBoost, to set up the TI calculations, which will employ:

    • a second-order smoothstep softcore potential.
    • enhanced sampling git_add_sc functionalities.
    • Hamiltonian REMD with lambda TI as implemented in the ACES code.

    Noteworthy TI Flags

    Smoothstep Softcore Potentials

    Many issues with linear interpolation between end states result in spurious energies and poor statistics (reviewed in ref [5] introduction). Concerted softcore potentials “soften” the Lennard-Jones and electrostatic interactions with increasing values of α and β parameters, respectively. The default values6 in Amber are scalpha=0.5 and scbeta=12.

    A second-order smoothstep softcore potential, SSC(2), was developed to alleviate some of the remaining energy issues, replacing the original softcore potentials and setting variables scalpha=0.2 and scbeta=50. The SSC(2) function with these values enable the free energy derivatives to vanish at the transformation endpoints and smooth adjustment of the λ-dependent terms in the potential.5

    Enhanced Sampling for Softcore Ligand Energies

    In theory, sampling will not affect the converged value of the ΔΔGbinding. However, in practice, for example when rotating a ligand from cis to trans, there may be steric hindrances in the protein binding pocket during calculation of the alchemical change. Enhanced sampling methods for softcore energy calculations have been developed to improve convergence.4

    Figure 2. Softcore enhanced sampling flags in Amber. For more explanation, see text in ref 4.

    If you think there is an issue with convergence, for instance, in your ΔΔGbinding calculation, you can scale interactions between the common core (e.g. pharmacophore) and your softcore region (the atoms that changed). Figure 2 lists gti_add_sc options, where S means that the interactions between the common core and softcore region will be scaled in lambda windows, but P means energies will stay constant for all lambda windows. The default flag in Amber TI calculations is git_add_sc=1. Theoretically, you should obtain the same ΔΔGbinding with any of the 1-6 gti_add_sc values because G is a state function. However, in practice, the York group currently recommends gti_add_sc=5.

    ACES Method on the Ligand

    The alchemical enhanced sampling (ACES)[4] method is a Hamiltonian replica exchange molecular dynamics (HREMD) thermodynamic integration method that exchanges between lambda windows. This increases the sampling between neighboring lambda windows and should increase convergence of the calculated ΔΔGbinding between trials. This will require running on multiple GPUs on a single node (see below for an estimation of how many).

    Reviews

    More information is also available through these articles. [7, 8]

    System Setup

    Prepare the Protein•Ligand PDB Files – 1H1Q and 1H1R

    1. Download the 1h1q structure
    Download the 1h1q structure from PDB Database, https://www.rcsb.org/structure/1H1Q.

    2. Edit the pdb file to remove extraneous info.
    Open the 1h1q.pdb with any text editor. Remove everything but the protein-complex structure, i.e., remove the structural information at the beginning, remove all the solvent molecules and ions. Just use one monomer and one ligand. Take out CONNECT info.

    3. Add hydrogens.
    Use tleap to add H's for the protein.

    Repeat Steps 1-3 for the 1H1R protein•ligand complex.

    Derive Force Field Parameters for the Ligands with Antechamber

    Force fields are specific for the molecule type. Ligands, usually organic molecules, are modelled in Amber with GAFF or some modification of this. For this tutorial, GAFF2 parameters will be derived with Antechamber. Each new ligand will require derivation of parameters with Antechamber unless they exist elsewhere in the literature.

    4. Add hydrogens to the ligand using a tool such a VMD or Chimera X.
    Using VMD to examine the ligand structure and add H atoms back to the ligand if missing. (The Extension -> Modeling -> Molefacture function can be used to add missing atoms). Save the structure.
      File
      • New Molecule
        • Resid 1298 = selection
        • Select all
        • +Hs
      Enlarge in VMD and look at Hs that are added.
      • Delete extra Hs
      • Make H’s planar if need be by selecting dihedral and then clicking “Force Planar”
      • Make sure ligand picture looks chemically reasonable. Pay attention to the H’s that are added to the imidazole ring, to confirm whether they are added to the correct N atom.

      Save changes
      • File Apply Changes to Parent, Yes
      • Should see 1h1r_edited.pdb. Save it by File, Save Coordinates, selected atoms: “all”, file type - pdb.
      • Click Save and name new file. ligand Figure 3. Ligand from 1H1Q (NU6094) after hydrogen atoms added with Molefacture.
    5. Name the residue LIG

    Using a text editor, rename the residue name (column 18 to 20) of the ligand from “2A6” to “LIG”

    The residue name for all ligands should be “LIG” in order for DDBoost to recognize the ligands.

    6. Make a new file for just the ligand.

    Using a text editor, copy the ligand to a new file named 1h1q_0.pdb. Simply copy-paste. Do not cut-paste because you don't want to remove it from the original protein•ligand complex coordinates.

    7. Run antechamber to get a mol2 file.

    This run will use GAFF2 parameters and set net charge = 0 because LIG is a neutral ligand. ndiis_attempts=700 ⇒ Forces antechamber to run enough times to generate a force field. Here, {PDB} can be changed to "1h1q".

    antechamber -i ${PDB}_0.pdb -fi pdb -o ${PDB}_0.mol2 -fo mol2 \
    -at "gaff2" -pf y -an y -c 2 -nc 0 -rn LIG -ek "ndiis_attempts=700"
    

    8. Run parmchk2 to get an frcmod file.

    Here, {PDB} can be changed to "1h1q".

    parmchk2 -i ${PDB}_0.mol2 -o ${PDB}_0.frcmod -f mol2 -s 1
    

    9. Run tleap to generate an lib file.

    Here, {PDB} can be changed to "1h1q". tleap commands

    loadamberparams ${PDB}_0.frcmod
    LIG = loadmol2 ${PDB}_0.mol2
    saveoff LIG ${PDB}_0.lib
    quit
    

    10. Repeat step 4-9 for other ligands.

    For example, in this case, you want to repeat steps 4-9 for 1H1R where H is instead Cl.

      Sample Files

      All files required for intial setup are provided here. This includes:

      • 1h1q.pdb
      • 1h1q_0.frcmod
      • 1h1q_0.lib
      • 1h1q_0.mol2
      • 1h1q_0.pdb
      • 1h1r.pdb
      • 1h1r_0.frcmod
      • 1h1r_0.lib
      • 1h1r_0.mol2
      • 1h1r_0.pdb

      ***Note that you will need to make sure that the residue name for each ligand should be “LIG” in BOTH .pdb files and BOTH.lib files in order for DDBoost to recognize the ligands.

    Simulation Setup

    In this tutorial, we’re going to use FE_Workflow, part of DDBoost, to set up the TI calculations, which will employ:

    • a second-order smoothstep softcore potential.
    • enhanced sampling git_add_sc functionalities.
    • Hamiltonian REMD with lambda TI as implemented in the ACES code.

    Setup directories and input file for FE_Workflow

    FE_Workflow uses the input file to specify parameters. It assumes the following:
    • A single place to put all the pdb files and frcmod files.
    • Use the input file to control different steps of the TI calculation (ex: setup or analysis; in this first case setup).
    11. Make folders for the simulation.
    • Make a folder called ti.
    • In the ti folder, make two folders, one called rbfe and another called initial.
      • In the initial folder, make a subfolder for the protein. For example, call it CDK2.
      • Place all the frcmod files and pdb files from the System Setup into this folder (initial/CDK2).

    FE_Workflow will read in files found in initial/CDK2 and put results in the rbfe folder. In the rbfe folder, FE_Workflow will also make folders for each trial and lambda window, and then place TI input files, scripts, results in the correct folders. Figure 4 contains a list of files and/or folders the user needs to make (blue) and the ones the FE_Workflow will make (green).

    Figure 4. File setup for FE_Workflow.

    12. Edit the “input” file for FE_Workflow
    • Either download the input file or copy it from xxxx.
    • Place the input file in your ti folder.
    • For this tutorial, we are going to use the ff14SB protein force field and gaff2 for ligand force field. The water model is set to tip4pew and the buffer size is 16 Å for complex system and 20 Å for solution system. We'll use tleap to add a water box and ions.

      FE_Workflow will build an extensive set of input and scripting files necessary to run a robust equilbiration, including solvent relaxation, heating and production run for 2 ns at each lambda step. This is all contained in the run_alltrials.slurm file. Default values are used in FE_Workflow. If you want to modify the relaxation, heating or production runs, you can manually modify the md.in files for each step after FE_Workflow has made all the files.

      In this tutorial, we are running 11 lambda windows in parallel for 20 steps. After 20 steps, an exhange is attempted with a neighboring lambda window. At the exchange step, will try to exchange if successful, will run for another 20 steps and then another exchange. The flag numexchg controls how many exchanges will be attempted and nstlim specifies how many steps between exchange attempts.

    • Edit the input file. The user will need to specify the following flags:
    • Flag Example Setting What happens
      Path_to_input initial sets the path of the setup (this has to exist already)
      system initial/CDK2 sets the path of the protein (this has to exist already)
      stage setup sets if this is setup or analysis
      translist 1h1q~1h1r sets the name of the folder for the transformation. FE_Workflow will make this folder

    There is a detailed explanation for the flags inside the sample input file and Amber Reference Manual. For instance, under:

    #######################################################################
    # 1. NAME OF SYSTEM, INPUT STRUCTURES, and TYPE OF CALCULATION
    #######################################################################
    
    and
    # List of desired transformations or edges
    
    There are directions for specifying the names of the tranformations.
    # For example, RBFE or RSFE calculations should have a list
    # in which each entry consists of two molnames separated by the
    # character "~". Initial structure/parameter files of these
    # molnames should be provided in /
    # For RBFE calculations, the PDB file of protein-ligand complex
    # and mol2,lib,frcmod files of ligand are expected.
    # For RSFE calculations, mol2,lib,frcmod files of ligand are expected.
    # example,
    # translist=(1h1q~1h1r 1h1q~1h1s)
    #
    # For ASFE calculations, "translist" should contain a list of
    # molnames.
    # mol2,lib,frcmod files of these molnames are expected in
    # /
    # example,
    # translist=(mobley_1527293 mobley_3034976)
    #translist=(1h1q~1h1r 1h1r~1h1s 1h1s~1oiu 1oiu~1h1q 1h1r~1oiu 1h1s~1h1q)
    

    Run FE_Workflow

    13. Go to the folder with input, type the command
    setup_fe
    

    Workflow should begin to generate the files for you.

    setup

    In your transformation folder, FE_Workflow will make a set of folders, one for the ligand solvated in water (aq) and for the protein•ligand complex (com). It will copy over needed .parm7 and .rst7 files, as well as make .sh and slurm files.

    You will find simulation topology, coordinate, md input and slurm files under CDK2/1h1q~1h1r/aq and com folders respectively (see Figure 4). Files for the set of the transformation 1h1q to 1h1r are provided for your reference.

    In the beginning, the trial folders (t1, t2 and t3) only contain .rst7 files for end states (0 and 1). Later it will later write .mdout, .nc files as well as .rst7 files for propagated lambda windows from equilibration.

    Running the TI Calculation

    14. The user should edit the slurm script to customize the number of GPUs requested.

    GPU Usage

    This TI implementation uses ACES, a replica exchange method, that requires multiple GPUs to run simultaneoulsy.

    You might ask how many GPUs you need. You do not need the same number of GPUs as replicas. The number of replicas per GPU is determined by the GPU memory size. If your GPU has a lot of memory (ex: 32 GB), you can put 20 replicas on 1 GPU. A typical system with 100,000 atoms, will take 1 GB GPU memory. Modern GPUs typically have 8 GB memory. Therefore, about 8 replicas can go on 1 GPU simultaneously. The code will of course go faster if you have more GPUs. The slurm script will automatically distribute the replicas to the specified number of GPUs available. For instance, if you tell slurm to grab 2 GPUs, will put 5 in 1 GPU and 6 replicas in the other. If you have 4 GPUs, it will distribute the 11 replicas evenly. NOTE: All the GPUs should be on the same node. If you are using multiple nodes, it will slow down the exchange attempt due to I/O communication issues.

    15. Submit the slurm script.

    Now transfer the files to any HPC where FE-MDEngine is installed. Confirm the “#SBATCH --” lines are set appropriately. Submit the slurm script and wait for the simulations to finish. The simulation will go through minimization, equilibration and production stages. Each step will generate the output file and coordinate. Here is an example production run input.

    &cntrl
    imin            = 0
    nstlim          = 20
    numexchg        = 100000
    dt              = 0.001
    irest           = 0
    ntx             = 1
    ntxo            = 1
    ntc             = 1
    ntf             = 1
    ntwx            = 10000
    ntwr            = 5000
    ntpr            = 1000
    cut             = 10
    iwrap           = 1
    
    ntb             = 2
    temp0           = 298.
    ntt             = 3
    gamma_ln        = 2.
    tautp           = 1
    ntp             = 1
    barostat        = 2
    pres0           = 1.01325
    taup            = 5.0
    
    ifsc            = 1
    icfe            = 1
    
    ifmbar          = 1
    bar_intervall   = 1
    mbar_states     = 11
    mbar_lambda(1)     = 0.00000000
    mbar_lambda(2)     = 0.10000000
    mbar_lambda(3)     = 0.20000000
    mbar_lambda(4)     = 0.30000000
    mbar_lambda(5)     = 0.40000000
    mbar_lambda(6)     = 0.50000000
    mbar_lambda(7)     = 0.60000000
    mbar_lambda(8)     = 0.70000000
    mbar_lambda(9)     = 0.80000000
    mbar_lambda(10)     = 0.90000000
    mbar_lambda(11)     = 1.00000000
    
    noshakemask     = '@1-90'
    
    clambda         = 1.00000000
    timask1         = '@1-45'
    timask2         = '@46-90'
    crgmask         = ""
    scmask1         = '@41'
    scmask2         = '@90'
    scalpha         = 0.5
    scbeta          = 1.0
    
    gti_cut         = 1
    gti_output      = 1
    gti_add_sc      = 5
    gti_scale_beta  = 1
    gti_cut_sc_on   = 8
    gti_cut_sc_off  = 10
    gti_lam_sch     = 1
    gti_ele_sc      = 1
    gti_vdw_sc      = 1
    gti_cut_sc      = 2
    gti_ele_exp     = 2
    gti_vdw_exp     = 2
    
    gremd_acyc      = 1
    /
    &ewald
    /
    

    Analysis of the Simulation

    This tutorial uses FE_Toolkit to analyze the data automatically.

    If simulations finish without errors, you will find a series of *ti.mdout files in the t1/ folder. For the free energy results, we will extract energy values from these *ti.mdout files.

    Look at the trajectories of the endstates in VMD

    If you want to take a close look at the conformation or the structure of the ligand, load in VMD the unisc.parm7 and 0.00000000_ti.nc for real state ligand 1h1q (resid :1), 1.00000000_ti.nc for real state ligand 1h1r (resid :2).

    Perform Analysis

  • Transfer *ti.mdout files in both com/ and aq/ folders back to your local computer.
  • Modify the “stage=analysis” and “path_to_data” to the folder containing the output files.
  • Make sure you’ve loaded the FE-Toolkit module.
  • Again, type the command
    setup_fe
  • Essentially, the analysis uses the graphmbar program inside the FE-Toolkit, and there are more advanced options for different features you may want to explore. An example of graphmbar is provided in the fe-toolkit/src/graphmbar-0.9/examples/stx. A general usage is consisted of three steps:

    python3 ./gmbar.py -f ExptData.txt -t 1 2 3 -c > graphmbar.inp
    
    OMP_NUM_THREADS=1 mpirun -n 6 graphmbar_mpi --nboot=200 --fc=10.\
                   --guess=2 graphmbar.inp > graphmbar.out
    
    python3 ./gmbar.py -f ExptData.txt -t 1 2 3 \
             -a graphmbar.out > graphmbar.txt
    
    The FE-Workflow serves as an interface to connect graphmbar and pass some variables for analysis.

    ligand

    The experimental value to the binding free energy is not passed to the analysis mode in this example, but you can always apply the experimental restraints by providing a file consisting of two column, first is the names of the complex, second the experimental value in unit of kcal/mol. Let’s look at the graphmbar.txt file. Since we only have two nodes, no cycle is detected.

    ligand

    On the last line, it generates the relative binding free energy from 1h1q to 1h1r based on our one trial simulation. The ddG value is -0.549 kcal/mol.

    References

    1. Davies, T. G., Bentley, J., Arris, C. E., Boyle, F. T., Curtin, N. J., Endicott, J. A., Gibson, A. E., Golding, B. T., Griffin, R. J., Hardcastle, I. R., Jewsbury, P., Johnson, L. N., Mesguiche, V., Newell, D. R., Noble, M. E. M., Tucker, J. A., Wang, L.,and Whitfield, H. J. Nature Structural and Molecular Biology, 2002, 9, 745-749.
    2. Cao, Y., Jiang, T., and Girke, T. A maximum common substructure-based algorithm for searching and predicting drug-like compounds. Bioinformatics, 2008, 24, i366–i374.
    3. Lee, T. S., Cerutti, D. S., Mermelstein, D., Lin, C., LeGrand, S., Giese, T.J., Roitberg, A., Case, D. A., Walker, R. C., and York, D. M. GPU-Accelerated Molecular Dynamics and Free Energy Methods in Amber18: Performance Enhancements and New Features. Journal of Chemical Information and Modeling, 2018, 58, 2043-2050.
    4. Lee, T. S., Tsai, H. C., Ganguly, A., and York, D. M. ACES: Optimized Alchemically Enhanced Sampling. Journal of Chemical Theory and Computation, 2023, 19, 472-487.
    5. Steinbrecher, T.; Joung, I.; Case, D. A. Soft-core potentials in thermodynamic integration: Comparing one- and two-step transformations J. Comput. Chem., 2011, 32, 3253– 3263.
    6. Lee, T.S., Lin, Z., Allen, B.K., Lin, C., Radak, B.K., Tao, Y., Tsai, H.-C., Sherman, W. and York, D.M. Improved Alchemical Free Energy Calculations with Optimized Smoothstep Softcore Potentials J. Chem. Theory Comput., 2020, 16, 5512–5525.
    7. Tsai, H. C., Lee, T. S., Ganguly, A., Giese, T. J., Ebert, M. C., Labute, P., Merz Jr., K. M., and York, D. M. AMBER Free Energy Tools: A New Framework for the Design of Optimized Alchemical Transformation Pathways. Journal of Chemical Theory and Computation, 2023, 19, 640-658.
    8. Ganguly, A., Tsai, H. C., Pendás-Fernández, M., Lee, T. S., Giese, T. J., and York, D. M. AMBER Drug Discovery Boost Tools: Automated Workflow for Production Free-Energy Simulation Setup and Analysis (ProFESSA). Journal of Chemical Information and Modeling, 2022, 62, 6069-6083.
    Adapted by Maria Nagan

    "How's that for maxed out?"

    Last modified: