Free energy calculations with the Free Energy Workflow Tool (FEW)

SECTION 4: Thermodynamic integration calculations

FEW facilitates the determination of relative binding free energies by the thermodynamic integration (TI) approach for ligands binding to the same receptor. The program provides the functionality for an automated setup of the required thermodynamic integration (TI) transformation simulations, and using the output of these simulations it can compute relative ligand binding free energies with minimal user intervention.
The basic principle of the TI approach for the determination of relative binding free energies is explained in Tutorial A9. Here exclusively the setup of TI calculations with FEW is shown. The setup with FEW differs form the one presented in Tutorial A9 mainly in that only one simulation is run (instead of three) for each transformation, i.e. one transformation simulation in water and one transformation simulation in the solvated complex. The simulations are prepared by FEW according to the 1-step, soft-core approach (see [11] and page 355 of the Amber14 Manual), which allows to change charges and vdW interactions in a single simulation.

The TI functionality of FEW is demonstrated here by a calculation of the relative binding free energy of the Factor Xa ligands L51c and L51d (Figure 6). As input structures the equilibrated ligand and complex structures of L51c from the MD simulations run in section 1 are used.

Note: Although the TI setup is performed here with pre-equilibrated structures, it is in principle possible to start TI calculations for any complex and ligand structures for which parameters have been prepared according to step 1A and for which solvated systems were generated, e.g. by step 1B. Please consult the FEW manual, if you plan to start TI simulations from non-equilibrated structures.

Figure 6: Transformation investigated by the TI approach.

For preparation of input structures and parameters and for setup of TI MD equilibration and production calculations one command file, called here TI_setup_L51c_d.in, is used. The different tasks (steps 1A - 2B below) are requested by setting flags in the respective parts of the command file to 1.

TI_setup_L51c_d.in

@TIW
################################################################################
# Command file for TI simulation setup
################################################################################
# Location of input and output directories / file(s)
#
# lig_struct_path: Folder containing the ligand input file(s)
# output_path: Basis directory in which all setup and analysis folders will
#              be generated. The directory must be identical with the
#              'output_path' directory used for setup of the MD simulations.
lig_struct_path              /home/user/tutorial/structs
output_path                  /home/user/tutorial

################################################################################
# Parameters required for TI simulation setup:
# The following parameters have to be specified and need to be identical
# in all subsequent runs for one system / TI-setup
#
# ti_simulation_setup: Request setup of TI simulation
# charge_method: Charge method that shall be used, either "resp" or "am1"
# lig_name_v0_struct: Name of start-ligand - Must be identical to the name of
#                     the file in the "structs" folder used for generation of
#                     parameter and library files with the common MD setup
#                     functionality of FEW.
# lig_name_v1_struct: Name of end-ligand - Must be identical to the name of
#                     the file in the "structs" folder used for generation of
#                     parameter and library files with the common MD setup
#                     functionality of FEW.
# lig_alias_v0: Alias that shall be used for the identification of the
#               start-ligand. The alias must consist of 3 characters.
# lig_alias_v1: Alias that shall be used for the identification of the
#               end-ligand. The alias must consist of 3 characters.
# softcore_mask_v0: Soft core atom mask for start-structure, specifying the
#                   atoms of the start-structure (state V0) that shall be
#                   regarded as soft core using the format
#                   <lig_alias_v0>@<atom name list separated by comma>
# softcore_mask_v1: Soft core atom mask for end-structure, specifying the
#                   atoms of the end-structure (state V1) that shall be
#                   regarded as soft core using the format
#                   <lig_alias_v1>@<atom name list separated by comma>
ti_simulation_setup         1
charge_method               am1
lig_name_v0_struct          L51c
lig_name_v1_struct          L51d
lig_alias_v0                LFc
lig_alias_v1                LFd
softcore_mask_v0            LFc@C14,H1
softcore_mask_v1            LFd@N3
#
################################################################################
# 1) Parameters for preparation of coordinate and topology files of solvated
#    systems of start- and end-structures for TI simulations
#
# A) Generation of atom association list based on ligand mol2 files of
#    start and end structures
#
# prepare_match_list: Request creation of matching list
prepare_match_list          1
#
# B) Setup of coordinate and topology files
#
# It is required that RESTRT (coordinate) and topology files for the ligand and
# complex of the start structure exist. These can be generated with the common
# MD setup functionality of FEW.
#
# prepare_inpcrd_prmtop: Request setup of coordinate and topology files
# lig_inpcrd_v0: Coordinate file (restart file) of ligand - start structure
# com_inpcrd_v0: Coordinate file (restart file) of complex - start structure
# lig_prmtop_v0: Topology of ligand - start structure
# com_prmtop_v0: Topology of complex - start structure
# match_list_file: Optional: File containing the atom association information
#                  for the common part of start- and end-structures. Must only
#                  be specified if step 1A was not successful and the list was
#                  created manually.
# chain_termini: Comma separated numbers of terminal residues of chains in
#                receptor structure.
# create_sybyl_mol2: Request generation of mol2-files with sybyl atom types
#                    for easy comparison of atom names of start- and end-
#                    structures. Can facilitate checking and manual adjustment
#                    of atom names in the end-structure, if automatic matching
#                    is not successful.
# additional_library: If an additional library file is required, e.g. for
#                     non-standard residues present in the receptor structure,
#                     this file must be specified here.
# additional_frcmod: If additional parameters are needed, e.g. for describing
#                    non-standard residues present in the receptor structure,
#                    a parameter file should be provided here.
# SSbond_file: File with disulfide bridge definitions
#
prepare_inpcrd_prmtop      1
lig_inpcrd_v0              /home/user/tutorial/MD_am1/L51c/lig/equi/md_nvt_red_06.restrt
com_inpcrd_v0              /home/user/tutorial/MD_am1/L51c/com/equi/md_nvt_red_06.restrt
lig_prmtop_v0              /home/user/tutorial/MD_am1/L51c/cryst/L51c_solv_lig.top
com_prmtop_v0              /home/user/tutorial/MD_am1/L51c/cryst/L51c_solv_com.top
match_list_file
chain_termini              235
create_sybyl_mol2          1
additional_library         /home/user/tutorial/input_info/CA.lib
additional_frcmod
SSbond_file                /home/user/tutorial/input_info/disulfide_bridges.txt

# 2) Setup scripts for TI MD
#
# General parameters
#
# no_shake: Set to "1", if no SHAKE shall be performed
# ti_batch_path: Root path to be used in setup of batch files
# ti_prod_template: Template script for TI production simulations
no_shake                   1
ti_batch_path              /home/user/tutorial
ti_prod_template           /home/user/tutorial/input_info/MD_prod_noShake_TI.in
#
# A) Setup of scripts for equilibration
#
# ti_equil: Request generation of scripts for TI equilibration input
# ti_equil_template: Template file for equilibration part of equilibration
#                    phase of TI simulations. This equilibration part is
#                    followed per default by a 1 ns free TI MD simulation
#                    for complete equilibration of the system.
# ti_equil_batch_template: Batch template file for equilibration phase of
#                          TI simulations.
# ti_equil_lambda: Values of lambda that shall be used in the calculation
#                  in ascending order. Please specify only the decimal digits,
#                  e.g. 1 for lambda 0.1, 05 for lambda 0.05.
ti_equil                   0
ti_equil_template          /home/user/tutorial/input_info/equi_noShake_TI.in
ti_equil_batch_template    /home/user/tutorial/input_info/equi_TI.sge
ti_equil_lambda            1,2,3,4,5,6,7,8,9
#
# B) Setup scripts for production
#
#    ATTENTION: This setup step can only be conducted if the equilibration
#               calculations have been completed.
#
# ti_production: Request generation of scripts for TI production input.
#                This setup step requires that the equilibration output is
#                present in the corresponding 'equi' folder.
# ti_prod_lambda: Lambda steps for which the production shall be run;
#                 separated by comma and in ascending order. Please specify
#                 only the decimal digits, e.g. 1 for lambda 0.1.
# total_ti_prod_time: Total production time requested (in ns)
# ti_prod_batch_template: Batch template for TI production simulations
# converge_check_script: Location of perl program to be used for convergence
#                        checking after each production step. If the location
#                        is not specified, it will be assumed that the program
#                        can be found under the default location
#                        at .../FEW/miscellaneous/convergenceCheck.pl
# converge_check_method: Method that shall be used for convergence checking.
#                        1: Difference in standard error of dV/dL
#                        2: Precision of dV/dL according to student's
#                           distribution
# converge_error_limit: Error limit that shall be used as termination criterion
#                       for the TI production simulations.
#                       Defaults: 0.01 kcal/mol for method 1
#                                 0.2 kcal/mol for method 2
ti_production              0
ti_prod_lambda             1,2,3,4,5,6,7,8,9
total_ti_prod_time         1,1,1,1,1,1,1,1,1
ti_prod_batch_template     /home/user/tutorial/input_info/prod_TI.sge
converge_check_script
converge_check_method      2
converge_error_limit       0.2

Definition of global parameters
The key terms of the very first part of the command file, where Input/Output directories/file(s) are defined, are identical to the ones of the Input/Output specification section of the command files for MD simulation setup discussed in section1.
In the TI simulation setup part it is specified that TI simulations shall be prepared using AM1-BCC charges for the ligands. Furthermore the names of the ligands, i.e. the names of the MOL2-structure files, that shall be considered for the transformation, lig_name_v0_struct and lig_name_v1_struct are defined. The corresponding ligand aliases lig_alias_v0 and lig_alias_v1 are used as ID's and will appear e.g. in ligand residue names and in file names. The softcore_mask's define the soft-core atoms, i.e. the atoms that shall be changed, "appear" or "disappear", i.e. are decoupled, upon transfer of ligand L51c into ligand L51d.

Step 1: Preparation of input coordinate and topology files

To request setup of coordinate and topology files for the TI simulations the flags ti_simulation_setup and prepare_inpcrd_prmtop need to be set to 1.

A: Before the actual input files are prepared, an atom association list is created in which the atoms that are structurally equivalent in both ligands are listed as pairs. The identification of equivalent atoms is necessary because all common atoms of the ligands need to be listed in the same order in the coordinate files later. Therefore a correct atom association list is essential for a successful setup of the simulations. Usually this step is conducted without any user intervention.

B: When the atom association list has been created, the actual coordinate and topology files are generated. Files for the end state of the transformation, that is L51d in the present case, are prepared based on coordinate and topology files of the start state. Thus, coordinate files of the solvated ligand L51c (lig_inpcrd_v0) and of the solvated complex (com_inpcrd_v0) need to be provided. The coordinate files are obtained here from the final equilibration step of the MD simulations prepared (and run) in section 1 of this tutorial. The corresponding topology files used for the simulations are present in the cryst folders of the simulation directories and are specified in the command file under lig_prmtop_v0 and com_prmtop_v0.
As the protein Factor Xa is not a monomer, but consists of two chains, the end of the first chain is defined under chain_termini. Other specific receptor features can be handled via setting of additional_library, additional_frcmod, and SSbond_file, which have been described before (see section 1).

Running FEW:
After the basic input/output path, i.e. the path /home/user/tutorial here, has been changed to the path where the tutorial files are stored on your local system, FEW can be invoked by:

perl $FEW/FEW.pl TI /home/user/tutorial/cfiles/TI_setup_L51c_d.in

When you run FEW using the command-file given above with ti_simulation_setup and prepare_inpcrd_prmtop set to 1, a new folder called TI_am1 is created in the /home/user/tutorial directory. This folder contains a sub-folder named after the ligand names specified as lig_name_v0_struct and lig_name_v1_struct in the command file. In case of the names given in the command file above the folder will be named L51c_L51d. In this sub-folder all data associated with this specific transformation will be deposited (Figure 7). Currently only the setup folder exists, which contains the files generated during the setup of the coordinate and topology files.

Sub-structure of the transformation specific folder created in the setup of the TI calculations

Figure 7: Schematic depiction of the sub-structure of the transformation specific folder created upon setup of TI calculations with FEW.

Step 2: Setup of TI MD simulations

A: Equilibration
The preparation of files for the equilibration phase of the TI MD simulations can be requested by setting ti_equil to 1. Simulations are prepared without using the Shake algorithm for constraining bonds involving hydrogen atoms. (no_shake=1). The path defined as output_path is also used as Input/Output path for the prepared batch scripts (ti_batch_path). Input files for the equilibration phase of the simulations are created for the nine λ values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9, using the templates ti_equil_template and ti_prod_templ. Currently the pure equilibration step conducted according to the settings in ti_equi_templ is followed by a fixed 1 ns simulation employing the ti_prod_templ to ensure thorough equilibration. The batch template ti_equi_batch_template needs to be adapted for the local computing architecture before usage by changing the top section of the script before the Fix variables statement as well as the section entitled re-queuing.

When FEW is run by repeating the command from step 1 with the ti_equil flag switched on, a folder called equi is created in the transformation specific folder L51c_L51d (Figure 7). This folder contains two sub-folders, lig for the simulation of the free ligand in solution and com for the simulation of the solvated complex. Each of these folders contains beside the input files for the TI simulations a batch script called run.pbs for job submission via a queuing system. The TI equilibration jobs can be started by:

qsub /home/user/tutorial/TI_am1/L51c_L51d/equi/com/run.pbs

qsub /home/user/tutorial/TI_am1/L51c_L51d/equi/lig/run.pbs

The equilibration steps for the different λ values will then be sequentially processed according to the scheme depicted in Figure 8.

Processing scheme for equilibration and production phase TI simulations

Figure 8: Schematic depiction of the workflow during equilibration and production phase of the TI simulations.

ATTENTION:
The setup of the production TI MD simulations described below can only be started, when the equilibration simulations have been completed for all λ values that shall be considered in the production phase. As running the equilibration simulations can be time consuming, output files with energies recorded during the simulations are provided in the FEW tutorial folder. If you want to use the provided MD output files copy the file link_TIequi_output.sh into your basic input/output folder

cp $FEW/examples/tutorial/link_TIequi_output.sh /home/user/tutorial

and replace the path /home/src/FEW given in the third line by the path under which FEW is installed under your local system. Then execute the script.

/home/user/tutorial/link_TIequi_output.sh

Then exit the shell and check that softlinks have been created in the com and lig folders under /home/user/tutorial/TI_am1/L51c_L51d/equi.

B: Production
The preparation of input files for the production part of the transformation simulations is invoked by setting ti_production to 1. Setup of simulations is requested here for all nine λ steps for which equilibration calculations have been performed. It would also be possible to request production setup only for a part of those λ's for which equilibrations have been run. However, it is essential, that the size of Δλ, i.e. the λ step size, is equivalent for all Δλ's. Consequently, to run simulations only for the λ values 0.2, 0.4, 0,6, and 0.8 would also have been possible.
TI production simulations are started based on the final coordinate files from the equilibration. A precision better than 0.2 kcal/mol (according to student's distribution at a confidence level of 95%) will be used as termination criterion for the simulations. If this criterion is not reached the simulations at all λ values will be terminated after total_ti_prod_tim=1 ns. The batch template file ti_prod_batch_template needs to be adapted in the same way as the ti_equil_batch_template file above and serves then as template file for the setup of batch files for the submission of TI production jobs.

Running FEW:
When the computing environment specific changes have been made, setup of the TI production simulations by FEW can be invoked using a command file in which ti_production is switched on.

perl $FEW/FEW.pl TI /home/user/tutorial/cfiles/TI_setup_L51c_d.in

The setup starts with a check of the equilibration according to the reverse cumulative averaging procedure [12, 13], to ensure that the system is equilibrated and can be used for TI production.
After the checking routine has been completed, you are notified whether the equilibration check was successful. In the present case the equilibration check is passed for all λ's of both ligand and complex system. It can happen that the overall equilibration check is passed, but that the equilibration check failed for individual λ's. In this case the respective equilibrations should be further analyzed to completely rule out the presence of any problems during equilibration that could lower the accuracy of the binding free energy prediction later. A plot of the dV/dλ values versus the simulation time can be valuable for the identification of such problems. Figure 9 shows such a plot of dV/dλ values extracted form the output files and their cummulative average (red line) plotted versus the simulation time. It is obvious that although the dV/dλ values fluctuate considerably, the cummulative average is relatively constant. This indicates that the system is on average already in an equilibrated state, and that only longer sampling will be needed to reach the same precision as at the other λ values. As a control of the precision level in the production runs is ensured via the implemented convergence check, the final structures of the equilibration can be used as input structures for the production in such a case.

Plot of dV/dLambda versus simulation time

Figure 9: Plot of dV/dλ (black) and its cummulative average (red) versus the simulation time. The point in the simulation where the pure equilibration part ends and the 1 ns production part of the equilibration phase starts is indicated by a blue dashed line.

The files for the TI production simulations are deposited in a folder called prod. This folder contains, as the equi folder described above, two sub-folders, lig and com, for the simulations of the ligand in solution and of the solvated complex, respectively. The production simulations can be started by submission of the batch scripts, e.g.:

/home/user/tutorial/TI_am1/L51c_L51d/qsub_prod.sh

Note: Since carrying out TI simulations for two systems at nine λ values is computationally expensive, the output files of the simulations containing the energy values are provided with FEW. If you wish to use these files, please copy the script copy_TIprod_output.sh into the tutorial folder

cp $FEW/examples/tutorial/copy_TIprod_output.sh /home/user/tutorial

and change the path /home/src/FEW to the directory where FEW is installed on your local system. Then execute the script.

/home/user/tutorial/copy_TIprod_output.sh

After completion of the script, exit the shell and check that the output files from TI production were copied into the lig and com folders under /home/user/tutorial/TI_am1/L51c_L51d/prod.

Step 3: Calculation of the relative binding free energy

The difference in binding free energy between L51c and L51d ΔΔG_binding can be computed with FEW in an automated fashion based on the output from TI production simulations. The command file TI_setup3_L51c_d.in contains the parameters required for this task.

TI_setup3_L51c_d.in

@TIW
################################################################################
# Command file for calculating the difference in binding free energy
# between start and end ligand based on TI simulations.
################################################################################
# Location of input and output directories / file(s)
#
# lig_struct_path: Folder containing the ligand input file(s)
# output_path: Basis directory in which all setup and analysis folders will
#              be generated. The directory must be identical with the
#              'output_path' directory used for setup of the MD simulations.
lig_struct_path              /home/user/tutorial/structs
output_path                  /home/user/tutorial

################################################################################
# Parameters required for TI analysis:
#
# ti_ddG: Request calculation of difference in binding free energy.
# charge_method: Charge method that shall be used, either "resp" or "am1"
# lig_name_v0_struct: Name of start-ligand - Must be identical to the name of
#                     the file in the "structs" folder used for generation of
#                     parameter and library files with the common MD setup
#                     functionality of FEW.
# lig_name_v1_struct: Name of end-ligand - Must be identical to the name of
#                     the file in the "structs" folder used for generation of
#                     parameter and library files with the common MD setup
#                     functionality of FEW.
# lig_alias_v0: Alias that shall be used for the identification of the
#               start-ligand. The alias must consist of 3 characters.
# lig_alias_v1: Alias that shall be used for the identification of the
#               end-ligand. The alias must consist of 3 characters.
# dVdL_calc_source: Specification of files considered in computation of ddG:
#                   0: All files available for ligand and complex are considered
#                   X-Y: Where X is the number of the first and Y is the number
#                   of the last production file that shall be taken into account.
#                   If Y=0 all files from X on will be considered.
# ddG_calc_method: Method to be used for the calculation of dG:
#                  1: Conventional numerical integration with interpolation
#                     to lambda=0 and lambda=1.
#                  2: Numerical integration without interpolation
#
ti_ddG                      1
charge_method               am1
lig_name_v0_struct          L51c
lig_name_v1_struct          L51d
lig_alias_v0                LFc
lig_alias_v1                LFd
dVdL_calc_source            0
ddG_calc_method             1

The Input/Output specification section of this command-file is already familiar. Also the ligand specific definitions in the TI analysis part of the file were explained above. The only important new parameters are dVdL_calc_source and ddG_calc_method. The first of these parameters determines, which files from TI production will be regarded for the calculation of ΔΔG_binding. Here we use all files created during TI production. It would, however, also be possible to regard e.g. only dV/dλ values from files recorded after a certain simulation time.
As calculation method the conventional calculation according to the trapezoidal rule with linear extrapolation to the end states λ=0 and λ=1 is used here. For the alternative option please consult the FEW manual and [13].

After FEW has be executed with the TI_setup3_L51c_d.in command file, a new folder called TI_results can be found in the transformation specific directory (Figure 7). This folder contains text-files listing all dV/dλ values regarded in the calculation of ΔΔG_binding, a log-file, and a file called TI_dG.out in which the final results are provided.

The computed difference in free energy of binding between L51c and L51d is 2.00 kcal/mol. Please consider this as a very rough estimate of the binding free energy, since within the short simulation time of 1 ns the dV/dλ values did not reach the desired error limit for all λ's. Especially the errors for the λ's 0.8 and 0.9 in the complex state are still high. One should also keep in mind that the transformation applied here was very small, so that the production simulations could converge relatively fast, for other transformations much longer simulations might be required to obtain good predictions.

SECTION 1: Setup of MD simulations

SECTION 2: MM-PB(GB)SA calculations

SECTION 3: LIE analysis