Fundamentals of LEaP
By Pengfei Li and David Cerutti
Background Information
The LEaP
program is a portal between many chemical structure
file types (.pdb
and .mol2
, primarily), and the Amber
model parameter file types such as .lib
, .prepi
,
parm.dat
, and .frcmod
). Each of the parameter files
contains pieces of information needed for constructing a simulation, whether
for energy minimization or molecular dynamics. Given a complete set of
parameters (the molecular model, which we call a "force field"),
LEaP
will generate an AMBER topology file (generally, the file
extension is ".prmtop
", although ".parm7
" and
".top
" are synonymous for the same file type) and coordinate file
(appropriate file extensions for the products of LEaP
include
".inpcrd
" and ".crd
"). LEaP
functions
within a larger workflow described in Section 1.1 of the
Amber Manual.
The figure below illustrates the information contained in different file
types. A black checkmark in the figure indicates that such files are
relevant to LEaP
. A red checkmark in
the figure indicates the file contains additional information that may not be
relevant to LEaP
. More detailed information about the contents of
some files and their formats can be found on the
AMBER file formats webpage, and
in Section 14.1 of the
AMBER 2016 Manual.
Here we also have a figure for modeling files in CHARMM and GROMACS.
How Does LEaP Work?
The LeAP
program collates parameters in order to make a
complete description of a molecule. To see this in action, just read through
one of the leaprc
files in $AMBERHOME/dat/leap/cmd/
.
Each of them is read as a script: there are declarations of atom types and
residues, as well as calls to load files containing parameters and residue
definitions. For instance, our recommended protein force field is encoded in
leaprc.protein.ff14SB
, a file you'll be calling often. In that
leaprc
, you will find a call to load parm10.dat
(a
"full" force field parameter file) as well as frcmod.ff14SB
(an
auxiliary parameter file which has FORce field MODifications).
In this case, frcmod.ff14SB
supplements parm10.dat
,
but since it is specified later any conflicts will favor the
frcmod
. An alternative, equally modern protein force field can be
found in leaprc.protein.ff15ipq
, and this file calls only the main
parameter file because all of its parameters were derived at the same time.
In all leaprc
files, there must also be libraries
(lib
files). It's one thing to have parameters, but it won't be
of much help until the units of our molecule are clearly defined in terms of
those parameters. As is indicated in the precedning chart, the residue
libraries contain some of their own parameters: the charges. It may seem like
an odd division of the material, but it owes to the way the Amber force fields
work. There are alternative paradigms for defining a force field, which are
well beyond the scope of this tutorial, but in short applying them would
require a different program.
Once LEaP
knows all of the parameters, it tries to completely
define all aspects of a structure it has been given. There is some degree of
fault tolerance: LEaP
can be presented with incomplete residues,
or requests to simply build residues, and it will comply so long as it knows
what the residues are made of. However, if LEaP
does not know
what a residue is, or more often it encounters atom names inside a residue that
do not quite match its libraries, the program will throw an error message and
you will not get a system ready for simulations until that's fixed. The
coordinates that come out of LEaP
reflect the structure it was
given, after filling out any incomplete residues or constructing what it was
told to make. The topology that comes out of LEaP
is a complete
description of how that system will behave. The simulation is needed because
the system is almost certainly so complex that analytic solutions are not to
be found.
Structure File | Library Files | Parameter Files | Topology | Coordinates | |
Molecules Atom Names Connections Coordinates |
Units (Residues) Atom Names Atom Types Charges Connections Coordinates Masses Bonded Params Nonbond Params |
Atom Types Masses Bonded Params Nonbond Params |
→ → → |
Units (Residues) Atom Names Atom Types Charges Connections Masses Bonded Params Nonbond Params |
Coordinates |
"Standard" residues: AMBER has library files for standard amino
acids, nucleic acids, lipids, and sugars. These libraries are designed to
work with established parameter sets, as defined in their respective
leaprc
files. In Amber16 and later versions, we have defined
leaprc
files for different descriptions of proteins, nucleic
acids, and solvents (including a number of water models), so that you can mix
and match as you need. You include these leaprc
files in your
LEaP
input by simply stating
"source leaprc.protein.ff14SB
" and so forth; can't find the files?
That's because they're stashed away in $AMBERHOME/dat/leap/cmd/
,
but LEaP
knows where to look in order to find the standard
includables much like the C compiler gets run with a default directory for its
standard libraries. Bear in mind that not all combinations of the models we
provide are recommended, but also that we went to the trouble of making these
files because we think that individually they are good options.
Non-standard residues: Library files may exist for nonstandard
residues (for instance, you may find them on the web), but you should always
check their contents and the means by which they were derived. In most cases
users will simply need to created a corresponding .lib
file for
themselves. This process takes the antechamber
program as
described in a
separate tutorial.
The antechamber
program will generate its own .frcmod
files for custom-built residues, typically using the
General Amber Force
Field, but the exact source of parameters is subject to some user control.
The .prepi
files that antechamber
produces easily can
be read by LEaP
to create library files proper, or just treated as
library files themselves. In most cases antechamber
will also
create a .frcmod
file containing custom parameters, which is also
then needed by LEaP
when building the system.
The pdb4amber
program: While we make our best effort to
provide force fields that plug into the structures the PDB and other databases
hold, there are multiple atom naming conventions and other aspects of
structures that prevent us from being universally compatible. To mitigate this
problem, we offer the pdb4amber
program which modifies atom and
residue names into conventions that the rest of our programs will understand.
It's mostly for proteins, but it can be applied to other biomolecules. There
are some limitations to the program, such as a limit of 100,000 atoms owing to
the PDB CONECT
format. The program will provide a list of options
if run with no arguments; a typical application is shown in the example below.
Adding bonds: One of the most common mistakes that can be made in
structure preparation is the failure to bond cysteine disulfide bridges, but
there are other connections to be scrupulous about. If a .pdb
or
.mol2
structure file contains connectivity information that forms
a bridge between two cysteines, things will be all right. However, if only the
coordinates are taken from the file (perhaps you grep for "ATOM"
to eliminate crystallographic waters), the connections would need to be added
manually. If the disulfide bridge is lost, the CYS residues will not be
correctly converted to CYX and instead reduced (hydrogen added to the thiol) in
the simulation. Any bonds can be added to a structure in LEaP
input with the command:
bond <unit>.<residue #>.<atom name>
<unit>.<residue #>.<atom name>
The AMBER Manual
gives a more general description of how to add bonds, but this is the format
you will encounter most often. Make sure to relabel any cysteines that need
participate in disulfide bridges as CYX, not CYS as they appear in the Protein
Data Bank (PDB). The specifications of each atom are not
ambmask
strings, and it is important to note that the residue
numbers you must provide are the positions of residues that appear in
the structure file, not the residue numbers that might be given in that
file. This makes a difference if you have a .pdb
file that starts
its residue numbering at something relevant to biochemists, like CYS 14,
THR 15, GLU 16, CYS 17, … In that case, a disulfide bond between the
(relabeled) CYX 14 and CYX 17 of this structure (read from its parent
.pdb
file as MyUnit) would be specified by: bond MyUnit.1.SG
MyUnit.4.SG
.
The leap.log file: The most important output from leap will be piped
to the terminal, or whatever file you redirect it to (i.e. leap -f
InputFile.txt > OutputFile.txt
), but there's another source of
output that will be constantly appended with verbose output from every
LEaP
session run in the directory: leap.log
. Clear
this file before running LEaP
if you want to focus on the details
of what you are doing. Reading through leap.log
, you will find
every command that you placed in your input file, and much more. It will
really show you how the leaprc
files you reference are taken as
scripts: in fact, virtually every part of their contents could be specified
directly in the LEaP
input file if it weren't alreayd written down
for you to include.
Error Messages: Even advanced AMBER users will make mistakes, but
fortunately LEaP
reports most issues that it encountered. The key
is for all users to read the LEaP
output carefully as the
errors are often easily overlooked. It is far better to spend a minute
checking the output only to find that nothing is wrong than to start a
simulation, run it for two weeks, and then review the outcome only to find that
the system had a subtle problem at the very start.
An Example with LEaP
This simple exercise in creating a solvated protein in water will apply the
principles we have just learned. We will use the snake venom toxin fasciculin,
1FSC.pdb, the ff14SB force
field to describe the protein, and the SPC/E water model. Because fasciculin
contains four disulfide bridges, we must rename the CYS residues participating
in such interactions (all of them, it turns out) to CYX. The CONECT records in
1FSC.pdb will take care of the bonds and CYS renaming when we apply
pdb4amber
. The output of that operation is
here.
There are two atoms, marked "UNL" in the structure file, which we will
simply have to do away with--they are not part of the protein or any ligand,
and are probably just a problem for the crystallographic refinement. The final
.pdb
file, before we apply tleap
, is
here. The protein and all other
crystallographic waters can be seen in the figure below.
In order to solvate the system, we will make use of the
solvateOct
command to put this thing in a regular, truncated
octahedral box. This is the closest thing to a sphere that Amber has, and the
most efficient way in terms of space (and thus particle count) to surround a
biomolecule with water. In the early days of MD we often did "shoe boxes" with
dimensions dictated by the shape and orientation of the biomolecule. The
trouble with this approach comes when simulating something that can tumble, and
the timescales our simulations routinely reach nowadays will pretty much
guarantee that will happen. In our simulation there will be periodic images of
the protein arrayed in an egg-crate layout (that's how the truncated
octahedron stacks), and we do not want a situation where the thing could touch
an image of itself. The settings below would be pretty good for running a
simulation with a 9 or 10Å cutoff; a buffer of 14Å is used, but the
water in that buffer region is going to settle somewhat around the protein so
the box volume will shrink a bit in the first 100-200ps of dynamics after a
barostat comes into play.
Another consideration: the fasciculin is charged, and the best practice is
to have a system that is overall neutral. To compensate for the +4 charge, we
will add four chloride ions with the addIons
command. The
complete LEaP
input is here:
source leaprc.protein.ff14SB
source leaprc.water.spce
fasciculin = loadPdb "1fsc_amb.pdb"
solvateOct fasciculin SPCBOX 14.0
addIons fasciculin Cl- 4
saveAmberParm fasciculin solvated_1fsc.top solvated_1fsc.crd
quit
The output from this exercise can be found
here, and the leap.log
file is
here. The resulting
topology and
coordinates can now be fed into the
initial minimization for a typical MD simulation. Here is the system we
made, showing the octahedral shape of the box (this is looking head-on at one
of the square faces, even though that may not be obvious from the outline of
the water):
There is a tutorial about using the AMBER force field in CHARMM. In this way users can take advantage of the functions provided by the CHARMM software package. Interested users can check this link.