1.4 Building Protein Systems in Explicit Solvent
By Abigail Held¹ and Maria Nagan²
¹From Bill Miller III's lab, Truman State University; ²Stony Brook University
Table of Contents
Introduction
LO1: Use VMD Visualization Software to Examine a Protein Structure
LO2: Evaluate and Analyze the Protein Structure to be Imported into Amber
#1) Non-Standard Residues
#2) Metals
#3) Experimental methods noted in the paper associated with the PDB
#4) Solvent molecules or crystallization buffer
#5) Missing electron density (amino acids)
#6) Disulfide Bonds
#7) Protonation States
LO3: Use LEaP to Build a Protein System in Explicit Solvent
References
Learning Outcomes
Introduction
This tutorial covers how to build a protein system using AmberTools v20. It is aimed at beginners; however, it may also be useful to more experienced users who are building a protein system for the first time. This tutorial requires AmberTools v20 and VMD to be properly installed on a working Linux/Unix computer.
Learning Outcome 1: Use VMD Visualization Software to Examine the Protein Structure
AmberTools is used through the Unix command line, which is accessed via a terminal window. Open a terminal on your computer now. For help learning the Unix command line, see:
1. Make a directory for your tutorial.
[Username@computer ~]$ mkdir tutorial
[Username@computer ~]$ cd tutorial/
To see the path to your working directory, use pwd (print working directory):
[Username@computer tutorial]$ pwd
/home/Username/tutorial
2. Download a PDB file from the Protein Data Bank.
The Protein Data Bank (PDB), found at http://rcsb.org/pdb, is a central repository of experimentally determined structures. "PDB" is also the name of the file format that contains the 3D coordinates of a biomolecule determined by an experimental method such as X-ray crystallography, cryo-EM, or NMR spectroscopy. Each PDB entry has a unique code, called a PDBID, that allows it to be found and referenced quickly. When selecting a PDB for your protein of interest, do not simply take the first one you see. Consider the resolution, the sequence, and the presence (or absence) of inhibitors, cofactors, or substrates. Many deposited structures are of mutant proteins, or of the protein in complex with small molecules. Consider what you are trying to model and choose a PDB accordingly.
In this tutorial, we will use the X-ray crystal structure of the human RAMP1 extracellular domain (PDBID: 2YX8).¹ Go to the Protein Data Bank and download the PDB with the PDBID 2YX8.
- Enter 2YX8 into the search bar. You will be brought to a page with information such as the reference, a picture of the structure, the sequence, and other information.
- Download the PDB file by clicking the blue "Download Files" icon in the upper right-hand corner and choosing "PDB Format".
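If you prefer to stay on the command line, the file can usually be fetched directly instead. The sketch below is a minimal example, assuming `curl` is installed and that RCSB serves PDB-format files at `https://files.rcsb.org/download/<PDBID>.pdb`; `pdb_url` and `fetch_pdb` are hypothetical helper names, not AmberTools commands.

```shell
# Hypothetical helpers (not part of AmberTools).

# Print the assumed RCSB download URL for a PDBID (the ID is upper-cased).
pdb_url() {
    id=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf 'https://files.rcsb.org/download/%s.pdb\n' "$id"
}

# Download the PDB-format file into the current directory as <pdbid>.pdb
# (lowercase). curl flags: -f fail on HTTP errors, -sS quiet but show
# errors, -L follow redirects.
fetch_pdb() {
    out=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]').pdb
    curl -fsSL -o "$out" "$(pdb_url "$1")" && printf 'saved %s\n' "$out"
}
```

For example, running `fetch_pdb 2YX8` inside ~/tutorial should save 2yx8.pdb directly in your tutorial directory, in which case step 3 can be skipped.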
3. Move 2yx8.pdb into your tutorial directory.
Your web browser probably has a default location where it places downloaded files, such as "Downloads".
[Username@computer ~]$ mv ~/Downloads/2yx8.pdb ~/tutorial/
4. Open VMD and load 2yx8.pdb into VMD.
The first thing you should always do when beginning to prepare a PDB is visualize it with visualization software. Here we use VMD; if you are unfamiliar with it, see the tutorials under VMD Visualization Software. You can also use UCSF ChimeraX, the tutorials for which can be found under Chimera-X.
- Open VMD.
- In VMD Main, choose File > New Molecule, browse to your 2yx8.pdb file, select the file type PDB, and click Load.
- You can also open your PDB directly from the command line on Windows and Linux by typing:
vmd 2yx8.pdb
Three windows will open. The first is VMD Main, which is essentially your control panel. The second is the VMD terminal. The third is the VMD display, which should now look like this.
A few VMD functions
First, in VMD Main, open the "Display" menu, where you can change the display settings. For this tutorial, set the projection to orthographic, turn off depth cueing, and turn off the axes.
Learning Outcome 2: Evaluate and Analyze the Protein Structure to be Imported into Amber
You may have noticed that the PDB does not contain any hydrogens. This is because X-ray crystallography generally cannot resolve hydrogen atoms, which scatter X-rays only weakly, so they are not included in X-ray structures.² Do not worry about this, as hydrogens will be added later.
PDBs are never ready for use in MD simulations as downloaded! They require several modifications first. Below is a checklist of what to look for in your protein PDB before using it for MD simulation. The procedures for the items marked with * are covered in this tutorial. The unmarked items are beyond the scope of a beginner tutorial, but they should still be considered.
1. Non-Standard Residues
2. Metals
3. Experimental methods noted in the paper associated with the PDB*
4. Solvent molecules or crystallization buffer*
5. Missing electron density (amino acids)
6. Disulfide Bonds*
7. Protonation States*
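Before editing anything, it can help to see which of these items apply to your file. The sketch below is a hypothetical helper (`pdb_checklist` is not part of AmberTools) that assumes a standard fixed-column PDB file. It prints the MODRES, SSBOND and REMARK 465 (missing residues) records and counts HETATM lines per residue name, which is where non-standard residues, metal ions and crystallographic waters usually appear. It only reads the file.

```shell
# Hypothetical helper: summarize PDB records related to the checklist.
# Read-only; nothing in the input file is modified.
pdb_checklist() {
    pdb=$1
    echo "== MODRES (modified residues, item 3) =="
    grep '^MODRES' "$pdb"
    echo "== SSBOND (disulfide bonds, item 6) =="
    grep '^SSBOND' "$pdb"
    echo "== REMARK 465 (missing residues, item 5) =="
    grep '^REMARK 465' "$pdb"
    echo "== HETATM lines per residue name (items 1, 2 and 4) =="
    # The residue name occupies columns 18-20 of a coordinate record.
    awk '/^HETATM/ { n[substr($0, 18, 3)]++ }
         END { for (r in n) print r, n[r] }' "$pdb" | sort
}
```

Running `pdb_checklist 2yx8.pdb` lists each section in turn. An empty section only means that record type is absent from the file; always confirm against the paper and the structure in VMD.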
#1) Non-standard Residues
Non-standard residues are any residues in your PDB that are not standard amino acids. These include cofactors (NADH, heme, etc.), non-standard amino acids (hydroxyproline, etc.), and bound inhibitors or substrates. If your structure contains any, consider carefully how you want to model them. For small organic molecules that do not contain metals or metalloids, this could be as simple as using antechamber (see Tutorial 2.1 Simulating a pharmaceutical compound using antechamber and the Generalized Amber Force Field). More complicated non-standard residues may require more advanced methods.
#2) Metals
Dealing with metals can be very complex, so consider carefully how to model any metal ions in your structure. Parameters for monovalent and divalent ions are listed in Section 3.1.1 of the Amber 2020 Manual, as well as in the files under $AMBERHOME/dat/leap/lib and $AMBERHOME/dat/leap/parm.
#3) Experimental methods noted in the paper associated with the PDB*
You should always read the paper associated with your PDB. Read the 2YX8 paper by Yokoyama and coworkers and note any important structural features (e.g., disulfide bonds). Also note the experimental procedure(s) used to solve the structure. Here, the structure was solved by multi-wavelength anomalous dispersion (MAD), which in this case involved replacing the methionine residues with selenomethionine. We will need to edit the PDB file to convert these back to methionine.
Remember that you are trying to simulate reality, which is not necessarily reflected in the PDB!
a. Make a copy of 2yx8.pdb with cp, and make your changes on the copy:
cp 2yx8.pdb 2yx8_fixedMET.pdb
Each amino acid residue in the PDB file has a residue name (resname), which is its three-letter code. Modified residues such as selenomethionine have their own resnames; selenomethionine is MSE.
b. Open your PDB file using vi or another text editor.
If you are unfamiliar with the use of vi, see this tutorial.
c. Look for the MODRES records.
In the PDB header, just after the sequence (SEQRES) records, 2yx8.pdb contains MODRES records listing the selenomethionine (MSE) residues that need to be changed to methionine (residues 48 and 76). Search the coordinates for the first residue with the MSE resname listed in the MODRES records. It should look like this.
d. Clean up the file and replace MSE with MET
- The HETATM tag in the first column means that these residues will not be bonded to the neighboring amino acids by default. Change HETATM to "ATOM" on the MSE lines so that the chain will be continuous. Be sure to keep two spaces after ATOM so that the remaining columns stay lined up.
- The second column is the atom number. Do not edit this!
- The third column is the atom name. For example, CA is the alpha carbon. Because methionine contains sulfur rather than selenium, change the atom name SE to SD (the name of the sulfur atom in methionine). Change the element symbol SE in the last column to S as well.
- The fourth column is the resname that was mentioned above. Change all of the MSE entries to MET.
After making these changes, you should have this.
- Make the same changes to the other MSE residue listed in the MODRES records, and save your changes.
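The same edits can also be scripted, which avoids miscounting columns by hand. The sketch below is a hypothetical awk-based helper (`mse_to_met` is not an Amber tool) that assumes standard fixed-column PDB formatting. It rewrites every ATOM/HETATM line whose resname is MSE: the record becomes ATOM, the resname becomes MET, and the SE atom becomes SD with element S. Other record types (MODRES, ANISOU, etc.) are left untouched.

```shell
# Hypothetical helper: convert MSE records to MET while keeping the fixed
# PDB column layout. Assumed columns: 1-6 record name, 13-16 atom name,
# 17 altLoc, 18-20 residue name, 77-78 element symbol.
mse_to_met() {
    awk '
    /^(ATOM  |HETATM)/ && substr($0, 18, 3) == "MSE" {
        line = $0
        # Pad short lines so that the element columns (77-78) exist.
        while (length(line) < 78) line = line " "
        name = substr(line, 13, 4)
        elem = substr(line, 77, 2)
        # Selenium ("SE  ", two-letter element starting in column 13)
        # becomes the methionine sulfur " SD " with element " S".
        if (name == "SE  ") { name = " SD "; elem = " S" }
        # Rebuild: ATOM + serial + atom name + altLoc + MET + rest of line.
        print "ATOM  " substr(line, 7, 6) name substr(line, 17, 1) "MET" \
              substr(line, 21, 56) elem substr(line, 79)
        next
    }
    { print }
    ' "$1"
}
```

As an alternative to steps a through d, `mse_to_met 2yx8.pdb > 2yx8_fixedMET.pdb` writes the converted file; comparing the two with `diff 2yx8.pdb 2yx8_fixedMET.pdb` should show that only the MSE lines changed.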
#4) Solvent molecules or crystallization buffer*
Solvent molecules or components of the crystallization buffer are often crystallized along with the protein and included in the PDB. In 2yx8, for example, there are several crystallographic waters. Because we are preparing an explicit solvent system, we can keep these. Other solvents or buffer components, such as phosphates, are sometimes present as well. If they do not affect the function of the protein, they can be removed.
#5) Missing Electron Density (amino acids)
Sometimes the researchers who solved a structure could not get a clear enough picture of certain amino acids. In that case those residues have no coordinates in the PDB (missing electron density), and you have to model them in. To check whether your structure has this problem:
- You can use VMD to show the protein backbone (using the NewCartoon representation; see Learning Outcome 1) and look for any gaps.
- Alternatively, you can look in the PDB file itself for missing residues. If you are missing residues 15-18, for example, you will see information on resid 14 and then resid 19.
Check 2yx8_fixedMET.pdb for missing electron density using either (or both) of these methods.
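The residue-numbering check in the second bullet can be automated. The sketch below is a hypothetical helper, `find_gaps`, that assumes a single-model PDB with standard columns. It walks the CA atoms of the ATOM records in each chain and reports any jump in residue number. Because it only reads ATOM lines, run it on 2yx8_fixedMET.pdb, where the former MSE residues are already ATOM records. It ignores insertion codes and cannot detect a break when the numbering itself is continuous. When present, the REMARK 465 records in the header list missing residues explicitly and are worth checking too.

```shell
# Hypothetical helper: report gaps in residue numbering along each chain,
# using alpha-carbon (" CA ") atoms of ATOM records. Assumed columns:
# 13-16 atom name, 22 chain ID, 23-26 residue number.
find_gaps() {
    awk '
    /^ATOM  / && substr($0, 13, 4) == " CA " {
        chain = substr($0, 22, 1)
        resid = substr($0, 23, 4) + 0
        # Alternate CA positions repeat the same resid and are not gaps.
        if ((chain in last) && resid > last[chain] + 1)
            printf "chain %s: residues %d-%d missing\n", chain, last[chain] + 1, resid - 1
        last[chain] = resid
    }
    ' "$1"
}
```

`find_gaps 2yx8_fixedMET.pdb` prints one line per gap and nothing if the numbering is continuous.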
#6) Check for disulfide bonds*
We read the paper and viewed the protein in VMD (Learning Outcome 1), so we know that this protein contains disulfide bonds. Cysteines involved in disulfide bridges have a special Amber resname (CYX); however, the PDB you downloaded from the Protein Data Bank still names them CYS.
a. Find the cysteines in the disulfide bond.
You can find the cysteines involved in disulfide bridges listed in the SSBOND records near the top of the PDB.
Each of these cysteine residues must have its resname changed from CYS to CYX in your PDB. This is just like editing MSE to MET above, but simpler, because you only need to change the resname.
b. Make a new copy of your PDB with a _fixedCYS tag.
cp 2yx8_fixedMET.pdb 2yx8_fixedMET_fixedCYS.pdb
c. Use a text editor (like vi) to change residues named "CYS" involved in a disulfide bond to "CYX".
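Steps b and c can also be done with a short script. The sketch below is a hypothetical helper, `cys_to_cyx`, assuming the SSBOND column layout of the PDB v3.3 format. It reads the SSBOND records first, then renames only the matching CYS residues to CYX, leaving every other line as it was. Insertion codes are ignored, so check the result by eye.

```shell
# Hypothetical helper: rename cysteines listed in SSBOND records to CYX.
# The file is read twice: pass 1 collects "chain:resid" keys from SSBOND
# (columns 16 and 18-21 for the first Cys, 30 and 32-35 for the second);
# pass 2 rewrites columns 18-20 of matching ATOM/HETATM lines.
cys_to_cyx() {
    awk '
    NR == FNR {
        if (/^SSBOND/) {
            cyx[substr($0, 16, 1) ":" (substr($0, 18, 4) + 0)] = 1
            cyx[substr($0, 30, 1) ":" (substr($0, 32, 4) + 0)] = 1
        }
        next
    }
    /^(ATOM  |HETATM)/ && substr($0, 18, 3) == "CYS" {
        key = substr($0, 22, 1) ":" (substr($0, 23, 4) + 0)
        if (key in cyx) $0 = substr($0, 1, 17) "CYX" substr($0, 21)
    }
    { print }
    ' "$1" "$1"
}
```

For example, `cys_to_cyx 2yx8_fixedMET.pdb > 2yx8_fixedMET_fixedCYS.pdb`, followed by `grep CYX 2yx8_fixedMET_fixedCYS.pdb`, lets you confirm that exactly the SSBOND cysteines were renamed.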
We are not entirely done with disulfide bonds yet; however, this is all we can do until we use LEaP to build the system (Learning Outcome 3).
#7) Check protonation states*
Remember that our end goal is to simulate reality, which may not be reflected in the PDB. Protonation states are a prominent example. Many proteins contain amino acids with non-standard protonation states; for example, aspartic proteases have a protonated Asp residue in their active site. You should know your protein and how it functions before you try to simulate it, including whether it requires any non-standard protonation states. If it does, you will have to edit your PDB.
Protein PDBs solved by X-ray crystallography usually do not contain hydrogens, because the method generally cannot resolve them. When LEaP loads such a PDB, it builds the missing hydrogens from its residue templates, so each residue is protonated according to its resname, which for standard resnames means the standard protonation state. Therefore, an amino acid with a non-standard protonation state will be protonated incorrectly unless you make the necessary resname change(s). For example, in a PDB of an aspartic protease, the protonated Asp will have the resname ASP, just like all the other aspartates that are carboxylates, so LEaP will protonate it as a regular aspartate. To prevent this, change that residue's resname to ASH (the resname for protonated Asp). With the correct resnames, LEaP will protonate your amino acids correctly. Table 1 shows the resnames for some common protonation states.
Table 1. AMBER Resnames of Common Non-standard Protonation States

| Non-standard Protonation Form | AMBER Resname |
| --- | --- |
| Protonated/uncharged Asp | ASH |
| Protonated/uncharged Glu | GLH |
| Deprotonated/uncharged Lys | LYN |
| His protonated at epsilon position | HIE |
| His protonated at delta position | HID |
| Charged His (protonated at both positions) | HIP |
| Deprotonated Cys or Cys bound to a metal | CYM |
| Cys involved in disulfide bridge | CYX |
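If you decide that a residue needs a non-standard protonation state, the rename can be scripted in the same way as the edits above. The sketch below is a hypothetical helper, `set_resname`, assuming standard PDB columns. It changes the resname of every atom of one residue, identified by chain ID and residue number, to a three-letter Amber name such as those in Table 1, and leaves all other lines unchanged.

```shell
# Hypothetical helper: set the resname of one residue so that LEaP builds
# the intended protonation state (e.g. ASP -> ASH).
# Usage: set_resname FILE CHAIN RESID NEWNAME
# Assumed columns: 18-20 residue name, 22 chain ID, 23-26 residue number.
set_resname() {
    awk -v ch="$2" -v rid="$3" -v new="$4" '
    /^(ATOM  |HETATM)/ && substr($0, 22, 1) == ch && substr($0, 23, 4) + 0 == rid + 0 {
        # Pad or trim the new name to exactly 3 characters to keep columns.
        $0 = substr($0, 1, 17) substr(sprintf("%-3s", new), 1, 3) substr($0, 21)
    }
    { print }
    ' "$1"
}
```

For example, `set_resname input.pdb A 25 ASH > output.pdb` would rename residue 25 of chain A to ASH so that LEaP builds it as a protonated Asp. The chain and residue number are purely illustrative; nothing in this section indicates that 2YX8 needs such a change.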