Generate proper residule topology while using the coordinates from a crystal structure ...

To carry out MD simulations for a protein/ligand or dna/ligand complex, one needs to generate residue topology (prepi, prepc, mol2) for the inhibitor while using the coordinates of the crystal structue to generate the start conformation for MD simulations. The problem is that after HF/6-31G* optimization, the coordinates of the ligand are translated and the atom name information is lost. The following is the tips to solve the problem.

  1. Extract the ligand from the pdb file, add hydrogen atoms with a software package such as sybyl. One may need to modify the sybyl atom and bond types. Suppose it is saved as a mol2 file - crystal.mol2
  2. Run antechamber to produce a gcrt file (antechamber -fi mol2 -fo gcrt -i crystal.mol2 -o input.gau). If the coordinates of cyrstal.mol2 has many effective digitals (-XXX.XXX), one may generate a gzmat file or first translate the coordinate center of crystal.mol2 to (0,0,0) with the translate program, since too many digitals may cause no space between two floats in the Gaussian ESP output and the espgen program may not run properly under such circumstance. Suppose the name of the Gaussian output file is input.gout
  3. Run antechamber to produce an ac file having RESP charges (antechamber -fi gout -fo ac -i input.gout -o -c resp)
  4. Run antechamber to get a RESP charge file (antechamber -fi ac -i -c wc -cf input.crg)
  5. Run antechamber to read in RESP charges to crystal.mol2 (antechamber -fi mol2 -fo mol2 -i crystal.mol2 -o crystal_resp.mol2 -c rc -cf input.crg -j 4 -at gaff)
There are alternative ways to solve the problem, for example, one can first read the atom names in crystal.mol2 into (antechamber -fi ac -fo ac -i -o -ao name -a crystal.mol2 -fa mol2) and then produce a prepi from the (prepgen -i -o input.prepi -f int), then load input.prepi and the whol complex in pdb to leap to produce topology file. It is pointed out that only the atom names, not the order of atom sequence matter when both the residue topology and initial coordinates (in pdb) are read into leap.

Are you sure you don't have these problems ....

  • Do atom names of the molecule unique? You should make atom names unique if you want to generate a prep (prepi and prepc) file.

  • Does your molecule have open valence? The programs, especially atomtype and bondtype may not work properly for open-valence systems.

  • Are you sure the atomic connectivity has no problem? You may first generate an ac file and check if some bonds are missing or unwanted. If it is the case, you may manually revised them.

  • Is your system made up of several isolated fragments? You may not generate prep file for such kind of systems. Please generate residue topology file for each of them.

  • Is your system very big (for example, a protein)? You may not generate some file format (esp. prep) for big system.

  • Have you installed mopac program? To generate am1-bcc charges, you need install mopac508mn or mopac6 or mopac7 or mopac8.

  • If your molecules have some conformational problem (too short and too long distances), try to avoid using formats that atom connectivities are not complete. For example, in parmchk, ac and mol2 formats are better choices than prepi and prepc.

  • Is the input file in pdb format? pdb format sometimes is tricky, please make sure that there is no space in the residue name field and the residue name field should has less than five characters. Please also make sure that the "." in the coordinate fields be in the same places as the Protein Data Bank pdb files do.

Tips on running bondtype :

  • If you think the bond types in mol2 files are correct (according to sybyl mol2 definition), you may choose "-j part". With this option, all aromatic bonds are set to "10" and aromatic single and double bonds are not discriminable. If is OK if one just wants to assign am1-bcc charges since all the parameters for aromatic single (7) and double (8) bond types are identical.

  • If you are not sure the bond types in mol2 files are correct or the input file is an ac file, it is recommended to use the defult option "-j full"

  • Bondtype works for most of organic molecules that are made of C, N, O, S, P, H, F, Cl, Br, I. It may not work well for high-charged molecules and molecules with too many unusual valence states. However, the improved version can handle many difficult molecules and it even works for a molecule like this (the labeled atoms have non-zero formal charges and the net charge is -3.0).

  • If program error happens, you may
    1. double check the structure (the connectivity) and/or
    2. adjust atom valence penalty parameters in APS.DAT, and/or
    3. increase MAXVASTATE and PSCUTOFF (Be caution, apply a large value will make the calculation very slow!) in define.h and recompile bondtype.C

Tips on running espgen :

    Sometimes the coordinates are directly extracted from a pdb file of a biological system and they have many digitals (such as XXX.XXX).The electrostatic potentials produced by Gaussian may not have space to separated two numbers. If this happens, espgen fails to run. To solve this problem, one may centralize the molecule by using the "translate" command and save the translational vectors. After the resp charges being generated, one may restore the original coordinate system with the "translate" command.

If you still have some trouble ....

  • You may manualy revise some troublesome intermediate files and run separated programs one by one. For example, instead of run a single antechamber command ( antechamber -fi pdb -fo prepi -i input.pdb -o output.prepi -c bcc -at gaff -j 4 ) to convert an pdb to prepi file with am1-bcc charge, gaff atom type,

    You may run several separated commands to achieve your purpose:

    antechamber -fi pdb -fo ac -i input.pdb -o -c mul
    am1bcc -i -f ac -o -j 4
    atomtype -i -o -p gff
    prepgen -i -f prepi -output.prepi

  • If you cannot solve the problem yourself, please report program failure to Junmei Wang at