suppose [-mat] [-rot] [-mean] [-sd] [-cmp <cmppdb>] [-fit <fitexpr>] [-calc <calcexpr>] [-pr <atomlist>] <pdb-files>
This is a program written in the NAB molecular manipulation language (http://www.scripps.edu/case/casegroup-sh-2.2.html#sh-2.2) to supersede Bob Diamond's multiple superposition program, "superpose". It overcomes a number of superpose's limitations and undocumented bugs while retaining its functionality, adds features, and improves some aspects of it, particularly the way atom selections are made to define the regions of the molecule you wish to superimpose and get statistics for.

suppose takes structures as pdb files and fits each structure onto every other, computing the fitting error for each pair and, optionally, for each residue of each pair. It then uses this complete set of pairwise rms deviations to compute the overall rms pairwise deviation between structures.

suppose also computes the mean structure by superimposing all structures onto the first one and then averaging each coordinate of each atom over all the molecules. It then determines the deviation of each structure from this mean coordinate set and reports the overall rms deviation from the mean. The deviation of each structure from a user-specified coordinate set, and the overall rms deviation from it, can also be obtained with the -cmp option (e.g. to compare an NMR family to an Xray structure). The program can then write the set of fitted, rotated coordinates and/or the mean structure back out as new pdb files.

All input files must have the same atoms in the same order.
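The pairwise fitting statistics described above can be sketched in Python. This page does not say which fitting algorithm suppose uses, so the sketch assumes a standard least-squares rigid-body fit and obtains its RMSD with Horn's quaternion method, using a small Jacobi eigenvalue routine so that only the standard library is needed. It also assumes that the overall rms pairwise deviation is the root mean square of the per-pair RMSDs (every pair has the same number of atoms), and it fits and measures on the same atoms, as suppose does by default. All function names are hypothetical.

```python
import math
from itertools import combinations


def _largest_eigenvalue(a, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi rotations)."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        if off <= 1e-24 * total:
            break  # effectively diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the update zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a @ J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T @ a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def _centered(xyz):
    """Translate a coordinate set so that its centroid is at the origin."""
    n = len(xyz)
    cx, cy, cz = (sum(p[k] for p in xyz) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in xyz]


def fitted_rmsd(mobile, target):
    """RMSD after an optimal rotation + translation (Horn's quaternion method)."""
    if len(mobile) != len(target) or not mobile:
        raise ValueError("structures must have the same, nonzero number of atoms")
    x, y = _centered(mobile), _centered(target)
    # Cross-covariance S[i][j] = sum over atoms of x_i * y_j
    S = [[sum(u[i] * v[j] for u, v in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(c * c for p in x for c in p) + sum(c * c for p in y for c in p)
    e = g - 2.0 * _largest_eigenvalue(N)  # minimal sum of squared deviations
    return math.sqrt(max(e, 0.0) / len(x))


def overall_pairwise_rms(structures):
    """RMS of the fitted RMSDs over every pair of structures (assumed definition)."""
    pairs = list(combinations(structures, 2))
    if not pairs:
        raise ValueError("need at least two structures")
    return math.sqrt(sum(fitted_rmsd(a, b) ** 2 for a, b in pairs) / len(pairs))
```

Each structure is a list of (x, y, z) tuples with the atoms in the same order in every list, mirroring suppose's own requirement. Because the quaternion formulation only allows proper rotations, a mirror image of a chiral structure does not fit with zero RMSD.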
-mat will write to STDOUT a matrix of all pairwise deviations, as well as a table containing the deviation of each structure from the mean and, optionally, from an arbitrary input structure (see the -cmp option for details).

-rot will write out fitted, rotated coordinates to files with ".rot.pdb" in place of the last extension of the input file names. For example, if you specified *.pdb as the input files, the rotated files would be called *.rot.pdb.

-mean will cause the mean structure (after superposition) to be written to the file "mean.pdb".

-sd ... RMS values. I'm not sure error bars on an RMSD are very meaningful, but people have asked for it...

-cmp <cmppdb> indicates that each of the input structures should be compared to <cmppdb>, similar to the way each one is compared to the mean structure. The overall RMS deviation from <cmppdb> and, optionally (via -mat), the RMSD of each structure from <cmppdb> are reported. This could be useful for comparing an NMR family to an Xray or some other Brookhaven structure, for example. The -rot option will cause <cmppdb> to be fitted and written out along with the input files. Note that <cmppdb> must be the same molecule as the input structures, with the atoms in the same order!

-fit <"fitexpr"> indicates that the atoms to be used in the various fits will be specified by the NAB atom expression <fitexpr> (see below). The default is all atoms. It is recommended that <fitexpr> be enclosed in quotes.

-calc <"calcexpr"> indicates that the atoms to be used in calculating rms differences after the fit will be specified by the NAB atom expression <calcexpr> (see below). The default is all atoms. It is recommended that <calcexpr> be enclosed in quotes.

-pr <"atomlist"> causes suppose to compute per-residue RMSDs for the fit specified by the -fit option (or for the default fit on all atoms of the molecule if -fit is omitted). <atomlist> should be just the "atom part" of an NAB atom expression (see below); it is used as the atom selection on which the RMSD of each residue is calculated. Normally, this will be the quoted wildcard character "*", indicating that all atoms of each residue will be used to compute the per-residue RMSDs.
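The mean-structure statistics behind -mean and the -mat table can be sketched as follows. This sketch assumes the structures have already been superimposed onto the first one (the fitting itself is not shown), that deviations from the mean are measured without any further refitting, and that the overall value is the RMS of the per-structure RMSDs; these are assumptions, not documented details of suppose, and the function names are hypothetical.

```python
import math


def mean_structure(superimposed):
    """Average each coordinate of each atom over all (already superimposed) structures."""
    m = len(superimposed)
    natoms = len(superimposed[0])
    if any(len(s) != natoms for s in superimposed):
        raise ValueError("all structures must have the same number of atoms")
    return [
        tuple(sum(s[i][k] for s in superimposed) / m for k in range(3))
        for i in range(natoms)
    ]


def rmsd(a, b):
    """Plain RMSD between two coordinate sets, with no additional fitting."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))


def deviations_from_reference(superimposed, reference):
    """Per-structure RMSD from a reference, plus the RMS of those values."""
    per = [rmsd(s, reference) for s in superimposed]
    overall = math.sqrt(sum(d * d for d in per) / len(per))
    return per, overall
```

Passing the fitted <cmppdb> coordinates as the reference instead of the mean gives the analogous -cmp numbers under the same assumptions.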
An atom expression is a pattern that matches one or more atom names in a molecule or residue. NAB atom expressions are based on the idea that atoms can be easily specified by three things: the strand they belong to, the residue they belong to, and their name. Therefore, an NAB atom expression consists of three parts: first, a strand part, second, a residue part, and third, an atom part. The parts must occur in this order and be separated by colons (:). For example, the expression 1:10:CA specifies the alpha carbon of residue 10 on strand 1.

Not all three parts of an atom expression are required. If a part is missing, it is assumed that you mean to select everything for that part. Extending the above example, the expression 1::CA specifies the alpha carbons in all residues on the first strand. Notice that there is nothing in the residue part, so all residues were selected.

Finally, within each part, the * and ? wildcards, the , and - delimiters, and the [ ] character-class construct can be used to increase the flexibility of atom expressions. For example, 2:1-10:CA selects the alpha carbons of the first 10 residues of the second strand, while :1-10,ALA:C* selects all carbon atoms in all ALA residues in all strands, as well as all carbon atoms in the first 10 residues of all strands.

Some more examples of atom expressions:

::C,CA,N
    Select all atoms named C, CA or N in all residues in all strands; typically the peptide backbone.

1:1-10,13,URA:C1'
    Select atoms named C1' (the glycosyl carbons) in residues 1 to 10 and 13, and in any residues named URA, in the first strand.

::C*[^']
    Select all non-sugar carbons in a nucleic acid. The [^'] is an example of a negated character class: it matches any character in the last position except '.

::P,O?P,C[3-5]?,O[35]?
    The nucleic acid backbone. The P selects phosphorus atoms. The O?P matches phosphate oxygens with various second letters: O1P, O2P or OAP, OBP. The C[3-5]? matches the backbone carbons C3', C4', C5', or C3*, C4*, C5*, depending on the character you use to denote "prime" in your structures. The O[35]? matches the backbone oxygens O3', O5' or O3*, O5*.
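To make the matching rules above concrete, here is a simplified matcher for strand:residue:atom expressions. It is a hedged sketch, not NAB's own parser: it handles the comma and numeric-range delimiters itself and delegates wildcards and character classes to Python's fnmatch, translating NAB's [^...] negation into fnmatch's [!...]. It requires all three colon-separated parts to be present, treats purely numeric residue items as residue numbers and anything else as a residue-name pattern; these rules, and the function names, are assumptions for illustration.

```python
import fnmatch
import re


def _glob(item):
    """NAB writes a negated character class as [^...]; fnmatch expects [!...]."""
    return item.replace("[^", "[!")


def _part_matches(part, number, name):
    """Does one colon-separated part select an item with this number and/or name?"""
    if part == "":
        return True  # an empty part selects everything
    for item in part.split(","):
        rng = re.fullmatch(r"(\d+)-(\d+)", item)
        if rng:  # numeric range such as 1-10
            if number is not None and int(rng.group(1)) <= number <= int(rng.group(2)):
                return True
        elif item.isdigit():  # single number such as 13
            if number is not None and number == int(item):
                return True
        elif name is not None and fnmatch.fnmatchcase(name, _glob(item)):
            return True  # name pattern such as ALA, C* or O[35]?
    return False


def atom_matches(expr, strand, resnum, resname, atomname):
    """Hypothetical matcher for a strand:residue:atom expression."""
    parts = expr.split(":")
    if len(parts) != 3:
        raise ValueError("this sketch expects exactly three colon-separated parts")
    strand_part, residue_part, atom_part = parts
    return (
        _part_matches(strand_part, strand, None)
        and _part_matches(residue_part, resnum, resname)
        and _part_matches(atom_part, None, atomname)
    )
```

Only items made entirely of digits and a single hyphen are treated as ranges, so a character class such as C[3-5]? is still handed to fnmatch as a pattern.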
This program reads pdb files but doesn't necessarily keep the names and numbers that the residues and/or strands had in the pdb file. Atom names in your pdb file will be preserved, but by convention, NAB, and therefore suppose, starts residue numbering over at the beginning of each strand. Also, strands are labelled with integers beginning at 1, not alphabetically beginning at A. Be careful about this when writing atom expressions: residues on the second and later strands in particular may be renumbered if they weren't already numbered this way in the pdb file.

Example 1:
    suppose *.pdb
would take all files with the .pdb extension in the current directory, calculate the rms deviation for every possible pair, calculate the mean structure, and report the overall rms pairwise deviation and the rms deviation from the mean structure.

Example 2:
    suppose -mat -rot *.pdb
Same as Example 1, but in addition, the complete matrix of pairwise deviations, as well as each structure's deviation from the mean structure, would be printed to STDOUT. The superimposed structures would be written out as *.rot.pdb.

Example 3:
    suppose -mat -rot -mean -fit ":1-10:C,CA,N" *.pdb
Similar to Example 2, but in addition, the mean structure would be written to the file "mean.pdb", and the superposition would be done considering only the backbone atoms of the first 10 residues of all strands. The deviations would be calculated using all atoms.

Example 4:
    suppose -mat -rot -mean -fit "::C,CA,N" -calc "::C,CA,N" *.pdb
Similar to Example 3, but both the fitting and the rmsd calculations would be performed on all of the backbone atoms in the molecule.

Example 5:
    suppose -fit "::C,CA,N" -pr "*" *.pdb
This example shows how to use the per-residue RMSD feature. The fit expression indicates that the molecules will be fitted on the backbone atoms; per-residue RMSDs will then be computed using all atoms in each residue.
Example 6:
    suppose -fit "::C,CA,N" -pr "C*,N*,O*,S*" *.pdb
Similar to Example 5, except that the per-residue RMSDs will be computed only on atoms whose names match C*, N*, O* or S*. For a protein, these are typically all of the heavy atoms in each residue.
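The per-residue calculation in Examples 5 and 6 can be sketched as below, assuming the two structures have already been fitted (for instance on the backbone, as in the examples) and are given as lists of (residue number, atom name, (x, y, z)) in the same atom order. The atom list is matched with Python's fnmatch after translating NAB's [^...] into [!...]; this, the function name, and the choice to omit residues with no selected atoms are assumptions of the sketch.

```python
import fnmatch
import math
from collections import defaultdict


def per_residue_rmsd(atoms_a, atoms_b, atomlist):
    """Per-residue RMSD between two already-fitted structures.

    atoms_a, atoms_b: lists of (residue_number, atom_name, (x, y, z)), same order.
    atomlist: the atom part of an atom expression, e.g. "*" or "C*,N*,O*,S*".
    """
    patterns = [p.replace("[^", "[!") for p in atomlist.split(",")]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (res, name, xa), (_, _, xb) in zip(atoms_a, atoms_b):
        if any(fnmatch.fnmatchcase(name, p) for p in patterns):
            sums[res] += sum((u - v) ** 2 for u, v in zip(xa, xb))
            counts[res] += 1
    # Residues with no selected atoms are left out of the result
    return {res: math.sqrt(sums[res] / counts[res]) for res in sorted(counts)}
```

With "*" every atom of a residue contributes; with "C*,N*,O*,S*" hydrogens are skipped, which is why the two calls can give different values for the same residue.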
Statistics are printed to STDOUT; rotated coordinates (if requested) are placed in new files that have ".rot.pdb" in place of the input file extension (usually ".pdb").
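The output naming rule can be expressed in a couple of lines. This is a hypothetical helper, not part of suppose; it assumes that a name with no extension simply gets ".rot.pdb" appended.

```python
import os


def rotated_name(path):
    """Replace the last extension of an input file name with '.rot.pdb'."""
    root, _ext = os.path.splitext(path)
    return root + ".rot.pdb"
```

Only the last extension is replaced, so a name such as fam.01.pdb becomes fam.01.rot.pdb.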
As soon as the first difference is detected, the program stops and prints an error message. If either atom expression, fitexpr or calcexpr, matches no atoms in the structure, the program stops and prints an error message. If fewer than two input files are given, the program stops and prints an error message.
No check is made that the atoms are really in the same order in all structures. This is particularly troublesome for the file specified by the -cmp option, since the comparison pdb file has presumably been obtained from a different source (e.g. an Xray or Brookhaven structure).

Portions of an atom expression can be invalid; without warning, the program goes ahead and computes the fit and/or calculation using only the valid part of the atom expression. This can be misleading.

If atom expressions are not enclosed in quotes, they can be interpreted by the shell and replaced by filenames, etc. If you are unlucky, undesirable behavior such as segmentation faults and core dumps will result. Put your atom expressions in quotes.
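Because suppose does not verify atom order, a separate pre-flight check may be worthwhile, especially for the -cmp file. The following is a minimal Python sketch that is not part of suppose. It assumes standard fixed-column PDB records (atom name in columns 13-16, residue name in columns 18-20) and compares only ATOM and HETATM records; the function names are hypothetical.

```python
def atom_signature(lines):
    """(residue name, atom name) for each ATOM/HETATM record, in file order."""
    # Fixed PDB columns (1-based): atom name in 13-16, residue name in 18-20
    return [
        (line[17:20].strip(), line[12:16].strip())
        for line in lines
        if line.startswith(("ATOM", "HETATM"))
    ]


def first_difference(sig_a, sig_b):
    """0-based index of the first mismatching atom, or None if the lists agree."""
    for i, (a, b) in enumerate(zip(sig_a, sig_b)):
        if a != b:
            return i
    if len(sig_a) != len(sig_b):
        return min(len(sig_a), len(sig_b))  # one file has extra atoms
    return None


def check_files(paths):
    """Compare every file with the first one; return a list of problem messages."""
    if len(paths) < 2:
        return ["need at least two pdb files"]
    sigs = []
    for path in paths:
        with open(path) as fh:
            sigs.append(atom_signature(fh))
    problems = []
    for path, sig in zip(paths[1:], sigs[1:]):
        i = first_difference(sigs[0], sig)
        if i is not None:
            problems.append(f"{path}: atom {i + 1} differs from {paths[0]}")
    return problems
```

For example, check_files(["model1.pdb", "model2.pdb", "xray.pdb"]) would list any file whose residue and atom names diverge from the first one. Residue-name conventions often differ between sources (e.g. histidine variants), so a reported mismatch may need manual inspection rather than being a true ordering error.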