suppose - superposition of multiple structures

USAGE

       suppose  [-mat] [-rot] [-mean] [-sd] [-cmp <cmppdb>] [-fit
       <fitexpr>] [-calc <calcexpr>] [-pr <atomlist>] <pdb-files>



DESCRIPTION

       This is a program written in the NAB molecular manipulation
       language
       (http://www.scripps.edu/case/casegroup-sh-2.2.html#sh-2.2)
       to supersede Bob Diamond's multiple superposition program,
       "superpose".  It overcomes a number
       of  superpose's  limitations  and  undocumented bugs while
       retaining its  functionality as well  as  adding  features
       and  improving  some  aspects  of it, particularly the way
       atom selections are  made  for  defining  regions  of  the
       molecule  you  wish to superimpose and get statistics for.
       suppose takes structures as pdb files and fits each struc-
       ture  onto  every  other,  computing the fitting error for
       each pair and optionally, for  each  residue  of  each
       pair.   It  then  uses  this  complete set of pairwise rms
       deviations to compute the overall rms  pairwise  deviation
       between structures.
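
       The pairwise bookkeeping described above can be sketched in
       Python as follows.  This is only an illustration: it assumes
       the coordinates of each pair have already been superimposed
       (the fit itself is not shown), and it assumes the overall
       value is the root mean square over all pairs, which this
       page does not state explicitly.  All function names are
       hypothetical and are not part of suppose.

```python
import math
from itertools import combinations

def rmsd(a, b):
    """RMS deviation between two equal-length lists of (x, y, z) tuples."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))

def pairwise_rmsds(structures):
    """Map every index pair (i, j) with i < j to the RMSD of that pair.

    ASSUMPTION: each pair is already superimposed; no fitting is done here.
    """
    return {(i, j): rmsd(structures[i], structures[j])
            for i, j in combinations(range(len(structures)), 2)}

def overall_pairwise(pairs):
    """ASSUMPTION: the overall value is the RMS over all pairwise RMSDs."""
    return math.sqrt(sum(d * d for d in pairs.values()) / len(pairs))
```

       The dictionary returned by pairwise_rmsds() corresponds to
       the upper triangle of the matrix that -mat prints.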

       suppose  also computes the mean structure by superimposing
       all structures onto the first one and then averaging  each
       coordinate  of  each atom over all the molecules.  It then
       determines the deviation of each structure from this  mean
       coordinate  set and reports the overall rms deviation from
       the mean.   The deviation of each  structure  and  overall
       rms  deviation  from  a  user-specified coordinate set can
       also be obtained by using the -cmp option (e.g. to compare
       an NMR family to an Xray structure).  The program can then
       write the set of fitted, rotated  coordinates  and/or  the
       mean  structure  back  as  new pdb files.  All input files
       must have the same atoms in the same order.
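
       A minimal Python sketch of the averaging step follows.  It
       assumes the structures have already been superimposed onto
       the first one, and it assumes the overall deviation is the
       root mean square of the per-structure deviations; neither
       detail is taken from the program source.  The function
       names are hypothetical.

```python
import math

def mean_structure(structures):
    """Average each coordinate of each atom over all structures."""
    n = len(structures)
    n_atoms = len(structures[0])
    return [tuple(sum(s[k][axis] for s in structures) / n for axis in range(3))
            for k in range(n_atoms)]

def rmsd(a, b):
    """RMS deviation between two equal-length lists of (x, y, z) tuples."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))

def deviations_from_mean(structures):
    """Return (mean, per-structure RMSDs from the mean, overall value).

    ASSUMPTION: the overall value is the RMS of the per-structure RMSDs.
    """
    mean = mean_structure(structures)
    per_structure = [rmsd(mean, s) for s in structures]
    overall = math.sqrt(sum(d * d for d in per_structure) / len(per_structure))
    return mean, per_structure, overall
```

       Passing a user-supplied coordinate set to rmsd() in place of
       the mean gives the kind of comparison that -cmp reports.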


OPTIONS

       -mat will write to STDOUT a matrix of all pairwise  devia-
              tions  as  well as a table containing the deviation
              of each structure from  the  mean  and  optionally,
              from an arbitrary input structure (see the -cmp
              option for details).

       -rot will write out fitted, rotated coordinates  to  files
              with  ".rot.pdb"  in place of the last extension of
              the input file names.  For example, if  you  speci-
              fied  *.pdb  as  the input files, the rotated files
              would be called *.rot.pdb.

       -mean will cause the mean structure (after  superposition)
              to be written to the file "mean.pdb".

       -sd    will cause standard deviations to be reported for the
              RMS values.  I'm not sure error bars on an RMSD are
              very meaningful, but people have asked for it...

       -cmp <cmppdb>
              indicates that each of the input structures should
              be compared to <cmppdb> in the same way that each
              one is compared to the mean structure.  The overall
              RMS deviation from <cmppdb> and, optionally (via
              -mat), the RMSD of each structure from <cmppdb> are
              reported.  This could be
              useful for comparing an NMR family to  an  Xray  or
              some  other Brookhaven structure, for example.  The
              -rot option will cause <cmppdb> to  be  fitted  and
              written  out along with the input files.  Note that
              <cmppdb> must be the same  molecule  as  the  input
              structures with the atoms in the same order!

       -fit <"fitexpr">
              indicates  that the atoms to be used in the various
              fits will be specified by the NAB  atom  expression
              <fitexpr>. See below.  Default is all atoms.  It is
              recommended that <fitexpr> be contained in  quotes.

       -calc <"calcexpr">
              indicates that the atoms to be used in calculating
              rms differences after the fit will be specified by
              the NAB atom expression <calcexpr>.  See below.
              Default is all atoms.  It is recommended that
              <calcexpr> be contained in quotes.

       -pr <"atomlist">
              causes suppose to compute per-residue RMSDs for the
              fit specified by the -fit option  (or  the  default
              fitting on  all  atoms  of the molecule if the -fit
              option was omitted).  <atomlist> should be just the
              "atom  part" of an NAB atom expression (see below).
              It will be used as the atom  selection  upon  which
              the  RMSD of each residue will be calculated.  Nor-
              mally, this will be the quoted  wildcard  character
              "*", indicating that all atoms of each residue will
              be used to compute the per-residue RMSDs.
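
       The per-residue calculation requested by -pr can be sketched
       in Python as follows.  The sketch assumes two structures
       that are already fitted, a single strand, and atoms given as
       (residue number, atom name, (x, y, z)) records in the same
       order in both structures.  It handles only the * and ?
       wildcards and the comma delimiter in <atomlist>.  The record
       layout and the function name are hypothetical.

```python
import math
from collections import defaultdict
from fnmatch import fnmatchcase

def per_residue_rmsd(atoms_a, atoms_b, atomlist="*"):
    """RMSD per residue over the atoms whose names match atomlist.

    atoms_a and atoms_b hold (resnum, name, (x, y, z)) records for the
    same atoms in the same order (ASSUMPTION: already superimposed).
    """
    patterns = atomlist.split(",")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (res, name, xyz_a), (_, _, xyz_b) in zip(atoms_a, atoms_b):
        if any(fnmatchcase(name, pat) for pat in patterns):
            sums[res] += sum((p - q) ** 2 for p, q in zip(xyz_a, xyz_b))
            counts[res] += 1
    return {res: math.sqrt(sums[res] / counts[res]) for res in sums}
```

       With the default "*" every atom of a residue contributes; an
       atom list such as "C*,N*,O*,S*" leaves out the hydrogens.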



ATOM EXPRESSIONS

       An atom expression is a pattern that matches one  or  more
       atom names in a molecule or residue.  NAB atom expressions
       are based on the idea that atoms can be  easily  specified
       by  three  things:  the strand they belong to, the residue
       they belong to, and their name.  Therefore,  an  NAB  atom
       expression  consists of three parts: first, a strand part,
       second, a residue part and third, an atom part.  The parts
       must  occur  in order and be separated by colons (:).  For
       example, the expression 1:10:CA specifies the alpha carbon
       of residue 10 on strand 1.

       Not all three parts of an atom expression are required.
       If one part is missing, it is assumed  that  you  mean  to
       select  everything  for  that  part.   Extending the above
       example, the expression 1::CA specifies the alpha  carbons
       in all residues on the first strand.  Notice that there is
       nothing  for  the  residue  part,  so  all  residues  were
       selected.
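
       The three-part structure and the "missing part means
       everything" rule can be illustrated with a short Python
       sketch.  It is not the NAB parser: the function name is
       hypothetical, and treating an expression with fewer than two
       colons as having its trailing parts missing is an assumption
       made here for simplicity.

```python
def split_atom_expression(expr):
    """Split 'strand:residue:atom' into its three parts.

    An empty part is returned as '*', meaning "select everything".
    ASSUMPTION: if fewer than two colons are given, the trailing parts
    are treated as missing.
    """
    parts = expr.split(":")
    if len(parts) > 3:
        raise ValueError("too many ':' separators in %r" % expr)
    parts += [""] * (3 - len(parts))
    strand, residue, atom = (part or "*" for part in parts)
    return strand, residue, atom
```

       For the examples above, split_atom_expression("1:10:CA")
       gives ("1", "10", "CA") and split_atom_expression("1::CA")
       gives ("1", "*", "CA").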

       Finally, within each part, the * and ? wildcards and/or
       the , and - delimiters and the [ ] construct can be used to
       increase  the  flexibility of atom expressions.  For exam-
       ple, 2:1-10:CA selects the alpha carbons for the first  10
       residues  of the second strand, while :1-10,ALA:C* selects
       all carbon atoms in all ALA residues  in  all  strands  as
       well  as  all carbon atoms in the first 10 residues of all
       strands.

       Some more examples of atom expressions:

       ::C,CA,N Select all atoms with the names C, CA or N in all
       residues  in all strands - typically the peptide backbone.

       1:1-10,13,URA:C1' Select atoms  named  C1'  (the  glycosyl
       carbons) in residues 1 to 10 and 13 and any residues named
       URA in the first strand.

       ::C*[^'] Select all non-sugar carbons in a nucleic acid.
       The  [^']  is an example of a negated character class.  It
       matches any character in the last position except '.

       ::P,O?P,C[3-5]?,O[35]?  The nucleic acid backbone.  The  P
       selects phosphorus atoms.  The O?P matches phosphate oxygens
       that have various second letters: O1P,  O2P  or  OAP,
       OBP.   The  C[3-5]? matches the backbone carbons C3', C4',
       C5', or C3*, C4*, C5* depending on the character  you  use
       to  denote "prime" in your structures.  The O[35]? matches
       the backbone oxygens O3', O5' or O3*, O5*.
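
       To make the wildcard rules concrete, here is a small Python
       sketch that matches an atom name against the atom part of an
       expression.  It translates *, ? and [ ] (including the
       negated form [^ ]) into a regular expression and treats
       commas as alternatives.  It is an approximation written for
       this page, not the NAB matcher, and it does not handle the
       - range delimiter used in residue parts.  The function names
       are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Translate one wildcard pattern into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append(".*")          # any run of characters
        elif ch == "?":
            out.append(".")           # exactly one character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated [ in %r" % pattern)
            out.append(pattern[i:end + 1])   # character class, copied as is
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))

def atom_matches(name, atom_part):
    """True if name matches any comma-separated pattern in atom_part."""
    return any(pattern_to_regex(p).fullmatch(name)
               for p in atom_part.split(","))
```

       For instance, atom_matches("C4'", "P,O?P,C[3-5]?,O[35]?") is
       true, while atom_matches("C1'", "C*[^']") is false because
       the name ends in the prime character.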



NAMING AND NUMBERING CONVENTIONS

       This program reads pdb files but doesn't necessarily keep
       the names and numbers for the residues and/or strands that
       appeared in the pdb file.  Atom names  in  your  pdb  file
       will  be  preserved,  but by convention, NAB and therefore
       suppose start residue numbering over at the  beginning  of
       each  strand.   Also,  strands  are  labelled  as integers
       beginning at 1, not alphabetically beginning at A.  Be
       careful about this, since correct atom expressions rely on
       understanding that residues, particularly those on the
       second and later strands, may be renumbered if they were
       not already numbered this way in the pdb file.
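
       The renumbering convention can be illustrated with a short
       Python sketch that maps the chain IDs and residue numbers
       found in a pdb file to strand and residue numbers of the
       kind used in atom expressions.  It assumes that a new strand
       starts whenever the chain ID changes and that residues are
       numbered from 1 within each strand; how NAB actually detects
       strand boundaries is not described on this page.  The
       function name is hypothetical.

```python
def nab_numbering(residues):
    """Map (chain_id, pdb_resnum) pairs, in file order, to (strand, resnum).

    ASSUMPTION: a change of chain ID starts a new strand, strands count
    from 1, and residue numbering restarts at 1 on each strand.
    """
    result = []
    strand = 0
    resnum = 0
    previous_chain = object()   # sentinel that equals no real chain ID
    for chain, _pdb_resnum in residues:
        if chain != previous_chain:
            strand += 1
            resnum = 0
            previous_chain = chain
        resnum += 1
        result.append((strand, resnum))
    return result
```

       Under these assumptions, residue 101 of chain B in a
       two-chain file whose chain B starts at 101 would be selected
       as 2:1, not as B:101.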


EXAMPLES

       Example 1:
       suppose *.pdb
       would take all files with the .pdb extension in  the  cur-
       rent directory, calculate the rms deviation for every pos-
       sible pair, calculate the mean structure  and  report  the
       overall  rms pairwise deviation and the rms deviation from
       the mean structure.

       Example 2:
       suppose -mat -rot *.pdb
       Same as example 1, but in addition, the complete matrix of
       pairwise  deviations as well as each structure's deviation
       from the mean would be printed to STDOUT.  The superimposed
       structures would be written out as *.rot.pdb.

       Example 3:
       suppose -mat -rot -mean -fit ":1-10:C,CA,N" *.pdb
       Similar  to example 2, but in addition, the mean structure
       would be written to the file "mean.pdb" and the superposi-
       tion  would be done considering only the backbone atoms of
       the first 10 residues  of  all  strands.   The  deviations
       would be calculated using all atoms.

       Example 4:
       suppose  -mat  -rot -mean -fit "::C,CA,N" -calc "::C,CA,N"
       *.pdb
       Similar to example 3, but both the fitting  and  the  rmsd
       calculations  would  be  performed  on all of the backbone
       atoms in the molecule.

       Example 5:
       suppose -fit "::C,CA,N" -pr "*" *.pdb
       This example shows how to use the  per-residue  RMSD  fea-
       ture.   The  fit  expression  indicates that the molecules
       will be fitted on the backbone  atoms,  and  subsequently,
       per-residue RMSDs will be computed using all atoms in each
       residue.

       Example 6:
       suppose -fit "::C,CA,N" -pr "C*,N*,O*,S*" *.pdb
       Similar to example 5, except that  the  per-residue  RMSDs
       will  be  computed  on the atoms named like "C*,N*,O*,S*".
       Namely, all the heavy atoms in each residue.



FILES

       Statistics are printed to STDOUT; rotated coordinates  (if
       requested) are placed in new files that have ".rot.pdb" in
       place of the input file extension (usually ".pdb").
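
       The naming rule for rotated files can be expressed in a few
       lines of Python.  The function name is hypothetical, and the
       behavior for an input name with no extension is an
       assumption, since this page does not cover that case.

```python
import os

def rotated_name(path):
    """Replace the last extension of path with '.rot.pdb'.

    ASSUMPTION: a name without any extension simply gets '.rot.pdb'
    appended.
    """
    root, _ext = os.path.splitext(path)
    return root + ".rot.pdb"
```

       Only the last extension is replaced, so "model1.pdb" becomes
       "model1.rot.pdb".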



DIAGNOSTICS

       As soon as the first difference between the input
       structures is detected, the program stops and prints an
       error message.

       If either atom expression, fitexpr or calcexpr, matches no
       atoms in the structure, the program stops and prints an
       error message.

       If fewer than 2 input files are detected, the program stops
       and prints an error message.



BUGS

       Checks are not made that the atoms are really in the
       same  order in all structures.  This is particularly trou-
       blesome for the file specified by the  -cmp  option  since
       presumably  the comparison pdb file has been obtained from
       a different source (e.g. an Xray or Brookhaven structure).

       Portions  of an atom expression can be invalid and without
       warning the program goes ahead and computes the fit and/or
       calculation  on  the  valid  part  of the atom expression.
       This can be misleading.

       If certain atom expressions are not contained  in  quotes,
       they can be interpreted by the shell and replaced by
       filenames, etc.  If you are unlucky, undesirable behavior
       such  as  seg  faulting and dumping core will result.  Put
       your atom expressions in quotes.
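
       When suppose is driven from a script instead of an
       interactive shell, one way to sidestep the quoting problem
       is to pass the arguments as a list so that no shell ever
       sees the atom expressions.  The Python sketch below only
       builds such a list from the options documented on this page;
       the helper name is hypothetical, and the commented call
       assumes that suppose is on your PATH.

```python
def suppose_command(files, fit=None, calc=None, pr=None):
    """Build an argument list for suppose without any shell quoting.

    Each atom expression is passed as a single list element, so
    characters such as * ? [ ] are never expanded by a shell.
    """
    argv = ["suppose"]
    for flag, value in (("-fit", fit), ("-calc", calc), ("-pr", pr)):
        if value is not None:
            argv += [flag, value]
    return argv + list(files)

# Possible use (not executed here):
#   import glob, subprocess
#   files = sorted(glob.glob("*.pdb"))
#   subprocess.run(suppose_command(files, fit="::C,CA,N", pr="*"))
```

       Because the list is handed to the program directly, the "*"
       given to -pr reaches suppose unchanged.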