SANE - Structure Assisted NOE Evaluation

SANE is a Perl program that generates distance restraints from crosspeaks in NOESY spectra. It works with crosspeak lists from both Felix and NMRView and can analyse 2D, 3D and 4D NOESY spectra. To reduce chemical shift ambiguity, it uses existing assignments, the average distance in an ensemble of structures, the secondary structure and relative NOE contributions. Any combination of these filters can be used.

The program has been described in: "SANE (Structure Assisted NOE Evaluation): an automated model-based approach for NOE assignment." BM Duggan, GB Legge, HJ Dyson and PE Wright, J Biomol NMR 2001, 19(4):321-329.

Running SANE

To run SANE, make sure your PATH variable includes the directory containing the SANE Perl script, then type "sane sane.par". Here sane.par is a parameter file. It holds information such as chemical shift tolerances, cut-offs for the filtering routines, and the names and locations of files. Example parameter files are provided with the code: an NMRView 3D parameter file and Felix 2D, 3D and 4D parameter files.

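In this document, the parameters used by SANE and defined in the parameter file are referred to by the names they have in that file (for example, data_type). The files SANE requires are named in the parameter file; those that depend on the data format are described below.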

The data format is specified by data_type, which is either "Felix" or "NMRView". Felix data also require a volume file, given by vol_file. NMRView data require a sequence file, given by seq_file. To account for folded (aliased) peaks, SANE also needs information about the spectrum. For NMRView data, specify the upper and lower chemical shifts of each dimension, which can be read from the Attributes window. For Felix data, specify the frequency, spectral_width, reference_point and reference_ppm of each dimension. For Felix data these values are also used to convert peak positions from points to ppm.
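The sketch below is a rough illustration of the Felix points-to-ppm conversion, not SANE's own code. It assumes the usual linear referencing: dividing the spectral width (Hz) by the number of points and by the spectrometer frequency (MHz) gives ppm per point, and ppm decreases as the point index increases. The function name is illustrative. The number of points, npoints, is a hypothetical extra input that is not listed among SANE's parameters.

```python
def felix_point_to_ppm(point, frequency, spectral_width,
                       reference_point, reference_ppm, npoints):
    """Convert a Felix peak position (points) to ppm for one dimension.

    Sketch only: assumes linear referencing with ppm decreasing as the
    point index increases. npoints (the size of the dimension) is a
    hypothetical input, not a documented SANE parameter.
    """
    # Hz per point divided by the frequency in MHz gives ppm per point.
    ppm_per_point = spectral_width / npoints / frequency
    return reference_ppm + (reference_point - point) * ppm_per_point


# Example: 600 MHz proton dimension, 6000 Hz (10 ppm) over 1024 points,
# with point 512 referenced to 4.7 ppm.
print(felix_point_to_ppm(0, 600.0, 6000.0, 512, 4.7, 1024))
```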

Output

SANE always creates three output files: an OUT file, a UPL file and an ambig MAP file. It can optionally also create a new XPK file and an XPK file containing only the crosspeaks for which no restraints were written.

Procedure

SANE follows much the same procedure one would use manually. The following filters can be used; at present they are applied in the order in which they are described below. The full list of possibilities is written out after chemical shift filtering. After each later filtering step, a message and the reduced list of possibilities are printed if one or more possibilities were eliminated. A step that eliminates nothing produces no output. Once all filtering is complete, the outcome depends on how many possibilities remain:
- more than one: an ambiguous restraint is written;
- exactly one: a unique restraint is written;
- none: a message to that effect is printed.
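The bookkeeping described above can be sketched as follows. This is a minimal illustration, not SANE's Perl code. Each filter is assumed to be a function that takes the current list of possibilities and returns the survivors. The filter functions and example possibilities are hypothetical.

```python
def evaluate_peak(peak_id, possibilities, filters):
    """Apply (name, function) filters in order and classify the result.

    possibilities: the list left after chemical shift filtering.
    Output is printed only when a filter removes at least one possibility.
    """
    remaining = list(possibilities)
    print(f"peak {peak_id}: possibilities {remaining}")
    for name, apply_filter in filters:
        reduced = apply_filter(remaining)
        if len(reduced) < len(remaining):
            print(f"peak {peak_id}: {name} filter leaves {reduced}")
        remaining = reduced
    if len(remaining) > 1:
        return "ambiguous", remaining
    if len(remaining) == 1:
        return "unique", remaining
    print(f"peak {peak_id}: no possibilities remain")
    return "none", remaining


# Hypothetical filters: drop one distant pair, then keep everything.
filters = [
    ("distance", lambda ps: [p for p in ps if p != "HA 3 / HN 40"]),
    ("secondary structure", lambda ps: ps),
]
print(evaluate_peak(17, ["HA 3 / HN 4", "HA 3 / HN 40"], filters))
```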

SANE assumes that the spectrum is aliased rather than folded. It will account for aliased chemical shifts using the referencing parameters, in the case of Felix data, or the chemical shifts of the edges of the spectrum, in the case of NMRView data. At the moment there is no way to cope with data that has been folded rather than aliased.
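To make the aliasing assumption concrete: an aliased peak appears displaced from its true position by a whole multiple of the spectral width. The sketch below tests whether an assigned shift can explain an observed peak position, either directly or after aliasing. It assumes NMRView-style spectrum edges in ppm, and the function and argument names are illustrative.

```python
def alias_count(peak_ppm, assigned_ppm, lower, upper, tolerance):
    """Return k if assigned_ppm matches peak_ppm + k * (upper - lower)
    within tolerance, or None if no whole number of wraps fits.

    Assumes aliasing (a shift by a multiple of the spectral width),
    not folding (a reflection about the spectrum edge).
    """
    sweep = upper - lower
    # The nearest whole number of wraps; when the tolerance is much
    # smaller than the spectral width, only this k can match.
    k = round((assigned_ppm - peak_ppm) / sweep)
    if abs(assigned_ppm - (peak_ppm + k * sweep)) <= tolerance:
        return k
    return None


# A 15N dimension spanning 100-135 ppm: a peak observed at 105.0 ppm
# can be explained by a nitrogen assigned at 140.0 ppm (one wrap).
print(alias_count(105.0, 140.0, 100.0, 135.0, 0.2))
```

SANE does not consider aliasing in the directly detected dimension (the one named by detect_dim), so a check like this would only be applied to the indirect dimensions.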

Defining your spectrum

For SANE to analyse your spectrum properly, it needs to know the type of experiment, the nuclei involved and how the processed data are arranged. The experiment type is defined by expt_flag, with a range of values reserved for each type of experiment. For example, shared-time CN NOESY experiments are specified by values of expt_flag between 10 and 29. expt_flag is printed in the comment section of each restraint together with the crosspeak number, so each restraint can be traced back to its own spectrum and crosspeak. Because each experiment type has a range of allowed values, restraints from several experiments of the same type (e.g. different mixing times) can be used together without confusion.

SANE needs to know which dimensions of the transformed matrix correspond to the protons and which to the heteronuclei, and what the heteronuclei are. This information is specified with the parameters protonA_dim, heteroA_dim, protonB_dim, heteroB_dim, heteroA and heteroB. For example, a 15N NOESY transformed in the usual manner could be specified by setting

heteroA = "N"
protonA_dim = 1
heteroA_dim = 3
protonB_dim = 2

For shared-time CN NOESY spectra, the heteronucleus must be specified as "CN".

SANE also requires you to specify which dimension of the matrix is the directly detected dimension. This is done with the parameter detect_dim. SANE will not fold assignments in the directly detected dimension. This leaves fewer possibilities for the program to consider and lets it run faster. However, if the directly detected dimension was folded on a spectrometer without digital filters, those folded resonances cannot be assigned correctly.

Bells and Whistles

volume to distance conversion
SANE uses bins to convert volumes to distances. It offers two different methods. The first method (bin_volumes) requires the user to define a list of boundaries (Bound1, Bound2, ...) and the distance bins (Bin1, Bin2, ...) they correspond to. The volumes are then converted directly to bins. In the second method (bin_distances) the volumes are converted to distances and then a user defined list of boundaries (Bound1, Bound2, ...) is used to sort the volumes into distance bins (Bin1, Bin2, ...). The second method requires the gradient and intercept from a calibration. Both methods divide the initial volume by a parameter, scaling_constant, and can use up to ten different bins.
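The two binning methods can be sketched as below. The ordering conventions are assumptions rather than documented behaviour. Bounds for bin_volumes are taken as scaled-volume thresholds, largest first. Bounds for bin_distances are taken as distance thresholds, shortest first. Bins are the corresponding upper distance limits. The calibration is also assumed to be a straight line in log(volume) against log(distance), ln V = gradient * ln r + intercept. A gradient near -6 corresponds to the isolated spin-pair r^-6 dependence. SANE's actual calibration may take a different form.

```python
import math

MAX_BINS = 10  # the text says up to ten bins can be used


def bin_volumes(volume, scaling_constant, bounds, bins):
    """Method 1: compare the scaled volume directly with volume bounds.

    bounds: scaled-volume thresholds, largest first (assumed ordering).
    bins:   upper distance limit for each threshold, shortest first.
    Returns the distance bin, or None if the peak is below every bound.
    """
    assert len(bounds) == len(bins) <= MAX_BINS
    scaled = volume / scaling_constant
    for bound, upper in zip(bounds, bins):
        if scaled >= bound:
            return upper
    return None


def bin_distances(volume, scaling_constant, gradient, intercept, bounds, bins):
    """Method 2: convert the scaled volume to a distance, then bin it.

    Assumes the calibration ln(V) = gradient * ln(r) + intercept.
    bounds: distance thresholds, shortest first (assumed ordering).
    """
    assert len(bounds) == len(bins) <= MAX_BINS
    scaled = volume / scaling_constant
    distance = math.exp((math.log(scaled) - intercept) / gradient)
    for bound, upper in zip(bounds, bins):
        if distance <= bound:
            return upper
    return None
```

Both functions divide by scaling_constant first, as the text describes. Only the final comparison differs: against volume thresholds in the first method, against distance thresholds in the second.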

water
Peaks at the chemical shift of water can be ignored by specifying water_shift and water_tolerance. Any peak within water_tolerance of water_shift is ignored.
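This is a minimal sketch of the water check. It assumes the check is applied to every proton dimension of a peak; the text does not say which dimensions are tested. The function name is illustrative.

```python
def near_water(proton_shifts, water_shift, water_tolerance):
    """True if any proton shift of the peak lies within water_tolerance
    of water_shift, in which case the peak would be ignored."""
    return any(abs(s - water_shift) <= water_tolerance for s in proton_shifts)


print(near_water([4.75, 8.20], 4.76, 0.05))
```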

ignore and adjust lists
SANE allows the user to define two lists of peaks in the parameter file:
- ignore_list: peaks to be ignored;
- adjust_list: peaks whose distance bin is moved to the next lower bin, i.e. the next longer distance.

To ignore or adjust a peak, include its peak number in the appropriate list. Including a peak number in the adjust list more than once causes it to be adjusted more than once. If a peak would be adjusted beyond the longest distance bin, the message "Attempted to adjust to lower than smallest bin" is printed and the restraint is not written.
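The ignore and adjust rules can be sketched as follows. Bins are assumed to be indexed from 0 (shortest distance) upwards, so each adjustment moves a peak one bin towards longer distances. The function and argument names are hypothetical; only the printed message is taken from the text.

```python
def apply_ignore_and_adjust(peak_number, bin_index, ignore_list,
                            adjust_list, n_bins):
    """Return the final bin index for a peak, or None if no restraint
    should be written."""
    if peak_number in ignore_list:
        return None
    # Each occurrence in adjust_list moves the peak one bin longer.
    new_index = bin_index + adjust_list.count(peak_number)
    if new_index >= n_bins:
        print("Attempted to adjust to lower than smallest bin")
        return None
    return new_index


# Peak 7 is listed twice in adjust_list, so it moves two bins longer.
print(apply_ignore_and_adjust(7, 0, [3], [7, 7], 3))
```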

negative peaks
Setting the flag using_negative_volumes to "yes" causes SANE to treat negative volumes as positive volumes. If using_negative_volumes is not set, negative peaks are analysed as usual but no restraint is written, and the message "Crosspeak has negative volume" is printed.
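Expressed as code, the negative-volume rule amounts to something like the following sketch. The function name is hypothetical; the flag value "yes" and the message come from the text.

```python
def volume_for_restraint(volume, using_negative_volumes=None):
    """Return the volume to use for the restraint, or None if no
    restraint should be written for this crosspeak."""
    if volume >= 0:
        return volume
    if using_negative_volumes == "yes":
        return -volume  # treat the negative volume as positive
    print("Crosspeak has negative volume")
    return None


print(volume_for_restraint(-500.0, "yes"))
```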

number of ambiguities
The parameter accepted_possibilities controls how many possibilities will be accepted when writing an ambiguous restraint. Setting accepted_possibilities to 0 will include all possibilities in the ambiguous restraint. A value of 1 will write only unique restraints, a value of 2 will write unique restraints and ambiguous restraints involving up to two possibilities, a value of 3 will write unique restraints and ambiguous restraints involving up to three possibilities, and so on.
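The accepted_possibilities rule is a short check on the number of possibilities left after filtering. The sketch below uses an illustrative function name.

```python
def restraint_type(n_possibilities, accepted_possibilities):
    """Return "unique", "ambiguous", or None (no restraint written).

    accepted_possibilities == 0 accepts any number of possibilities.
    """
    if n_possibilities == 0:
        return None
    if accepted_possibilities != 0 and n_possibilities > accepted_possibilities:
        return None  # too ambiguous for the chosen limit
    return "unique" if n_possibilities == 1 else "ambiguous"


print(restraint_type(3, 2))
```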

unassigned resonances
Unassigned resonances may be included as possible assignments. To do this, set the flag using_unassigned to true. Then include the names of the unassigned resonances in the assignment list, with their chemical shifts set to a value less than -999. SANE then uses the BioMagResBank (BMRB) shift statistics, specified in standard_shifts, to determine the chemical shift range for each unassigned resonance. The mean BMRB shift is used as the centre and the standard deviation as the tolerance. The number of standard deviations is set with number_BMRB_stdevs, which defaults to 2. If a peak falls within that many standard deviations of an unassigned resonance's mean chemical shift, that resonance is included as a possible assignment. Leaving unassigned resonances out of the assignment list prevents SANE from including them as possibilities.
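The matching of unassigned resonances can be sketched as below. The data layout is assumed for illustration: assignments maps resonance names to shifts, and bmrb_stats is a hypothetical lookup giving the BMRB mean and standard deviation for each unassigned resonance. SANE reads these statistics from the file named by standard_shifts, whose format is not described here.

```python
UNASSIGNED_BELOW = -999.0  # shifts below this mark an unassigned resonance


def candidate_resonances(peak_ppm, assignments, bmrb_stats, tolerance,
                         using_unassigned=True, number_BMRB_stdevs=2.0):
    """List resonances whose shift window contains peak_ppm.

    assignments: {name: shift}; bmrb_stats: {name: (mean, stdev)}
    (both hypothetical layouts).
    """
    hits = []
    for name, shift in assignments.items():
        if shift < UNASSIGNED_BELOW:
            if not using_unassigned:
                continue
            # Unassigned: window is mean +/- number_BMRB_stdevs * stdev.
            mean, stdev = bmrb_stats[name]
            if abs(peak_ppm - mean) <= number_BMRB_stdevs * stdev:
                hits.append(name)
        elif abs(peak_ppm - shift) <= tolerance:
            hits.append(name)
    return hits
```

Any statistics passed in for testing are placeholders; in a real run they would come from standard_shifts.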

shortest average distance eliminated by contribution filter
When using both distance and contribution filtering it is possible for the possibility with the shortest average distance to be eliminated by the contribution filter. In such cases a warning message, "Possibility with lowest mean distance eliminated by filters", is printed.
Elimination of the possibility with the shortest average distance can occur when the distances associated with the possibilities vary widely across the ensemble. The contribution filter uses the minimum distance in the ensemble rather than the average. A possibility whose distance varies widely can therefore have a short minimum distance, and hence a large contribution, even though its average distance is long. Long-range restraints tend to vary more across the ensemble than short-range restraints, so the contribution filter tends to favour them. In the future, NOE contributions may be calculated using the average distance rather than the minimum, or an option to choose the method may be added.
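The effect described above can be reproduced with a small sketch. It assumes, as is common in NOE assignment, that each possibility's contribution is proportional to r^-6 and normalised over all possibilities. The text does not say whether SANE uses exactly this form, and the distances are invented for illustration.

```python
def relative_contributions(ensemble_distances, use_minimum=True):
    """Relative r^-6 contribution of each possibility.

    ensemble_distances: {possibility: [distance in each structure]}.
    use_minimum selects the minimum distance over the ensemble (the
    behaviour described above); otherwise the arithmetic mean is used.
    """
    effective = {
        name: (min(d) if use_minimum else sum(d) / len(d))
        for name, d in ensemble_distances.items()
    }
    weights = {name: r ** -6 for name, r in effective.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# "short" is 3.0 Angstrom in every model; "long" averages 5.0 Angstrom
# but is 2.5 Angstrom in one model.
distances = {"short": [3.0, 3.0, 3.0], "long": [2.5, 6.0, 6.5]}
print(relative_contributions(distances, use_minimum=True))
print(relative_contributions(distances, use_minimum=False))
```

With the minimum distance, "long" receives the larger share, so a contribution cut-off could remove "short" even though it has the shorter mean distance. With the mean, the order reverses.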
 

Future developments

Suggestions for improvements and new features, as well as bug reports, are always welcome. At the moment I hope eventually to add the following:
  1. The ability to cope with select filter data.
  2. Closer integration with AMBER to allow the iteration of restraint generation and molecular dynamics to become more automated.
  3. It is still unclear whether contribution filtering works best using the shortest distance in the ensemble or the average distance in the ensemble. It may be that at different times in the structure determination process one method is better than the other. The introduction of a flag allowing the user to choose one method or the other may be useful.

Some other useful scripts

There are a few scripts for converting files and examining structures which may be useful. They are a mixture of Perl, awk and sed.
Last updated 2001 August 16 by Brendan Duggan