The STARANISO Server

Anisotropy of the Diffraction Limit
and
Bayesian Estimation of Structure Amplitudes



IMPLEMENTATION DETAILS of the DEBYE and STARANISO software


This server uses the DEBYE and STARANISO programs to perform some or all (depending on the selections made by the user) of the following 19 program steps:


    If unmerged data were supplied as input, perform the complete 'unmerged data' protocol, namely steps #1 to #10:

  1. Read an XDS or MTZ reflection file of unmerged intensities.  If XDS format is supplied, convert it to MTZ multi-record format.

  2. Run the autoPROC image-scaling module.  The final step of this protocol is to take the original unmerged data and to repeat the last scaling & merging step, this time keeping the previously-determined image scales and error model fixed, but with no isotropic diffraction cut-off applied.

  3. Using the merged data from the previous step, determine the anisotropic diffraction cut-off as a mask.  Also perform the 'merged data' protocol (steps #11 to #19 below), for later comparison of the results with those from the full 'unmerged data' protocol.

  4. Apply the anisotropic cut-off mask from the previous step to the original unmerged data.

  5. Re-determine the image scales and error model using the masked unmerged data from the previous step.

  6. Apply the new image scales and error model from the previous step to the original unmerged data and output new rescaled unmerged and merged data.

  7. Perform the 'merged data' protocol again (steps #11 to #19 below) using the merged data from the previous step as input, and output the final amplitudes and mask.

  8. Apply the mask from the previous step to the rescaled unmerged data from step #6.

  9. Using the rescaled unmerged data from step #2 determine the merging statistics for the measured data.

  10. Using the masked rescaled unmerged data from step #8 determine the merging statistics for the 'observed' data.


    If merged data were supplied, only perform the 'merged data' protocol (steps #11 to #19) using the original data as input:

  11. Read an MTZ reflection file of merged intensities.

  12. Perform an anisotropic diffraction cut-off of the merged intensities.

  13. Determine the anisotropy of the observed intensity distribution.

  14. Renormalize the intensity profile.

  15. Use the anisotropy from step #13 to compute an anisotropic prior of the expected intensity.

  16. Perform Bayesian estimation of structure amplitudes.

  17. Deal with anomalous data.

  18. Correct the amplitudes for anisotropy.

  19. Create a new MTZ file containing F and σ(F) columns.


The above steps are described in greater detail below:


    If unmerged data were supplied as input, perform the 'unmerged data' protocol, namely steps #1 to #10 below:

  1. Read an XDS ASCII or MTZ multi-record reflection file of unmerged intensities; convert the former to MTZ multi-record format.

  2. Run the autoPROC image-scaling module (aP_scale) to determine the initial image scales and error model.  The rescaled unmerged and merged data from this run, and everything downstream of them, will have had an inappropriate isotropic diffraction cut-off applied and are therefore discarded.  aP_scale then re-runs AIMLESS on the original unmerged data with these image scales and error model held fixed, this time with no isotropic diffraction cut-off (the scales are held fixed since otherwise they would be biased by the data now included beyond the initial isotropic cut-off determined by aP_scale).  Finally, aP_scale outputs the merged data with no isotropic diffraction cut-off applied.

  3. Run STARANISO on the merged data from the previous step to determine the anisotropic diffraction cut-off: this is written out as a byte mask in CCP4 map format.  Also perform the 'merged data' protocol (steps #11 to #19 below): the purpose of this is to allow later comparison of the results of the 'merged data' and 'unmerged data' protocols.

  4. Re-run STARANISO on the original unmerged data using the mask from the previous step, and output rescaled unmerged data with an anisotropic diffraction cut-off applied.

  5. Run AIMLESS again on the masked unmerged data from the previous step, and re-determine the image scales and error model using only the data inside the anisotropic cut-off surface.

  6. Run AIMLESS again on the original unmerged data and apply the new image scales and error model from the previous step and output new rescaled unmerged and merged data.

  7. Perform the 'merged data' protocol again (steps #11 to #19 below) using the merged data from the previous step as input, and output the final amplitudes and mask.

  8. Re-run STARANISO to apply the mask from the previous step to the rescaled unmerged data from step #6.

  9. Run MRFANA on the rescaled unmerged data from step #2 to determine the merging statistics for the measured data.

  10. Run MRFANA again, this time on the masked rescaled unmerged data from step #8 to determine the merging statistics for the 'observed' data.


    If merged data were supplied, only perform the 'merged data' protocol using the original data as input:

  11. Read an MTZ reflection file of merged intensities with standard uncertainties, with or without anomalous data, for example as output by the CCP4 AIMLESS program.  Ideally this file should not already have had a diffraction cut-off of any kind applied by AIMLESS (or other program), since the appropriate anisotropic cut-off will be determined by the STARANISO program.

  12. Perform an anisotropic diffraction cut-off of the merged intensities, instead of the traditional isotropic cut-off, using a locally-averaged mean I/σ(I) as the cut-off threshold.  The local average is calculated within a sphere of reciprocal space centered on each reflection (default radius r* = 0.15 Å⁻¹), and the contributions to the average are weighted by the exponential function w = exp(-4(s*/r*)²) of the reciprocal-space distance s* from the center.

  13. Determine the anisotropy of the observed intensity distribution, corrected where necessary by the systematic absence factor (Wilson [1987]), using either the error-free likelihood function proposed by Popov & Bourenkov [2003], or the Bayesian likelihood function (default) which uses the French-Wilson formalism and which takes experimental errors into account.  In either case, by default a precalculated expected intensity profile is used (thanks to Alexander Popov for furnishing this); this assumes an average solvent content so in cases where the actual solvent content is much higher than normal it may not provide an accurate estimate of the contribution of bulk water.

    Maximum likelihood optimization of the overall scale and the elements of the overall anisotropic displacement tensor that are not constrained by the point-group symmetry is performed.  Note that the default P & B profile was obtained by averaging the observed profiles of a number of protein-only structures so is not strictly applicable to structures containing a substantial proportion of nucleic acid.

    Alternatively the user may supply one or more PDB file(s) containing an ensemble of models from which the DEBYE program will calculate the rotationally-averaged intensities used in the profile (Morris et al. [2003]).  Ideally, none of the models should have a much lower resolution than the diffraction data (say, dmin of each model should be numerically less than roughly dmin of the data + 0.5 Å).  There are two reasons for this: a low-resolution model will have fewer observed waters and therefore the estimated bulk water contribution at low values of d* will be incorrect, and extrapolation of the calculated profile to high resolution will also introduce inaccuracies.

    A suitable ensemble can be generated by the same procedure used for obtaining model ensembles for Molecular Replacement, e.g. using the MrBUMP and/or BALBES packages.  An estimated contribution from bulk water scattering will be added to the calculated averaged intensity profile.

  14. Optionally, renormalize the intensity profile by applying a d*-dependent scale factor determined such that the mean normalized intensity (Z) is 1 in all d* bins.  This may help to average out the effect of differences from the averaged profile.

  15. Use the anisotropy from step #13 to compute an anisotropic prior of the expected intensity, i.e. divide the expected intensity obtained from the profile by the scale/anisotropy correction.

  16. Perform Bayesian estimation of structure amplitudes by the method of French & Wilson [1978], but using the anisotropic prior in place of the traditional isotropic prior originally suggested by F & W.  STARANISO incorporates subroutines from the Netlib repository, in place of the approximate look-up tables used in TRUNCATE, to compute high-accuracy parabolic cylinder functions (scaled to avoid numerical under/overflow issues: Gil et al. [2006]) and thereby obtain all the required moments.

  17. Input anomalous data are treated differently from non-anomalous data in the Bayesian estimation.  If anomalous data are present on input it is naturally assumed that the anomalous differences are statistically significant (otherwise what is the point of keeping the Bijvoet pairs separate?).  If this is not the case then the correct course of action is to re-run the merging step, this time also merging the Bijvoet pairs, since this will deal with outliers correctly.  Otherwise the Bayesian estimation is performed twice per unique reflection on the separately merged means of I[+] and I[-] (where these are observed), not on the overall merged mean including all I[+] and I[-].

    This is because the Bayesian estimation assumes a centric or acentric Wilson distribution as appropriate, but the average of two random variates each with an acentric distribution with different expected values does not necessarily itself have an acentric distribution.  Hence it is not correct to perform the Bayesian estimation as currently implemented on the average of two Wilson intensity variates with different expected values.  Rather I[+] and I[-] should be separately converted to Fs, and then averaged.

    There are further issues concerning the optimal procedure for averaging F[+] and F[-] when they have different standard uncertainties.

  18. Optionally correct the amplitudes for anisotropy.

  19. Finally, create a new MTZ file containing F and σ(F) columns (and also anomalous F and σ(F) columns if anomalous I columns were read in).  Note that it is formally invalid to take the Fs from the Bayesian estimation and square them in a misguided attempt to recover the Is (needed, for example, by some twinning tests)!  Rather, the posterior Is should be estimated by the same procedure as for the posterior Fs.  For this reason there is an option to output the posterior intensities (MTZ column labels Ipost, SIGIpost etc.).
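
The local averaging described in step #12 can be sketched in a few lines of NumPy.  This is only an illustration under stated assumptions, not STARANISO's own code: the function name, the argument names and the use of Cartesian reciprocal-space coordinates are hypothetical, and symmetry-equivalent reflections are not expanded.

```python
import numpy as np

def local_mean_i_over_sigma(s_vectors, i_over_sigma, r_star=0.15):
    """Locally averaged I/sigma(I) for every reflection (illustrative sketch).

    s_vectors    : (N, 3) reciprocal-space coordinates in 1/Angstrom
                   (assumed Cartesian; hypothetical input layout)
    i_over_sigma : (N,) I/sigma(I) of each merged reflection
    r_star       : radius of the averaging sphere in 1/Angstrom
    """
    s = np.asarray(s_vectors, dtype=float)
    ios = np.asarray(i_over_sigma, dtype=float)
    # Pairwise distances s* between reflections.  This needs O(N^2) memory,
    # which is acceptable for a sketch but not for a full data set.
    dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    # Weight w = exp(-4 (s*/r*)^2), set to zero outside the sphere.
    w = np.where(dist <= r_star, np.exp(-4.0 * (dist / r_star) ** 2), 0.0)
    # Each reflection contributes to its own average with weight 1,
    # so the denominator is never zero.
    return (w @ ios) / w.sum(axis=1)
```

Reflections whose local average falls below the chosen threshold would then lie outside the cut-off surface, e.g. `inside = local_mean_i_over_sigma(s, ios) >= threshold`, where `threshold` stands for whatever cut-off value has been selected.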
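
The point made in steps #16 and #19, that the posterior F and the posterior I must each be estimated from the posterior distribution rather than one being derived from the other, can be illustrated numerically.  The sketch below handles a single acentric reflection, assuming a Wilson prior p(J) = exp(-J/S)/S on the true intensity J and a Gaussian error model.  It uses brute-force quadrature on a grid, not the parabolic cylinder functions that STARANISO uses, and the function name, argument names and the "F"/"SIGF" dictionary keys are chosen here for illustration only.  In the procedure described above, `prior_mean` would be the anisotropic prior from step #15.

```python
import numpy as np

def french_wilson_acentric(i_obs, sigma_i, prior_mean, n_grid=20001):
    """Posterior moments of F and I for one acentric reflection (sketch).

    Assumes an acentric Wilson prior with expected intensity `prior_mean`
    and a Gaussian measurement error of standard deviation `sigma_i`.
    """
    # In J the posterior is a Gaussian truncated at J = 0, centred at mu.
    mu = i_obs - sigma_i**2 / prior_mean
    # Integrate over F = sqrt(J) up to a J value 10 sigma beyond the peak.
    f_max = np.sqrt(max(mu, 0.0) + 10.0 * sigma_i)
    f = np.linspace(0.0, f_max, n_grid)
    j = f**2
    # Log of the unnormalised posterior in J, shifted to avoid underflow.
    log_p = -j / prior_mean - 0.5 * ((i_obs - j) / sigma_i) ** 2
    # The change of variable J = F^2 contributes the Jacobian dJ = 2F dF.
    dens = 2.0 * f * np.exp(log_p - log_p.max())

    step = f[1] - f[0]

    def integrate(y):
        # Trapezoidal rule on the uniform F grid.
        return step * (y.sum() - 0.5 * (y[0] + y[-1]))

    norm = integrate(dens)
    f_mean = integrate(f * dens) / norm
    i_mean = integrate(j * dens) / norm
    i_sq_mean = integrate(j**2 * dens) / norm
    return {
        "F": f_mean,
        "SIGF": np.sqrt(i_mean - f_mean**2),
        "Ipost": i_mean,
        "SIGIpost": np.sqrt(i_sq_mean - i_mean**2),
    }

if __name__ == "__main__":
    # A weak reflection with a negative measured intensity.
    weak = french_wilson_acentric(i_obs=-5.0, sigma_i=10.0, prior_mean=50.0)
    print(weak, "F squared:", weak["F"] ** 2)
```

Because the posterior variance of F is positive, the square of the posterior mean F is always smaller than the posterior mean I, and the gap is largest for weak reflections; this is why squaring the output Fs does not recover the Is.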

FILE-NAMING SCHEME used in the 'UNMERGED DATA' protocol

Step ID | Program | Input reflection file | Input type(s) | Output type(s) | Operation
xdsmtz-mrf | POINTLESS | XDS_ASCII.HKL | XDS | MRF | Convert XDS_ASCII.HKL to MTZ multi-record format.
debye | DEBYE | xdsmtz-mrf | MTZ + PDB | profile | Compute Debye isotropic scattering profile.
iso-merged | aP_scale | xdsmtz-mrf | MRF | MRF + SRF | Initial scaling with an isotropic diffraction cut-off & merging of 'xdsmtz-mrf' unmerged data, with no data cut-off.
stats-meas | MRFANA | aimless_alldata_unmerged | MRF | --- | 'Measured' merging statistics for uncut merged data.
merged-aniso | STARANISO | aimless_alldata | SRF | SRF + mask | Anisotropic diffraction cut-off & anisotropy correction of 'iso-merged' data; output of initial anisotropic mask.
xpcorr-xds | STARANISO | XDS_ASCII.HKL | XDS | --- | Compute XDS profile correlation plot.
masked-aniso-mrf | STARANISO | xdsmtz-mrf | MRF + mask | MRF | Application of initial 'merged-aniso' anisotropic mask to 'xdsmtz-mrf' unmerged data.
masked-scaled | AIMLESS | masked-aniso-mrf | MRF | scales | Scaling of 'masked-aniso-mrf' data.
masked-merged | AIMLESS | xdsmtz-mrf | MRF + scales | MRF + SRF | Application of scales from 'masked-scaled' step to 'xdsmtz-mrf' unmerged data (no cut of the data), and final merging; also Ihalfx columns from MRFANA appended to the merged output file.
aniso-merged | STARANISO | masked-merged | SRF | SRF + mask | Final anisotropic diffraction cut-off and anisotropy correction of merged data from 'masked-merged' step; output of updated anisotropic mask.
aniso-masked-mrf | STARANISO | masked-merged-mrf | MRF + mask | MRF | Application of updated 'aniso-merged' mask to unmerged data from 'masked-merged' step.
stats-obs | MRFANA | aniso-masked-mrf | MRF | --- | 'Observed' merging statistics for masked & merged data.


NOTES for above table:


REFERENCES

French, S. & Wilson, K.S. (1978) "On the treatment of negative intensity observations." Acta Cryst. A34, 517-525.  See also: "Bayesian treatment of negative intensity measurements in crystallography".

Gil, A., Segura, J. & Temme, N.M. (2006) "Algorithm 850: Real parabolic cylinder functions U(a,x), V(a,x)." ACM Transactions on Mathematical Software (TOMS). 32, 102-12.  See also: "Computing the real parabolic cylinder functions U(a,x), V(a,x)".

Morris, R.J., Blanc, E. & Bricogne, G. (2003) "On the interpretation and use of <|E|²>(d*) profiles." Acta Cryst. D60, 227-40.

Popov, A.N. & Bourenkov, G.P. (2003) "Choice of data-collection parameters based on statistical modelling." Acta Cryst. D59, 1145-53.

Wilson, A.J.C. (1987) "Treatment of enhanced zones and rows in normalizing intensities." Acta Cryst. A43, 250-2.