STARANISO anisotropy & Bayesian estimation server.

The STARANISO Server

Anisotropy of the Diffraction Limit
and
Bayesian Estimation of Structure Amplitudes

IMPLEMENTATION DETAILS of the DEBYE and STARANISO software

This server uses the DEBYE and STARANISO programs to perform some or all (depending on the selections made by the user) of the following 21 program steps:

If unmerged data were supplied as input, perform the complete 'unmerged data' protocol, namely steps #1 to #10:

Read an XDS or MTZ reflection file of unmerged intensities. If XDS format is supplied, convert it to MTZ multi-record format.
Run the autoPROC image-scaling module. The final step of this protocol is to take the original unmerged data and to repeat the last scaling & merging step, this time keeping the previously-determined image scales and error model fixed, but with no isotropic diffraction cut-off applied.
Using the merged data from the previous step, determine the anisotropic diffraction cut-off as a mask. Also perform the 'merged data' protocol (steps #11 to #19 below), for later comparison of the results with those from the full 'unmerged data' protocol.
Apply the anisotropic cut-off mask from the previous step to the original unmerged data.
Re-determine the image scales and error model using the masked unmerged data from the previous step.
Apply the new image scales and error model from the previous step to the original unmerged data and output new rescaled unmerged and merged data.
Perform the 'merged data' protocol again (steps #11 to #21 below) using the merged data from the previous step as input, and output the final amplitudes and mask.
Apply the mask from the previous step to the rescaled unmerged data from step #6.
Using the rescaled unmerged data from step #2 determine the merging statistics for the measured data.
Using the masked rescaled unmerged data from step #8 determine the merging statistics for the 'observed' data.

If merged data were supplied, only perform the 'merged data' protocol (steps #11 to #21) using the original data as input:
Read an MTZ reflection file of merged intensities.
Perform an anisotropic diffraction cut-off of the merged intensities.
Determine the anisotropy of the observed intensity distribution.
Renormalize the intensity profile.
Use the anisotropy from step #13 to compute an anisotropic prior of the expected intensity.
Perform Bayesian estimation of structure amplitudes.
Deal with anomalous data.
Correct the amplitudes for anisotropy.
Create a new merged MTZ file containing F and σ(F) columns.
Generate asymmetric unit indices out to the isotropic diffraction limit of the data, add test-set (R_free) flags and merge columns with the new MTZ file.
Copy the resulting MTZ file to the original filename from step #19, omitting 'unobserved' and 'unobservable' reflections where the SA_flag column has a 'missing' (undefined) value. Finally, remove the entire SA_flag column.

The above steps are described in greater detail below:

If unmerged data were supplied as input, perform the 'unmerged data' protocol, namely steps #1 to #10 below:

Read an XDS ASCII or MTZ multi-record reflection file of unmerged intensities; convert the former to MTZ multi-record format.

Run the autoPROC image-scaling module (aP_scale) to determine the initial image scales and error model. The rescaled unmerged and merged data from this run, and everything downstream of them, are discarded because they have had an inappropriate isotropic diffraction cut-off applied. aP_scale then re-runs AIMLESS on the original unmerged data, this time with no isotropic diffraction cut-off but with the image scales and error model held fixed (otherwise the scales would be biased by the data now included beyond the initial isotropic cut-off determined by aP_scale). Finally, aP_scale outputs the merged data with no isotropic diffraction cut-off applied.

Run STARANISO on the merged data from the previous step to determine the anisotropic diffraction cut-off: this is written out as a byte mask in CCP4 map format. Also perform the 'merged data' protocol (steps #11 to #19 below): the purpose of this is to allow later comparison of the results of the 'merged data' and 'unmerged data' protocols.
Re-run STARANISO on the original unmerged data using the mask from the previous step, and output rescaled unmerged data with an anisotropic diffraction cut-off applied.
Run AIMLESS again on the masked unmerged data from the previous step, and re-determine the image scales and error model using only the data inside the anisotropic cut-off surface.
Run AIMLESS again on the original unmerged data and apply the new image scales and error model from the previous step and output new rescaled unmerged and merged data.
Perform the 'merged data' protocol again (steps #11 to #19 below) using the merged data from the previous step as input, and output the final amplitudes and mask.
Re-run STARANISO to apply the mask from the previous step to the rescaled unmerged data from step #6.
Run MRFANA on the rescaled unmerged data from step #2 to determine the merging statistics for the measured data.
Run MRFANA again, this time on the masked rescaled unmerged data from step #8 to determine the merging statistics for the 'observed' data.

If merged data were supplied, only perform the 'merged data' protocol using the original data as input:
Read an MTZ reflection file of merged intensities with standard uncertainties, with or without anomalous data, for example as output by the CCP4 AIMLESS program. Ideally this file should not already have had a diffraction cut-off of any kind applied by AIMLESS (or other program), since the appropriate anisotropic cut-off will be determined by the STARANISO program.
Perform an anisotropic diffraction cut-off of the merged intensities, instead of the traditional isotropic cut-off, using a locally-averaged mean I/σ(I) as the cut-off threshold. The local average is calculated within a sphere of reciprocal space centered on each reflection (default radius r* = 0.15 Å⁻¹), and the contributions to the average are weighted by the exponential function w = exp(-4(s*/r*)²) of the reciprocal-space distance s* from the center.
Determine the anisotropy of the observed intensity distribution, corrected where necessary by the symmetry enhancement factor (Wilson [1987]), using either the error-free likelihood function proposed by Popov & Bourenkov [2003], or the Bayesian likelihood function (default), which uses the French-Wilson formalism and takes experimental errors into account. In either case, by default a precalculated expected intensity profile is used (thanks to Alexander Popov for furnishing this); this assumes an average solvent content, so in cases where the actual solvent content is much higher than normal it may not provide an accurate estimate of the contribution of bulk water.
Maximum likelihood optimization of the overall scale and the elements of the overall anisotropic displacement tensor that are not constrained by the point-group symmetry is performed. Note that the default P & B profile was obtained by averaging the observed profiles of a number of protein-only structures so is not strictly applicable to structures containing a substantial proportion of nucleic acid.
Alternatively the user may supply one or more PDB file(s) containing an ensemble of models from which the DEBYE program will calculate the rotationally-averaged intensities used in the profile (Morris et al. [2003]). Ideally, none of the models should have a much lower resolution than the diffraction data (say, d_min of each model should be numerically less than roughly d_min of the data + 0.5Å). There are two reasons for this: a low-resolution model will have fewer observed waters, so the estimated bulk water contribution at low values of d* will be incorrect, and extrapolation of the calculated profile to high resolution will also introduce inaccuracies.
A suitable ensemble can be generated by the same procedure used for obtaining model ensembles for Molecular Replacement, e.g. using the MrBUMP and/or BALBES packages. An estimated contribution from bulk water scattering will be added to the calculated averaged intensity profile.
Optionally, renormalize the intensity profile by applying a d*-dependent scale factor determined such that the mean normalized intensity (Z) is 1 in all d* bins. This may help to average out the effect of differences from the averaged profile.
Use the anisotropy from step #13 to compute an anisotropic prior of the expected intensity, i.e. divide the expected intensity obtained from the profile by the scale/anisotropy correction.
Perform Bayesian estimation of structure amplitudes by the method of French & Wilson [1978], but using the anisotropic prior in place of the traditional isotropic prior originally suggested by F & W. STARANISO incorporates subroutines from the Netlib repository, in place of the approximate look-up tables used in TRUNCATE, to compute high-accuracy parabolic cylinder functions (scaled to avoid numerical under/overflow issues: Gil et al. [2006]) and thereby obtain all the required moments.
Input anomalous data are treated differently from non-anomalous data in the Bayesian estimation. If anomalous data are present on input it is naturally assumed that the anomalous differences are statistically significant (otherwise what is the point of keeping the Bijvoet pairs separate?). If this is not the case then the correct course of action is to re-run the merging step, this time also merging the Bijvoet pairs, since this will deal with outliers correctly. Otherwise the Bayesian estimation is performed twice per unique reflection on the separately merged means of I[+] and I[-] (where these are observed), not on the overall merged mean including all I[+] and I[-].
This is because the Bayesian estimation assumes a centric or acentric Wilson distribution as appropriate, but the average of two random variates each with an acentric distribution with different expected values does not necessarily itself have an acentric distribution. Hence it is not correct to perform the Bayesian estimation as currently implemented on the average of two Wilson intensity variates with different expected values. Rather I[+] and I[-] should be separately converted to Fs, and then averaged.
There are further issues concerning the optimal procedure for averaging F[+] and F[-] when they have different standard uncertainties.
Optionally correct the amplitudes for anisotropy.
Create a new merged MTZ file containing F and σ(F) columns (and also anomalous F and σ(F) columns if anomalous I columns were read in). Note that it is formally invalid to take Fs from the Bayesian estimation and square them in a misguided attempt to recover the Is (needed, for example, by some twinning tests)! Rather, the posterior Is should be estimated by the same procedure as for the posterior Fs. For this reason there is an option to output the posterior intensities (MTZ column labels Ipost, SIGIpost etc.).
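The point about not squaring the posterior Fs can be checked numerically. The following is only a minimal sketch, not the STARANISO implementation: it assumes an acentric reflection, takes the expected intensity as a single supplied number (in STARANISO this would come from the anisotropic prior), and uses a brute-force midpoint-rule integration in place of the parabolic cylinder functions. The function name and parameters are hypothetical.

```python
import math

def acentric_posterior_moments(i_obs, sigma, expected_i, n=20000):
    """Posterior means of J and sqrt(J) for an acentric reflection.

    Unnormalised posterior density for the true intensity J >= 0
    (acentric Wilson prior times a Gaussian error model):
        exp(-J / expected_i) * exp(-(i_obs - J)**2 / (2 * sigma**2))
    Integrated with a simple midpoint rule (an illustrative choice only).
    """
    # Upper limit chosen to cover essentially all of the posterior mass.
    upper = max(i_obs, 0.0) + 10.0 * sigma
    step = upper / n
    norm = mean_i = mean_f = 0.0
    for k in range(n):
        j = (k + 0.5) * step
        p = math.exp(-j / expected_i - (i_obs - j) ** 2 / (2.0 * sigma ** 2))
        norm += p
        mean_i += p * j              # accumulates the posterior mean of I
        mean_f += p * math.sqrt(j)   # accumulates the posterior mean of F
    return mean_i / norm, mean_f / norm

# A weak, negative measurement: the squared posterior F underestimates
# the posterior I, so the two must be estimated separately.
i_post, f_post = acentric_posterior_moments(-1.0, 1.0, 10.0)
print(i_post, f_post ** 2)
```

The gap between the two printed values is largest for weak reflections, where the posterior distribution is broad relative to its mean, and shrinks as I/σ(I) increases.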
The following two steps are run only if the test-set (R_free) flags option is left checked:
Run the UNIQUEIFY script on the new merged MTZ file. Asymmetric-unit indices out to the isotropic diffraction limit are generated, test-set flags are added and column-merged with the new merged MTZ file.
Copy the resulting MTZ file to the original filename from step #19 (using SFTOOLS), omitting both 'unobserved' and 'unobservable' reflections (measured and unmeasured respectively but outside the anisotropic diffraction cut-off surface), where in both cases the SA_flag column has a 'missing' (undefined) value. Finally, remove the entire SA_flag column (to avoid the possibility of confusing some refinement programs).
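The locally-averaged I/σ(I) used as the cut-off threshold in the 'merged data' protocol (step #12) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the STARANISO code: it assumes the reflections are already expressed as Cartesian reciprocal-space coordinates in Å⁻¹, it includes each reflection in its own average (s* = 0, w = 1), and it uses a brute-force search over all pairs. The function name and argument layout are hypothetical.

```python
import math

def local_mean_i_over_sigma(points, i_over_sigma, radius=0.15):
    """Weighted local mean of I/sigma(I) around each reflection.

    points: list of (x, y, z) reciprocal-space positions in 1/Angstrom.
    i_over_sigma: list of I/sigma(I) values, one per point.
    radius: sphere radius r* in 1/Angstrom (0.15 is the default quoted above).
    """
    means = []
    for centre in points:
        weight_sum = 0.0
        weighted_total = 0.0
        for point, value in zip(points, i_over_sigma):
            s = math.dist(centre, point)  # reciprocal-space distance s*
            if s <= radius:
                # Weight from the text: w = exp(-4 (s*/r*)^2)
                w = math.exp(-4.0 * (s / radius) ** 2)
                weight_sum += w
                weighted_total += w * value
        # weight_sum >= 1 because the centre itself is always included.
        means.append(weighted_total / weight_sum)
    return means

# Two close reflections and one isolated reflection.
points = [(0.0, 0.0, 0.0), (0.075, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(local_mean_i_over_sigma(points, [10.0, 2.0, 5.0]))
```

A reflection would then count as 'observed' if its local mean exceeds the chosen threshold; the threshold value itself is not specified in this section, so it is left out of the sketch.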

FILE-NAMING SCHEME used in the 'UNMERGED DATA' protocol

Step ID	Program	Input reflection file	Input type(s)	Output type(s)	Operation
xdsmtz-mrf	POINTLESS	XDS_ASCII.HKL	XDS	MRF	Convert XDS_ASCII.HKL to MTZ multi-record format.
debye	DEBYE	xdsmtz-mrf	MTZ + PDB	profile	Compute Debye isotropic scattering profile.
iso-merged	aP_scale	xdsmtz-mrf	MRF	MRF + SRF	Initial scaling with an isotropic diffraction cut-off & merging of 'xdsmtz-mrf' unmerged data, with no data cut-off.
stats-meas	MRFANA	aimless_alldata_unmerged	MRF	---	'Measured' merging statistics for uncut merged data.
merged-aniso	STARANISO	aimless_alldata	SRF	SRF + mask	Anisotropic diffraction cut-off & anisotropy correction of 'iso-merged' data; output of initial anisotropic mask.
xpcorr-xds	STARANISO	XDS_ASCII.HKL	XDS	---	Compute XDS profile correlation plot.
masked-aniso-mrf	STARANISO	xdsmtz-mrf	MRF + mask	MRF	Application of initial 'merged-aniso' anisotropic mask to 'xdsmtz-mrf' unmerged data.
masked-scaled	AIMLESS	masked-aniso-mrf	MRF	scales	Scaling of 'masked-aniso-mrf' data.
masked-merged	AIMLESS	xdsmtz-mrf	MRF + scales	MRF + SRF	Application of scales from 'masked-scaled' step to 'xdsmtz-mrf' unmerged data (no cut of the data), and final merging; also Ihalfx columns from MRFANA appended to the merged output file.
aniso-merged	STARANISO	masked-merged	SRF	SRF + mask	Final anisotropic diffraction cut-off and anisotropy correction of merged data from 'masked-merged' step; output of updated anisotropic mask.
aniso-masked-mrf	STARANISO	masked-merged-mrf	MRF + mask	MRF	Application of updated 'aniso-merged' mask to unmerged data from 'masked-merged' step.
stats-obs	MRFANA	aniso-masked-mrf	MRF	---	'Observed' merging statistics for masked & merged data.

NOTES for above table:

At each step the filenames of all files output by AIMLESS or STARANISO in that step share the common prefix '<job ID>‑SWS‑<step ID>', or alternatively '<user prefix>‑<step ID>' if the user has set a prefix.
MRF = multi-record (unmerged) MTZ reflection format, SRF = single-record (merged) MTZ format.
The 'xdsmtz-mrf', 'debye' and 'xpcorr-xds' steps are run only if the required input file has been uploaded.
All STARANISO steps read the scattering profile created by the 'debye' step if that step was run.
The 'merged data' protocol, comprising the 'debye' and 'merged-aniso' steps of the 'unmerged data' protocol above (except no mask is output), applies an isotropic diffraction cut-off in the scaling and an anisotropic cut-off of the data after merging (hence the mnemonic for this step = 'merged-aniso').
The critical distinguishing feature of the 'unmerged data' protocol is the use of an anisotropic diffraction cut-off (mask) in the scaling step before merging (hence the mnemonic for the step that outputs the final merged reflection file in the 'unmerged data' protocol = 'aniso-merged'). There is also an anisotropic cut-off of the data after merging.
Both protocols finish with anisotropy (Debye-Waller) correction and Bayesian estimation of Fs using an anisotropic prior.
The above scheme applies to newly-submitted jobs; there are small differences in the case of jobs run prior to the filename change-over.
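The common filename prefix described in the first note can be sketched as a small helper. This is illustrative only: the function name is hypothetical, the job ID and user prefix in the usage lines are invented examples, and it assumes that the separators in real filenames are plain ASCII hyphens.

```python
def output_prefix(step_id, job_id=None, user_prefix=None):
    """Build the common prefix shared by the output files of one step.

    A user-supplied prefix takes precedence: '<user prefix>-<step ID>'.
    Otherwise the prefix is '<job ID>-SWS-<step ID>'.
    """
    if user_prefix:
        return f"{user_prefix}-{step_id}"
    if job_id is None:
        raise ValueError("either job_id or user_prefix is required")
    return f"{job_id}-SWS-{step_id}"

# Hypothetical job ID and user prefix, for illustration only.
print(output_prefix("aniso-merged", job_id="abc123"))
print(output_prefix("stats-obs", job_id="abc123", user_prefix="lyso"))
```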

REFERENCES

French, S. & Wilson, K.S. (1978) "On the treatment of negative intensity observations." Acta Cryst. A34, 517-525. See also: "Bayesian treatment of negative intensity measurements in crystallography".

Gil, A., Segura, J. & Temme, N.M. (2006) "Algorithm 850: Real parabolic cylinder functions U(a,x), V(a,x)." ACM Transactions on Mathematical Software (TOMS). 32, 102-12. See also: "Computing the real parabolic cylinder functions U(a,x), V(a,x)".

Morris, R.J., Blanc, E. & Bricogne, G. (2003) "On the interpretation and use of <|E|²>(d*) profiles." Acta Cryst. D60, 227-40.

Popov, A.N. & Bourenkov, G.P. (2003) "Choice of data-collection parameters based on statistical modelling." Acta Cryst. D59, 1145-53.

Wilson, A.J.C. (1987) "Treatment of enhanced zones and rows in normalizing intensities." Acta Cryst. A43, 250-2.
