 The STARANISO Server
Anisotropy of the Diffraction Limit and
Bayesian Estimation of Structure Amplitudes


About the DEBYE and STARANISO software
This server uses the DEBYE and STARANISO programs to perform some
or all (depending on the selections made by the user) of the following
19 program steps:
If unmerged data were supplied as input, perform the complete
'unmerged data protocol', namely steps #1 to #10:
 Read an XDS or MTZ reflection file of unmerged
intensities. If XDS format is supplied, convert it to MTZ multirecord
format.
 Run the autoPROC image-scaling module.
The final step of this protocol is to take the original unmerged data
and to repeat the last scaling & merging step, this time keeping the
previously determined image scales and error model fixed, but with no
isotropic diffraction cutoff applied.
 Using the merged data from the previous step,
determine the anisotropic diffraction cutoff as a mask. Also
perform the 'merged data protocol' (steps #11 to #19 below), for later
comparison of the results with those from the full 'unmerged
protocol'.
 Apply the anisotropic cutoff mask from the
previous step to the original unmerged data.
 Redetermine the image scales and error model
using the masked unmerged data from the previous step.
 Apply the new image scales and error model from
the previous step to the original unmerged data and output new rescaled
unmerged and merged data.
 Perform the 'merged data protocol' again (steps
#11 to #19 below) using the merged data from the previous step as
input, and output the final amplitudes and mask.
 Apply the mask from the previous step to the
rescaled unmerged data from step #6.
 Using the rescaled unmerged data from step #2
determine the merging statistics for the measured data.
 Using the masked rescaled unmerged data from
step #8 determine the merging statistics for the 'observed'
data.
If merged data were supplied, only perform the 'merged data
protocol' (steps #11 to #19) using the original data as input:
 Read an MTZ reflection file of merged
intensities.
 Perform an anisotropic cutoff of the merged
intensities, unless performed previously at step #5.
 Determine the anisotropy of the observed
intensity distribution.
 Renormalize the intensity profile.
 Use the anisotropy from step #13 to compute an
anisotropic prior of the expected intensity.
 Perform Bayesian estimation of structure
amplitudes.
 Deal with anomalous data.
 Correct the amplitudes for anisotropy.
 Create a new MTZ file containing F and
σ(F) columns.
The above steps are described in greater detail below:
If unmerged data were supplied as input, perform the 'unmerged
data protocol', namely steps #1 to #10 below:
 Read an XDS ASCII or MTZ multirecord reflection file of unmerged
intensities; convert the former to MTZ multirecord format.
 Run the autoPROC image-scaling module (aP_scale) to determine
initial image scales and error model. The rescaled unmerged and
merged data output at this stage, and everything downstream, will have
had an inappropriate isotropic diffraction cutoff applied, so they are
discarded. This step also reruns AIMLESS on the original unmerged data,
this time with no isotropic diffraction cutoff and with these image
scales and error model held fixed (since otherwise the scales would be
biased by the data now included beyond the initial isotropic cutoff
determined by aP_scale). Finally, aP_scale outputs the merged data
with no isotropic diffraction cutoff applied.
 Run STARANISO on the merged data from the previous step to determine
the anisotropic diffraction
cutoff: this is written out as a byte mask in CCP4 map format.
Also perform the 'merged data protocol' (steps #11 to #19 below): the
purpose of this is to allow later comparison of the results of the
'merged data' and 'unmerged data' protocols.
 Rerun STARANISO on the original unmerged data using the mask from
the previous step, and output rescaled unmerged data with an anisotropic
diffraction cutoff applied.
 Run AIMLESS again on the masked unmerged data from the previous
step, and redetermine the image scales and error model using only the
data inside the anisotropic cutoff surface.
 Run AIMLESS again on the original unmerged data, applying the new
image scales and error model from the previous step, and output new
rescaled unmerged and merged data.
 Perform the 'merged data protocol' again (steps #11 to #19 below)
using the merged data from the previous step as input, and output the
final amplitudes and mask.
 Rerun STARANISO to apply the mask from the previous step to the
rescaled unmerged data from step #6.
 Run MRFANA on the rescaled unmerged data from step #2 to determine
the merging statistics for the measured data.
 Run MRFANA again, this time on the masked rescaled unmerged data
from step #8 to determine the merging statistics for the 'observed'
data.
If merged data were supplied, only perform the 'merged data
protocol' using the original data as input:
 Read an MTZ reflection file of merged intensities with
standard uncertainties, with or without anomalous data, for example as
output by the CCP4 AIMLESS
program. Ideally this file should not already have had a
diffraction cutoff of any kind applied by AIMLESS (or other program),
since the appropriate anisotropic cutoff will be determined by the
STARANISO program.
 Perform an anisotropic
diffraction cutoff of the merged intensities, instead of the
traditional isotropic cutoff, using a locally-averaged mean
I/σ(I) as the cutoff threshold. The
local average is calculated within a sphere of reciprocal space centered
on each reflection (default radius r* = 0.15Å^{-1}),
and the contributions to the average are weighted by the exponential
function w = exp(-4(s*/r*)^{2}) of the
reciprocal-space distance s* from the center.
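The weighted local average used for the cutoff can be sketched as follows. This is only a minimal illustration under stated assumptions, not the STARANISO implementation: the function name local_mean_i_over_sigma and its arguments are hypothetical, reflections are supplied as Cartesian reciprocal-space coordinates in Å^{-1}, no symmetry expansion is applied, and the brute-force double loop is chosen for clarity rather than speed.

```python
import math


def local_mean_i_over_sigma(positions, i_over_sigma, radius=0.15):
    """Weighted local mean of I/sigma(I) around every reflection.

    positions    : list of (x, y, z) reciprocal-space coordinates in 1/Angstrom
    i_over_sigma : list of I/sigma(I) values, one per reflection
    radius       : sphere radius r* in 1/Angstrom (0.15 is the stated default)
    """
    means = []
    for centre in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for other, value in zip(positions, i_over_sigma):
            s_star = math.dist(centre, other)      # distance from the centre
            if s_star <= radius:                   # only neighbours in the sphere
                w = math.exp(-4.0 * (s_star / radius) ** 2)
                weighted_sum += w * value
                weight_total += w
        # The central reflection always contributes with w = 1,
        # so weight_total is never zero.
        means.append(weighted_sum / weight_total)
    return means
```

A reflection would then be retained or rejected by comparing its local mean with the chosen threshold; the threshold value is a user setting and is not fixed by this sketch.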
 Determine the anisotropy of the observed intensity
distribution, corrected where necessary by the systematic absence factor
(Wilson [1987]), using either the error-free likelihood function
proposed by Popov & Bourenkov [2003], or the Bayesian likelihood
function (default) which uses the French-Wilson formalism and which
takes experimental errors into account. In either case, by default
a precalculated expected intensity profile is used (thanks to Alexander
Popov for furnishing this); this assumes an average solvent content so
in cases where the actual solvent content is much higher than normal it
may not provide an accurate estimate of the contribution of bulk water.
Maximum likelihood optimization of the overall scale and the elements
of the overall anisotropic displacement tensor that are not constrained
by the point-group symmetry is performed. Note that the default P &
B profile was obtained by averaging the observed profiles of a number of
protein-only structures, so it is not strictly applicable to structures
containing a substantial proportion of nucleic acid.
Alternatively the user may supply one or more PDB file(s) containing
an ensemble of models from which the DEBYE program will calculate the
rotationally-averaged intensities used in the profile (Morris
et al. [2003]). Ideally, none of the models should have a
much lower resolution than the diffraction data (say, d_{min} of a
model should be numerically less than roughly d_{min} of the data +
0.5Å). There are two reasons for this: a low-resolution model will
have fewer observed waters and therefore the estimated bulk water
contribution at low values of d* will be incorrect, and
extrapolation of the calculated profile to high resolution will also
introduce inaccuracies.
A suitable ensemble can be generated by the same procedure used for
obtaining model ensembles for Molecular Replacement, e.g. using
the MrBUMP and/or
BALBES
packages. An estimated contribution from bulk water scattering will
be added to the calculated averaged intensity profile.
 Optionally, renormalize the intensity profile by applying a
d*-dependent scale factor determined such that the mean
normalized intensity (Z) is 1 in all d* bins. This
may help to average out the effect of differences from the averaged
profile.
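One simple way to picture this renormalization is a piecewise-constant scale factor, one per d* bin. The sketch below makes several assumptions that are not taken from STARANISO: the function name renormalize_profile is hypothetical, the bins are of equal width in d*, the scale is not smoothed between bins, and bins whose mean Z is not positive are left unscaled.

```python
def renormalize_profile(d_star, z, n_bins=10):
    """Scale normalized intensities Z so that their mean is 1 in each d* bin."""
    lo, hi = min(d_star), max(d_star)
    width = (hi - lo) / n_bins
    if width == 0.0:
        width = 1.0  # all reflections share one d*: everything goes in bin 0

    def bin_index(d):
        # Clamp so that the reflection at the upper edge falls in the last bin.
        return min(int((d - lo) / width), n_bins - 1)

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for d, value in zip(d_star, z):
        b = bin_index(d)
        sums[b] += value
        counts[b] += 1

    # Scale factor per bin = 1 / (mean Z in that bin).
    scales = [counts[b] / sums[b] if sums[b] > 0.0 else 1.0
              for b in range(n_bins)]
    return [value * scales[bin_index(d)] for d, value in zip(d_star, z)]
```

With enough reflections per bin the rescaled Z values average to 1 in every bin by construction; sparsely populated bins would need wider binning or smoothing, which this sketch does not attempt.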
 Use the anisotropy
from step #13 to compute an anisotropic prior of the expected intensity,
i.e. divide the expected intensity obtained from the profile by
the scale/anisotropy correction.
 Perform Bayesian estimation of structure amplitudes by the method of
French & Wilson [1978], but using the anisotropic prior in place of the
traditional isotropic prior originally suggested by F & W.
STARANISO incorporates subroutines from the Netlib repository, in place
of the approximate lookup tables used in TRUNCATE, to compute
high-accuracy parabolic cylinder functions (scaled to avoid numerical
under/overflow issues: Gil et al. [2006]) and thereby obtain all
the required moments.
 Input anomalous data are treated differently from non-anomalous data
in the Bayesian estimation. If anomalous data are present on input
it is naturally assumed that the anomalous differences are statistically
significant (otherwise what is the point of keeping the Bijvoet pairs
separate?). If this is not the case then the correct course of
action is to rerun the merging step, this time also merging the Bijvoet
pairs, since this will deal with outliers correctly. Otherwise the
Bayesian estimation is performed twice per unique reflection on the
separately merged means of I[+] and I[-] (where these are
observed), not on the overall merged mean including all
I[+] and I[-].
This is because the Bayesian estimation assumes a centric or acentric
Wilson distribution as appropriate, but the average of two random
variates each with an acentric distribution with different expected
values does not necessarily itself have an acentric distribution.
Hence it is not correct to perform the Bayesian estimation as currently
implemented on the average of two Wilson intensity variates with
different expected values. Rather, I[+] and I[-]
should be separately converted to Fs, and then averaged.
There are further issues concerning the optimal procedure for
averaging F[+] and F[-] when they have different standard
uncertainties.
 Optionally correct the amplitudes for anisotropy.
 Finally, create a new MTZ file containing F and
σ(F) columns (and also anomalous F and
σ(F) columns if anomalous I columns were read
in). Note that it is formally invalid to take Fs from
the Bayesian estimation and square them in a misguided attempt to
recover the Is (needed, for example, by some twinning tests).
Rather, the posterior Is should be estimated by the same
procedure as for the posterior Fs. For this reason there is
an option to output the posterior intensities (MTZ column labels
Ipost, SIGIpost etc.).
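The point that squared posterior Fs are not posterior Is can be checked numerically. The sketch below is not how STARANISO computes the moments (it uses parabolic cylinder functions, as described above); it substitutes brute-force midpoint quadrature, handles only the acentric case, and assumes a Wilson prior exp(-J/Σ)/Σ on the true intensity J ≥ 0 with Gaussian measurement error. The function name french_wilson_acentric, its arguments, and the example numbers are all hypothetical.

```python
import math


def french_wilson_acentric(i_obs, sigma, expected_i, n_points=20000):
    """Posterior mean F, sigma(F) and mean I for one acentric reflection.

    i_obs      : measured intensity (may be negative)
    sigma      : standard uncertainty of i_obs
    expected_i : prior expected intensity for this reflection
    """
    # Without the J >= 0 constraint the posterior would be a Gaussian
    # centred here with standard deviation sigma.
    centre = i_obs - sigma * sigma / expected_i
    j_peak = max(centre, 0.0)         # posterior mode on J >= 0
    upper = j_peak + 10.0 * sigma     # integrand is negligible beyond this
    step = upper / n_points

    def log_weight(j):
        # log(prior * likelihood), up to an additive constant
        return -j / expected_i - 0.5 * ((i_obs - j) / sigma) ** 2

    log_peak = log_weight(j_peak)     # subtracted to keep exp() well scaled
    norm = sum_f = sum_i = 0.0
    for k in range(n_points):
        j = (k + 0.5) * step          # midpoint rule on [0, upper]
        w = math.exp(log_weight(j) - log_peak)
        norm += w
        sum_f += w * math.sqrt(j)
        sum_i += w * j
    mean_f = sum_f / norm
    mean_i = sum_i / norm
    sigma_f = math.sqrt(max(mean_i - mean_f * mean_f, 0.0))
    return mean_f, sigma_f, mean_i


# Weak reflection with a negative measured intensity (made-up numbers).
f_post, sig_f, i_post = french_wilson_acentric(-1.0, 1.0, 10.0)
print("F_post^2 =", f_post ** 2, " I_post =", i_post)
```

Because the posterior mean of I equals the posterior mean of F squared plus the posterior variance of F, the squared posterior F always underestimates the posterior I, most noticeably for weak reflections. For anomalous data the same function would be applied to I[+] and I[-] separately before any averaging of the resulting Fs.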
REFERENCES
French, S. & Wilson, K.S. (1978) "On the treatment of negative
intensity observations." Acta Cryst. A34, 517-525.
See also: "Bayesian treatment of negative intensity measurements in
crystallography".
Gil, A., Segura, J. & Temme, N.M. (2006) "Algorithm 850: Real
parabolic cylinder functions U(a,x), V(a,x)." ACM Transactions on
Mathematical Software (TOMS) 32, 102-112. See also: "Computing the
real parabolic cylinder functions U(a,x), V(a,x)".
Morris, R.J., Blanc, E. & Bricogne, G. (2003) "On the interpretation
and use of <E^{2}>(d*) profiles."
Acta Cryst. D60, 227-240.
Popov, A.N. & Bourenkov, G.P. (2003) "Choice of data-collection
parameters based on statistical modelling." Acta Cryst. D59,
1145-1153.
Wilson, A.J.C. (1987) "Treatment of enhanced zones and rows in
normalizing intensities." Acta Cryst. A43, 250-252.