The STARANISO Server
Anisotropy of the Diffraction Limit and
Bayesian Estimation of Structure Amplitudes
IMPLEMENTATION DETAILS of the DEBYE and STARANISO software
This server uses the DEBYE and STARANISO programs to perform some
or all (depending on the selections made by the user) of the following
19 program steps:
If unmerged data were supplied as input, perform the complete
'unmerged data' protocol, namely steps #1 to #10:
- Read an XDS or MTZ reflection file of unmerged
intensities. If XDS format is supplied, convert it to MTZ multi-record
format.
- Run the autoPROC image-scaling module.
The final step of this protocol is to take the original unmerged data
and to repeat the last scaling & merging step, this time keeping the
previously-determined image scales and error model fixed, but with no
isotropic diffraction cut-off applied.
- Using the merged data from the previous step,
determine the anisotropic diffraction cut-off as a mask. Also
perform the 'merged data' protocol (steps #11 to #19 below), for later
comparison of the results with those from the full 'unmerged data'
protocol.
- Apply the anisotropic cut-off mask from the
previous step to the original unmerged data.
- Re-determine the image scales and error model
using the masked unmerged data from the previous step.
- Apply the new image scales and error model from
the previous step to the original unmerged data and output new rescaled
unmerged and merged data.
- Perform the 'merged data' protocol again (steps
#11 to #19 below) using the merged data from the previous step as
input, and output the final amplitudes and mask.
- Apply the mask from the previous step to the
rescaled unmerged data from step #6.
- Using the rescaled unmerged data from step #2
determine the merging statistics for the measured data.
- Using the masked rescaled unmerged data from
step #8 determine the merging statistics for the 'observed'
data.
If merged data were supplied, only perform the 'merged data'
protocol (steps #11 to #19) using the original data as input:
- Read an MTZ reflection file of merged
intensities.
- Perform an anisotropic diffraction cut-off of
the merged intensities.
- Determine the anisotropy of the observed
intensity distribution.
- Renormalize the intensity profile.
- Use the anisotropy from step #13 to compute an
anisotropic prior of the expected intensity.
- Perform Bayesian estimation of structure
amplitudes.
- Deal with anomalous data.
- Correct the amplitudes for anisotropy.
- Create a new MTZ file containing F and
σ(F) columns.
The above steps are described in greater detail below:
If unmerged data were supplied as input, perform the 'unmerged
data' protocol, namely steps #1 to #10 below:
- Read an XDS ASCII or MTZ multi-record reflection file of unmerged
intensities; convert the former to MTZ multi-record format.
- Run the autoPROC image-scaling module (aP_scale) to determine the
initial image scales and error model. The rescaled unmerged and
merged data it produces, and everything downstream of them, are
discarded because they will have had an inappropriate isotropic
diffraction cut-off applied. aP_scale then re-runs AIMLESS on the
original unmerged data with these image scales and error model held
fixed and with no isotropic diffraction cut-off (the scales are fixed
since otherwise they would be biased by the data now included beyond
the initial isotropic cut-off determined by aP_scale). Finally,
aP_scale outputs the merged data with no isotropic diffraction cut-off
applied.
- Run STARANISO on the merged data from the previous step to determine
the anisotropic diffraction
cut-off: this is written out as a byte mask in CCP4 map format.
Also perform the 'merged data' protocol (steps #11 to #19 below): the
purpose of this is to allow later comparison of the results of the
'merged data' and 'unmerged data' protocols.
- Re-run STARANISO on the original unmerged data using the mask from
the previous step, and output rescaled unmerged data with an anisotropic
diffraction cut-off applied.
- Run AIMLESS again on the masked unmerged data from the previous
step, and re-determine the image scales and error model using only the
data inside the anisotropic cut-off surface.
- Run AIMLESS again on the original unmerged data, applying the new
image scales and error model from the previous step, and output new
rescaled unmerged and merged data.
- Perform the 'merged data' protocol again (steps #11 to #19 below)
using the merged data from the previous step as input, and output the
final amplitudes and mask.
- Re-run STARANISO to apply the mask from the previous step to the
rescaled unmerged data from step #6.
- Run MRFANA on the rescaled unmerged data from step #2 to determine
the merging statistics for the measured data.
- Run MRFANA again, this time on the masked rescaled unmerged data
from step #8 to determine the merging statistics for the 'observed'
data.
If merged data were supplied, only perform the 'merged data'
protocol using the original data as input:
- Read an MTZ reflection file of merged intensities with
standard uncertainties, with or without anomalous data, for example as
output by the CCP4 AIMLESS
program. Ideally this file should not already have had a
diffraction cut-off of any kind applied by AIMLESS (or other program),
since the appropriate anisotropic cut-off will be determined by the
STARANISO program.
- Perform an anisotropic
diffraction cut-off of the merged intensities, instead of the
traditional isotropic cut-off, using a locally-averaged mean
I/σ(I) as the cut-off threshold. The
local average is calculated within a sphere of reciprocal space centered
on each reflection (default radius r* = 0.15 Å⁻¹),
and the contributions to the average are weighted by the exponential
function w = exp(-4(s*/r*)²) of the
reciprocal-space distance s* from the center.
- Determine the anisotropy of the observed intensity
distribution, corrected where necessary by the systematic absence factor
(Wilson [1987]), using either the error-free likelihood function
proposed by Popov & Bourenkov [2003], or the Bayesian likelihood
function (default) which uses the French-Wilson formalism and which
takes experimental errors into account. In either case, by default
a precalculated expected intensity profile is used (thanks to Alexander
Popov for furnishing this); this assumes an average solvent content so
in cases where the actual solvent content is much higher than normal it
may not provide an accurate estimate of the contribution of bulk water.
Maximum likelihood optimization of the overall scale and the elements
of the overall anisotropic displacement tensor that are not constrained
by the point-group symmetry is performed. Note that the default P &
B profile was obtained by averaging the observed profiles of a number of
protein-only structures so is not strictly applicable to structures
containing a substantial proportion of nucleic acid.
Alternatively the user may supply one or more PDB file(s) containing
an ensemble of models from which the DEBYE program will calculate the
rotationally-averaged intensities used in the profile (Morris
et al. [2003]). Ideally, none of the models should have a
much lower resolution than the diffraction data (say, d_min of a
model should be numerically less than roughly d_min + 0.5 Å of the
data). There are two reasons for this: a low-resolution model will
have fewer observed waters and therefore the estimated bulk water
contribution at low values of d* will be incorrect, and
extrapolation of the calculated profile to high resolution will also
introduce inaccuracies.
A suitable ensemble can be generated by the same procedure used for
obtaining model ensembles for Molecular Replacement, e.g. using
the MrBUMP and/or
BALBES
packages. An estimated contribution from bulk water scattering will
be added to the calculated averaged intensity profile.
- Optionally, renormalize the intensity profile by applying a
d*-dependent scale factor determined such that the mean
normalized intensity (Z) is 1 in all d* bins. This
may help to average out the effect of differences from the averaged
profile.
- Use the anisotropy
from step #13 to compute an anisotropic prior of the expected intensity,
i.e. divide the expected intensity obtained from the profile by
the scale/anisotropy correction.
- Perform Bayesian estimation of structure amplitudes by the method of
French & Wilson [1978], but using the anisotropic prior in place of the
traditional isotropic prior originally suggested by F & W.
STARANISO incorporates subroutines from the Netlib repository, in place
of the approximate look-up tables used in TRUNCATE, to compute
high-accuracy
parabolic cylinder functions (scaled to avoid numerical
under/overflow issues: Gil et al. [2006]) and thereby obtain all
the required moments.
- Input anomalous data are treated differently from non-anomalous data
in the Bayesian estimation. If anomalous data are present on input
it is naturally assumed that the anomalous differences are statistically
significant (otherwise what is the point of keeping the Bijvoet pairs
separate?). If this is not the case then the correct course of
action is to re-run the merging step, this time also merging the Bijvoet
pairs, since this will deal with outliers correctly. Otherwise the
Bayesian estimation is performed twice per unique reflection on the
separately merged means of I[+] and I[-] (where these are
observed), not on the overall merged mean including all
I[+] and I[-].
This is because the Bayesian estimation assumes a centric or acentric
Wilson distribution as appropriate, but the average of two random
variates each with an acentric distribution with different expected
values does not necessarily itself have an acentric distribution.
Hence it is not correct to perform the Bayesian estimation as currently
implemented on the average of two Wilson intensity variates with
different expected values. Rather I[+] and I[-]
should be separately converted to Fs, and then averaged.
There are further
issues concerning the optimal procedure for averaging F[+]
and F[-] when they have different standard uncertainties.
- Optionally correct the amplitudes for anisotropy.
- Finally, create a new MTZ file containing F and
σ(F) columns (and also anomalous F and
σ(F) columns if anomalous I columns were read
in). Note that it is formally invalid to take the Fs from
the Bayesian estimation and square them in an attempt to recover
the Is (needed, for example, by some twinning tests).
Rather, the posterior Is should be estimated by the same
procedure as for the posterior Fs. For this reason there is
an option to output the posterior intensities (MTZ column labels
Ipost, SIGIpost etc.).
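The locally-averaged I/σ(I) used for the anisotropic cut-off in the 'merged data' protocol can be sketched as follows. This is an illustrative NumPy sketch only, not STARANISO's code. The function names, the use of Cartesian reciprocal-space coordinates in Å⁻¹, the brute-force pairwise distances and the `threshold` argument are all assumptions made for the example; only the weight w = exp(-4(s*/r*)²) and the default radius r* = 0.15 Å⁻¹ are taken from the description of the cut-off step.

```python
import numpy as np

def local_mean_i_over_sigma(s_vectors, i_over_sigma, r_star=0.15):
    """Weighted local average of I/sigma(I) around each reflection.

    s_vectors    : (N, 3) reciprocal-space positions in 1/Angstrom
                   (assumed Cartesian; hypothetical input layout)
    i_over_sigma : (N,) I/sigma(I) of each reflection
    r_star       : radius of the averaging sphere in 1/Angstrom
    """
    s = np.asarray(s_vectors, dtype=float)
    ios = np.asarray(i_over_sigma, dtype=float)
    # Pairwise reciprocal-space distances s* (O(N^2) memory: sketch only).
    dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    # Weight w = exp(-4 (s*/r*)^2), set to zero outside the sphere.
    w = np.exp(-4.0 * (dist / r_star) ** 2)
    w[dist > r_star] = 0.0
    # Each reflection contributes to its own average with weight 1,
    # so the denominator is never zero.
    return (w @ ios) / w.sum(axis=1)

def anisotropic_cutoff_mask(s_vectors, i_over_sigma, threshold, r_star=0.15):
    """True where the local mean I/sigma(I) reaches the (assumed) threshold."""
    return local_mean_i_over_sigma(s_vectors, i_over_sigma, r_star) >= threshold
```

Because the average is taken over a sphere rather than a resolution shell, the resulting mask can extend further in well-diffracting directions than in weak ones.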
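The warning against squaring the posterior Fs can be checked numerically. The sketch below assumes the simplest acentric case: an exponential Wilson prior with expected intensity `prior_mean` and a Gaussian measurement error. It integrates the posterior on a grid. STARANISO itself obtains the moments from parabolic cylinder functions, so this brute-force version is only an illustration, and all names in it are hypothetical.

```python
import numpy as np

def acentric_posterior_moments(i_obs, sigma, prior_mean, n_grid=20001):
    """Posterior means of F and I for one acentric reflection.

    Assumes p(J) ~ exp(-J / prior_mean) for true intensity J >= 0 and a
    Gaussian likelihood for i_obs with standard uncertainty sigma.
    Returns (mean_F, mean_I), computed by summation on a uniform grid.
    """
    # The posterior is a normal (mean i_obs - sigma^2/prior_mean, s.d. sigma)
    # truncated at J = 0; integrate far enough into its upper tail.
    centre = i_obs - sigma**2 / prior_mean
    upper = max(centre, 0.0) + 10.0 * sigma
    j = np.linspace(0.0, upper, n_grid)
    log_post = -j / prior_mean - (i_obs - j) ** 2 / (2.0 * sigma**2)
    post = np.exp(log_post - log_post.max())  # avoid underflow
    post /= post.sum()                        # normalize on the grid
    mean_i = float(np.sum(j * post))
    mean_f = float(np.sum(np.sqrt(j) * post))
    return mean_f, mean_i

# Weak reflection: the squared posterior F is not the posterior I.
f_post, i_post = acentric_posterior_moments(i_obs=0.5, sigma=1.0, prior_mean=2.0)
print(f_post**2, i_post)
```

The squared posterior mean of F is always smaller than the posterior mean of I (the difference is the posterior variance of F), which is why the posterior intensities have to be estimated directly rather than recovered by squaring.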
FILE-NAMING SCHEME used in the 'UNMERGED DATA' protocol
Step ID | Program | Input reflection file | Input type(s) | Output type(s) | Operation
xdsmtz-mrf | POINTLESS | XDS_ASCII.HKL | XDS | MRF | Convert XDS_ASCII.HKL to MTZ multi-record format.
debye | DEBYE | xdsmtz-mrf | MTZ + PDB | profile | Compute Debye isotropic scattering profile.
iso-merged | aP_scale | xdsmtz-mrf | MRF | MRF + SRF | Initial scaling with an isotropic diffraction cut-off & merging of 'xdsmtz-mrf' unmerged data, with no data cut-off.
stats-meas | MRFANA | aimless_alldata_unmerged | MRF | --- | 'Measured' merging statistics for uncut merged data.
merged-aniso | STARANISO | aimless_alldata | SRF | SRF + mask | Anisotropic diffraction cut-off & anisotropy correction of 'iso-merged' data; output of initial anisotropic mask.
xpcorr-xds | STARANISO | XDS_ASCII.HKL | XDS | --- | Compute XDS profile correlation plot.
masked-aniso-mrf | STARANISO | xdsmtz-mrf | MRF + mask | MRF | Application of initial 'merged-aniso' anisotropic mask to 'xdsmtz-mrf' unmerged data.
masked-scaled | AIMLESS | masked-aniso-mrf | MRF | scales | Scaling of 'masked-aniso-mrf' data.
masked-merged | AIMLESS | xdsmtz-mrf | MRF + scales | MRF + SRF | Application of scales from 'masked-scaled' step to 'xdsmtz-mrf' unmerged data (no cut of the data), and final merging; also Ihalfx columns from MRFANA appended to the merged output file.
aniso-merged | STARANISO | masked-merged | SRF | SRF + mask | Final anisotropic diffraction cut-off and anisotropy correction of merged data from 'masked-merged' step; output of updated anisotropic mask.
aniso-masked-mrf | STARANISO | masked-merged-mrf | MRF + mask | MRF | Application of updated 'aniso-merged' mask to unmerged data from 'masked-merged' step.
stats-obs | MRFANA | aniso-masked-mrf | MRF | --- | 'Observed' merging statistics for masked & merged data.
NOTES for above table:
- At each step the filenames of all files output by AIMLESS or
STARANISO in that step share the common prefix
'<job ID>‑SWS‑<step ID>', or
alternatively
'<user prefix>‑<step ID>' if the
user has set a prefix.
- MRF = multi-record (unmerged) MTZ reflection format, SRF =
single-record (merged) MTZ format.
- The 'xdsmtz-mrf', 'debye' and 'xpcorr-xds' steps are run only if
the required input file has been uploaded.
- All STARANISO steps read the scattering profile created by the
'debye' step if that step was run.
- The 'merged data' protocol, comprising the 'debye' and
'merged-aniso' steps of the 'unmerged data' protocol above (except no
mask is output), applies an isotropic diffraction cut-off in
the scaling and an anisotropic cut-off of the data after
merging (hence the mnemonic for this step = 'merged-aniso').
- The critical distinguishing feature of the 'unmerged data'
protocol is the use of an anisotropic diffraction cut-off (mask)
in the scaling step before merging (hence the mnemonic for the
step that outputs the final merged reflection file in the 'unmerged
data' protocol = 'aniso-merged'). There is also an anisotropic
cut-off of the data after merging.
- Both protocols finish with anisotropy (Debye-Waller) correction
and Bayesian estimation of Fs using an anisotropic prior.
- The above scheme applies to newly-submitted jobs; there are small
differences in the case of jobs run prior to the filename change-over.
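The prefix rule in the first note can be sketched as a small helper, for example to locate the output files of a given step in a downloaded results directory. The function name and the example job ID and user prefix are hypothetical, and the sketch assumes the separators in the actual filenames are plain ASCII hyphens.

```python
def output_prefix(step_id, job_id=None, user_prefix=None):
    """Common filename prefix for the files output in one step.

    Follows the note above: '<user prefix>-<step ID>' if the user has set
    a prefix, otherwise '<job ID>-SWS-<step ID>'.
    Assumption: separators are ASCII hyphens.
    """
    if user_prefix:
        return f"{user_prefix}-{step_id}"
    if not job_id:
        raise ValueError("either job_id or user_prefix must be given")
    return f"{job_id}-SWS-{step_id}"

# Hypothetical job ID and user prefix, for illustration only.
print(output_prefix("aniso-merged", job_id="job123"))
print(output_prefix("aniso-merged", user_prefix="xtal1"))
```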
REFERENCES
French, S. & Wilson, K.S. (1978) "On the treatment of negative
intensity observations." Acta Cryst. A34, 517-525.
See also: "Bayesian
treatment of negative intensity measurements in crystallography".
Gil, A., Segura, J. & Temme, N.M. (2006) "Algorithm 850: Real
parabolic cylinder functions U(a,x),
V(a,x)." ACM Transactions on Mathematical Software
(TOMS). 32, 102-12. See also: "Computing
the real parabolic cylinder functions U(a,x),
V(a,x)".
Morris, R.J., Blanc, E. & Bricogne, G. (2003) "On the interpretation
and use of <|E|²>(d*) profiles."
Acta Cryst. D60, 227-40.
Popov, A.N. & Bourenkov, G.P. (2003) "Choice of data-collection
parameters based on statistical modelling." Acta Cryst. D59,
1145-53.
Wilson, A.J.C. (1987) "Treatment of enhanced zones and rows in
normalizing intensities." Acta Cryst. A43, 250-2.