The STARANISO Server
Anisotropy of the Diffraction Limit
The problem here is that the PDB does not compute the correct value of the completeness for anisotropic data, for the simple reason that it does not currently capture any of the information required to determine the anisotropic cut-off.  This is something that we are currently discussing with them.
The correct anisotropic completeness values are in the table 'Merging statistics table for observed data extracted from the final MRFANA log file' under 'Compl. Ellip.' (i.e. the ellipsoidal completeness). STARANISO defines completeness in the conventional way, i.e. the fraction of reflections inside some data-dependent cut-off surface that were actually measured.  The cut-off surface is defined such that it encloses the set of reflections for which some metric of statistical significance (such as the mean I/σ(I) or CC(1/2)) exceeds a significance threshold set by the user.
The main difference in STARANISO lies in the method of determining the cut-off surface: for data assumed to be isotropic it is done in the standard way by computing the metric in spherical bins so that the cut-off surface is a sphere.  For data assumed to be anisotropic (i.e. the default) it is done by computing the 'moving-average' of the metric within a sphere of predefined radius in reciprocal space centred on each reflection in turn.  For anisotropic data CC(1/2) is not suitable as a metric so only the mean I/σ(I) is used.  Then an ellipsoid is fitted by least squares to the resulting cut-off surface points and the ellipsoid becomes the cut-off surface only for the purpose of estimating the completeness.  Note that there's no reason for the true cut-off surface to be an ellipsoid: an ellipsoid represents only one example of anisotropy (the only constraint on the shape of the cut-off surface is that it is locally smooth and it has at least the point symmetry of the Laue class).
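The moving-average idea described above can be illustrated with a minimal sketch. This is not STARANISO's code: the function names, the brute-force neighbour search and the input layout (orthogonal reciprocal-space coordinates in 1/Å paired with an I/σ(I) value) are all assumptions made for illustration, and no symmetry expansion or weighting is attempted.

```python
import math

def local_mean_i_over_sigma(reflections, radius):
    """Average I/sigma(I) over all reflections lying within `radius`
    (in 1/Å) of each reflection in turn.

    `reflections` is a list of ((x, y, z), i_over_sigma) tuples
    (hypothetical layout, orthogonal reciprocal-space coordinates).
    """
    means = []
    for centre, _ in reflections:
        total, count = 0.0, 0
        for position, i_over_sigma in reflections:
            if math.dist(centre, position) <= radius:
                total += i_over_sigma
                count += 1
        # count is at least 1 because the central reflection is included
        means.append(total / count)
    return means

def is_significant(reflections, radius, threshold):
    """Flag reflections whose local mean I/sigma(I) exceeds the threshold."""
    return [m > threshold
            for m in local_mean_i_over_sigma(reflections, radius)]
```

The brute-force double loop is quadratic in the number of reflections; it is used here only because it keeps the sketch short and obviously correct.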
In the absence of the definition of the cut-off ellipsoid (i.e. its semi-axis lengths and directions), the PDB simply assumes that anisotropic data are isotropic and uses a spherical surface of radius equal to the diffraction limit to compute the completeness.  This means that reflections with statistically non-significant values of the mean I/σ(I) that lie outside the ellipsoid but still inside the limiting sphere are included in the count of reflections that should have been observed, so the completeness is underestimated.
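To see how the choice of surface changes the number, here is a toy sketch that counts completeness inside an ellipsoid and treats the sphere as the special case where all three limits are equal. The function name, the argument layout and the six made-up reflections are assumptions for illustration only; this is not how either STARANISO or the PDB implements the calculation.

```python
def completeness(possible, measured, axes, limits):
    """Fraction of the possible reflections inside the ellipsoid that
    were measured.

    possible : (x, y, z) reciprocal-space positions (1/Å) of all unique
               reflections that could have been observed
    measured : set of indices into `possible` present in the data
    axes     : three mutually perpendicular unit vectors (principal axes)
    limits   : diffraction limit d (Å) along each axis; the semi-axis
               length in reciprocal space is 1/d
    """
    def inside(p):
        # (p.e / (1/d))^2 summed over the three axes must not exceed 1
        return sum((sum(pi * ei for pi, ei in zip(p, e)) * d) ** 2
                   for e, d in zip(axes, limits)) <= 1.0

    enclosed = [i for i, p in enumerate(possible) if inside(p)]
    if not enclosed:
        return 0.0
    return sum(1 for i in enclosed if i in measured) / len(enclosed)

# Toy data: diffraction extends to 2 Å along x but only to 4 Å along y.
xyz = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
possible = [(0.2, 0, 0), (0.4, 0, 0), (0.45, 0, 0),
            (0, 0.2, 0), (0, 0.4, 0), (0, 0.45, 0)]
measured = {0, 1, 2, 3}

ellipsoidal = completeness(possible, measured, xyz, (2.0, 4.0, 2.0))
spherical = completeness(possible, measured, xyz, (2.0, 2.0, 2.0))
```

With these made-up numbers the ellipsoidal completeness is 1.0, whereas the spherical value is about 0.67, because the two weak reflections along y are counted as missing.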
Whether or not the additional anisotropic data that would be rejected by a spherical cut-off will affect map interpretation will obviously depend on whether those data enhance the resolution of the map in a useful direction (e.g. it may help to resolve atoms whose relative positions lie approximately in that direction, but it is unlikely to help with resolving atom pairs in the low-resolution directions).  The magnitude of this effect will depend on the ellipsoidal completeness, not the spherical completeness value currently computed by the PDB.  In any case including that data surely cannot do any harm!
You should quote the fitted ellipsoid dimensions as given under 'Diffraction limits & eigenvectors of ellipsoid fitted to diffraction cut-off surface'.  It's important to understand that the ellipsoid is not the same as the cut-off surface: it's only an approximation to it.
The 'lowest limit' refers to the reflection with the lowest d* that was cut by STARANISO, i.e. how deeply STARANISO's cut-off surface cut into the input data.
The 'worst' and 'best' limits refer to the reflections which lie on the cut-off surface and which have the lowest and highest d* after STARANISO's cut-off, though not necessarily cut by STARANISO since the cut-off may already have been applied to the input data.
The 'worst' and 'best' directions of the cut-off surface are not mutually perpendicular because there's no reason why they should be!  The data are not cut off by the ellipsoid; rather the ellipsoid is the best fit to the empirically-determined cut-off surface, which therefore may in places go inside or outside the ellipsoid. So the actual cut-offs are not necessarily exactly at the axes of the ellipsoid: they may vary in direction depending on how good the fit is (an exception is that, for consistency, we apply a final spherical cut-off at the longest axis of the ellipsoid).
The anisotropy of the intensity values is well-described by an ellipsoid (i.e. the 'quadratic form' of the anisotropic B-factor expression); however the anisotropy of <I/σ(I)> is not so simply described because σ(I) depends strongly on the redundancies and these may be completely arbitrary since they are determined by the user-selected collection strategy.  For example the cut-off surface could in principle (though admittedly very unlikely in practice!) be shoe-box shaped, in which case the 'worst' direction would be along the shortest edge of the shoe box and the 'best' direction would be along the body diagonal (i.e. from the centre to a corner).  These directions are clearly not perpendicular.
The only constraints on the shape of the cut-off surface are that it is locally smooth and it has at least the symmetry of the Laue group, i.e. it could be a shoe box with sides of different lengths only in triclinic, monoclinic or orthorhombic space groups.  For example in cubic space groups it would have to have at least the symmetry of a cube (but again note that the 'worst' and 'best' directions in a cube are not perpendicular).
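The geometry of the shoe-box and cube examples is easy to check numerically. The sketch below simply computes the angle between the 'worst' direction (towards the nearest face) and the 'best' direction (towards a corner); the half-edge lengths 1, 2 and 3 chosen for the shoe box are arbitrary illustrative values.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(dot / (math.hypot(*u) * math.hypot(*v))))

# Cube: worst direction is a face normal, best is the body diagonal.
cube_angle = angle_deg((1, 0, 0), (1, 1, 1))

# Shoe box with half-edges 1, 2, 3: worst direction is along the
# shortest half-edge, best is from the centre to the corner (1, 2, 3).
box_angle = angle_deg((1, 0, 0), (1, 2, 3))
```

For the cube the angle is arccos(1/√3), roughly 54.7°, and for this particular shoe box it is arccos(1/√14), roughly 74.5°; neither is 90°.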
The problem is that it's impossible to describe an arbitrary surface with only a few numbers! - though of course it's easy to visualise it as a 3-D plot using the WebGL tool.  So we use the approximately-fitted ellipsoid to provide a rough definition of the shape of the cut-off surface using the minimum number of numbers (i.e. not more than 6).
I would say that the justification is that the distribution of statistical significance, however it is measured, is clearly systematic, i.e. there are obviously regions of reciprocal space where the significance of the data is systematically higher than in other regions.  The problem is to delineate the significant region while making minimal assumptions about its shape (one need assume only that the surface is locally smooth and that its symmetry is the same as that of the Laue class, which are very mild assumptions and rather obvious).  Removal of the data in the non-significant region is justified because they would only contribute to the noise in the refinement and map calculation, and not contribute any useful signal.
There is a very important related issue here: that of 'fill-in' of the unobserved amplitudes with the structure-factor amplitudes D|Fcalc| calculated from the model.  This is likely to make the maps look much more appealing but could result in highly undesirable strong model bias if not handled correctly.  This issue is discussed in more detail here and here.  Note that we recommend the first method (making use of the 'SA_flag' column) over the second (using Refmac's 'mapcalculate free exclude' option), because the former fills in Fs inside the ellipsoid while excluding those outside, which is the desirable procedure, whereas the latter does no fill-in at all.  Note that as the sharpened data in each direction other than the best one can be abruptly cut off by the boundary surface, some series-termination effects must be expected to be visible in those directions if only the data inside the ellipsoid are used, and if there is a high degree of anisotropy, this effect could be very significant.
While I would agree that it would be hard to demonstrate any benefit of an anisotropic cut-off in cases of mild anisotropy, it would be even harder to demonstrate that any harm comes of it! (and where would one draw the line between mild and strong anisotropy?).
Deciding what constitutes a significant anisotropy effect is harder than it might seem because the effects do not depend only on the anisotropy ratio or the variation in B factor with direction. They also depend strongly on the resolution. Thus quite small anisotropy ratios and B factor variations can have a big effect at high resolution through the exponent term in the Debye-Waller factor.
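A rough numerical illustration of the resolution dependence, assuming the usual Debye-Waller form in which amplitudes are scaled by exp(-B sin²θ/λ²), so that intensities are scaled by exp(-B/(2d²)). The function name and the ΔB value of 10 Å² are arbitrary choices for this sketch.

```python
import math

def intensity_ratio(delta_b, d):
    """Ratio of mean intensities in two directions whose B factors
    differ by delta_b (Å^2), evaluated at resolution d (Å).

    Uses I ~ exp(-2 B sin^2(theta) / lambda^2) = exp(-B / (2 d^2)).
    """
    return math.exp(-delta_b / (2.0 * d * d))

# The same B-factor difference at three resolutions.
ratios = {d: intensity_ratio(10.0, d) for d in (3.0, 1.5, 1.0)}
```

For the same ΔB of 10 Å² the intensity ratio between the two directions is about 0.57 at 3 Å, about 0.11 at 1.5 Å and below 0.01 at 1 Å, which is why an anisotropy that looks mild at low resolution can dominate at high resolution.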
Note that the cut-off surface in the cubic case may also be anisotropic, because we are not assuming that the surface is an ellipsoid, only that it has cubic symmetry.  If we assumed that the surface is really an ellipsoid it would indeed be constrained by cubic symmetry to be a sphere.  A cube has different properties when measured along its edges compared with its diagonals, and indeed along any arbitrary direction, so is anisotropic by definition.  The sphere is the only shape which is isotropic but a crystal cannot be built from spheres, so all crystals, including cubic ones, can be anisotropic.
STARANISO does indeed perform three different corrections: the cut-off described above, the anisotropically-corrected prior intensity in the French-Wilson treatment, and the anisotropic correction of the resulting Fs.
We thought long and hard about this and decided that it doesn't make sense to average data where the contributions to the mean intensity from the structure-dependent terms (i.e. the contributions from the different levels of structure) differ.  Therefore for anisotropic data we continue to average the statistics in spherical shells, but adjust the widths of the shells to contain equal numbers of reflections (otherwise the statistics for the outer shells containing small numbers of reflections become meaningless).
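A minimal sketch of spherical shells adjusted to hold equal numbers of reflections, assuming the only input is a list of d-spacings. The function name and the returned (d_max, d_min, count) layout are illustrative choices, not STARANISO's actual output format.

```python
def equal_count_shells(d_values, n_shells):
    """Split reflections into resolution shells containing (nearly)
    equal numbers of reflections.

    Returns a list of (d_max, d_min, count) tuples ordered from low
    to high resolution.
    """
    ordered = sorted(d_values, reverse=True)  # large d (low resolution) first
    n = len(ordered)
    shells = []
    for k in range(n_shells):
        start = k * n // n_shells
        stop = (k + 1) * n // n_shells
        members = ordered[start:stop]
        if members:  # skip empty shells when n < n_shells
            shells.append((members[0], members[-1], len(members)))
    return shells
```

Because the shell boundaries are taken from the sorted d-spacings rather than from equal steps in resolution, the outer shells become wider when few reflections survive the cut-off, rather than becoming sparsely populated.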
Tickle, I.J., Flensburg, C., Keller, P., Paciorek, W., Sharff, A., Vonrhein, C., Bricogne, G. (2016). STARANISO (http://staraniso.globalphasing.org/cgi-bin/staraniso.cgi). Cambridge, United Kingdom: Global Phasing Ltd.
It's important to understand that when unmerged data are submitted to the server two separate protocols are run (sequentially):
Initially we weren't sure which protocol would be optimal so we gave the user the choice of outputs.  However experience suggests that in many cases, although the 'unmerged data' protocol is theoretically superior, in practice it does not make a significant difference to the results.  However we are not aware either that it ever does any harm, so unless there's an obvious reason not to, the 'aniso-merged' output would seem the better choice.  In addition, we find that users like the 'unmerged data' protocol because it provides pretty graphs and 'Table 1' statistics!  Of course you can also make these yourself but here they are all in one place.
The number remaining after the anisotropic cut-off would obviously depend on the number of significant reflections present before the cut-off, which will in turn depend on the strength of the diffraction, the average redundancy (for unmerged data) and the crystal-detector distance.  It's therefore not possible to say what is 'normal', since it depends on the characteristics of the crystal and the data-collection strategy.
The WebGL plots are intended to show graphically which reflections were cut so you can judge for yourself if it makes sense.
The following layout is suggested for reporting, in addition to Table 1, the ellipsoidal diffraction limits with the corresponding principal axes, and the eigenvalues and eigenvectors of the overall anisotropy (B) tensor, both as direction cosines in the standard PDB orthogonal basis (that is x parallel to a, y parallel to c* x a and z parallel to c*, with the origin of the orthogonal frame at the origin of reciprocal space), and in terms of the reciprocal unit-cell vectors.
Diffraction limits (Å) and corresponding principal axes of the ellipsoid fitted to the diffraction cut-off surface as direction cosines in the orthogonal basis (standard PDB convention), and in terms of reciprocal unit-cell vectors:
Diffraction limit #1: 2.600 ( 0.8739, 0.0000, -0.4862) 0.933 a* - 0.359 c*
Diffraction limit #2: 1.677 ( 0.0000, 1.0000, 0.0000) b*
Diffraction limit #3: 2.036 ( 0.4862, 0.0000, 0.8739) 0.692 a* + 0.722 c*
Eigenvalues of overall anisotropy tensor on |F|s (Å²) and corresponding eigenvectors of the overall anisotropy tensor as direction cosines in the orthogonal basis (standard PDB convention), and in terms of reciprocal unit-cell vectors:
Eigenvalue #1: 125.56 ( 0.7738, 0.0000, -0.6334) 0.879 a* - 0.477 c*
Eigenvalue #2: 25.17 ( 0.0000, 1.0000, 0.0000) b*
Eigenvalue #3: 25.91 ( 0.6334, 0.0000, 0.7738) 0.822 a* + 0.570 c*
In the *-SWS-aniso-merged.res file I see this:
Eigenvalues (E) & eigenvectors of overall anisotropy (B) tensor on Fs:
125.56 0.7738 0.0000 -0.6334 0.879 _a_* - 0.477 _c_*
25.17 0.0000 1.0000 0.0000 _b_*
25.91 0.6334 0.0000 0.7738 0.822 _a_* + 0.570 _c_*
Are those values (125.56, 25.17, 25.91) the ones to use?
The simple answer is yes, but with a proviso: the overall anisotropy (B) tensor (as well as the principal axes of the diffraction limit ellipsoid) is conventionally reported with respect to an orthogonal frame of reference, using the standard PDB orthogonalization convention (that is x parallel to a, y parallel to c* x a and z parallel to c*, with the origin of the orthogonal frame at the origin of reciprocal space).  The principal axes or eigenvectors are given in the results file both as direction cosines in this orthogonal basis, and in terms of the reciprocal unit-cell vectors a*, b* and c*.  So in your example the first eigenvalue (125.56) of the B tensor is in the direction (0.7738, 0.0000, -0.6334) in the orthogonal basis, which in terms of the reciprocal unit-cell vectors happens to be 0.879 a* - 0.477 c*.  Note that the underscores in the results file signify that these are vectors to distinguish them from the reciprocal cell lengths a*, b* and c* (the underscore character is the typographic convention used in manuscripts to specify to the typesetter the bold font that is used to indicate a vector).
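For readers who want to relate the two representations themselves, here is a sketch of one way to go from direction cosines in the PDB orthogonal frame to coefficients of a*, b* and c*. Several things here are assumptions: the function names are invented, the unit cell in the example is hypothetical, and scaling the three coefficients to unit sum of squares is only a guess at the normalisation used in the results file. The sketch relies on the identity v = (v·a) a* + (v·b) b* + (v·c) c*.

```python
import math

def cell_vectors(a, b, c, alpha, beta, gamma):
    """Real-space cell vectors in the PDB orthogonal frame
    (x parallel to a, y parallel to c* x a, z parallel to c*).
    Cell lengths in Å, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    cy = (ca - cb * cg) / sg
    cz = math.sqrt(1.0 - cb * cb - cy * cy)
    return (a, 0.0, 0.0), (b * cg, b * sg, 0.0), (c * cb, c * cy, c * cz)

def reciprocal_coefficients(v, cell):
    """Coefficients (h, k, l) such that v is parallel to
    h a* + k b* + l c*, scaled so that h^2 + k^2 + l^2 = 1
    (the normalisation is an assumption of this sketch)."""
    coeffs = [sum(vi * ei for vi, ei in zip(v, e))
              for e in cell_vectors(*cell)]
    norm = math.sqrt(sum(x * x for x in coeffs))
    return tuple(x / norm for x in coeffs)

# Hypothetical monoclinic cell, used only to exercise the functions.
cell = (100.0, 50.0, 61.0, 90.0, 94.6, 90.0)
h, k, l = reciprocal_coefficients((0.7738, 0.0, -0.6334), cell)
```

In a monoclinic cell a direction in the x/z plane gives k = 0 (to rounding), so it is expressed purely in terms of a* and c*, as in the example above; the actual coefficients naturally depend on the real unit cell.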
It would not make sense to report the limits along the reciprocal
unit-cell vectors a*, b* and
c*, firstly because in triclinic and monoclinic space
groups the choice of the axes is made according to an arbitrary
convention, so there is nothing special about these directions over any
others (except for the b* axis in monoclinic which is
chosen because it is a 2-fold symmetry axis).  One can indeed see
that the directions reported are not special in terms of
a* and c*: they can take any values in the
a*/c* plane.  Secondly, the eigenvectors
and eigenvalues uniquely define the tensor: this is not true of
arbitrarily-chosen directions.  Since the eigenvectors are by
definition mutually orthogonal it is not possible to uniquely define the
tensor in terms of the RL (reciprocal lattice) vectors in triclinic and
monoclinic space groups, because obviously the RL vectors are not
mutually orthogonal in these space groups (in the higher symmetry space
groups the eigenvectors and RL vectors coincide anyway due to the
symmetry constraints so the problem doesn't arise).  The components
of the tensor parallel to the RL vectors therefore constitute an
incomplete description of the tensor in triclinic and monoclinic space
groups.  It is therefore necessary to also specify the eigenvectors
of the tensor.