The STARANISO Server

Anisotropy of the Diffraction Limit and Bayesian Estimation of Structure Amplitudes

1. Is it justified to go all the way to the limit of resolution as determined by STARANISO when the completeness as reported by the PDB validation is very low above 2Å resolution ?

The problem here is that the PDB does not compute the correct value of the completeness for anisotropic data, for the simple reason that it does not currently capture any of the information required to determine the anisotropic cut-off.  This is something that we are currently discussing with them.

The correct anisotropic completeness values are in the table 'Merging statistics table for observed data extracted from the final MRFANA log file' under 'Compl. Ellip.' (i.e. the ellipsoidal completeness). STARANISO defines completeness in the conventional way, i.e. the fraction of reflections inside some data-dependent cut-off surface that were actually measured.  The cut-off surface is defined such that it encloses the set of reflections for which some metric of statistical significance (such as the mean I/σ(I) or CC(1/2)) exceeds a significance threshold set by the user.

The main difference in STARANISO lies in the method of determining the cut-off surface: for data assumed to be isotropic it is done in the standard way by computing the metric in spherical bins so that the cut-off surface is a sphere.  For data assumed to be anisotropic (i.e. the default) it is done by computing the 'moving-average' of the metric within a sphere of predefined radius in reciprocal space centred on each reflection in turn.  For anisotropic data CC(1/2) is not suitable as a metric so only the mean I/σ(I) is used.  Then an ellipsoid is fitted by least squares to the resulting cut-off surface points and the ellipsoid becomes the cut-off surface only for the purpose of estimating the completeness.  Note that there's no reason for the true cut-off surface to be an ellipsoid: an ellipsoid represents only one example of anisotropy (the only constraint on the shape of the cut-off surface is that it is locally smooth and it has at least the point symmetry of the Laue class).
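The least-squares step above can be illustrated with a minimal sketch (synthetic surface points and made-up diffraction limits; this is not STARANISO's actual code): an ellipsoid centred on the origin is the quadric h<sup>T</sup>Ah = 1 with A symmetric, so its 6 independent components can be fitted linearly to points on the empirical cut-off surface.

```python
import numpy as np

# Sketch: fit an origin-centred ellipsoid h^T A h = 1 (A symmetric, 6
# unknowns) by linear least squares to points lying on a cut-off surface.

def fit_ellipsoid(points):
    """points: (N, 3) reciprocal-space coordinates (A^-1) on the surface.
    Returns the symmetric 3x3 matrix A of the fitted ellipsoid."""
    x, y, z = points.T
    # design matrix for the 6 independent components of A
    M = np.column_stack([x*x, y*y, z*z, 2*x*y, 2*x*z, 2*y*z])
    p, *_ = np.linalg.lstsq(M, np.ones(len(points)), rcond=None)
    return np.array([[p[0], p[3], p[4]],
                     [p[3], p[1], p[5]],
                     [p[4], p[5], p[2]]])

# Synthetic check: points on an axis-aligned ellipsoid whose semi-axes
# (in A^-1) correspond to diffraction limits of 2.6, 1.677 and 2.036 A.
rng = np.random.default_rng(0)
v = rng.normal(size=(500, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
pts = v * np.array([1/2.6, 1/1.677, 1/2.036])

A = fit_ellipsoid(pts)
# eigenvalues of A are 1/s_i^2 for semi-axes s_i, so sqrt gives the
# diffraction limits d_i = 1/s_i directly (ascending order, in A)
limits = np.sqrt(np.linalg.eigvalsh(A))
```

In the real case the surface points are noisy and only approximately ellipsoidal, so the fit is a genuine least-squares compromise rather than the exact recovery seen here.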

In the absence of the definition of the cut-off ellipsoid (i.e. its semi-axis lengths and directions), the PDB simply assumes that anisotropic data are isotropic and uses a spherical surface of radius equal to the diffraction limit to compute the completeness.  This means that reflections with statistically non-significant values of the mean I/σ(I) that lie outside the ellipsoid but still inside the limiting sphere are included in the count of expected reflections (the denominator of the completeness), which will underestimate the completeness.
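A toy numerical sketch makes the size of this effect concrete (the cell, lattice and diffraction limits below are made up, the limits being borrowed from the example in question 12): if every reflection inside the ellipsoid was measured and none outside, the ellipsoidal completeness is 100% while the spherical one is only about half.

```python
import numpy as np

# Toy cubic 50 A cell: build the reciprocal lattice out to the best limit,
# apply an ellipsoidal cut-off, and compare the two completeness values.
astar = 1 / 50.0
h = np.arange(-30, 31)
H = np.array(np.meshgrid(h, h, h)).reshape(3, -1).T
s = H * astar                                  # d* vectors (A^-1)

d_limits = np.array([2.6, 1.677, 2.036])       # ellipsoid limits (A)
inside_ellipsoid = ((s * d_limits)**2).sum(axis=1) <= 1.0
inside_sphere = (s**2).sum(axis=1) <= (1/1.677)**2   # sphere at best limit

# every reflection inside the ellipsoid measured, none outside:
measured = inside_ellipsoid
compl_ellip = measured[inside_ellipsoid].mean()        # 1.0
compl_sphere = measured[inside_sphere].mean()          # ~0.53
```

The spherical figure is simply the ratio of the two enclosed volumes, so the more anisotropic the data, the more misleading the spherical completeness becomes.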

Whether or not the additional anisotropic data that would be rejected by a spherical cut-off will affect map interpretation will obviously depend on whether those data enhance the resolution of the map in a useful direction (e.g. it may help to resolve atoms whose relative positions lie approximately in that direction, but it is unlikely to help with resolving atom pairs in the low-resolution directions).  The magnitude of this effect will depend on the ellipsoidal completeness, not the spherical completeness value currently computed by the PDB.  In any case including those data surely cannot do any harm!

2. Which are the anisotropic diffraction limits that I should quote ?

You should quote the fitted ellipsoid dimensions as given under 'Diffraction limits & eigenvectors of ellipsoid fitted to diffraction cut-off surface'.  It's important to understand that the ellipsoid is not the same as the cut-off surface: it's only an approximation to it.

3. What is the difference between the 'lowest' and the 'worst' limit ?

The 'lowest limit' refers to the reflection with the lowest d* that was cut by STARANISO, i.e. how deeply STARANISO's cut-off surface cut into the input data.

The 'worst' and 'best' limits refer to the reflections which lie on the cut-off surface and which have the lowest and highest d* after STARANISO's cut-off, though not necessarily cut by STARANISO since the cut-off may already have been applied to the input data.

4. Why are the 'worst' and 'best' directions not mutually perpendicular ?  I thought with the ellipsoid cut-off, the long axis should be perpendicular to the short axes.

The 'worst' and 'best' directions of the cut-off surface are not mutually perpendicular because there's no reason why they should be!  The data are not cut off by the ellipsoid; rather the ellipsoid is the best fit to the empirically-determined cut-off surface, which therefore may in places go inside or outside the ellipsoid.  So the actual cut-offs are not necessarily exactly at the axes of the ellipsoid: they may vary in direction depending on how good the fit is (an exception is that for consistency we apply a final spherical cut-off at the longest axis of the ellipsoid).

The anisotropy of the intensity values is well-described by an ellipsoid (i.e. the 'quadratic form' of the anisotropic B-factor expression); however the anisotropy of <I/σ(I)> is not so simply described because σ(I) depends strongly on the redundancies and these may be completely arbitrary since they are determined by the user-selected collection strategy.  For example the cut-off surface could in principle (though admittedly very unlikely in practice!) be shoe-box shaped, in which case the 'worst' direction would be along the shortest edge of the shoe box and the 'best' direction would be along the body diagonal (i.e. from the centre to a corner).  These directions are clearly not perpendicular.
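For a concrete (made-up) shoe box with edge lengths in the ratio 1 : 2 : 3, the angle between the two directions described above is easy to compute:

```python
import numpy as np

# Hypothetical shoe-box cut-off surface with half-edges 0.5, 1.0, 1.5:
# 'worst' direction is along the shortest edge, 'best' direction is from
# the centre to a corner (the body diagonal).
worst = np.array([1.0, 0.0, 0.0])
corner = np.array([0.5, 1.0, 1.5])
best = corner / np.linalg.norm(corner)
angle = np.degrees(np.arccos(worst @ best))   # ~74.5 deg, clearly not 90
```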

The only constraints on the shape of the cut-off surface are that it is locally smooth and it has at least the symmetry of the Laue group, i.e. it could be a shoe box with sides of different lengths only in triclinic, monoclinic or orthorhombic space groups.  For example in cubic space groups it would have to have at least the symmetry of a cube (but again note that the 'worst' and 'best' directions in a cube are not perpendicular).

The problem is that it's impossible to describe an arbitrary surface with only a few numbers! - though of course it's easy to visualise it as a 3-D plot using the WebGL tool.  So we use the approximately-fitted ellipsoid to provide a rough definition of the shape of the cut-off surface using the minimum number of numbers (i.e. not more than 6).

5. What is the justification for removing the data between the ellipsoid and the high-resolution spherical cut-off ?

I would say that the justification is that the distribution of statistical significance, however it is measured, is clearly systematic, i.e. there are obviously regions of reciprocal space where the significance of the data is systematically higher than in other regions.  The problem is to delineate the significant region while making minimal assumptions about its shape (one need assume only that the surface is locally smooth and that its symmetry is the same as that of the Laue class, which are very mild assumptions and rather obvious).  Removal of the data in the non-significant region is justified because they would only contribute to the noise in the refinement and map calculation, and not contribute any useful signal.

There is a very important related issue here: that of 'fill-in' of the unobserved amplitudes with the structure-factor amplitudes D|Fcalc| calculated from the model.  This is likely to make the maps look much more appealing but could result in highly undesirable strong model bias if not handled correctly.  This issue is discussed in more detail here and here.  Note that we recommend the first method (making use of the 'SA_flag' column) over the second (using Refmac's 'mapcalculate free exclude' option), because the former fills in Fs inside the ellipsoid while excluding those outside, which is the desirable procedure, whereas the latter does no fill-in at all.  Note that as the sharpened data in each direction other than the best one can be abruptly cut off by the boundary surface, some series-termination effects must be expected to be visible in those directions if only the data inside the ellipsoid are used, and if there is a high degree of anisotropy, this effect could be very significant.

6. I suppose one could make the valid argument that if the fall-off in diffraction intensity is not isotropic then applying a spherical cut-off is incorrect, and given that nearly all (non-cubic) macromolecular crystals show some diffraction anisotropy, anisotropic cut-offs should always be used.  What I struggle with when the anisotropy is not severe (and defining that is another question!) is how to demonstrate relative benefit of applying an anisotropic cut-off vs using an isotropic one and correcting for the anisotropy in refinement.

While I would agree that it would be hard to demonstrate any benefit of an anisotropic cut-off in cases of mild anisotropy, it would be even harder to demonstrate that any harm comes of it! (and where would one draw the line between mild and strong anisotropy ?).

Deciding what constitutes a significant anisotropy effect is harder than it might seem, because the effects do not depend only on the anisotropy ratio or the variation in B factor with direction: they also depend strongly on the resolution.  Thus quite small anisotropy ratios and B factor variations can have a big effect at high resolution through the exponent term in the Debye-Waller factor.
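A quick numerical illustration of the resolution dependence (the B-factor difference and resolutions are made up): through the Debye-Waller factor, intensities are attenuated by exp(-B/(2d²)), so the same B-factor difference between directions that is mild at low resolution becomes severe at high resolution.

```python
import math

# Intensity in the 'worst' direction relative to the 'best' one, for a
# B-factor difference delta_B (A^2) at resolution d (A), via the
# Debye-Waller factor exp(-B/(2 d^2)) on intensities.
def intensity_ratio(delta_B, d):
    return math.exp(-delta_B / (2 * d * d))

low = intensity_ratio(20.0, 4.0)   # ~0.54 at 4 A: modest anisotropy
high = intensity_ratio(20.0, 2.0)  # ~0.08 at 2 A: severe anisotropy
```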

Note that the cut-off surface in the cubic case may also be anisotropic, because we are not assuming that the surface is an ellipsoid, only that it has cubic symmetry.  If we assumed that the surface is really an ellipsoid it would indeed be constrained by cubic symmetry to be a sphere.  A cube has different properties when measured along its edges compared with its diagonals, and indeed along any arbitrary direction, so is anisotropic by definition.  The sphere is the only shape which is isotropic but a crystal cannot be built from spheres, so all crystals, including cubic ones, can be anisotropic.

7. I suppose this is further complicated by the fact that if refinement is performed with structure-factor amplitudes resulting from French-Wilson treatment of the intensities then using an isotropic prior with anisotropic data will result in systematically over-estimated amplitudes in the weak directions of diffraction.

STARANISO does indeed perform three different corrections: the cut-off described above, the anisotropically-corrected prior intensity in the French-Wilson treatment, and the anisotropic correction of the resulting Fs.
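The over-estimation alluded to in the question can be demonstrated with a bare-bones numerical sketch of the French-Wilson idea for an acentric reflection (this is illustrative only, not STARANISO's implementation, and the observed intensity, its σ and the prior means Σ below are invented): the prior on the true intensity J is exp(-J/Σ)/Σ, the likelihood of the observed intensity is Gaussian, and the posterior mean amplitude is E[√J].

```python
import numpy as np

# Posterior mean amplitude E[sqrt(J)] by numerical quadrature on a uniform
# grid (the grid spacing cancels in the ratio of sums).
def posterior_amplitude(i_obs, sigma, Sigma):
    J = np.linspace(0.0, i_obs + 10*sigma, 20000)
    post = np.exp(-(i_obs - J)**2 / (2*sigma**2) - J/Sigma)
    return (np.sqrt(J) * post).sum() / post.sum()

# A weak measurement in a weak direction of an anisotropic data set:
F_aniso = posterior_amplitude(0.5, 1.0, Sigma=0.2)  # anisotropic prior
F_iso = posterior_amplitude(0.5, 1.0, Sigma=1.0)    # isotropic prior
# F_iso > F_aniso: an isotropic prior over-estimates the amplitude in the
# weak direction, because it expects more intensity there than is present.
```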

8. What prevents calculation and reporting of all statistics in shells with the shape of the cut-off surface ?

We thought long and hard about this and decided that it doesn't make sense to average data where the contributions to the mean intensity from the structure-dependent terms (i.e. the contributions from the different levels of structure) differ.  Therefore for anisotropic data we continue to average the statistics in spherical shells, but adjust the widths of the shells to contain equal numbers of reflections (otherwise the statistics for the outer shells containing small numbers of reflections become meaningless).
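The equal-count binning is straightforward; a sketch (not STARANISO's code) might look like this:

```python
import numpy as np

# Equal-count spherical shells: sort reflections by d* and split them into
# bins holding (as nearly as possible) equal numbers of reflections,
# rather than bins of equal width in d* or 1/d^2.
def equal_count_shells(dstar, n_shells):
    """Return the shell index (0..n_shells-1) of each reflection."""
    order = np.argsort(dstar)
    shells = np.empty(len(dstar), dtype=int)
    # np.array_split spreads any remainder over the first bins
    for i, idx in enumerate(np.array_split(order, n_shells)):
        shells[idx] = i
    return shells

dstar = np.random.default_rng(1).uniform(0.05, 0.6, size=10_000)
shells = equal_count_shells(dstar, 10)
counts = np.bincount(shells)    # every shell holds 1000 reflections
```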

9. How should I cite usage of STARANISO ?

Tickle, I.J., Flensburg, C., Keller, P., Paciorek, W., Sharff, A., Vonrhein, C., Bricogne, G. (2016). STARANISO (http://staraniso.globalphasing.org/cgi-bin/staraniso.cgi). Cambridge, United Kingdom: Global Phasing Ltd.

10. I have anisotropy in my data and I got results output with much better resolution.  I am confused which MTZ file to use for refinements ?  There are two MTZ files 'aniso-merged' and 'merged-aniso'.

It's important to understand that when unmerged data are submitted to the server two separate protocols are run (sequentially):

1. The 'merged data' protocol where the input data are isotropically scaled and merged in the normal way (using aP_scale).  Then STARANISO applies an anisotropic diffraction cut-off and anisotropy / Bayesian correction to the merged MTZ file.  This gives the 'merged-aniso' outputs (i.e. first merging then anisotropy cut-off / correction).  This obviously produces exactly the same output as you would have got if you had submitted data that you had scaled and merged yourself using aP_scale.  On the server this step also creates an anisotropic cut-off mask for use in the 'unmerged data' protocol described next.

2. The 'unmerged data' protocol where the anisotropic mask created as above is first used to cut the unmerged data anisotropically.  The cut data are scaled, then the scales are applied to the complete dataset which is finally merged and anisotropy / Bayesian-corrected as above.  This gives the 'aniso-merged' outputs (i.e. first anisotropic cut-off then merging and anisotropy cut-off / correction).  I suppose logically it should be 'aniso-merged-aniso' but the second 'aniso' seemed superfluous!

Initially we weren't sure which protocol would be optimal so we gave the user the choice of outputs.  However, experience suggests that in many cases, although the 'unmerged data' protocol is theoretically superior, in practice it does not make a significant difference to the results.  Equally, we are not aware that it ever does any harm, so unless there's an obvious reason not to, the 'aniso-merged' output would seem the better choice.  In addition, we find that users like the 'unmerged data' protocol because it provides pretty graphs and 'Table 1' statistics!  Of course you can also make these yourself but here they are all in one place.

11. The number of reflections in the output MTZ file is almost half of what the original data was.  Is it normal or should I be worried ?

The number remaining after the anisotropic cut-off would obviously depend on the number of significant reflections present before the cut-off, which will in turn depend on the strength of the diffraction, the average redundancy (for unmerged data) and the crystal-detector distance.  It's therefore not possible to say what is 'normal', since it depends on the characteristics of the crystal and the data-collection strategy.

The WebGL plots are intended to show graphically which reflections were cut so you can judge for yourself if it makes sense.

12. What results from the server should I report with Table 1 ?

The following layout is suggested for reporting, in addition to Table 1, the ellipsoidal diffraction limits with the corresponding principal axes, and the eigenvalues and eigenvectors of the overall anisotropy (B) tensor, both as direction cosines in the standard PDB orthogonal basis (that is x parallel to a, y parallel to c* x a and z parallel to c*, with the origin of the orthogonal frame at the origin of reciprocal space), and in terms of the reciprocal unit-cell vectors.

```
Diffraction limits (Å) and corresponding principal axes of the ellipsoid fitted
to the diffraction cut-off surface as direction cosines in the orthogonal basis
(standard PDB convention), and in terms of reciprocal unit-cell vectors:

Diffraction limit #1:  2.600   ( 0.8739,  0.0000, -0.4862)   0.933 a* - 0.359 c*
Diffraction limit #2:  1.677   ( 0.0000,  1.0000,  0.0000)   b*
Diffraction limit #3:  2.036   ( 0.4862,  0.0000,  0.8739)   0.692 a* + 0.722 c*

Eigenvalues of overall anisotropy tensor on |F|s (Å²) and corresponding eigenvectors
of the overall anisotropy tensor as direction cosines in the orthogonal basis
(standard PDB convention), and in terms of reciprocal unit-cell vectors:

Eigenvalue #1:        125.56   ( 0.7738,  0.0000, -0.6334)   0.879 a* - 0.477 c*
Eigenvalue #2:         25.17   ( 0.0000,  1.0000,  0.0000)   b*
Eigenvalue #3:         25.91   ( 0.6334,  0.0000,  0.7738)   0.822 a* + 0.570 c*
```

13. I would like to report in my Table 1 the Wilson B factor along the a*, b* and c* axes.

In the *-SWS-aniso-merged.res file I see this:

```
Eigenvalues (E) & eigenvectors of overall anisotropy (B) tensor on Fs:

  125.56    0.7738  0.0000 -0.6334    0.879 _a_* - 0.477 _c_*
   25.17    0.0000  1.0000  0.0000    _b_*
   25.91    0.6334  0.0000  0.7738    0.822 _a_* + 0.570 _c_*
```
Are those values (125.56, 25.17, 25.91) the ones to use ?

The simple answer is yes, but with a proviso: the overall anisotropy (B) tensor (as well as the principal axes of the diffraction limit ellipsoid) is conventionally reported with respect to an orthogonal frame of reference, using the standard PDB orthogonalization convention (that is x parallel to a, y parallel to c* x a and z parallel to c*, with the origin of the orthogonal frame at the origin of reciprocal space).  The principal axes or eigenvectors are given in the results file both as direction cosines in this orthogonal basis, and in terms of the reciprocal unit-cell vectors a*, b* and c*.  So in your example the first eigenvalue (125.56) of the B tensor is in the direction (0.7738, 0.0000, -0.6334) in the orthogonal basis, which in terms of the reciprocal unit-cell vectors happens to be 0.879 a* - 0.477 c*.  Note that the underscores in the results file signify that these are vectors to distinguish them from the reciprocal cell lengths a*, b* and c* (the underscore character is the typographic convention used in manuscripts to specify to the typesetter the bold font that is used to indicate a vector).
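The conversion between the two representations is a small linear-algebra exercise.  The sketch below (with made-up monoclinic cell parameters, so the resulting coefficients do not match the example above) builds the direct-cell basis in the standard PDB orthogonal frame, derives the reciprocal basis from it, and re-expresses an eigenvector given as direction cosines in terms of unit vectors along a*, b* and c*:

```python
import numpy as np

# Hypothetical monoclinic cell (A and degrees)
a, b, c = 60.0, 40.0, 75.0
alpha, beta, gamma = np.radians([90.0, 105.0, 90.0])

ca, cb, cg, sg = np.cos(alpha), np.cos(beta), np.cos(gamma), np.sin(gamma)
vol = np.sqrt(1 - ca**2 - cb**2 - cg**2 + 2*ca*cb*cg)
# direct-cell basis vectors as columns, standard PDB orthogonalization
# (x // a, y // c* x a, z // c*)
A = np.array([[a, b*cg, c*cb],
              [0, b*sg, c*(ca - cb*cg)/sg],
              [0, 0,    c*vol/sg]])
Astar = np.linalg.inv(A).T                  # columns are a*, b*, c*
U = Astar / np.linalg.norm(Astar, axis=0)   # unit vectors along a*, b*, c*

v = np.array([0.7738, 0.0, -0.6334])        # direction cosines (orthogonal)
coeffs = np.linalg.solve(U, v)              # components along a*, b*, c*
```

Because b* coincides with the 2-fold axis in monoclinic symmetry, the b* component of this eigenvector is exactly zero, and the vector is a combination of a* and c* only, just as in the results-file example.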

It would not make sense to report the limits along the reciprocal unit-cell vectors a*, b* and c*, firstly because in triclinic and monoclinic space groups the choice of the axes is made according to an arbitrary convention, so there is nothing special about these directions over any others (except for the b* axis in monoclinic which is chosen because it is a 2-fold symmetry axis).  One can indeed see that the directions reported are not special in terms of a* and c*: they can take any values in the a*/c* plane.  Secondly, the eigenvectors and eigenvalues uniquely define the tensor: this is not true of arbitrarily-chosen directions.  Since the eigenvectors are by definition mutually orthogonal it is not possible to uniquely define the tensor in terms of the RL (reciprocal lattice) vectors in triclinic and monoclinic space groups, because obviously the RL vectors are not mutually orthogonal in these space groups (in the higher symmetry space groups the eigenvectors and RL vectors coincide anyway due to the symmetry constraints so the problem doesn't arise).  The components of the tensor parallel to the RL vectors therefore constitute an incomplete description of the tensor in triclinic and monoclinic space groups.  It is therefore necessary to also specify the eigenvectors of the tensor.