AIMLESS (CCP4: Supported Program)
NAME
aimless
- scale together multiple observations of
reflections
SYNOPSIS
aimless HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded Input]
References
Input and Output files
Release Notes
DESCRIPTION
Running the program
Scaling options
Control of flow through the program
Partially recorded reflections
Scaling algorithm
Data from Denzo
Datasets
This program scales together multiple observations of reflections,
and merges them into an average intensity: it is a successor
program to SCALA.
Various scaling models can be used. The scale factor is a
function
of the primary beam direction, either as a smooth function of Phi
(the
rotation angle ROT), or expressed as BATCH (image) number
(strongly deprecated).
In addition,
the scale may be a function of the secondary beam direction,
acting
principally as an absorption correction expanded as spherical
harmonics. The secondary beam correction is related to the
absorption anisotropy
correction described by Blessing (Blessing, 1995).
The merging algorithm analyses the data for outliers, and gives
detailed analyses. It generates a weighted mean of the
observations
of the same reflection, after rejecting the outliers.
The program does several passes through the data:
- initial estimate of the
scales
- first round scale refinement, using strong data using an
I/sigma(I) cutoff
- first round of outlier rejection
- if both summation and profile-fitted intensity estimates are
present (eg from Mosflm), then the cross-over point is
determined between
using profile-fitted for weak data and summation for strong
data.
- first analysis pass to refine the "corrections" to the
standard
deviation estimates
- final round scale refinement, using strong data within limits
on the normalised intensity |E|^2
- final analysis pass to refine the "corrections" to the
standard
deviation estimates
- final outlier rejections
- a
final pass to apply scales, analyse agreement & write the
output file, usually with merged intensities, but alternatively
as file
with scaled but unmerged observations, with partials summed and
outliers rejected, for each dataset
Anomalous scattering is ignored during the scale
determination (I+ & I- observations are treated together), but
the
merged file always contains I+ & I-, even if the ANOMALOUS OFF
command is used. Switching ANOMALOUS ON does affect the statistics
and
the outlier rejection (qv).
Running the program
Aimless will often be run from the CCP4 GUI, but may also be run
from a
script. In a script the input and output files may be assigned on
the
command line, or some of them (marked with an asterisk in the list
below) may be assigned as keyworded input commands. The option
switch
"--no-input" forces the program to run immediately with default
options, without waiting for input commands, using file assignments
from the command line.
Input files:
HKLIN*, HKLREF*, XYZIN*
Output files:
HKLOUT*, XMLOUT*, SCALES*, ROGUES*, TILEIMAGE
Plot files also represented in XMLOUT:
ROGUEPLOT, NORMPLOT, ANOMPLOT, CORRELPLOT
Explicit file assignments for optional
output reflection files, otherwise generated from HKLOUT:
HKLOUTUNMERGED, SCALEPACK, SCALEPACKUNMERGED
Scaling options
The optimum form of the scaling will depend a great deal on how
the
data were collected. It is not possible to lay down definitive
rules,
but some of the following hints may help. For most purposes, my
normal
recommendation is the default
scales rotation spacing 5 secondary bfactor on brotation spacing 20
Other hints:-
- Only use the
SCALE BATCH option if every image is different from every other
one, i.e.
off-line detectors (including film), or rapidly or
discontinuously changing incident beam flux. This is
rarely the
case for synchrotron data, but is appropriate for serial data
(eg XFEL). This mode may be VERY slow if there are many batches.
- If there is a discontinuity between one set of images and
another
(e.g. change of exposure time), then flag them as
different
RUNs. This will be done automatically if no runs are specified.
- The SECONDARY correction is recommended and is the default:
this provides a
correction for absorption. It
should always be restrained with a TIE SURFACE command (this is
the
default): under these conditions it is reasonably stable under
most
conditions. The ABSORPTION
(crystal frame)
correction is similar to SECONDARY (camera frame) in most cases,
but
may be preferable if data has been collected from multiple
alignments
of the same crystal.
- Use a
B-factor correction unless the data are only very
low-resolution.
Traditionally, the relative B-factor is a correction for
radiation
damage (hence it is a function of time), but it also includes
some
other corrections eg absorption.
- When trying out more complex scaling options, it is a
good idea to try a simple scaling first, to check that
the more elaborate model gives a real improvement.
- When scaling multiple MAD data sets they should
all be scaled together in one pass, outliers rejected across all
datasets, then each wavelength merged separately. This is the
default if multiple datasets are present in the input file.
Other options are described in greater detail under the KEYWORDS.
Control of flow through the
program
The
ONLYMERGE flag skips the scaling (often in conjunction with RESTORE
to
read in previously determined scales), calculates statistics
and
outputs the data.
Partially recorded
reflections
See appendix 1
The different options for the treatment of partials are set
by the PARTIALS command. Partials may
either be summed or scaled: in the latter case, each part is
treated
independently of the others.
Summed partials [default]:
All the parts are summed (after applying
scales) to give the total intensity, provided some checks are
passed.
The number of reflections failing the checks is printed. You
should make sure that you are not losing too many reflections in
these
checks.
Scaled partials:
In this option, each individual partial observation is scaled up by
the
inverse
FRACTIONCALC, provided that the fraction is greater than
<minimum_fraction>
[default = 0.5]. This only works well if the calculated fractions
are
accurate, which is not usually the case.
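A one-line Python sketch of this rule (hypothetical helper name; the formula is as described above):

```python
def scaled_partial(i_part, fraction, minimum_fraction=0.5):
    # Each partial observation is inflated by the inverse calculated
    # fraction (FRACTIONCALC), provided the fraction exceeds the minimum
    if fraction <= minimum_fraction:
        return None  # fraction too small: observation not used
    return i_part / fraction
```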
Scaling algorithm
The normal scaling method improves the internal consistency of the
dataset by minimising
Sum( whl * ( Ihl - ghl * Ih )**2 )
See appendix 2 for more details
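The minimisation can be illustrated with a small Python sketch: for fixed inverse scales g_hl, the Ih that minimises the residual for one unique reflection is the weighted mean Sum(w*g*I)/Sum(w*g^2). Function names and example numbers are illustrative only, not the program's internals.

```python
def scaling_residual(obs, scales, weights, Ih):
    # Sum( w_hl * (I_hl - g_hl * Ih)^2 ) over observations of one reflection
    return sum(w * (i - g * Ih) ** 2 for i, g, w in zip(obs, scales, weights))

def best_mean_intensity(obs, scales, weights):
    # Least-squares estimate of Ih for fixed inverse scales g_hl:
    #   Ih = Sum(w*g*I) / Sum(w*g^2)
    num = sum(w * g * i for i, g, w in zip(obs, scales, weights))
    den = sum(w * g * g for g, w in zip(scales, weights))
    return num / den

obs = [100.0, 210.0, 95.0]    # observations I_hl of one reflection
scales = [1.0, 2.0, 1.0]      # inverse scale factors g_hl
weights = [1.0, 0.5, 1.0]     # w_hl, typically 1/var(I_hl)
Ih = best_mean_intensity(obs, scales, weights)
```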
Scaling to reference
THIS OPTION HAS NOT BEEN EXTENSIVELY TESTED
An alternative method scales to an external previously-determined
reference dataset, minimising
Sum( whl * ( Ihl - ghl * Ihref )**2 )
where Ihref is the reference intensity and the weight whl =
1/(var(Ihl) + var(Ihref))
This option might be useful for example in scaling long-wavelength
data
with high absorption to a short-wavelength set from a similar
crystal. It is specified with the command REFINE
REFERENCE.
Reference intensities are taken from an MTZ file of merged
intensities
specified as HKLREF (command or command
line).
If intensities are not available, amplitudes F are accepted, and
will
be squared to intensities, but note that Fs which come from the
French
& Wilson "truncate" procedure are seriously biased for small
intensities, so Fs are deprecated. A coordinate reference XYZIN is
not
accepted, as that does not seem to work well (in some limited
tests).
By default, the first intensity column (or amplitude column) in the
file is used, or the column may be explicitly set using the LABREF command. Note that provided the columns
are contiguous, only the first of the set need be specified (or it
will be chosen automatically):
eg LABREF I=I(+) will pick up I(+), SIGI(+), I(-), SIGI(-)
Data from Denzo
Data integrated with Denzo may be scaled and merged with Aimless as
an alternative to Scalepack, or unmerged output from scalepack may
be
used. Both have some limitations. See
appendix 3
for more details.
Datasets
TBD
KEYWORDED INPUT - DESCRIPTION
In the definitions below "[]" encloses optional items,
"|" delineates alternatives. All keywords are
case-insensitive, but are listed below in upper-case. Anything
after
"!" or "#" is treated as comment. The available
keywords are:
ANALYSIS, ANOMALOUS,
BINS,
DUMP,
EXCLUDE, HKLIN,
HKLOUT, HKLREF,
INITIAL, INTENSITIES, KEEP, LABREF, LINK,
NAME,
ONLYMERGE,
OUTPUT,
PARTIALS, REFINE,
REJECT,
RESOLUTION,
RESTORE, ROGUES,
RUN,
SCALES,
SDCORRECTION,
TIE,
TITLE, UNLINK, USESDPARAMETER, XMLOUT, XYZIN
RUN <Nrun> BATCH <b1> to
<b2>
Define a "run": Nrun is the Run number, with an arbitrary
integer label (i.e.
not necessarily 1,2,3 etc). A "run"
defines
a set of reflections which share a set of scale factors. Typically
a
run
will be a continuous rotation around a single axis. The definition
of a
run may use several RUN
commands. If no RUN command is given
then run assignment will be done automatically, with run breaks at
discontinuities in dataset, batch number or Phi. If any RUN
definitions
are given, then all batches not explicitly specified will be
excluded.
SCALES [<subkeys>]
Define layout of scales, ie the scaling model. Note that a layout
may be defined for all runs (no RUN subkeyword), then overridden
for
particular runs by additional commands.
- Subkeys:
- RUN <run_number>
- Define run to which this command applies: the run must
have
been previously
defined. If no run is defined, it applies to all runs
- ROTATION <Nscales> |
SPACING <delta_rotation>
- Define layout of scale factors along rotation axis (i.e.
primary beam),
either as number of scales or (if SPACING keyword present)
as interval
on rotation [default SPACING 5]
- BATCH
- Set "Batch" mode, no interpolation along rotation
(primary) axis. This option is compulsory if a ROT column is
not
present in
the input file, but otherwise the ROTATION option is
preferred. WARNING: this option is not optimised and may
take a very long
time if you have many batches
- BFACTOR ON | OFF
- Switch Bfactors on or off. The default is ON.
- BROTATION <Ntime>
| SPACING
<delta_time>
- Define number of B-factors or (if SPACING keyword present)
the
interval on "time": usually no time is defined in the input
file, and
the rotation angle is used as its proxy [default SPACING
20].
- SECONDARY [<Lmax>]
- Secondary beam correction expanded in spherical harmonics
up
to
maximum order Lmax in the camera spindle frame. The number
of
parameters increases as (Lmax + 1)**2, so you should use the
minimum
order needed (eg 4 - 6, default 4). The deviation of the
surface from spherical should be
restrained eg with TIE SURFACE 0.001 [default]. Set Lmax = 0
to switch off
- ABSORPTION [<Lmax>]
- Secondary beam correction expanded in spherical harmonics
up
to
maximum order Lmax in the crystal frame based on POLE (qv).
The number
of parameters increases as (Lmax + 1)**2, so you should use
the
minimum order needed (eg 4 - 6, default 4). The deviation of
the surface from spherical should be
restrained eg with TIE SURFACE 0.001 [default]. This is not
substantially different from SECONDARY in most cases, but
may be
preferred if data are collected from multiple settings of
the same
crystal, and you want to use the same absorption surface.
This would
only be strictly valid if the beam is larger than the
crystal.
- POLE <h|k|l>
- Define the polar axis for ABSORPTION or SURFACE as h, k
or l
(eg
POLE L): the pole will default to either the closest axis to
the
spindle (if known), or l (k for monoclinic space-groups).
- CONSTANT
- One scale for each run (equivalent to ROTATION 1)
- TILE <NtileX>
<NtileY> [CCD]
- Define a detector scale for each tile. Currently this
implements a scale model for 3x3 tiled CCD detectors to
correct for the
underestimation of intensities in the corners of the tile,
see Appendix 2.
If the detector appears to be a 3x3 CCD (3072x3072 pixels)
then this
correction will be activated automatically unless the NOTILE
keyword is
given. The parameters are restrained using the TIE TILE
parameters (qv)
- NOTILE
- Switch off the automatic TILE 3 3 correction for CCD
detectors
SDCORRECTION [[NO]REFINE]
[INDIVIDUAL | SAME [FIXSDB]
[RUN
<RunNumber>] [FULL | PARTIAL] <SdFac> [<SdB>]
<SdAdd> [DAMP <dampfactor>]
[SIMILAR [<sd1> <sd2> <sd3>]] ||
[[NO]TIE SdFac | SdB | SdAdd <targetvalue>
<SDtarget>]
[SAMPLESD]
Input or set options for the "corrections" to the input standard
deviations: these are modified to
sd(I) corrected = SdFac * sqrt{sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2}
where Ihl is the intensity and
(SdB may be omitted in the input).
The default is "SDCORRECTION REFINE INDIVIDUAL". If explicit
values are given, the default changes to NOREFINE.
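The correction formula can be written as a short Python sketch (illustrative helper; the default parameter values shown here are arbitrary, not the program's):

```python
import math

def corrected_sd(sd, i, sdfac=1.0, sdb=0.0, sdadd=0.02):
    # sd(I)corrected = SdFac * sqrt( sd(I)^2 + SdB*Ihl + (SdAdd*Ihl)^2 )
    return sdfac * math.sqrt(sd * sd + sdb * i + (sdadd * i) ** 2)
```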
The keyword REFINE controls refinement of the correction
parameters,
essentially trying to make the SD of the distribution of
fractional deviations (Ihl - <I>)/sigma equal to 1.0 over all
intensity ranges. The residual minimised is Sum( w * (1 - SD)^2) +
Restraint Residual
SAMPLESD is intended for very high multiplicity data such as XFEL
serial data. The final SDs are estimated from the weighted
population
variance, assuming that the input sigma(I)^2 values are proportional
to
the true errors. This probably gives a more realistic estimate of
the
error in <I>. In this case refinement of the corrections is
switched off unless explicitly requested.
Other subkeys
control what values are determined and used for each run (if more
than one). TIE and SIMILAR are mutually exclusive
- SAME
[default] same SD
parameters for all runs, different for fulls and partials
- INDIVIDUAL use different SD parameters for
each run, fulls and partials
- FIXSDB
fixes
the SdB parameter in the refinement (but it seems best to let it
refine, even though it has no obvious physical meaning)
- DAMP
set
dampfactor to damp shifts in the refinement [default 0.05]
- SIMILAR
restrain
parameters to be the same for all runs, with SDs optionally
given for SdFac (sd1), SdB (sd2), and SdAdd (sd3) [defaults 0.2,
3.0, 0.04]
- TIE
set
restraints for named parameter, "SdFac", "SdB", or "SdAdd". Each
restraint is to a specified target value, with a weight =
1/(SDtarget^2). The default is to restrain SdB only, target
value 0.0,
SD 20. NOTIE removes all restraints, TIE without values sets the
defaults.
RUN <run_number>
Define the run for which values are given: the run must have been
previously
defined. If no run is defined, it applies to all runs. Different
values
may be specified for fully recorded reflections (FULL) and for
partially
recorded reflections (PARTIAL), or the same values may be used for
both if one set is given, e.g.
sdcorrection full 1.4 0.11 part 1.4 0.05
USESDPARAMETER [NO | DIAGONAL
| COVARIANCE]
For the final estimation of intensity errors sd(I), incorporate the
estimated error in the refined scale model parameters, as estimated
from the inverse normal matrix in the scale refinement. The default
is
DIAGONAL if this keyword is omitted, or given with no sub-keyword.
"NO"
switches it off. The DIAGONAL option uses the separate parameter
variances, ie the diagonal of the variance/covariance matrix.
COVARIANCE uses the full matrix, which is slower but may be more
accurate.
The variance/covariance matrix [V] = Sum(wD^2)/(m-n) [H]^-1, where
[H]
is the normal (Hessian) matrix, Sum(wD^2) is the minimised residual,
m
the number of observations, and n the number of parameters.
The scaled intensity I'hl = Ihl/ghl where
ghl is its inverse scale factor
Var(I')/I'^2 = Var(I)/I^2 + Var(g)/g^2
ie Var(I') = (1/g^2) [ Var(I) + I'^2 Var(g) ]
Var(g) = [dg/dp]T [V]
[dg/dp] (COVARIANCE
option)
where dg/dp is the vector of partial derivatives with respect
to
parameters p
DIAGONAL approximation: Var(g) = Sum(i) {
[dg/dp(i)]^2 V(i,i) } ie summed over parameters i
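Both options can be sketched in Python (illustrative function, not the program's code):

```python
def scaled_variance(i_obs, var_i, g, dgdp, V, diagonal=True):
    # I' = I/g;  Var(I') = (1/g^2) * ( Var(I) + I'^2 * Var(g) )
    # Var(g) from the parameter variance/covariance matrix [V]:
    #   full:     Var(g) = [dg/dp]^T [V] [dg/dp]
    #   diagonal: Var(g) = Sum(i) { [dg/dp(i)]^2 V(i,i) }
    n = len(dgdp)
    i_scaled = i_obs / g
    if diagonal:
        var_g = sum(dgdp[k] ** 2 * V[k][k] for k in range(n))
    else:
        var_g = sum(dgdp[j] * V[j][k] * dgdp[k]
                    for j in range(n) for k in range(n))
    return (var_i + i_scaled ** 2 * var_g) / g ** 2
```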
PARTIALS [[NO]CHECK] [TEST
[<lower_limit>
<upper_limit>] [CORRECT <minimum_fraction>] [[NO]GAP
[<maxgap>]]
Set criteria for accepting complete or incomplete partials.
Default is CHECK TEST 0.95 1.05 CORRECT 0.95 NOGAP
After
all parts have been assembled, the total observation is accepted
if:-
-
the CHECK flag is set [default]
and
the MPART flags (if present) are all consistent (these flags
indicate that a set of
parts is eg 1 of 3, 2 of 3, 3 of 3)
-
if CHECK fails, then the total
fraction is checked to lie between lower_limit &
upper_limit
[default 0.95, 1.05]
-
if this fails, then the incomplete partial is scaled up by
the
total fraction if it is > minimum_fraction [default
0.95] (NB Pointless has different default for a different
purpose)
- a reflection which has a gap in the middle may be accepted if GAP
is set; maxgap is the maximum number of missing slots [not
recommended: default 1 if GAP is set]
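The second and third checks can be sketched in Python (hypothetical helper; the MPART and GAP checks are omitted):

```python
def summed_partial_status(fractions, lower=0.95, upper=1.05, correct_min=0.95):
    # Accept the summed total if the total calculated fraction lies
    # within [lower, upper]; failing that, an incomplete partial may be
    # scaled up by the total fraction if it exceeds correct_min;
    # otherwise the assembled observation is rejected
    total = sum(fractions)
    if lower <= total <= upper:
        return "sum"
    if total > correct_min:
        return "scale"
    return "reject"
```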
INITIAL UNITY | MEAN |
MINIMUM_OVERLAP <minimum_overlap> | MAXIMUM_GAP
<maximum_gap>
Set initial scale factors either based on mean intensities (MEAN,
default) or all set to 1.0 (UNITY)
If the fractional overlap between rotation ranges is less than minimum_overlap
in too many rotation ranges, then scaling will be switched off (ie
ONLYMERGE), and so will SD correction refinement (SDCORRECTION
NOREFINE).
Default
value 0.05. Set to a value <= 0.0 to ignore this check. maximum_gap specifies the
maximum number of contiguous rotation ranges which are allowed to
fall below the minimum_overlap criterion,
default 2.
Fractional
overlap is (Number of
observations with matching observations in a
different rotation range)/(Total number of observations)
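A Python sketch of this definition (illustrative, using arbitrary reflection and range identifiers):

```python
from collections import defaultdict

def fractional_overlap(observations):
    # observations: (reflection_id, rotation_range_id) pairs.
    # Fraction of observations whose reflection also appears
    # in a different rotation range
    ranges = defaultdict(set)
    for hkl, rot_range in observations:
        ranges[hkl].add(rot_range)
    matched = sum(1 for hkl, _ in observations if len(ranges[hkl]) > 1)
    return matched / len(observations)
```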
INTENSITIES [SUMMATION | PROFILE |
COMBINE [<Imid>] [POWER <Ipower>]]
Set which intensity to use, of the integrated intensity (column
I)
or
profile-fitted (column IPR), if both are present. This applies to
all stages of the program, scaling & averaging. Mosflm
produces two
different estimates of the intensity, from summation integration
and
from profile fitting. Generally the profile-fitted estimate is
better,
but for the strongest reflections the summation value is often
better.
The default is to use a weighted mean, depending on the "raw"
intensity
ie before LP correction (COMBINE option), and to optimise
automatically
the switch-over point Imid, to give the best overall Rmeas.
- Subkeys:
- SUMMATION
- use summation integrated intensity Isum.
- PROFILE
- use profile-fitted intensity Ipr.
- COMBINE [<Imid>]
[POWER <Ipower>]
- Use weighted mean of profile-fitted & integrated
intensity,
profile-fitted for weak data, summation integration value
for strong.
- If
no value is given for Imid, it will be automatically
optimised
- I = w*Ipr + (1-w)*Isum
- w = 1/(1 + (Iraw/Imid)**Ipower)
- Ipower defaults to 3.
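A Python sketch of the COMBINE formula (illustrative helper name):

```python
def combined_intensity(ipr, isum, iraw, imid, ipower=3):
    # w = 1/(1 + (Iraw/Imid)**Ipower);  I = w*Ipr + (1-w)*Isum
    # weak data (Iraw << Imid): w -> 1, profile-fitted dominates;
    # strong data (Iraw >> Imid): w -> 0, summation dominates
    w = 1.0 / (1.0 + (iraw / imid) ** ipower)
    return w * ipr + (1.0 - w) * isum
```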
REJECT [SCALE | MERGE] [COMBINE]
[SEPARATE]
<Sdrej> [<Sdrej2>]
[ALL <Sdrej+-> [<Sdrej2+->]]
[KEEP | REJECT | LARGER | SMALLER]
[EMAX <Emax>]
[BATCH <batchrejectfactor>]
[NONE]
Define rejection criteria for outliers: different criteria may be
set for the scaling and for the merging passes. If neither
SCALE nor MERGE are specified, the same values are used for both
stages. The default values are REJECT 6 ALL -8, ie test within I+
or
I- sets on 6sigma, between I+ & I- with a threshold adjusted
upwards
from 8sigma according to the strength of the anomalous signal. The
adjustment of the ALL test is not necessarily reliable.
If there are multiple datasets, by default, deviation
calculations
include data from all datasets [COMBINE]. The SEPARATE flag means
that
outlier rejections are done only between observations from the
same
dataset. The usual case of multiple datasets is MAD data.
If ANOMALOUS ON is set, then the main outlier test is done in
the merging step only within the I+ & I- sets for that
reflection,
ie
Bijvoet-related reflections are treated as independent. The ALL
keyword here enables an additional test on all observations
including
I+ & I-
observations. Observations rejected on this second check
are
flagged "@" in the ROGUES file.
REJECT BATCH <batchrejectfactor> is intended for batch
scaling
of eg XFEL data. After the initial scales are calculated, very
weak
batches with scale factors less than batchrejectfactor x the median
scale are rejected.
REJECT NONE skips all outlier checking, REJECT EMAX 0.0 switches
off Emax testing
- Subkeys:
- SEPARATE
- rejection & deviation calculations only between
observations
from the same dataset
- COMBINE
- rejection & deviation calculations are done with all
datasets
[default]
- SCALE
- use these values for the scaling pass
- MERGE
- use these values for the merging (FINAL) pass
- sdrej
- sd multiplier for maximum deviation from weighted mean I
[default 6.0]
- [sdrej2]
- special value for reflections measured twice [default =
sdrej]
- ALL
- check outliers in merging step between as well as within
I+
& I- sets (not
relevant if ANOMALOUS OFF). A negative value [default -8]
means adjust
the value upwards according to the slope of the normal
probability
analysis of anomalous differences (AnomPlot)
- sdrej+-
- sd multiplier for maximum deviation from weighted mean I
including
all I+ & I- observations (not relevant if ANOMALOUS OFF)
- [sdrej2+-]
- special value for reflections measured twice [default =
sdrej+-]
- KEEP
- in merging, if two observations disagree, keep both of
them
[default]
- REJECT
- in merging, if two observations disagree, reject both of
them
- LARGER
- in merging, if two observations disagree, reject the
larger
- SMALLER
- in merging, if two observations disagree, reject the
smaller
- EMAX
- maximum
acceptable value for E = normalised |F|, <= 0.0 to switch
off test
[default = 10.0 for acentrics]. Observations are only
rejected if E
> EMAX and I/sd(I) > sdrej, to allow for inaccurate
normalisation
in very weak high resolution bins.
The test for outliers is described in Appendix
4
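The basic deviation test can be sketched as follows (a simplified illustration of the sdrej criterion, not the full Appendix 4 algorithm):

```python
import math

def deviates_too_much(i_obs, sd_obs, others, sdrej=6.0):
    # Compare one observation against the weighted mean of the remaining
    # observations of the same reflection; flag it if the deviation
    # exceeds sdrej standard deviations
    if not others:
        return False
    wsum = sum(1.0 / sd ** 2 for _, sd in others)
    mean = sum(i / sd ** 2 for i, sd in others) / wsum
    var_mean = 1.0 / wsum
    dev = abs(i_obs - mean) / math.sqrt(sd_obs ** 2 + var_mean)
    return dev > sdrej
```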
ANOMALOUS [OFF] [ON]
- OFF [default]
- no anomalous used, I+ & I- observations averaged
together
in merging
- ON
- separate anomalous observations in the final output pass,
for
statistics
& merging: this is also selected by the keyword ANOMALOUS
on its own
RESOLUTION [RUN <RunNumber>]
[[LOW]
<Resmin>]
[[HIGH] <Resmax>]
Set resolution limits in Angstrom, either order, optionally for
individual
datasets. The keywords LOW or HIGH, followed by a number, may be
used
to set the
low or high resolution limits explicitly: an unset limit will be
set as
in the input HKLIN file. If a RUN is specified this limit applies
only
to that run: this may override a previous general limit for all runs, and
may be
used with automatic run generation. [Default use all data]
TITLE <new title>
Set new title to replace the one taken from the input file. By
default,
the title is copied from hklin to hklout
ANALYSIS [CONE <angle>] [CCMINIMUM
<MinimumHalfdatasetCC>] [CCANOMMINIMUM
<MinimumHalfdatasetAnomCC>] [ISIGMINIMUM
<MinimumIoverSigma>] [BATCHISIGMINIMUM
<MinimumBatchIoverSigma>] [GROUPBATCH
<BatchGroupRange>]
Specify analysis parameters:
CONE specifies the half-angle (degrees) for cones around each
reciprocal axis, for anisotropy analysis [default 20°].
CCMINIMUM & ISIGMINIMUM specify thresholds for estimation of
suitable
maximum resolution limits, both overall and along each reciprocal
axis.
These estimates are printed in the final Results summary, and give
a guide to possible cut-offs. BATCHISIGMINIMUM gives the threshold for
the
analysis of maximum resolution by batch, on <I/sd> before
averaging. CCANOMMINIMUM is the threshold for analysis of the
resolution limit of strong anomalous differences, from CC(1/2)anom.
Resolution estimates from CC(1/2) and CC(1/2)anom are done by
fitting a
function (1/2)(1 - tanh(z)) where z = (s - d0)/r, s = 1/d^2, and d0
is
the value of s for which the function = 0.5, and r controls the
steepness of falloff. For very negative CCs (usually from CCanom),
an
additional offset parameter dcc is added: (1/2)(1 - tanh(z)) * dcc -
dcc + 1. The fitted function is plotted along with the values. This
curve-fitting was suggested by Ed Pozharski.
- MinimumHalfdatasetCC minimum half-dataset CC(1/2)
[default 0.3]
- MinimumIoverSigma minimum
<<I>/sd(<I>)> (=~ signal/noise) [default 1.5]
- MinimumBatchIoverSigma
minimum <I/sd(I)> (=~ signal/noise) [default 1.0, a
smaller value
as I/sd is before averaging]
- MinimumHalfdatasetAnomCC minimum half-dataset
CCanom [default 0.15]
BatchGroupRange: in the analyses against Batch, the batches
(images) are grouped to reduce the number of ranges, with a group
size
of BatchGroupRange degrees [default 1.0 degrees]
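The fitted falloff function can be sketched in Python (illustrative; the offset-parameter variant for very negative CCs is omitted):

```python
import math

def cc_half_model(s, d0, r):
    # (1/2)(1 - tanh(z)), z = (s - d0)/r, s = 1/d^2;
    # equals 0.5 at s = d0, with steepness of falloff controlled by r
    z = (s - d0) / r
    return 0.5 * (1.0 - math.tanh(z))
```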
ONLYMERGE
Only do the merge step, no initial analysis, no scaling. If
RESTORE
is also given, the SDCORRECTION optimising will also be skipped.
DUMP [<Scale_file_name>]
Dump all scale factors to a file after the main scaling. These
can be used
to restart scaling using the RESTORE option, or for rerunning the
merge
step. If no filename is given, the scales will be written to
logical
file
SCALES, which may be assigned on the command line.
RESTORE [<Scale_file_name>]
Read scales and SDcorrection parameters from a SCALES file from a
previous run of Aimless
(see DUMP).
REFINE [CYCLES <Ncycle>] [BFGS
| FH | REFERENCE] [SELECT <IovSDmin> <E2min>
[<E2max>]]
[PARALLEL [AUTO] | <Nprocessors> |
<Fractionprocessors>]
Define number of refinement cycles
Ncycle and method for scale refinement.
BFGS use BFGS optimisation (usual method)
FH use Fox-Holmes
least-squares algorithm (not recommended)
REFERENCE scale
to an external reference
dataset, specified as a merged MTZ file with the HKLREF
command. This
should contain intensities (either IMEAN or I+/I-), or amplitudes F
which will be squared to intensities: intensities are strongly
preferred, as squared Fs which have been "truncated" are
significantly
biased. The LABREF command may be used to
specify the column label,
otherwise the first intensity (or F) will be used. If I+ and I- are
given, Imean for scale refinement is calculated as the unweighted
mean.
sigma(I) (if present) is assumed to be in the column following the
intensity.
SELECT define selection limits for the two rounds of scaling.
If unset, suitable values will be chosen automatically
- IovSDmin <I>/sd'(I) limit for selection of
reflections for 1st round scaling (< 0 for automatic
selection)
- E2min minimum E2 for
selection of reflections for main scaling [default 0.8]
- E2max maximum E2 for selection of
reflections for main scaling [default 5.0]
PARALLEL use multiple processors for the scale refinement
steps, if available. This produces some speed-up for very large
jobs.
For this option to be available, the
program must be compiled and linked with the "-fopenmp" option,
and the
environment variable OMP_NUM_THREADS must be set to the maximum
number
of threads allowed by the system
- <Nprocessors> number of processors to use (this
will be forced to be < OMP_NUM_THREADS)
- <Fprocessors> (< 1.0) fraction of OMP_NUM_THREADS
to use
- AUTO
[default if no argument to PARALLEL] determine the number of
processors to use from the number of observations in the file,
currently 1 processor / 200 000 observations, up to the
maximum allowed
(the optimum settings for this have yet to be determined)
EXCLUDE BATCH <batch range> | <batch list>
- BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
- Define a list of batches, or a range of batches, to be
excluded
altogether.
TIE [SURFACE <Sd_srf>] [BFACTOR
<Sd_bfac>] [ZEROB <Sd_zerob>] [ROTATION <Sd_z>]
[TILE
<Sd1-5>] [TARGETTILE <r0> <w0>]
Apply or remove restraints to parameters. These can be pairs of
neighbouring scale factors on rotation axis (ROTATION = primary
beam) to have the same
value, or neighbouring Bfactors, or surface spherical harmonic
parameters to zero (for SECONDARY or SURFACE corrections, to keep
the
correction approximately spherical), with a standard deviation as
given. This may be used if scales are varying too wildly,
particularly
in the detector plane. The default is no restraints on scales. A
tie
is recommended for SECONDARY or SURFACE corrections, eg TIE
SURFACE 0.001. A negative SD value indicates no tie.
- SURFACE: tie surface parameters to spherical surface
[default
is TIE
SURFACE 0.001]
- BFACTOR: tie Bfactors along rotation
- ZEROB: tie all B-factors to zero
- ROTATION: tie parameters along rotation axis (mainly useful
with
BATCH mode)
- TILE: tie the CCD tile parameters. 5 SDs for radius r, width
w, amplitude A, centre x0,y0, and Fourier coefficients
- TARGETTILE: target values for tile parameters r and w
OUTPUT [MTZ] [NO]MERGED [UNMERGED [SPLIT|TOGETHER]]
[SCALEPACK [MERGED |
UNMERGED]]
Control what goes in the output file. Two types of output
files
may be produced, either in MTZ format or in Scalepack format: (a)
MERGED (or AVERAGE), average intensity for each hkl (I+ &
I-) (b) UNMERGED, unaveraged observations, but with scales
applied, partials summed or scaled, and outliers rejected. Up to
four
types of files may be created at the same time: UNMERGED filenames
are
created from the HKLOUT filename (with dataset
appended if there are multiple datasets) with the string "_unmerged"
appended. If there are multiple datasets, by default MTZ
files,
merged or unmerged, are split into separate files (SPLIT).
Unmerged MTZ files may optionally include all datasets if the
keyword
TOGETHER qualifies UNMERGED.
The default is to create a merged MTZ file for each dataset.
- File format options:
- NONE
- no output file written
- MERGED or AVERAGE
- [default] output averaged intensities, <I+> &
<I->
for each hkl
- UNMERGED
- apply scales, sum or scale partials, reject outliers, but
do
not
average observations
- SCALEPACK or POLISH
- Write reflections to a formatted file in a format as
written by
"scalepack" (or my best approximation to it). If the
UNMERGED option is also selected, then the output
matches the scalepack "output nomerge original index",
otherwise it is
the "normal" scalepack output, with either I, sigI or I+
sigI+, I-,
sigI-, depending on the "anomalous" flag.
KEEP [OVERLOADS|BGRATIO
<bgratio_max>|PKRATIO <pkratio_max>|GRADIENT
<bg_gradient_max>|EDGE | MISFIT]
Set options to accept observations flagged as rejected by the
FLAG
column from Mosflm. By default, any
observation with FLAG .ne. 0 is rejected. Flagged reflections
which are accepted may be marked in the ROGUES file.
- Subkeys:
- OVERLOADS
- Accept profile-fitted overloads
- BGRATIO
- Observations are flagged in Mosflm if the ratio of rms
background
deviation relative to its expected value from counting
statistics is
too large. This option accepts observations if bgratio <
bgratio_max
[default in Mosflm 3.0]
- PKRATIO
- Accept observations with peak fitting rms/sd ratio pkratio
<
pkratio_max [default maximum in Mosflm 3.5]. Only set for
fully
recorded
observations
- GRADIENT
- Accept observations with background gradient <
bg_gradient_max [default in Mosflm 0.03].
- EDGE
- Accept profile-fitted observations on edge of active area
of
detector
- MISFIT
- Accept reflections flagged as MISFIT by XDS (in
XDS_ASCII.HKL file), ie flagged as outliers in the CORRECT
step
LINK [SURFACE] ALL | <run_2> TO
<run_1>
run_2 will use the same SURFACE (SECONDARY or ABSORPTION)
parameters
as run_1. This can be useful when different runs come from the
same
crystal, and may stabilize the parameters. The keyword ALL will be
assumed if omitted.
- For SECONDARY or ABSORPTION parameters, the default is to
link
runs
which come from the same crystal as long as they have similar
wavelengths. They should be UNLINKed if they are different.
UNLINK [SURFACE] ALL | <run_2> TO
<run_1>
Remove links set by LINK command (or by default). The keyword ALL
will
be assumed if omitted
BINS [RESOLUTION] <Nsbins> INTENSITY
<Nibins>
Define number of resolution and intensity bins for analysis
[default 10]
SMOOTHING <subkeyword>
<value> NOT YET
DONE
Set smoothing factors ("variances" of weights). A larger
"variance" leads to greater smoothing
- Subkeys:
- TIME <Vt>
- smoothing of B-factors [default 0.5]
- ROTATION <Vz>
- smoothing of scale along rotation [default 1.0]
- PROB_LIMIT
<DelMax_t> <DelMax_z>
<DelMax_xy>
- maximum values of normalized squared deviation (del**2/V)
to
include
a scale [default set automatically, typically 3]
NAME
PROJECT <project_name> CRYSTAL <crystal_name> DATASET
<dataset_name>
Assign or
reassign project/crystal/dataset names, for output file.
The names given here supersede those in the input file and redefine
the single output dataset.
Note that these names apply to all data: if multiple datasets are
required, these must be specified in Pointless. DATASET must be
present, and may optionally be given in the syntax
crystal_name/dataset_name
BASE [CRYSTAL <crystal_name>] DATASET <base_dataset_name> NOT YET DONE
If there are multiple datasets in the input file, define the "base" dataset for analysis of dispersive (isomorphous) differences. Differences between other datasets and the base dataset are analysed for correlation and ratios, ie for the i'th dataset (I(i) - I(base)). By default, the dataset with the shortest wavelength will be chosen as the base (or dataset 1 if the wavelength is unknown). Typically, the CRYSTAL keyword may be omitted.
HKLIN <input file name>
Filename for the main input file, as an alternative to specifying it on the command line.
HKLOUT <output file name>
Filename for the output file, as an alternative to specifying it on the command line.
XMLOUT <output XML file name>
Filename for the XML output file, as an alternative to specifying it on the command line.
HKLREF <reference file name>
Filename for a reference reflection MTZ file, as an alternative to specifying it on the command line. This file is used to provide a "best" estimate of intensity, possibly for the option to refine against a reference set (see above). This reference set is also used to compare to the scaled observed data, analysing its agreement as a function of batch, as R-factors and correlation coefficients, so that particularly bad regions of data may be detected. Column labels may be specified with the LABREF command.
For refinement against reference data, this file should be merged measured intensities from a reference crystal, or possibly amplitudes (deprecated due to bias from the "truncate" procedure).
For analysis, this reference data could also, for example, be calculated from the best current model, eg the FC_ALL_LS column from Refmac. Amplitudes are squared to intensities, and intensities are scaled to the merged observations with a scale and an anisotropic temperature factor. This is an alternative to giving a coordinate file XYZIN from which structure factors will be calculated.
LABREF [F | I =] <columnlabel>
For an HKLREF file, this defines the column label for intensity or amplitude (which will be squared to an intensity). If this command is omitted, the first intensity column (or, if there are no intensities, the first amplitude) will be used. The next column is assumed to contain the corresponding sigma. Note that provided the columns are contiguous, only the first of the set need be specified or chosen automatically, eg LABREF I=I(+) will pick up I(+), SIGI(+), I(-), SIGI(-)
XYZIN <reference coordinate file name>
The filename for a reference coordinate set, for analysis but not for refinement. Structure factors will be calculated to use as a reference, in the same way as HKLREF. This provides a current "best" estimate of intensity, and the observed data is analysed for its agreement as a function of batch, as R-factors and correlation coefficients, so that particularly bad regions of data may be detected. The file should contain a valid space group name (the full name with spaces, eg "P 21 21 21", "P 1 21 1" etc) and unit cell parameters (ie a CRYST1 line in PDB format).
ROGUES <rogues file name>
File name for the rogues file; otherwise it defaults to ROGUES, or may be assigned on the command line
INPUT AND OUTPUT FILES
Input
- HKLIN
- The input file must be sorted on H K L M/ISYM BATCH
Compulsory columns:
H K L indices
M/ISYM partial flag, symmetry number
BATCH batch number
I intensity (integrated intensity)
SIGI sd(intensity) (integrated intensity)
Optional columns:
XDET YDET position on detector of this reflection: these
may be in any units (e.g. mm or pixels), but the
range of values must be specified in the
orientation data block for each batch.
ROT rotation angle of this reflection ("Phi"). If
this column is absent, only SCALES BATCH is valid.
IPR intensity (profile-fitted intensity)
SIGIPR sd(intensity) (profile-fitted intensity)
SCALE previously calculated scale factor (e.g. from
previous run of Scala). This will be applied
on input
SIGSCALE sd(SCALE)
TIME time for B-factor variation (if this is
missing, ROT is used instead)
MPART partial flag from Mosflm
FRACTIONCALC calculated fraction, required to SCALE PARTIALS
LP Lorentz/polarization correction (already applied)
FLAG error flag (packed bits) from Mosflm (v6.2.3
or later). By default, if this column is present,
observations with a non-zero FLAG will be
omitted. They may be conditionally accepted
using the KEEP command (qv)
Bit flags:
1 BGRATIO too large
2 PKRATIO too large
4 Negative > 5*sigma
8 BG Gradient too high
16 Profile fitted overload
32 Profile fitted "edge" reflection
BGPKRATIOS packed background & peak ratios, & background
gradient, from Mosflm, to go with FLAG
LATTNUM lattice number for multilattice data
Hn, Kn, Ln hkl indices for overlapped observations with multilattice data
HKLREF reference file for analysis of agreement by batch. This may contain intensities or amplitudes (which will be squared), eg the FC_ALL_LS column from Refmac. The label is specified on the LABREF command
XYZIN as an alternative to HKLREF, a coordinate file may be given, from which amplitudes and intensities will be calculated
Output
Reflection files output
In all cases, separate files are written for each dataset: files are named with the base HKLOUT name with the dataset name appended, as "_dataset"
(a) HKLOUT: option OUTPUT [MTZ] MERGED
The output file contains columns
H K L IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-)
Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively, and are always present even for the option ANOMALOUS OFF.
- (b) HKLOUTUNMERGED: option OUTPUT [MTZ] UNMERGED
- Unmerged data with scales applied, with no partials (i.e. partials have been summed or scaled, and unmatched partials removed), & outliers rejected. Only a single scaled intensity value is written, chosen as summation, profile-fitted or combined as specified by the INTENSITIES command. Columns defining the diffraction geometry (e.g. FRACTIONCALC XDET YDET ROT TIME WIDTH LP) will be preserved in the output file. If HKLOUTUNMERGED is not specified, then the filename for the unmerged file has "_unmerged" appended to HKLOUT
Output columns:
H,K,L REDUCED or ORIGINAL indices (see OUTPUT options)
M/ISYM Symmetry number (REDUCED), = 1 for ORIGINAL indices
BATCH batch number as for input
I, SIGI scaled intensity & sd(I)
SCALEUSED scale factor applied
SIGSCALEUSED sd(SCALE applied)
NPART number of parts, = 1 for fulls, negated for scaled
partials, i.e. = -1 for scaled single part partial
FRACTIONCALC total fraction (if present in input file)
TIME copied from input if present
XDET,YDET copied from input if present
ROT copied from input if present (averaged for
multi-part partials)
WIDTH copied from input if present
LP copied from input if present
(c) SCALEPACK: option OUTPUT SCALEPACK MERGED
If a SCALEPACK filename is not specified then the filename will be taken from HKLOUT with the extension ".sca"
(d) SCALEPACKUNMERGED: option OUTPUT SCALEPACK UNMERGED
If a SCALEPACKUNMERGED filename is not specified then the filename will be taken from SCALEPACK with "_unmerged" appended and the extension ".sca"
Other output files
- XMLOUT
- XML output for plotting etc. It includes the NORMPLOT, ANOMPLOT, CORRELPLOT and ROGUEPLOT data, as well as the $TABLE graph data
- SCALES
- scale factors from DUMP, used by the RESTORE option
- ROGUES
- list of bad agreements
- TILEIMAGE
- a detector image representing the CCD TILE correction, if activated, in ADSC image format which may be viewed with adxv
The following 4 files are also represented in the XMLOUT file:
- NORMPLOT
- normal probability plot from the merge stage
*** this is at present written in a format for the plotting program xmgr (aka [xm]grace), but can also be read by loggraph ***
- ANOMPLOT
- normal probability plot of anomalous differences
(I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***
- CORRELPLOT
- scatter plot of pairs of anomalous differences (in multiples of RMS) from random half-datasets. One of these files is generated for each output dataset
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***
- ROGUEPLOT
- a plot of the position on the detector (on an ideal virtual detector with the rotation axis horizontal) of rejected outliers, with the position of the principal ice rings shown
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***
REFERENCES
- P.R. Evans and G.N. Murshudov, "How good are my data and what is the resolution?", Acta Cryst. D69, 1204-1214 (2013)
- P.R. Evans, "An introduction to data reduction: space-group determination, scaling and intensity statistics", Acta Cryst. D67, 282-292 (2011)
- P.R. Evans, "Scaling and assessment of data quality", Acta Cryst. D62, 72-82 (2006). Note that the definitions of Rmeas and Rpim in this paper are missing a square root on the (1/n-1) factor
- W. Kabsch, J. Appl. Cryst. 21, 916-924 (1988)
- P.R. Evans, "Data reduction", Proceedings of the CCP4 Study Weekend, 1993, on Data Collection & Processing, pages 114-122
- P.R. Evans, "Scaling of MAD Data", Proceedings of the CCP4 Study Weekend, 1997, on Recent Advances in Phasing
- R. Read, "Outlier rejection", Proceedings of the CCP4 Study Weekend, 1999, on Data Collection & Processing
- Hamilton, Rollett & Sparks, Acta Cryst. 18, 129-130 (1965)
- Blessing, R.H., Acta Cryst. A51, 33-38 (1995)
- Kay Diederichs & P. Andrew Karplus, "Improved R-factors for diffraction data analysis in macromolecular crystallography", Nature Structural Biology 4, 269-275 (1997)
- Manfred Weiss & Rolf Hilgenfeld, "On the use of the merging R factor as a quality indicator for X-ray data", J. Appl. Cryst. 30, 203-205 (1997)
- Manfred Weiss, "Global indicators of X-ray data quality", J. Appl. Cryst. 34, 130-135 (2001)
Appendix 1: Partially recorded reflections
In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also has a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 301, 302, 303), allowing a check that all parts have been found: MPART = 10 for partials already summed in MOSFLM.
Summed partials:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The parameters for the checks are set by the PARTIALS command. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.
- if the CHECK option is set (the default if an MPART column is present), the MPART flags are examined. If they are consistent, the summed intensity is accepted. If they are inconsistent (quite common), the total fraction is checked (TEST). NOCHECK switches off this check.
- if the TEST option is set (default), the summed reflection is accepted if the total fraction (the sum of the FRACTIONCALC values) lies between <lower_limit> -> <upper_limit> [default limits = 0.95 1.05]
- if the CORRECT option is set, the total intensity is scaled by the inverse total fraction for total fractions between <minimum_fraction> and <lower_limit>. This also works for a single unmatched partial. This correction relies on accurate FRACTIONCALC values, so beware.
- if the GAP option is set (not recommended), partials with a gap in them are accepted, e.g. a partial over 3 parts with the middle one missing. The GAP option implies TEST & NOCHECK, & the CORRECT option may also be set.
By setting the TEST & CORRECT limits, you can control summation & scaling of partials, e.g.
TEST 1.2 1.2 CORRECT 0.5
will scale up all partials with a total fraction between 0.5 & 1.2
TEST 0.95 1.05
will accept summed partials 0.95->1.05, no scaling
TEST 0.95 1.05 CORRECT 0.4
will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95
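The TEST/CORRECT acceptance logic above can be sketched in a few lines. This is an illustrative sketch only (the function and parameter names are not the program's internals):

```python
def accept_summed_partial(total_fraction, intensity,
                          lower=0.95, upper=1.05, minimum_fraction=None):
    """Return (accepted, corrected_intensity) for a summed partial.

    TEST: accept if lower <= total_fraction <= upper.
    CORRECT: if minimum_fraction is set, scale the intensity by the
    inverse total fraction for fractions in [minimum_fraction, lower).
    """
    if lower <= total_fraction <= upper:
        return True, intensity                   # accepted as summed
    if minimum_fraction is not None and minimum_fraction <= total_fraction < lower:
        return True, intensity / total_fraction  # scaled up by 1/fraction
    return False, None                           # rejected

# e.g. TEST 0.95 1.05 CORRECT 0.4: a partial with total fraction 0.7
# is accepted after scaling by 1/0.7
ok, i_corr = accept_summed_partial(0.7, 100.0, minimum_fraction=0.4)
```

As the text warns, the CORRECT branch is only as good as the FRACTIONCALC values it divides by.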
Appendix 2: Scaling algorithm
For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing
Sum( whl * ( Ihl - ghl * Ih )**2 )   (Ref Hamilton, Rollett & Sparks)
where Ih is the current best estimate of the "true" intensity
Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2 )
An alternative method scales to an external previously-determined reference dataset, minimising
Sum( whl * ( Ihl - ghl * Ihref )**2 )
where Ihref is the reference intensity and the weight whl = 1/(var(Ihl) + var(Ihref))
Each observation is assigned to a "run", which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.
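The Hamilton/Rollett/Sparks minimization above can be illustrated with a toy alternation: hold the scales fixed to estimate Ih, then hold Ih fixed to update the scales by least squares. This sketch uses one scale per batch and fixes the first batch's scale to remove the overall-scale indeterminacy; the real program refines the smooth parameterisation described below:

```python
from collections import defaultdict

def refine_scales(obs, n_iter=20):
    """obs: list of (h, batch, I, w). Returns dict batch -> inverse scale g."""
    g = defaultdict(lambda: 1.0)
    for _ in range(n_iter):
        # best estimate of the "true" intensity Ih for each unique h:
        # Ih = Sum(w*g*I) / Sum(w*g**2)
        num, den = defaultdict(float), defaultdict(float)
        for h, b, I, w in obs:
            num[h] += w * g[b] * I
            den[h] += w * g[b] ** 2
        Ih = {h: num[h] / den[h] for h in num}
        # least-squares update of each batch scale given the current Ih
        gnum, gden = defaultdict(float), defaultdict(float)
        for h, b, I, w in obs:
            gnum[b] += w * Ih[h] * I
            gden[b] += w * Ih[h] ** 2
        g = defaultdict(lambda: 1.0, {b: gnum[b] / gden[b] for b in gnum})
        # fix the overall scale (otherwise any multiple of g is a solution)
        g0 = next(iter(g.values()))
        for b in g:
            g[b] /= g0
    return dict(g)
```

For two batches where the second records every intensity twice as strong, the alternation converges to inverse scales of 1 and 2.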
The inverse scale factor ghl is derived as follows:
ghl = Thl * Chl * Shl
where Thl is an optional relative B-factor contribution, Chl is a scale factor, and Shl is an anisotropic correction expressed as spherical harmonics (ie the SECONDARY, ABSORPTION options).
a) B-factor (optional)
For each run, a relative B-factor (Bi) is determined at intervals in "time" ("time" is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, ... tn). Then for an observation measured at time tl
B = Sum[i=1,n] ( p(delt) Bi ) / Sum ( p(delt) )
where Bi are the B-factors at time ti
delt = tl - ti
p(delt) = exp( -(delt)**2 / Vt )
Vt is the "variance" of the weight, & controls the smoothness of interpolation
Thl = exp( +2 s B )
s = (sin theta / lambda)**2
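The Gaussian-weighted interpolation above is straightforward to write down; a minimal sketch (function names are illustrative):

```python
import math

def interpolated_b(t_l, t_nodes, b_nodes, Vt=0.5):
    """B at time t_l from the run's B-factors Bi at times ti:
    B = Sum(p(delt)*Bi) / Sum(p(delt)), p(delt) = exp(-delt**2/Vt)."""
    ws = [math.exp(-(t_l - ti) ** 2 / Vt) for ti in t_nodes]
    return sum(w * b for w, b in zip(ws, b_nodes)) / sum(ws)

def thl(B, s):
    """Thl = exp(+2 s B), with s = (sin theta / lambda)**2."""
    return math.exp(2.0 * s * B)
```

A larger Vt spreads the weights over more nodes, which is exactly the SMOOTHING TIME behaviour described earlier: more smoothing, less local variation.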
b) Scale factors
For each run, scale factors Cz are determined at intervals in rotation angle z. Then for an observation at position z0,
Chl(z0) = Sum(z)[ p(delz) * Cz ] / Sum(z)[ p(delz) ]
where delz = z - z0
p(delz) = exp( -delz**2 / Vz )
Vz is the "variance" of the weight & controls the smoothness of interpolation
For the SCALES BATCH option, the scale along z is discontinuous: the normal option has one scale factor for each batch.
c) Anisotropy factor
The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of
(1) the secondary beam (SECONDARY correction) in the camera spindle frame,
(2) the secondary beam (ABSORPTION correction) in the crystal frame, permuted to put either a*, b* or c* along the spherical polar axis
- SECONDARY beam direction (camera frame)
s = [Phi] [UB] h
s2 = s - s0
s2' = [-Phi] s2
Polar coordinates:
s2' = (x y z)
PolarTheta = arctan( sqrt(x**2 + y**2) / z )
PolarPhi = arctan( y / x )
where [Phi] is the spindle rotation matrix, [-Phi] is its inverse, [UB] is the setting matrix, h = (h k l)
- ABSORPTION: secondary beam direction (permuted crystal frame)
s = [Phi] [UB] h
s2 = s - s0
s2c' = [-Q] [-U] [-Phi] s2
Polar coordinates:
s2c' = (x y z)
PolarTheta = arctan( sqrt(x**2 + y**2) / z )
PolarPhi = arctan( y / x )
where [Phi] is the spindle rotation matrix, [-Phi] is its inverse, [Q] is a permutation matrix to put h, k, or l along z (see the POLE option), [U] is the orientation matrix, [B] is the orthogonalization matrix, h = (h k l)
then
Shl = 1 + Sum[l=1,lmax] Sum[m=-l,+l] Clm Ylm(PolarTheta, PolarPhi)
where Ylm is the spherical harmonic function for the direction given by the polar angles, and Clm are the coefficients determined by the program
Notes:
- The initial term "1" is essentially the l = 0 term, but with a fixed coefficient.
- The number of terms = (lmax + 1)**2 - 1
- Even terms (ie l even) are centrosymmetric, odd terms antisymmetric
- Restraining all terms to zero (with the TIE SURFACE command) reduces the anisotropic correction. This should always be done
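A minimal sketch of evaluating Shl for lmax = 1, written with the real spherical harmonics (an assumption here, since the refined Clm are real); the three l = 1 terms illustrate the (lmax+1)**2 - 1 term count and the antisymmetry of odd terms:

```python
import math

def real_ylm_l1(theta, phi):
    """Real l=1 spherical harmonics Y(1,-1), Y(1,0), Y(1,+1)
    at polar angle theta, azimuth phi."""
    c = math.sqrt(3.0 / (4.0 * math.pi))
    return [c * math.sin(theta) * math.sin(phi),   # Y(1,-1)
            c * math.cos(theta),                   # Y(1, 0)
            c * math.sin(theta) * math.cos(phi)]   # Y(1,+1)

def surface_factor(theta, phi, clm):
    """Shl = 1 + Sum_m C(1,m) Y(1,m): 3 terms for lmax = 1."""
    return 1.0 + sum(c * y for c, y in zip(clm, real_ylm_l1(theta, phi)))

# all coefficients zero -> no anisotropic correction, Shl = 1
assert surface_factor(0.7, 1.2, [0.0, 0.0, 0.0]) == 1.0
```

Because l = 1 is odd, the correction term changes sign under inversion of the beam direction (theta -> pi - theta, phi -> phi + pi), which is the antisymmetry noted above.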
(d) Detector correction (TILES)
A correction for tiled CCD detectors has been implemented to attempt to correct for the underestimation of spots falling in the corner of the detector. The present model expresses a correction factor in terms of an erfc function of the distance from the tile centre, such that the correction = 1 in the centre of the tile and falls off at the edge and corners.
For a spot at position x,y relative to the tile centre, normalised by the tile width in pixels such that x & y run from -1 to +1, the distance from the centre (x0,y0) is
d = sqrt[ (x-x0)**2 + (y-y0)**2 ]
correction factor g = A f(z) + 1 - A
where A is the amplitude of the correction near the edge and f(z) is a radial function of the modified "radius" z = (2/w)(d - r - w). r defines the point at which the scale starts to decline from 1.0, and w the "width" of the fall-off.
Currently f(z) = 0.5 erfc(z), though other expressions have been tried.
The amplitude A varies azimuthally with the angle phi = arctan(y/x) as a Fourier series, A = A0{ a cos(phi) + b sin(phi) + c cos(2phi) + d sin(2phi) }
Refined parameters for each tile are r, w, A0, x0, y0, and the four Fourier terms for A: a, b, c, d.
By default, parameters are restrained (TIE) as follows (see TIE TILE):
A0, a, b, c, d and x0, y0 are tied to 0.0 with their SDs
r, w are tied to target values with their SDs [default 0.70, 0.40]
r, w, and A0 are tied to be similar over all tiles
Five SD values control the strength of the restraints, respectively for r, w, A0, x0|y0, and abcd
SD = 0 switches off the restraint
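The radial part of the tile model can be sketched directly from the formulas above (constant amplitude A here, ignoring the azimuthal Fourier variation; default r, w taken from the TIE targets, A is an illustrative value):

```python
import math

def tile_correction(x, y, r=0.70, w=0.40, A=0.05, x0=0.0, y0=0.0):
    """g = A*f(z) + 1 - A with f(z) = 0.5*erfc(z),
    z = (2/w)*(d - r - w), d = distance from tile centre (x0, y0).
    x, y are normalised to run -1..+1 across the tile."""
    d = math.sqrt((x - x0) ** 2 + (y - y0) ** 2)
    z = (2.0 / w) * (d - r - w)
    f = 0.5 * math.erfc(z)
    return A * f + 1.0 - A
```

At the centre d = 0, z is strongly negative, erfc(z) -> 2, so f -> 1 and g -> 1; far into the corner f -> 0 and g -> 1 - A, reproducing the fall-off described above.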
Appendix 3: Data from Denzo
DENZO is often run refining the cell and orientation angles for each image independently, then postrefinement is done in Scalepack. It is essential that you do this postrefinement. Either then reintegrate the images with the cell parameters fixed, or use unmerged output from Scalepack as input to Aimless. The DENZO or SCALEPACK outputs will need to be converted to a multi-record MTZ file using COMBAT (see the COMBAT documentation) or POINTLESS (for Scalepack output only).
Both of these options have some problems:
- If you take the output from Denzo into Scala, there may be problems with partially recorded reflections: it is difficult for Scala to determine reliably that it has all parts of a partial to sum together.
- If you take unmerged output from Scalepack into Aimless, most of the geometrical information about how the observations were collected is lost, so many of the scaling options in Aimless are not available. Only batch scaling can be used, but simultaneous scaling of several wavelengths or derivatives may still be useful
Appendix 4: Outlier algorithm
The test for outliers is as follows:
- (1) if there are 2 observations (left), then
- (a) for each observation Ihl, test the deviation
Delta(hl) = (Ihl - ghl Iother) / sqrt[ sigIhl**2 + (ghl*sdIother)**2 ]
against sdrej2, where Iother = the other observation
- (b) if either |Delta(hl)| > sdrej2, then
- in scaling, reject the reflection. Or:
- in merging,
- keep both (default, or if the KEEP subkey is given) or
- reject both (subkey REJECT) or
- reject the larger (subkey LARGER) or
- reject the smaller (subkey SMALLER).
- (2) if there are 3 or more observations left, then
- (a) for each observation Ihl,
- calculate the weighted mean of all other observations <I>n-1 & its sd(<I>n-1)
- deviation
Delta(hl) = (Ihl - ghl <I>n-1) / sqrt[ sigIhl**2 + (ghl*sd(<I>n-1))**2 ]
- find the largest deviation max|Delta(hl)|
- count the number of observations for which Delta(hl) >= 0 (ngt), & for which Delta(hl) < 0 (nlt)
- (b) if max|Delta(hl)| > sdrej, then reject one observation, but which one?
- if ngt == 1 or nlt == 1, then one observation is a long way from the others, and this one is rejected
- else reject the one with the worst deviation max|Delta(hl)|
- (3) iterate from the beginning
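The two-observation test in step (1) can be sketched as follows; this is illustrative only, and the sdrej2 default shown is a placeholder, not the program's value:

```python
import math

def deviation(I, sdI, g, I_other, sd_other):
    """Delta(hl) = (Ihl - ghl*Iother) / sqrt(sigIhl**2 + (ghl*sdIother)**2)."""
    return (I - g * I_other) / math.sqrt(sdI ** 2 + (g * sd_other) ** 2)

def pair_outlier(obs1, obs2, sdrej2=6.0):
    """obs = (I, sdI, g); True if either deviation exceeds the threshold,
    i.e. the pair fails the step (1) consistency test above."""
    I1, s1, g1 = obs1
    I2, s2, g2 = obs2
    return (abs(deviation(I1, s1, g1, I2, s2)) > sdrej2 or
            abs(deviation(I2, s2, g2, I1, s1)) > sdrej2)
```

With three or more observations the same deviation is computed against the weighted mean of the others, and the ngt/nlt counts decide which single observation to drop, as in step (2).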
RELEASE NOTES
0.7.4 More robust normalisation for Emax test. Keep Emax outliers if most observations of a reflection are large. Normalisation still needs more work.
0.7.3 Make it work when excluding 1st of multiple datasets. Make initial scaling more robust by using weighted mean(I). Do Emax test before outlier test. More robust normalisation; Emax rejection also tests I/sd(I) to avoid rejecting weak data in shells with small <I>.
0.7.2 small fix to SD analysis table
0.7.1 allow gaps in data if explicit runs are given (unless you give "initial minimum_overlap > 0.0"). Improved analysis of secondary corrections
0.7.0 REFINE REFERENCE option, for scaling to external reference intensities
0.6.4 monitor deviant but kept observations in ROGUES file
0.6.3 trap RESTORE with one parameter, turn on ONLYMERGE
0.6.2 Bug fix for only one batch (eg from merged data)
0.6.1 Group batches for analysis, see ANALYSIS GROUPBATCH. Improve SDCORRECTION SAMPLE, sample variances: compare individual propagated SDs with sample SDs. Limit Emax test to "reliable" resolution range (not very weak high resolution ranges, needs further improvement). Many variables converted from float to double. Chi^2 statistic against intensity, resolution and batch. Cumulative CC(1/2) vs. batch. Do the analysis even if scaling can't be done due to insufficient information in the input reflection file (usually data that is already scaled, eg in XSCALE)
0.5.29 trap case of insufficient data with one rotation range, make dump/restore work for that case
0.5.28 small format change in RunPairs to avoid column coalescence
0.5.27 More accurate averaging of wavelengths
0.5.26 REJECT EMAX 0.0 switches off Emax test. Big speed-up in SF calculation (cf Pointless)
0.5.25 fix bug introduced in 0.5.24 which removed I+ & I- from the merged file if either were negative
0.5.24 fix bug if different resolutions for different runs. Small bug fix in secondary beam scaling. Switch off secondary scaling in first pass, stabilises the scaling in some cases. Trap negative scales (shouldn't happen). Fixes to sample SDs (for high multiplicity), compare sample SD to propagated SD. Output secondary beam corrections to logfile and XML. Reset SdFac after first round scaling. Default SDCORRECTION SAME (instead of INDIVIDUAL)
0.5.22, 23 Improved (ie corrected from version 0.5.18) detection of parts of data which do not have any scaling overlaps with other rotation ranges, see INITIAL MINIMUM_OVERLAP & MAXIMUM_GAP. Some changes to resolution limit determination
0.5.21 if XMLOUT is assigned on the command line, open it early so that syntax errors get added
0.5.19 add resolution limit at I/sd > 2 for Frank von Delft
0.5.18 Some changes to improve robustness for low multiplicity, mainly for small molecule data. No scaling if multiplicity is too low. Added INITIAL MINIMUM_MULTIPLICITY option. Changes to choice of observations for scaling and SD optimisation. Trap negative secondary scales. Correct derivatives in TIEs (doesn't make much difference)
0.5.17 Bug fix to error and warning printing
0.5.16 improved the robustness of SDcorrection refinement with a few fulls. REJECT NONE option. Option to accept XDS "misfits" (outliers) (KEEP MISFIT)
0.5.15 bug fix in run pair correlations. Fix to XML for multiple datasets
0.5.14 bug fix in Rmerge(batch), was double counting. Change to expect stdin input unless "--no-input" given on command line. Added "descriptions" to graphs, for ccp4i2.
0.5.13 fix memory leak in resolution calculation
0.5.10,11,12 bug fix in setting spherical harmonic orders. Keep empty batches in unmerged file output. Bug fix in scaling one lattice from multilattice data
0.5.9 bug fix to allow ABSORPTION <lmax> to work. Fix for restore problem with variances & tiles
0.5.8 bug fix for case SCALES CONSTANT BROTATION with one run (not sensible anyway). Also fixed bug when there are different resolution limits for different datasets
0.5.7 minor bug fix to resolution tables
0.5.6 bug fix for SCALE CONSTANT with more than one run
0.5.5 bug fix in radiation damage analysis. Rescale Scalepack output if intensities are small
0.5.4 bug fix for Sca output with one of I+ or I- missing
0.5.3 fix save/restore bug for BFACTOR OFF
0.5.2 bug fix for already merged data. Fix long-standing rare bug in hash table
0.5.1 "improved" SD correction refinement. Added [UN]LINK commands, improved default linking
0.4.10 add SD analysis graph to XML
0.4.8,9 improved robustness of maximum resolution curve fit
0.4.7 better trap for no data in SDCORRECTION refinement
0.4.5,6 fill in missing IPR columns from I, shouldn't normally happen
0.4.2,3,4 Bug fixes. Unmerged SCA files written with corrected symmetry translations
0.4.1 Inflate sd(I) using estimated parameter errors from inverse normal matrix (see USESDPARAMETER). Fixed bug in TILE correction. Added curve fit for maximum resolution estimation
0.3.11 Fixed nasty bug from XDS->Pointless giving Assertion failed: (sd > 0.0), function Average
0.3.10 Bug fixes. Also restrict run-run correlations to < 200 runs.
0.3.9 Added matrix of run-run cross-correlations
0.3.8 bug fixes to make EXCLUDE BATCH <range> option work
0.3.7 Bug fixes for unusual case of runs with all fulls and no fulls. Options for XFEL data: SDCORRECTION SAMPLESD; Rsplit. Bug fixes for Batch scaling with rejected batches. REJECT BATCH option for batch scaling
0.3.6 Fix bug with explicit RUN definitions.
0.3.4,5 remove debug print for self-overlaps. Pick up number of parts for previously summed partials (MPART column from Feckless), for partial bias analysis
0.3.3 fixed save/restore for TILE correction. Fixed reading of SDcorrection parameters
0.3.2 more corrections to multilattice handling (mapping lattice number to run number for scaling)
0.3.1 optional reference data for analysis of agreement by batch, either as structure factors (or intensities) HKLREF, LABREF, or coordinates (XYZIN)
0.2.20 fix bug for single B-factor/run
0.2.18,19 updates from Pointless for multiple lattices. Corrected calculation of anomalous multiplicity
0.2.17 fix bug in setting same resolution bin widths for multiple datasets when NBINS is set
0.2.16 message for std::bad_alloc, running out of memory
0.2.15 fix to XML graphs (for ccp4i2)
0.2.14 fix to correctly append to MTZ history
0.2.13 activate writing spacegroup confidence. Reflection status flags cleared before outlier checks
0.2.12 fix bug in reading multilattice files
0.2.10 small bug fix in radiation damage analysis
0.2.9 fix for Batch scaling if no phi range information
0.2.8 XML changes for I2 report. Change automatic anomalous thresholds, always output anom statistics
0.2.7 Bug fix in XML if no orientation data (ROGUEPLOT)
0.2.6 Fix to output multilattice overlaps. Added radiation damage analysis as in CHEF, for Graeme Winter
0.2.5 Fix so that BINS RESOLUTION works
0.2.4 Bug for XDS data, was omitting reflections with FRACTIONCALC (derived from IPEAK) < 0.95, leading to incompleteness
0.2.3 Now does reject and record Emax outliers properly (though work is continuing on improving this). Fixed small bug in analyseoverlaps.
0.2.2 fixed bug in Bdecay plot when batches omitted. Explicit Xrange for XML batch plots. No ROGUEPLOT if no orientation data. List overlaps in ROGUES file
0.2.1 some major reorganisations. Added XML output. SCALES TILE option. Handling of multilattice data. SDCORRECTION SIMILAR
0.1.30 allow TIE with negative sd to turn off tie, as documented. Also fixed bug in ABSORPTION
0.1.29 small change to Result table to work with Baubles' arcane (and undocumented) rules for Magic Tables
0.1.28 bug fix in "sdcorrection same"
0.1.27 bug fix in minimizer which sometimes affected the case with just 2 parameters
0.1.26 Default to "scales secondary"
0.1.25 omit sigI<=0, process REJECT command properly, small bug fix in smoothed Bfactors
0.1.24 small bug fix in printing batch tables with multiple datasets
0.1.22,23 INITIAL UNITY option. In tables, print batches with no observations but not rejected batches. Put title into output file. Fix initial scale bug with 3 scales
0.1.21 corrections to ROGUEPLOT, ice rings were in wrong place (by a factor of wavelength)
0.1.20 made sdcorrection refinement more robust to low multiplicity. If anomalous off (or no anomalous detected), statistics are now printed over all I+ I- together. Reject large negative observations (default E < -5)
0.1.19 preliminary addition of spg_confidence status. Bug fix from valgrind (from Marcin)
0.1.18 changed tablegraph to fix compilation problem (va_start)
0.1.17 bug fix in outlier rejection, problem with large variances leading to inconsistencies in Rogues file and some over-rejection
0.1.16 made SDcorrection refinement more robust
0.1.14,15 various bug fixes (including memory leaks), fixed autorun generation, improved SD correction for large anomalous, constrain cell to lattice group, etc
0.1.12 Half-dataset CC labelled as "CC(1/2)"
0.1.11 Small bug fixes
0.1.9 autodetect anomalous. Plot Rmeas for each run
0.1.7 fix for SCALES CONSTANT from XSCALE
0.1.6 anisotropy analysis against planes in trigonal, hexagonal and tetragonal systems (including rhombohedral axes), principal anisotropic axes in monoclinic and triclinic, cone analyses weighted according to cos(AngleFromPrincipalDirection). Fixed cases where multiple datasets have different resolution limits
0.1.4,5 more fixes for multiple datasets, dump/restore. OUTPUT UNMERGED SPLIT is default
0.1.3 More "resolution run" bug fixes
0.1.2 REFINE PARALLEL option (thanks to Ronan Keegan). Fixed bug in "resolution run" options
0.1.1 fixed bugs in writing ROGUES file; introduced HKLOUTUNMERGED etc filename specifiers; cleaned up Unmerged output; added Rfull to tables
0.1.0 fixed some bugs found by cppcheck and valgrind
0.0.16 fixed small bug in INTENSITIES COMBINE optimisation
0.0.15 if run definitions are given explicitly, then unspecified batches are excluded
0.0.14 Added optimisation for INTENSITIES COMBINE, for Mosflm data. This is now the default
AUTHOR
Phil Evans, MRC Laboratory of Molecular Biology, Cambridge
(pre@mrc-lmb.cam.ac.uk)
See above for Release Notes.
SEE ALSO