AIMLESS (CCP4: Supported Program)
NAME
aimless
- scale together multiple observations of
reflections
SYNOPSIS
aimless HKLIN foo_in.mtz HKLOUT foo_out.mtz
[Keyworded Input]
References
Input and Output files
Release Notes
DESCRIPTION
Running the program
Scaling options
Control of flow through the program
Partially recorded reflections
Scaling algorithm
Data from Denzo
Datasets
This program scales together multiple observations of reflections,
and merges them into an average intensity: it is a successor
program to SCALA.
Various scaling models can be used. The scale factor is a
function
of the primary beam direction, either as a smooth function of Phi
(the
rotation angle ROT), or expressed as BATCH (image) number
(strongly deprecated).
In addition,
the scale may be a function of the secondary beam direction,
acting
principally as an absorption correction expanded as spherical
harmonics. The secondary beam correction is related to the
absorption anisotropy
correction described by Blessing (Blessing, 1995).
The merging algorithm analyses the data for outliers, and gives
detailed analyses. It generates a weighted mean of the
observations
of the same reflection, after rejecting the outliers.
The program does several passes through the data:
- initial estimate of the
scales
- first round scale refinement, using strong data using an
I/sigma(I) cutoff
- first round of outlier rejection
- if both summation and profile-fitted intensity estimates are
present (eg from Mosflm), then the cross-over point is
determined between
using profile-fitted for weak data and summation for strong
data.
- first analysis pass to refine the "corrections" to the
standard
deviation estimates
- final round scale refinement, using strong data within limits
on the normalised intensity |E|^2
- final analysis pass to refine the "corrections" to the
standard
deviation estimates
- final outlier rejections
- a
final pass to apply scales, analyse agreement & write the
output file, usually with merged intensities, but alternatively
as file
with scaled but unmerged observations, with partials summed and
outliers rejected, for each dataset
Anomalous scattering is ignored during the scale
determination (I+ & I- observations are treated together), but
the
merged file always contains I+ & I-, even if the ANOMALOUS OFF
command is used. Switching ANOMALOUS ON does affect the statistics
and
the outlier rejection (qv).
Running the program
Aimless will often be run from the CCP4 GUI, but may also be run
from a
script. In a script the input and output files may be assigned on
the
command line, or some of them (marked with an asterisk in the list
below) may be assigned as keyworded input commands. The option
switch
"--no-input" forces the program to run immediately with default
options, without waiting for input commands, using file assignments
from the command line.
Input files:
HKLIN*, HKLREF*, XYZIN*
Output files:
HKLOUT*, XMLOUT*, SCALES*, ROGUES*, TILEIMAGE
Plot files also represented in XMLOUT:
ROGUEPLOT, NORMPLOT, ANOMPLOT, CORRELPLOT
Explicit file assignments for optional
output reflection files, otherwise generated from HKLOUT:
HKLOUTUNMERGED, SCALEPACK, SCALEPACKUNMERGED
Scaling options
The optimum form of the scaling will depend a great deal on how
the
data were collected. It is not possible to lay down definitive
rules,
but some of the following hints may help. For most purposes, my
normal
recommendation is the default
scales rotation spacing 5 secondary bfactor on brotation spacing 20
Other hints:-
- Only use the
SCALE BATCH option if every image is different from every other
one, i.e.
off-line detectors (including film), or rapidly or
discontinuously changing incident beam flux. This is
rarely the
case for synchrotron data, but is appropriate for serial data
(eg XFEL). This mode may be VERY slow if there are many batches.
- If there is a discontinuity between one set of images and
another
(e.g. change of exposure time), then flag them as
different
RUNs. This will be done automatically if no runs are specified.
- The SECONDARY correction is recommended and is the default:
this provides a
correction for absorption. It
should always be restrained with a TIE SURFACE command (this is
the
default): under these conditions it is reasonably stable under
most
conditions. The ABSORPTION
(crystal frame)
correction is similar to SECONDARY (camera frame) in most cases,
but
may be preferable if data has been collected from multiple
alignments
of the same crystal.
- Use a
B-factor correction unless the data are only very
low-resolution.
Traditionally, the relative B-factor is a correction for
radiation
damage (hence it is a function of time), but it also includes
some
other corrections eg absorption.
- When trying out more complex scaling options, it is a
good idea to try a simple scaling first, to check that
the more elaborate model gives a real improvement.
- When scaling multiple MAD data sets they should
all be scaled together in one pass, outliers rejected across all
datasets, then each wavelength merged separately. This is the
default if multiple datasets are present in the input file.
Other options are described in greater detail under the KEYWORDS.
Control of flow through the
program
The
ONLYMERGE flag skips the scaling (often in conjunction with RESTORE
to
read in previously determined scales), calculates statistics
and
outputs the data.
Partially recorded
reflections
See appendix 1
The different options for the treatment of partials are set
by the PARTIALS command. Partials may
either be summed or scaled: in the latter case, each part is
treated
independently of the others.
Summed partials [default]:
All the parts are summed (after applying
scales) to give the total intensity, provided some checks are
passed.
The number of reflections failing the checks is printed. You
should make sure that you are not losing too many reflections in
these
checks.
Scaled partials:
In this option, each individual partial observation is scaled up by
the
inverse
FRACTIONCALC, provided that the fraction is greater than
<minimum_fraction>
[default = 0.5]. This only works well if the calculated fractions
are
accurate, which is not usually the case.
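A one-line Python sketch of this rule (hypothetical helper name; the formula is as described above):

```python
def scaled_partial(i_part, fraction, minimum_fraction=0.5):
    # Each partial observation is inflated by the inverse calculated
    # fraction (FRACTIONCALC), provided the fraction exceeds the minimum
    if fraction <= minimum_fraction:
        return None  # fraction too small: observation not used
    return i_part / fraction
```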
Scaling algorithm
The normal scaling method improves the internal consistency of the
dataset by minimising
Sum( whl * ( Ihl - ghl * Ih )**2 )
See appendix 2 for more details
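The minimisation can be illustrated with a small Python sketch: for fixed inverse scales g_hl, the Ih that minimises the residual for one unique reflection is the weighted mean Sum(w*g*I)/Sum(w*g^2). Function names and example numbers are illustrative only, not the program's internals.

```python
def scaling_residual(obs, scales, weights, Ih):
    # Sum( w_hl * (I_hl - g_hl * Ih)^2 ) over observations of one reflection
    return sum(w * (i - g * Ih) ** 2 for i, g, w in zip(obs, scales, weights))

def best_mean_intensity(obs, scales, weights):
    # Least-squares estimate of Ih for fixed inverse scales g_hl:
    #   Ih = Sum(w*g*I) / Sum(w*g^2)
    num = sum(w * g * i for i, g, w in zip(obs, scales, weights))
    den = sum(w * g * g for g, w in zip(scales, weights))
    return num / den

obs = [100.0, 210.0, 95.0]    # observations I_hl of one reflection
scales = [1.0, 2.0, 1.0]      # inverse scale factors g_hl
weights = [1.0, 0.5, 1.0]     # w_hl, typically 1/var(I_hl)
Ih = best_mean_intensity(obs, scales, weights)
```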
Scaling to reference
THIS OPTION HAS NOT BEEN EXTENSIVELY TESTED
An alternative method scales to an external previously-determined
reference dataset, minimising
Sum( whl * ( Ihl - ghl * Ihref )**2 )
where Ihref is the reference intensity and the weight whl =
1/(var(Ihl) + var(Ihref))
This option might be useful for example in scaling long-wavelength
data
with high absorption to a short-wavelength set from a similar
crystal. It is specified with the command REFINE
REFERENCE.
Reference intensities are taken from an MTZ file of merged
intensities
specified as HKLREF (command or command
line).
If intensities are not available, amplitudes F are accepted, and
will
be squared to intensities, but note that Fs which come from the
French
& Wilson "truncate" procedure are seriously biased for small
intensities, so Fs are deprecated. A coordinate reference XYZIN is
not
accepted, as that does not seem to work well (in some limited
tests).
By default, the first intensity column (or amplitude column) in the
file is used, or the column may be explicitly set using the LABREF command. Note that provided the columns
are contiguous, only the first of the set need be specified (or it
will be chosen automatically):
eg LABREF I=I(+) will pick up I(+), SIGI(+), I(-), SIGI(-)
Data from Denzo
Data integrated with Denzo may be scaled and merged with Aimless as
an alternative to Scalepack, or unmerged output from scalepack may
be
used. Both have some limitations. See
appendix 3
for more details.
Datasets
TBD
KEYWORDED INPUT - DESCRIPTION
In the definitions below "[]" encloses optional items,
"|" delineates alternatives. All keywords are
case-insensitive, but are listed below in upper-case. Anything
after
"!" or "#" is treated as comment. The available
keywords are:
ANALYSIS, ANOMALOUS,
BINS,
DUMP,
EXCLUDE, HKLIN,
HKLOUT, HKLREF,
INITIAL, INTENSITIES, KEEP, LABREF, LINK,
NAME,
ONLYMERGE,
OUTPUT,
PARTIALS, REFINE,
REJECT,
RESOLUTION,
RESTORE, ROGUES,
RUN,
SCALES,
SDCORRECTION,
TIE,
TITLE, UNLINK, USESDPARAMETER, XMLOUT, XYZIN
RUN <Nrun> BATCH <b1> to
<b2>
Define a "run": Nrun is the Run number, with an arbitrary
integer label (i.e.
not necessarily 1,2,3 etc). A "run"
defines
a set of reflections which share a set of scale factors. Typically
a
run
will be a continuous rotation around a single axis. The definition
of a
run may use several RUN
commands. If no RUN command is given
then run assignment will be done automatically, with run breaks at
discontinuities in dataset, batch number or Phi. If any RUN
definitions
are given, then all batches not explicitly specified will be
excluded.
SCALES [<subkeys>]
Define layout of scales, ie the scaling model. Note that a layout
may be defined for all runs (no RUN subkeyword), then overridden
for
particular runs by additional commands.
- Subkeys:
- RUN <run_number>
- Define run to which this command applies: the run must
have
been previously
defined. If no run is defined, it applies to all runs
- ROTATION <Nscales> |
SPACING <delta_rotation>
- Define layout of scale factors along rotation axis (i.e.
primary beam),
either as number of scales or (if SPACING keyword present)
as interval
on rotation [default SPACING 5]
- BATCH
- Set "Batch" mode, no interpolation along rotation
(primary) axis. This option is compulsory if a ROT column is
not
present in
the input file, but otherwise the ROTATION option is
preferred. WARNING: this option is not optimised and may
take a very long
time if you have many batches
- BFACTOR ON | OFF
- Switch Bfactors on or off. The default is ON.
- BROTATION <Ntime>
| SPACING
<delta_time>
- Define number of B-factors or (if SPACING keyword present)
the
interval on "time": usually no time is defined in the input
file, and
the rotation angle is used as its proxy [default SPACING
20].
- SECONDARY [<Lmax>]
- Secondary beam correction expanded in spherical harmonics
up
to
maximum order Lmax in the camera spindle frame. The number
of
parameters increases as (Lmax + 1)**2, so you should use the
minimum
order needed (eg 4 - 6, default 4). The deviation of the
surface from spherical should be
restrained eg with TIE SURFACE 0.001 [default]. Set Lmax = 0
to switch off
- ABSORPTION [<Lmax>]
- Secondary beam correction expanded in spherical harmonics
up
to
maximum order Lmax in the crystal frame based on POLE (qv).
The number
of parameters increases as (Lmax + 1)**2, so you should use
the
minimum order needed (eg 4 - 6, default 4). The deviation of
the surface from spherical should be
restrained eg with TIE SURFACE 0.001 [default]. This is not
substantially different from SECONDARY in most cases, but
may be
preferred if data are collected from multiple settings of
the same
crystal, and you want to use the same absorption surface.
This would
only be strictly valid if the beam is larger than the
crystal.
- POLE <h|k|l>
- Define the polar axis for ABSORPTION or SURFACE as h, k
or l
(eg
POLE L): the pole will default to either the closest axis to
the
spindle (if known), or l (k for monoclinic space-groups).
- CONSTANT
- One scale for each run (equivalent to ROTATION 1)
- TILE <NtileX>
<NtileY> [CCD]
- Define a detector scale for each tile. Currently this
implements a scale model for 3x3 tiled CCD detectors to
correct for the
underestimation of intensities in the corners of the tile,
see Appendix 2.
If the detector appears to be a 3x3 CCD (3072x3072 pixels)
then this
correction will be activated automatically unless the NOTILE
keyword is
given. The parameters are restrained using the TIE TILE
parameters (qv)
- NOTILE
- Switch off the automatic TILE 3 3 correction for CCD
detectors
SDCORRECTION [[NO]REFINE]
[INDIVIDUAL | SAME [FIXSDB]
[RUN
<RunNumber>] [FULL | PARTIAL] <SdFac> [<SdB>]
<SdAdd> [DAMP <dampfactor>]
[SIMILAR [<sd1> <sd2> <sd3>]] ||
[[NO]TIE SdFac | SdB | SdAdd <targetvalue>
<SDtarget>]
[SAMPLESD]
Input or set options for the "corrections" to the input standard
deviations: these are modified to
sd(I) corrected = SdFac * sqrt{sd(I)**2 + SdB*Ihl + (SdAdd*Ihl)**2}
where Ihl is the intensity and
(SdB may be omitted in the input).
The default is "SDCORRECTION REFINE INDIVIDUAL". If explicit
values are given, the default changes to NOREFINE.
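The correction formula can be written as a short Python sketch (illustrative helper; the default parameter values shown here are arbitrary, not the program's):

```python
import math

def corrected_sd(sd, i, sdfac=1.0, sdb=0.0, sdadd=0.02):
    # sd(I)corrected = SdFac * sqrt( sd(I)^2 + SdB*Ihl + (SdAdd*Ihl)^2 )
    return sdfac * math.sqrt(sd * sd + sdb * i + (sdadd * i) ** 2)
```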
The keyword REFINE controls refinement of the correction
parameters,
essentially trying to make the SD of the distribution of
fractional deviations (Ihl - <I>)/sigma equal to 1.0 over all
intensity ranges. The residual minimised is Sum( w * (1 - SD)^2) +
Restraint Residual
SAMPLESD is intended for very high multiplicity data such as XFEL
serial data. The final SDs are estimated from the weighted
population
variance, assuming that the input sigma(I)^2 values are proportional
to
the true errors. This probably gives a more realistic estimate of
the
error in <I>. In this case refinement of the corrections is
switched off unless explicitly requested.
Other subkeys
control what values are determined and used for each run (if more
than one). TIE and SIMILAR are mutually exclusive
- SAME
[default] same SD
parameters for all runs, different for fulls and partials
- INDIVIDUAL use different SD parameters for
each run, fulls and partials
- FIXSDB
fixes
the SdB parameter in the refinement (but it seems best to let it
refine, even though it has no obvious physical meaning)
- DAMP
set
dampfactor to damp shifts in the refinement [default 0.05]
- SIMILAR
restrain
parameters to be the same for all runs, with SDs optionally
given for SdFac (sd1), SdB (sd2), and SdAdd (sd3) [defaults 0.2,
3.0, 0.04]
- TIE
set
restraints for named parameter, "SdFac", "SdB", or "SdAdd". Each
restraint is to a specified target value, with a weight =
1/(SDtarget^2). The default is to restrain SdB only, target
value 0.0,
SD 20. NOTIE removes all restraints, TIE without values sets the
defaults.
RUN <run_number>
Define the run for which values are given: the run must have been
previously
defined. If no run is defined, it applies to all runs. Different
values
may be specified for fully recorded reflections (FULL) and for
partially
recorded reflections (PARTIAL), or the same values may be used for
both if one set is given, e.g.
sdcorrection full 1.4 0.11 part 1.4 0.05
USESDPARAMETER [NO | DIAGONAL
| COVARIANCE]
For the final estimation of intensity errors sd(I), incorporate the
estimated error in the refined scale model parameters, as estimated
from the inverse normal matrix in the scale refinement. The default
is
DIAGONAL if this keyword is omitted, or given with no sub-keyword.
"NO"
switches it off. The DIAGONAL option uses the separate parameter
variances, ie the diagonal of the variance/covariance matrix.
COVARIANCE uses the full matrix, which is slower but may be more
accurate.
The variance/covariance matrix [V] = Sum(wD^2)/(m-n) [H]^-1, where
[H]
is the normal (Hessian) matrix, Sum(wD^2) is the minimised residual,
m
the number of observations, and n the number of parameters.
The scaled intensity I'hl = Ihl/ghl where
ghl is its inverse scale factor
Var(I')/I'^2 = Var(I)/I^2 + Var(g)/g^2
ie Var(I') = (1/g^2) [ Var(I) + I'^2 Var(g) ]
Var(g) = [dg/dp]T [V]
[dg/dp] (COVARIANCE
option)
where dg/dp is the vector of partial derivatives with respect
to
parameters p
DIAGONAL approximation: Var(g) = Sum(i) {
[dg/dp(i)]^2 V(i,i) } ie summed over parameters i
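Both options can be sketched in Python (illustrative function, not the program's code):

```python
def scaled_variance(i_obs, var_i, g, dgdp, V, diagonal=True):
    # I' = I/g;  Var(I') = (1/g^2) * ( Var(I) + I'^2 * Var(g) )
    # Var(g) from the parameter variance/covariance matrix [V]:
    #   full:     Var(g) = [dg/dp]^T [V] [dg/dp]
    #   diagonal: Var(g) = Sum(i) { [dg/dp(i)]^2 V(i,i) }
    n = len(dgdp)
    i_scaled = i_obs / g
    if diagonal:
        var_g = sum(dgdp[k] ** 2 * V[k][k] for k in range(n))
    else:
        var_g = sum(dgdp[j] * V[j][k] * dgdp[k]
                    for j in range(n) for k in range(n))
    return (var_i + i_scaled ** 2 * var_g) / g ** 2
```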
PARTIALS [[NO]CHECK] [TEST
[<lower_limit>
<upper_limit>] [CORRECT <minimum_fraction>] [[NO]GAP
[<maxgap>]]
Set criteria for accepting complete or incomplete partials.
Default is CHECK TEST 0.95 1.05 CORRECT 0.95 NOGAP
After
all parts have been assembled, the total observation is accepted
if:-
-
the CHECK flag is set [default]
and
the MPART flags (if present) are all consistent (these flags
indicate that a set of
parts is eg 1 of 3, 2 of 3, 3 of 3)
-
if CHECK fails, then the total
fraction is checked to lie between lower_limit &
upper_limit
[default 0.95, 1.05]
-
if this fails, then the incomplete partial is scaled up by
the
total fraction if it is > minimum_fraction [default
0.95] (NB Pointless has different default for a different
purpose)
- a reflection which has a gap in the middle may be accepted if GAP
is set; maxgap is the maximum number of missing slots [not
recommended: default 1 if GAP is set]
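The second and third checks can be sketched in Python (hypothetical helper; the MPART and GAP checks are omitted):

```python
def summed_partial_status(fractions, lower=0.95, upper=1.05, correct_min=0.95):
    # Accept the summed total if the total calculated fraction lies
    # within [lower, upper]; failing that, an incomplete partial may be
    # scaled up by the total fraction if it exceeds correct_min;
    # otherwise the assembled observation is rejected
    total = sum(fractions)
    if lower <= total <= upper:
        return "sum"
    if total > correct_min:
        return "scale"
    return "reject"
```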
INITIAL UNITY | MEAN |
MINIMUM_OVERLAP <minimum_overlap> | MAXIMUM_GAP
<maximum_gap>
Set initial scale factors either based on mean intensities (MEAN,
default) or all set to 1.0 (UNITY)
If the fractional overlap between rotation ranges is less than minimum_overlap
in too many rotation ranges, then scaling will be switched off (ie
ONLYMERGE), and so will SD correction refinement (SDCORRECTION
NOREFINE).
Default
value 0.05. Set to a value <= 0.0 to ignore this check. maximum_gap specifies the
maximum number of contiguous rotation ranges which are allowed to
fall below the minimum_overlap criterion,
default 2.
Fractional
overlap is (Number of
observations with matching observations in a
different rotation range)/(Total number of observations)
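A Python sketch of this definition (illustrative, using arbitrary reflection and range identifiers):

```python
from collections import defaultdict

def fractional_overlap(observations):
    # observations: (reflection_id, rotation_range_id) pairs.
    # Fraction of observations whose reflection also appears
    # in a different rotation range
    ranges = defaultdict(set)
    for hkl, rot_range in observations:
        ranges[hkl].add(rot_range)
    matched = sum(1 for hkl, _ in observations if len(ranges[hkl]) > 1)
    return matched / len(observations)
```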
INTENSITIES [SUMMATION | PROFILE |
COMBINE [<Imid>] [POWER <Ipower>]]
Set which intensity to use, of the integrated intensity (column
I)
or
profile-fitted (column IPR), if both are present. This applies to
all stages of the program, scaling & averaging. Mosflm
produces two
different estimates of the intensity, from summation integration
and
from profile fitting. Generally the profile-fitted estimate is
better,
but for the strongest reflections the summation value is often
better.
The default is to use a weighted mean, depending on the "raw"
intensity
ie before LP correction (COMBINE option), and to optimise
automatically
the switch-over point Imid, to give the best overall Rmeas.
- Subkeys:
- SUMMATION
- use summation integrated intensity Isum.
- PROFILE
- use profile-fitted intensity Ipr.
- COMBINE [<Imid>]
[POWER <Ipower>]
- Use weighted mean of profile-fitted & integrated
intensity,
profile-fitted for weak data, summation integration value
for strong.
- If
no value is given for Imid, it will be automatically
optimised
- I = w*Ipr + (1-w)*Isum
- w = 1/(1 + (Iraw/Imid)**Ipower)
- Ipower defaults to 3.
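A Python sketch of the COMBINE formula (illustrative helper name):

```python
def combined_intensity(ipr, isum, iraw, imid, ipower=3):
    # w = 1/(1 + (Iraw/Imid)**Ipower);  I = w*Ipr + (1-w)*Isum
    # weak data (Iraw << Imid): w -> 1, profile-fitted dominates;
    # strong data (Iraw >> Imid): w -> 0, summation dominates
    w = 1.0 / (1.0 + (iraw / imid) ** ipower)
    return w * ipr + (1.0 - w) * isum
```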
REJECT [SCALE | MERGE] [COMBINE]
[SEPARATE]
<Sdrej> [<Sdrej2>]
[ALL <Sdrej+-> [<Sdrej2+->]]
[KEEP | REJECT | LARGER | SMALLER]
[EMAX <Emax>]
[BATCH <batchrejectfactor>]
[NONE]
Define rejection criteria for outliers: different criteria may be
set for the scaling and for the merging passes. If neither
SCALE nor MERGE are specified, the same values are used for both
stages. The default values are REJECT 6 ALL -8, ie test within I+
or
I- sets on 6sigma, between I+ & I- with a threshold adjusted
upwards
from 8sigma according to the strength of the anomalous signal. The
adjustment of the ALL test is not necessarily reliable.
If there are multiple datasets, by default, deviation
calculations
include data from all datasets [COMBINE]. The SEPARATE flag means
that
outlier rejections are done only between observations from the
same
dataset. The usual case of multiple datasets is MAD data.
If ANOMALOUS ON is set, then the main outlier test is done in
the merging step only within the I+ & I- sets for that
reflection,
ie
Bijvoet-related reflections are treated as independent. The ALL
keyword here enables an additional test on all observations
including
I+ & I-
observations. Observations rejected on this second check
are
flagged "@" in the ROGUES file.
REJECT BATCH <batchrejectfactor> is intended for batch
scaling
of eg XFEL data. After the initial scales are calculated, very
weak
batches with scale factors less than batchrejectfactor x the median
scale are rejected.
REJECT NONE skips all outlier checking, REJECT EMAX 0.0 switches
off Emax testing
- Subkeys:
- SEPARATE
- rejection & deviation calculations only between
observations
from the same dataset
- COMBINE
- rejection & deviation calculations are done with all
datasets
[default]
- SCALE
- use these values for the scaling pass
- MERGE
- use these values for the merging (FINAL) pass
- sdrej
- sd multiplier for maximum deviation from weighted mean I
[default 6.0]
- [sdrej2]
- special value for reflections measured twice [default =
sdrej]
- ALL
- check outliers in merging step between as well as within
I+
& I- sets (not
relevant if ANOMALOUS OFF). A negative value [default -8]
means adjust
the value upwards according to the slope of the normal
probability
analysis of anomalous differences (AnomPlot)
- sdrej+-
- sd multiplier for maximum deviation from weighted mean I
including
all I+ & I- observations (not relevant if ANOMALOUS OFF)
- [sdrej2+-]
- special value for reflections measured twice [default =
sdrej+-]
- KEEP
- in merging, if two observations disagree, keep both of
them
[default]
- REJECT
- in merging, if two observations disagree, reject both of
them
- LARGER
- in merging, if two observations disagree, reject the
larger
- SMALLER
- in merging, if two observations disagree, reject the
smaller
- EMAX
- maximum
acceptable value for E = normalised |F|, <= 0.0 to switch
off test
[default = 10.0 for acentrics]. Observations are only
rejected if E
> EMAX and I/sd(I) > sdrej, to allow for inaccurate
normalisation
in very weak high resolution bins.
The test for outliers is described in Appendix
4
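The basic deviation test can be sketched as follows (a simplified illustration of the sdrej criterion, not the full Appendix 4 algorithm):

```python
import math

def deviates_too_much(i_obs, sd_obs, others, sdrej=6.0):
    # Compare one observation against the weighted mean of the remaining
    # observations of the same reflection; flag it if the deviation
    # exceeds sdrej standard deviations
    if not others:
        return False
    wsum = sum(1.0 / sd ** 2 for _, sd in others)
    mean = sum(i / sd ** 2 for i, sd in others) / wsum
    var_mean = 1.0 / wsum
    dev = abs(i_obs - mean) / math.sqrt(sd_obs ** 2 + var_mean)
    return dev > sdrej
```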
ANOMALOUS [OFF] [ON]
- OFF [default]
- no anomalous used, I+ & I- observations averaged
together
in merging
- ON
- separate anomalous observations in the final output pass,
for
statistics
& merging: this is also selected by the keyword ANOMALOUS
on its own
RESOLUTION [RUN <RunNumber>]
[[LOW]
<Resmin>]
[[HIGH] <Resmax>]
Set resolution limits in Angstrom, either order, optionally for
individual
datasets. The keywords LOW or HIGH, followed by a number, may be
used
to set the
low or high resolution limits explicitly: an unset limit will be
set as
in the input HKLIN file. If a RUN is specified this limit applies
only
to that run: this may override a previous general limit for all runs, and
may be
used with automatic run generation. [Default use all data]
TITLE <new title>
Set new title to replace the one taken from the input file. By
default,
the title is copied from hklin to hklout
ANALYSIS [CONE <angle>] [CCMINIMUM
<MinimumHalfdatasetCC>] [CCANOMMINIMUM
<MinimumHalfdatasetAnomCC>] [ISIGMINIMUM
<MinimumIoverSigma>] [BATCHISIGMINIMUM
<MinimumBatchIoverSigma>] [GROUPBATCH
<BatchGroupRange>]
Specify analysis parameters:
CONE specifies the half-angle (degrees) for cones around each
reciprocal axis, for anisotropy analysis [default 20°].
CCMINIMUM & ISIGMINIMUM specify thresholds for estimation of
suitable
maximum resolution limits, both overall and along each reciprocal
axis.
These estimates are printed in the final Results summary, and give
a guide to possible cut-offs. BATCHISIGMINIMUM gives the threshold for
the
analysis of maximum resolution by batch, on <I/sd> before
averaging. CCANOMMINIMUM is the threshold for analysis of the
resolution limit of strong anomalous differences, from CC(1/2)anom.
Resolution estimates from CC(1/2) and CC(1/2)anom are done by
fitting a
function (1/2)(1 - tanh(z)) where z = (s - d0)/r, s = 1/d^2, and d0
is
the value of s for which the function = 0.5, and r controls the
steepness of falloff. For very negative CCs (usually from CCanom),
an
additional offset parameter dcc is added: (1/2)(1 - tanh(z)) * dcc -
dcc + 1. The fitted function is plotted along with the values. This
curve-fitting was suggested by Ed Pozharski.
- MinimumHalfdatasetCC minimum half-dataset CC(1/2)
[default 0.3]
- MinimumIoverSigma minimum
<<I>/sd(<I>)> (=~ signal/noise) [default 1.5]
- MinimumBatchIoverSigma
minimum <I/sd(I)> (=~ signal/noise) [default 1.0, a
smaller value
as I/sd is before averaging]
- MinimumHalfdatasetAnomCC minimum half-dataset
CCanom [default 0.15]
BatchGroupRange: in the analyses against Batch, the batches
(images) are grouped to reduce the number of ranges, with a group
size
of BatchGroupRange degrees [default 1.0 degrees]
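The fitted falloff function can be sketched in Python (illustrative; the offset-parameter variant for very negative CCs is omitted):

```python
import math

def cc_half_model(s, d0, r):
    # (1/2)(1 - tanh(z)), z = (s - d0)/r, s = 1/d^2;
    # equals 0.5 at s = d0, with steepness of falloff controlled by r
    z = (s - d0) / r
    return 0.5 * (1.0 - math.tanh(z))
```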
ONLYMERGE
Only do the merge step, no initial analysis, no scaling. If
RESTORE
is also given, the SDCORRECTION optimising will also be skipped.
DUMP [<Scale_file_name>]
Dump all scale factors to a file after the main scaling. These
can be used
to restart scaling using the RESTORE option, or for rerunning the
merge
step. If no filename is given, the scales will be written to
logical
file
SCALES, which may be assigned on the command line.
RESTORE [<Scale_file_name>]
Read scales and SDcorrection parameters from a SCALES file from a
previous run of Aimless
(see DUMP).
REFINE [CYCLES <Ncycle>] [BFGS
| FH | REFERENCE] [SELECT <IovSDmin> <E2min>
[<E2max>]]
[PARALLEL [AUTO] | <Nprocessors> |
<Fractionprocessors>]
Define number of refinement cycles
Ncycle and method for scale refinement.
BFGS use BFGS optimisation (usual method)
FH use Fox-Holmes
least-squares algorithm (not recommended)
REFERENCE scale
to an external reference
dataset, specified as a merged MTZ file with the HKLREF
command. This
should contain intensities (either IMEAN or I+/I-), or amplitudes F
which will be squared to intensities: intensities are strongly
preferred, as squared Fs which have been "truncated" are
significantly
biased. The LABREF command may be used to
specify the column label,
otherwise the first intensity (or F) will be used. If I+ and I- are
given, Imean for scale refinement is calculated as the unweighted
mean.
sigma(I) (if present) is assumed to be in the column following the
intensity.
SELECT define selection limits for the two rounds of scaling.
If unset, suitable values will be chosen automatically
- IovSDmin <I>/sd'(I) limit for selection of
reflections for 1st round scaling (< 0 for automatic
selection)
- E2min minimum E2 for
selection of reflections for main scaling [default 0.8]
- E2max maximum E2 for selection of
reflections for main scaling [default 5.0]
PARALLEL use multiple processors for the scale refinement
steps, if available. This produces some speed-up for very large
jobs.
For this option to be available, the
program must be compiled and linked with the "-fopenmp" option,
and the
environment variable OMP_NUM_THREADS must be set to the maximum
number
of threads allowed by the system
- <Nprocessors> number of processors to use (this
will be forced to be < OMP_NUM_THREADS)
- <Fprocessors> (< 1.0) fraction of OMP_NUM_THREADS
to use
- AUTO
[default if no argument to PARALLEL] determine the number of
processors to use from the number of observations in the file,
currently 1 processor / 200 000 observations, up to the
maximum allowed
(the optimum settings for this have yet to be determined)
EXCLUDE BATCH <batch range> | <batch list>
- BATCH <b1> <b2> <b3> ... | <b1> TO <b2>
- Define a list of batches, or a range of batches, to be
excluded
altogether.
TIE [SURFACE <Sd_srf>] [BFACTOR
<Sd_bfac>] [ZEROB <Sd_zerob>] [ROTATION <Sd_z>]
[TILE
<Sd1-5>] [TARGETTILE <r0> <w0>]
Apply or remove restraints to parameters. These can be pairs of
neighbouring scale factors on rotation axis (ROTATION = primary
beam) to have the same
value, or neighbouring Bfactors, or surface spherical harmonic
parameters to zero (for SECONDARY or SURFACE corrections, to keep
the
correction approximately spherical), with a standard deviation as
given. This may be used if scales are varying too wildly,
particularly
in the detector plane. The default is no restraints on scales. A
tie
is recommended for SECONDARY or SURFACE corrections, eg TIE
SURFACE 0.001. A negative SD value indicates no tie.
- SURFACE: tie surface parameters to spherical surface
[default
is TIE
SURFACE 0.001]
- BFACTOR: tie Bfactors along rotation
- ZEROB: tie all B-factors to zero
- ROTATION: tie parameters along rotation axis (mainly useful
with
BATCH mode)
- TILE: tie the CCD tile parameters. 5 SDs for radius r, width
w, amplitude A, centre x0,y0, and Fourier coefficients
- TARGETTILE: target values for tile parameters r and w
OUTPUT [MTZ] [NO]MERGED [UNMERGED [SPLIT|TOGETHER]]
[SCALEPACK [MERGED |
UNMERGED]]
Control what goes in the output file. Two types of output
files
may be produced, either in MTZ format or in Scalepack format: (a)
MERGED (or AVERAGE), average intensity for each hkl (I+ &
I-) (b) UNMERGED, unaveraged observations, but with scales
applied, partials summed or scaled, and outliers rejected. Up to
four
types of files may be created at the same time: UNMERGED filenames
are
created from the HKLOUT filename (with dataset
appended if there are multiple datasets) with the string "_unmerged"
appended. If there are multiple datasets, by default MTZ
files,
merged or unmerged, are split into separate files (SPLIT).
Unmerged MTZ files may optionally include all datasets if the
keyword
TOGETHER qualifies UNMERGED.
The default is to create a merged MTZ file for each dataset.
- File format options:
- NONE
- no output file written
- MERGED or AVERAGE
- [default] output averaged intensities, <I+> &
<I->
for each hkl
- UNMERGED
- apply scales, sum or scale partials, reject outliers, but
do
not
average observations
- SCALEPACK or POLISH
- Write reflections to a formatted file in a format as
written by
"scalepack" (or my best approximation to it). If the
UNMERGED option is also selected, then the output
matches the scalepack "output nomerge original index",
otherwise it is
the "normal" scalepack output, with either I, sigI or I+
sigI+, I-,
sigI-, depending on the "anomalous" flag.
KEEP [OVERLOADS|BGRATIO
<bgratio_max>|PKRATIO <pkratio_max>|GRADIENT
<bg_gradient_max>|EDGE | MISFIT]
Set options to accept observations flagged as rejected by the
FLAG
column from Mosflm. By default, any
observation with FLAG .ne. 0 is rejected. Flagged reflections
which are accepted may be marked in the ROGUES file.
- Subkeys:
- OVERLOADS
- Accept profile-fitted overloads
- BGRATIO
- Observations are flagged in Mosflm if the ratio of rms
background
deviation relative to its expected value from counting
statistics is
too large. This option accepts observations if bgratio <
bgratio_max
[default in Mosflm 3.0]
- PKRATIO
- Accept observations with peak fitting rms/sd ratio pkratio
<
pkratio_max [default maximum in Mosflm 3.5]. Only set for
fully
recorded
observations
- GRADIENT
- Accept observations with background gradient <
bg_gradient_max [default in Mosflm 0.03].
- EDGE
- Accept profile-fitted observations on edge of active area
of
detector
- MISFIT
- Accept reflections flagged as MISFIT by XDS (in
XDS_ASCII.HKL file), ie flagged as outliers in the CORRECT
step
LINK [SURFACE] ALL | <run_2> TO
<run_1>
run_2 will use the same SURFACE (SECONDARY or ABSORPTION)
parameters
as run_1. This can be useful when different runs come from the
same
crystal, and may stabilize the parameters. The keyword ALL will be
assumed if omitted.
- For SECONDARY or ABSORPTION parameters, the default is to
link
runs
which come from the same crystal as long as they have similar
wavelengths. They should be UNLINKed if they are different.
UNLINK [SURFACE] ALL | <run_2> TO
<run_1>
Remove links set by LINK command (or by default). The keyword ALL
will
be assumed if omitted
BINS [RESOLUTION] <Nsbins> INTENSITY
<Nibins>
Define number of resolution and intensity bins for analysis
[default 10]
SMOOTHING <subkeyword>
<value> NOT YET
DONE
Set smoothing factors ("variances" of weights). A larger
"variance" leads to greater smoothing
- Subkeys:
- TIME <Vt>
- smoothing of B-factors [default 0.5]
- ROTATION <Vz>
- smoothing of scale along rotation [default 1.0]
- PROB_LIMIT
<DelMax_t> <DelMax_z>
<DelMax_xy>
- maximum values of normalized squared deviation (del**2/V)
to
include
a scale [default set automatically, typically 3]
NAME
PROJECT <project_name> CRYSTAL <crystal_name> DATASET
<dataset_name>
Assign or
reassign project/crystal/dataset names, for output file.
The names given here supersede those in the input file and redefine
the single output dataset.
Note that these names apply to all data: if multiple datasets are
required, these must be specified in Pointless. DATASET must be
present, and may optionally be given in the syntax
crystal_name/dataset_name
BASE [CRYSTAL <crystal_name>] DATASET <base_dataset_name> NOT YET DONE
If there are multiple datasets in the input file, define the "base" dataset for analysis of dispersive (isomorphous) differences. Differences between other datasets and the base dataset are analysed for correlation and ratios, ie for the i'th dataset (I(i) - I(base)). By default, the dataset with the shortest wavelength will be chosen as the base (or dataset 1 if the wavelength is unknown). Typically, the CRYSTAL keyword may be omitted.
HKLIN <input file name>
Filename for the main input file, as an alternative to specifying it on the command line.
HKLOUT <output file name>
Filename for the output file, as an alternative to specifying it on the command line.
XMLOUT <output XML file name>
Filename for the XML output file, as an alternative to specifying it on the command line.
HKLREF <reference file name>
Filename for a reference reflection MTZ file, as an alternative to specifying it on the command line. This file is used to provide a "best" estimate of intensity, possibly for the option to refine against a reference set (see above). This reference set is also used to compare to the scaled observed data, analysing its agreement as a function of batch, as R-factors and correlation coefficients, so that particularly bad regions of data may be detected. Column labels may be specified with the LABREF command.
For refinement against reference data, this file should be merged measured intensities from a reference crystal, or possibly amplitudes (deprecated due to bias from the "truncate" procedure).
For analysis, this reference data could also, for example, be calculated from the best current model, eg the FC_ALL_LS column from Refmac. Amplitudes are squared to intensities, and intensities are scaled to the merged observations with a scale and an anisotropic temperature factor. This is an alternative to giving a coordinate file XYZIN from which structure factors will be calculated.
LABREF [F | I =] <columnlabel>
For an HKLREF file, this defines the column label for intensity or amplitude (which will be squared to an intensity). If this command is omitted, the first intensity column (or, if there are no intensities, the first amplitude) will be used. The next column is assumed to contain the corresponding sigma. Note that provided the columns are contiguous, only the first of the set need be specified or chosen automatically, eg LABREF I=I(+) will pick up I(+), SIGI(+), I(-), SIGI(-)
XYZIN <reference coordinate file name>
The filename for a reference coordinate set, for analysis but not for refinement. Structure factors will be calculated to use as a reference, in the same way as HKLREF. This provides a current "best" estimate of intensity, and the observed data is analysed for its agreement as a function of batch, as R-factors and correlation coefficients, so that particularly bad regions of data may be detected. The file should contain a valid space group name (the full name with spaces, eg "P 21 21 21", "P 1 21 1" etc) and unit cell parameters (ie a CRYST1 line in PDB format).
ROGUES <rogues file name>
File name for the rogues file; otherwise it defaults to ROGUES, or may be assigned on the command line
INPUT AND OUTPUT FILES
Input
- HKLIN
- The input file must be sorted on H K L M/ISYM BATCH
Compulsory columns:
H K L indices
M/ISYM partial flag, symmetry number
BATCH batch number
I intensity (integrated intensity)
SIGI sd(intensity) (integrated intensity)
Optional columns:
XDET YDET position on detector of this reflection: these
may be in any units (e.g. mm or pixels), but the
range of values must be specified in the
orientation data block for each batch.
ROT rotation angle of this reflection ("Phi"). If
this column is absent, only SCALES BATCH is valid.
IPR intensity (profile-fitted intensity)
SIGIPR sd(intensity) (profile-fitted intensity)
SCALE previously calculated scale factor (e.g. from
previous run of Scala). This will be applied
on input
SIGSCALE sd(SCALE)
TIME time for B-factor variation (if this is
missing, ROT is used instead)
MPART partial flag from Mosflm
FRACTIONCALC calculated fraction, required to SCALE PARTIALS
LP Lorentz/polarization correction (already applied)
FLAG error flag (packed bits) from Mosflm (v6.2.3
or later). By default, if this column is present,
observations with a non-zero FLAG will be
omitted. They may be conditionally accepted
using the KEEP command (qv)
Bit flags:
1 BGRATIO too large
2 PKRATIO too large
4 Negative > 5*sigma
8 BG Gradient too high
16 Profile fitted overload
32 Profile fitted "edge" reflection
BGPKRATIOS packed background & peak ratios, & background
gradient, from Mosflm, to go with FLAG
LATTNUM lattice number for multilattice data
Hn, Kn, Ln hkl indices for overlapped observations with multilattice data
HKLREF reference file for analysis of agreement by batch. This may contain intensities or amplitudes (which will be squared), eg the FC_ALL_LS column from Refmac. The label is specified on the LABREF command
XYZIN as an alternative to HKLREF, a coordinate file may be given, from which amplitudes and intensities will be calculated
Output
Reflection files output
In all cases, separate files are written for each dataset: files are named with the base HKLOUT name with the dataset name appended, as "_dataset"
(a) HKLOUT: option OUTPUT [MTZ] MERGED
The output file contains columns
H K L IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-)
Note that there are no M/ISYM or BATCH columns. I(+) & I(-) are the means of the Bijvoet positive and negative reflections respectively, and are always present even for the option ANOMALOUS OFF.
- (b) HKLOUTUNMERGED: option OUTPUT [MTZ] UNMERGED
- Unmerged data with scales applied, with no partials (i.e. partials have been summed or scaled, and unmatched partials removed), & outliers rejected. Only a single scaled intensity value is written, chosen as summation, profile-fitted or combined as specified by the INTENSITIES command. Columns defining the diffraction geometry (e.g. FRACTIONCALC XDET YDET ROT TIME WIDTH LP) will be preserved in the output file. If HKLOUTUNMERGED is not specified, then the filename for the unmerged file has "_unmerged" appended to HKLOUT
Output columns:
H,K,L REDUCED or ORIGINAL indices (see OUTPUT options)
M/ISYM Symmetry number (REDUCED), = 1 for ORIGINAL indices
BATCH batch number as for input
I, SIGI scaled intensity & sd(I)
SCALEUSED scale factor applied
SIGSCALEUSED sd(SCALE applied)
NPART number of parts, = 1 for fulls, negated for scaled
partials, i.e. = -1 for scaled single part partial
FRACTIONCALC total fraction (if present in input file)
TIME copied from input if present
XDET,YDET copied from input if present
ROT copied from input if present (averaged for
multi-part partials)
WIDTH copied from input if present
LP copied from input if present
(c) SCALEPACK: option OUTPUT SCALEPACK MERGED
If a SCALEPACK filename is not specified then the filename will be taken from HKLOUT with the extension ".sca"
(d) SCALEPACKUNMERGED: option OUTPUT SCALEPACK UNMERGED
If a SCALEPACKUNMERGED filename is not specified then the filename will be taken from SCALEPACK with "_unmerged" appended and the extension ".sca"
Other output files
- XMLOUT
- XML output for plotting etc. It includes the NORMPLOT, ANOMPLOT, CORRELPLOT and ROGUEPLOT data, as well as the $TABLE graph data
- SCALES
- scale factors from DUMP, used by the RESTORE option
- ROGUES
- list of bad agreements
- TILEIMAGE
- a detector image representing the CCD TILE correction, if activated, in ADSC image format which may be viewed with adxv
The following 4 files are also represented in the XMLOUT file:
- NORMPLOT
- normal probability plot from the merge stage
*** this is at present written in a format for the plotting program xmgr (aka [xm]grace), but can also be read by loggraph ***
- ANOMPLOT
- normal probability plot of anomalous differences
(I+ - I-)/sqrt[sd(I+)**2 + sd(I-)**2]
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***
- CORRELPLOT
- scatter plot of pairs of anomalous differences (in multiples of RMS) from random half-datasets. One of these files is generated for each output dataset
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***
- ROGUEPLOT
- a plot of the position on the detector (on an ideal virtual detector with the rotation axis horizontal) of rejected outliers, with the position of the principal ice rings shown
*** this is at present written in a format for the plotting program xmgr (aka grace), but can also be read by loggraph ***
REFERENCES
- P.R. Evans and G.N. Murshudov, "How good are my data and what is the resolution?", Acta Cryst. D69, 1204-1214 (2013)
- P.R. Evans, "An introduction to data reduction: space-group determination, scaling and intensity statistics", Acta Cryst. D67, 282-292 (2011)
- P.R. Evans, "Scaling and assessment of data quality", Acta Cryst. D62, 72-82 (2006). Note that the definitions of Rmeas and Rpim in this paper are missing a square root on the (1/n-1) factor
- W. Kabsch, J. Appl. Cryst. 21, 916-924 (1988)
- P.R. Evans, "Data reduction", Proceedings of the CCP4 Study Weekend, 1993, on Data Collection & Processing, pages 114-122
- P.R. Evans, "Scaling of MAD Data", Proceedings of the CCP4 Study Weekend, 1997, on Recent Advances in Phasing
- R. Read, "Outlier rejection", Proceedings of the CCP4 Study Weekend, 1999, on Data Collection & Processing
- Hamilton, Rollett & Sparks, Acta Cryst. 18, 129-130 (1965)
- Blessing, R.H., Acta Cryst. A51, 33-38 (1995)
- Kay Diederichs & P. Andrew Karplus, "Improved R-factors for diffraction data analysis in macromolecular crystallography", Nature Structural Biology 4, 269-275 (1997)
- Manfred Weiss & Rolf Hilgenfeld, "On the use of the merging R factor as a quality indicator for X-ray data", J. Appl. Cryst. 30, 203-205 (1997)
- Manfred Weiss, "Global indicators of X-ray data quality", J. Appl. Cryst. 34, 130-135 (2001)
Appendix 1: Partially recorded reflections
In the input file, partials are flagged with M=1 in the M/ISYM column, and have a calculated fraction in the FRACTIONCALC column. Data from Mosflm also has a column MPART which enumerates each part (e.g. for a reflection predicted to run over 3 images, the 3 parts are labelled 301, 302, 303), allowing a check that all parts have been found: MPART = 10 for partials already summed in MOSFLM.
Summed partials:
All the parts are summed (after applying scales) to give the total intensity, provided some checks are passed. The parameters for the checks are set by the PARTIALS command. The number of reflections failing the checks is printed. You should make sure that you are not losing too many reflections in these checks.
- if the CHECK option is set (the default if an MPART column is present), the MPART flags are examined. If they are consistent, the summed intensity is accepted. If they are inconsistent (quite common), the total fraction is checked (TEST). NOCHECK switches off this check.
- if the TEST option is set (default), the summed reflection is accepted if the total fraction (the sum of the FRACTIONCALC values) lies between <lower_limit> -> <upper_limit> [default limits = 0.95 1.05]
- if the CORRECT option is set, the total intensity is scaled by the inverse total fraction for total fractions between <minimum_fraction> and <lower_limit>. This also works for a single unmatched partial. This correction relies on accurate FRACTIONCALC values, so beware.
- if the GAP option is set (not recommended), partials with a gap in them are accepted, e.g. a partial over 3 parts with the middle one missing. The GAP option implies TEST & NOCHECK, & the CORRECT option may also be set.
By setting the TEST & CORRECT limits, you can control summation & scaling of partials, e.g.
TEST 1.2 1.2 CORRECT 0.5
will scale up all partials with a total fraction between 0.5 & 1.2
TEST 0.95 1.05
will accept summed partials 0.95->1.05, no scaling
TEST 0.95 1.05 CORRECT 0.4
will accept summed partials 0.95->1.05, and scale up those with fractions between 0.4 & 0.95
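The TEST/CORRECT acceptance logic above can be sketched in a few lines. This is an illustrative sketch only (the function and parameter names are not the program's internals):

```python
def accept_summed_partial(total_fraction, intensity,
                          lower=0.95, upper=1.05, minimum_fraction=None):
    """Return (accepted, corrected_intensity) for a summed partial.

    TEST: accept if lower <= total_fraction <= upper.
    CORRECT: if minimum_fraction is set, scale the intensity by the
    inverse total fraction for fractions in [minimum_fraction, lower).
    """
    if lower <= total_fraction <= upper:
        return True, intensity                   # accepted as summed
    if minimum_fraction is not None and minimum_fraction <= total_fraction < lower:
        return True, intensity / total_fraction  # scaled up by 1/fraction
    return False, None                           # rejected

# e.g. TEST 0.95 1.05 CORRECT 0.4: a partial with total fraction 0.7
# is accepted after scaling by 1/0.7
ok, i_corr = accept_summed_partial(0.7, 100.0, minimum_fraction=0.4)
```

As the text warns, the CORRECT branch is only as good as the FRACTIONCALC values it divides by.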
Appendix 2: Scaling algorithm
For each reflection h, we have a number of observations Ihl, with estimated standard deviation shl, which defines a weight whl. We need to determine the inverse scale factor ghl to put each observation on a common scale (as Ihl/ghl). This is done by minimizing
Sum( whl * ( Ihl - ghl * Ih )**2 )   (Ref Hamilton, Rollett & Sparks)
where Ih is the current best estimate of the "true" intensity
Ih = Sum ( whl * ghl * Ihl ) / Sum ( whl * ghl**2 )
An alternative method scales to an external previously-determined reference dataset, minimising
Sum( whl * ( Ihl - ghl * Ihref )**2 )
where Ihref is the reference intensity and the weight whl = 1/(var(Ihl) + var(Ihref))
Each observation is assigned to a "run", which corresponds to a set of scale factors. A run would typically consist of a continuous rotation of a crystal about a single axis.
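The Hamilton/Rollett/Sparks minimization above can be illustrated with a toy alternation: hold the scales fixed to estimate Ih, then hold Ih fixed to update the scales by least squares. This sketch uses one scale per batch and fixes the first batch's scale to remove the overall-scale indeterminacy; the real program refines the smooth parameterisation described below:

```python
from collections import defaultdict

def refine_scales(obs, n_iter=20):
    """obs: list of (h, batch, I, w). Returns dict batch -> inverse scale g."""
    g = defaultdict(lambda: 1.0)
    for _ in range(n_iter):
        # best estimate of the "true" intensity Ih for each unique h:
        # Ih = Sum(w*g*I) / Sum(w*g**2)
        num, den = defaultdict(float), defaultdict(float)
        for h, b, I, w in obs:
            num[h] += w * g[b] * I
            den[h] += w * g[b] ** 2
        Ih = {h: num[h] / den[h] for h in num}
        # least-squares update of each batch scale given the current Ih
        gnum, gden = defaultdict(float), defaultdict(float)
        for h, b, I, w in obs:
            gnum[b] += w * Ih[h] * I
            gden[b] += w * Ih[h] ** 2
        g = defaultdict(lambda: 1.0, {b: gnum[b] / gden[b] for b in gnum})
        # fix the overall scale (otherwise any multiple of g is a solution)
        g0 = next(iter(g.values()))
        for b in g:
            g[b] /= g0
    return dict(g)
```

For two batches where the second records every intensity twice as strong, the alternation converges to inverse scales of 1 and 2.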
The inverse scale factor ghl is derived as follows:
ghl = Thl * Chl * Shl
where Thl is an optional relative B-factor contribution, Chl is a scale factor, and Shl is an anisotropic correction expressed as spherical harmonics (ie the SECONDARY, ABSORPTION options).
a) B-factor (optional)
For each run, a relative B-factor (Bi) is determined at intervals in "time" ("time" is normally defined as rotation angle if no independent time value is available), at positions ti (t1, t2, ... tn). Then for an observation measured at time tl
B = Sum[i=1,n] ( p(delt) Bi ) / Sum ( p(delt) )
where Bi are the B-factors at time ti
delt = tl - ti
p(delt) = exp( -(delt)**2 / Vt )
Vt is the "variance" of the weight, & controls the smoothness of interpolation
Thl = exp( +2 s B )
s = (sin theta / lambda)**2
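The Gaussian-weighted interpolation above is straightforward to write down; a minimal sketch (function names are illustrative):

```python
import math

def interpolated_b(t_l, t_nodes, b_nodes, Vt=0.5):
    """B at time t_l from the run's B-factors Bi at times ti:
    B = Sum(p(delt)*Bi) / Sum(p(delt)), p(delt) = exp(-delt**2/Vt)."""
    ws = [math.exp(-(t_l - ti) ** 2 / Vt) for ti in t_nodes]
    return sum(w * b for w, b in zip(ws, b_nodes)) / sum(ws)

def thl(B, s):
    """Thl = exp(+2 s B), with s = (sin theta / lambda)**2."""
    return math.exp(2.0 * s * B)
```

A larger Vt spreads the weights over more nodes, which is exactly the SMOOTHING TIME behaviour described earlier: more smoothing, less local variation.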
b) Scale factors
For each run, scale factors Cz are determined at intervals in rotation angle z. Then for an observation at position z0,
Chl(z0) = Sum(z)[ p(delz) * Cz ] / Sum(z)[ p(delz) ]
where delz = z - z0
p(delz) = exp( -delz**2 / Vz )
Vz is the "variance" of the weight & controls the smoothness of interpolation
For the SCALES BATCH option, the scale along z is discontinuous: the normal option has one scale factor for each batch.
c) Anisotropy factor
The optional surface or anisotropy factor Shl is expressed as a sum of spherical harmonic terms as a function of the direction of
(1) the secondary beam (SECONDARY correction) in the camera spindle frame,
(2) the secondary beam (ABSORPTION correction) in the crystal frame, permuted to put either a*, b* or c* along the spherical polar axis
- SECONDARY beam direction (camera frame)
s = [Phi] [UB] h
s2 = s - s0
s2' = [-Phi] s2
Polar coordinates:
s2' = (x y z)
PolarTheta = arctan( sqrt(x**2 + y**2) / z )
PolarPhi = arctan( y / x )
where [Phi] is the spindle rotation matrix, [-Phi] is its inverse, [UB] is the setting matrix, h = (h k l)
- ABSORPTION: secondary beam direction (permuted crystal frame)
s = [Phi] [UB] h
s2 = s - s0
s2c' = [-Q] [-U] [-Phi] s2
Polar coordinates:
s2c' = (x y z)
PolarTheta = arctan( sqrt(x**2 + y**2) / z )
PolarPhi = arctan( y / x )
where [Phi] is the spindle rotation matrix, [-Phi] is its inverse, [Q] is a permutation matrix to put h, k, or l along z (see the POLE option), [U] is the orientation matrix, [B] is the orthogonalization matrix, h = (h k l)
then
Shl = 1 + Sum[l=1,lmax] Sum[m=-l,+l] Clm Ylm(PolarTheta, PolarPhi)
where Ylm is the spherical harmonic function for the direction given by the polar angles, and Clm are the coefficients determined by the program
Notes:
- The initial term "1" is essentially the l = 0 term, but with a fixed coefficient.
- The number of terms = (lmax + 1)**2 - 1
- Even terms (ie l even) are centrosymmetric, odd terms antisymmetric
- Restraining all terms to zero (with the TIE SURFACE command) reduces the anisotropic correction. This should always be done
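A minimal sketch of evaluating Shl for lmax = 1, written with the real spherical harmonics (an assumption here, since the refined Clm are real); the three l = 1 terms illustrate the (lmax+1)**2 - 1 term count and the antisymmetry of odd terms:

```python
import math

def real_ylm_l1(theta, phi):
    """Real l=1 spherical harmonics Y(1,-1), Y(1,0), Y(1,+1)
    at polar angle theta, azimuth phi."""
    c = math.sqrt(3.0 / (4.0 * math.pi))
    return [c * math.sin(theta) * math.sin(phi),   # Y(1,-1)
            c * math.cos(theta),                   # Y(1, 0)
            c * math.sin(theta) * math.cos(phi)]   # Y(1,+1)

def surface_factor(theta, phi, clm):
    """Shl = 1 + Sum_m C(1,m) Y(1,m): 3 terms for lmax = 1."""
    return 1.0 + sum(c * y for c, y in zip(clm, real_ylm_l1(theta, phi)))

# all coefficients zero -> no anisotropic correction, Shl = 1
assert surface_factor(0.7, 1.2, [0.0, 0.0, 0.0]) == 1.0
```

Because l = 1 is odd, the correction term changes sign under inversion of the beam direction (theta -> pi - theta, phi -> phi + pi), which is the antisymmetry noted above.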
(d) Detector correction (TILES)
A correction for tiled CCD detectors has been implemented to attempt to correct for the underestimation of spots falling in the corner of the detector. The present model expresses a correction factor in terms of an erfc function of the distance from the tile centre, such that the correction = 1 in the centre of the tile and falls off at the edge and corners.
For a spot at position x,y relative to the tile centre, normalised by the tile width in pixels such that x & y run from -1 to +1, the distance from the centre (x0,y0) is
d = sqrt[ (x-x0)**2 + (y-y0)**2 ]
correction factor g = A f(z) + 1 - A
where A is the amplitude of the correction near the edge and f(z) is a radial function of the modified "radius" z = (2/w)(d - r - w). r defines the point at which the scale starts to decline from 1.0, and w the "width" of the fall-off.
Currently f(z) = 0.5 erfc(z), though other expressions have been tried.
The amplitude A varies azimuthally with the angle phi = arctan(y/x) as a Fourier series, A = A0{ a cos(phi) + b sin(phi) + c cos(2phi) + d sin(2phi) }
Refined parameters for each tile are r, w, A0, x0, y0, and the four Fourier terms for A: a, b, c, d.
By default, parameters are restrained (TIE) as follows (see TIE TILE):
A0, a, b, c, d and x0, y0 are tied to 0.0 with their SDs
r, w are tied to target values with their SDs [default 0.70, 0.40]
r, w, and A0 are tied to be similar over all tiles
Five SD values control the strength of the restraints, respectively for r, w, A0, x0|y0, and abcd
SD = 0 switches off the restraint
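The radial part of the tile model can be sketched directly from the formulas above (constant amplitude A here, ignoring the azimuthal Fourier variation; default r, w taken from the TIE targets, A is an illustrative value):

```python
import math

def tile_correction(x, y, r=0.70, w=0.40, A=0.05, x0=0.0, y0=0.0):
    """g = A*f(z) + 1 - A with f(z) = 0.5*erfc(z),
    z = (2/w)*(d - r - w), d = distance from tile centre (x0, y0).
    x, y are normalised to run -1..+1 across the tile."""
    d = math.sqrt((x - x0) ** 2 + (y - y0) ** 2)
    z = (2.0 / w) * (d - r - w)
    f = 0.5 * math.erfc(z)
    return A * f + 1.0 - A
```

At the centre d = 0, z is strongly negative, erfc(z) -> 2, so f -> 1 and g -> 1; far into the corner f -> 0 and g -> 1 - A, reproducing the fall-off described above.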
Appendix 3: Data from Denzo
DENZO is often run refining the cell and orientation angles for each image independently, then postrefinement is done in Scalepack. It is essential that you do this postrefinement. Either then reintegrate the images with the cell parameters fixed, or use unmerged output from Scalepack as input to Aimless. The DENZO or SCALEPACK outputs will need to be converted to a multi-record MTZ file using COMBAT (see the COMBAT documentation) or POINTLESS (for Scalepack output only).
Both of these options have some problems:
- If you take the output from Denzo into Scala, there may be problems with partially recorded reflections: it is difficult for Scala to determine reliably that it has all parts of a partial to sum together.
- If you take unmerged output from Scalepack into Aimless, most of the geometrical information about how the observations were collected is lost, so many of the scaling options in Aimless are not available. Only batch scaling can be used, but simultaneous scaling of several wavelengths or derivatives may still be useful
Appendix 4: Outlier algorithm
The test for outliers is as follows:
- (1) if there are 2 observations (left), then
- (a) for each observation Ihl, test the deviation
Delta(hl) = (Ihl - ghl Iother) / sqrt[ sigIhl**2 + (ghl*sdIother)**2 ]
against sdrej2, where Iother = the other observation
- (b) if either |Delta(hl)| > sdrej2, then
- in scaling, reject the reflection. Or:
- in merging,
- keep both (default, or if the KEEP subkey is given) or
- reject both (subkey REJECT) or
- reject the larger (subkey LARGER) or
- reject the smaller (subkey SMALLER).
- (2) if there are 3 or more observations left, then
- (a) for each observation Ihl,
- calculate the weighted mean of all other observations <I>n-1 & its sd(<I>n-1)
- deviation
Delta(hl) = (Ihl - ghl <I>n-1) / sqrt[ sigIhl**2 + (ghl*sd(<I>n-1))**2 ]
- find the largest deviation max|Delta(hl)|
- count the number of observations for which Delta(hl) >= 0 (ngt), & for which Delta(hl) < 0 (nlt)
- (b) if max|Delta(hl)| > sdrej, then reject one observation, but which one?
- if ngt == 1 or nlt == 1, then one observation is a long way from the others, and this one is rejected
- else reject the one with the worst deviation max|Delta(hl)|
- (3) iterate from the beginning
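The two-observation test in step (1) can be sketched as follows; this is illustrative only, and the sdrej2 default shown is a placeholder, not the program's value:

```python
import math

def deviation(I, sdI, g, I_other, sd_other):
    """Delta(hl) = (Ihl - ghl*Iother) / sqrt(sigIhl**2 + (ghl*sdIother)**2)."""
    return (I - g * I_other) / math.sqrt(sdI ** 2 + (g * sd_other) ** 2)

def pair_outlier(obs1, obs2, sdrej2=6.0):
    """obs = (I, sdI, g); True if either deviation exceeds the threshold,
    i.e. the pair fails the step (1) consistency test above."""
    I1, s1, g1 = obs1
    I2, s2, g2 = obs2
    return (abs(deviation(I1, s1, g1, I2, s2)) > sdrej2 or
            abs(deviation(I2, s2, g2, I1, s1)) > sdrej2)
```

With three or more observations the same deviation is computed against the weighted mean of the others, and the ngt/nlt counts decide which single observation to drop, as in step (2).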
RELEASE NOTES
0.7.4 More robust normalisation for Emax test. Keep Emax outliers if most observations of a reflection are large. Normalisation still needs more work.
0.7.3 Make it work when excluding 1st of multiple datasets. Make initial scaling more robust by using weighted mean(I). Do Emax test before outlier test. More robust normalisation; Emax rejection also tests I/sd(I) to avoid rejecting weak data in shells with small <I>.
0.7.2 small fix to SD analysis table
0.7.1 allow gaps in data if explicit runs are given (unless you give "initial minimum_overlap > 0.0"). Improved analysis of secondary corrections
0.7.0 REFINE REFERENCE option, for scaling to external reference intensities
0.6.4 monitor deviant but kept observations in ROGUES file
0.6.3 trap RESTORE with one parameter, turn on ONLYMERGE
0.6.2 Bug fix for only one batch (eg from merged data)
0.6.1 Group batches for analysis, see ANALYSIS GROUPBATCH. Improve SDCORRECTION SAMPLE, sample variances: compare individual propagated SDs with sample SDs. Limit Emax test to "reliable" resolution range (not very weak high resolution ranges, needs further improvement). Many variables converted from float to double. Chi^2 statistic against intensity, resolution and batch. Cumulative CC(1/2) vs. batch. Do the analysis even if scaling can't be done due to insufficient information in the input reflection file (usually data that is already scaled, eg in XSCALE)
0.5.29 trap case of insufficient data with one rotation range, make dump/restore work for that case
0.5.28 small format change in RunPairs to avoid column coalescence
0.5.27 More accurate averaging of wavelengths
0.5.26 REJECT EMAX 0.0 switches off Emax test. Big speed-up in SF calculation (cf Pointless)
0.5.25 fix bug introduced in 0.5.24 which removed I+ & I- from the merged file if either were negative
0.5.24 fix bug if different resolutions for different runs. Small bug fix in secondary beam scaling. Switch off secondary scaling in first pass, stabilises the scaling in some cases. Trap negative scales (shouldn't happen). Fixes to sample SDs (for high multiplicity), compare sample SD to propagated SD. Output secondary beam corrections to logfile and XML. Reset SdFac after first round scaling. Default SDCORRECTION SAME (instead of INDIVIDUAL)
0.5.22, 23 Improved (ie corrected from version 0.5.18) detection of parts of data which do not have any scaling overlaps with other rotation ranges, see INITIAL MINIMUM_OVERLAP & MAXIMUM_GAP. Some changes to resolution limit determination
0.5.21 if XMLOUT is assigned on the command line, open it early so that syntax errors get added
0.5.19 add resolution limit at I/sd > 2 for Frank von Delft
0.5.18 Some changes to improve robustness for low multiplicity, mainly for small molecule data. No scaling if multiplicity is too low. Added INITIAL MINIMUM_MULTIPLICITY option. Changes to choice of observations for scaling and SD optimisation. Trap negative secondary scales. Correct derivatives in TIEs (doesn't make much difference)
0.5.17 Bug fix to error and warning printing
0.5.16 improved the robustness of SDcorrection refinement with a few fulls. REJECT NONE option. Option to accept XDS "misfits" (outliers) (KEEP MISFIT)
0.5.15 bug fix in run pair correlations. Fix to XML for multiple datasets
0.5.14 bug fix in Rmerge(batch), was double counting. Change to expect stdin input unless "--no-input" given on command line. Added "descriptions" to graphs, for ccp4i2.
0.5.13 fix memory leak in resolution calculation
0.5.10,11,12 bug fix in setting spherical harmonic orders. Keep empty batches in unmerged file output. Bug fix in scaling one lattice from multilattice data
0.5.9 bug fix to allow ABSORPTION <lmax> to work. Fix for restore problem with variances & tiles
0.5.8 bug fix for case SCALES CONSTANT BROTATION with one run (not sensible anyway). Also fixed bug when there are different resolution limits for different datasets
0.5.7 minor bug fix to resolution tables
0.5.6 bug fix for SCALE CONSTANT with more than one run
0.5.5 bug fix in radiation damage analysis. Rescale Scalepack output if intensities are small
0.5.4 bug fix for Sca output with one of I+ or I- missing
0.5.3 fix save/restore bug for BFACTOR OFF
0.5.2 bug fix for already merged data. Fix long-standing rare bug in hash table
0.5.1 "improved" SD correction refinement. Added [UN]LINK commands, improved default linking
0.4.10 add SD analysis graph to XML
0.4.8,9 improved robustness of maximum resolution curve fit
0.4.7 better trap for no data in SDCORRECTION refinement
0.4.5,6 fill in missing IPR columns from I, shouldn't normally happen
0.4.2,3,4 Bug fixes. Unmerged SCA files written with corrected symmetry translations
0.4.1 Inflate sd(I) using estimated parameter errors from inverse normal matrix (see USESDPARAMETER). Fixed bug in TILE correction. Added curve fit for maximum resolution estimation
0.3.11 Fixed nasty bug from XDS->Pointless giving Assertion failed: (sd > 0.0), function Average
0.3.10 Bug fixes. Also restrict run-run correlations to < 200 runs.
0.3.9 Added matrix of run-run cross-correlations
0.3.8 bug fixes to make EXCLUDE BATCH <range> option work
0.3.7 Bug fixes for unusual case of runs with all fulls and no fulls. Options for XFEL data: SDCORRECTION SAMPLESD; Rsplit. Bug fixes for Batch scaling with rejected batches. REJECT BATCH option for batch scaling
0.3.6 Fix bug with explicit RUN definitions.
0.3.4,5 remove debug print for self-overlaps. Pick up number of parts for previously summed partials (MPART column from Feckless), for partial bias analysis
0.3.3 fixed save/restore for TILE correction. Fixed reading of SDcorrection parameters
0.3.2 more corrections to multilattice handling (mapping lattice number to run number for scaling)
0.3.1 optional reference data for analysis of agreement by batch, either as structure factors (or intensities) HKLREF, LABREF, or coordinates (XYZIN)
0.2.20 fix bug for single B-factor/run
0.2.18,19 updates from Pointless for multiple lattices. Corrected calculation of anomalous multiplicity
0.2.17 fix bug in setting same resolution bin widths for multiple datasets when NBINS is set
0.2.16 message for std::bad_alloc, running out of memory
0.2.15 fix to XML graphs (for ccp4i2)
0.2.14 fix to correctly append to MTZ history
0.2.13 activate writing spacegroup confidence. Reflection status flags cleared before outlier checks
0.2.12 fix bug in reading multilattice files
0.2.10 small bug fix in radiation damage analysis
0.2.9 fix for Batch scaling if no phi range information
0.2.8 XML changes for I2 report. Change automatic anomalous thresholds, always output anom statistics
0.2.7 Bug fix in XML if no orientation data (ROGUEPLOT)
0.2.6 Fix to output multilattice overlaps. Added radiation damage analysis as in CHEF, for Graeme Winter
0.2.5 Fix so that BINS RESOLUTION works
0.2.4 Bug for XDS data, was omitting reflections with FRACTIONCALC (derived from IPEAK) < 0.95, leading to incompleteness
0.2.3 Now does reject and record Emax outliers properly (though work is continuing on improving this). Fixed small bug in analyseoverlaps.
0.2.2 fixed bug in Bdecay plot when batches omitted. Explicit Xrange for XML batch plots. No ROGUEPLOT if no orientation data. List overlaps in ROGUES file
0.2.1 some major reorganisations. Added XML output. SCALES TILE option. Handling of multilattice data. SDCORRECTION SIMILAR
0.1.30 allow TIE with negative sd to turn off tie, as documented. Also fixed bug in ABSORPTION
0.1.29 small change to Result table to work with Baubles' arcane (and undocumented) rules for Magic Tables
0.1.28 bug fix in "sdcorrection same"
0.1.27 bug fix in minimizer which sometimes affected the case with just 2 parameters
0.1.26 Default to "scales secondary"
0.1.25 omit sigI<=0, process REJECT command properly, small bug fix in smoothed Bfactors
0.1.24 small bug fix in printing batch tables with multiple datasets
0.1.22,23 INITIAL UNITY option. In tables, print batches with no observations but not rejected batches. Put title into output file. Fix initial scale bug with 3 scales
0.1.21 corrections to ROGUEPLOT, ice rings were in wrong place (by a factor of wavelength)
0.1.20 made sdcorrection refinement more robust to low multiplicity. If anomalous off (or no anomalous detected), statistics are now printed over all I+ I- together. Reject large negative observations (default E < -5)
0.1.19 preliminary addition of spg_confidence status. Bug fix from valgrind (from Marcin)
0.1.18 changed tablegraph to fix compilation problem (va_start)
0.1.17 bug fix in outlier rejection, problem with large variances leading to inconsistencies in Rogues file and some over-rejection
0.1.16 made SDcorrection refinement more robust
0.1.14,15 various bug fixes (including memory leaks), fixed autorun generation, improved SD correction for large anomalous, constrain cell to lattice group, etc
0.1.12 Half-dataset CC labelled as "CC(1/2)"
0.1.11 Small bug fixes
0.1.9 autodetect anomalous. Plot Rmeas for each run
0.1.7 fix for SCALES CONSTANT from XSCALE
0.1.6 anisotropy analysis against planes in trigonal, hexagonal and tetragonal systems (including rhombohedral axes), principal anisotropic axes in monoclinic and triclinic, cone analyses weighted according to cos(AngleFromPrincipalDirection). Fixed cases where multiple datasets have different resolution limits
0.1.4,5 more fixes for multiple datasets, dump/restore. OUTPUT UNMERGED SPLIT is default
0.1.3 More "resolution run" bug fixes
0.1.2 REFINE PARALLEL option (thanks to Ronan Keegan). Fixed bug in "resolution run" options
0.1.1 fixed bugs in writing ROGUES file; introduced HKLOUTUNMERGED etc filename specifiers; cleaned up Unmerged output; added Rfull to tables
0.1.0 fixed some bugs found by cppcheck and valgrind
0.0.16 fixed small bug in INTENSITIES COMBINE optimisation
0.0.15 if run definitions are given explicitly, then unspecified batches are excluded
0.0.14 Added optimisation for INTENSITIES COMBINE, for Mosflm data. This is now the default
AUTHOR
Phil Evans, MRC Laboratory of Molecular Biology, Cambridge
(pre@mrc-lmb.cam.ac.uk)
See above for Release Notes.
SEE ALSO