RSTATS (CCP4: Supported Program)
NAME
rstats
- scale together two sets of F's
SYNOPSIS
rstats hklin foo_in.mtz hklout foo_out.mtz rstatsbkr rstatsbkr.dat
[Keyworded input]
DESCRIPTION
The program scales together two sets of F's, calculates
statistics and outputs a reflection file. Data can be split into
a working set, and a set reserved for calculation of a freeR factor.
Rejection criteria can be specified as an FC/FO ratio,
a sigma multiple, or |FO-FC|.
KEYWORDED INPUT
The various data control lines are identified by keywords, those
available being:
CYCLES,
END,
FREE,
LABIN,
LABOUT,
LIST,
NOABS,
OUTPUT,
PRINT,
PROCESS,
REJECT,
RESOLUTION,
RSCB,
SCALE,
TEMPERATURE_FACTOR,
TITLE,
WEIGHTING_SCHEME,
WIDTH_OF_BINS
TITLE <title>
The title string is written to the output reflection
file, replacing the title from the input file.
If TITLE is not specified then:
OUTPUT FOFC will use the title "Output from RSTATS".
When using LABOUT ALLIN then the title on the input file
will be used.
FREE <num>
The FreeR sub-set is defined, in the program, as those reflections which
have a value of <num> in the FreeR_flag column. The default is for
FreeR_flag = 0.
RESOLUTION <x1> <x2>
If given then only reflections in the resolution range <x1>-<x2>
will be used during the final (output) cycle in order to calculate statistics.
Note that this is a change to the functionality of the RESOLUTION keyword,
which can no longer be used to exclude reflections from the output mtz file.
If RESOLUTION is not specified then the limits <x1> and
<x2> are taken from the input MTZ file, so no data is excluded from
the statistics. The maximum and minimum resolution (in Angstroms)
can be given in either order, and if only one number is given this
is taken as the maximum resolution limit.
RSCB <x1> <x2>
If given then reflections in the resolution range <x1>-<x2>
will be used during the scaling cycles, in order to generate the scale
and temperature factors. The maximum and minimum resolution (in Angstroms)
can be given in either order, and if only one number is given this
is taken as the maximum resolution limit.
If RSCB is not given, the limits are taken from the
RESOLUTION keyword; if RESOLUTION has not been specified
the default is to use all the data, i.e. the resolution limits are read from
the input MTZ file header.
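The fallback order described for RSCB and RESOLUTION can be sketched in Python as below. This is only an illustration and not RSTATS code: the function names and the (low, high) tuple layout are assumptions made here, and the single-number case is left out because the text does not say how the other limit is filled in.

```python
def order_limits(x1, x2):
    """Return (low_res, high_res) in Angstroms; the input order does not matter."""
    return (max(x1, x2), min(x1, x2))


def statistics_limits(resolution, mtz_limits):
    """Limits for the final (statistics) cycle: RESOLUTION, else the MTZ header."""
    chosen = resolution if resolution is not None else mtz_limits
    return order_limits(*chosen)


def scaling_limits(rscb, resolution, mtz_limits):
    """Limits for the scaling cycles: RSCB, else RESOLUTION, else the MTZ header."""
    if rscb is not None:
        return order_limits(*rscb)
    return statistics_limits(resolution, mtz_limits)
```

For example, with no RSCB card and RESOLUTION 2.7 8.0, both functions would return (8.0, 2.7).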
NOABS
If the NOABS keyword is present, the program will take the differences between
the signed values of Fo and Fc, rather than using the moduli (i.e. use Fo and Fc
rather than |Fo| and |Fc|). The default is to use the moduli.
SCALE <scale>
Sets initial scale factor for Fc. If zero cycles are selected
on the CYCLES card, this scale factor is used for
the calculation of R-factors and scaling output data.
Default is 1.0.
TEMPERATURE_FACTOR <factor>
Sets initial value for the temperature factor.
If zero refinement cycles are selected using the CYCLES card,
this temperature factor is used for calculation
of R-factors and scaling output data. Default: 0.0.
WIDTH_OF_BINS [ RTHETA <x1> ] | [ FBINR <x2> ]
[Optional]
Controls the width of the bins used in the analysis.
RTHETA = <x1> sets the width of ranges of 4(sintheta/lambda)**2;
default: 0.01.
FBINR = <x2> sets the width of ranges on Fobs. If <x2> is not specified,
or the card is absent, the Fobs range is set by the program. The width
is altered accordingly if the scale is applied to Fobs.
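As a rough illustration of the RTHETA binning, the Python sketch below assigns a reflection to a bin from its d-spacing. By Bragg's law, 4(sintheta/lambda)**2 equals 1/d**2. The function name, the bin origin at zero and the half-open bin edges are assumptions made here; the program's own bin boundaries may differ.

```python
def rtheta_bin(d_spacing, width=0.01):
    """Index of the 4(sintheta/lambda)**2 bin that holds a reflection.

    Assumption: bins start at zero and cover [n*width, (n+1)*width).
    """
    s2 = 1.0 / d_spacing ** 2  # 4(sintheta/lambda)**2 = 1/d**2
    return int(s2 // width)
```

With the default width of 0.01, a reflection at 3.0 Angstroms (1/d**2 of about 0.111) would fall in bin 11 under these assumptions.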
LIST <x>
[Optional]
Sets the value for listing of reflections with
|Fo-Fc| > <x>. Default: 4000.0.
CYCLES <ncyc>
[Optional]
<ncyc> is the maximum number of cycles for scaling; default: 6.
The program will always make one additional pass through the
reflection file to calculate statistics and write
the output file.
If zero cycles are specified then the program will simply
apply the input scale and temperature factor.
If a linear least-squares problem is selected with
no rejections, the program will only make two passes
through the input file.
The program will stop iterating when the magnitude of the
fractional shift in the scale factor is less than 0.005
and the magnitude of the shift in the temperature factor
is less than 0.01.
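The stopping test above can be written as a short Python sketch. It is not taken from the RSTATS source: the function name is hypothetical, and the fractional shift is assumed to be measured relative to the scale from the previous cycle.

```python
def has_converged(old_scale, new_scale, old_b, new_b,
                  scale_tol=0.005, b_tol=0.01):
    """True when both shifts fall below the thresholds quoted above."""
    # Assumption: "fractional shift" means shift divided by the previous scale.
    frac_shift = abs(new_scale - old_scale) / abs(old_scale)
    b_shift = abs(new_b - old_b)
    return frac_shift < scale_tol and b_shift < b_tol
```

Both conditions must hold together, so a stable scale with a still-moving temperature factor would not end the iteration.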
PRINT ALL | LAST
ALL sets IPRINT on all cycles, so output is printed on every cycle.
LAST (default) sets IPRINT so that output is printed ONLY on the final least-squares cycle.
REJECT [ SIGMA=<sig> ] [ RATIO=<rat> ] [ DELTA=<delta> ]
This option sets criteria for rejecting reflections from the scaling calculations.
The rejected reflections are still written to the output file. More than one of the
following options may be specified simultaneously for REJECT:
- RATIO
-
Reflections will be rejected if K*Fc*TFAC/FO < <rat> (i.e. those with
FC<<FO). Default is <rat>=0.0.
- SIGMA
-
Reflections will be rejected if Fo < <sig>*SigFo. Default is <sig>=0.0.
- DELTA
-
Reflections will be rejected if abs(Fo - K*TFAC*Fc) > <delta>. The default
is <delta>=99999.0 (i.e. no rejection tests).
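The three tests can be combined as in the Python sketch below. It is a hedged illustration only: the function name and argument order are hypothetical, a reflection is assumed to be rejected when any one test fires, and the guard against Fo = 0 in the ratio test is an assumption added here.

```python
def is_rejected(fo, sigfo, fc, k, tfac, rat=0.0, sig=0.0, delta=99999.0):
    """Apply the RATIO, SIGMA and DELTA tests with the documented defaults."""
    scaled_fc = k * tfac * fc
    # RATIO: K*Fc*TFAC/FO < rat (skipped for Fo <= 0, an assumption made here)
    ratio_fail = fo > 0 and scaled_fc / fo < rat
    # SIGMA: Fo < sig*SigFo
    sigma_fail = fo < sig * sigfo
    # DELTA: abs(Fo - K*TFAC*Fc) > delta
    delta_fail = abs(fo - scaled_fc) > delta
    return ratio_fail or sigma_fail or delta_fail
```

With the default values none of the tests rejects a reflection with positive Fo, unless |Fo - K*TFAC*Fc| exceeds 99999.0.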
OUTPUT [ NOHKL | FOFC ] [ BKR ]
The output reflection file contains all the reflections present in the input
file. Note that this is different from previous versions of rstats.
If OUTPUT is not given or it is not followed
by a sub-keyword, then FOFC is assumed. The exception is when LABOUT ALLIN is given.
- NOHKL
-
No output file
- FOFC
-
The output reflection file has H, K, L, FP, FC with
optionally SIGFP, SIGFC, PHIC, FREE if these are present
on the input file. If weights are used in the scaling
then the output file will include this weight as WT.
Under this option RSTATS will also write an additional history line to the mtz header,
containing: the date; the R-factor; the scale and temperature factors. In this case the
R-factor is that calculated on the final cycle with reflections excluded as defined
by the RESOLUTION keyword.
- BKR
-
The final temperature factor (B), scale factor (K), R factor and the sum of w*(Fo-Fc)**2
are written on one line in the file RSTATSBKR (i.e. RSTATSBKR.DAT in the default
directory unless otherwise assigned) along with the date (as day-month-year). The
format statement controlling this output is
FORMAT(2F10.5,F7.3,E13.6,1X,I2,"-",I2,"-",I2)
(The output file is scaled as defined by PROCESS.)
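A line of the RSTATSBKR file could be read back with a fixed-width parser such as the Python sketch below. The column widths come from the FORMAT statement quoted above; the class and function names are hypothetical, and the field order (B, K, R, weighted sum, then date) is assumed to follow the order given in the description.

```python
from typing import NamedTuple


class BkrRecord(NamedTuple):
    b_factor: float
    scale: float
    r_factor: float
    weighted_sum: float
    day: int
    month: int
    year: int


def parse_bkr_line(line):
    """Split one line laid out as FORMAT(2F10.5,F7.3,E13.6,1X,I2,"-",I2,"-",I2)."""
    return BkrRecord(
        b_factor=float(line[0:10]),       # F10.5
        scale=float(line[10:20]),         # F10.5
        r_factor=float(line[20:27]),      # F7.3
        weighted_sum=float(line[27:40]),  # E13.6
        day=int(line[41:43]),             # I2, after the 1X blank
        month=int(line[44:46]),           # I2
        year=int(line[47:49]),            # I2 (two-digit year)
    )
```

Slicing by column position rather than splitting on whitespace keeps the parse valid even when adjacent numeric fields run together.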
PROCESS [ FCAL | FOBS | FOBC | SUMF | SUMC | LGFC | LGFO ]
For the FCAL, FOBS and FOBC options, the scale factor (K)
and temperature factor (B) are determined by minimising
Sum w(Fo - K*Fc*exp(-B*s))**2
This non-linear least squares minimisation takes several cycles
to converge.
For the SUMF and SUMC options, the temperature factor is not
considered and the scale factor is calculated by minimising
Sum w(Fo - K*Fc)**2
So that K = Sum(wFoFc)/Sum(wFc**2)
Although a linear problem, if reflections are being rejected
using the DELTA test (see REJECT), several cycles may be required for
convergence.
For the LGFC and LGFO options, the scale and temperature factors
are determined by minimising
Sum w( Log(Fo) - Log(K*Fc*exp(-B*s)) )**2
By considering the logarithms, the least squares minimisation
becomes a linear problem but with different relative weighting.
This scaling gives greater weight to the weak reflections than
the minimisation without taking logs.
A weight of W=(Fo/SigFo)**2 should give similar results to a
weight of W=(1/SigFo)**2 in the non-linear case.
- FCAL
-
Apply scale and B-factor to Fcalc and sigFc
- FOBS
-
Apply scale and B-factor to Fobs and sigFobs
- FOBC
-
Apply scale to Fobs and sigFobs, and B-factor to Fcalc
- SUMF
-
Calculate the scale as Sum(FoFc)/Sum(FcFc) and apply the inverse of this to Fo,
i.e. the temperature factor is not refined and the scale is calculated without
considering it.
- SUMC
-
As SUMF, but apply the scale to Fc
- LGFC
-
Apply scale and B-factor to Fcalc and sigFc
- LGFO
-
Apply scale and B-factor to Fobs and sigFobs
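The two linear cases described for PROCESS can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the program's implementation: the function names are hypothetical, unit weights are used when none are given, and s is simply whatever quantity multiplies B in the formulas above (presumably (sintheta/lambda)**2).

```python
import math


def linear_scale(fo, fc, w=None):
    """SUMF/SUMC case: K = Sum(w*Fo*Fc) / Sum(w*Fc**2)."""
    w = w or [1.0] * len(fo)
    num = sum(wi * o * c for wi, o, c in zip(w, fo, fc))
    den = sum(wi * c * c for wi, c in zip(w, fc))
    return num / den


def log_scale(fo, fc, s, w=None):
    """LGFC/LGFO case: fit log(Fo/Fc) = log(K) - B*s by weighted least squares."""
    w = w or [1.0] * len(fo)
    y = [math.log(o / c) for o, c in zip(fo, fc)]
    sw = sum(w)
    sx = sum(wi * si for wi, si in zip(w, s))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * si * si for wi, si in zip(w, s))
    sxy = sum(wi * si * yi for wi, si, yi in zip(w, s, y))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    intercept = (sy - slope * sx) / sw
    return math.exp(intercept), -slope  # (K, B)


def residual(fo, fc, s, k, b, w=None):
    """Non-linear target: Sum w*(Fo - K*Fc*exp(-B*s))**2."""
    w = w or [1.0] * len(fo)
    return sum(wi * (o - k * c * math.exp(-b * si)) ** 2
               for wi, o, c, si in zip(w, fo, fc, s))
```

A log fit of this kind could also provide starting values of K and B for the non-linear minimisation, although the text above does not say that RSTATS does so.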
WEIGHTING_SCHEME [ NONE | DELF=<x1>,<x2>,<x3>,<x4> | DSIG=<x1>,<x2>,<x3>,<x4> | EXP=<x1>,<x2>,<x3> | SIGMA=<x1> ]
Weight reflections according to one of the following schemes
[Default is NONE]:
- NONE
-
No weighting scheme to be used
- SIGMA
-
W=<x1>*(1/SD(FO))**2
default: x1=1.0.
- DELF
-
W=1/(<x1>+<x2>*S) for S > <x4> (S=sintheta/lambda)
W=1/(<x1>+<x2>*S+<x3>*(<x4>-S)**2) for S < <x4>
There are no defaults for this option and all parameters must be specified.
- DSIG
-
As DELF but multiplied by (1/SD(FO))**2
- EXP
-
W=((1/SD(FO))**2)*<x1>/exp(<x2>+<x3>*S)
the defaults are x1=1.0 x2=0.0 and x3=0.0.
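The weighting formulas above translate directly into the Python sketch below. The function names are hypothetical and the code is only an illustration of the expressions as written. At S equal to <x4> the two DELF expressions give the same value, because the (<x4>-S)**2 term vanishes, so the choice of branch at that point does not matter.

```python
import math


def weight_sigma(sd_fo, x1=1.0):
    """SIGMA: W = x1 * (1/SD(FO))**2."""
    return x1 * (1.0 / sd_fo) ** 2


def weight_delf(s, x1, x2, x3, x4):
    """DELF: W = 1/(x1 + x2*S), with an extra x3*(x4 - S)**2 term for S < x4."""
    denom = x1 + x2 * s
    if s < x4:
        denom += x3 * (x4 - s) ** 2
    return 1.0 / denom


def weight_dsig(s, sd_fo, x1, x2, x3, x4):
    """DSIG: the DELF weight multiplied by (1/SD(FO))**2."""
    return weight_delf(s, x1, x2, x3, x4) * (1.0 / sd_fo) ** 2


def weight_exp(s, sd_fo, x1=1.0, x2=0.0, x3=0.0):
    """EXP: W = ((1/SD(FO))**2) * x1 / exp(x2 + x3*S)."""
    return (1.0 / sd_fo) ** 2 * x1 / math.exp(x2 + x3 * s)
```

With its default parameters the EXP scheme reduces to the SIGMA scheme with x1=1.0.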
LABIN <program_label>=<file_label> ...
Input reflection file column assignments.
Assigns the program labels to the columns on the input file. The
program labels are:
-
H K L FP SIGFP FC SIGFC PHIC FREE
Data must always be present for H K L FP and FC.
SIGFP must also be present when using the SIGMA weighting scheme.
FREE flags reflections to be considered separately, to give
statistics needed for Free R factors.
LABOUT [ALLIN] <program_label>=<file_label> ...
Output reflection file column assignments.
For OUTPUT FOFC the output program labels are
-
H K L FP FC [ SIGFP SIGFC PHIC WT FREE ]
Where SIGFP, SIGFC and PHIC are only written if they are
present on the input file. The weight WT is only written
if a WEIGHTING_SCHEME option is specified.
By default the output columns will have the same column
labels as used on the input file.
If ALLIN is given as a sub-keyword then all columns in the input file will
be written to the output MTZ file. This option takes precedence over the other
output options for MTZ files.
END
Terminate input (equivalent to end-of-file). Must be last keyword.
EXAMPLES
#
# Produce file containing h,k,l,s,Fp,Sigfp,Fc,Phic with Fc scaled
# to Fo for input to the FFT program. No reflections rejected.
#
#
rstats hklin sample_file hklout fuo_map <<eof-rstats
LABIN FP=FNAT2 SIGFP=SIGFNAT2 FC=FCCYC7 PHIC=PHI FREE=FreeR_flag
RESOLUTION 8.0 2.7 ! If omitted then all data used
eof-rstats
#
#
# A more complicated example:
# All input columns output with an additional weight column.
# Contents of the output FNAT2 and SIGFNAT2 columns will have
# a scale and temperature factor applied.
#
rstats hklin sample_file hklout fuo_map <<eof-rstats
LABIN FP=FNAT2 SIGFP=SIGFNAT2 FC=FNAT1 FREE=FreeR_flag
LABOUT ALLIN WT=SIGMAWT
TITLE FNAT2 column scaled to FNAT1 using sigma weights
RESOLUTION 10.0 2.3 ! default is 1 to 100 Ang
PRINT ALL ! default is LAST
CYCLES 3 ! default is 6
LIST 3000
SCALE 2.3 ! default is 1.0
TEMPERATURE_FACTOR 6.2 ! default is 0.0
OUTPUT FOFC ! this is OVERRIDDEN by LABOUT
REJECT DELTA 4000 ! default is no rejections
WEIGHTING_SCHEME SIGMA ! default is NONE
WIDTH_OF_BINS RTHETA=0.02 FBINR=500 ! defaults are .01 and 1000
PROCESS FOBS ! default is FCAL
eof-rstats
There is also a simple runnable unix script in $CEXAM/unix/runnable:
AUTHORS
Written by: S.E.V. Phillips
modified: Dec.1985 G.Fermi (2-6-88)
modified: Nov.1986 A.C.Bloomer
This keyworded version 24/jan/1990: Peter Brick