SFALL (CCP4: Supported Program)
NAME
sfall - Structure factor calculation and X-ray refinement using forward
and reverse FFT
SYNOPSIS
sfall [ XYZIN foo.brk ] [ MAPIN foo.map ] [ HKLIN
foo.mtz ] [ MAPOUT foo_out.map ] [ HKLOUT foo_out.mtz
] [ XYZOUT foo_out.brk ] [ GRADMAT foo.sfmat ]
[Keyworded input]
CONTENTS
-
DESCRIPTION
-
Generate atom map
-
Calculate structure factors
-
Generate X-ray gradients for refinement
-
Space Groups
-
KEYWORDED INPUT
-
MEMORY ALLOCATION
-
INPUT AND OUTPUT FILES
-
Notes on Formfactors
-
Notes on XYZIN, Atom names, etc
-
Notes on HKLIN
-
Notes on MAPIN
-
Output from Structure Factor
Calculation
-
Notes on XYZOUT
-
Notes on HKLOUT
-
Notes on MAPOUT
-
Output from the Gradients Calculation - notes on
GRADMAT
-
Output from the Calculation of
the Optimum Shifts
-
Notes on stepsize
-
COMMON PROBLEMS
-
EXAMPLES
-
Calculate map from atom
coordinates
-
Read in
coordinates to calculate structure factors
-
Read in map to calculate
structure factors
-
Iteration cycles to do back-transformation and
phase combination cycles in solvent flattening procedure
-
Unrestrained refinement
-
Prolsq refinement cycles
Please note: PROLSQ is no longer available as refinement program in the CCP4
Suite.
-
Simple Runnable Examples
-
AUTHORS
-
PROGRAM FUNCTION
-
Modelling of the electron density and
calculation of structure factors
-
Calculation of the gradients
-
The Normal Matrix
-
Calculation of the shifts
-
REFERENCES
-
SEE ALSO
DESCRIPTION
This program calculates structure factors and X-ray gradients for refinement
using inverse and forward fast Fourier techniques.
If coordinates are input, it builds an `atom map' on a suitable grid over the
asymmetric unit of the spacegroup. Atomic density is distributed as a Gaussian
centred on the atom position. The value of the Gaussian depends on the distance
from the atom centre, the atom type, and temperature factor. Its value is
calculated at each grid point within a given radius for each atom, and the
values are summed to give an `atom map'. This `map' can then be used for an
inverse fast Fourier transform, which gives the structure factor. It is
important that the grid is fine enough to sample the atomic Gaussians
accurately - the accuracy of the structure factor estimates depends on this. I
recommend using a grid which gives a sampling of ~0.5Å.
X-ray gradients are estimated by `differential synthesis'. A difference map is
calculated, and this is convoluted with the atomic density.
By default, SFALL scales Fobs to Fcalc. This then lets you do maps on (or close
to) an absolute scale if the map comes from coordinates. If you are
back-transforming a map, e.g. in solvent-flattening or averaging, then
the map isn't on an absolute scale anyway, and so this scaling is not
particularly sensible (although not usually harmful). Note that in iterative
procedures, the Fobs you bring in to SFALL on each cycle should be the original
Fobs, not the one carried through from the previous cycle (which may have been
scaled, selected, or otherwise corrupted). In this case, it is probably more
sensible to use the NOSCALE option in SFALL, then either (i) use
SIGMAA to generate Partial or Combined Fourier coefficients for the
next map (if you want to weight them); or (ii) use RSTATS
to scale Fcalc to Fobs.
For a description of the original method see [1
]. A modification in calculating the gradients suggested by Alain Lifchitz is
used.
A list of common problems (with possible
solutions) can now be found below.
1) Generate atom map
Input:
i) XYZIN
-
-
Coordinates (Brookhaven format with CRYST1 and SCALEi lines).
Output:
i) MAPOUT
-
-
Atom map. Various modifications can be made to this map:
- a)
-
The `solvent' map. This map has zero at all points where there is atomic
density, and a constant value for all points where there was no atomic density.
It is used for calculating structure factors for `bulk solvent' (see
ICOEFL for ideas on using this); or for use as a mask.
- b)
-
A tagged `atom map' where each grid point is tagged with a flag to say which
residue, or which atom number, contributed most to it. This is used for
analysing map correlations, and real space R-factors.
2) Calculate structure factors
By default a scale factor is calculated between Fobs and Fc and is applied to
the output Fobs. If this is not wanted then the keyword NOSCALE
must be used.
Input:
i) either coordinates or a map (compulsory)
-
-
If map is input it can be `truncated' to eliminate very high or low density.
ii) HKLIN (optional)
-
-
This must contain: H K L FP SIGFP [ FREE FPART PHIP ...]
It MUST be sorted on h k l, and be in the CCP4 chosen asymmetric unit. See
NOTES ON HKLIN for further details. -
-
The FREE value is generated by FREERFLAG or in X-PLOR and transferred to MTZ
format using F2MTZ. See the FREERFLAG
documentation. (Note that X-PLOR 3.1 only assigns flags of 0 or 1.) The keyword
FREERFLAG allows reflections with a given FREE value to be excluded from
refinement and Fc calculations.
-
-
IF FPART and PHIP have been assigned, the structure factor output will equal
the scaled FPART vector plus the contribution calculated from the input plus
the scaled FPART vector.
Output:
i) HKLOUT containing: H K L ... FC PHIC (WANGWT)
-
-
(If you are preparing the 'averaged map' for generating the Wang envelope,
(flagged by keyword AVMODE) the WANGWT will be calculated
as well as the structure factor modified for use in solvent flattening.)
ii) MAPOUT - The atom `map' (optional).
-
-
You can also keep the atom map if you want to.
3) Generate X-ray gradients for
refinement
These are used for minimisation of X-ray differences between FP and FC. This
calculation uses the forward FFT.
Input:
i) XYZIN
-
-
Coordinates (Brookhaven format with header).
ii) HKLIN
-
-
Containing: H K L FP SIGFP [ FREE ]
When FREE is assigned then a subset of reflections is excluded from the
gradient calculation (see above
for details). Hence the agreement between them and Fc plays no part of the
refinement procedure. The Free R factor calculated for these reflections is a
useful indicator of the quality of the refinement [4].
It must be stressed that during a particular refinement procedure only one
freeR set of reflections should be used. It is meaningless to change the freeR
set in the middle of refinement. However, with several freeR sets, it is
possible to compare the same refinement procedure against different freeR sets.
It is still possible for all the freeR sets to be unsuitable e.g. if
the NCS and the crystal symmetry are related, as they often are, there may be
special relationships between classes of reflections, and this may mean that
the FREE set is not truly independent of the other observations.
Output:
i) XYZOUT
-
-
A coordinate file in which the least squares shifts have been applied.
ii) GRADMAT
-
-
A direct access file containing information about the X-ray gradients, ready
for input into PROLSQ (no longer available in the CCP4 Suite).
Further details of the specification of input and output files is given in
section INPUT AND OUTPUT FILES
.
See description of keyword MODE and examples
for uses for the various options.
Space Groups
SFALL has special subroutines to handle the following spacegroups:
P1 P21(4) P21212(1018 - alt. origin)
P212121(19) P4122/P4322(91/95) P41212/P43212 (92/96)
P31/P32(144/145) P3/R3(143/146) P3121/P3221(152/154)
P61/P65 (169/170)
It is possible to use the P1 version (or any suitable lower symmetry) for the
calculations. In particular, any non-primitive spacegroup can be run in the
appropriate primitive one. See keyword SFSGROUP
for details.
BUT BEWARE: To use SFSG P1 you must have your reflections sorted with l>=0.
This is NOT the default for P3121/P3221 or P6i22. You would be sensible to run
CAD with the keyword: `OUTLIM SPACEGROUP P1'
before calculating structure factors.
The crystal symmetry will be used to generate atom positions from a unique
molecule. See NOTES ON HKLIN for special
requirements for the HKLIN file when using a lower symmetry version.
KEYWORDED INPUT
All input is keyworded. Only the first 4 characters of a keyword are
significant. The keyworded records can be in any order.
Anything on a line after an ! or # is ignored and lines can be continued by
using a minus sign.
The possible keywords are:
- Compulsory
MODE
- Optional
AVMODE, BADD,
BINS, BRESET, CELL,
CHAIN, EDEN, END,
FORMFACTOR, FREERFLAG,
GRID, H2OBG,
LABIN, LABOUT, NAME,
NOSCALE, OMIT,
RATIO, RESOLUTION,
RMSSHIFT, RSCB, SCALE,
SFSGROUP, SIGMA,
STEPSIZE, SYMMETRY, TITLE,
TRUNCATE, VDWR,
VERBOSE, WEIGHT.
N.B. Some of these are compulsory for some MODEs.
DESCRIPTION OF KEYWORDS
MODE [ ATMMAP ... | SFCALC ... | SFREF ... ]
This keyword controls the course of the calculation.
- ATMMAP [ RESMOD ] [ ATMMOD ] [ SOLVMAP ]
-
Calculating `atom maps' from coordinates. The map extends over the asymmetric
unit of the appropriate space group (which may be different from the true
spacegroup, see keyword SFSGROUP for possible groups). N.B.
XYZIN and MAPOUT must be assigned.
-
-
Subsidiary keywords:
- RESMOD
-
output `atom map' values tagged by a residue number flag.
-
ATMMOD
-
output `atom map' values tagged by a atom number flag.
-
SOLVMAP
-
output `solvent map' with value specified by the H2OBG keyword (default=0.03)
at all points not occupied by atom density. All points occupied by atom density
set to 0.0.
- SFCALC ...
-
Structure factor calculation.
-
-
Subsidiary keywords:
- XYZIN or
MAPIN (one essential)
-
Define input type, either coordinates or a map. N.B. XYZIN or MAPIN
must be assigned.
- followed by:
-
- HKLIN (optional)
-
Only reflections which exist in the HKLIN file are output to HKLOUT. The output
FC PHIC are appended to input H K L FP SIGFP... See keywords LABIN
and LABOUT for further details.
N.B. HKLIN and HKLOUT must be assigned. See NOTES
ON HKLIN for the requirements placed on the HKLIN file. -
ATMMAP or SOLVMAP (optional)
-
Either an `atom map' or a `solvent map' is output. The map extends over the
asymmetric unit of the space group used in the structure factor calculation
(see keyword SFSGROUP). Note that for non-primitive
space groups, only density for symmetry mates corresponding to primitive
symmetry operations is output; use MODE ATMMAP to
obtain a complete map. N.B. MAPOUT must be assigned.
- SFREF
-
Coordinate and/or B-factor refinement; will calculate least squares gradient
and shifts. N.B. XYZIN and HKLIN must be assigned.
-
-
Subsidiary keywords:
- MAPIN
-
(This option is exceedingly unlikely to be needed.) Gradients will be taken
from an input difference map. Usually this difference map is generated
internally from the structure factors calculated from the atomic coordinates.
However, in some special situations, e.g. if you had averaged a
difference map earlier, you may want to input it rather than use the internally
generated one.
- RESTRAINED
-
Atom parameter gradients calculated from the fit of the X-ray data are output
to GRADMAT, to be combined with geometric restraints calculated by
PROTIN and PROLSQ (no longer available in the CCP4 Suite).
-
UNRESTRAINED
-
Parameter shifts to be applied without restraints; a file will be written to
XYZOUT.
-
-
If followed by:
- XYZ
-
refinement of the X Y and Z parameters will be done.
-
XYZB
-
refinement of the XYZ and B parameters will be done.
-
B
-
refinement of the B parameters will be done.
Only possible with the UNRESTRAINED option.
- Example 1: MODE ATMMAP
-
An electron density map is created from the atomic coordinates.
- Example 2: MODE
ATMMAP RESMOD
-
An electron density map is created from the atomic coordinates with an extra
residue identifying flag added to the atomic density; this is 100*IRES+MCHFLG
(If there are several chains in your protein you will need to describe the
chain limits using the keyword CHAIN to avoid the same
flag being given to atoms from different chains). This allows
OVERLAPMAP to analyse the correlation coefficient between 2 maps or the
`real space R-Factor' for contributions from each residue.
- Example 3: MODE
ATMMAP ATMMOD
-
An electron density map is created from the atomic coordinates with an extra
atom identifying flag added to the atomic density; this is 100*atom_number.
This allows OVERLAPMAP to analyse the correlation
coefficient between 2 maps or the `real space R-Factor' for contributions from
each atom (This was used to see if the hydrogen or deuterium atoms in a neutron
study could be distinguished in the electron density map).
- Example 4: MODE
SFCALC XYZIN HKLIN [ ATMMAP ]
-
A structure factor calculation is performed using the input coordinates. HKLIN
is read, and the FP are scaled to FC. HKLOUT contains at least H K L scaled_FP
scaled_SIGFP FC PHIC
(You can output unscaled FP by using the keyword NOSCALE).
An R-factor is calculated. If keyword ATMMAP is
given the electron density map created from the atomic coordinates is saved on
MAPOUT. - Example 5: MODE SFCALC MAPIN HKLIN
-
A structure factor calculation is performed using the input map. The map MUST
cover the asymmetric unit expected for the SFSGROUP.
This will be true if you have calculated the map using FFT
with the FFTSPACEGROUP the same as the
SFSGROUP and have NOT reset the AXIS order. Details are given
below. HKLIN is read, and the FP are scaled to FC. HKLOUT contains at
least H K L scaled_FP scaled_SIGFP FC PHIC
(You can output unscaled FP by setting keyword NOSCALE).
An R-factor is calculated. - Example 6: MODE SFCALC XYZIN (ATMMAP)
-
A structure factor calculation is performed using the input coordinates. Since
no HKLIN is read, the SYMMETRY you wish to use for the
structure will have to be given. The default is to use the symmetry of the
SFSGROUP. Output HKLOUT containing h k l FC PHIC (WANGWT if
AVMODE set) for all reflections within the requested resolution limits.
-
-
If keyword ATMMAP is given the electron density map
created from the atomic coordinates is saved on MAPOUT.
- Example 7: MODE SFCALC
MAPIN
-
A structure factor calculation is performed using the input map. The map MUST
cover the asymmetric unit expected for the SFSGROUP.
This will be true if you have calculated the map using FFT
with the FFTSPACEGROUP the same as the
SFSGROUP and have NOT reset the AXIS order.
- Example 8: MODE SFREF XYZ
(or XYZB) RESTRAINED
-
Used for restrained refinement of XYZ coordinates or XYZ coordinates and B
factors. It is used in conjunction with PROTIN and
PROLSQ (no longer available in the CCP4 Suite). First a structure factor
calculation is done, the FP scaled and an R-factor calculated. Then the program
calculates the normal matrix and gradient contributions for the parameters to
be refined. Gradients of atomic shifts are output to GRADMAT. Reference: R.
Agarwal, A. Lifchitz, J. Konnert, W.A Hendrickson [6].
- Example 9: MODE SFREF XYZ (or XYZB or B) UNRESTRAINED
-
Used for unrestrained refinement of XYZ coordinates or XYZ coordinates and B
factors or Bfactors alone. First a structure factor calculation is done, the FP
scaled and an R-factor calculated. Then the program calculates the normal
matrix, gradient contributions and shifts for the parameters to be refined. The
shifts are analysed using arguments from STEPSIZE,
RATIO and a set of modified coordinates output to XYZOUT. Reference: R.
Agarwal, A. Lifchitz [6].
SFSGROUP <sfsgroup>
<sfsgroup> is the spacegroup number or name of the spacegroup used for
the calculation. You don't normally need to specify this as a sensible default
will be used (the highest symmetry one the program can use which is compatible
with your data) as printed in the output.
N.B. - This IS NOT NECESSARILY the spacegroup of your structure. E.g.
, you may well be calculating structure factors in P1 for a structure with
spacegroup F43212. The program will check whether the requested
<sfsgroup> is compatible with your data.
The possibilities for <sfsgroup> are: P1, P21 (4), P21212 (1018 -
alternative origin), P212121 (19), P4122/P4322 (91/95), P41212/P43212 (92/96),
P31/P32 (144/145), P3/R3 (143/146), P3121/P3221 (152/154), P61/P65 (169/170).
Any spacegroup can be run in an appropriate one whose symmetry operators are a
subset of the one of interest (P1 can always be used as long as the hkl data
are sorted with l>=0). By judiciously shifting origins many spacegroups
become supergroups. $CLIBD/symop.lib contains the modified symmetry operators
for some spacegroups. You may move the origin of your coordinates with
PDBSET. For example:
-
-
SYMGEN X + 1/4, Y + 1/4, Z
LABIN <program label>=<file label> ...
Note: This is essential for all refinement and if HKLIN is assigned for a
structure factor calculation.
The following labels can be assigned.
H K L FP SIGFP FREE FPART PHIP
PHIP F0 F1 F2 F3 F4 F5 F6
F7 F8 F9 F10 F11 F12 F13 F14
F15 F16 F17 F18 F19
Assignments for FP and SIGFP are always required. If the free R flag is
assigned to FREE, a subset of reflections can be excluded from the calculation
or refinement. See the keyword FREERFLAG. If you wish
to add a known FPART to the structure factors, assign FPART and PHIP. This is
useful if you want to calculate hydrogen contributions, add in an anisotropic
heavy atom, add in a bulk solvent correction, which has been calculated
somewhere else. The FPART can be scaled relative to the FC by inputting a
SCALE
.
If you wish to copy some but not all the other columns from your input to your
output file, you may assign F0 F1 ... F19. These will be transcribed to the
output file. If you want to copy everything over, use ALLIN
on the LABOUT line.
LABOUT <program label>=<file label> ... [
ALLIN ]
<program labels> can be given for FC PHIC (and WANGWT, if required).
Any column assigned on LABIN will be automatically copied to the output file.
Note that attempting to reassign any of the labels specified on LABIN will
cause label assignment errors.
ALLIN will copy all columns in the input file to the
output file.
Beware: the program may get confused if a file label included in ALLIN is the
same as a program label.
OPTIONAL KEYWORDS:
AVMODE [ RADIUS <radius> ] [ WEIGHT <modewt> ]
When you are preparing the 'averaged map' for generating the Wang envelope,
this sets the radius of the map sphere and the type of weighting.
Subsidiary keywords:
- RADIUS <radius>
-
Specify the <radius> of the averaging sphere (angstroms)
-
WEIGHT <modewt>
-
- <modewt> = 1
-
Use weighting scheme w=1-(r/R) (Wang's method)
- <modewt> = 2
-
Use weighting scheme w=1-(r/R)**2
BADD <badd>
A parameter added to all atomic B values. It will smear the Gaussians used to
generate the atomic density over a larger volume. L.Ten Eyck suggests this
should allow a coarser sampling grid. G. Bricogne is sceptical and certainly it
introduces serious errors into the B12 coenzyme structure factors at 1Å
resolution (ref Hugh Savage). EJD shares the scepticism. The default value
(0.0) gives reasonable results for most problems.
BINS <nbins>
<nbins> (default 50) is the number of bins for the analysis of R-factor
v. resolution. Ranges of 4sin**2/Lambda**2 are of width 1.0/<nbins>. For
high resolution data this needs to be reset.
BRESET <bmin>
B-factors less than <bmin> (default bmin=5.0) are set equal to
<bmin>. For high resolution data this needs to be reset. If you have data
with low B factor, be aware that an implicit BRESET with a value of2.0 is
performed by sfall. If you want to avoid it, specify BRESET 0.0 explicitly.
CELL <a> <b> <c> [ <alpha>
<beta> <gamma> ]
Input cell parameters explicitly. This is checked against the cell read from
the MTZ file and that read from the PDB file. The program will continue with a
warning if there is a small inconsistency and stop if there is a serious one.
<alpha> <beta> <gamma> default to 90 degrees. If HKLIN is
assigned the cell constants will be picked up from there, otherwise you may
need to supply them yourself, for example with MODE ATMMAP and an input PDB
with missing or incorrect CRYST1 record.
CHAIN <chnlab> <iresn> <iresc>
Only needed for MODE ATMMAP RESMOD if your
coordinate file has more than one chain. You will need one CHAIN card for each
chain.
- <chnlab>
-
single character label (e.g. A)
- <iresn>
-
first residue number
- <iresc>
-
last residue number
EDEN <escale> <cmp> <cmn>
Done for Cecil Tate. The input map is scaled by:
xscal=escale/nx*ny*nz
and truncated.
if density(x,y,z)<0
density(x,y,z)=xscal*cmn*tanh(density(x,y,z)/cmn))
if density(x,y,z)>0
density(x,y,z)=xscal*cmp*tanh(density(x,y,z)/cmp))
END
This ends input.
FORMFACTOR [ NEUTRON | NGAUSS <ngauss> ]
<atnam1> <atnam2> ...
See the NOTES ON FORMFACTORS
.
SFALL has hardcoded X-ray and neutron formfactors available for atom types H C
N O S. (Default CuKa X-ray formfactors.) Other atom types are tabulated in
$CLIBD/atomsf.lib (X-ray) and $CLIBD/atomsf_neutron.lib (neutron). If your atom
identifier is not recognised (i.e.
it is not C, N, O, or S) the program will read atomsf.lib and take the FIRST
table which matches your atom ID. This may not be satisfactory if your atom
type is Fe+3, say, and you may wish to specify the formfactor yourself.
Subsidiary keywords:
- NEUTRON
-
Use neutron formfactors. Remember to assign ATOMSF $CLIBD/atomsf_neutron.lib to
read the correct library of formfactors.
or
- NGAUSS <ngauss>
-
<ngauss> (default 5) is the number of Gaussian terms requested for the
default atoms H C N O S. The only possible values are 2 or 5.
- <atnam1>
<atnam2> ...
-
Atom names for any other atom types used in XYZIN. The atom names MUST exactly
match the entries in ATOMSF library.
Example:
FORM NEUTRON P Co+3 D
FORM NGAUSS 5 Fe+3
FREERFLAG <freeflag>
Reflections where the value of the column FREE, taken from the MTZ file, equals
<freeflag> (default 0) are omitted from the refinement or FC calculation
and their R-factor is monitored separately. If you do not wish a free-R set to
be used don't assign FREE.
GRID <nx> <ny> <nz>
<nx>, <ny>, <nz> are integers giving the number of divisions
along each whole unit cell edge. The atomic density is sampled on this grid, so
if it is too coarse you will introduce errors into the structure factor
calculation. These default to values near to celledge*2, which defines a grid
spacing of 0.5Å. Different spacegroups have special requirements for
factorisation. No prime factors higher than 19 are permitted (The `atom map'
generated on this grid has the asymmetric unit defined for this spacegroup. See
table below
.).
The following general restrictions must be observed:
<nx> >= 2 * HMAX + 1
<ny> >= 2 * KMAX + 1
<nz> >= 2 * LMAX + 1
In addition there are further space group dependent
conditions as follows (where `n' is an integer, i.e. <nz> must
be even, for instance):
<nx> <ny> <nz>
P1 - 2n 2n
P21 2n 4n 2n
P21212 4n 4n 4n
P21212a '' '' ''
P212121 '' '' ''
P4122/P4322 4n 4n 8n
P41212/P43212 '' '' ''
P31/P32 6n 6n 6n
P3/R3 '' '' ''
P3121/p3221 '' '' ''
P61/P65 6n 6n 12n
H2OBG <h2obg>
Bulk water scattering to fill in all empty regions. <h2obg> (default 0.0)
is the value which will fill the SOLVMAP. ICOEFL uses
this.
NAME PROJECT <pname> CRYSTAL <xname> DATASET
<dname>
When a new MTZ file is being created, i.e. for MODE SFCALC with no HKLIN
assigned, this keyword is used to set up an appropriate project/crystal/dataset
for the output MTZ file. This is strongly recommended, otherwise the LABOUT
columns will be assigned to a default dataset with meaningless (for you) names.
NOSCALE
Scale between FP and FC is calculated but not applied to output FP and SIGFP.
OMIT <th>
Reflections are not included in the refinement if mod(Fo/Fc)><th> or
mod(Fc/Fo)><th>. This provides a means of excluding data with bad
agreement from the refinement and it should be used with care (Default: 1000).
RATIO <ratio>
Large xyz and B shifts are truncated to <ratio>*rms shift. This is only
used for UNRESTRAINED
refinement.
At the beginning of a refinement, the value of ratio should be kept fairly low
(1.5 - 2.0) as shifts tend to be large when the errors in the parameters are
large. When most atoms are well refined, then the ratio may be increased to 3
or 4, to allow for the refinement of these atoms which are poorly placed.
RESOLUTION <dmin> [ <dmax> ]
<dmin>, <dmax> (default 1, 1000) are resolution limits in Angstroms
(or in 4*sin**2/l**2 if both are <1.0. They can be given in either order. If
only one value is given, it is assumed to be the high resolution cutoff. The
SFCALC option outputs FC for all reflections to the outer limit (You
usually calculate SFs to make maps and it is best to calculate all SFs and then
limit resolution etc. in FFT
).
Default values are overwritten from HKLIN to include all reflections.
RMSSHIFT [XYZ] <rmsxyz> [B] <rmsb>
The shifts are scaled to limit the rms XYZ shift and B shift to be equal to the
set value, or to the calculated one, whichever is smaller. Only useful during
unrestrained refinement. Equivalent to setting STEPSIZE.
If RMSSHIFT is given STEPSIZE will be ignored. Default values: 0.0 0.0.
RSCB <dsmin> <dsmax>
in Angstroms (either order) or 4s**2/lambda**2. Default values: 4.5Å and 1.0Å.
If appropriate the program refines the value of the scale and overall B value
to fit <Fo**2> to <Fc**2> using reflections within this resolution
range. The default resolution range is chosen to use only those reflections
where the <Fo**2> distribution fit Wilson statistics reasonably well. A
standard protein distribution shows a sharp 5Å dip, which can distort the scale
factors. <Fc**2> is often unrealistically high for low resolution data,
until the solvent is adequately modelled. The lack of `solvent contrast' causes
this.
Users are often worried because they get lower R-Factors if this scale is
fitted for all data. This does not necessarily mean such a scale is more
correct!
The default HKLOUT will have:
-
-
H K L Scale*FP Scale*SIGFP ... FC*exp(-overall_B*ss) PHIC
If NOSCALE is specified HKLOUT will have:
-
-
H K L FP SIGFP ... FC*exp(-overall_B*ss) PHIC
After the end of each structure factor calculation a new scale factor is
calculated from the following expression, where SCNEW is the new scale factor
and SCALE is the previous scale factor.
Sum {(SCALE * Fcalc) * (Fobs)}
SCNEW = SCALE * --------------------------------------
Sum {(SCALE * Fcalc) * (SCALE * Fcalc)}
SCALE [FPART] <scale> <bov>
These parameters are applied to the Fpart as
<scale>*exp(-<bov>*ssq/ll) (Defaults: <scale>=1,
<bov>=0).
Some structure factor outputs (notably SHELX) include an Fcalc scaled as
10*absolute. SFALL calculates FC on the absolute scale. If you wanted to
combine these values you would need
SCALE FPART 0.1
SIGMA <sigma>
Data with Fobs < sigma * sd(Fobs) will not be included in refinement
calculations. This cutoff has no effect on structure factor calculations. Do
not be alarmed when your R-Factor appears to be higher for them than when you
are refining; you will have more reflections in the structure factor sum.
Default value: -1.0, i.e. no cutoff.
STEPSIZE <szo> <szb> <szfrc>
This is only used during UNRESTRAINED refinement.
XYZ shifts are multiplied by <szo>, B shifts are multiplied by
<szb>. <szfrac> is the fractional change in step size above which a
second calculation is done. NOT used if RMSSHIFT
is given. Default values: 1.0 1.0 0.3.
The program tries to estimate these using an algorithm of R. Agarwal's. See
NOTES ON STEPSIZE
.
If you are doing unrestrained refinement, structure factor calculation will be
repeated if abs(initial step - optimum step)/(initial step).
SYMMETRY <sg>
(Defaults to the symmetry of the SFSGROUP). <sg>
is the spacegroup number or name. If HKLIN is assigned the symmetry will be
picked up from the MTZ file header and should NOT be specified with this
keyword. If HKLIN is not assigned you may need to give this. See
examples below.
TITLE <title>
Title (up to 80 characters) used on the printer output. The title information
after the keyword is used as a heading for map sections and also as title
information for the output reflection data file if present.
TRUNCATE <rhmin> <rhmax>
The input map is truncated - all values <<rhmin> are set to
<rhmin>, and all values ><rhmax> set to <rhmax>. This
should be used for the solvent flattening procedure.
VDWR <dlim>
<dlim> helps specify the maximum radius for which an atom is considered
to contribute to the electron density while building the `atom map'. The
initial value is modified by a factor Sqrt(B+25)/2PI to "spread" the density
for atoms with high B factors. Default: 2.5.
The speed and accuracy of the calculation depend to some extent on the values
chosen. For 1.5Å data a value of 2.5 is sufficient (a maximum value of 3.0 is
allowed). For neutron data, or for atoms with low B values, it could safely be
reset to 1Å.
VERBOSE
Lots and lots of output - useful if debugging. Please note that this may
produce a message telling you that atom names have been set to ZZZ or residue
names to DUM if you have non-standard names. This is not important as the
atomic number is determined from the original atom type. See
Notes on Formfactors and Notes on Atom Names
below.
WEIGHT <w>
The weight (default 0.0) applied to each structure factor is given by
(2sintheta/lambda)**<w>
(See [1]). At the beginning of the refinement use
<w>=-1.5 or -1.0 to put most weight on the low angle data. As the
refinement proceeds increase <w> to 0.0 (But actually we never use
it...).
MEMORY ALLOCATION
The program allocates a large work array, whose size may be changed from the
default by setting the logical name MEMSIZE to an appropriate integer value.
The current value is printed in the output (look for `Memory allocation').
There are other dimensions which can currently only be changed by recompilation
and the error messages are currently not very clear about changing the MEMSIZE.
INPUT AND OUTPUT FILES
The following input and output files are used by the program (in addition to
some scratch files which are deleted at the end of program execution):
Input Files:
- XYZIN
-
Input Coordinates File (Brookhaven format)
- HKLIN
-
Input Reflection Data File (MTZ format)
- MAPIN
-
Input Map File (standard map file format)
Output Files:
- XYZOUT
-
Output Coordinate File (Brookhaven format)
- HKLOUT
-
Output reflection data file in standard MTZ format
- MAPOUT
-
Output map file (Standard map file format)
- GRADMAT
-
Output matrix file for Hendrickson Konnert program
The files used in a single run of the program depend on the options requested
via the data control keywords.
NOTES ON FORMFACTORS
The form factors are read from the file assigned to ATOMSF. This defaults to
$CLIBD/atomsf.lib. Coefficients for analytical approximation to the scattering
factors for atom identifier at a given resolution sintheta/lambda (=S) are
given as:
a1*exp(-b1*s*s) + a2*exp(-b2*s*s) - 2 Gaussian approximation
(Agarwal, 1978)
or as:
a1*exp(-b1*s*s) + a2*exp(-b2*s*s) + a3*exp(-b3*s*s)
+ a4*exp(-b4*s*s) + c
- 5 Gaussian approximation
See [2] pp. 99-101 (Table 2.2B) or [5]
p. 500ff, table 6.1.1.4 for values a1-a4, b1-b4, c for X-rays. For neutron
scattering see [2
] pp. 99-101 (Table 2.2B).
The atom name can be used to assign a formfactor to it.
- WARNING 1)
-
The program distinguishes CA - the metal - from CA - main chain atom in a
protein by its position on the coordinate line... And older versions of FRODO
get this wrong! The rules on defining atom names are the same as those used by
PDB. The chemical symbol is specified in columns 13 and 14 and the symbol is
RIGHT justified.
0 1
e.g. column 1 5 0 34
ATOM ... ZN is Zinc
ATOM ... U is Uranium
For the common atom types the program includes tables for form factors for the
required value of NGAUSS
. They are C N O H and S. All others must be read from the library files
atomsf.lib or atomsf_neutron.lib which are automatically assigned. These
contain tables of formfactors for every known atom (Kim Henrick monument).
atomsf.lib contains the five Gaussian form for X-ray data and the two Gaussian
form for atoms H C N O and S. Unless you are working at very high resolution
(>1.5Å) the two Gaussian approximations are adequate and speed up the
calculations somewhat.
atomsf_neutron.lib has the constant value needed for neutron scattering.
NOTES ON XYZIN, ATOM NAMES ETC.
Coordinates are assigned to XYZIN in Brookhaven format.
The CRYST1 card (which gives the cell dimensions) and the SCALEi cards (which
give the required transform for turning the coordinates into fractional ones)
should be included in XYZIN.
Cell parameters from the HKLIN file or CELL
keyword (if either are present) will be used in the absence of CRYST1 and
SCALEi cards, but in all cases the results of omitting the cards are somewhat
unpredictable - in some cases the program may fail, in others it may proceed
but using random or inappropriate values for the cell parameters.
If an atom has occupancy set =0.0 it will be copied to the output file if
appropriate, but not used in the calculation. This means it is easy to exclude
suspect atoms from any calculation without deleting them from the file.
The program does not make use of anisotropic U factors in the calculation of
electron density and structure factors, and so ANISOU records are ignored.
Example of part of XYZIN:
REMARK 2ZN refined coordinates
CRYST1 82.500 82.500 34.000 90.00 90.00 120.0 1 R3
SCALE1 0.01212 0.00700 0.00000 0.00000
SCALE2 0.00000 0.01400 0.00000 0.00000
SCALE3 0.00000 0.00000 0.02941 0.00000
ATOM 1 N GLY A 201 1 -8.863 16.944 14.289 1.00 21.88
.................................
ATOM 826 CA ALA D 130 4 -5.022 21.709 -11.876 1.00 15.58
ATOM 830 CA WAT A 249 1 -0.002 -0.004 7.891 0.33 10.40
ATOM 831 ZN2 WAT A 249 1 0.000 0.000 -8.039 0.33 11.00
ATOM 832 OW1 WAT A 249 1 0.000 0.000 -8.039 0.00 11.00
.................................
Atom 1 is Nitrogen, 826 is Carbon, 830 is Calcium, 831 is Zinc and
832 is Oxygen.
NOTES ON HKLIN
List of H K L .. FP SIGFP ( FPART PHIP F0 .. F19 )
An MTZ file with column labels corresponding to H K L .. FP SIGFP [FREE FPART
PHIP F0 .. F19]. The header will contain the cell dimensions and spacegroup of
the structure (if you want to change these, use MTZUTILS
).
For SFCALC this file MUST BE SORTED on h k l (i.e.
so h is the slowest varying index, then k and l is the fastest), and be within
the asymmetric unit expected by the program; see below
for expected limits (CAD will reorder your file if
necessary). It is sensible to do this if your data has not been processed by
CCP4 programs. XDS, MADNESS etc. sometimes have different default asymmetric
units to the CCP4 suite.
# an example of sorting native data and checking for correct
# asymmetric unit
cad HKLIN1 $CEXAM/toxd/toxd HKLOUT toxd_sorted <<EOF
TITLE Toxd data sorted - default symmetry P212121
LABIN FILE 1 E1=FTOXD3 E2=SIGFTOXD3
CTYPE FILE 1 E1=F E2=Q
EOF
For SFREF this file MUST BE SORTED on h k l (i.e.
so h is the slowest varying index, then k and l is the fastest), and be
extended if necessary to cover the whole asymmetric unit expected by the
program for the SF spacegroup; see below for
expected limits. If you are working at a lower symmetry than optimal the data
must have been extended and sorted (CAD will do this
too). There is a reason - sorting data is quite slow, and you will probably use
this file many times during the course of a refinement.
# an example of extending and sorting native data to P1 +-h,+-k,+l
cad HKLIN1 $CEXAM/toxd/toxd HKLOUT toxd_p1 << EOF
OUTLIM SPACEGROUP P1
TITLE Toxd data extended to P1 cell
LABIN FILE 1 E1=FTOXD3 E2=SIGFTOXD3
CTYPE FILE 1 E1=F E2=Q
EOF
The agreement factor analysis is carried out against the values of FP. Sigma
values are assumed to be in column SIGFP.
IF FPART and PHIP have been defined, the FC vector equals the FPART vector plus
the FC contribution calculated from the input. FPART can be scaled by entering
a suitable value on the SCALE
input.
Other columns in the input file can be copied to the output file by assigning
them to F0= F1= ... Or all columns can be copied over if the
ALLIN subkeyword is given in LABOUT.
Note however that if the input file contains columns with labels that match the
program labels then SFALL will stop with a fatal error.
Limits for Indices for the various space
groups (these are the same as those used in the data reduction programs and
FFT):
P1 h k l : l >= 0
h k 0 : h >= 0
0 k 0 : k >= 0
P21 h k l : k >= 0, l >= 0
h k 0 : h >= 0
P21212 h k l : h >= 0, k >= 0, l >= 0
P212121 h k l : h >= 0, k >= 0, l >= 0
P41212 h k l : h >= 0, k >= 0, l >= 0, h >= k
(+ other tetragonal)
P31/P32 h k l : h >= 0, k > 0
0 0 l : l > 0
P3/R3 h k l : h >= 0, k > 0
0 0 l : l > 0
P3121/P3221 h k l : h >= 0, k >= 0, k <= h (all l)
h h l : l >= 0
P61/P65 h k l : h >= 0, k >= 0, k <= h (all l)
h h l : l >= 0
NOTES ON MAPIN
A standard MAP file. The header will contain the cell dimensions and spacegroup
of the structure.
If you wish to generate structure factors from a input map it is essential that
the map is in an appropriate format for the SFALL spacegroup you request.
Common errors are:
1) using a different grid for the FFT calculation and the SFALL calculation.
FATAL!
2) Changing the axis order in the FFT calculation - both have the same defaults
for the same spacegroup! FATAL!
3) Choosing a different asymmetric unit for the FFT and the SFALL calculation.
You must make sure your map has the one which covers the SFALL requirements.
There is no flexibility here.
Limits for axes for the various space groups
(these are the same as those used as defaults in FFT
):
In space group P1, P21 and P21212a, 'b' is taken as the unique axis.
In space group P21212a, 1/4 is subtracted from the X and Y values of the
equivalent positions given in International Tables.
X1 X2 X3 Range of X3 Axis order
P1 Z X Y 0 to Y Z X Y
P21 Z X Y 0 to Y/2-1 Z X Y
P21212a Z X Y 0 to Y/4 Z X Y
P212121 X Y Z 0 to Z/4 Y X Z
P4122 X Y Z 0 to Z/8 Y X Z
P41212 X Y Z 0 to Z/8 Y X Z
P4322 X Y Z 0 to Z/8 Y X Z
P43212 X Y Z 0 to Z/8 Y X Z
P31 X Y Z 0 to Z/3-1 Y X Z
P32 X Y Z 0 to Z/3-1 Y X Z
P3 X Y Z 0 to Z-1 Y X Z
R3 X Y Z 0 to Z/3-1 Y X Z
P3121 X Y Z 0 to Z/6 Y X Z
P3221 X Y Z 0 to Z/6 Y X Z
P61 X Y Z 0 to Z/6-1 Y X Z
P65 X Y Z 0 to Z/6-1 Y X Z
OUTPUT from Structure
Factor Calculation
Note: The logfile has been flagged with XLOGGRAPH symbols
to allow selected output to be displayed on an Xterminal.
- (a)
-
The limits of the box used in modelling the electron density.
- (b)
-
A list, with coordinates, of the first 10 atoms included in generating the
model electron density map.
- (c)
-
The number of atoms input, the number of atoms in the sorted list of atoms, and
the number of atoms used in generating the electron density.
- (d)
-
The minimum, average and maximum values of the temperature factors for the
atoms.
- (e)
-
Minimum and maximum electron density values are printed for each section of the
generated density. The number of slabs used depends on the space allocated for
the work arrays in the program.
- (f)
-
The number of reflections used and the number of reflections on file.
- (g)
-
Old and new scale factors (with an overall temperature factor BOV if
temperature factor dependent re-scaling was applied).
- (h)
-
A structure factor summary table in ranges of 4sin**2theta/lambda**2 giving,
for each range, the number of reflections, the average values of Fobs and
Fcalc, the reliability index and an average value of the weighted errors.
- (i)
-
The overall reliability index after re-scaling.
NOTES ON XYZOUT
Coordinates in Brookhaven format with the XYZ or B shifts applied - output of
unrestrained refinement.
The proportion of the shift to be applied can be modified (see the description
of keyword STEPSIZE
).
If an input atom has occupancy of 0.0 it will be copied to the output file.
NOTES ON HKLOUT
The output file contains FC and PHIC only for those reflections which exist in
the HKLIN file, even if the structure factor calculation has covered more of
reciprocal space. The list will include ALL reflections in HKLIN within the
outer resolution limit. There is no inner resolution cutoff, or sigma cutoff.
Hence there may well be more reflections included than during refinement and
your R-factor may well be higher than the one output during refinement. All
columns specified in LABIN will be written to HKLOUT. If ALLIN
is specified then all input columns will be in HKLOUT.
If no HKLIN has been assigned the program outputs a list of H K L FC PHIC for
all possible reflections within the reciprocal asymmetric unit for the
SFSGROUP
that are within the requested resolution limits.
FC and PHIC can be modified for use in solvent flattening (see keyword
AVMODE).
NOTES ON MAPOUT
Several types of maps can be output - see keyword MODE for
how to produce each one.
- a)
-
The `atom map'. All density assigned at each grid point due to a neighbouring
atom.
- b)
-
The `atom map' modified to include a residue flag for each main chain and side
chain atom which has contributed most to this grid point. This allows the map
correlation program to produce a set of correlations residue by residue between
an ideal map (the `atom map') and any other map - part of real space R-factor
stuff.
- c)
-
The `atom map' modified to include an atom number flag. This can be used to
look at hydrogen density in a neutron map.
- d)
-
A `solvent map' with zero at all points within the protein and some constant
(H2obg) at all other points. This is a step on the way to adding in bulk
solvent to calculated structure factors.
OUTPUT from the Gradients Calculation - NOTES ON
GRADMAT
A direct access file of gradient information is output ready for input to
PROLSQ (no longer available in the CCP4 Suite). The following items are output
in this section:
- (a)
-
The value of the minimisation function (PII). Plus the number of reflections
included in the calculation of PII.
- (b)
-
The average and peak values of the gradients and shifts.
- (c)
-
The slope of the search direction and the initial step size used in the
quadratic approximation to find the optimum step size.
- (d)
-
For unrestrained refinement a table of estimated error against B-factor is
given. For each temperature range there is the number of atoms and the
estimated error (standard deviation).
OUTPUT from the
Calculation of the Optimum Shifts
This section gives PII (minimisation function) values, relative slopes,
calculated optimum step sizes and, when determined, the actual step size used
in applying the shifts.
NOTES ON STEPSIZE
R. Agarwal attempted to estimate the best step size to use during
UNRESTRAINED refinement. A displacement of the form alphaDelta(u),
which minimises the minimisation function of the least squares P(u +
alphaDelta(u)), will be the optimum where alpha is the optimum step size. As P
as a function of alpha approximates to a quadratic function, it is possible to
calculate the optimum step size, alphaopt, from the values of P(alpha) at alpha
= 0 and alpha = alpha1 and from the slope of P(alpha) at alpha = 0. The initial
step size alpha1 used in the calculation is chosen as 1.0 or the value input in
the control data. If the fractional change in step size is greater than SZFRC
(by default 0.3) then a second calculation is done with alpha1 taken as the
optimum step size just calculated.
COMMON PROBLEMS
-
Problem: Warning after opening the input reflection file HKLIN about the
sort order not being H K L.
Solution: This may not crash the program (sometimes it does) but even so
some of the output may be unreliable if the reflections in HKLIN have not been
sorted with h as the slowest, k as the intermediate, and l as the fastest
varying index. See NOTES ON HKLIN
above for the requirements placed on HKLIN.
-
Problem: Program stops with the message:
MTZ file label duplicates program label name
Solution: SFALL should warn you before stopping, which labels are
duplicated, e.g.:
Warning: this may fail!
This column label FC is also a program label
The suggested workaround at present is to rename the offending file labels, for
example using a utility program such as SFTOOLS
, before rerunning the program.
-
Problem: No conversion from orthogonal to fractional atom co-ordinates
(ie listed fractional co-ordinates have no value, or the same values as the
orthogonal co-ordinates).
or
Problem: Program stops with the message:
**** No Cell Input to subroutine rbfro1?? ****
after reading in XYZIN.
Solution: Check that CRYST1 and SCALE cards are present in the input
co-ordinate file XYZIN.(See NOTES ON XYZIN
above).
-
Problem: Program crashes or stops after reporting statistics on B's in
MODE SFCALC. (May also occur in other MODEs for the same reason.)
Solution: Check that the input reflection file has the correct
asymmetric unit and that the reflections have been sorted so that h is the
slowest changing index, k is the next, and l varies the fastest. (See
NOTES ON HKLIN
above). Nb: the program should now generate an explicit error message
suggesting that this is the problem.
-
Problem: Program fails after printing the title and the line "Chain
Names - First Residue - Last Residue - serial Kcount" (possibly also with error
messages concerning array sizes).
Solution: You may have to increase the value of MEMSIZE (see
MEMORY ALLOCATION above). N.B.
: the program should now generate an explicit error message suggesting that
this is the problem.
-
Common problems associated with the input map can be found in the section
NOTES ON MAPIN.
EXAMPLES
Calculate map from
atom coordinates
sfall
XYZIN /f/scratch/swift/pa_model1_2.pdb
XYZOUT $SCRATCH/junk.pdb
MAPOUT $SCRATCH/junk.map
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE ATMMAP
RESO 30 2.1
BINS 60
RSCB 5.0 2.1
SFSG 1
END
END-sfall
Read in
coordinates to calculate structure factors
sfall
HKLIN $SCRATCH/pt2_pt4_shhg2_os2_nat.mtz
HKLOUT $SCRATCH/pa_model1_2_phase.mtz
XYZIN /f/scratch/swift/pa_model1_2.pdb
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE SFCALC XYZIN HKLIN
RESO 30 2.1
BINS 60
RSCB 5.0 2.1
SFSG 1
LABI FP=F_V4 SIGFP=SIGF_V4 FREE=FreeRflag
LABO FC=FC PHIC=AC
END
END-sfall
Read in map to
calculate structure factors
sfall
HKLIN $SCRATCH/pt2_pt4_shhg2_os2_nat.mtz
HKLOUT $SCRATCH/pa_model1_2_phase.mtz
MAPIN $SCRATCH/junk.map
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE SFCALC MAPIN HKLIN
RESO 30 2.1
BINS 60
RSCB 8.0 2.1
SFSG 1
LABI FP=F_V4 SIGFP=SIGF_V4
LABO FC=FC PHIC=AC
END
END-sfall
Iteration_cycles to do the
back-transformation and phase combination cycles in solvent flattening
procedure
# - back-transform map with sfall to obtain Fcs and phases, scale
# and calculate R-factor
# - combine the phases with sigmaa
# - run fft to calculate a new map with Fobs and combined phases as
# input to flatmap
#
sfall HKLIN ../mtz/fvb18_sigmaa.mtz
HKLOUT fvb_flat_14.mtz
MAPIN fvb_flattened_14.map
<< EOF-sfall
titl back transformation for cyc 14
mode SFCALC MAPIN HKLIN
grid 104 128 160
reso 200.0 3.0
BINS 40
rscb 10.0 3.0
sfsg 19
AVMODE RADIUS 8
badd 0
FORM NGAUSS 5 C H N O S
LABI FP=fvb_Fnp SIGFP=fvb_SIGFnp -
F0=PHIB_I3 F1=FOM_I3 -
F2=A F3=B F4=C F5=D
LABO FC=FCWANG PHIC=PHICWANG WANGWT=WC8
END
EOF-sfall
Unrestrained refinement
sfall
HKLIN ../data/taka_fx_trunc.mtz
XYZIN ../data/takaxp_model8_b.pdb
XYZOUT $SCRATCH/junk.pdb
<< END-sfall
TITL TAKAXP_MODEL8_9 SFS
GRID 104 136 264
MODE SFREF XYZ UNRESTRAINED
FREE 3.0
RESO 20 2.1 60
RSCB 8.0 2.1
SFSG 19
FORM NGAU 2 Ca Cr Fe+2 P
LABI FP=FP SIGFP=SIGFP FREE=FreeRflag
END-sfall
Prolsq refinement cycles
#!/bin/csh -f
#
# REFINE consists of sfall, protin and prolsq
#
set LastCycle=146
set Number_of_Cycles=8
set CycleCounter=0
#
Begin:
#
@ CurrentCycle=$LastCycle + 1
#
# ========== SFALL ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************'
echo ' '
echo ' Running SFALL for Cycle Number :' $CurrentCycle' on '$DATE
echo ' '
echo ' *************************** '$CurrentCycle' **************'
#
sfall HKLIN s91a.mtz
XYZIN s91a_cyc${LastCycle}.brk
GRADMAT s91a_GRADMAT${CurrentCycle}.tmp
<< END-sfall
TITL Barnase s91a Mutant
MODE SFREF XYZB RESTRAINED ! for prolsq
GRID 90 90 120 !div CELL by these should give .=. 0.7 A
BINS 60
RESO 10 2.2 !last no. is nstep
RATI 3 !truncate xyz and B shrifts to RATI*RMS
BADD 0 !smear the gaussians to gen atomic density [10.0]
RSCB 4 1.5 !limits for refining scale and B, same as RESO ?
SFSG 145 ! fft space group
LABIN FP=S91A_F SIGFP=S91A_SIGF FREE=FreeRflag
END
END-sfall
#
/bin/rm ${SCRATCH}s91a_junk.brk ${SCRATCH}sfall*
#
# ========== PROTIN ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************'
echo ' '
echo ' Running PROTIN for Cycle Number :' $CurrentCycle' on '$DATE
echo ' Running PROTIN for Cycle Number :' $CurrentCycle' on '$DATE
echo ' '
echo ' *************************** '$CurrentCycle' **************'
#
protin XYZIN s91a_cyc${LastCycle}.brk
PROTOUT s91a_protout.out
PROTCOUNTS s91a_protcounts.out
DICTPROTN ${LIBD}protin.dict
<< END-protin
TITLE Barnase Ser91 -> Ala
SYMMETRY 145
!
CHNNAME ID A CHNTYP 1 ROFFSET 0
CHNNAME ID B CHNTYP 2 ROFFSET 0
CHNNAME ID C CHNTYP 3 ROFFSET 0
chnname id E chntyp 4 roffset 0
!
CHNTYP 1 NTER 3 VAL 3 CTER 110 ARG 2
CHNTYP 2 NTER 3 VAL 3 CTER 110 ARG 2
CHNTYP 3 NTER 2 GLN 3 CTER 110 ARG 2
chntyp 4 wat
!
LIST FEW ![few]/some (for >20A)/all
PEPP 5 !no. of atoms restrained to be in a plane [5]
VDWR 1 CPR 8 3.4
END
END-protin
#
# ========== PROLSQ ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************'
echo ' '
echo ' Running PROLSQ for Cycle Number :' $CurrentCycle' on '$DATE
echo ' '
echo ' *************************** '$CurrentCycle' **************'
#
prolsq XYZIN s91a_cyc${LastCycle}.brk
PROTOUT s91a_protout.out
PROTCOUNTS s91a_protcounts.out
GRADMAT s91a_GRADMAT${CurrentCycle}.tmp
XYZOUT s91a_cyc${CurrentCycle}.brk
<< END-prol
TITLE Barnase s91a Mutant
CGCYCLES 20 100 1 !max no. cyc for conjugate gradient soln [50]
ORIGIN 0 0 0 !origin constraint flags
RESOLUTION 10 2.2 !select Fobs
OUTPUT XYZ !o/p XYZ(.brk)/SF(.mtz)/SHIFTS(shifts file)
RTEST -1 1 1 1 1 1 1 1 !-1 -> +1 when R factor .=25%
NOUPDATE !not update atomic coord in PROTIN o/p file
MATRIX .30 .85 .85 !1st param adjusts SF/geom contribution
!start=.08;.12;.2;.25;.3;.35; final=.08
!2nd&3rd:shift factors for position and B
BATOM !(+) when refining indiv B
!
! ***** Restraints *****
DISTANCE 1 0.02 0.04 0.05 0.01 0.1
PLANE 1 0.02
CHIRAL 1 0.12
TEMPERATURE 1.0 1.0 1.5 1.5 2.0 0.1 !start
!TEMPERATURE 1.0 1.5 2.0 2.0 2.5 3.0 !when R factor <25%
HOLD 0.1 1.0 0.1
!TORSION 1 20 20 30 50 !when R factor >25%
TORSION 1 10 10 15 45 !when R factor <25%
VANDERWAAL 1 0.2 -0.3 0.0 -0.2 0.0
!
! ***** Monitor *****
MONITOR DISTAN 3. VANDERWAAL .5 TORSION 3
MONITOR PLANE 3 CHIRAL 3 !for R factor<25%
END
END-prol
#
Cleanup:
#
/bin/rm s91a_protout.out s91a_protcounts.out s91a_GRADMAT${CurrentCycle}.tmp
#
@ CycleCounter=$CycleCounter + 1
@ LastCycle=$LastCycle + 1
#
# Test to see whether Number_of_cycles done
#
if ($CycleCounter<$Number_of_Cycles) goto Begin
#
Simple Runnable Examples
examples/unix/runnable/sfall.exam
AUTHORS
Eleanor Dodson and Ted Baker, based on a program by Ramesh Agarwal and Neil
Isaacs. Documented for the Daresbury Laboratory protein crystallography system
by John Campbell.
PROGRAM FUNCTION
The program is based on a least-squares refinement technique using fast Fourier
transform (FFT) methods as devised by Ramesh Agarwal. His paper [1
] on the technique should be studied to get a detailed understanding of the
method and only a broad description of the method is given in this section.
The program may be used for
-
calculating and outputting the matrix and gradient terms required by the
restrained least squares refinement program PROLSQ (no longer available in the
CCP4 Suite). The calculated shifts can be applied directly to the atomic
coordinates/B-factor, unrestrained refinement.
-
The generation and calculation of an electron density map from an input set of
atomic coordinates and its transformation, using FFT techniques, to obtain a
set of calculated structure factors. The program SFALL is used to calculate
structure factors from an electron density map using Fast Fourier Transform
techniques. Whenever possible, the crystallographic symmetry is exploited to
increase the efficiency of the calculation. Care must be taken by the user to
ensure that the map has been calculated on a sufficiently fine grid if
significant errors in the calculated structure factors are to be avoided. As a
rough rule a grid size of approximately three times the maximum index of the
data in each direction is appropriate. The use of the Fast Fourier Transform in
structure factor calculations for large molecules has been described and
discussed by L.F. Ten Eyck [3].
-
The calculation of structure factors from an electron density map. The same
cautionary measures must be applied as those explained in (b).
Modelling of the Electron Density and
Calculation of Structure Factors
The contribution of an atom to a structure factor is the product of an atomic
form factor function and a Gaussian temperature factor function. If the form
factor function is approximated by a sum of Gaussian functions, then this
contribution may be represented by the sum of Gaussian terms and so may the
modelled electron density. The amount of computation required in setting up the
electron density map is proportional to the number of Gaussian terms used in
approximating the form factor. The use of a two term Gaussian is probably
adequate in most cases where the resolution of the data does not extend beyond
1.5Å. The program does, however, allow for the input of other numbers of terms.
Non default atoms defined in FORMFACTOR
are usually represented as five Gaussians since heavy atom scattering factors
are more complex.
Tables of coefficients for a five term Gaussian approximation to the atomic
form factors are given in the International Tables for X-ray Crystallography
Vol.IV [2]. Some values for two and one term Gaussian
approximations are given in the Agarwal paper
.
The radius of the atoms (beyond which the electron
density is considered to be negligible) is also required in the calculation.
The value for this radius must be chosen to balance the requirements of
accuracy (large radius) and speed (small radius). The tables below indicate the
radii required for carbon atoms, for different temperature factors (Bm), to
give a given fractional loss of electron density (DeltaZm/Zm).
DeltaZm/Zm 0.02 0.01 0.005
Two term Gaussian Bm
5 1.89 2.06 2.22
10 2.02 2.21 2.38
20 2.26 2.47 2.68
More values are given in the Agarwal paper.
The electron density map is built up using the contributions from each atom
within, or near (within the atom radius) to, the part of the map being
calculated. A three dimensional FFT of the electron density map is used to
calculate the structure factors. In this calculation the space group symmetries
are used whenever possible.
If the sampling grid is too coarse, significant errors may occur in the
calculation of the structure factors. These errors may be reduced [3]
either by using a finer grid for the calculation or by adding an artificial
extra temperature factor to all the atoms when modelling the density (not
recommended). This factor has also to be taken into account when the calculated
structure factors are compared with the observed structure factors.
Calculation of the Gradients
The gradients are calculated by convoluting a modified difference density
function with the electron densities of the individual atoms being refined. The
density function is calculated by calculating the Fourier transform (again
using the FFT) of a difference function based on the weighted values of the
errors in the structure factors. A three dimensional difference FFT is
calculated and convoluted with the derivatives of the atomic density for x y
and z, or B. The convolution is carried out by summing, over all the grid
points enclosed by the atom (radius defined as described above),
the product of the electron density of the atom and the value of the modified
difference density function. Thus, the calculation of the gradients is similar
to the calculation of the structure factors with the steps being reversed.
The Normal Matrix
Only the diagonal terms (interacting between parameters of the same atom) are
calculated for the normal matrix. The method described by Agarwal
is used but is generalised to allow for a variable number of terms in the
Gaussian approximations to the atomic form factors. Also the function A'xx(bi)
is tabulated for each value of bi from 1 to 150 and the nearest values, rather
than interpolated values, are selected from the table when forming the diagonal
terms of the matrix.
A set of terms, H, is only calculated for one parameter as the other diagonal
terms may be calculated to a good degree of accuracy from this initial set. An
inverse matrix is set up for calculating the orthogonalised contributions of
the gradient.
Calculation of the Shifts
The shifts, calculated as described above, may not in fact be the optimum ones
to use due to the approximate nature and the non-linearity of the method used.
In practice, it is unwise to apply large shifts as these may give rise to
instability in the refinement or to `oscillating atoms' and thus provision is
made to truncate the larger shifts. The program allows for the input of a
ratio, RATIO, which may be used to truncate the maximum
value of a shift to
RATIO * r.m.s. value of the shifts
At the beginning of a refinement, the value of RATIO should be kept fairly low
(1.5-2.0) as shifts tend to be large when the errors in the parameters are
large. When most atoms are well refined, then the ratio may be increased to 3
or 4, to allow for the refinement of these atoms which are poorly placed.
Even these truncated shifts may not be the optimum shifts to be applied. A
displacement of the form alphaDelta(u), which minimises the minimisation
function of the least squares P(u + alphaDelta(u)), will be the optimum where
alpha is the optimum step size. As P as a function of alpha approximates to a
quadratic function, it is possible to calculate the optimum step size,
alphaopt, from the values of P(alpha) at alpha = 0 and alpha = alpha1 and from
the slope of P(alpha) at alpha = 0. The initial step size alpha1 used in the
calculation is chosen as 1.0 or the value input in the control data. If the
fractional change in step size is greater than SZFRC (by default 0.3) then a
second calculation is done with alpha1 taken as the optimum step size just
calculated.
REFERENCES
-
Agarwal, R.C., Acta Cryst., (1978), A34, 791-809.
-
International Tables for X-ray Crystallography, Vol.IV, (1974), Kynoch Press.
-
Ten Eyck, L.F., Acta Cryst., (1977), A33, 486.
-
Bruenger, A.T., Nature 355, 472-4 (1992)
-
International Tables for Crystallography, vol. C, (1995), Kluwer.
-
"Refinement of protein structures", Proceedings of the
Daresbury Study Weekend 15-16 November, 1980 (Compiled by P.S. Machin, J.W.
Campbell and M. Elder).
SEE ALSO
cad, freerflag,
fft, icoefl, mtzutils,
overlapmap, pdbset,
prolsq (no longer available in the CCP4 Suite), protin,
sigmaa, watersearch (1), xloggraph
, X-PLOR manual