SFALL (CCP4: Supported Program)

NAME

sfall - Structure factor calculation and X-ray refinement using forward and reverse FFT

SYNOPSIS

sfall [ XYZIN foo.brk ] [ MAPIN foo.map ] [ HKLIN foo.mtz ] [ MAPOUT foo_out.map ] [ HKLOUT foo_out.mtz ] [ XYZOUT foo_out.brk ] [ GRADMAT foo.sfmat ]
[Keyworded input]

CONTENTS

  1. DESCRIPTION
    1. Generate atom map
    2. Calculate structure factors
    3. Generate X-ray gradients for refinement
    4. Space Groups
  2. KEYWORDED INPUT
  3. MEMORY ALLOCATION
  4. INPUT AND OUTPUT FILES
    1. Notes on Formfactors
    2. Notes on XYZIN, Atom names, etc
    3. Notes on HKLIN
    4. Notes on MAPIN
    5. Output from Structure Factor Calculation
    6. Notes on XYZOUT
    7. Notes on HKLOUT
    8. Notes on MAPOUT
    9. Output from the Gradients Calculation - notes on GRADMAT
    10. Output from the Calculation of the Optimum Shifts
    11. Notes on stepsize
  5. COMMON PROBLEMS
  6. EXAMPLES
    1. Calculate map from atom coordinates
    2. Read in coordinates to calculate structure factors
    3. Read in map to calculate structure factors
    4. Iteration cycles to do back-transformation and phase combination cycles in solvent flattening procedure
    5. Unrestrained refinement
    6. Prolsq refinement cycles Please note: PROLSQ is no longer available as refinement program in the CCP4 Suite.
    7. Simple Runnable Examples
  7. AUTHORS
  8. PROGRAM FUNCTION
    1. Modelling of the electron density and calculation of structure factors
    2. Calculation of the gradients
    3. The Normal Matrix
    4. Calculation of the shifts
  9. REFERENCES
  10. SEE ALSO

DESCRIPTION

This program calculates structure factors and X-ray gradients for refinement using inverse and forward fast Fourier techniques.

If coordinates are input, it builds an `atom map' on a suitable grid over the asymmetric unit of the spacegroup. Atomic density is distributed as a Gaussian centred on the atom position. The value of the Gaussian depends on the distance from the atom centre, the atom type, and temperature factor. Its value is calculated at each grid point within a given radius for each atom, and the values are summed to give an `atom map'. This `map' can then be used for an inverse fast Fourier transform, which gives the structure factor. It is important that the grid is fine enough to sample the atomic Gaussians accurately - the accuracy of the structure factor estimates depends on this. I recommend using a grid which gives a sampling of ~0.5Å.

X-ray gradients are estimated by `differential synthesis'. A difference map is calculated, and this is convoluted with the atomic density.

By default, SFALL scales Fobs to Fcalc. This then lets you do maps on (or close to) an absolute scale if the map comes from coordinates. If you are back-transforming a map, e.g. in solvent-flattening or averaging, then the map isn't on an absolute scale anyway, and so this scaling is not particularly sensible (although not usually harmful). Note that in iterative procedures, the Fobs you bring in to SFALL on each cycle should be the original Fobs, not the one carried through from the previous cycle (which may have been scaled, selected, or otherwise corrupted). In this case, it is probably more sensible to use the NOSCALE option in SFALL, then either (i) use SIGMAA to generate Partial or Combined Fourier coefficients for the next map (if you want to weight them); or (ii) use RSTATS to scale Fcalc to Fobs.

For a description of the original method see [1 ]. A modification in calculating the gradients suggested by Alain Lifchitz is used.

A list of common problems (with possible solutions) can now be found below.

1) Generate atom map

Input:

i) XYZIN

Coordinates (Brookhaven format with CRYST1 and SCALEi lines).

Output:

i) MAPOUT

Atom map. Various modifications can be made to this map:
a)
The `solvent' map. This map has zero at all points where there is atomic density, and a constant value for all points where there was no atomic density. It is used for calculating structure factors for `bulk solvent' (see ICOEFL for ideas on using this); or for use as a mask.
b)
A tagged `atom map' where each grid point is tagged with a flag to say which residue, or which atom number, contributed most to it. This is used for analysing map correlations, and real space R-factors.

2) Calculate structure factors

By default a scale factor is calculated between Fobs and Fc and is applied to the output Fobs. If this is not wanted then the keyword NOSCALE must be used.

Input:

i) either coordinates or a map (compulsory)

If map is input it can be `truncated' to eliminate very high or low density.

ii) HKLIN (optional)

This must contain: H K L FP SIGFP [ FREE FPART PHIP ...]
It MUST be sorted on h k l, and be in the CCP4 chosen asymmetric unit. See NOTES ON HKLIN for further details.
The FREE value is generated by FREERFLAG or in X-PLOR and transferred to MTZ format using F2MTZ. See the FREERFLAG documentation. (Note that X-PLOR 3.1 only assigns flags of 0 or 1.) The keyword FREERFLAG allows reflections with a given FREE value to be excluded from refinement and Fc calculations.
IF FPART and PHIP have been assigned, the structure factor output will equal the scaled FPART vector plus the contribution calculated from the input plus the scaled FPART vector.

Output:

i) HKLOUT containing: H K L ... FC PHIC (WANGWT)

(If you are preparing the 'averaged map' for generating the Wang envelope, (flagged by keyword AVMODE) the WANGWT will be calculated as well as the structure factor modified for use in solvent flattening.)

ii) MAPOUT - The atom `map' (optional).

You can also keep the atom map if you want to.

3) Generate X-ray gradients for refinement

These are used for minimisation of X-ray differences between FP and FC. This calculation uses the forward FFT.

Input:

i) XYZIN

Coordinates (Brookhaven format with header).

ii) HKLIN

Containing: H K L FP SIGFP [ FREE ]
When FREE is assigned then a subset of reflections is excluded from the gradient calculation (see above for details). Hence the agreement between them and Fc plays no part of the refinement procedure. The Free R factor calculated for these reflections is a useful indicator of the quality of the refinement [4]. It must be stressed that during a particular refinement procedure only one freeR set of reflections should be used. It is meaningless to change the freeR set in the middle of refinement. However, with several freeR sets, it is possible to compare the same refinement procedure against different freeR sets. It is still possible for all the freeR sets to be unsuitable e.g. if the NCS and the crystal symmetry are related, as they often are, there may be special relationships between classes of reflections, and this may mean that the FREE set is not truly independent of the other observations.

Output:

i) XYZOUT

A coordinate file in which the least squares shifts have been applied.

ii) GRADMAT

A direct access file containing information about the X-ray gradients, ready for input into PROLSQ (no longer available in the CCP4 Suite).

Further details of the specification of input and output files is given in section INPUT AND OUTPUT FILES .

See description of keyword MODE and examples for uses for the various options.

Space Groups

SFALL has special subroutines to handle the following spacegroups:

  P1                  P21(4)               P21212(1018 - alt. origin)   
  P212121(19)         P4122/P4322(91/95)   P41212/P43212 (92/96)      
  P31/P32(144/145)    P3/R3(143/146)       P3121/P3221(152/154) 
  P61/P65 (169/170)

It is possible to use the P1 version (or any suitable lower symmetry) for the calculations. In particular, any non-primitive spacegroup can be run in the appropriate primitive one. See keyword SFSGROUP for details.

BUT BEWARE: To use SFSG P1 you must have your reflections sorted with l>=0. This is NOT the default for P3121/P3221 or P6i22. You would be sensible to run CAD with the keyword: `OUTLIM SPACEGROUP P1' before calculating structure factors.

The crystal symmetry will be used to generate atom positions from a unique molecule. See NOTES ON HKLIN for special requirements for the HKLIN file when using a lower symmetry version.

KEYWORDED INPUT

All input is keyworded. Only the first 4 characters of a keyword are significant. The keyworded records can be in any order.

Anything on a line after an ! or # is ignored and lines can be continued by using a minus sign.

The possible keywords are:

Compulsory
MODE
Optional
AVMODE, BADD, BINS, BRESET, CELL, CHAIN, EDEN, END, FORMFACTOR, FREERFLAG, GRID, H2OBG, LABIN, LABOUT, NAME, NOSCALE, OMIT, RATIO, RESOLUTION, RMSSHIFT, RSCB, SCALE, SFSGROUP, SIGMA, STEPSIZE, SYMMETRY, TITLE, TRUNCATE, VDWR, VERBOSE, WEIGHT.

N.B. Some of these are compulsory for some MODEs.

DESCRIPTION OF KEYWORDS

MODE [ ATMMAP ... | SFCALC ... | SFREF ... ]

This keyword controls the course of the calculation.
ATMMAP [ RESMOD ] [ ATMMOD ] [ SOLVMAP ]
Calculating `atom maps' from coordinates. The map extends over the asymmetric unit of the appropriate space group (which may be different from the true spacegroup, see keyword SFSGROUP for possible groups). N.B. XYZIN and MAPOUT must be assigned.
Subsidiary keywords:
RESMOD
output `atom map' values tagged by a residue number flag.
ATMMOD
output `atom map' values tagged by a atom number flag.
SOLVMAP
output `solvent map' with value specified by the H2OBG keyword (default=0.03) at all points not occupied by atom density. All points occupied by atom density set to 0.0.
SFCALC ...
Structure factor calculation.
Subsidiary keywords:
XYZIN or MAPIN (one essential)
Define input type, either coordinates or a map. N.B. XYZIN or MAPIN must be assigned.
followed by:
HKLIN (optional)
Only reflections which exist in the HKLIN file are output to HKLOUT. The output FC PHIC are appended to input H K L FP SIGFP... See keywords LABIN and LABOUT for further details.
N.B. HKLIN and HKLOUT must be assigned. See NOTES ON HKLIN for the requirements placed on the HKLIN file.
ATMMAP or SOLVMAP (optional)
Either an `atom map' or a `solvent map' is output. The map extends over the asymmetric unit of the space group used in the structure factor calculation (see keyword SFSGROUP). Note that for non-primitive space groups, only density for symmetry mates corresponding to primitive symmetry operations is output; use MODE ATMMAP to obtain a complete map. N.B. MAPOUT must be assigned.
SFREF
Coordinate and/or B-factor refinement; will calculate least squares gradient and shifts. N.B. XYZIN and HKLIN must be assigned.
Subsidiary keywords:
MAPIN
(This option is exceedingly unlikely to be needed.) Gradients will be taken from an input difference map. Usually this difference map is generated internally from the structure factors calculated from the atomic coordinates. However, in some special situations, e.g. if you had averaged a difference map earlier, you may want to input it rather than use the internally generated one.
RESTRAINED
Atom parameter gradients calculated from the fit of the X-ray data are output to GRADMAT, to be combined with geometric restraints calculated by PROTIN and PROLSQ (no longer available in the CCP4 Suite).
UNRESTRAINED
Parameter shifts to be applied without restraints; a file will be written to XYZOUT.
If followed by:
XYZ
refinement of the X Y and Z parameters will be done.
XYZB
refinement of the XYZ and B parameters will be done.
B
refinement of the B parameters will be done.
Only possible with the UNRESTRAINED option.
Example 1: MODE ATMMAP
An electron density map is created from the atomic coordinates.
Example 2: MODE ATMMAP RESMOD
An electron density map is created from the atomic coordinates with an extra residue identifying flag added to the atomic density; this is 100*IRES+MCHFLG (If there are several chains in your protein you will need to describe the chain limits using the keyword CHAIN to avoid the same flag being given to atoms from different chains). This allows OVERLAPMAP to analyse the correlation coefficient between 2 maps or the `real space R-Factor' for contributions from each residue.
Example 3: MODE ATMMAP ATMMOD
An electron density map is created from the atomic coordinates with an extra atom identifying flag added to the atomic density; this is 100*atom_number. This allows OVERLAPMAP to analyse the correlation coefficient between 2 maps or the `real space R-Factor' for contributions from each atom (This was used to see if the hydrogen or deuterium atoms in a neutron study could be distinguished in the electron density map).
Example 4: MODE SFCALC XYZIN HKLIN [ ATMMAP ]
A structure factor calculation is performed using the input coordinates. HKLIN is read, and the FP are scaled to FC. HKLOUT contains at least H K L scaled_FP scaled_SIGFP FC PHIC
(You can output unscaled FP by using the keyword NOSCALE). An R-factor is calculated. If keyword ATMMAP is given the electron density map created from the atomic coordinates is saved on MAPOUT.
Example 5: MODE SFCALC MAPIN HKLIN
A structure factor calculation is performed using the input map. The map MUST cover the asymmetric unit expected for the SFSGROUP. This will be true if you have calculated the map using FFT with the FFTSPACEGROUP the same as the SFSGROUP and have NOT reset the AXIS order. Details are given below. HKLIN is read, and the FP are scaled to FC. HKLOUT contains at least H K L scaled_FP scaled_SIGFP FC PHIC
(You can output unscaled FP by setting keyword NOSCALE). An R-factor is calculated.
Example 6: MODE SFCALC XYZIN (ATMMAP)
A structure factor calculation is performed using the input coordinates. Since no HKLIN is read, the SYMMETRY you wish to use for the structure will have to be given. The default is to use the symmetry of the SFSGROUP. Output HKLOUT containing h k l FC PHIC (WANGWT if AVMODE set) for all reflections within the requested resolution limits.
If keyword ATMMAP is given the electron density map created from the atomic coordinates is saved on MAPOUT.
Example 7: MODE SFCALC MAPIN
A structure factor calculation is performed using the input map. The map MUST cover the asymmetric unit expected for the SFSGROUP. This will be true if you have calculated the map using FFT with the FFTSPACEGROUP the same as the SFSGROUP and have NOT reset the AXIS order.
Example 8: MODE SFREF XYZ (or XYZB) RESTRAINED
Used for restrained refinement of XYZ coordinates or XYZ coordinates and B factors. It is used in conjunction with PROTIN and PROLSQ (no longer available in the CCP4 Suite). First a structure factor calculation is done, the FP scaled and an R-factor calculated. Then the program calculates the normal matrix and gradient contributions for the parameters to be refined. Gradients of atomic shifts are output to GRADMAT. Reference: R. Agarwal, A. Lifchitz, J. Konnert, W.A Hendrickson [6].
Example 9: MODE SFREF XYZ (or XYZB or B) UNRESTRAINED
Used for unrestrained refinement of XYZ coordinates or XYZ coordinates and B factors or Bfactors alone. First a structure factor calculation is done, the FP scaled and an R-factor calculated. Then the program calculates the normal matrix, gradient contributions and shifts for the parameters to be refined. The shifts are analysed using arguments from STEPSIZE, RATIO and a set of modified coordinates output to XYZOUT. Reference: R. Agarwal, A. Lifchitz [6].

SFSGROUP <sfsgroup>

<sfsgroup> is the spacegroup number or name of the spacegroup used for the calculation. You don't normally need to specify this as a sensible default will be used (the highest symmetry one the program can use which is compatible with your data) as printed in the output.

N.B. - This IS NOT NECESSARILY the spacegroup of your structure. E.g. , you may well be calculating structure factors in P1 for a structure with spacegroup F43212. The program will check whether the requested <sfsgroup> is compatible with your data.

The possibilities for <sfsgroup> are: P1, P21 (4), P21212 (1018 - alternative origin), P212121 (19), P4122/P4322 (91/95), P41212/P43212 (92/96), P31/P32 (144/145), P3/R3 (143/146), P3121/P3221 (152/154), P61/P65 (169/170).

Any spacegroup can be run in an appropriate one whose symmetry operators are a subset of the one of interest (P1 can always be used as long as the hkl data are sorted with l>=0). By judiciously shifting origins many spacegroups become supergroups. $CLIBD/symop.lib contains the modified symmetry operators for some spacegroups. You may move the origin of your coordinates with PDBSET. For example:

SYMGEN X + 1/4, Y + 1/4, Z

LABIN <program label>=<file label> ...

Note: This is essential for all refinement and if HKLIN is assigned for a structure factor calculation.

The following labels can be assigned.


         H     K     L     FP    SIGFP   FREE   FPART   PHIP  
         PHIP  F0    F1    F2    F3      F4     F5      F6   
         F7    F8    F9    F10   F11     F12    F13     F14
         F15   F16   F17   F18   F19      

Assignments for FP and SIGFP are always required. If the free R flag is assigned to FREE, a subset of reflections can be excluded from the calculation or refinement. See the keyword FREERFLAG. If you wish to add a known FPART to the structure factors, assign FPART and PHIP. This is useful if you want to calculate hydrogen contributions, add in an anisotropic heavy atom, add in a bulk solvent correction, which has been calculated somewhere else. The FPART can be scaled relative to the FC by inputting a SCALE .

If you wish to copy some but not all the other columns from your input to your output file, you may assign F0 F1 ... F19. These will be transcribed to the output file. If you want to copy everything over, use ALLIN on the LABOUT line.

LABOUT <program label>=<file label> ... [ ALLIN ]

<program labels> can be given for FC PHIC (and WANGWT, if required).
Any column assigned on LABIN will be automatically copied to the output file. Note that attempting to reassign any of the labels specified on LABIN will cause label assignment errors.
ALLIN will copy all columns in the input file to the output file.
Beware: the program may get confused if a file label included in ALLIN is the same as a program label.

OPTIONAL KEYWORDS:

AVMODE [ RADIUS <radius> ] [ WEIGHT <modewt> ]

When you are preparing the 'averaged map' for generating the Wang envelope, this sets the radius of the map sphere and the type of weighting.

Subsidiary keywords:

RADIUS <radius>
Specify the <radius> of the averaging sphere (angstroms)
WEIGHT <modewt>
<modewt> = 1
Use weighting scheme w=1-(r/R) (Wang's method)
<modewt> = 2
Use weighting scheme w=1-(r/R)**2

BADD <badd>

A parameter added to all atomic B values. It will smear the Gaussians used to generate the atomic density over a larger volume. L.Ten Eyck suggests this should allow a coarser sampling grid. G. Bricogne is sceptical and certainly it introduces serious errors into the B12 coenzyme structure factors at 1Å resolution (ref Hugh Savage). EJD shares the scepticism. The default value (0.0) gives reasonable results for most problems.

BINS <nbins>

<nbins> (default 50) is the number of bins for the analysis of R-factor v. resolution. Ranges of 4sin**2/Lambda**2 are of width 1.0/<nbins>. For high resolution data this needs to be reset.

BRESET <bmin>

B-factors less than <bmin> (default bmin=5.0) are set equal to <bmin>. For high resolution data this needs to be reset. If you have data with low B factor, be aware that an implicit BRESET with a value of2.0 is performed by sfall. If you want to avoid it, specify BRESET 0.0 explicitly.

CELL <a> <b> <c> [ <alpha> <beta> <gamma> ]

Input cell parameters explicitly. This is checked against the cell read from the MTZ file and that read from the PDB file. The program will continue with a warning if there is a small inconsistency and stop if there is a serious one. <alpha> <beta> <gamma> default to 90 degrees. If HKLIN is assigned the cell constants will be picked up from there, otherwise you may need to supply them yourself, for example with MODE ATMMAP and an input PDB with missing or incorrect CRYST1 record.

CHAIN <chnlab> <iresn> <iresc>

Only needed for MODE ATMMAP RESMOD if your coordinate file has more than one chain. You will need one CHAIN card for each chain.
<chnlab>
single character label (e.g. A)
<iresn>
first residue number
<iresc>
last residue number

EDEN <escale> <cmp> <cmn>

Done for Cecil Tate. The input map is scaled by:
xscal=escale/nx*ny*nz
and truncated.
if density(x,y,z)<0
       density(x,y,z)=xscal*cmn*tanh(density(x,y,z)/cmn))
if density(x,y,z)>0
       density(x,y,z)=xscal*cmp*tanh(density(x,y,z)/cmp))

END

This ends input.

FORMFACTOR [ NEUTRON | NGAUSS <ngauss> ] <atnam1> <atnam2> ...

See the NOTES ON FORMFACTORS .

SFALL has hardcoded X-ray and neutron formfactors available for atom types H C N O S. (Default CuKa X-ray formfactors.) Other atom types are tabulated in $CLIBD/atomsf.lib (X-ray) and $CLIBD/atomsf_neutron.lib (neutron). If your atom identifier is not recognised (i.e. it is not C, N, O, or S) the program will read atomsf.lib and take the FIRST table which matches your atom ID. This may not be satisfactory if your atom type is Fe+3, say, and you may wish to specify the formfactor yourself.

Subsidiary keywords:

NEUTRON
Use neutron formfactors. Remember to assign ATOMSF $CLIBD/atomsf_neutron.lib to read the correct library of formfactors.

or

NGAUSS <ngauss>
<ngauss> (default 5) is the number of Gaussian terms requested for the default atoms H C N O S. The only possible values are 2 or 5.
<atnam1> <atnam2> ...
Atom names for any other atom types used in XYZIN. The atom names MUST exactly match the entries in ATOMSF library.

Example:

   FORM NEUTRON   P Co+3 D
   FORM NGAUSS 5  Fe+3

FREERFLAG <freeflag>

Reflections where the value of the column FREE, taken from the MTZ file, equals <freeflag> (default 0) are omitted from the refinement or FC calculation and their R-factor is monitored separately. If you do not wish a free-R set to be used don't assign FREE.

GRID <nx> <ny> <nz>

<nx>, <ny>, <nz> are integers giving the number of divisions along each whole unit cell edge. The atomic density is sampled on this grid, so if it is too coarse you will introduce errors into the structure factor calculation. These default to values near to celledge*2, which defines a grid spacing of 0.5Å. Different spacegroups have special requirements for factorisation. No prime factors higher than 19 are permitted (The `atom map' generated on this grid has the asymmetric unit defined for this spacegroup. See table below .).

The following general restrictions must be observed:

                  <nx> >= 2 * HMAX + 1
                  <ny> >= 2 * KMAX + 1
                  <nz> >= 2 * LMAX + 1
In addition there are further space group dependent conditions as follows (where `n' is an integer, i.e. <nz> must be even, for instance):
                          <nx>     <ny>     <nz>
             P1            -        2n       2n
             P21           2n       4n       2n
             P21212        4n       4n       4n
             P21212a       ''       ''       ''
             P212121       ''       ''       ''
             P4122/P4322   4n       4n       8n
             P41212/P43212 ''       ''       ''
             P31/P32       6n       6n       6n
             P3/R3         ''       ''       ''
             P3121/p3221   ''       ''       ''
             P61/P65       6n       6n      12n

H2OBG <h2obg>

Bulk water scattering to fill in all empty regions. <h2obg> (default 0.0) is the value which will fill the SOLVMAP. ICOEFL uses this.

NAME PROJECT <pname> CRYSTAL <xname> DATASET <dname>

When a new MTZ file is being created, i.e. for MODE SFCALC with no HKLIN assigned, this keyword is used to set up an appropriate project/crystal/dataset for the output MTZ file. This is strongly recommended, otherwise the LABOUT columns will be assigned to a default dataset with meaningless (for you) names.

NOSCALE

Scale between FP and FC is calculated but not applied to output FP and SIGFP.

OMIT <th>

Reflections are not included in the refinement if mod(Fo/Fc)><th> or mod(Fc/Fo)><th>. This provides a means of excluding data with bad agreement from the refinement and it should be used with care (Default: 1000).

RATIO <ratio>

Large xyz and B shifts are truncated to <ratio>*rms shift. This is only used for UNRESTRAINED refinement.

At the beginning of a refinement, the value of ratio should be kept fairly low (1.5 - 2.0) as shifts tend to be large when the errors in the parameters are large. When most atoms are well refined, then the ratio may be increased to 3 or 4, to allow for the refinement of these atoms which are poorly placed.

RESOLUTION <dmin> [ <dmax> ]

<dmin>, <dmax> (default 1, 1000) are resolution limits in Angstroms (or in 4*sin**2/l**2 if both are <1.0. They can be given in either order. If only one value is given, it is assumed to be the high resolution cutoff. The SFCALC option outputs FC for all reflections to the outer limit (You usually calculate SFs to make maps and it is best to calculate all SFs and then limit resolution etc. in FFT ).

Default values are overwritten from HKLIN to include all reflections.

RMSSHIFT [XYZ] <rmsxyz> [B] <rmsb>

The shifts are scaled to limit the rms XYZ shift and B shift to be equal to the set value, or to the calculated one, whichever is smaller. Only useful during unrestrained refinement. Equivalent to setting STEPSIZE. If RMSSHIFT is given STEPSIZE will be ignored. Default values: 0.0 0.0.

RSCB <dsmin> <dsmax>

in Angstroms (either order) or 4s**2/lambda**2. Default values: 4.5Å and 1.0Å.

If appropriate the program refines the value of the scale and overall B value to fit <Fo**2> to <Fc**2> using reflections within this resolution range. The default resolution range is chosen to use only those reflections where the <Fo**2> distribution fit Wilson statistics reasonably well. A standard protein distribution shows a sharp 5Å dip, which can distort the scale factors. <Fc**2> is often unrealistically high for low resolution data, until the solvent is adequately modelled. The lack of `solvent contrast' causes this.

Users are often worried because they get lower R-Factors if this scale is fitted for all data. This does not necessarily mean such a scale is more correct!

The default HKLOUT will have:

H K L Scale*FP Scale*SIGFP ... FC*exp(-overall_B*ss) PHIC

If NOSCALE is specified HKLOUT will have:

H K L FP SIGFP ... FC*exp(-overall_B*ss) PHIC

After the end of each structure factor calculation a new scale factor is calculated from the following expression, where SCNEW is the new scale factor and SCALE is the previous scale factor.

                   Sum {(SCALE * Fcalc) * (Fobs)}
  SCNEW = SCALE *  --------------------------------------
                   Sum {(SCALE * Fcalc) * (SCALE * Fcalc)}

SCALE [FPART] <scale> <bov>

These parameters are applied to the Fpart as <scale>*exp(-<bov>*ssq/ll) (Defaults: <scale>=1, <bov>=0).

Some structure factor outputs (notably SHELX) include an Fcalc scaled as 10*absolute. SFALL calculates FC on the absolute scale. If you wanted to combine these values you would need
SCALE FPART 0.1

SIGMA <sigma>

Data with Fobs < sigma * sd(Fobs) will not be included in refinement calculations. This cutoff has no effect on structure factor calculations. Do not be alarmed when your R-Factor appears to be higher for them than when you are refining; you will have more reflections in the structure factor sum. Default value: -1.0, i.e. no cutoff.

STEPSIZE <szo> <szb> <szfrc>

This is only used during UNRESTRAINED refinement. XYZ shifts are multiplied by <szo>, B shifts are multiplied by <szb>. <szfrac> is the fractional change in step size above which a second calculation is done. NOT used if RMSSHIFT is given. Default values: 1.0 1.0 0.3.

The program tries to estimate these using an algorithm of R. Agarwal's. See NOTES ON STEPSIZE .

If you are doing unrestrained refinement, structure factor calculation will be repeated if abs(initial step - optimum step)/(initial step).

SYMMETRY <sg>

(Defaults to the symmetry of the SFSGROUP). <sg> is the spacegroup number or name. If HKLIN is assigned the symmetry will be picked up from the MTZ file header and should NOT be specified with this keyword. If HKLIN is not assigned you may need to give this. See examples below.

TITLE <title>

Title (up to 80 characters) used on the printer output. The title information after the keyword is used as a heading for map sections and also as title information for the output reflection data file if present.

TRUNCATE <rhmin> <rhmax>

The input map is truncated - all values <<rhmin> are set to <rhmin>, and all values ><rhmax> set to <rhmax>. This should be used for the solvent flattening procedure.

VDWR <dlim>

<dlim> helps specify the maximum radius for which an atom is considered to contribute to the electron density while building the `atom map'. The initial value is modified by a factor Sqrt(B+25)/2PI to "spread" the density for atoms with high B factors. Default: 2.5.

The speed and accuracy of the calculation depend to some extent on the values chosen. For 1.5Å data a value of 2.5 is sufficient (a maximum value of 3.0 is allowed). For neutron data, or for atoms with low B values, it could safely be reset to 1Å.

VERBOSE

Lots and lots of output - useful if debugging. Please note that this may produce a message telling you that atom names have been set to ZZZ or residue names to DUM if you have non-standard names. This is not important as the atomic number is determined from the original atom type. See Notes on Formfactors and Notes on Atom Names below.

WEIGHT <w>

The weight (default 0.0) applied to each structure factor is given by
(2sintheta/lambda)**<w>
(See [1]). At the beginning of the refinement use <w>=-1.5 or -1.0 to put most weight on the low angle data. As the refinement proceeds increase <w> to 0.0 (But actually we never use it...).

MEMORY ALLOCATION

The program allocates a large work array, whose size may be changed from the default by setting the logical name MEMSIZE to an appropriate integer value. The current value is printed in the output (look for `Memory allocation'). There are other dimensions which can currently only be changed by recompilation and the error messages are currently not very clear about changing the MEMSIZE.

INPUT AND OUTPUT FILES

The following input and output files are used by the program (in addition to some scratch files which are deleted at the end of program execution):

Input Files:

XYZIN
Input Coordinates File (Brookhaven format)
HKLIN
Input Reflection Data File (MTZ format)
MAPIN
Input Map File (standard map file format)

Output Files:

XYZOUT
Output Coordinate File (Brookhaven format)
HKLOUT
Output reflection data file in standard MTZ format
MAPOUT
Output map file (Standard map file format)
GRADMAT
Output matrix file for Hendrickson Konnert program

The files used in a single run of the program depend on the options requested via the data control keywords.

NOTES ON FORMFACTORS

The form factors are read from the file assigned to ATOMSF. This defaults to $CLIBD/atomsf.lib. Coefficients for analytical approximation to the scattering factors for atom identifier at a given resolution sintheta/lambda (=S) are given as:

    a1*exp(-b1*s*s) + a2*exp(-b2*s*s) - 2 Gaussian approximation
                                                 (Agarwal, 1978)
or as:

    a1*exp(-b1*s*s) + a2*exp(-b2*s*s) + a3*exp(-b3*s*s) 
                                   + a4*exp(-b4*s*s) +  c
                                      - 5 Gaussian approximation
See [2] pp. 99-101 (Table 2.2B) or [5] p. 500ff, table 6.1.1.4 for values a1-a4, b1-b4, c for X-rays. For neutron scattering see [2 ] pp. 99-101 (Table 2.2B).

The atom name can be used to assign a formfactor to it.

WARNING 1)
The program distinguishes CA - the metal - from CA - main chain atom in a protein by its position on the coordinate line... And older versions of FRODO get this wrong! The rules on defining atom names are the same as those used by PDB. The chemical symbol is specified in columns 13 and 14 and the symbol is RIGHT justified.
                 0        1
e.g.    column   1   5    0  34
                 ATOM  ...   ZN            is Zinc
                 ATOM  ...    U            is Uranium

For the common atom types the program includes tables for form factors for the required value of NGAUSS . They are C N O H and S. All others must be read from the library files atomsf.lib or atomsf_neutron.lib which are automatically assigned. These contain tables of formfactors for every known atom (Kim Henrick monument).

atomsf.lib contains the five Gaussian form for X-ray data and the two Gaussian form for atoms H C N O and S. Unless you are working at very high resolution (>1.5Å) the two Gaussian approximations are adequate and speed up the calculations somewhat.

atomsf_neutron.lib has the constant value needed for neutron scattering.

NOTES ON XYZIN, ATOM NAMES ETC.

Coordinates are assigned to XYZIN in Brookhaven format.

The CRYST1 card (which gives the cell dimensions) and the SCALEi cards (which give the required transform for turning the coordinates into fractional ones) should be included in XYZIN.

Cell parameters from the HKLIN file or CELL keyword (if either are present) will be used in the absence of CRYST1 and SCALEi cards, but in all cases the results of omitting the cards are somewhat unpredictable - in some cases the program may fail, in others it may proceed but using random or inappropriate values for the cell parameters.

If an atom has occupancy set =0.0 it will be copied to the output file if appropriate, but not used in the calculation. This means it is easy to exclude suspect atoms from any calculation without deleting them from the file.

The program does not make use of anisotropic U factors in the calculation of electron density and structure factors, and so ANISOU records are ignored.

Example of part of XYZIN:


REMARK 2ZN refined coordinates
CRYST1   82.500   82.500   34.000  90.00  90.00 120.0 1       R3
SCALE1       0.01212   0.00700   0.00000        0.00000
SCALE2       0.00000   0.01400   0.00000        0.00000
SCALE3       0.00000   0.00000   0.02941        0.00000
ATOM      1  N   GLY A 201  1   -8.863  16.944  14.289  1.00 21.88
 .................................
ATOM    826  CA  ALA D 130  4   -5.022  21.709 -11.876  1.00 15.58
ATOM    830 CA   WAT A 249  1   -0.002  -0.004   7.891  0.33 10.40
ATOM    831 ZN2  WAT A 249  1    0.000   0.000  -8.039  0.33 11.00
ATOM    832  OW1 WAT A 249  1    0.000   0.000  -8.039  0.00 11.00
 .................................

Atom 1 is Nitrogen, 826 is Carbon, 830 is Calcium, 831 is Zinc and
832 is Oxygen.

NOTES ON HKLIN

List of H K L .. FP SIGFP ( FPART PHIP F0 .. F19 )

An MTZ file with column labels corresponding to H K L .. FP SIGFP [FREE FPART PHIP F0 .. F19]. The header will contain the cell dimensions and spacegroup of the structure (if you want to change these, use MTZUTILS ).

For SFCALC this file MUST BE SORTED on h k l (i.e. so h is the slowest varying index, then k and l is the fastest), and be within the asymmetric unit expected by the program; see below for expected limits (CAD will reorder your file if necessary). It is sensible to do this if your data has not been processed by CCP4 programs. XDS, MADNESS etc. sometimes have different default asymmetric units to the CCP4 suite.

  # an example of sorting native data and checking for correct 
  # asymmetric unit
  cad  HKLIN1 $CEXAM/toxd/toxd HKLOUT toxd_sorted <<EOF
  TITLE  Toxd data sorted - default symmetry P212121
  LABIN FILE 1 E1=FTOXD3 E2=SIGFTOXD3
  CTYPE FILE 1 E1=F  E2=Q
  EOF

For SFREF this file MUST BE SORTED on h k l (i.e. so h is the slowest varying index, then k and l is the fastest), and be extended if necessary to cover the whole asymmetric unit expected by the program for the SF spacegroup; see below for expected limits. If you are working at a lower symmetry than optimal the data must have been extended and sorted (CAD will do this too). There is a reason - sorting data is quite slow, and you will probably use this file many times during the course of a refinement.

  # an example of extending and sorting native data to P1 +-h,+-k,+l
  cad HKLIN1 $CEXAM/toxd/toxd HKLOUT toxd_p1  << EOF
  OUTLIM SPACEGROUP P1
  TITLE  Toxd data extended to P1 cell
  LABIN FILE 1 E1=FTOXD3 E2=SIGFTOXD3
  CTYPE FILE 1 E1=F  E2=Q
  EOF

The agreement factor analysis is carried out against the values of FP. Sigma values are assumed to be in column SIGFP.

IF FPART and PHIP have been defined, the FC vector equals the FPART vector plus the FC contribution calculated from the input. FPART can be scaled by entering a suitable value on the SCALE input.

Other columns in the input file can be copied to the output file by assigning them to F0= F1= ... Or all columns can be copied over if the ALLIN subkeyword is given in LABOUT.
Note however that if the input file contains columns with labels that match the program labels then SFALL will stop with a fatal error.

Limits for Indices for the various space groups (these are the same as those used in the data reduction programs and FFT):


     P1          h k l : l >= 0
                 h k 0 : h >= 0
                 0 k 0 : k >= 0
     P21         h k l : k >= 0, l >= 0
                 h k 0 : h >= 0
     P21212      h k l : h >= 0, k >= 0, l >= 0
     P212121     h k l : h >= 0, k >= 0, l >= 0
     P41212      h k l : h >= 0, k >= 0, l >= 0, h >= k
     (+ other tetragonal)
     P31/P32     h k l : h >= 0, k > 0
                 0 0 l : l > 0
     P3/R3       h k l : h >= 0, k > 0
                 0 0 l : l > 0
     P3121/P3221 h k l : h >= 0, k >= 0, k <= h   (all l)
                 h h l : l >= 0
     P61/P65     h k l : h >= 0, k >= 0, k <= h   (all l)
                 h h l : l >= 0

NOTES ON MAPIN

A standard MAP file. The header will contain the cell dimensions and spacegroup of the structure.

If you wish to generate structure factors from a input map it is essential that the map is in an appropriate format for the SFALL spacegroup you request.

Common errors are:

1) using a different grid for the FFT calculation and the SFALL calculation. FATAL!

2) Changing the axis order in the FFT calculation - both have the same defaults for the same spacegroup! FATAL!

3) Choosing a different asymmetric unit for the FFT and the SFALL calculation. You must make sure your map has the one which covers the SFALL requirements. There is no flexibility here.

Limits for axes for the various space groups (these are the same as those used as defaults in FFT ):

In space group P1, P21 and P21212a, 'b' is taken as the unique axis.

In space group P21212a, 1/4 is subtracted from the X and Y values of the equivalent positions given in International Tables.

 
                   X1     X2     X3     Range of X3  Axis order
 
       P1          Z      X      Y      0 to Y         Z X Y
       P21         Z      X      Y      0 to Y/2-1     Z X Y
       P21212a     Z      X      Y      0 to Y/4       Z X Y
       P212121     X      Y      Z      0 to Z/4       Y X Z
       P4122       X      Y      Z      0 to Z/8       Y X Z
       P41212      X      Y      Z      0 to Z/8       Y X Z
       P4322       X      Y      Z      0 to Z/8       Y X Z
       P43212      X      Y      Z      0 to Z/8       Y X Z
       P31         X      Y      Z      0 to Z/3-1     Y X Z
       P32         X      Y      Z      0 to Z/3-1     Y X Z
       P3          X      Y      Z      0 to Z-1     Y X Z
       R3          X      Y      Z      0 to Z/3-1     Y X Z
       P3121       X      Y      Z      0 to Z/6       Y X Z
       P3221       X      Y      Z      0 to Z/6       Y X Z
       P61         X      Y      Z      0 to Z/6-1     Y X Z
       P65         X      Y      Z      0 to Z/6-1     Y X Z

OUTPUT from Structure Factor Calculation

Note: The logfile has been flagged with XLOGGRAPH symbols to allow selected output to be displayed on an Xterminal.
(a)
The limits of the box used in modelling the electron density.
(b)
A list, with coordinates, of the first 10 atoms included in generating the model electron density map.
(c)
The number of atoms input, the number of atoms in the sorted list of atoms, and the number of atoms used in generating the electron density.
(d)
The minimum, average and maximum values of the temperature factors for the atoms.
(e)
Minimum and maximum electron density values are printed for each section of the generated density. The number of slabs used depends on the space allocated for the work arrays in the program.
(f)
The number of reflections used and the number of reflections on file.
(g)
Old and new scale factors (with an overall temperature factor BOV if temperature factor dependent re-scaling was applied).
(h)
A structure factor summary table in ranges of 4sin**2theta/lambda**2 giving, for each range, the number of reflections, the average values of Fobs and Fcalc, the reliability index and an average value of the weighted errors.
(i)
The overall reliability index after re-scaling.

NOTES ON XYZOUT

Coordinates in Brookhaven format with the XYZ or B shifts applied - output of unrestrained refinement.

The proportion of the shift to be applied can be modified (see the description of keyword STEPSIZE ).

If an input atom has occupancy of 0.0 it will be copied to the output file.

NOTES ON HKLOUT

The output file contains FC and PHIC only for those reflections which exist in the HKLIN file, even if the structure factor calculation has covered more of reciprocal space. The list will include ALL reflections in HKLIN within the outer resolution limit. There is no inner resolution cutoff, or sigma cutoff. Hence there may well be more reflections included than during refinement and your R-factor may well be higher than the one output during refinement. All columns specified in LABIN will be written to HKLOUT. If ALLIN is specified then all input columns will be in HKLOUT.

If no HKLIN has been assigned the program outputs a list of H K L FC PHIC for all possible reflections within the reciprocal asymmetric unit for the SFSGROUP that are within the requested resolution limits.

FC and PHIC can be modified for use in solvent flattening (see keyword AVMODE).

NOTES ON MAPOUT

Several types of maps can be output - see keyword MODE for how to produce each one.
a)
The `atom map'. All density assigned at each grid point due to a neighbouring atom.
b)
The `atom map' modified to include a residue flag for each main chain and side chain atom which has contributed most to this grid point. This allows the map correlation program to produce a set of correlations residue by residue between an ideal map (the `atom map') and any other map - part of real space R-factor stuff.
c)
The `atom map' modified to include an atom number flag. This can be used to look at hydrogen density in a neutron map.
d)
A `solvent map' with zero at all points within the protein and some constant (H2obg) at all other points. This is a step on the way to adding in bulk solvent to calculated structure factors.

OUTPUT from the Gradients Calculation - NOTES ON GRADMAT

A direct access file of gradient information is output ready for input to PROLSQ (no longer available in the CCP4 Suite). The following items are output in this section:
(a)
The value of the minimisation function (PII). Plus the number of reflections included in the calculation of PII.
(b)
The average and peak values of the gradients and shifts.
(c)
The slope of the search direction and the initial step size used in the quadratic approximation to find the optimum step size.
(d)
For unrestrained refinement a table of estimated error against B-factor is given. For each temperature range there is the number of atoms and the estimated error (standard deviation).

OUTPUT from the Calculation of the Optimum Shifts

This section gives PII (minimisation function) values, relative slopes, calculated optimum step sizes and, when determined, the actual step size used in applying the shifts.

NOTES ON STEPSIZE

R. Agarwal attempted to estimate the best step size to use during UNRESTRAINED refinement. A displacement of the form alphaDelta(u), which minimises the minimisation function of the least squares P(u + alphaDelta(u)), will be the optimum where alpha is the optimum step size. As P as a function of alpha approximates to a quadratic function, it is possible to calculate the optimum step size, alphaopt, from the values of P(alpha) at alpha = 0 and alpha = alpha1 and from the slope of P(alpha) at alpha = 0. The initial step size alpha1 used in the calculation is chosen as 1.0 or the value input in the control data. If the fractional change in step size is greater than SZFRC (by default 0.3) then a second calculation is done with alpha1 taken as the optimum step size just calculated.

COMMON PROBLEMS

  1. Problem: Warning after opening the input reflection file HKLIN about the sort order not being H K L.

    Solution: This may not crash the program (sometimes it does) but even so some of the output may be unreliable if the reflections in HKLIN have not been sorted with h as the slowest, k as the intermediate, and l as the fastest varying index. See NOTES ON HKLIN above for the requirements placed on HKLIN.

  2. Problem: Program stops with the message:
    MTZ file label duplicates program label name

    Solution: SFALL should warn you before stopping, which labels are duplicated, e.g.:

     Warning: this may fail!
     This column label FC         is also a program label
    
    The suggested workaround at present is to rename the offending file labels, for example using a utility program such as SFTOOLS , before rerunning the program.

  3. Problem: No conversion from orthogonal to fractional atom co-ordinates (ie listed fractional co-ordinates have no value, or the same values as the orthogonal co-ordinates).

    or

    Problem: Program stops with the message:
    **** No Cell Input to subroutine rbfro1?? ****
    after reading in XYZIN.

    Solution: Check that CRYST1 and SCALE cards are present in the input co-ordinate file XYZIN.(See NOTES ON XYZIN above).

  4. Problem: Program crashes or stops after reporting statistics on B's in MODE SFCALC. (May also occur in other MODEs for the same reason.)

    Solution: Check that the input reflection file has the correct asymmetric unit and that the reflections have been sorted so that h is the slowest changing index, k is the next, and l varies the fastest. (See NOTES ON HKLIN above). Nb: the program should now generate an explicit error message suggesting that this is the problem.

  5. Problem: Program fails after printing the title and the line "Chain Names - First Residue - Last Residue - serial Kcount" (possibly also with error messages concerning array sizes).

    Solution: You may have to increase the value of MEMSIZE (see MEMORY ALLOCATION above). N.B. : the program should now generate an explicit error message suggesting that this is the problem.

  6. Common problems associated with the input map can be found in the section NOTES ON MAPIN.

EXAMPLES

Calculate map from atom coordinates

sfall 
XYZIN  /f/scratch/swift/pa_model1_2.pdb 
XYZOUT  $SCRATCH/junk.pdb 
MAPOUT $SCRATCH/junk.map  
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE ATMMAP
RESO 30 2.1 
BINS  60
RSCB 5.0 2.1
SFSG 1
END 
END-sfall

Read in coordinates to calculate structure factors

sfall 
HKLIN $SCRATCH/pt2_pt4_shhg2_os2_nat.mtz 
HKLOUT $SCRATCH/pa_model1_2_phase.mtz   
XYZIN  /f/scratch/swift/pa_model1_2.pdb 
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE SFCALC XYZIN HKLIN
RESO 30 2.1 
BINS  60
RSCB 5.0 2.1
SFSG 1
LABI FP=F_V4 SIGFP=SIGF_V4 FREE=FreeRflag
LABO FC=FC PHIC=AC
END 
END-sfall

Read in map to calculate structure factors

sfall 
HKLIN $SCRATCH/pt2_pt4_shhg2_os2_nat.mtz 
HKLOUT $SCRATCH/pa_model1_2_phase.mtz 
MAPIN $SCRATCH/junk.map  
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE SFCALC MAPIN HKLIN
RESO 30 2.1 
BINS  60
RSCB 8.0 2.1
SFSG 1
LABI FP=F_V4 SIGFP=SIGF_V4
LABO FC=FC PHIC=AC
END 
END-sfall

Iteration_cycles to do the back-transformation and phase combination cycles in solvent flattening procedure

#   - back-transform map with sfall to obtain Fcs and phases, scale 
#     and calculate R-factor
#   - combine the phases with sigmaa
#   - run fft to calculate a new map with Fobs and combined phases as
#     input to flatmap
#
sfall     HKLIN ../mtz/fvb18_sigmaa.mtz     
                HKLOUT fvb_flat_14.mtz              
                MAPIN  fvb_flattened_14.map       
                << EOF-sfall
titl back transformation for cyc 14
mode SFCALC MAPIN HKLIN
grid 104 128 160
reso 200.0  3.0 
BINS 40
rscb 10.0 3.0
sfsg  19
AVMODE RADIUS 8
badd 0
FORM NGAUSS 5  C H N O S
LABI FP=fvb_Fnp SIGFP=fvb_SIGFnp - 
F0=PHIB_I3 F1=FOM_I3  -
F2=A F3=B F4=C F5=D 
LABO   FC=FCWANG  PHIC=PHICWANG WANGWT=WC8
END
EOF-sfall

Unrestrained refinement

sfall      
HKLIN ../data/taka_fx_trunc.mtz    
XYZIN ../data/takaxp_model8_b.pdb 
XYZOUT $SCRATCH/junk.pdb 
<< END-sfall 
TITL TAKAXP_MODEL8_9 SFS 
GRID 104 136 264
MODE SFREF XYZ UNRESTRAINED
FREE 3.0
RESO 20 2.1 60
RSCB 8.0 2.1
SFSG 19
FORM NGAU 2 Ca Cr Fe+2 P 
LABI FP=FP SIGFP=SIGFP FREE=FreeRflag
END-sfall

Prolsq refinement cycles

#!/bin/csh -f
#
#          REFINE consists of sfall, protin and prolsq
#
set LastCycle=146
set Number_of_Cycles=8
set CycleCounter=0
#
Begin:
#
@ CurrentCycle=$LastCycle + 1
#
#    ========== SFALL ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************' 
echo '    '
echo ' Running SFALL for Cycle Number :' $CurrentCycle' on '$DATE
echo '    ' 
echo ' *************************** '$CurrentCycle' **************'
#
sfall    HKLIN   s91a.mtz 
        XYZIN   s91a_cyc${LastCycle}.brk 
        GRADMAT s91a_GRADMAT${CurrentCycle}.tmp 
        << END-sfall
TITL Barnase s91a Mutant
MODE SFREF XYZB RESTRAINED  ! for prolsq
GRID 90 90 120       !div CELL by these should give .=. 0.7 A
BINS 60
RESO 10 2.2        !last no. is nstep
RATI 3             !truncate xyz and B shrifts to RATI*RMS
BADD 0             !smear the gaussians to gen atomic density [10.0]
RSCB 4 1.5         !limits for refining scale and B, same as RESO ?
SFSG 145      ! fft space group
LABIN FP=S91A_F SIGFP=S91A_SIGF FREE=FreeRflag
END
END-sfall
#
/bin/rm ${SCRATCH}s91a_junk.brk ${SCRATCH}sfall*
#
#    ========== PROTIN ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************'
echo '    '
echo ' Running PROTIN for Cycle Number :' $CurrentCycle' on '$DATE 
echo ' Running PROTIN for Cycle Number :' $CurrentCycle' on '$DATE
echo '    ' 
echo ' *************************** '$CurrentCycle' **************'
#
protin XYZIN      s91a_cyc${LastCycle}.brk 
       PROTOUT    s91a_protout.out 
       PROTCOUNTS s91a_protcounts.out 
       DICTPROTN  ${LIBD}protin.dict
       << END-protin
TITLE Barnase Ser91 -> Ala
SYMMETRY 145
!
CHNNAME ID A CHNTYP  1 ROFFSET 0
CHNNAME ID B CHNTYP  2 ROFFSET 0
CHNNAME ID C CHNTYP  3 ROFFSET 0
chnname id E chntyp  4 roffset 0
!
CHNTYP  1    NTER 3   VAL 3     CTER 110 ARG 2 
CHNTYP  2    NTER 3   VAL 3     CTER 110 ARG 2 
CHNTYP  3    NTER 2   GLN 3     CTER 110 ARG 2 
chntyp  4    wat
!
LIST  FEW          ![few]/some (for >20A)/all
PEPP  5            !no. of atoms restrained to be in a plane [5]
VDWR 1 CPR  8 3.4
END
END-protin
#
#    ========== PROLSQ ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************' 
echo '    ' 
echo ' Running PROLSQ for Cycle Number :' $CurrentCycle' on '$DATE 
echo '    ' 
echo ' *************************** '$CurrentCycle' **************' 
#
prolsq XYZIN      s91a_cyc${LastCycle}.brk 
       PROTOUT    s91a_protout.out 
       PROTCOUNTS s91a_protcounts.out 
       GRADMAT    s91a_GRADMAT${CurrentCycle}.tmp 
       XYZOUT     s91a_cyc${CurrentCycle}.brk 
       << END-prol
TITLE    Barnase s91a Mutant
CGCYCLES 20  100 1 !max no. cyc for conjugate gradient soln [50]
ORIGIN 0 0 0           !origin constraint flags
RESOLUTION 10 2.2      !select Fobs
OUTPUT XYZ             !o/p XYZ(.brk)/SF(.mtz)/SHIFTS(shifts file)
RTEST -1 1 1 1 1 1 1 1       !-1 -> +1 when R factor .=25%
NOUPDATE                    !not update atomic coord in PROTIN o/p file
MATRIX .30 .85 .85          !1st param adjusts SF/geom contribution
                            !start=.08;.12;.2;.25;.3;.35; final=.08
                            !2nd&3rd:shift factors for position and B
BATOM                       !(+) when refining indiv B
!
!               ***** Restraints *****
DISTANCE     1 0.02 0.04 0.05 0.01 0.1 
PLANE        1 0.02
CHIRAL       1 0.12 
TEMPERATURE  1.0 1.0 1.5 1.5 2.0 0.1   !start
!TEMPERATURE  1.0 1.5 2.0 2.0 2.5 3.0  !when R factor <25%
HOLD         0.1 1.0 0.1
!TORSION      1 20 20 30 50            !when R factor >25%
TORSION      1 10 10 15 45            !when R factor <25%
VANDERWAAL   1 0.2 -0.3 0.0 -0.2 0.0
!
!                ***** Monitor *****
MONITOR  DISTAN 3. VANDERWAAL .5  TORSION 3 
MONITOR  PLANE 3 CHIRAL 3 !for R factor<25%
END     
END-prol
#
Cleanup:
#
/bin/rm s91a_protout.out s91a_protcounts.out s91a_GRADMAT${CurrentCycle}.tmp 
#
@ CycleCounter=$CycleCounter + 1
@ LastCycle=$LastCycle + 1
#
#    Test to see whether Number_of_cycles done
#
if ($CycleCounter<$Number_of_Cycles) goto Begin
#

Simple Runnable Examples

  • examples/unix/runnable/sfall.exam

    AUTHORS

    Eleanor Dodson and Ted Baker, based on a program by Ramesh Agarwal and Neil Isaacs. Documented for the Daresbury Laboratory protein crystallography system by John Campbell.

    PROGRAM FUNCTION

    The program is based on a least-squares refinement technique using fast Fourier transform (FFT) methods as devised by Ramesh Agarwal. His paper [1 ] on the technique should be studied to get a detailed understanding of the method and only a broad description of the method is given in this section.

    The program may be used for

    1. calculating and outputting the matrix and gradient terms required by the restrained least squares refinement program PROLSQ (no longer available in the CCP4 Suite). The calculated shifts can be applied directly to the atomic coordinates/B-factor, unrestrained refinement.
    2. The generation and calculation of an electron density map from an input set of atomic coordinates and its transformation, using FFT techniques, to obtain a set of calculated structure factors. The program SFALL is used to calculate structure factors from an electron density map using Fast Fourier Transform techniques. Whenever possible, the crystallographic symmetry is exploited to increase the efficiency of the calculation. Care must be taken by the user to ensure that the map has been calculated on a sufficiently fine grid if significant errors in the calculated structure factors are to be avoided. As a rough rule a grid size of approximately three times the maximum index of the data in each direction is appropriate. The use of the Fast Fourier Transform in structure factor calculations for large molecules has been described and discussed by L.F. Ten Eyck [3].
    3. The calculation of structure factors from an electron density map. The same cautionary measures must be applied as those explained in (b).

    Modelling of the Electron Density and Calculation of Structure Factors

    The contribution of an atom to a structure factor is the product of an atomic form factor function and a Gaussian temperature factor function. If the form factor function is approximated by a sum of Gaussian functions, then this contribution may be represented by the sum of Gaussian terms and so may the modelled electron density. The amount of computation required in setting up the electron density map is proportional to the number of Gaussian terms used in approximating the form factor. The use of a two term Gaussian is probably adequate in most cases where the resolution of the data does not extend beyond 1.5Å. The program does, however, allow for the input of other numbers of terms. Non default atoms defined in FORMFACTOR are usually represented as five Gaussians since heavy atom scattering factors are more complex.

    Tables of coefficients for a five term Gaussian approximation to the atomic form factors are given in the International Tables for X-ray Crystallography Vol.IV [2]. Some values for two and one term Gaussian approximations are given in the Agarwal paper .

    The radius of the atoms (beyond which the electron density is considered to be negligible) is also required in the calculation. The value for this radius must be chosen to balance the requirements of accuracy (large radius) and speed (small radius). The tables below indicate the radii required for carbon atoms, for different temperature factors (Bm), to give a given fractional loss of electron density (DeltaZm/Zm).

    
                        DeltaZm/Zm     0.02     0.01     0.005
    Two term Gaussian         Bm
                               5       1.89     2.06     2.22
                              10       2.02     2.21     2.38
                              20       2.26     2.47     2.68
     
    More values are given in the Agarwal paper.
    

    The electron density map is built up using the contributions from each atom within, or near (within the atom radius) to, the part of the map being calculated. A three dimensional FFT of the electron density map is used to calculate the structure factors. In this calculation the space group symmetries are used whenever possible.

    If the sampling grid is too coarse, significant errors may occur in the calculation of the structure factors. These errors may be reduced [3] either by using a finer grid for the calculation or by adding an artificial extra temperature factor to all the atoms when modelling the density (not recommended). This factor has also to be taken into account when the calculated structure factors are compared with the observed structure factors.

    Calculation of the Gradients

    The gradients are calculated by convoluting a modified difference density function with the electron densities of the individual atoms being refined. The density function is calculated by calculating the Fourier transform (again using the FFT) of a difference function based on the weighted values of the errors in the structure factors. A three dimensional difference FFT is calculated and convoluted with the derivatives of the atomic density for x y and z, or B. The convolution is carried out by summing, over all the grid points enclosed by the atom (radius defined as described above), the product of the electron density of the atom and the value of the modified difference density function. Thus, the calculation of the gradients is similar to the calculation of the structure factors with the steps being reversed.

    The Normal Matrix

    Only the diagonal terms (interacting between parameters of the same atom) are calculated for the normal matrix. The method described by Agarwal is used but is generalised to allow for a variable number of terms in the Gaussian approximations to the atomic form factors. Also the function A'xx(bi) is tabulated for each value of bi from 1 to 150 and the nearest values, rather than interpolated values, are selected from the table when forming the diagonal terms of the matrix.

    A set of terms, H, is only calculated for one parameter as the other diagonal terms may be calculated to a good degree of accuracy from this initial set. An inverse matrix is set up for calculating the orthogonalised contributions of the gradient.

    Calculation of the Shifts

    The shifts, calculated as described above, may not in fact be the optimum ones to use due to the approximate nature and the non-linearity of the method used. In practice, it is unwise to apply large shifts as these may give rise to instability in the refinement or to `oscillating atoms' and thus provision is made to truncate the larger shifts. The program allows for the input of a ratio, RATIO, which may be used to truncate the maximum value of a shift to
          RATIO * r.m.s. value of the shifts
    
    At the beginning of a refinement, the value of RATIO should be kept fairly low (1.5-2.0) as shifts tend to be large when the errors in the parameters are large. When most atoms are well refined, then the ratio may be increased to 3 or 4, to allow for the refinement of these atoms which are poorly placed.

    Even these truncated shifts may not be the optimum shifts to be applied. A displacement of the form alphaDelta(u), which minimises the minimisation function of the least squares P(u + alphaDelta(u)), will be the optimum where alpha is the optimum step size. As P as a function of alpha approximates to a quadratic function, it is possible to calculate the optimum step size, alphaopt, from the values of P(alpha) at alpha = 0 and alpha = alpha1 and from the slope of P(alpha) at alpha = 0. The initial step size alpha1 used in the calculation is chosen as 1.0 or the value input in the control data. If the fractional change in step size is greater than SZFRC (by default 0.3) then a second calculation is done with alpha1 taken as the optimum step size just calculated.

    REFERENCES

    1. Agarwal, R.C., Acta Cryst., (1978), A34, 791-809.
    2. International Tables for X-ray Crystallography, Vol.IV, (1974), Kynoch Press.
    3. Ten Eyck, L.F., Acta Cryst., (1977), A33, 486.
    4. Bruenger, A.T., Nature 355, 472-4 (1992)
    5. International Tables for Crystallography, vol. C, (1995), Kluwer.
    6. "Refinement of protein structures", Proceedings of the Daresbury Study Weekend 15-16 November, 1980 (Compiled by P.S. Machin, J.W. Campbell and M. Elder).

    SEE ALSO

    cad, freerflag, fft, icoefl, mtzutils, overlapmap, pdbset, prolsq (no longer available in the CCP4 Suite), protin, sigmaa, watersearch (1), xloggraph , X-PLOR manual