REFMAC (CCP4: Supported Program)

User's manual for the program refmac_5.*

Keyworded input - Essential Xray keywords

Anything input on a line after "!" or "#" is ignored and lines can be continued by using a minus (-) sign. The program only checks the first 4 characters of each keyword. The order of the cards is not important except that an END card must be last. Some keywords have various subsidiary keywords. The available keywords in this section are:

LABI
Input MTZ labels
NCYC
Number of refinement cycles
REFI
Refinement parameters
SCAL
Scale parameters
SIGM
Parameters of the likelihood (sigmaA)
SOLV
Parameters of the solvent
WEIG
Weighting X-ray vs geometry
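
The input conventions described above (comments after "!" or "#", continuation with a trailing minus sign, keywords matched on their first 4 characters, END as the last card) can be illustrated with a small reader. This is only a sketch of those rules and not the parser used by the program; the function name parse_keyword_input is hypothetical, and the case-insensitive matching is an assumption.

```python
def parse_keyword_input(text):
    """Split keyworded input into (keyword, arguments) cards, stopping at END."""
    cards = []
    pending = ""  # text carried over from lines ending in a minus sign
    for raw in text.splitlines():
        # Everything after "!" or "#" is ignored.
        for marker in "!#":
            cut = raw.find(marker)
            if cut != -1:
                raw = raw[:cut]
        line = raw.strip()
        if not line:
            continue
        # A trailing minus sign continues the card on the next line.
        if line.endswith("-"):
            pending += line[:-1] + " "
            continue
        tokens = (pending + line).split()
        pending = ""
        # Only the first 4 characters of a keyword are checked
        # (upper-casing is an assumption of this sketch).
        keyword = tokens[0][:4].upper()
        if keyword == "END":
            break
        cards.append((keyword, tokens[1:]))
    return cards


example = """
# native data only
LABIn FP=F_native SIGFP=SIGF_native -
      FREE=FreeR_flag
REFInement TYPE RESTrained   ! restrained refinement
NCYC 10
END
"""
for card in parse_keyword_input(example):
    print(card)
```

Because cards are identified by keyword rather than by position, their order before END does not matter in this sketch either.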

LABIN <program label>=<file label>...

This keyword tells the program which columns in the MTZ file should be used as native structure factors, sigmas, FreeR flag, phase information etc.

For example:

--------------------------------------------------------------------------
      #
      #   Only native structure factors, their sigmas and FreeR_flag
      #   are given
      #
      LABIn FP=F_native SIGFP=SIGF_native FREE=FreeR_flag

      #      or
      #
      #   Apart from native structure factors, their sigmas and FreeR_flag,
      #   some phase information in the form of Hendrickson and Lattman
      #   coefficients is also known. This signals to the program that
      #   phased refinement should be used
      #
      LABI FP=F_native SIGFP=SIGF_native FREE=FreeR_flag -
           HLA=HLA_phases HLB=HLB_phases HLC=HLC_phases HLD=HLD_phases
--------------------------------------------------------------------------

LABIn is essential for all refinement except geometry idealisation. To some extent the course of the refinement is governed by the assignments given. The following program labels can be assigned:

FP SIGFP   FREE   FPARTi PHIPi   HLA HLB HLC HLD or PHIB FOM
FP SIGFP
Assignments for FP and SIGFP are always required.
FREE
The use of the FreeR flag is recommended. This is an important component of using maximum likelihood refinement. If FREE is assigned, reflections which are flagged with nfree_exclude (default 0) are excluded from the derivative calculations, and therefore the agreement between them and calculated structure factors is not part of the refinement procedure.
REMARK 0: It is strongly recommended to run the uniqueify script on the first dataset as soon as possible, e.g. after TRUNCATE. This script adds a column of FreeR flags, and it is important for the validity of the FreeR approach that this is done before any model refinement. If you are continuing model refinement with a new data set, it is important to preserve the FreeR assignment used before. See FreeR assignment.
If CCP4i (CCP4 graphical user interface) is used, then Uniqueify is in "Convert to MTZ & Standardise" task in "Reflection Data Utilities" module.
REMARK 1: Reflections flagged for FreeR calculation are omitted from the refinement of the atomic parameters, and also from the scale and B-factor calculation. For ML refinement the default is to use them to estimate SigmaA. [See SCALe and SIGMA keywords.]
FPARTi PHIPi
In order to add known FPART(s) to the structure factors, assign FPARTi and PHIPi. One possible use of a partial structure is to add in contributions from unmodelled parts of a structure (e.g. uninterpretable parts of MIR/MAD/SAD/DM maps).
REMARK 2: The FPARTi will be added to the FC without any further scaling unless the SCPART keyword is set.
HLA HLB HLC HLD or PHIB FOM
PHIB FOM: An input phase and its figure of merit.
Or: HLA HLB HLC HLD The Hendrickson-Lattman coefficients describing prior or "experimental" phase information. These can be obtained by the usual routes: MIR or MAD, plus density modification. Theoretically, Hendrickson and Lattman coefficients contain more information about phases than PHIB and FOM.
REMARK 3: In our experience using 'dm' phases gave better results than using MIR phases. However the reliability of the phases may need to be changed. See PHASe keyword.

REFInement [ TYPE | PHASe | RESIdual | BREFinement | METHod | RESOlution | TLSCycles ]

This keyword controls the type of refinement or idealisation.

For example:

 
------------------------------------------------------------------------------
           #
           #   Restrained refinement. Reflections between 20 and 1.5Å will be used
           #
           REFI TYPE RESTrained RESOLUTION  20 1.50
           #
           #   Use maximum likelihood residual
           #
           REFI RESI MLKF
           #
           #   Refine individual isotropic B values
           REFI BREF ISOTropic 
           #   or
           REFI TYPE REST  RESO 20 1.50  
           REFI RESI MLKF  BREF ISOT 
           REFI METH CGMAT
           #   or
           #
           #   Rigid body refinement
           #
           REFI TYPE RIGID  #(all other definitions are defaults)
------------------------------------------------------------------------------

Subkeywords:

TYPE RESTrained | UNREstrained | IDEAlise | RIGId | TLSRefinement
[Default RESTrained]
PHASed SCBLurred <scblur> BBLUrred <bblur> SIGMacalc
[Default: only used if PHASE definition given on LABIN; scblur = 1.0, bblur = 0. Its effect is equivalent to the keyword PHASe]
RESIdual LSQF | MLKF
[Default MLKF ]
BREFinement OVERall | ISOTropic | ANISotropic | MIXED
[Default ISOTropic]
METHod CGMAt | CGRAd | CDIR
[Default CGMAT]
RESO <resmin> <resmax>
[Default: all data]
TLSC <ncycles>
[Default: no TLS-refinement]

In more detail, these subkeywords are:

TYPE

Default:

REFInement TYPE RESTrained

This keyword describes the type of refinement.

RESTrained
invokes restrained refinement, where both the Xray residual (reflecting the agreement between the observed and the calculated Fs) and the geometric residual (reflecting the fit between the expected and the observed geometry) are minimised at the same time. The relative weighting of these two terms is defined by the keyword WEIGHt.
UNREstrained
is for unrestrained refinement, i.e. the geometric part is ignored.
IDEAlised
is for geometry idealisation.
RIGID
invokes rigid body refinement. The description of domains is given by the keyword RIGID. REFI TYPE RIGID is equivalent to MODE RIGID.
TLSR
This invokes TLS refinement. Definition of rigid bodies is taken from TLSIN input file. If this type of refinement has been specified then only TLS refinement will be performed. For TLS followed by individual atomic refinement, use REFI TLSC.

PHASed

Default:

REFInement PHASed SCBLur 1.0 BBLUr 0.0
SCBL <scblur> BBLUr <bblur>

If experimental phases are being used it may be necessary to blur the phase probabilities, especially after some density modification calculations (this information can also be input with the keyword PHASE).

Program will apply blurring as follows:

HLAnew = HLA*scblur*exp(-(sin(theta)/lambda)**2*bblur)
HLBnew = HLB*scblur*exp(-(sin(theta)/lambda)**2*bblur)
HLCnew = HLC*scblur*exp(-(sin(theta)/lambda)**2*bblur)
HLDnew = HLD*scblur*exp(-(sin(theta)/lambda)**2*bblur)

or, if PHASE and FOM are given, the program first generates the Hendrickson-Lattman coefficients using the formula:

HLA = Func(FOM)*COS(DEGTOR*PHASE), 
HLB = Func(FOM)*SIN(DEGTOR*PHASE),
HLC = HLD = 0.

i.e. the Phase probability distribution is unimodal.
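
As an illustration of the blurring formula above, the following sketch applies scblur and bblur to one set of Hendrickson-Lattman coefficients. The function name blur_hl and its argument layout are hypothetical; Func(FOM) is not defined in this section, so the conversion from PHASE and FOM is deliberately left out.

```python
import math


def blur_hl(hl, sin_theta_over_lambda, scblur=1.0, bblur=0.0):
    """Return (HLAnew, HLBnew, HLCnew, HLDnew) for one reflection.

    hl is the tuple (HLA, HLB, HLC, HLD); the same factor
    scblur*exp(-(sin(theta)/lambda)**2*bblur) multiplies each coefficient.
    """
    factor = scblur * math.exp(-(sin_theta_over_lambda ** 2) * bblur)
    return tuple(coefficient * factor for coefficient in hl)


# With the defaults (scblur = 1.0, bblur = 0.0) the coefficients are unchanged.
print(blur_hl((1.2, -0.4, 0.1, 0.0), 0.25))
# A positive bblur down-weights the phase information more strongly
# at high resolution (larger sin(theta)/lambda).
print(blur_hl((1.2, -0.4, 0.1, 0.0), 0.25, scblur=0.8, bblur=20.0))
```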

SIGMAcalc
Use the phase information for sigmaA estimation. This option is not recommended and has not been fully tested.

RESIdual

Default:

REFInement RESIdual MLKF

This keyword describes the Xray part of the function.

LSQF
defines amplitude based least-squares residual.
Fxray = SUM(Whkl*(|FO|-|FC|)**2)
MLKF
A negative log-likelihood residual derived from the Rice distribution for centric and acentric cases of Fs.
Fxray = SUM(LLKcentric_hkl) + SUM(LLKacentric_hkl)

If experimental phase information is available the residual is modified appropriately. This is invoked by assigning appropriate input columns; see LABIN (for methodology see G.N. Murshudov, A.A. Vagin and E.J. Dodson (1997) Acta Cryst. D53, 240-255, or Pannu, Murshudov, Dodson and Read (1998) Acta Cryst. D54, 1285-1294).
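
The LSQF residual given above, Fxray = SUM(Whkl*(|FO|-|FC|)**2), can be written out as a short function. This is a sketch for illustration only; the name lsq_residual is hypothetical and the weights Whkl are simply taken as input, since their definition is not given here. The MLKF terms are not sketched because the likelihood expressions are not spelled out in this section.

```python
def lsq_residual(f_obs, f_calc, weights):
    """Amplitude based least-squares residual over a list of reflections."""
    return sum(
        w * (abs(fo) - abs(fc)) ** 2
        for fo, fc, w in zip(f_obs, f_calc, weights)
    )


# Two reflections: 1.0*(10-8)**2 + 2.0*(5-6)**2
print(lsq_residual([10.0, 5.0], [8.0, 6.0], [1.0, 2.0]))
```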

METHod

Default:

REFInement METHod CGMATrix

This keyword describes the method of minimisation.

CGMAT
(default) is the sparse matrix method, as in PROLSQ.
CGRAD
is the conjugate gradient method. Does not work.
CDIR
is the conjugate direction method. Does not work.

BREFinement

Default:

REFInement BREFinement ISOTropic

This keyword describes the method for parameterisation of atomic B values (atomic displacement parameters).

OVERall
Overall B-factor (Boverall) obtained from scaling is added to the atomic B values.
ISOTropic
Individual isotropic B-factors are refined for all atoms.
ANISotropic
Individual anisotropic B-factors are refined for all atoms.
MIXEd
Some atoms with isotropic, some with anisotropic B-values. In this case the input file (PDB) defines which atoms should be refined isotropically and which anisotropically. The atoms with an ANISOU card are refined anisotropically.

RESO

Default: Use all reflections

<resmin> <resmax>
[Default: all data]
<dmin>, <dmax>
[Default 1, 1000] are the resolution limits used for refinement, in Angstroms (or in 4*sin(theta)**2/lambda**2 if both are < 1.0). They can be given in either order. If only one value is given, it is assumed to be the high resolution cutoff.
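
The rules for the resolution limits can be sketched as follows. This is an illustration and not the program's own code: the function name normalise_reso is hypothetical, and it assumes that values below 1.0 are 4*sin(theta)**2/lambda**2, which by Bragg's law equals 1/d**2.

```python
import math


def normalise_reso(*limits):
    """Return (low_resolution, high_resolution) in Angstroms.

    low_resolution is None when only one value is given, meaning that
    the program default would apply.
    """
    values = [float(v) for v in limits]
    if len(values) == 1:
        # A single value is taken as the high resolution cutoff.
        return (None, values[0])
    if len(values) != 2:
        raise ValueError("expected one or two resolution limits")
    if all(v < 1.0 for v in values):
        # Assumed reading: both values are 1/d**2, so d = 1/sqrt(value).
        values = [math.inf if v == 0.0 else 1.0 / math.sqrt(v) for v in values]
    # The limits may be given in either order.
    return (max(values), min(values))


print(normalise_reso(20, 1.5))
print(normalise_reso(1.5, 20))
print(normalise_reso(2.0))
```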

Include all well measured data, not omitting the weak observations; they will be weighted appropriately. The low resolution data help define the solvent shell. However if you have lost strong terms by some accident of data collection, the scaling may not behave well.

TLSC

Default:

REFInement TLSCycles 0
<ncycles>
This subkeyword indicates that, before individual atomic restrained or unrestrained refinement, the overall TLS parameters of the rigid bodies should be refined for <ncycles> cycles. The individual atomic B values in the output coordinate file are then those remaining after removal of the overall TLS contribution.

SCALe [TYPE <BULK | SIMP>] [BAVER <baverage>] [RESO <resmin> <resmax>] [APPL <OBSE | CALC>] [LSSC [ANIS] | [FIXBulk SCBUlk <scbulk> BBULk <bbulk>] | [NCYC <ncyc>] | [EXPE] | [FREE] ]

This keyword controls the scaling of calculated and observed structure factors. The SCALE keyword has several different options. See below for keywords for estimation of sigmaA, triggered by SCALe MLSC. For example:

#   Use Babinet's bulk solvent type scaling
#   
SCALe TYPE BULK 
# and/or   
#
#  do anisotropic scaling. Use resolution between 100 and 2.1Å
#
SCALe LSSC ANIS    RESO 100 2.1
#
#  Use simple scaling, i.e. do not use Babinet's bulk solvent
#  
SCALe TYPE SIMP
#
#   Fix B value of Babinet's bulk solvent. It is useful when
#   bulk solvent based on the constant value is used.   
SCALe LSSC FIXBulk BBULk 200

Subkeywords:

TYPE BULK | SIMPle
[Default: SCALe TYPE BULK]
BAVERage <baverage> - It is not active now
[If there is not sufficient data to refine a B value it is possible to hold it at some sensible value derived from the Wilson plot.]
RESO <resmin> <resmax>
[Default: all data used for the scaling]
APPLY OBS | CALC - It is not active now
[Default: output file contains Fobs brought to Fcalc scale]
LSSC
Flag to indicate all following subkeywords apply to estimation of scale between Fo and Fc.
ANISotropic
[Default: anisotropic overall scale]
FIXBulk SCBUlk <scbulk> BBULk <bbulk>
[Lower resolution structures may not have sufficient data to find sensible overall scales and B values for both the BULK and the protein component. It can help to fix these]
NCYCle <ncycle>
[Default: ncycle = 10]
EXPE
[Default is to not use experimental sigmas in the determination. The keyword EXPE changes this to use experimental sigmas]
FREE
[Default: Scales are calculated against the WORKing set of reflections, but if requested it can be derived from the FREE set.]

In more detail, these subkeywords are:

TYPE

with one of the following sub-subkeywords:

BULK
[Default] If TYPE BULK, then the scale KB is a function of 4 variables with the form:
KB = K0*exp(-B0*s^2) * (1- K1*exp(-B1*s^2))

The scale formulation is based on the Babinet principle and described by Dale Tronrud and others. Better results can be obtained if bulk solvent correction based on a constant value is used. See SOLVENT.

SIMPLE
If TYPE is SIMPle the scale factor has the form:
KB = K0*exp(-B0*s^2)  (Simple Wilson scaling)
i.e. K1 = 0

This may be more appropriate if keyword SOLVENT is active.
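
The two forms of the scale factor can be compared with a short sketch. The function name overall_scale is hypothetical, the numerical values are arbitrary illustrations rather than recommended settings, and s^2 is simply passed in as a number because its definition is not given in this section.

```python
import math


def overall_scale(s2, k0, b0, k1=0.0, b1=0.0):
    """KB = K0*exp(-B0*s^2) * (1 - K1*exp(-B1*s^2)).

    With k1 = 0 this reduces to the SIMPle form KB = K0*exp(-B0*s^2).
    """
    return k0 * math.exp(-b0 * s2) * (1.0 - k1 * math.exp(-b1 * s2))


# The bulk term only matters at small s^2 (low resolution): with a large B1
# it decays quickly and the BULK and SIMPle scales converge.
for s2 in (0.0, 0.01, 0.1):
    bulk = overall_scale(s2, k0=2.0, b0=20.0, k1=0.5, b1=100.0)
    simple = overall_scale(s2, k0=2.0, b0=20.0)
    print(s2, bulk, simple)
```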

BAVErage <baverage>

Lower resolution structures may not have sufficient data to give a robust Wilson plot overall B factor, so it is possible to fix the <B> for the structure to a set value. If you are using this option it is important to add the remaining B value to the observed structure factors.

RESO <resmin resmax> or <dmin dmax>

[Default all data are used for the scaling]

Defines resolution limit for scaling.

APPL OBSE|CALC - It is not active now

APPL OBSE | CALC will apply overall Bcorrection to either observed or calculated structure factors.

LSSC

Flag to indicate all following subkeywords apply to estimation of scale between Fo and Fc.

FIXB SCBUlk <scbulk> BBULk <Bbulk>

Lower resolution structures may not have sufficient data to find sensible overall scales and B values for both the BULK and the protein.
SCBULK = <solvent_density>/<protein_density>
i.e. for aqueous solvent, with solvent density ~ 1.0 and protein density ~ 1.35, SCBULK ~ 1.0/1.35 ~ 0.74. If bulk solvent based on a constant value (SOLVENT) is used then fixing of BBULK is necessary. In this case SCALE TYPE SIMPLe could also be used.

ANISO

Many crystals generate seriously "anisotropic" reflection data. This is presumably due to some crystalline disorder, and is not the same as anisotropy of individual atoms. However the correction can be expressed in a similar form.

Then, apart from the isotropic overall B factor B0, the contribution of an anisotropic B centred at the origin of the coordinate system (i.e. in the orthogonal system B11+B22+B33 = 0.0) is also refined.

Overall anisotropic B values are applied to the calculated structure factor with Miller index h,k,l as follows:

B11*h*h*(a*)^2 +
B22*k*k*(b*)^2 +
B33*l*l*(c*)^2 +
2.0*B12*h*k*(a*)*(b*) +
2.0*B13*h*l*(a*)*(c*) +
2.0*B23*k*l*(b*)*(c*)

where
h,k,l are Miller indices
a*,b*,c* are reciprocal space cell dimensions
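
The expression above is a quadratic form in the Miller indices, and can be evaluated as in the sketch below. The function name aniso_b_term and the argument layout are hypothetical. How the resulting value enters the scale (for example inside an exponential) is not stated in this section, so the sketch only evaluates the sum itself.

```python
def aniso_b_term(hkl, recip_cell, b):
    """Evaluate the overall anisotropic B expression for one reflection.

    hkl        -- Miller indices (h, k, l)
    recip_cell -- reciprocal cell lengths (a*, b*, c*)
    b          -- (B11, B22, B33, B12, B13, B23)
    """
    h, k, l = hkl
    astar, bstar, cstar = recip_cell
    b11, b22, b33, b12, b13, b23 = b
    return (
        b11 * h * h * astar ** 2
        + b22 * k * k * bstar ** 2
        + b33 * l * l * cstar ** 2
        + 2.0 * b12 * h * k * astar * bstar
        + 2.0 * b13 * h * l * astar * cstar
        + 2.0 * b23 * k * l * bstar * cstar
    )


# Illustrative values only; the diagonal terms sum to zero as required
# for the orthogonal system (B11+B22+B33 = 0.0).
b_values = (2.0, -1.0, -1.0, 0.0, 0.0, 0.0)
print(aniso_b_term((1, 0, 0), (0.1, 0.1, 0.1), b_values))
print(aniso_b_term((0, 0, 1), (0.1, 0.1, 0.1), b_values))
```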

REFMAC estimates overall anisotropic B values only once, at the first cycle, and keeps them constant for the rest of the REFMAC refinement session. For the R and free R calculation their contribution is applied to the calculated structure factors. During refinement it is applied to the observed structure factors.

Anisotropic scaling of data should ideally be done at the merging stage but often the distortion aligns with the crystal axes, and therefore cannot be detected from symmetry equivalent reflections alone. Large improvements in behaviour of refinement, maps and statistics (R, FreeR etc.) can result from this correction.

NCYC <ncyc>

Default: <ncyc> = 10

EXPE

Default is to not use experimental sigmas in the determination. The keyword EXPE changes this to use experimental sigmas.

FREE

Default is to use all reflections in the WORKing set for scaling. The keyword FREE changes this to determine the scale from the FREE set of reflections.

NB: Before applying bulk solvent scaling and including all low resolution data, check your distribution of <F> looks sensible. This is the raw material for all overall scaling algorithms. A good way to check this is to look at a <Fsq> plot against resolution.

This should look something like this:

         +
          +           +
           +        +     +
             +    +          +
                +                  +
       <10A     5A    4.5A          ............

If the low resolution region looks strange, it may mean that your backstop was causing problems, that intensities were saturated, etc., and including such data may give unreasonable solvent scales. Sensible values would be a bulk solvent scale around -0.75 and a bulk solvent B value around 200.0 if SOLVENT is not used.

NB========================================================================
We are not really sure how best to handle scaling. If you have problems please get in touch. In our experience there have been no problems with data sets with resolution 2.5Å or higher, unless there was some obvious flaw; huge ice rings or Is labelled as Fs or some such thing. But with one unusual data set which died at 2.7Å there has been a problem, which we got round by tweaking parameters, but these cases should be automatically checked.
NB========================================================================

NOTE: When doing ML refinement the scale factors are only used to calculate R values and overall B values (isotropic and anisotropic).

SCALe MLSC [ NCYC <ncyc> | WORK | FIXBulk SCBUlk <scbulk> BBULk <bbulk> ]

For example:

SCALe MLSC FIXBulk BVALue 100.0 SCVAlue -0.1

The SigmaA estimate is generally fitted to the normalised Free reflections using a 4 parameter equation of an analogous form to the bulk scaling:

SA = SA0*exp(-T0*s^2) * (1- SA1*exp(-T1*s^2))

This keyword controls the estimation of SigmaA. Subkeywords:

FIXBulk
The option FIXBulk to fix parameters can be invoked in the same way as for the SCAL LSSC options, but should only be used with care!
NCYCle <ncyc>
[Default <ncyc> = 10]
Use <ncyc> cycles to determine the parameters.
WORK
[Default: SigmaA is calculated against the FREE set of reflections]
The keyword WORK changes this to determine SigmaA from the WORKing set of reflections.

SOLVENT [YES|NO] [VDWProb <vdwprob>] [IONProb <ionprob>] [RSHRink <rshrink>]

[Default: use the bulk solvent correction based on a constant value, with mask parameters VDWProb=1.4, IONProb=0.8, RSHRink=0.8]

This keyword controls parameters for the solvent mask calculation. A constant value is assigned to the region of the unit cell not occupied by the atoms present in the input coordinate file. Its Fourier transform is used as contribution to the disordered (bulk) solvent or unmodelled part of the structure. Current version does not attempt to identify uninterpreted but ordered part of the unit cell.

Mask calculation is performed in three stages:

  1. The whole asymmetric unit is set to a constant value.
  2. For each point inside an atom sphere centred on the atomic position and with a certain radius (Rvdw+vdwprob for vdw atoms, Rion+ionprobe for ions), this value is set to 0.
  3. If the distance from a point to the nearest non-zero point is less than <rshrink> then this point is also set to the predefined constant. For each atom all its symmetry-related counterparts are also considered.
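
The three stages above can be followed on a toy example. The sketch below works on a one-dimensional, non-periodic line of grid points and ignores symmetry-related atoms and ions, so it is only an illustration of the stages and not the program's mask algorithm. The function name solvent_mask_1d, the grid spacing and the atom radius are hypothetical.

```python
def solvent_mask_1d(n_points, spacing, atoms,
                    vdwprob=1.4, rshrink=0.8, solvent_value=1.0):
    """Toy 1D solvent mask; atoms is a list of (position, vdw_radius) in Angstroms."""
    xs = [i * spacing for i in range(n_points)]

    # Stage 1: every point starts with the constant (solvent) value.
    mask = [solvent_value] * n_points

    # Stage 2: points inside an enlarged atom sphere are set to 0.
    for i, x in enumerate(xs):
        if any(abs(x - pos) < radius + vdwprob for pos, radius in atoms):
            mask[i] = 0.0

    # Stage 3: zero points closer than rshrink to a non-zero point of the
    # stage 2 mask are given the constant value again.
    solvent_xs = [x for x, value in zip(xs, mask) if value != 0.0]
    shrunk = list(mask)
    for i, x in enumerate(xs):
        if mask[i] == 0.0 and any(abs(x - sx) < rshrink for sx in solvent_xs):
            shrunk[i] = solvent_value
    return shrunk


# One atom at 5.0 A with an assumed radius of 1.7 A on a 0.5 A grid.
mask = solvent_mask_1d(n_points=21, spacing=0.5, atoms=[(5.0, 1.7)])
print("".join("." if value == 0.0 else "S" for value in mask))
```

Stage 3 is evaluated against the mask produced by stage 2 rather than against the partly updated mask, so that the region regained by the solvent is limited to <rshrink> and does not grow as points are processed.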

For example:

SOLVENT VDWProb 1.4 IONProb 0.8 RSHRink 0.8

Subkeywords:

YES|NO
[Default YES]
Turns on or off the calculation of the contribution from solvent region
VDWProb <vdwprob>
[Default vdwprob=1.4]
For mask calculation the vdw radii of non-ion atoms (like carbon) are increased by this value and this new radius is used
IONProb <ionprobe>
[Default ionprobe=0.8]
For mask calculation the ionic radii of atoms which can be ion (or can participate in a coulombic interaction) are increased by this value and this new radius is used for mask calculation
RSHRink <rshrink>
[Default rshrink = 0.8]
The mask calculated after taking away atoms with the new radii is shrunk by this value, and the constant value is assigned to the region regained in this way.

If this keyword is active, the scale type could be set to SIMPLe. In our experience setting SCALE type to BULK and fixing BULK solvent B value to 200.0 gives "good" results:

SCALE LSSCale FIXBulk BBULk 200.0

Sometimes, with high resolution data, the BULK solvent B value need not be fixed.

WEIG [NOEX|EXPE] MATRix <wmat> | AUTO

[Default EXPE MATR 0.5]
This keyword controls the weighting of the X-ray and geometric parts.

For example:

WEIGht MATRix 0.5
NOEX
Exclude experimental sigmas from weighting.

This sub-keyword allows you not to use experimental sigmas of the observations for the Xray residual. The default action is to use them.

The remaining sub-keywords control the relative weighting of the X-ray and geometry terms in the residual.

MATR <wmat>
[Default 0.5]
This keyword defines the weight between X-ray and geometric part of the refinement residual. For tight restraints it should be decreased. For example (for low resolution data it seems to be necessary to use tight restraints):
WEIGHT MATRix 0.1
For loose restraints, which are useful for high resolution data (higher than 1.5Å), this value should be increased. For example (at 1.0Å):
WEIGht MATRix 20

This weighting is based on the comparison between average diagonal term of X-ray and geometry "Hessians" (same as PROLSQ). Weighting equates wmat*average_diagonal_of_geometry to average_diagonal_of_Xray terms.

AUTO
With this keyword the user lets Refmac5 adjust the weights automatically.

NCYC <ncycref>

[Default 5]
This keyword defines the number of refinement cycles.