CAD (CCP4: Supported Program)

NAME

cad - Collect and sort crystallographic reflection data from several files, to generate a single set.

SYNOPSIS

cad hklin1 foo_in_1.mtz hklin2 foo_in_2.mtz ... hklini foo_in_i.mtz hklout foo.mtz
[Keyworded input]

DESCRIPTION

Uses:

  1. Combine and sort reflection data from up to 9 input reflection data files into a single output data file, with various possible operations being performed on the input data items. For example, you can specify a new spacegroup, change column names and/or types, etc. Data can be converted from one area of reciprocal space to another, converting phases, Hendrickson-Lattman coefficients (providing all 4 are present) and anomalous differences appropriately.
  2. Edit the information describing the datasets held in a file, such as dataset name or crystal cell dimensions. Columns can be re-assigned to different datasets.
  3. Unless otherwise instructed, the program places output data in the CCP4 asymmetric unit (which sometimes differs from that in the International Tables), and sorts it to a standard order. This is an important step when importing data from other packages. It is thus a good idea to run data through CAD after converting it to MTZ format with f2mtz.
  4. Extend reflection data to cover more of reciprocal space. For example, it is convenient for many purposes to extend cubic data to include hkl, klh and lhk. Or you may want to run refinement calculations in spacegroup P1.
  5. Prepare data for translation functions of various types, e.g. tffc or rsearch.

INPUT AND OUTPUT FILES

The input files are one or more (up to 9) reflection data files in MTZ format, assigned to HKLIN1, HKLIN2, ... HKLIN9.

The output file is a reflection data file in MTZ format.

Missing data items, i.e. empty column entries corresponding to reflections that occur in some input files but not in the input file contributing that particular column, are represented by Missing Number Flags (MNFs; see the VALM keyword). A particularly important example of this is the use of CAD to fill in missing data in a dataset with MNFs, thus completing the dataset. More details can be found in the documentation for the program unique.
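
The effect of Missing Number Flags when columns from different files are combined can be pictured with a short Python sketch. This is not CAD code: the dictionaries, the data values and the helper merge_columns are hypothetical, and NaN is assumed as the flag value.

```python
import math

def merge_columns(file1, file2):
    """Combine two {hkl: value} columns; absent entries become NaN (the MNF)."""
    merged = {}
    for hkl in sorted(set(file1) | set(file2)):
        merged[hkl] = (file1.get(hkl, math.nan), file2.get(hkl, math.nan))
    return merged

# Hypothetical data: one column from file 1, one column from file 2.
f_obs = {(1, 0, 0): 120.5, (1, 1, 0): 88.2}
f_calc = {(1, 1, 0): 90.1, (2, 0, 0): 45.0}
merged = merge_columns(f_obs, f_calc)
```

Every reflection present in either input appears once in the result; an entry is flagged only in the column whose source file lacked that reflection.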

KEYWORDED INPUT

The various data control lines are identified by keywords, those available being:

CELL, CENTRIC_ONLY, CTYPIN, END, HISTORY, LABIN (compulsory), LABOUT, MONITOR, OUTLIM, REFMONITOR, RESOLUTION, SCALE, SORT, SYMMETRY, SYSAB_KEEP, TITLE, VALM

In addition, there are a few keywords for editing dataset information in the MTZ file header:

DCELL, DNAME, DRENAME, DPNAME, DWAVELENGTH, XNAME

General Keywords

LABIN FILE_NUMBER <i> [ ALL | <column assignment> ... ]

(Compulsory.) A line giving the names of the input data items to be selected from FILE_NUMBER <i>, which is read from HKLIN<i>. Up to 29 columns can be specified for input from each HKLIN<i>. If you want to pick up all items from a file, and there are fewer than 30 items excluding H K L, then you can specify

LABIN FILE_NUMBER <i> ALL

Otherwise give explicit column assignments, e.g.:

LABI FILE_NUMBER 1 E1=F E2=SIGF E3=FC E4=PHIC ... E29=SIGFau

(E<j> stands for ENTRY<j>.)

LABOUT <column assignment> ...

A line giving the new names for the data items which will be written to HKLOUT. Output labels can be changed if you wish, but the default is to keep the input label, unique-ified with the input file number if necessary (see above). E.g.:

LABO FILE_NUMBER 1 E1=Fnat1 E2=SIGFnat1

This changes the first 2 labels and leaves all the rest the same.
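
As an illustration of how the command line from the SYNOPSIS and the LABIN/LABOUT lines fit together, here is a minimal Python sketch that only assembles the text. The helpers cad_command and labin_line and the file names are hypothetical; nothing here is part of CCP4.

```python
def cad_command(hklin_files, hklout):
    """Return an argv list of the form: cad hklin1 a.mtz ... hklout out.mtz."""
    if not 1 <= len(hklin_files) <= 9:
        raise ValueError("between 1 and 9 input files are expected")
    argv = ["cad"]
    for i, name in enumerate(hklin_files, start=1):
        argv += [f"hklin{i}", name]
    return argv + ["hklout", hklout]

def labin_line(file_number, labels):
    """Build 'LABIN FILE_NUMBER <i> E1=... E2=...' from a list of column labels."""
    if len(labels) > 29:
        raise ValueError("at most 29 columns per input file")
    pairs = " ".join(f"E{j}={lab}" for j, lab in enumerate(labels, start=1))
    return f"LABIN FILE_NUMBER {file_number} {pairs}"

# Hypothetical file names and column labels.
argv = cad_command(["native.mtz", "deriv.mtz"], "combined.mtz")
keywords = "\n".join([
    labin_line(1, ["F", "SIGF"]),
    "LABOUT FILE_NUMBER 1 E1=Fnat1 E2=SIGFnat1",
    labin_line(2, ["FC", "PHIC"]),
    "END",
])
```

On a machine with CCP4 installed, the keyword text would be supplied on standard input, for instance with subprocess.run(argv, input=keywords, text=True); that step is not exercised here.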

CTYPIN FILE <i> <program label>=<type> ...

A line giving the names of the data types to be assigned to the entries selected for FILE <i> . The default is to leave the input datatypes unaltered.

The data types for the different types of data which can be present in an MTZ file are as follows:
H F J D G K Q L M P W A B Y I R [ U V ]

H
index h,k,l
F
structure amplitude, F
J
intensity
D
anomalous difference
G
member of Friedel pair, F+ or F-
K
member of Friedel pair, I+ or I-
Q
standard deviation of J,F,D or other
L
standard deviation of F+ or F-
M
standard deviation of I+ or I-
P
phase angle in degrees
W
weight (of some sort)
A
phase probability coefficients (Hendrickson/Lattman)
B
BATCH number
Y
M/ISYM, packed partial/reject flag and symmetry number
I
any other integer
R
any other real

It is essential to have correct column types for PHASES and ANOMALOUS differences, so that the program can identify:

  1. phases, which must be changed if the reflection is moved to a symmetry equivalent;
  2. anomalous differences, which must change sign if the reflection is replaced by its Friedel mate.
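
The Friedel case can be sketched in Python as follows. The sketch assumes the standard relations (the phase is negated and the anomalous difference changes sign on going from hkl to -h,-k,-l) and keeps phases in the range 0 to 360 degrees; the function name and the choice of range are assumptions, not a description of CAD's internals.

```python
def to_friedel_mate(hkl, phase_deg, dano):
    """Map a reflection to (-h,-k,-l).

    The phase (type P) is negated and wrapped into [0, 360);
    the anomalous difference (type D) changes sign.
    """
    h, k, l = hkl
    return (-h, -k, -l), (-phase_deg) % 360.0, -dano

# Hypothetical reflection with phase 30 degrees and anomalous difference 4.5.
mate = to_friedel_mate((1, 2, 3), 30.0, 4.5)
```

A column wrongly typed as, say, R (any other real) would be copied unchanged, which is why the column types matter.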

In addition two special data types are used to signal that you are preparing data for translation functions of various types. They are:

U
partial FC
V
partial PHIC

There must be only one FCpart/PHICpart pair per input file, and it must be the last pair of items specified on LABIN. CAD generates equivalent reflections using only the ROTATIONAL part of each primitive symmetry operator (i.e. if the spacegroup is P212121, these reflections are analysed as though the spacegroup were P222). This is allowed for in the TFFC and RSEARCH programs; see their documentation.

For the above example their output labels would be
FC1 PHIC1 FC2 PHIC2 ... FCnsymp PHICnsymp
where nsymp is the number of primitive symmetry operators.
See example.
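
The labelling scheme can be written out with a small Python sketch; the helper name is hypothetical, and nsymp = 4 is used because P212121 has four primitive symmetry operators.

```python
def partial_fc_labels(nsymp, fc="FC", phic="PHIC"):
    """Return FC1 PHIC1 ... FCnsymp PHICnsymp as a flat list."""
    labels = []
    for n in range(1, nsymp + 1):
        labels += [f"{fc}{n}", f"{phic}{n}"]
    return labels

labels = partial_fc_labels(4)
```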

CELL <a> <b> <c> [ <alpha> <beta> <gamma> ]

This keyword is now obsolete, since it is no longer appropriate to use one set of cell dimensions to cover all datasets held in the file. The keyword will be ignored! Please see DCELL keyword.

CENTRIC_ONLY

Only output centric terms.

HISTORY <string>

History strings to be added to the MTZ output file HKLOUT.

MONITOR NONE | BRIEF | HIST | FULL

Controls how much MTZ file header information is printed:

NONE
(default) no header information output
BRIEF
brief header output
HIST
brief + mtz history
FULL
full header output

OUTLIM [ SPACEGROUP <spacegroup> ] [ HKLLIM <hmin> <hmax> <kmin> <kmax> <lmin> <lmax> ]

Defines limits for the OUTPUT file. Use this for expanding data to cover more of reciprocal space. Subsidiary keywords:

SPACEGROUP <name or number of spacegroup>
This is used to choose a Laue code defined for the appropriate point group. The name (or number) corresponds to the spacegroup whose limits are used. NB: This does NOT alter the symmetry operators stored in the MTZ file. In the unlikely event of wanting to change these, use the keyword SYMMETRY.
HKLLIM <hmin> <hmax> <kmin> <kmax> <lmin> <lmax>
used to set your own choice of hkl limits. It is better to use the spacegroup to choose a Laue group. Using HKLLIM often duplicates reflections with a zero index.
  Spacegroup Laue code limits are:
     PG1         h k l : l >= 0          Spacegroups 1,2
                 h k 0 : h >= 0
                 0 k 0 : k >= 0
     P2/m        h k l : k >= 0, l >= 0
                 h k 0 : h >= 0          Spacegroups 3,..(bsetting)
     Pmmm        h k l : h >= 0, k >= 0, l >= 0 Spacegroups 16,..
     P4/m        h k l : h >= 0, k >= 0, l >= 0 
                 0 k l : k >  0
                 0 0 l : l >  0                 Spacegroups 75,..
     P4/mmm      h k l : h >= 0, k >= 0, l >= 0, h >= k
                 0 0 l : l >  0                 Spacegroups 89,..
     P3  (R3)    h k l : h >= 0, k > 0
                 0 0 l : l > 0                  Spacegroups 143,..
     P312        h k l : h >= 0, k >= 0, k <= h   (all l)
                 h 0 l : l >= 0          Spacegroups 149,151,153,..
     P321        h k l : h >= 0, k >= 0, k <= h   (all l)
                 h h l : l >= 0          Spacegroups 150,152,154,..
     P6/m        h k l : h >= 0, k >= 0, l >= 0
                 0 k l : k >  0
                 0 0 l : l >  0                 Spacegroups 168,..
     P6/mmm      h k l : h >= 0, k >= 0, l >= 0, h >= k
                 0 0 l : l >  0                 Spacegroups 177,..
     P23         h k l : h >= 0, k >= 0, l >= 0, l>=h, 
            and  h k h : k >= h
                 h k l : k >  h if l >  h       Spacegroups 195,..
     P432        h k l : h >= 0, k >= 0, l >= 0, k >= l and l>= h
                                                Spacegroups 209,..
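
To make the reading of the table concrete, here is a Python sketch for two of the Laue codes. It assumes that the successive lines for one code are nested special cases (for PG1: l > 0; or l = 0 and h > 0; or l = h = 0 and k >= 0). That interpretation and the function names are assumptions, not taken from the CAD source.

```python
from itertools import product

def in_asu_pg1(h, k, l):
    """PG1 limits: hkl with l >= 0; hk0 with h >= 0; 0k0 with k >= 0."""
    if l != 0:
        return l > 0
    if h != 0:
        return h > 0
    return k >= 0

def in_asu_pmmm(h, k, l):
    """Pmmm limits: h >= 0, k >= 0, l >= 0."""
    return h >= 0 and k >= 0 and l >= 0

# Consistency check on a small box of indices, origin excluded:
# exactly one member of every Friedel pair should satisfy the PG1 limits.
box = [hkl for hkl in product(range(-2, 3), repeat=3) if hkl != (0, 0, 0)]
pg1_ok = all(in_asu_pg1(*hkl) != in_asu_pg1(*(-i for i in hkl)) for hkl in box)
```

A plain HKLLIM box cannot express the special cases for zero indices, which is how duplicated reflections with a zero index arise.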

REFMONITOR <nmon>

The program prints lots of information about every <nmon>-th reflection (default 0).

RESOLUTION [ OVERALL <dmin> <dmax> ] | [ FILE_NUMBER <i> <dmin> <dmax> ]

Use either:

RESOLUTION OVERALL <dmin> <dmax>
for overall resolution limits, or:
RESOLUTION FILE_NUMBER <i> <dmin> <dmax>
to set input limit for FILE_NUMBER <i>.

<dmin>, <dmax> are the resolution limits for the data to be included, i.e. data are included for which
(1/<dmax>)**2 <= 4 sin**2theta/lambda**2 <= (1/<dmin>)**2
NOTE: Defaults are 0.1 and 1000.0 Angstrom.
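
The resolution test can be sketched in Python, using 4 sin**2theta/lambda**2 = 1/d**2. The cell is a hypothetical orthogonal one so that d is easy to compute, and the function names are illustrative only.

```python
def d_spacing_orthogonal(hkl, cell):
    """d in Angstrom for an orthogonal cell (a, b, c); hkl must not be (0, 0, 0)."""
    h, k, l = hkl
    a, b, c = cell
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return inv_d2 ** -0.5

def within_limits(d, dmin=0.1, dmax=1000.0):
    """True if (1/dmax)**2 <= 1/d**2 <= (1/dmin)**2, i.e. dmin <= d <= dmax."""
    return (1.0 / dmax) ** 2 <= (1.0 / d) ** 2 <= (1.0 / dmin) ** 2

cell = (50.0, 60.0, 70.0)  # hypothetical cell edges in Angstrom
candidates = [(1, 0, 0), (10, 0, 0), (25, 0, 0)]  # d = 50, 5 and 2 Angstrom
kept = [hkl for hkl in candidates
        if within_limits(d_spacing_orthogonal(hkl, cell), dmin=2.5, dmax=20.0)]
```

With limits of 2.5 and 20 Angstrom only the 5 Angstrom reflection is retained; with the default limits all three would be.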

SCALE FILE_NUMBER <i> <scale> [ <temperature_factor> ]

Specifies <scale> (and optionally <temperature_factor>) to be applied to all items in FILE_NUMBER which are flagged as type F D Q (or G L for F+ F- alternatives), i.e. all items except intensities and PHASES.

(If no <temperature_factor> is supplied then the <scale> only is applied.)

If there is one number for <temperature_factor>, that is taken as an ISOTROPIC correction, and the scale is applied as <scale> exp( -<temperature_factor>s**2)
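
A minimal Python sketch of the isotropic case follows. The text above does not define s; the sketch ASSUMES the usual convention s = sin(theta)/lambda, so that s**2 = 1/(4 d**2). The function name and the numbers are hypothetical.

```python
import math

def isotropic_scale(scale, b_iso, d):
    """Return scale * exp(-B * s**2), ASSUMING s = sin(theta)/lambda = 1/(2 d)."""
    s2 = 1.0 / (4.0 * d * d)
    return scale * math.exp(-b_iso * s2)

# Hypothetical values: unit scale, B = 20, reflection at d = 2 Angstrom.
factor = isotropic_scale(1.0, 20.0, 2.0)
```

With these numbers B*s**2 = 1.25, so the amplitude is multiplied by exp(-1.25), roughly 0.29. A positive temperature factor therefore damps high-resolution terms more strongly than low-resolution ones.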

If there are six numbers B11 B22 B33 B12 B13 B23 for <temperature_factor>, that is taken as an ANISOTROPIC correction and the applied scale is:

<scale> exp{- [ B11*h*h*(a*)(a*) + B22*k*k*(b*)(b*) + B33*l*l*(c*)(c*) + 2.0*(B12*h*k*(a*)*(b*) + B13*h*l*(a*)*(c*) + B23*k*l*(b*)*(c*)) ] }

Example:

scale file_number 1 1 4.722 4.722 -7.08 2.36 0 0

would apply the anisotropic correction to file 1 according to the formula above with a unit scale factor (i.e. <scale> = 1) and temperature factor parameters:

B11=4.722
B22=4.722
B33=-7.08
B12=2.36
B13=0
B23=0
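
The anisotropic formula can be evaluated for a single reflection with the following Python sketch. The B values are those of the example; the reciprocal cell lengths a*, b*, c* and the function name are hypothetical.

```python
import math

def anisotropic_scale(scale, b, hkl, recip):
    """Evaluate the anisotropic scale formula for one reflection.

    b     = (B11, B22, B33, B12, B13, B23)
    recip = (a*, b*, c*), reciprocal cell lengths
    """
    b11, b22, b33, b12, b13, b23 = b
    h, k, l = hkl
    astar, bstar, cstar = recip
    exponent = (b11 * h * h * astar * astar
                + b22 * k * k * bstar * bstar
                + b33 * l * l * cstar * cstar
                + 2.0 * (b12 * h * k * astar * bstar
                         + b13 * h * l * astar * cstar
                         + b23 * k * l * bstar * cstar))
    return scale * math.exp(-exponent)

b = (4.722, 4.722, -7.08, 2.36, 0.0, 0.0)  # values from the example
recip = (0.01, 0.01, 0.02)                 # hypothetical a*, b*, c* in 1/Angstrom
factor_h = anisotropic_scale(1.0, b, (10, 0, 0), recip)
factor_l = anisotropic_scale(1.0, b, (0, 0, 5), recip)
```

With these parameters reflections along h are scaled down while those along l are scaled up, because B33 is negative.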

SORT <sort order>

Sort order for indices H K and L, e.g.


   SORT H K L
   SORT L K H

This means that the first index will be the slowest, the second the intermediate, and the last the fastest varying, e.g. SORT H K L will have H slowest, K intermediate and L fastest. Note that SORT H K L is the default sort order (i.e. that used in the absence of the SORT keyword), so that SORT is only necessary when you require a sort order which is different from this default.
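
The meaning of slowest and fastest varying can be shown with a short Python sketch: the first index named becomes the primary sort key and the last one the final tie-breaker. The function name and the reflection list are hypothetical.

```python
def sort_reflections(hkls, order="HKL"):
    """Sort so that the first index in `order` varies slowest and the last fastest."""
    position = {"H": 0, "K": 1, "L": 2}
    key_idx = [position[c] for c in order.upper()]
    return sorted(hkls, key=lambda hkl: tuple(hkl[i] for i in key_idx))

refl = [(1, 0, 1), (0, 1, 0), (0, 0, 1), (1, 0, 0)]
by_hkl = sort_reflections(refl, "HKL")  # H slowest, L fastest
by_lkh = sort_reflections(refl, "LKH")  # L slowest, H fastest
```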

SYMMETRY <spacegroup>

This can be used to change the symmetry operators in the output file. The default is to keep the symmetry of the first input file, HKLIN1.

SYSAB_KEEP

Keep systematic absences in output file. (The default is to reject them.)

TITLE <title>

Title to be used in output log file and in output hkl file.

VALM <valml> [NOOUTPUT]

The Missing Number Flag (MNF) written to HKLOUT is set to <valml>, which can take the value NaN or be a real number. If this keyword is not set, then the value of the MNF is taken from the header of HKLIN1 or set to NaN if it is not present there. If NOOUTPUT is specified then reflections with all data items missing are not output to HKLOUT.
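
The effect of NOOUTPUT can be pictured with the following Python sketch, which assumes NaN as the MNF; the row layout and the function name are hypothetical.

```python
import math

def drop_all_missing(rows):
    """Keep only reflections with at least one data item that is not the MNF (NaN)."""
    return {hkl: vals for hkl, vals in rows.items()
            if not all(math.isnan(v) for v in vals)}

rows = {
    (1, 0, 0): (120.5, math.nan),     # one item present: kept
    (2, 0, 0): (math.nan, math.nan),  # all items missing: dropped
}
kept = drop_all_missing(rows)
```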

END

Terminate input.

Dataset keywords

The following keywords allow you to change the dataset headers in MTZ files. These are necessarily complicated to allow for all possibilities! The Graphical User Interface has an interface to these options called Edit MTZ Project & Dataset which is much more user friendly!

For information on the underlying data model, and its representation in MTZ files, see the MTZ format document. For information on the use of datasets in Data Harvesting, see the Harvesting document.

The XNAME and DNAME keywords are for assigning columns to existing or new datasets. The keywords DRENAME, DPNAME, DCELL and DWAVELENGTH are for changing details of existing datasets. It may be possible to mix several keywords in a program run, but the more complicated combinations will probably give weird results. A sequence of well-defined program runs is probably safest.

N.B. The old PNAME keyword is now obsolete. The project name is now considered an attribute of the crystal. It has an administrative role for Data Harvesting, but is not part of the data structure. Columns are therefore assigned according to XNAME/DNAME only.

XNAME FILE_NUMBER <i> <program label> = <crystal name> ...

A line assigning crystal names to the columns of the input data selected from FILE_NUMBER <i> to be read from HKLIN<i>. The program labels should be a subset of those assigned on LABIN. Ranges can be specified with the subkeyword TO, or all program labels can be selected with the subkeyword ALL. Examples:

XNAME FILE_NUMBER 1 E5=toxd
XNAME FILE_NUMBER 2 E2 TO E4=toxd
XNAME FILE_NUMBER 3 E1=toxd E2 TO E4=rnase E5 TO E6=toxd
XNAME FILE_NUMBER 4 ALL=toxd

This keyword can be used to assign a crystal name where there was previously none, or to replace an existing assignment.
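
How the TO and ALL subkeywords expand can be sketched with a small Python parser for the assignment part of the line. This is an illustrative reading of the syntax shown in the examples, not CAD's own parser; the function name is hypothetical and only upper-case E<j> labels are handled.

```python
import re

def expand_assignments(text):
    """Expand 'E1=a E2 TO E4=b' into {'E1': 'a', 'E2': 'b', 'E3': 'b', 'E4': 'b'}."""
    result = {}
    pattern = r"(ALL|E\d+)(?:\s+TO\s+(E\d+))?\s*=\s*(\S+)"
    for first, last, name in re.findall(pattern, text):
        if first == "ALL":
            result["ALL"] = name
            continue
        lo = int(first[1:])
        hi = int(last[1:]) if last else lo
        for j in range(lo, hi + 1):
            result[f"E{j}"] = name
    return result

mapping = expand_assignments("E1=toxd E2 TO E4=rnase E5 TO E6=toxd")
```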

A dataset, as listed in the MTZ header, is specified by a crystal-name/dataset-name pair. The crystal-name specifies a particular physical crystal, while the dataset-name specifies a particular dataset contributing to the structure solution. If either the XNAME keyword or the DNAME keyword or both are specified for a particular column, then the dataset assigned for that column will be changed (either to an existing dataset, or a new one). There should only be one XNAME card per file (use continuation lines if necessary).

DNAME FILE_NUMBER <i> <program label> = <dataset name> ...

A line assigning dataset names to the columns of the input data selected from FILE_NUMBER <i> to be read from HKLIN<i>. The syntax is the same as for the XNAME keyword. This keyword can be used to assign a dataset name where there was previously none, or to replace an existing assignment.

A dataset, as listed in the MTZ header, is specified by a crystal-name/dataset-name pair. The crystal-name specifies a particular physical crystal, while the dataset-name specifies a particular dataset contributing to the structure solution. If either the XNAME keyword or the DNAME keyword or both are specified for a particular column, then the dataset assigned for that column will be changed (either to an existing dataset, or a new one). There should only be one DNAME card per file (use continuation lines if necessary).

DRENAME FILE_NUMBER <i> [ <xname> <dname> | <dataset ID> ] <xname_new> <dname_new>

Keyword for changing <xname> and <dname> for a particular dataset from FILE_NUMBER <i> read from HKLIN<i>. The dataset is identified either by the old xname/dname pair, or by the dataset number. The latter is the number listed by MTZDUMP when run on HKLIN<i>. Note that this number may be different in HKLOUT. If you want to change the xname/dname labels for several datasets, then this keyword can be included more than once.

DPNAME FILE_NUMBER <i> [ <xname> <dname> | <dataset ID> ] <pname_new>

Keyword for changing <pname> for a particular dataset from FILE_NUMBER <i> read from HKLIN<i>. The dataset is identified either by the <xname>/<dname> pair, or by the <dataset ID>. The <xname>/<dname> pair identifies the dataset after any renaming done by keyword DRENAME. It will also identify a dataset added by the keywords XNAME and DNAME. In contrast, the <dataset ID> is the number listed by MTZDUMP when run on HKLIN<i>, and thus identifies a dataset before any renaming. The <dataset ID> may be different in HKLOUT. If you want to change the <pname> labels for several datasets, then this keyword can be included more than once.

DCELL FILE_NUMBER <i> [ <xname> <dname> | <dataset ID> ] <a> <b> <c> [ <alpha> <beta> <gamma> ]

Keyword for changing cell information for specific datasets from FILE_NUMBER <i> read from HKLIN<i>. The dataset is identified either by the <xname>/<dname> pair, or by the <dataset ID>. The <xname>/<dname> pair identifies the dataset after any renaming done by keyword DRENAME. It will also identify a dataset added by the keywords XNAME and DNAME. In contrast, the <dataset ID> is the number listed by MTZDUMP when run on HKLIN<i>, and thus identifies a dataset before any renaming. The <dataset ID> may be different in HKLOUT. If you want to change the cell information for several datasets, then this keyword can be included more than once.

DWAVELENGTH FILE_NUMBER <i> [ <xname> <dname> | <dataset ID> ] <wavelength>

Keyword for adding/changing wavelength information for specific datasets from FILE_NUMBER <i> read from HKLIN<i>. The dataset is identified either by the <xname>/<dname> pair, or by the <dataset ID>. The <xname>/<dname> pair identifies the dataset after any renaming done by keyword DRENAME. It will also identify a dataset added by the keywords XNAME and DNAME. In contrast, the <dataset ID> is the number listed by MTZDUMP when run on HKLIN<i>, and thus identifies a dataset before any renaming. The <dataset ID> may be different in HKLOUT. If you want to change the wavelength information for several datasets, then this keyword can be included more than once.

PRINTER OUTPUT

The printer output first gives details taken from the input control data.

Then, for each input reflection data file, the information in the MTZ header is printed, according to the requested level of monitoring. The labels are checked for consistency with those in the file, and the list of output labels is prepared.

The reflection data for each file are read and a summary table of the data is output.

The total number of reflection records in the output file is printed, followed by a summary of HKLOUT.

EXAMPLES

Simple unix example scripts can be found in $CEXAM/unix/runnable/:

  • cad.exam (Example of combining several files and example of data being extended to P1).
  • cad_rnase.exam (Example of adding project- and dataset-information to an mtz file).

Also found combined with other programs in the example scripts ($CEXAM/unix/runnable/):

  • tffc_procedure (Combining two files prior to running tffc).
  • RF-with-Es (Use in Rotation Function using Es procedure).
  • scalepack2mtz.exam (Use in getting scalepack data into CCP4).
  • phased_translation_calc (Example of extending phased MTZ file from P212121 to P1).

Non-runnable examples in $CEXAM/unix/non-runnable/:

  • cad_then_mtzutils.exam (Example of how to save time using both cad and mtzutils).
  • cad_raxis.exam (f2mtz+cad on Raxis data).
  • mlphare_heavyatoms.exam (Extend isomorphously phased file from P212121 to P1).

SEE ALSO

mtzutils, rsearch, tffc, unique.

AUTHOR

Eleanor Dodson, York University