pdbset - various useful manipulations on coordinate files
pdbset XYZIN foo_in.pdb XYZOUT foo_out.pdb
[Keyworded input]
Note that PDBSET should work with mmCIF files as well as PDB files.
The available keywords are:
BFACTOR, CELL, CHAIN, COM, ELEMENT, EXCLUDE, OCCUPANCY, ORTHOGONALIZATION, OUTPUT, PICK, REMARK, RENUMBER, REORTHOGONALIZE, REPLACE, ROTATE, SELECT, SEQUENCE, SHIFT, SPACEGROUP, SYMGEN, TRANSFORM, UTOB, XPLOR, NOISE, ATRENUMBER
In the description below, optional items are in [], alternatives are separated by |, keywords are in uppercase, parameters (i.e. numbers) are in lowercase. The input itself is case-insensitive for keywords (but parameters e.g. chain IDs must of course be the correct case). In the output file, the chain ID is always uppercase.
Read cell dimensions and make CRYST1 & SCALE header records. These will replace any CRYST1 & SCALE lines already present in file. The CRYST1 line should have the spacegroup in it, so a SPACEGROUP command is recommended. Note that if the TRANSFORM or SHIFT cards are present and the input PDB file contains CRYST1 and SCALE cards, the transformation operation will take place using the original cell dimensions. If the user wishes to perform the transformation operation using the new cell dimensions then two separate runs of the program are required.
Define code to generate orthogonalization matrix from input cell. This is not normally required, and only has an effect if a CELL command is also given.
Code :-
= 1 axes along a, c* x a, c* (Brookhaven standard, default)
= 2 axes along b, a* x b, a*
= 3 axes along c, b* x c, b*
= 4 axes along a+b, c* x (a+b), c*
= 5 axes along a*, c x a*, c (Rollett)
= 6 axes along a, b*, a x b*
= 7 axes along a*, b, a* x b (TNT convention, probably not very useful here since TNT has its own converter program)
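As an illustration of code 1 only, the following Python sketch builds the fractional-to-orthogonal matrix with x along a and z along c*. The function names are hypothetical and this is not PDBSET's own code; the formula is the standard one for this convention.

```python
import math

def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> orthogonal matrix for code 1 (x along a, z along c*)."""
    ca, cb, cg = (math.cos(math.radians(v)) for v in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c
    vol = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * vol / sg],
    ]

def frac_to_orth(matrix, frac):
    """Multiply the 3x3 matrix by a fractional coordinate triple."""
    return [sum(matrix[i][j] * frac[j] for j in range(3)) for i in range(3)]

# Cell taken from the examples at the end of this document
m = orthogonalization_matrix(132.02, 115.21, 96.20, 90.0, 90.0, 90.0)
print(frac_to_orth(m, (0.5, 0.5, 0.5)))
```

For an orthorhombic cell the matrix is diagonal (to rounding), so all seven codes differ only in which cell edge is assigned to which orthogonal axis.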
Read spacegroup name (not essential, but put into CRYST1 line on output)
Generate chains with these symmetry operations applied. If the operations are given explicitly, several SYMGEN commands may be given. The identity operation must be specified explicitly if required. Use the CHAIN command to rename them. Note that, except for NCS, these symmetry operations apply to fractional coordinates, so the orthogonalization operation must be known to the program, either from CRYST1 and/or CELL lines in the input coordinate file, or from a CELL command. If the keyword NCS is given, then a series of TRANSFORM commands should be given to define the non-crystallographic symmetry operations to be used.
NB: if supplying individual symmetry operations, these must be in the form found in the file symop.lib, e.g.
SYMGEN -X,Y,-Z
SYMGEN 1/2+X,1/2+Y,Z
Elements within each operation are separated by commas. To supply multiple operations on a single line, separate each pair of operations by an asterisk, e.g.
SYMGEN -X,Y,-Z * 1/2+X,1/2+Y,Z
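To make the operator syntax concrete, here is a minimal Python sketch that parses operators of the form shown above and applies them to fractional coordinates. The helper names are hypothetical, and the parser handles only unit coefficients of X, Y, Z plus rational shifts, which is an assumption about the input rather than a full symop.lib reader.

```python
from fractions import Fraction

def parse_symop(text):
    """Parse e.g. '1/2+X,1/2+Y,Z' into three (coefficients, shift) rows."""
    rows = []
    for part in text.upper().replace(" ", "").split(","):
        coeffs, shift = [0, 0, 0], Fraction(0)
        # Put '+' before every '-' so the expression splits into signed terms.
        for term in part.replace("-", "+-").split("+"):
            if not term:
                continue
            if term[-1] in "XYZ":
                sign = -1 if term.startswith("-") else 1
                coeffs["XYZ".index(term[-1])] += sign
            else:
                shift += Fraction(term)
        rows.append((coeffs, shift))
    return rows

def apply_symop(op, frac):
    """Apply a parsed operator to a fractional coordinate triple."""
    return tuple(sum(c * f for c, f in zip(coeffs, frac)) + shift
                 for coeffs, shift in op)

# Several operators on one line are separated by asterisks.
ops = [parse_symop(s) for s in "-X,Y,-Z * 1/2+X,1/2+Y,Z".split("*")]
point = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
for op in ops:
    print(apply_symop(op, point))
```

Because the operators act on fractional coordinates, orthogonal coordinates would first have to be converted with the inverse of the orthogonalization matrix.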
Renumber or add constant to residue numbers in given range. The residue
range is given as 1st_residue_number [TO] last_residue_number. If the CHAIN
keyword is present, the renumbering applies only to this chain. The option
TO new_chain causes the chain identifier to be changed. Note that renumbering
is done after chain renaming specified by the CHAIN command, so the chain
specified here (old_chain) is the chain ID after any renaming. N.B. there
is NO check that different RENUMBER commands are mutually exclusive. To
avoid problems with recursive renumbering, if more than one RENUMBER command
would apply to a residue, only the first will be done.
(Defaults all residues, all chains).
e.g.
RENUMBER 35 ! renumber all residues, starting from 35
RENUMBER INCREMENT -5 102 TO 110 CHAIN C ! subtract 5 from
                                         ! residues 102 to 110 in chain C
RENUMBER 101 1 TO 78 CHAIN A TO B ! renumber residues 1 to 78 in chain A from 101 (to 178),
                                  ! changing the chain identifier to B
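The "only the first will be done" rule can be sketched in Python as follows. The rule dictionaries and the function name are hypothetical; each rule is expressed as an increment, so "RENUMBER 101 1 TO 78" becomes an increment of 100.

```python
def renumber(residues, rules):
    """residues: list of (chain, number); rules: dicts in input order."""
    out = []
    for chain, number in residues:
        for rule in rules:
            in_chain = rule.get("chain") in (None, chain)
            if in_chain and rule["first"] <= number <= rule["last"]:
                number = number + rule["increment"]
                chain = rule.get("new_chain", chain)
                break  # only the first matching rule is applied
        out.append((chain, number))
    return out

rules = [
    # RENUMBER 101 1 TO 78 CHAIN A TO B
    {"first": 1, "last": 78, "chain": "A", "increment": 100, "new_chain": "B"},
    # RENUMBER INCREMENT -5 102 TO 110 CHAIN C
    {"first": 102, "last": 110, "chain": "C", "increment": -5},
]
print(renumber([("A", 1), ("A", 78), ("C", 105), ("C", 50)], rules))
```

Stopping at the first match is what prevents a residue moved into another rule's range from being renumbered a second time.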
Change chain ID to given value. If only one value given, change all chains to this value. If SYMMETRY keyword given, this applies to this symmetry operation only. A series of CHAIN commands may be given.
e.g.
CHAIN Q ! change all chains to Q
CHAIN SYMMETRY 2 A B ! change chain generated from chain A
                     ! by symmetry operation 2 to B
Set B-factor (default 20.0).
Subkeys:
- ALWAYS (default)
- Reset all B-factors to B_reset
- ZEROS
- Reset B-factor to B_reset only if B-factor= 0.0
- MINIMUM
- Reset B-factor to B_reset only if B-factor is less than B_reset
- MAXIMUM
- Reset B-factor to B_reset only if B-factor is greater than B_reset
- RANGE
- Truncate B-factors to the given range. If B-factor is less than B_reset, B-factor = B_reset; if B-factor is greater than B_reset2, B-factor = B_reset2.
- AVERAGE
- Average B-factors from the main chain (N CA C O atoms) and side chain of a residue and reset B-factor to B_average-mainchain or B_average-sidechain as appropriate.
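The per-atom subkeys above (all except AVERAGE) amount to simple rules, sketched here in Python. The function is hypothetical, and RANGE is read as clamping to the interval from B_reset to B_reset2, which is an interpretation of "truncate to the given range".

```python
def reset_bfactor(b, mode="ALWAYS", b_reset=20.0, b_reset2=None):
    """Return the new B-factor for one atom under the given subkey."""
    if mode == "ALWAYS":
        return b_reset
    if mode == "ZEROS":
        return b_reset if b == 0.0 else b
    if mode == "MINIMUM":
        return max(b, b_reset)   # raise values below B_reset
    if mode == "MAXIMUM":
        return min(b, b_reset)   # lower values above B_reset
    if mode == "RANGE":
        # Assumed: clamp to [B_reset, B_reset2]
        return min(max(b, b_reset), b_reset2)
    raise ValueError("unknown subkey: " + mode)

for b in (0.0, 5.0, 35.0, 120.0):
    print(b, reset_bfactor(b, "RANGE", 10.0, 80.0))
```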
Set occupancy (default 1.0).
Subkeys:
- ALWAYS (default)
- Reset all occupancies to Occ_reset
- ZERO
- Reset occupancy to Occ_reset only if occupancy is 0.0
- MINIMUM
- Reset occupancy to Occ_reset if occupancy less than Occ_reset.
- RESET
- Reset occupancy to 0 if occupancy is less than Occ_reset, and to 1.0 if occupancy is greater than Occ_reset2.
Select atoms to be kept, depending on subkey:
- CHAIN
- Select only specified chain(s).
e.g. SELECT CHAIN C ! select only chain C
- OCCUPANCY [<minimum_occupancy>]
- Select only atoms with occupancy greater than minimum_occupancy [default = 0.0]. This can be used to strip out dummy atoms with zero occupancy.
- BFACTOR [<maximum_B>]
- Select only atoms with Bfactor less than <maximum_B> [default = 99.0]
Define rotational transformation, either as MATRIX (this keyword may be omitted) followed by 9 numbers (r11 r12 r13 r21 r22 r23 r31 r32 r33), by keyword EULER followed by Eulerian angles alpha, beta, gamma (as in ALMN), or by keyword POLAR followed by polar angles omega, phi, kappa (as in POLARRFN). This transformation will be applied to all atoms. The SHIFT command may be used to define a translation in addition. The transformation defined by ROTATE & SHIFT, or by TRANSFORM, is applied after any SYMGEN operation. Multiple definitions of ROTATE or TRANSFORM, or of SHIFT will NOT be concatenated: only the last will be effective.
The subkey INVERT causes the inverse transformation to be applied. Note that an INVERT instruction if present will apply to both ROTATE & SHIFT.
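The two angle parameterisations can be turned into matrices as in the Python sketch below. The conventions are assumptions that should be checked against the ALMN and POLARRFN documentation: Eulerian angles are taken as R = Rz(alpha) Ry(beta) Rz(gamma), and for polar angles omega is the inclination of the rotation axis from z, phi its azimuth from x, and kappa the rotation about that axis.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def euler_matrix(alpha, beta, gamma):
    # Assumed ZYZ convention: R = Rz(alpha) * Ry(beta) * Rz(gamma)
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))

def polar_matrix(omega, phi, kappa):
    # Assumed axis direction: (sin w cos p, sin w sin p, cos w)
    so = math.sin(math.radians(omega))
    n = [so * math.cos(math.radians(phi)),
         so * math.sin(math.radians(phi)),
         math.cos(math.radians(omega))]
    ck, sk = math.cos(math.radians(kappa)), math.sin(math.radians(kappa))
    cross = [[0.0, -n[2], n[1]], [n[2], 0.0, -n[0]], [-n[1], n[0], 0.0]]
    # Rodrigues formula for a rotation of kappa about the unit axis n
    return [[ck * (i == j) + (1.0 - ck) * n[i] * n[j] + sk * cross[i][j]
             for j in range(3)] for i in range(3)]

print(euler_matrix(90.0, 0.0, 0.0))
print(polar_matrix(0.0, 0.0, 90.0))
```

Under these assumptions both calls describe the same 90 degree rotation about z, which gives a quick consistency check between the two forms.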
Define translation transformation (added AFTER rotation). If the keyword FRACTIONAL is present, the translation is assumed to be in fractional coordinates, otherwise orthogonal Angstroms. The subkey INVERT causes the inverse transformation to be applied. Note that an INVERT instruction if present will apply to both ROTATE & SHIFT.
Define transformation, equivalent to ROTATE MATRIX + SHIFT. If the keyword FRACTIONAL is present, the translation is assumed to be in fractional coordinates, otherwise orthogonal Angstroms. The subkey ODB causes the transformation to be read from a file in the format of an O datablock transformation. The subkey FILE reads the transformation from a formatted file containing a 3x3 matrix followed by a translation vector. The subkey INVERT causes the inverse transformation to be applied.
If a SYMGEN NCS command is given before TRANSFORM commands, these are collected together to generate multiple NCS-symmetry related chains.
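A transformation of this kind maps x to R x + t, with the translation added after the rotation. The Python sketch below applies one and builds its inverse, assuming R is a pure rotation so that its inverse is its transpose. The function names are hypothetical; the numbers are those of the TRANSFORM example at the end of this document.

```python
def apply_transform(rot, shift, xyz):
    """Return R * xyz + t (translation added after rotation)."""
    return [sum(rot[i][j] * xyz[j] for j in range(3)) + shift[i]
            for i in range(3)]

def invert_transform(rot, shift):
    """Inverse of x' = R x + t, i.e. x = R^T x' - R^T t."""
    inv = [[rot[j][i] for j in range(3)] for i in range(3)]
    inv_shift = [-sum(inv[i][j] * shift[j] for j in range(3))
                 for i in range(3)]
    return inv, inv_shift

rot = [[0.87831, 0.47808, 0.0],
       [0.0, 0.0, -1.0],
       [-0.47808, 0.87831, 0.0]]
shift = [0.0, -2.713, 0.0]

moved = apply_transform(rot, shift, [10.0, 20.0, 30.0])
inv_rot, inv_shift = invert_transform(rot, shift)
# Returns to the start within the rounding of the five-decimal matrix
print(apply_transform(inv_rot, inv_shift, moved))
```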
The remark text is simply echoed to the output coordinate file.
The input file is assumed to come from Xplor; the following operations are then done:-
Define atom names to be included: all other atoms will be omitted - e.g. PICK CA to choose C-alpha only. Note that the atomname is case-sensitive.
Write out sequence to a file (default file name SEQUENCE). This can be edited to give a sequence for Xplor or O, etc. If the keyword PDB is present, the sequence is written in PDB SEQRES format, split by chains. If SINGLE is given, the sequence is written in single-letter code.
This function also writes out the estimated molecular weight based on the sequence. Note that this may differ from the value obtained by summing the weights of all the atoms in the input PDB file.
Set output options. The default is to output a file (XYZOUT) in the same format as the input (XYZIN).
- PDB
- Output a PDB file.
- CIF
- Output an mmCIF file.
- XPLOR
- Duplicate the chain ID as an Xplor segid, to make the file suitable for direct input into Xplor.
Convert Us on input file to B (B = 8 pi**2 u**2).
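The conversion is a single scale factor, sketched here in Python with a hypothetical function name. The input is taken to be the mean-square displacement in square Angstroms, which is how the u**2 in the formula is interpreted.

```python
import math

def u_to_b(u_squared):
    """B = 8 * pi**2 * <u**2>, with <u**2> in square Angstroms."""
    return 8.0 * math.pi ** 2 * u_squared

def b_to_u(b):
    """Inverse conversion, for checking."""
    return b / (8.0 * math.pi ** 2)

print(u_to_b(0.25))
```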
Define list of 2-character element names to be left-justified in atomnames, e.g. MG, FE, ZN. Note that the element name is case-sensitive. The PDB convention defines the first 2 characters of the atomname as the element name, but Xplor & O put them in the wrong place. CA is NOT accepted, as this conflicts with Calpha: you will have to decide what to do with these yourself.
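The justification rule can be sketched on the four-character atom name field in Python. The function name is hypothetical, and matching on the first two non-blank characters is an assumption about how the element list is applied.

```python
def left_justify(atom_name, elements):
    """Left-justify a 4-character atom name if it starts with a listed element."""
    if "CA" in elements:
        # CA conflicts with C-alpha, so it is not accepted in the list
        raise ValueError("CA is not accepted as an element name")
    core = atom_name.strip()
    if core[:2] in elements:
        return core.ljust(4)
    return atom_name

elements = ["MG", "FE", "ZN"]
for name in (" ZN ", " FE1", " CA ", " N  "):
    print(repr(name), "->", repr(left_justify(name, elements)))
```

Since the match is case-sensitive, an atom written as " Zn " would be left unchanged by this sketch.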
Change orthogonalization convention for coordinates by converting to fractional in the input convention (FROM) and reorthogonalizing in the output convention (TO). If the FROM Ncode is omitted, the orthogonalization will be taken from the input (PDB) file as SCALEn lines, or the default of Ncode = 1 will be used. If the cell is not present in the input file, a CELL command must be given here. <ncode_out> is compulsory. See above for Ncodes.
Globally replace residue type, e.g. REPLACE RESIDUE CYS BY CYH.
Useful for renaming according to dictionary conventions of different programs.
The residue names will be right-justified before use to allow for single
character names.
e.g. replace residue C by CYT.
Replace atom name by new one, optionally only in specified residue name. Note that replace tests are done in the order given, so an IN <residue_type> command must allow for previous REPLACE RESIDUE commands. Note also that leading spaces must be given in atom names e.g.
REPLACE ATOM " O" BY " OW" IN HOH
Exclude some things, depending on subkey:
- SIDE
- Exclude all non-protein atoms and all side-chain atoms beyond CB, i.e. create a polyalanine model. N.B. the residue names are NOT changed.
- WATer or HOH
- Exclude residues labelled WAT or HOH.
- HYDROGENS
- Exclude hydrogen atoms (as for the XPLOR option)
- HEADERS
- Exclude all lines except ATOM & HETATM lines. The default is to copy them from the input file.
Will calculate the centre of mass and maximum distance from it of the coordinates output. This may be useful for determining the rotation function integration radius (not done by default since it requires an intermediate file).
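The two quantities reported can be sketched in Python as follows. The function name is hypothetical, and weighting each atom by an atomic mass is an assumption; the text above does not say how PDBSET weights the atoms.

```python
import math

def centre_and_radius(atoms):
    """atoms: list of (mass, (x, y, z)). Returns (centre, max distance)."""
    total = sum(mass for mass, _ in atoms)
    centre = [sum(mass * xyz[i] for mass, xyz in atoms) / total
              for i in range(3)]
    radius = max(math.dist(centre, xyz) for _, xyz in atoms)
    return centre, radius

atoms = [(12.0, (0.0, 0.0, 0.0)), (12.0, (2.0, 0.0, 0.0)), (16.0, (1.0, 3.0, 0.0))]
print(centre_and_radius(atoms))
```

The maximum distance is the figure that would bound a rotation function integration radius.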
Introduce random shifts into atom positions in orthogonal coordinates.
maximum_shift is the maximum shift in Angstroms; it defaults to 0.2 Angs, and the program fails if it is greater than 0.5 Angs.
Subkeys:
- CHAIN
- Act on only specified chain(s).
e.g. NOISE 0.1 CHAIN C ! select only chain C
- BFACTOR [<minimum_B>]
- Act on only atoms with B-factor greater than <minimum_B>.
- PICK
- Act on only specified atom names, e.g. NOISE 0.1 PICK CA to choose C-alpha only. Note that the atomname is case-sensitive.
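A minimal Python sketch of the random shifts follows. The function name is hypothetical, and drawing each orthogonal coordinate independently and uniformly within plus or minus maximum_shift is an assumption; the text does not state the distribution PDBSET uses.

```python
import random

def add_noise(coords, maximum_shift=0.2, seed=None):
    """Shift each orthogonal coordinate by up to maximum_shift Angstroms."""
    if maximum_shift > 0.5:
        raise ValueError("maximum_shift must not exceed 0.5 Angstroms")
    rng = random.Random(seed)
    return [tuple(v + rng.uniform(-maximum_shift, maximum_shift) for v in xyz)
            for xyz in coords]

coords = [(10.0, 20.0, 30.0), (11.5, 20.2, 29.1)]
print(add_noise(coords, 0.1, seed=1))
```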
ATomRENUMBERing: discards the atom numbers from the input file and writes out new sequential atom numbers. This can be used to renumber atoms in PDB files where atom records have been removed without "correcting" the atom numbers.
Phil Evans, MRC LMB, Cambridge, September 1992
########################
Convert PDB file to mmCIF format

#!/bin/csh -f
#
pdbset xyzin toxd.pdb xyzout toxd.cif << eof-1
output cif
end
eof-1

########################
Take output from O into a form suitable for refinement

#!/bin/csh -f
#
pdbset xyzin bst_113m.pdb xyzout temp1.pdb << eof-1
cell 132.02 115.21 96.20 90.00 90.00 90.00
spacegroup P212121
eof-1

###################
Take output from Xplor into a form suitable for refinement

#!/bin/csh -f
#
pdbset xyzin bst_113m.pdb xyzout temp1.pdb << eof-1
cell 132.02 115.21 96.20 90.00 90.00 90.00
spacegroup P212121
xplor
eof-1

########################
Expand dimer to tetramer, rename chains, transform

#!/bin/csh -f
#
# Make tetramer from dimer
#
pdbset xyzin ecrproducts268.pdb xyzout ecrprodpqrtet.pdb <<eof-1
remark Tetramer generated from AB dimer
remark rotated to pqr frame
remark
! Generate other dimer by z-dyad in P21212
symgen x,y,z
symgen -x,-y,z
! Rename chains in second dimer: V & W are water chains
chain symmetry 2 A C
chain symmetry 2 B D
chain symmetry 2 V X
chain symmetry 2 W Y
! transform to molecular frame
transform -
   0.87831  0.47808  0 -
   0  0  -1. -
  -0.47808  0.87831  0 -
   0.0  -2.713  0.0
eof-1