Contents
- Related to PDB files
-
edit_pdb_atom.py
: parse and edit PDB files -
res_renumber_pdb_atom.py
: renumber the residue by both resSeq and chainID -
reorder_pdb_atom_by_chain.py
: reorder the records of atoms based on the alphabetic order of chainIDs -
symmetry_2_reorder_args.py
: generate args based on symmetry forreorder_pdb_atom_by_chain.py
-
average_structure_pdb_atom.py
: generate the average coordinates of the structures from multiple PDB files -
add_phosphate_pdb.py
: add 5’-end phosphates to RNA strands - 3 related scripts: (a)
add_triphosphate_pdb.py
: add 5’-triphosphate, (b)add_cyclicphosphate_pdb.py
: add 2’,3’-cyclic phosphate, and (c)remove_extraphosphate_pdb.py
: remove 5’-triphosphate or 2’,3’-cyclic phosphate -
extend_helix_pdb.py
: extend helix using ideal A-form helix, depending on two script (a)append_one_bp_pdb.py
: adding one bp each time, and (b)insert_res_pdb_atom.py
: adding a list of pdb atom records -
chains_pdb.py
: get chain properties from an input pdb file -
ligate_strand_pdb.py
: ligate chains from an input pdb file -
convertPDB
: shell script for PDB format conversion
-
- Related to SimRNA
-
get_sticky_from_seq_ss.py
: create stacking restraints for SimRNA -
stack_pdb_atom.py
: create a PDB file containing a stacked helix or dinucleotide -
terminal_pdb_atom.py
: create a PDB file containing a base pair -
pdb_2_SimRNA_dist_restrs_aligned.py
: create SimRNA distance restraints -
trafl_reduce.py
: reduce the amount of coordinates in the SimRNA trafl file
-
- Related to USCF Chimera
- Related to Phenix
- Other
Note: If not specified otherwise, all python scripts should be run with python3.x
. The scripts are currently only tested with macOS Mojave
. Feel free to contact me if the links expire or there’s any other question.
Related to PDB files
edit_pdb_atom.py
: parse and edit PDB files
- Access
edit_pdb_atom.py
by the link; - It is used to parse pdb files, mainly for being imported of other scripts: so consider adding the directory containing it to
PYTHONPATH
byexport PYTHONPATH=$PYTHONPATH:PATH_TO_THE_DIRECTORY
, or usesys.path.append("PATH_TO_THE_DIRECTORY")
; - PDB record: I is int, S is string, F is float;
-
Record recordName serial name resName chainID resSeq x y z Type String(6) String(5) String(4) String(3) String(1) Integer(4) Real(8.3) Real(8.3) Real(8.3) Get S[:6] I(S[6:11]) S[12:16].strip() S[17:20].strip() S[21] (?S[20]?) I(S[22:26]) F(S[30:38]) F(S[38:46]) F(S[46:54]) Example HETATM 2 PB GTP X 10 - It contains:
- class
pdb_atom_record
- class
pdb_ter_record
(inherited frompdb_atom_record
) - functions
file2rec
andrec2file
: to read from and write into files
- class
- Coding notes:
- Pass a class member as a default value for a class method, see
update_resSeq
method:def update_resSeq(self, new_resSeq=None, by_shift=0): # new_resSeq=self.resSeq generates error! if new_resSeq is None: new_resSeq = self.resSeq ...
- Use
print(edit_pdb_atom.__doc__)
orhelp(edit_pdb_atom)
to read thedocstring
or more info.
- Pass a class member as a default value for a class method, see
res_renumber_pdb_atom.py
: renumber the residue by both resSeq and chainID
- Access
res_renumber_pdb_atom.py
by the link; - Use by
python res_renumber_pdb_atom.py file.pdb 123A 456X 153B 356Y xxX yyY
, wherexxX
andyyY
are old and new residues; - It will also generate restraints for SimRNA by calling
pdb_2_SimRNA_dist_restrs_aligned.py
withos.system("some string of CMD")
, useful for restraining known motifs; - Coding notes:
sys.argv
,sys.exit(1)
fromsys
module;os.path.isfile(some_file_path)
,os.system("some string of CMD")
fromos
module;os.path...
should not be confused withsys.path...
(fromsys
module).
reorder_pdb_atom_by_chain.py
: reorder the records of atoms based on the alphabetic order of chainIDs
- Access
reorder_pdb_atom_by_chain.py
by the link; - Use by
python reorder_pdb_atom_by_chain.py file.pdb
orpython reorder_pdb_atom_by_chain.py file.pdb 123A 456X 153B 356Y xxX yyY
, wherexxX
andyyY
are old and new residues; - For the former usage, the file is reordered based on the alphabetic order of the chains;
- For the latter usage, residues (chainID and resSeq) are changed by adapting
res_renumber_pdb_atom.py
(which calls SimRNA’spdb_2_SimRNA_dist_restrs_aligned.py
); - If you only want to change the chainID, you can do something like
python reorder_pdb_atom_by_chain.py file.pdb 1A 1X 1B 1Y ... ...
so that there is no shift of the resSeq.
symmetry_2_reorder_args.py
: generate args based on symmetry for reorder_pdb_atom_by_chain.py
- Access
symmetry_2_reorder_args.py
by the link; - Two modes to use:
- Concise:
python symmetry_2_reorder_args.py 3 AF
, where3
is the symmetry fold,AF
means contains chains A to F; - Verbose:
python symmetry_2_reorder_args.py AB CD EF
, where all the symmetry-related components are indicated.
- Concise:
- Outputs to appear in console can be directly used as args for script
reorder_pdb_atom_by_chain.py
; - The three scripts
symmetry_2_reorder_args.py
,reorder_pdb_atom_by_chain.py
andaverage_structure_pdb_atom.py
are useful to generate symmetric oligomers (see an example); - Coding notes:
- An iterator, once exhausted, may not be reused. In Python 3.x,
zip()
returns an iterator. Similar question; - Enable its repeated usage by converting to a list
zipping = list(zip(*sym_partners))
, wheresym_partners
is a list of lists (i.e. 2D array) in the script.
- An iterator, once exhausted, may not be reused. In Python 3.x,
average_structure_pdb_atom.py
: generate the average coordinates of the structures from multiple PDB files
- Access
average_structure_pdb_atom.py
by the link; - Use by
python average_structure_pdb_atom.py file1.pdb file2.pdb ...
; - The output files from
reorder_pdb_atom_by_chain.py
are preferred due to the cleaned formats; - Useful to generate the averaged symmetric structure, see the link.
add_phosphate_pdb.py
: add 5’-end phosphates to RNA strands
- Access
add_phosphate_pdb.py
by the link; - Use by
python add_phosphate_pdb.py PDBfile_name
; - Operations:
- It first moves an ideal nucleotide (“sample_nt”) so that it is aligned with the 5’-nucleotide (“curr_head_res”) based on “C5’”, “O4’”, “C4’”, “C1’” (“O5’” is not necessary, though it can be added by using pdbfixer );
- Then, the moved “sample_nt” is slightly moved so that its “C5’” is aligned with that of “curr_head_res” with the movement matrix reported to console for evaluation;
- Extra “O5’” atoms will be deleted;
- Lastly, the phosphate of “sample_nt” is grafted onto “curr_head_res”.
- The output PDB file has the serials reordered.
- Coding notes:
import numpy as np
, to use thenp.array()
function to createndarray
, to usenp.matmul()
function for matrix product, and to usetolist()
method ofndarray
;copy.deepcopy(some_instance)
fromcopy
module;superpose3d.Superpose3D(frozen_cloud, mobile_cloud)
fromsuperpose3d
module: the function returns(RMSD, R, T, c)
, more details here.
3 related scripts: (a) add_triphosphate_pdb.py
: add 5’-triphosphate, (b) add_cyclicphosphate_pdb.py
: add 2’,3’-cyclic phosphate, and (c) remove_extraphosphate_pdb.py
: remove 5’-triphosphate or 2’,3’-cyclic phosphate
- Access
add_triphosphate_pdb.py
by the link, and use bypython add_triphosphate_pdb.py PDBfile_name chainID1 chainID2 ...
; - Access
add_cyclicphosphate_pdb.py
by the link, and use byadd_cyclicphosphate_pdb.py PDBfile_name chainID1 chainID2 ...
; - A current limitation with
add_cyclicphosphate_pdb.py
is that the chainID of the input PDB file should be ordered; - Access
remove_extraphosphate_pdb.py
by the link, and use byremove_extraphosphate_pdb.py PDBfile_name
; - The three scripts work in a similar with as the
add_phosphate_pdb.py
above. - Coding notes:
- To merge two dictionaries:
resName_change = resName_triphosphate_change.copy() resName_change.update(resName_cyclicphosphate_change)
- more details from stackoverflow.
- To merge two dictionaries:
extend_helix_pdb.py
: extend helix using ideal A-form helix, depending on two script (a) append_one_bp_pdb.py
: adding one bp each time, and (b) insert_res_pdb_atom.py
: adding a list of pdb atom records
- Access
extend_helix_pdb.py
by the link, and use bypython extend_helix_pdb.py file.pdb 123A AUCG 153B CCGGU ...
; - Access
append_one_bp_pdb.py
by the link, and use its core function byappend_one_bp_pdb.append_one_bp(rec_list, target_3P, each_base, target_5P)
,target_3P
andtarget_5P
are the residue markers (such as99c
) for the 3’ and 5’ residues, andtarget_5P
is optional (can be located byfind_pair_5P(rec_list, target_3P)
); - Access
insert_res_pdb_atom.py
by the link, and use its core function byres_insert(rec_list, target_3P, rec_insert, mode="post")
orres_insert(rec_list, target_3P, rec_insert, mode="post"
); - Warnings will be given for
TER
record when the end of the file does not haveTER
record. - Coding notes:
- Floor division operator
//
:print(5//2) # 2 print(-5//2) # -3 print(5.0//2) # 2.0
- Handle potential errors:
try: if rec_list[insert_index+len(rec_insert)].recordName == "TER": rec_list[insert_index+len(rec_insert)].update_resName(rec_insert[-1].resName) rec_list[insert_index+len(rec_insert)].update_resSeq(rec_insert[-1].resSeq) except IndexError: print("IndexError occurs at %s%s" % (rec_insert[-1].resSeq, rec_insert[-1].chainID)) print("But no worries! It's the end that has no TER record.")
- distance between two points:
curr_dist = np.linalg.norm(np.array(coord_5P_O3)-np.array(coord_curr_P))
withimport numpy as np
.
- Floor division operator
chains_pdb.py
: get chain properties from an input pdb file
- Access
chains_pdb.py
by the link, and use bypython chains_pdb.py filename
for printing the properties of all chains to console, orpython chains_pdb.py A B X ...
for the first usage plus extracting individual chains; - Core function
def get_chain_properties(rec_list, chain_rec_list = None):
whichreturn chain_dict_list
, a list of chain_dict (containing the properties of a certain chain); ifchain_rec_list
is provided (as an an empty list) atoms of all the chains will be written to thischain_rec_list
; - Printing function
def print_chain_properties(chain_dict_list):
.
ligate_strand_pdb.py
: ligate chains from an input pdb file
- Access
ligate_strand_pdb.py
by the link, and use bypython ligate_strand_pdb.py file.pdb A B X ...
to ligate the chains start with A, B, X… (automatically searching for the next possible chains for ligation); - This script also relies on
chains_pdb.py
(link),insert_res_pdb_atom.py
(link) anddelete_res_pdb_atom.py
(accessible by link).
convertPDB
: shell script for PDB format conversion
- Access
convertPDB
by the link or preview as text; convertPDB -h
for usage help;- Useful for dealing with old format PDB, qrnas output, and vmd output.
Related to SimRNA
get_sticky_from_seq_ss.py
: create stacking restraints for SimRNA
- Access
get_sticky_from_seq_ss.py
by the link; - Use by
python get_sticky_from_seq_ss.py seq_file ss_file
, whereseq_file
andss_file
are SimRNA’s sequence file and secondary-structure file; - It works by:
- Extracting the sticky-end information from the two input files;
- Then creating a PDB file containing the stacked bases from an ideal RNA helix corresponding to each sticky-end by calling
stack_pdb_atom.py
; - Lastly generating the SimRNA restraints for sticky-end (more precisely, nicked end; output file
restraintsSE_xX.dat
) by callingpdb_2_SimRNA_dist_restrs_aligned.py
.
- Be prepared that a lot of PDB files and restraint files will be generated.
stack_pdb_atom.py
: create a PDB file containing a stacked helix or dinucleotide
- Access
stack_pdb_atom.py
by the link; - Use by, for example,
python stack_pdb_atom.py GC 203X 204X 77C 105D
(usage1 for 4 nt or 2 bp) orpython stack_pdb_atom.py GC 203X 204X
(usage2 for 2 nt), whereGC
is the sequence for203X
and204X
while77C
is complementary to204X
and105D
is complementary to203X
; - It works by:
- Reading the identities of the 4 residues by the 5 input args;
- Then creating a PDB file containing the stacked bases using an ideal RNA helix.
- The main function is for generating restraint files of sticky-ends together with
pdb_2_SimRNA_dist_restrs_aligned.py
for SimRNA.
terminal_pdb_atom.py
: create a PDB file containing a base pair
- Access
terminal_pdb_atom.py
by the link; - Use by, for example,
python terminal_pdb_atom.py GC 203A 77B
, for aGC
pair formed by203A
and77B
; - It works by:
- Reading the identities of the 2 residues by the 3 input args;
- Then creating a PDB file containing the pairing bases using an ideal RNA helix.
- The main function is for generating restraint files of base pairs together with
pdb_2_SimRNA_dist_restrs_aligned.py
for SimRNA.
pdb_2_SimRNA_dist_restrs_aligned.py
: create SimRNA distance restraints
- Access
pdb_2_SimRNA_dist_restrs_aligned.py
by the link; - It is adapted from Michal Boniecki’s script and changes including aligned output and compatibility for python3.x;
- Use by
python pdb_2_SimRNA_dist_restrs_aligned.py file.pdb TOLERANCE(optional, default:0.2)
.
trafl_reduce.py
: reduce the amount of coordinates in the SimRNA trafl file
- Access
trafl_reduce.py
by the link; - Use by
python trafl_reduce.py filename.trafl reduction_rate(such as '5' or 'P' for backbone phosphate)
; - If
reduction_rate
is anint
, it will reduce the amount of residues by a factor of thisint
; ifP
, only coordinates for P atoms will be extracted.
Related to USCF Chimera
transfer_savepos_chimera.py
: transfer Chimera savepos
positions
- Access
transfer_savepos_chimera.py
by the link; - Use by
python transfer_savepos_chimera.py from-filename to-filename
to reorder records, while from-file must exist, to-file withto-filename
will be created if not existing; - The script looks for the line starting with
"\tformattedPositions = "
to locate the dictionary of the saved positions (note that for a Chimera session (.py
) file with nosavepos
,formattedPositions = {}
exists in the.py
file); - Coding notes:
target_frompos = ast.literal_eval(target_frompos_string)
:literal_eval()
function fromast
module to convert strings to Python literal structures;if 'tolines' not in locals():
to check whether ( or not) the variable has been defined, i.e.locals()
function returns the dictionary of current local symbol table.
compare_chimera_swapna.cpp
need to convert from C++ to python3
Related to Phenix
phenix_na_ss.py
: create the class for storing records of secondary structure restraints of Phenix and reorder these records
- Access
phenix_na_ss.py
by the link; - Use by
python phenix_na_ss.py from-filename to-filename
, while from-file must exist, to-file withto-filename
will be created; class base_pair_record
andclass stacking_pair_record
is created by 3 or 2 lines of the records in file:pdb_interpretation { secondary_structure { nucleic_acid { base_pair { base1 = chain 'A' and resid 372 base2 = chain 'A' and resid 399 saenger_class = 20 # 19 for GC, 20 for AU, 28 for GU } stacking_pair { base1 = chain 'A' and resid 417 base2 = chain 'A' and resid 418 } } enabled = True } }
- Coding notes:
- tuples can be compared, such as
if (self.base1["chain"], self.base1["resid"]) > (self.base2["chain"], self.base2["resid"]):
; - tuples can be used as the
key
forsorted()
, such assorted(record_list, key=lambda x: (x.base1["chain"], x.base1["resid"]))
.
- tuples can be compared, such as
phenix_res_from_seq_ss.py
: convert secondary structure denotations to Phenix secondary structure restraints
- Access
phenix_res_from_seq_ss.py
by the link; - Use by
python phenix_res_from_seq_ss.py seq_file ss_file out_file
, whereseq_file
andss_file
are SimRNA sequence file and secondary-structure file; - The script is dependent on
get_sticky_from_seq_ss.py
andphenix_na_ss.py
. - Coding notes:
- Even use
from file_other improt some_function
, all the codes infile_other
will be executed, solution isif __name__ == '__main__':
; - Directly assigning a new value to an object such as
list
in function will not change its value outside of the function, solution isreturn
.
- Even use
Other
ranking_energy_nanotiler.py
: post-process Nanotiler output log files
- Access
ranking_energy_nanotiler.py
by the link; - Use by
python ranking_energy_nanotiler.py nt_outfile
orpython ranking_energy_nanotiler.py nt_outfilename1 nt_outfilename2
, where nt_outfile(1/2) is a Nanotiler output log file; - For the latter use, both files will be processed, and an extra “Energy_sum ranking” file of
.sum
will be generated. - The output files should contain the markers written by the nanotiler script files, which should contain:
# 100*"#" echo #################################################################################################### echo HBP1: ${HBP1} HBP2: ${HBP2} # 96*"#" echo ################################################################################################
- So the script can be modified based on the markers written by the Nanotiler script;
- Coding notes:
from pathlib import Path
to convert a string to aPath
object byp=Path(nt_outfilename)
, most commonly usedp.is_file()
,p.stem
, andp.suffix
, for more info;- Regex: use
re
module’sre.compile()
function to create Regex objects:float_re = re.compile(r"([+-]?)(\d+)\.(\d+)([Ee]?)([+-]?)(\d*)")
&float_str = float_re.search(s_s_str).group(0)
for search patterns of a float and get its string, phenix_na_ss.py also uses Regex; sorted(dict_energy.items(), key=lambda x: x[1])
to get a list containing the items of a dictionary sorted by the values.
ranking_energy_nanotiler.py
: search
- Access
RNA_complement_search.py
by the link; - Use by
python ranking_energy_nanotiler.py nt_outfile
orpython ranking_energy_nanotiler.py nt_outfilename1 nt_outfilename2
, where nt_outfile(1/2) is a Nanotiler output log file; - For the latter use, both files will be processed, and an extra “Energy_sum ranking” file of
.sum
will be generated. - The output files should contain the markers written by the nanotiler script files, which should contain:
#short seq_short1 (must be in a single line) seq_short2 #long seq_long (can be in multiple lines) #restriction GAAGAGC (can be multiple sequences in multiple lines) #start
- So the script can be modified based on the markers written by the Nanotiler script;