Contents
- Related to PDB files
  - edit_pdb_atom.py: parse and edit PDB files
  - res_renumber_pdb_atom.py: renumber the residue by both resSeq and chainID
  - reorder_pdb_atom_by_chain.py: reorder the records of atoms based on the alphabetic order of chainIDs
  - symmetry_2_reorder_args.py: generate args based on symmetry for reorder_pdb_atom_by_chain.py
  - average_structure_pdb_atom.py: generate the average coordinates of the structures from multiple PDB files
  - add_phosphate_pdb.py: add 5’-end phosphates to RNA strands
  - 3 related scripts: (a) add_triphosphate_pdb.py: add 5’-triphosphate, (b) add_cyclicphosphate_pdb.py: add 2’,3’-cyclic phosphate, and (c) remove_extraphosphate_pdb.py: remove 5’-triphosphate or 2’,3’-cyclic phosphate
  - extend_helix_pdb.py: extend a helix using an ideal A-form helix, depending on two scripts: (a) append_one_bp_pdb.py: adding one bp each time, and (b) insert_res_pdb_atom.py: adding a list of PDB atom records
  - chains_pdb.py: get chain properties from an input pdb file
  - ligate_strand_pdb.py: ligate chains from an input pdb file
  - convertPDB: shell script for PDB format conversion
- Related to SimRNA
  - get_sticky_from_seq_ss.py: create stacking restraints for SimRNA
  - stack_pdb_atom.py: create a PDB file containing a stacked helix or dinucleotide
  - terminal_pdb_atom.py: create a PDB file containing a base pair
  - pdb_2_SimRNA_dist_restrs_aligned.py: create SimRNA distance restraints
  - trafl_reduce.py: reduce the amount of coordinates in the SimRNA trafl file
- Related to UCSF Chimera
- Related to Phenix
- Other
Note: Unless specified otherwise, all Python scripts should be run with python3.x. So far the scripts have only been tested on macOS Mojave. Feel free to contact me if any links expire or if you have other questions.
Related to PDB files
edit_pdb_atom.py: parse and edit PDB files
- Access edit_pdb_atom.py by the link;
- It is used to parse PDB files and is mainly imported by other scripts. Consider adding the directory that contains it to PYTHONPATH with export PYTHONPATH=$PYTHONPATH:PATH_TO_THE_DIRECTORY, or use sys.path.append("PATH_TO_THE_DIRECTORY");
- PDB record fields (I is int, S is string, F is float):

    Record      Type         Get                  Example
    recordName  String(6)    S[:6]                HETATM
    serial      Integer(5)   I(S[6:11])           2
    name        String(4)    S[12:16].strip()     PB
    resName     String(3)    S[17:20].strip()     GTP
    chainID     String(1)    S[21] (?S[20]?)      X
    resSeq      Integer(4)   I(S[22:26])          10
    x           Real(8.3)    F(S[30:38])
    y           Real(8.3)    F(S[38:46])
    z           Real(8.3)    F(S[46:54])

- It contains:
  - class pdb_atom_record
  - class pdb_ter_record (inherited from pdb_atom_record)
  - functions file2rec and rec2file: to read from and write into files
- Coding notes:
  - Pass a class member as a default value for a class method. Default values are evaluated once, when the def statement runs, and self does not exist at that point. See the update_resSeq method:
      def update_resSeq(self, new_resSeq=None, by_shift=0):
          # new_resSeq=self.resSeq generates an error!
          if new_resSeq is None:
              new_resSeq = self.resSeq
          ...
  - Use print(edit_pdb_atom.__doc__) or help(edit_pdb_atom) to read the docstring or get more info.
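The column slices in the PDB record table can be tried directly. The minimal sketch below parses one ATOM/HETATM line using exactly those slices. The name parse_atom_line is hypothetical and does not belong to edit_pdb_atom.py, which uses its pdb_atom_record class for this. Unlike the table, recordName is stripped here so that "ATOM  " becomes "ATOM".

```python
def parse_atom_line(line: str) -> dict:
    """Parse the main fields of a PDB ATOM/HETATM line (hypothetical helper).

    The slices follow the fixed PDB columns: serial 7-11, name 13-16,
    resName 18-20, chainID 22, resSeq 23-26, x/y/z 31-54 (1-based).
    """
    return {
        "recordName": line[:6].strip(),   # "ATOM  " -> "ATOM"
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resName": line[17:20].strip(),
        "chainID": line[21],
        "resSeq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


line = "HETATM    2  PB  GTP X  10      12.345  23.456  34.567"
rec = parse_atom_line(line)
print(rec["name"], rec["resName"], rec["chainID"], rec["resSeq"], rec["x"])
# PB GTP X 10 12.345
```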
res_renumber_pdb_atom.py: renumber the residue by both resSeq and chainID
- Access res_renumber_pdb_atom.py by the link;
- Use by python res_renumber_pdb_atom.py file.pdb 123A 456X 153B 356Y xxX yyY. The residues are given in pairs: in each pair xxX yyY, xxX is the old residue and yyY is the new one (resSeq followed by chainID);
- It also generates restraints for SimRNA by calling pdb_2_SimRNA_dist_restrs_aligned.py with os.system("some string of CMD"), which is useful for restraining known motifs;
- Coding notes:
  sys.argv and sys.exit(1) come from the sys module; os.path.isfile(some_file_path) and os.system("some string of CMD") come from the os module. Do not confuse os.path... with sys.path... (from the sys module).
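The pairwise argument convention (old residue, then new residue) can be turned into a lookup table. The sketch below assumes that each marker is a resSeq followed by a one-character chainID, as in 123A or 99c. The helper names and the regex are hypothetical, and res_renumber_pdb_atom.py may parse its arguments differently.

```python
import re
import sys

# Assumed marker format: optional minus sign, digits (resSeq), one chainID character.
RES_RE = re.compile(r"^(-?\d+)([A-Za-z0-9])$")


def parse_residue(token: str) -> tuple:
    """Split a residue marker such as '123A' into (resSeq, chainID)."""
    m = RES_RE.match(token)
    if m is None:
        sys.exit(f"Bad residue marker: {token}")  # prints to stderr, exit code 1
    return int(m.group(1)), m.group(2)


def parse_pairs(args: list) -> dict:
    """Map each old residue to its new residue: ['123A', '456X'] -> {(123, 'A'): (456, 'X')}."""
    if len(args) % 2 != 0:
        sys.exit("Residue markers must come in old/new pairs")
    return {parse_residue(old): parse_residue(new)
            for old, new in zip(args[::2], args[1::2])}


if __name__ == "__main__":
    print(parse_pairs(["123A", "456X", "153B", "356Y"]))
    # {(123, 'A'): (456, 'X'), (153, 'B'): (356, 'Y')}
```

The difference between the old and new resSeq of a pair could then serve as the shift for every residue of that chain. This is consistent with the note in the next section that 1A 1X changes only the chainID.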
reorder_pdb_atom_by_chain.py: reorder the records of atoms based on the alphabetic order of chainIDs
- Access reorder_pdb_atom_by_chain.py by the link;
- Use by python reorder_pdb_atom_by_chain.py file.pdb or python reorder_pdb_atom_by_chain.py file.pdb 123A 456X 153B 356Y xxX yyY, where xxX and yyY are the old and new residues;
  - For the former usage, the file is reordered based on the alphabetic order of the chains;
  - For the latter usage, residues (chainID and resSeq) are changed by adapting res_renumber_pdb_atom.py (which calls SimRNA’s pdb_2_SimRNA_dist_restrs_aligned.py);
- If you only want to change the chainID, you can do something like python reorder_pdb_atom_by_chain.py file.pdb 1A 1X 1B 1Y ... so that the resSeq is not shifted.
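Reordering by chain is a stable sort keyed on the chainID column. The minimal sketch below works on raw PDB lines rather than on the script's record objects. Python's sorted() is stable, so atoms keep their original order within each chain. The function name is hypothetical. For simplicity the sketch drops TER and other non-atom lines, and the real script may handle them differently.

```python
def reorder_by_chain(lines):
    """Return ATOM/HETATM lines sorted alphabetically by chainID (column 22).

    sorted() is stable, so atoms of the same chain keep their input order.
    Non-atom records (TER, END, ...) are dropped in this simplified sketch.
    """
    atoms = [line for line in lines if line.startswith(("ATOM", "HETATM"))]
    return sorted(atoms, key=lambda line: line[21])
```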
symmetry_2_reorder_args.py: generate args based on symmetry for reorder_pdb_atom_by_chain.py
- Access symmetry_2_reorder_args.py by the link;
- Two modes to use:
  - Concise: python symmetry_2_reorder_args.py 3 AF, where 3 is the symmetry fold and AF means the structure contains chains A to F;
  - Verbose: python symmetry_2_reorder_args.py AB CD EF, where all the symmetry-related components are indicated.
- The output printed to the console can be used directly as args for reorder_pdb_atom_by_chain.py;
- The three scripts symmetry_2_reorder_args.py, reorder_pdb_atom_by_chain.py and average_structure_pdb_atom.py are useful for generating symmetric oligomers (see an example);
- Coding notes:
  - An iterator cannot be reused once it is exhausted. In Python 3.x, zip() returns an iterator (see a similar question);
  - Enable repeated use by converting it to a list: zipping = list(zip(*sym_partners)), where sym_partners is a list of lists (i.e. a 2D array) in the script.
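The two modes and the zip() note work together as follows. The sketch expands the concise form (3 AF) into the verbose units (AB CD EF) and then groups symmetry partners with zip(). It assumes that the chains split into equal, consecutive units, and the helper name concise_to_units is hypothetical.

```python
import string


def concise_to_units(fold: int, chain_range: str) -> list:
    """Expand e.g. (3, 'AF') into [['A', 'B'], ['C', 'D'], ['E', 'F']].

    Assumption: the chains form `fold` equal, consecutive units.
    """
    letters = string.ascii_uppercase
    start, end = letters.index(chain_range[0]), letters.index(chain_range[1])
    chains = letters[start:end + 1]
    if len(chains) % fold != 0:
        raise ValueError("number of chains is not divisible by the fold")
    size = len(chains) // fold
    return [list(chains[i:i + size]) for i in range(0, len(chains), size)]


sym_partners = concise_to_units(3, "AF")

once = zip(*sym_partners)
print(list(once))  # [('A', 'C', 'E'), ('B', 'D', 'F')]
print(list(once))  # [] because the iterator is already exhausted

zipping = list(zip(*sym_partners))  # a list can be iterated any number of times
```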
average_structure_pdb_atom.py: generate the average coordinates of the structures from multiple PDB files
- Access average_structure_pdb_atom.py by the link;
- Use by python average_structure_pdb_atom.py file1.pdb file2.pdb ...;
- The output files from reorder_pdb_atom_by_chain.py are preferred as input because of their cleaned format;
- Useful for generating the averaged symmetric structure, see the link.
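Averaging only makes sense when every file lists the same atoms in the same order. This is presumably why cleaned, reordered files are preferred. The NumPy sketch below assumes the coordinates have already been read into per-file (n_atoms, 3) lists, and the function name is hypothetical.

```python
import numpy as np


def average_coords(coord_sets):
    """Average per-atom coordinates over several structures.

    coord_sets: one (n_atoms, 3) list or array per input file, with the
    same atoms in the same order (assumed, e.g. after reordering).
    """
    try:
        stacked = np.array(coord_sets, dtype=float)  # (n_files, n_atoms, 3)
    except ValueError as err:
        raise ValueError("structures have different numbers of atoms") from err
    if stacked.ndim != 3 or stacked.shape[2] != 3:
        raise ValueError("expected a list of (n_atoms, 3) coordinate arrays")
    return stacked.mean(axis=0)
```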
add_phosphate_pdb.py: add 5’-end phosphates to RNA strands
- Access add_phosphate_pdb.py by the link;
- Use by python add_phosphate_pdb.py PDBfile_name;
- Operations:
  - It first moves an ideal nucleotide (“sample_nt”) so that it is aligned with the 5’-nucleotide (“curr_head_res”) based on “C5’”, “O4’”, “C4’” and “C1’” (“O5’” is not necessary, though it can be added using pdbfixer);
  - Then, the moved “sample_nt” is shifted slightly so that its “C5’” coincides with that of “curr_head_res”, and the movement matrix is reported to the console for evaluation;
  - Extra “O5’” atoms are deleted;
  - Lastly, the phosphate of “sample_nt” is grafted onto “curr_head_res”.
- The serials in the output PDB file are reordered.
- Coding notes:
  import numpy as np, then use the np.array() function to create an ndarray, the np.matmul() function for matrix products, and the tolist() method of ndarray; copy.deepcopy(some_instance) comes from the copy module; superpose3d.Superpose3D(frozen_cloud, mobile_cloud) comes from the superpose3d module and returns (RMSD, R, T, c), more details here.
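The alignment step above uses superpose3d.Superpose3D. To show what such a rigid superposition computes, the sketch below implements the standard Kabsch algorithm with NumPy's SVD. This is a swapped-in technique: it is not the superpose3d code, it returns no scale factor, and the name kabsch is hypothetical. The @ operator is the same operation as np.matmul().

```python
import numpy as np


def kabsch(frozen, mobile):
    """Least-squares rigid superposition of `mobile` onto `frozen` (Kabsch).

    Both inputs are (n_atoms, 3) arrays with matching atom order, e.g. the
    C5', O4', C4', C1' atoms of the two nucleotides. Returns (rmsd, R, t)
    such that mobile @ R.T + t best fits frozen.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(frozen, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # -1 signals a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation (det = +1)
    t = q0 - p0 @ R.T
    moved = P @ R.T + t
    rmsd = float(np.sqrt(((moved - Q) ** 2).sum(axis=1).mean()))
    return rmsd, R, t
```

Once R and t are fitted on the four shared atoms, they would be applied to every atom of the ideal nucleotide (all_coords @ R.T + t). Under this approach, the phosphate atoms come along when it is grafted.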
3 related scripts: (a) add_triphosphate_pdb.py: add 5’-triphosphate, (b) add_cyclicphosphate_pdb.py: add 2’,3’-cyclic phosphate, and (c) remove_extraphosphate_pdb.py: remove 5’-triphosphate or 2’,3’-cyclic phosphate
- Access add_triphosphate_pdb.py by the link, and use by python add_triphosphate_pdb.py PDBfile_name chainID1 chainID2 ...;
- Access add_cyclicphosphate_pdb.py by the link, and use by python add_cyclicphosphate_pdb.py PDBfile_name chainID1 chainID2 ...;
- A current limitation of add_cyclicphosphate_pdb.py is that the chainIDs of the input PDB file must be in order;
- Access remove_extraphosphate_pdb.py by the link, and use by python remove_extraphosphate_pdb.py PDBfile_name;
- The three scripts work in a similar way to add_phosphate_pdb.py above.
- Coding notes:
  - To merge two dictionaries:
      resName_change = resName_triphosphate_change.copy()
      resName_change.update(resName_cyclicphosphate_change)
    more details from stackoverflow.
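The copy-and-update idiom has two shorter equivalents in newer Python versions. The dictionary contents below are placeholders and not the real mappings from the scripts. In every form, a key present in both dictionaries takes its value from the right-hand (second) one.

```python
# Placeholder mappings for illustration only.
resName_triphosphate_change = {"G": "GTP", "A": "ATP"}
resName_cyclicphosphate_change = {"A": "A_cyc", "U": "U_cyc"}

# 1) copy + update: works on every Python 3 version
merged_copy = resName_triphosphate_change.copy()
merged_copy.update(resName_cyclicphosphate_change)

# 2) dict unpacking: Python 3.5+
merged_unpack = {**resName_triphosphate_change, **resName_cyclicphosphate_change}

# 3) union operator: Python 3.9+ only, so it is left commented out here
# merged_union = resName_triphosphate_change | resName_cyclicphosphate_change

print(merged_unpack)  # {'G': 'GTP', 'A': 'A_cyc', 'U': 'U_cyc'}
```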
extend_helix_pdb.py: extend a helix using an ideal A-form helix, depending on two scripts: (a) append_one_bp_pdb.py: adding one bp each time, and (b) insert_res_pdb_atom.py: adding a list of PDB atom records
- Access extend_helix_pdb.py by the link, and use by python extend_helix_pdb.py file.pdb 123A AUCG 153B CCGGU ...;
- Access append_one_bp_pdb.py by the link, and use its core function by append_one_bp_pdb.append_one_bp(rec_list, target_3P, each_base, target_5P), where target_3P and target_5P are the residue markers (such as 99c) for the 3’ and 5’ residues; target_5P is optional (it can be located by find_pair_5P(rec_list, target_3P));
- Access insert_res_pdb_atom.py by the link, and use its core function by res_insert(rec_list, target_3P, rec_insert, mode="post") or res_insert(rec_list, target_3P, rec_insert, mode="post");
- A warning about the TER record is given when the end of the file does not have a TER record.
- Coding notes:
  - Floor division operator //:
      print(5//2)    # 2
      print(-5//2)   # -3
      print(5.0//2)  # 2.0
  - Handle potential errors:
      try:
          if rec_list[insert_index+len(rec_insert)].recordName == "TER":
              rec_list[insert_index+len(rec_insert)].update_resName(rec_insert[-1].resName)
              rec_list[insert_index+len(rec_insert)].update_resSeq(rec_insert[-1].resSeq)
      except IndexError:
          print("IndexError occurs at %s%s" % (rec_insert[-1].resSeq, rec_insert[-1].chainID))
          print("But no worries! It's the end that has no TER record.")
  - Distance between two points:
      curr_dist = np.linalg.norm(np.array(coord_5P_O3)-np.array(coord_curr_P))
    with import numpy as np.
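The two building blocks in these notes are splicing a block of records into a list after a target position and measuring an O3’–P style distance. The sketch below is much simpler than res_insert(). The helpers insert_after and distance are hypothetical, and plain strings stand in for pdb_atom_record objects.

```python
import numpy as np


def insert_after(rec_list, index, rec_insert):
    """Return a new list with rec_insert placed right after rec_list[index]."""
    return rec_list[:index + 1] + list(rec_insert) + rec_list[index + 1:]


def distance(a, b):
    """Euclidean distance between two 3D points, as in the curr_dist note."""
    return float(np.linalg.norm(np.array(a, dtype=float) - np.array(b, dtype=float)))


print(insert_after(["r1", "r2", "TER"], 1, ["new1", "new2"]))
# ['r1', 'r2', 'new1', 'new2', 'TER']
print(distance([0.0, 0.0, 0.0], [1.0, 2.0, 2.0]))  # 3.0
```

Slicing never raises IndexError, even when index points at the last record. The original code needs the try/except because it indexes the element after the inserted block directly.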
chains_pdb.py: get chain properties from an input pdb file
- Access chains_pdb.py by the link, and use by python chains_pdb.py filename to print the properties of all chains to the console, or python chains_pdb.py filename A B X ... to do the same and also extract the listed chains;
- Core function def get_chain_properties(rec_list, chain_rec_list = None): which returns chain_dict_list, a list of chain_dict (each containing the properties of one chain); if chain_rec_list is provided (as an empty list), the atoms of all the chains are written to this chain_rec_list;
- Printing function def print_chain_properties(chain_dict_list):.
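The chain_dict_list idea (one dictionary of properties per chain) can be sketched on raw PDB lines. The keys below (n_atoms, n_res, first_resSeq, last_resSeq) are hypothetical and not necessarily those used by get_chain_properties(). For simplicity, insertion codes are ignored.

```python
def chain_summary(lines):
    """Return one dict of properties per chain, in order of first appearance."""
    chains = {}  # dicts keep insertion order in Python 3.7+
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain_id, res_seq = line[21], int(line[22:26])
        info = chains.setdefault(chain_id, {"n_atoms": 0, "residues": []})
        info["n_atoms"] += 1
        # A new residue starts whenever resSeq changes within the chain.
        if not info["residues"] or info["residues"][-1] != res_seq:
            info["residues"].append(res_seq)
    return [
        {
            "chainID": chain_id,
            "n_atoms": info["n_atoms"],
            "n_res": len(info["residues"]),
            "first_resSeq": info["residues"][0],
            "last_resSeq": info["residues"][-1],
        }
        for chain_id, info in chains.items()
    ]


def print_summary(chain_dict_list):
    """Print one line per chain, similar in spirit to print_chain_properties()."""
    for d in chain_dict_list:
        print(f"{d['chainID']}: {d['n_res']} residues "
              f"({d['first_resSeq']}-{d['last_resSeq']}), {d['n_atoms']} atoms")
```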
ligate_strand_pdb.py: ligate chains from an input pdb file
- Access ligate_strand_pdb.py by the link, and use by python ligate_strand_pdb.py file.pdb A B X ... to ligate the chains starting with A, B, X… (the next chain suitable for ligation is searched for automatically);
- This script also relies on chains_pdb.py (link), insert_res_pdb_atom.py (link) and delete_res_pdb_atom.py (accessible by link).
convertPDB: shell script for PDB format conversion
- Access convertPDB by the link or preview it as text; run convertPDB -h for usage help;
- Useful for dealing with old-format PDB files, QRNAS output, and VMD output.
Related to SimRNA
get_sticky_from_seq_ss.py: create stacking restraints for SimRNA
- Access get_sticky_from_seq_ss.py by the link;
- Use by python get_sticky_from_seq_ss.py seq_file ss_file, where seq_file and ss_file are SimRNA’s sequence file and secondary-structure file;
- It works by:
  - Extracting the sticky-end information from the two input files;
  - Then creating, for each sticky end, a PDB file containing the stacked bases from an ideal RNA helix by calling stack_pdb_atom.py;
  - Lastly, generating the SimRNA restraints for each sticky end (more precisely, nicked end; output file restraintsSE_xX.dat) by calling pdb_2_SimRNA_dist_restrs_aligned.py.
- Be prepared for a large number of PDB files and restraint files to be generated.
stack_pdb_atom.py: create a PDB file containing a stacked helix or dinucleotide
- Access stack_pdb_atom.py by the link;
- Use by, for example, python stack_pdb_atom.py GC 203X 204X 77C 105D (usage 1, for 4 nt or 2 bp) or python stack_pdb_atom.py GC 203X 204X (usage 2, for 2 nt), where GC is the sequence of 203X and 204X, 77C is complementary to 204X, and 105D is complementary to 203X;
- It works by:
  - Reading the identities of the 4 residues from the 5 input args;
  - Then creating a PDB file containing the stacked bases using an ideal RNA helix.
- Its main use is generating restraint files for sticky ends for SimRNA, together with pdb_2_SimRNA_dist_restrs_aligned.py.
terminal_pdb_atom.py: create a PDB file containing a base pair
- Access terminal_pdb_atom.py by the link;
- Use by, for example, python terminal_pdb_atom.py GC 203A 77B for a GC pair formed by 203A and 77B;
- It works by:
  - Reading the identities of the 2 residues from the 3 input args;
  - Then creating a PDB file containing the paired bases using an ideal RNA helix.
- Its main use is generating restraint files for base pairs for SimRNA, together with pdb_2_SimRNA_dist_restrs_aligned.py.
pdb_2_SimRNA_dist_restrs_aligned.py: create SimRNA distance restraints
- Access pdb_2_SimRNA_dist_restrs_aligned.py by the link;
- It is adapted from Michal Boniecki’s script, with changes including aligned output and compatibility with python3.x;
- Use by python pdb_2_SimRNA_dist_restrs_aligned.py file.pdb TOLERANCE (optional, default: 0.2).
trafl_reduce.py: reduce the amount of coordinates in the SimRNA trafl file
- Access trafl_reduce.py by the link;
- Use by python trafl_reduce.py filename.trafl reduction_rate (such as '5', or 'P' for backbone phosphates);
- If reduction_rate is an int, the number of residues is reduced by a factor of this int; if it is P, only the coordinates of P atoms are extracted.
Related to UCSF Chimera
transfer_savepos_chimera.py: transfer Chimera savepos positions
- Access transfer_savepos_chimera.py by the link;
- Use by python transfer_savepos_chimera.py from-filename to-filename to transfer the saved positions. from-file must exist; to-file (to-filename) is created if it does not exist;
- The script looks for the line starting with "\tformattedPositions = " to locate the dictionary of saved positions (note that a Chimera session (.py) file without savepos still contains formattedPositions = {});
- Coding notes:
  target_frompos = ast.literal_eval(target_frompos_string): the literal_eval() function from the ast module converts strings to Python literal structures; if 'tolines' not in locals(): checks whether or not the variable has been defined, since the locals() function returns the dictionary of the current local symbol table.
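Locating the formattedPositions line and turning its text back into a dictionary can be sketched as follows. The session excerpt and its dictionary contents are made up, so they are not real Chimera savepos data. ast.literal_eval() accepts only Python literals, which makes it safer than eval() on file content. The for/else form sets a default without having to test locals().

```python
import ast

# Made-up excerpt standing in for lines read from a Chimera session (.py) file.
session_lines = [
    "# ... other session code ...\n",
    "\tformattedPositions = {'view1': (1.0, [0.0, 0.0, 0.0])}\n",
]

prefix = "\tformattedPositions = "
for line in session_lines:
    if line.startswith(prefix):
        target_frompos_string = line[len(prefix):].strip()
        # Parses dicts, lists, tuples, numbers and strings only; no code is executed.
        target_frompos = ast.literal_eval(target_frompos_string)
        break
else:
    # Runs only if the loop found no matching line (no break).
    target_frompos = {}

print(target_frompos["view1"][0])  # 1.0
```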
compare_chimera_swapna.cpp: needs to be converted from C++ to python3
Related to Phenix
phenix_na_ss.py: create the class for storing records of secondary structure restraints of Phenix and reorder these records
- Access phenix_na_ss.py by the link;
- Use by python phenix_na_ss.py from-filename to-filename. from-file must exist; to-file (to-filename) will be created;
- class base_pair_record and class stacking_pair_record are created from 3 and 2 lines of the records in the file, respectively:
    pdb_interpretation {
      secondary_structure {
        nucleic_acid {
          base_pair {
            base1 = chain 'A' and resid 372
            base2 = chain 'A' and resid 399
            saenger_class = 20  # 19 for GC, 20 for AU, 28 for GU
          }
          stacking_pair {
            base1 = chain 'A' and resid 417
            base2 = chain 'A' and resid 418
          }
        }
        enabled = True
      }
    }
- Coding notes:
  - tuples can be compared, such as if (self.base1["chain"], self.base1["resid"]) > (self.base2["chain"], self.base2["resid"]):;
  - tuples can be used as the key for sorted(), such as sorted(record_list, key=lambda x: (x.base1["chain"], x.base1["resid"])).
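Tuple comparison and tuple sort keys can be combined with parsing of the base1/base2 selection lines. The sketch assumes a simplified stand-in for base_pair_record: the class name BasePair, the regex, and the swap-on-construction behaviour are all hypothetical. It converts resid to int, because comparing resids as strings would put "372" before "5".

```python
import re
from dataclasses import dataclass

# Matches selections such as "chain 'A' and resid 372" (assumed format).
SEL_RE = re.compile(r"chain\s+'(\w+)'\s+and\s+resid\s+(-?\d+)")


@dataclass
class BasePair:
    base1: dict
    base2: dict

    @classmethod
    def from_lines(cls, line1, line2):
        bases = []
        for line in (line1, line2):
            m = SEL_RE.search(line)
            if m is None:
                raise ValueError(f"cannot parse selection: {line!r}")
            bases.append({"chain": m.group(1), "resid": int(m.group(2))})
        pair = cls(*bases)
        # Tuple comparison: keep the "smaller" (chain, resid) as base1.
        if (pair.base1["chain"], pair.base1["resid"]) > (pair.base2["chain"], pair.base2["resid"]):
            pair.base1, pair.base2 = pair.base2, pair.base1
        return pair


records = [
    BasePair.from_lines("base1 = chain 'B' and resid 10", "base2 = chain 'A' and resid 5"),
    BasePair.from_lines("base1 = chain 'A' and resid 399", "base2 = chain 'A' and resid 372"),
]
records = sorted(records, key=lambda x: (x.base1["chain"], x.base1["resid"]))
for r in records:
    print(r.base1, r.base2)
# {'chain': 'A', 'resid': 5} {'chain': 'B', 'resid': 10}
# {'chain': 'A', 'resid': 372} {'chain': 'A', 'resid': 399}
```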
phenix_res_from_seq_ss.py: convert secondary structure denotations to Phenix secondary structure restraints
- Access phenix_res_from_seq_ss.py by the link;
- Use by python phenix_res_from_seq_ss.py seq_file ss_file out_file, where seq_file and ss_file are the SimRNA sequence file and secondary-structure file;
- The script depends on get_sticky_from_seq_ss.py and phenix_na_ss.py.
- Coding notes:
  - Even with from file_other import some_function, all the top-level code in file_other is executed; the solution is to put the script part under if __name__ == '__main__':;
  - Assigning a new object (such as a new list) to a parameter inside a function only rebinds the local name, so the variable outside the function does not change; the solution is to return the new object.
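Both notes can be seen in a few lines. The sketch contrasts rebinding a parameter, changing the list in place, and returning a new list. It keeps the demonstration under the __main__ guard, so that importing the module runs nothing. All function names are illustrative.

```python
def rebind(values):
    values = [0, 0]  # rebinds the local name only; the caller's list is unchanged


def mutate(values):
    values[:] = [0, 0]  # slice assignment changes the caller's list in place


def rebuild(values):
    return [0] * len(values)  # hand the new object back to the caller


def demo():
    data = [1, 2]
    rebind(data)
    print(data)  # [1, 2]
    mutate(data)
    print(data)  # [0, 0]
    data = rebuild([1, 2, 3])
    print(data)  # [0, 0, 0]
    return data


if __name__ == "__main__":
    # Runs only when this file is executed directly, not when another
    # script does "from this_module import rebuild".
    demo()
```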
Other
ranking_energy_nanotiler.py: post-process Nanotiler output log files
- Access ranking_energy_nanotiler.py by the link;
- Use by python ranking_energy_nanotiler.py nt_outfile or python ranking_energy_nanotiler.py nt_outfilename1 nt_outfilename2, where nt_outfile(1/2) is a Nanotiler output log file;
- For the latter use, both files will be processed, and an extra “Energy_sum ranking” file with the .sum suffix will be generated.
- The output files should contain the markers written by the Nanotiler script files, which should contain:
    # 100*"#"
    echo ####################################################################################################
    echo HBP1: ${HBP1} HBP2: ${HBP2}
    # 96*"#"
    echo ################################################################################################
- So the script can be modified based on the markers written by the Nanotiler script;
- Coding notes:
  - from pathlib import Path, then convert a string to a Path object by p=Path(nt_outfilename); the most commonly used members are p.is_file(), p.stem, and p.suffix (for more info);
  - Regex: use the re module’s re.compile() function to create Regex objects: float_re = re.compile(r"([+-]?)(\d+)\.(\d+)([Ee]?)([+-]?)(\d*)") & float_str = float_re.search(s_s_str).group(0) to search for the pattern of a float and get its string; phenix_na_ss.py also uses Regex;
  - sorted(dict_energy.items(), key=lambda x: x[1]) gives a list of the items of a dictionary sorted by value.
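The three coding notes fit together in a small ranking pipeline. The sketch assumes made-up log text: the "energy:" lines are hypothetical and not real Nanotiler output. The regex and the sort-by-value idiom are the ones quoted above, and Path.with_suffix() derives a .sum file name.

```python
import re
from pathlib import Path

float_re = re.compile(r"([+-]?)(\d+)\.(\d+)([Ee]?)([+-]?)(\d*)")

# Hypothetical log excerpt: a marker line followed by an energy line.
log_text = """HBP1: 3 HBP2: 7
energy: -12.50
HBP1: 4 HBP2: 8
energy: -30.25
"""

dict_energy = {}
label = None
for line in log_text.splitlines():
    if line.startswith("HBP1:"):
        label = line.strip()
    elif line.startswith("energy:") and label is not None:
        dict_energy[label] = float(float_re.search(line).group(0))

ranking = sorted(dict_energy.items(), key=lambda x: x[1])  # lowest energy first

p = Path("nt_out.log")
out_name = p.with_suffix(".sum")  # e.g. where a ranking could be written
print(ranking[0])                 # ('HBP1: 4 HBP2: 8', -30.25)
print(p.stem, p.suffix, out_name)  # nt_out .log nt_out.sum
```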
RNA_complement_search.py: search
- Access RNA_complement_search.py by the link;
- Use by python ranking_energy_nanotiler.py nt_outfile or python ranking_energy_nanotiler.py nt_outfilename1 nt_outfilename2, where nt_outfile(1/2) is a Nanotiler output log file;
- For the latter use, both files will be processed, and an extra “Energy_sum ranking” file with the .sum suffix will be generated.
- The output files should contain the markers written by the Nanotiler script files, which should contain:
    #short
    seq_short1 (must be in a single line)
    seq_short2
    #long
    seq_long (can be in multiple lines)
    #restriction
    GAAGAGC (can be multiple sequences in multiple lines)
    #start
- So the script can be modified based on the markers written by the Nanotiler script;
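The marker layout above (#short, #long, #restriction, #start) can be read into one list of sequences per section. The hypothetical parser below assumes that each marker sits on its own line. It keeps every line under #short and #restriction as a separate sequence and joins the lines under #long into one sequence. RNA_complement_search.py may read the file differently.

```python
def parse_marker_file(text):
    """Collect the lines under each '#marker' line into a list (hypothetical parser)."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line[1:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # The long sequence may span several lines, so join them into one string.
    if "long" in sections:
        sections["long"] = ["".join(sections["long"])]
    return sections


example = """#short
ACGU
GGCC
#long
AAAA
CCCC
#restriction
GAAGAGC
GGTACC
#start
"""
print(parse_marker_file(example))
# {'short': ['ACGU', 'GGCC'], 'long': ['AAAACCCC'], 'restriction': ['GAAGAGC', 'GGTACC'], 'start': []}
```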