PyCoM - Protein Coevolution Database (API & Python Library)
PyCoM is a Python API and library for the Coevolution Matrix database (PyCoMdb), which contains 457,622 annotated proteins from UniProtKB/Swiss-Prot together with their coevolution matrices, calculated using HH-suite3 and CCMpred.
We provide simple Python tools to query, extract and visualise data for the protein(s) of your choice.
The code is open source and available on GitHub.
Installing is as simple as running:
pip3 install git+https://github.com/scdantu/pycom
And you are ready to run the code!
from pycom import PyCom, CoMAnalysis
import matplotlib.pyplot as plt

pyc = PyCom(remote=True)
prots = pyc.find(          # search for proteins with:
    min_length=200,        # 200-210 residues
    max_length=210,
    disease='cancer',      # that are associated with cancer
    has_substrate=True,    # have a known substrate
    page=1,
    matrix=True            # and load their coevolution matrices
)

CoMAnalysis().add_contact_predictions(prots)  # add contact predictions

plt.axis('off'); plt.title(f'Contact Map for uniprot_id={prots.uniprot_id[0]}')
plt.imshow(prots.contact_matrix[0])  # plot the contact map

print(prots.iloc[0])  # print the protein's details
uniprot_id P62070
neff 12.754
sequence_length 204
sequence MAAAGWRDGSGQEK...
organism_id 9606
helix_frac 0.29902
turn_frac 0.019608
strand_frac 0.220588
has_ptm 1
has_pdb 1
has_substrate 1
matrix [[0.0, 0.268, ...
contact_matrix [[0.0, 0.0, ...
Name: 0, dtype: object
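The example above turns a coevolution matrix into a contact matrix, but this page does not describe how the contact predictions are derived. As a rough illustration of the general idea only, the sketch below keeps the highest-scoring residue pairs that are far enough apart along the sequence. It is not PyCoM's implementation: the function `top_contacts`, its parameters `min_separation` and `n_contacts`, and the toy scores are all hypothetical.

```python
def top_contacts(scores, min_separation=3, n_contacts=None):
    """Return a binary contact map keeping the highest-scoring residue pairs.

    scores: square, symmetric list of lists of coevolution scores.
    min_separation: skip pairs closer than this along the sequence (assumed default).
    n_contacts: number of pairs to keep; defaults to the sequence length.
    """
    length = len(scores)
    if n_contacts is None:
        n_contacts = length
    # Collect each pair (i, j) once, skipping near-diagonal neighbours.
    pairs = [
        (scores[i][j], i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    # Highest scores first.
    pairs.sort(key=lambda p: p[0], reverse=True)
    contacts = [[0] * length for _ in range(length)]
    for _, i, j in pairs[:n_contacts]:
        contacts[i][j] = contacts[j][i] = 1  # keep the map symmetric
    return contacts


# Toy 5-residue matrix with made-up scores (not PyCoMdb data).
toy = [
    [0.0, 0.9, 0.1, 0.8, 0.2],
    [0.9, 0.0, 0.7, 0.1, 0.6],
    [0.1, 0.7, 0.0, 0.5, 0.1],
    [0.8, 0.1, 0.5, 0.0, 0.4],
    [0.2, 0.6, 0.1, 0.4, 0.0],
]
contact_map = top_contacts(toy, min_separation=2, n_contacts=2)
```

With these toy values the two retained pairs are residues (0, 3) and (1, 4). The pair (0, 1) has the highest raw score but is excluded because the residues are adjacent in sequence.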
Features
PyCoM breaks down into:
PyCoMdb: A database of coevolution matrices for proteins from UniProtKB/Swiss-Prot, with 457,622 annotated proteins.
Available at: https://pycom.brunel.ac.uk/downloads/
PyCoM Python Library: A Python API to query, extract and visualise coevolution matrices from PyCoMdb
Available at: https://github.com/scdantu/pycom
Guides: Getting Started
This library supports:
Querying - Search for proteins based on various criteria
Coevolution Matrices - Load coevolution matrices for proteins
Analysis - A set of tools for performing analysis on coevolution matrices (see the Analysis Tutorial)
PDB & AlphaFold PDB parsing/analysis - A set of tools for parsing and analysing PDB files (see the PDB Tutorial)
PyCoM API: A RESTful API to query, extract and visualise coevolution matrices from PyCoMdb
Available at: https://pycom.brunel.ac.uk/api/
Alignment File Repository: A repository of 370,000+ alignment files, available for download
If you are more interested in alignment data than in protein residue-residue contacts, we also provide the alignment files at: https://pycom.brunel.ac.uk/alignments/
The parameters for generating the alignment using HH-suite3 can be found in Kamisetty et al. 2013.
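As an illustration of the querying feature listed above, results can be narrowed further once they are loaded. The sketch below models records as plain dictionaries and filters them on fields that appear in the printed record earlier on this page (`neff`, `helix_frac`, `has_pdb`). The helper `filter_records` and its parameters are hypothetical and not part of the PyCoM library, and the rows are made-up placeholders rather than PyCoMdb entries.

```python
# Field names follow the printed record earlier on this page; the rows are
# made-up placeholders, not PyCoMdb entries.
records = [
    {"uniprot_id": "EXAMPLE_A", "neff": 12.7, "helix_frac": 0.30, "has_pdb": 1},
    {"uniprot_id": "EXAMPLE_B", "neff": 3.1, "helix_frac": 0.55, "has_pdb": 0},
    {"uniprot_id": "EXAMPLE_C", "neff": 9.4, "helix_frac": 0.62, "has_pdb": 1},
]


def filter_records(rows, min_neff=0.0, min_helix_frac=0.0, require_pdb=False):
    """Keep only the rows that meet every threshold (hypothetical helper)."""
    return [
        row
        for row in rows
        if row["neff"] >= min_neff
        and row["helix_frac"] >= min_helix_frac
        and (row["has_pdb"] == 1 or not require_pdb)
    ]


# Proteins with a reasonably deep alignment and a known PDB structure.
selected = filter_records(records, min_neff=5.0, require_pdb=True)
selected_ids = [row["uniprot_id"] for row in selected]

# Proteins that are mostly helical, regardless of the other fields.
helical_ids = [r["uniprot_id"] for r in filter_records(records, min_helix_frac=0.5)]
```

Since the quick-start example calls `prots.iloc[0]`, the library's results appear to be a pandas-style table, so the same conditions could presumably be written as boolean masks on its columns.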
How to Cite
If you have found PyCoM useful for your research, please cite the following article, published in the journal Bioinformatics:
Philipp Bibik, Sabriyeh Alibai, Alessandro Pandini, Sarath Chandra Dantu, PyCoM: a python library for large-scale analysis of residue–residue coevolution data, Bioinformatics, Volume 40, Issue 4, April 2024, btae16.