PyCoM - Protein Coevolution Database (API & Python Library)

PyCoM is a Python API and library for the Coevolution Matrix database (PyCoMdb), which contains 457,622 annotated proteins from UniProtKB/Swiss-Prot together with their coevolution matrices, calculated using HH-suite3 and CCMpred.

We provide simple Python tools to query, extract and visualise data for the protein(s) of your choice.

The code is open source and available on GitHub.

Installing is as simple as running:

pip3 install git+https://github.com/scdantu/pycom

And you are ready to run the code!

from pycom import PyCom, CoMAnalysis
import matplotlib.pyplot as plt

pyc = PyCom(remote=True)
prots = pyc.find(         # search for proteins with:
    min_length=200,       # 200-210 residues
    max_length=210,
    disease='cancer',     # that are associated with cancer
    has_substrate=True,   # have a known substrate
    page=1,               # first page of results
    matrix=True           # and load their coevolution matrices
)

CoMAnalysis().add_contact_predictions(prots)  # add a contact_matrix column

print(prots.iloc[0])  # print the first protein's details

plt.axis('off')
plt.title(f'Contact Map for uniprot_id={prots.uniprot_id.iloc[0]}')
plt.imshow(prots.contact_matrix.iloc[0])  # plot its contact map
plt.show()
Output:
uniprot_id                    P62070
neff                          12.754
sequence_length                  204
sequence           MAAAGWRDGSGQEK...
organism_id                     9606
helix_frac                   0.29902
turn_frac                   0.019608
strand_frac                 0.220588
has_ptm                            1
has_pdb                            1
has_substrate                      1
matrix             [[0.0, 0.268, ...
contact_matrix     [[0.0, 0.0,   ...
Name: 0, dtype: object
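The `matrix` column holds an L×L coevolution score matrix for each protein. A common way to turn such a matrix into predicted contacts is to rank residue pairs by score, ignore pairs that are close in sequence, and keep the top L pairs. The sketch below shows that idea with NumPy on a synthetic symmetric matrix. It is not necessarily what `CoMAnalysis().add_contact_predictions` does internally; the helper name `top_contacts` and the default sequence-separation cutoff of 6 are assumptions chosen for illustration.

```python
import numpy as np


def top_contacts(matrix, n_pairs=None, min_sep=6):
    """Return the n_pairs highest-scoring (i, j) pairs with j - i >= min_sep.

    Hypothetical helper: `matrix` is assumed to be a square, symmetric
    coevolution score matrix, so only the upper triangle is inspected.
    """
    scores = np.asarray(matrix, dtype=float)
    length = scores.shape[0]
    if n_pairs is None:
        n_pairs = length  # the common "top-L" convention
    # upper-triangle indices, skipping pairs closer than min_sep in sequence
    i, j = np.triu_indices(length, k=min_sep)
    # sort candidate pairs by score, highest first, and keep n_pairs of them
    order = np.argsort(scores[i, j])[::-1][:n_pairs]
    return list(zip(i[order].tolist(), j[order].tolist()))


# usage on a random symmetric matrix standing in for real coevolution scores
rng = np.random.default_rng(0)
demo = rng.random((20, 20))
demo = (demo + demo.T) / 2  # make it symmetric
print(top_contacts(demo, n_pairs=5))
```

With data loaded as in the example above, `top_contacts(prots.matrix.iloc[0])` would apply the same ranking to the first protein's matrix, assuming the entry is a 2-D array-like.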


Features

PyCoM breaks down into:

If you are more interested in the alignment data than in residue-residue contacts, the alignments are also available at: https://pycom.brunel.ac.uk/alignments/

The HH-suite3 parameters used to generate the alignments are described in Kamisetty et al. (2013).
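Alignment depth is often summarised as a number of effective sequences (Neff), such as the `neff` value in the output above. The exact definition behind PyCoM's `neff` column is not stated here, so the sketch below only illustrates one common convention: each aligned sequence is weighted by 1 over the number of sequences (itself included) that share at least 80% identity with it, and Neff is the sum of those weights. The helper name `effective_sequences`, the 0.8 threshold, and treating gaps as ordinary characters are all assumptions.

```python
import numpy as np


def effective_sequences(alignment, identity_threshold=0.8):
    """Sum of 1/neighbour-count weights over equal-length aligned sequences.

    Hypothetical helper illustrating one common Neff convention; not
    necessarily the definition used for PyCoM's `neff` column.
    """
    seqs = np.array([list(s) for s in alignment])  # shape (N, L), one char per cell
    length = seqs.shape[1]
    # pairwise fraction of identical aligned positions (gaps count as characters)
    identity = (seqs[:, None, :] == seqs[None, :, :]).sum(axis=2) / length
    # neighbours at or above the threshold, including the sequence itself
    neighbours = (identity >= identity_threshold).sum(axis=1)
    return float((1.0 / neighbours).sum())


# two identical sequences share one unit of weight; the unrelated one keeps its own
print(effective_sequences(["ACDEFGHIKL", "ACDEFGHIKL", "MNPQRSTVWY"]))
```

The pairwise comparison uses O(N²·L) memory, which is fine for a sketch but would need chunking for alignments with many thousands of sequences.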

How to Cite

If you find PyCoM useful in your research, please cite the following article published in Bioinformatics:

Philipp Bibik, Sabriyeh Alibai, Alessandro Pandini, Sarath Chandra Dantu. PyCoM: a python library for large-scale analysis of residue–residue coevolution data. Bioinformatics, Volume 40, Issue 4, April 2024, btae16.

Authors

*Brunel University London, UK
