Database (PyCoMdb)
PyCoMdb is divided into two files:
pycom.db: Database with protein annotation information (700 MB)
pycom.mat: Coevolution matrices in HDF5 format (115 GB)
Additionally, we provide the alignment files for each protein in the database. See: https://pycom.brunel.ac.uk/alignments/
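Because pycom.mat is a large HDF5 file, a truncated or failed download is easy to miss. The following is a minimal sketch of a sanity check, using only the Python standard library: it tests whether a file starts with the 8-byte HDF5 signature. The function name `looks_like_hdf5` is hypothetical and not part of PyCoM, and the check says nothing about whether the file is complete.

```python
from pathlib import Path

# The 8-byte signature defined by the HDF5 file format.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Files written with an HDF5 "user block" place the signature at a
    later offset; this sketch only checks offset 0, the common case.
    """
    with Path(path).open("rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

For example, `looks_like_hdf5("pycom.mat")` could be called after the download finishes, assuming the file was saved to the current directory.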
Warning
Downloading the full database (including the alignment files) requires 1.2 TB of disk space.
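Before starting a download of this size, it may help to confirm that the target drive has room. The sketch below uses `shutil.disk_usage` from the Python standard library. The helper name `has_free_space` is hypothetical, and the byte counts assume the sizes quoted in this section are decimal units (1 GB = 10^9 bytes), which the source does not state.

```python
import shutil

# Sizes quoted in this section, assumed to be decimal units.
CORE_FILES_BYTES = 700 * 10**6 + 115 * 10**9  # pycom.db + pycom.mat
FULL_DOWNLOAD_BYTES = 1_200 * 10**9           # including alignment files


def has_free_space(directory, required_bytes):
    """Return True if the drive holding `directory` has enough free space."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= required_bytes
```

Calling `has_free_space(".", CORE_FILES_BYTES)` checks the current directory against the two required files only, while `FULL_DOWNLOAD_BYTES` covers the alignment files as well.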
Instructions
Please download the files from: https://pycom.brunel.ac.uk/downloads/
The required files are pycom.db and pycom.mat.
Query keywords
| Feature | Description | Keyword (uppercase) | Keyword (lowercase) | Example | Additional information |
|---|---|---|---|---|---|
| UniProt ID | UniProt ID of the protein | ID | uniprot_id | P0C9F6 | |
| Sequence | Amino acid sequence | SEQUENCE | sequence | AKLMPALTYDGHA… | Partial match is not supported |
| Sequence length | Length of the amino acid sequence | MIN_LENGTH | min_length | 100 | |
| | | MAX_LENGTH | max_length | 350 | |
| Neff | Number of effective sequences in the alignment | | | | |
| Structure properties | | | | | |
| Helix (%) | % of helix content in the structure | MIN_HELIX | min_helix | 2 | |
| | | MAX_HELIX | max_helix | 75 | |
| Turn (%) | % of turn content in the structure | MIN_TURN | min_turn | 5 | |
| | | MAX_TURN | max_turn | 10 | |
| Strand (%) | % of strand content in the structure | MIN_STRAND | min_strand | 30 | |
| | | MAX_STRAND | max_strand | 40 | |
| PDB | PDB IDs of known structures | HAS_PDB [1] | has_pdb [1] | True/False | |
| Substrate | Whether the protein has a known substrate | HAS_SUBSTRATE [1] | has_substrate [1] | True/False | |
| CATH ID | CATH classification of the protein | CATH | cath | 3.40.50.360 or 3.40.*.* or 3.* | |
| Cofactor | Cofactors associated with the protein | COFACTOR | cofactor | Zn(2+) | Use pyc.get_cofactor_list() for the full list of cofactors |
| Cofactor ID | ChEBI ID of the cofactor | COFACTOR_ID | cofactor_id | CHEBI:00001 | Use pyc.get_cofactor_list() for the full list of cofactors |
| Domain | Domain associated with the protein | DOMAIN | domain | zinc-finger | Use pyc.get_domain_list() for the full list |
| Ligand | Ligand associated with the protein | LIGAND | ligand | glucose | Use pyc.get_ligand_list() for the full list |
| Modifications | | | | | |
| PTM | Post-translational modification associated with the protein | PTM | ptm | phosphoprotein | Use pyc.get_ptm_list() for the full list |
| | Whether the protein has a known post-translational modification | HAS_PTM [1] | has_ptm [1] | True/False | |
| Biological features | | | | | |
| Organism ID | NCBI taxonomy ID of the genus/species | ORGANISM_ID | organism_id | | |
| Organism | Name of the genus/species | ORGANISM [*] [2] | organism [*] [2] | :homo: or homo | Use pyc.get_organism_list() for the full list. Surround the term with : to get precise results: for example, :homo: returns Homo sapiens and Homo sapiens neanderthalensis, while homo also returns Homoeomma, Thomomys, and hundreds of others |
| Enzyme Commission number | Enzyme Commission number of the protein | ENZYME | enzyme | 1.3.1.3 or 1.3.*.* or 1.* | |
| Biological Process | Biological process associated with the protein | BIOLOGICAL_PROCESS | biological_process | antiviral defense | Use pyc.get_biological_process_list() for the full list |
| Cellular Component | Cellular component associated with the protein | CELLULAR_COMPONENT | cellular_component | nucleus | Use pyc.get_cellular_component_list() for the full list |
| Developmental Stage | | DEVELOPMENTAL_STAGE | | | |
| Molecular Function | Molecular function associated with the protein | MOLECULAR_FUNCTION | molecular_function | antioxidant activity | Use pyc.get_molecular_function_list() for the full list |
| Disease | The disease linked with the protein | DISEASE [*] | disease [*] | cancer | Use pyc.get_disease_list() for the full list of diseases |
| | | HAS_DISEASE [1] | has_disease [1] | True/False | |
| Disease ID | The ID of the disease associated with the protein | DISEASE_ID | disease_id | DI-02205 | Use pyc.get_disease_list() for the full list of diseases |
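The wildcard examples for CATH and Enzyme Commission numbers, and the `:homo:` versus `homo` distinction for organisms, can be illustrated with a short standalone sketch. It only mimics the behaviour described in the table and is not PyCoM's implementation. The function names `code_matches` and `organism_matches` are hypothetical, and reading `:term:` as a whole-word match is an assumption based on the example given above.

```python
import fnmatch
import re


def code_matches(pattern, code):
    """Wildcard match for dotted codes such as CATH or EC numbers.

    `*` stands for any run of characters, so "3.*" covers every code
    that starts with "3.".
    """
    return fnmatch.fnmatchcase(code, pattern)


def organism_matches(term, name):
    """Substring match by default; whole-word match for ":term:"."""
    if len(term) > 2 and term.startswith(":") and term.endswith(":"):
        # Assumption: the colons request a match on the whole word only.
        word = re.escape(term[1:-1])
        return re.search(rf"\b{word}\b", name, re.IGNORECASE) is not None
    return term.lower() in name.lower()


names = ["Homo sapiens", "Homo sapiens neanderthalensis",
         "Homoeomma", "Thomomys"]
precise = [n for n in names if organism_matches(":homo:", n)]
loose = [n for n in names if organism_matches("homo", n)]
```

Under these assumptions, `precise` keeps only the two Homo sapiens entries, while `loose` keeps all four names, which mirrors the difference the table describes.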