Database (PyCoMdb)

PyCoMdb is divided into two files:

  1. pycom.db: Database with protein annotation information (700 MB)

  2. pycom.mat: Coevolution matrices in HDF5 format (115 GB)

Additionally, we provide the alignment files for each protein in the database. See: https://pycom.brunel.ac.uk/alignments/

Warning

Downloading the full database (including the alignment files) requires 1.2 TB of disk space.
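Before starting a download of this size, it may help to confirm that the target drive has room. The following is a minimal sketch using only the Python standard library; it assumes that 1.2 TB means decimal terabytes, and the function name `has_enough_space` is illustrative rather than part of PyCoM.

```python
import shutil

# 1.2 TB, assuming decimal units (1 TB = 10**12 bytes).
REQUIRED_BYTES = 1_200_000_000_000


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem that holds `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


# Check the current working directory; pass your download directory instead.
if has_enough_space("."):
    print("Enough free space for the full database.")
else:
    print("Not enough free space for the full database.")
```

If only pycom.db and pycom.mat are needed, a smaller `required_bytes` value can be passed instead.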

Instructions

Please download the files from: https://pycom.brunel.ac.uk/downloads/

The required files are:

  1. pycom.db (to query the database)

  2. pycom.mat (to load the coevolution matrices)

Please note that alignment files can only be accessed individually. Instructions for downloading them are available in the tutorial.
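Once the downloads finish, a quick check along the following lines can confirm that both required files are in place. This is a sketch, not part of PyCoM: the helper `missing_files` and the directory `~/pycom_data` are hypothetical names chosen for illustration.

```python
from pathlib import Path

# The two files listed above as required.
REQUIRED_FILES = ("pycom.db", "pycom.mat")


def missing_files(directory):
    """Return the names of required files that are not present in `directory`."""
    folder = Path(directory).expanduser()
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# Hypothetical download location; replace it with your own directory.
missing = missing_files("~/pycom_data")
if missing:
    print("Still to download:", ", ".join(missing))
else:
    print("All required files are present.")
```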

Query keywords

| Feature | Description | Query keyword (Local/Remote API) | Query keyword (Web API) | Example | Additional information |
|---|---|---|---|---|---|
| UniProt ID | UniProt ID of the protein | ID | uniprot_id | P0C9F6 | |
| Sequence | Amino acid sequence | SEQUENCE | sequence | AKLMPALTYDGHA… | Partial match is not supported |
| Sequence length | Length of the amino acid sequence | MIN_LENGTH | min_length | 100 | |
| | | MAX_LENGTH | max_length | 350 | |
| Neff | Number of effective sequences in the alignment | | | | |
| Structure properties | | | | | |
| Helix (%) | % of helix content in the structure | MIN_HELIX | min_helix | 2 | |
| | | MAX_HELIX | max_helix | 75 | |
| Turn (%) | % of turn content in the structure | MIN_TURN | min_turn | 5 | |
| | | MAX_TURN | max_turn | 10 | |
| Strand (%) | % of strand content in the structure | MIN_STRAND | min_strand | 30 | |
| | | MAX_STRAND | max_strand | 40 | |
| PDB | PDB IDs of known structures | HAS_PDB [1] | has_pdb [1] | True/False | |
| Substrate | Whether the protein has a known substrate | HAS_SUBSTRATE [1] | has_substrate [1] | True/False | |
| CATH ID | CATH classification of the protein | CATH | cath | 3.40.50.360 or 3.40.*.* or 3.* | |
| Cofactor | Cofactors associated with the protein | COFACTOR | cofactor | Zn(2+) | Use pyc.get_cofactor_list() for the full list of cofactors |
| Cofactor ID | ChEBI ID of the cofactor | COFACTOR_ID | cofactor_id | CHEBI:00001 | Use pyc.get_cofactor_list() for the full list of cofactors |
| Domain | Domain associated with the protein | DOMAIN | domain | zinc-finger | Use pyc.get_domain_list() for the full list |
| Ligand | Ligand associated with the protein | LIGAND | ligand | glucose | Use pyc.get_ligand_list() for the full list |
| Modifications | | | | | |
| PTM | Post-translational modification associated with the protein | PTM | ptm | phosphoprotein | Use pyc.get_ptm_list() for the full list |
| | Whether the protein has a known post-translational modification | HAS_PTM [1] | has_ptm [1] | True/False | |
| Biological features | | | | | |
| Organism ID | NCBI taxonomy ID of the genus/species | ORGANISM_ID | organism_id | | |
| Organism | Name of the genus/species | ORGANISM [*, 2] | organism [*, 2] | :homo: or homo | Use pyc.get_organism_list() for the full list. Surround the name with colons (:) for precise results. For example, :homo: returns Homo sapiens and Homo sapiens neanderthalensis, while homo also returns homoeomma, thomomys, and hundreds of others |
| Enzyme Commission number | Enzyme Commission number of the protein | ENZYME | enzyme | 1.3.1.3 or 1.3.*.* or 1.* | |
| Biological Process | Biological process associated with the protein | BIOLOGICAL_PROCESS | biological_process | antiviral defense | Use pyc.get_biological_process_list() for the full list |
| Cellular Component | Cellular component associated with the protein | CELLULAR_COMPONENT | cellular_component | nucleus | Use pyc.get_cellular_component_list() for the full list |
| Developmental Stage | | DEVELOPMENTAL_STAGE | | | |
| Molecular Function | Molecular function associated with the protein | MOLECULAR_FUNCTION | molecular_function | antioxidant activity | Use pyc.get_molecular_function_list() for the full list |
| Disease | The disease linked with the protein | DISEASE [*] | disease [*] | cancer | Use pyc.get_disease_list() for the full list of diseases |
| | | HAS_DISEASE [1] | has_disease [1] | True/False | |
| Disease ID | The ID of the disease associated with the protein | DISEASE_ID | disease_id | DI-02205 | Use pyc.get_disease_list() for the full list of diseases |
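The keywords listed above come in two forms: uppercase for the Local/Remote API and lowercase for the Web API, with ID corresponding to uniprot_id. The sketch below shows one way to translate a set of Local/Remote keywords into their Web API names and to sanity-check min/max pairs before sending a query. The helpers `to_web_keywords` and `check_ranges` are hypothetical and not part of PyCoM; the rule that a minimum should not exceed its maximum is an assumption, not documented behaviour.

```python
# Local/Remote API keywords whose Web API name is not simply the lowercase form.
# From the keyword list above: ID corresponds to uniprot_id.
SPECIAL_CASES = {"ID": "uniprot_id"}


def to_web_keywords(local_query):
    """Translate Local/Remote API keywords into their Web API equivalents."""
    web_query = {}
    for keyword, value in local_query.items():
        web_name = SPECIAL_CASES.get(keyword, keyword.lower())
        web_query[web_name] = value
    return web_query


def check_ranges(query):
    """Raise ValueError if any min_* value exceeds its max_* partner."""
    for name, low in query.items():
        if name.startswith("min_"):
            partner = "max_" + name[len("min_"):]
            high = query.get(partner)
            if high is not None and low > high:
                raise ValueError(f"{name}={low} exceeds {partner}={high}")


# Example values taken from the keyword list above.
local_query = {"MIN_LENGTH": 100, "MAX_LENGTH": 350, "CATH": "3.40.*.*"}
web_query = to_web_keywords(local_query)
check_ranges(web_query)
print(web_query)
```

Keeping the special cases in a small dictionary means any further keyword whose two forms differ by more than case can be added in one place.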