Database (PyCoMdb)

PyCoMdb is divided into two files:

pycom.db: Database with protein annotation information (700MB)
pycom.mat: Coevolution matrices in HDF5 format (115GB)

Additionally, we provide the alignment files for each protein in the database. See: https://pycom.brunel.ac.uk/alignments/

Warning

Downloading the full database (including the alignment files) requires 1.2TB of disk space.
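Before starting the download, it can help to confirm that the target drive has enough room. The following is a minimal sketch using only the Python standard library; the helper name `has_enough_space` is hypothetical and not part of PyCoM, and it assumes that 1.2TB is meant in decimal units.

```python
import shutil

# 1.2 TB in bytes, assuming decimal units (1 TB = 1000**4 bytes).
REQUIRED_BYTES = 1_200 * 1000**3


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes


# Report the free space (in GB) on the drive holding the current directory.
free_gb = shutil.disk_usage(".").free / 1000**3
print(f"Free space: {free_gb:.0f} GB, enough for full download: {has_enough_space('.')}")
```

If only pycom.db and pycom.mat are needed, a smaller `required_bytes` value can be passed instead.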

Instructions

Please download the files from: https://pycom.brunel.ac.uk/downloads/

The required files are:

pycom.db (to query the database)

pycom.mat (to load the coevolution matrices)

Please note that alignment files can only be accessed individually, one protein at a time. Instructions for downloading them are available in the tutorial.
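After downloading, a quick sanity check can catch missing or truncated files. The sketch below is not part of PyCoM: the helper name `check_download` is hypothetical, and it only checks that both files exist and that pycom.mat starts with the standard 8-byte HDF5 signature (this assumes the file has no HDF5 user block, which would shift the signature to a later offset).

```python
from pathlib import Path

# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def check_download(directory):
    """Return a dict describing whether the required files are present and plausible."""
    directory = Path(directory)
    db_path = directory / "pycom.db"
    mat_path = directory / "pycom.mat"
    report = {
        "pycom.db present": db_path.is_file(),
        "pycom.mat present": mat_path.is_file(),
        "pycom.mat looks like HDF5": False,
    }
    if report["pycom.mat present"]:
        # Compare the leading bytes against the HDF5 signature.
        with mat_path.open("rb") as handle:
            report["pycom.mat looks like HDF5"] = handle.read(8) == HDF5_SIGNATURE
    return report


# Check the current directory; replace "." with the download folder.
print(check_download("."))
```

A failed signature check usually means the download was interrupted and should be repeated.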

Query keywords

Feature	Description	Query keyword (Local/Remote API)	Query keyword (Web API)	Example	Additional information
UniProt ID	UniProt ID of the protein	ID	uniprot_id	P0C9F6
Sequence	Amino acid sequence	SEQUENCE	sequence	AKLMPALTYDGHA…	Partial match is not supported
Sequence length	Length of the amino acid sequence	MIN_LENGTH	min_length	100
Sequence length	Length of the amino acid sequence	MAX_LENGTH	max_length	350
Neff	Number of effective sequences in the alignment
Structure properties
Helix (%)	% of helix content in the structure	MIN_HELIX	min_helix	2
Helix (%)	% of helix content in the structure	MAX_HELIX	max_helix	75
Turn (%)	% of turn content in the structure	MIN_TURN	min_turn	5
Turn (%)	% of turn content in the structure	MAX_TURN	max_turn	10
Strand (%)	% of strand content in the structure	MIN_STRAND	min_strand	30
Strand (%)	% of strand content in the structure	MAX_STRAND	max_strand	40
PDB	PDB IDs of known structures	HAS_PDB [1]	has_pdb [1]	True/False
Substrate	Whether the protein has a known substrate	HAS_SUBSTRATE [1]	has_substrate [1]	True/False
CATH ID	CATH classification of the protein	CATH	cath	3.40.50.360 or 3.40.*.* or 3.*
Cofactor	Cofactors associated with the protein	COFACTOR	cofactor	Zn(2+)	use pyc.get_cofactor_list() for full list
Cofactor ID	ChEBI ID of the cofactor	COFACTOR_ID	cofactor_id	CHEBI:00001	use pyc.get_cofactor_list() for full list
Domain	Domain associated with the protein	DOMAIN	domain	zinc-finger	use pyc.get_domain_list() for full list
Ligand	Ligand associated with the protein	LIGAND	ligand	glucose	use pyc.get_ligand_list() for full list
Modifications
PTM	Post-translational modification associated with the protein	PTM	ptm	phosphoprotein	use pyc.get_ptm_list() for full list
PTM	Whether the protein has a known post-translational modification	HAS_PTM [1]	has_ptm [1]	True/False
Biological features
Organism ID	NCBI taxonomy ID of the genus/species	ORGANISM_ID	organism_id
Organism	Name of the genus/species	ORGANISM [*, 2]	organism [*, 2]	:homo: or homo	use pyc.get_organism_list() for full list. Surround the name with : to get precise results: for example, :homo: returns Homo sapiens and Homo sapiens neanderthalensis, while homo also returns homoeomma, thomomys, and hundreds of others
Enzyme Commission number	Enzyme Commission number of the protein	ENZYME	enzyme	1.3.1.3 or 1.3.*.* or 1.*
Biological Process	Biological process associated with the protein	BIOLOGICAL_PROCESS	biological_process	antiviral defense	use pyc.get_biological_process_list() for full list
Cellular Component	Cellular component associated with the protein	CELLULAR_COMPONENT	cellular_component	nucleus	use pyc.get_cellular_component_list() for full list
Developmental Stage		DEVELOPMENTAL_STAGE
Molecular Function	Molecular function associated with the protein	MOLECULAR_FUNCTION	molecular_function	antioxidant activity	use pyc.get_molecular_function_list() for full list
Disease	The disease linked with the protein	DISEASE [*]	disease [*]	cancer	use pyc.get_disease_list() for full list
Disease	Whether the protein is linked with a known disease	HAS_DISEASE [1]	has_disease [1]	True/False	use pyc.get_disease_list() for full list
Disease ID	The ID of the disease associated with the protein	DISEASE_ID	disease_id	DI-02205	pyc.get_disease_list() to get full list of diseases
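
The matching rules described in the table can be easier to follow with a small illustration. The sketch below is not PyCoM's implementation; the function names are hypothetical, and the behaviour is an assumption based only on the table: an organism name wrapped in : is treated as a whole-word match, a bare name as a substring match, and a * in a CATH or EC pattern as a wildcard for that level (a trailing * also covers any remaining levels).

```python
import re


def organism_matches(query, organism_name):
    """Illustrate ':homo:' (whole word) versus 'homo' (substring) matching."""
    query = query.lower()
    name = organism_name.lower()
    if len(query) > 2 and query.startswith(":") and query.endswith(":"):
        word = re.escape(query[1:-1])
        # Whole-word match: the term must be delimited by word boundaries.
        return re.search(r"\b" + word + r"\b", name) is not None
    # Bare term: plain substring match.
    return query in name


def hierarchy_matches(pattern, identifier):
    """Illustrate wildcard matching on dotted IDs such as CATH or EC numbers."""
    pattern_parts = pattern.split(".")
    id_parts = identifier.split(".")
    for index, part in enumerate(pattern_parts):
        if part == "*":
            if index == len(pattern_parts) - 1:
                return True  # trailing wildcard covers all remaining levels
            if index >= len(id_parts):
                return False
            continue
        if index >= len(id_parts) or part != id_parts[index]:
            return False
    return len(pattern_parts) == len(id_parts)


for name in ["Homo sapiens", "Homoeomma", "Thomomys"]:
    print(name, organism_matches(":homo:", name), organism_matches("homo", name))

for pattern in ["3.40.50.360", "3.*", "1.*"]:
    print(pattern, hierarchy_matches(pattern, "3.40.50.360"))
```

Against a real database, the same filters would be passed through the query keywords listed above rather than reimplemented locally.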