Tutorial 1: Workflow of PyCoM
In this tutorial, with a small example, you will learn how to create a workflow with the local version of PyCoM, using the steps listed below:
Setup
The assumption is that you have completed the installation and downloaded the database. For help on this please look at the quick guide here.
Initalise pycom objects
First, lets import all the libraries and classes we need from pycom, pandas, matplotlib, and numpy
[14]:
# importing all usefull classes from PyCoM
from pycom import PyCom, ProteinParams,CoMAnalysis
import pandas as pd
import numpy as np
# matplotlib; useful for plotting later
import matplotlib
import matplotlib.pyplot as plt
#setting matplotlib parameters
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['font.family'] = "sans-serif"
matplotlib.rcParams['font.sans-serif'] = "Arial"
[15]:
#set the path to the database
database_folder_path="/Volumes/mason/Work/Sarath/Research/pycom/"
#matrix file name and path
file_matrix_db = database_folder_path+"pycom.mat"
#protein database file name and path
file_protein_db= database_folder_path+"pycom.db"
[16]:
obj_pycom = PyCom(db_path=file_protein_db, mat_path=file_matrix_db)
Create a query dictionary
To query the database, we need to create a dictionary object query_parameters
using the keywords for our choice of properties. For the full list of keywords please check
Empty ``query_parameters`` will return information on all the ~457,000 proteins in the database
To query the database, we need to create a dictionary object query_parameters
using the keywords for our choice of properties. For the full list of keywords please check
Empty ``query_parameters`` will return information on all the ~457,000 proteins in the database
[17]:
#creating empty query dictionary
query_parameters={}
[18]:
# Here we are asking for all the proteins that match the enzyme class 3 and have been associated with the disease cancer.
query_parameters={ProteinParams.DISEASE:"cancer",
ProteinParams.ENZYME: '3.*',
ProteinParams.MIN_LENGTH: 100,
ProteinParams.MAX_LENGTH: 200,
}
Executing the query with the parameters defined in the above cell using the pycom object obj_pycom
find()
function will return a pandas dataframe with the search results containing information about all the proteins which match our query.
[19]:
entries_data_frame=obj_pycom.find(query_parameters)
Save and retrieve progress
We can save and retreive our progress by saving our dataframe with information on our favourite proteins by saving it to a csv file.
Save the query to a csv file
To avoid rerunning the query we can cave the progress to a csv file.
[20]:
entries_data_frame.to_csv("output/DB_Query_Results.csv",index=False)
Read query data from csv file
Retrieving our progress from the csv file.
[21]:
#entries_data_frame=pd.read_csv("output/DB_Query_Results.csv")
Analyse search results
The search returns a pandas data frame with proteins matching the query critiera with following information for each protein:
uniprot_id: Uniprot ID
neff: Depth of the sequence alignment \(N_{eff}\)
sequence_length: Sequence length
sequence: protein sequence
organism_id: Organism ID
helic_frac, turn_frac, strand_frac: helix, turn, and strand structure fraction
has_ptm: Has a PTM Yes/No
has_pdb: Has a PDB structure Yes/No
has_substrate: Has a substrate for biological activity Yes/No
matrix: coevolution matrix column is empty because at this stage we would still want you to check the search results and if required filter them based on any of the biological properties before loading the matrices.
First, look what columns we have and their names.
[22]:
entries_data_frame.head()
[22]:
uniprot_id | neff | sequence_length | sequence | organism_id | helix_frac | turn_frac | strand_frac | has_ptm | has_pdb | has_substrate | matrix | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | P01111 | 12.817 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.349206 | 0.015873 | 0.227513 | 1 | 1 | 1 | None |
1 | P01112 | 12.841 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.317460 | 0.031746 | 0.359788 | 1 | 1 | 1 | None |
2 | P01116 | 12.626 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.375661 | 0.031746 | 0.328042 | 1 | 1 | 1 | None |
describe()
function from pandas can be used to get a summary of all the features:
[23]:
entries_data_frame.describe(include="all")
[23]:
uniprot_id | neff | sequence_length | sequence | organism_id | helix_frac | turn_frac | strand_frac | has_ptm | has_pdb | has_substrate | matrix | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 3 | 3.000000 | 3.0 | 3 | 3 | 3.000000 | 3.000000 | 3.000000 | 3.0 | 3.0 | 3.0 | 0 |
unique | 3 | NaN | NaN | 3 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | 0 |
top | P01111 | NaN | NaN | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
freq | 1 | NaN | NaN | 1 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
mean | NaN | 12.761333 | 189.0 | NaN | NaN | 0.347443 | 0.026455 | 0.305115 | 1.0 | 1.0 | 1.0 | NaN |
std | NaN | 0.117815 | 0.0 | NaN | NaN | 0.029141 | 0.009164 | 0.069054 | 0.0 | 0.0 | 0.0 | NaN |
min | NaN | 12.626000 | 189.0 | NaN | NaN | 0.317460 | 0.015873 | 0.227513 | 1.0 | 1.0 | 1.0 | NaN |
25% | NaN | 12.721500 | 189.0 | NaN | NaN | 0.333333 | 0.023810 | 0.277778 | 1.0 | 1.0 | 1.0 | NaN |
50% | NaN | 12.817000 | 189.0 | NaN | NaN | 0.349206 | 0.031746 | 0.328042 | 1.0 | 1.0 | 1.0 | NaN |
75% | NaN | 12.829000 | 189.0 | NaN | NaN | 0.362434 | 0.031746 | 0.343915 | 1.0 | 1.0 | 1.0 | NaN |
max | NaN | 12.841000 | 189.0 | NaN | NaN | 0.375661 | 0.031746 | 0.359788 | 1.0 | 1.0 | 1.0 | NaN |
Get counts of categorical data in the column, for example: * number of proteins with a known PDB structure * number of proteins with a known PTM
[24]:
entries_data_frame['has_pdb'].value_counts()
[24]:
has_pdb
1 3
Name: count, dtype: int64
Find number of unique elements in the column, for example number of unique organisms:
[25]:
entries_data_frame['organism_id'].unique()
[25]:
array(['9606'], dtype=object)
All the sequences are from the same organism, 9606
i.e. from Homo sapiens
. Full list is available from UniProt
Some statistics on the numerical column, for example neff
:
[26]:
entries_data_frame["neff"].describe()
[26]:
count 3.000000
mean 12.761333
std 0.117815
min 12.626000
25% 12.721500
50% 12.817000
75% 12.829000
max 12.841000
Name: neff, dtype: float64
We can also use other functions to get some of the information
[27]:
entries_data_frame["neff"].min()
[27]:
12.626
[28]:
entries_data_frame["neff"].mean()
[28]:
12.761333333333333
Add biological features
Initialise the object loader class and then call each add function
Add Enzyme Classification
Add CATH Class
Add Co-factors
Add PTM
Add Diseases
For a protein entry if the requested data (EC/CATH/Cofactors…) does not exist, corresponding entry in that column will be nan
. We can filter such rows as shown further below.
Please note the dataloader functions will not work with remote version of PyCoM
[29]:
#initialise the object for data loader class
obj_data_loader=obj_pycom.get_data_loader()
[30]:
#add enzyme commission data to the dataframe
entries_data_frame=obj_data_loader.add_enzyme_commission(entries_data_frame,force_single_entry=False)
#add CATH data to the dataframe
entries_data_frame=obj_data_loader.add_cath_class(entries_data_frame,force_single_entry=False)
#add CATH data to the dataframe
entries_data_frame=obj_data_loader.add_pdbs(entries_data_frame,force_single_entry=False)
#get list of all cofactors for each protein
entries_data_frame=obj_data_loader.add_cofactors(entries_data_frame,force_single_entry=False)
#get list of all PTM's for each protein
entries_data_frame=obj_data_loader.add_ptm(entries_data_frame,force_single_entry=False)
#get list of all diseases for each protein
entries_data_frame=obj_data_loader.add_diseases(entries_data_frame,force_single_entry=False)
[31]:
#get substrates for the proteins
entries_data_frame=obj_data_loader.add_ligand(entries_data_frame,force_single_entry=False)
Save the progress to a csv file
As we have added a lot of information to our dataframe, let’s save our progress so that we can restart from this point, in future, if required.
[32]:
entries_data_frame.to_csv("output/DB_Query_Results_With_Details.csv",index=False)
Some Statistics
Let’s look at some statistics for all the columns in the dataframe. Below are some examples of how you can do some fun things with the dataframe.
[33]:
#include=all will also include columns with 'nan' entries
entries_data_frame.describe(include="all")
[33]:
uniprot_id | neff | sequence_length | sequence | organism_id | helix_frac | turn_frac | strand_frac | has_ptm | has_pdb | has_substrate | matrix | enzyme_commission | cath_class | pdb_id | cofactor | ptm | disease_name | disease_id | ligand | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 3 | 3.000000 | 3.0 | 3 | 3 | 3.000000 | 3.000000 | 3.000000 | 3.0 | 3.0 | 3.0 | 0 | 3 | 3 | 3 | 0 | 0 | 3 | 3 | 3 |
unique | 3 | NaN | NaN | 3 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 1 | 1 | 3 | 0 | 0 | 3 | 3 | 1 |
top | P01111 | NaN | NaN | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [3.6.5.2] | [3.40.50.300] | [2N9C, 3CON, 5UHV, 6E6H, 6MPP, 6ULI, 6ULK, 6UL... | NaN | NaN | [Leukemia, juvenile myelomonocytic, Noonan syn... | [DI-01851, DI-02558, DI-03381, DI-04099, DI-04... | [GTP-binding, Nucleotide-binding] |
freq | 1 | NaN | NaN | 1 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 | 3 | 1 | NaN | NaN | 1 | 1 | 3 |
mean | NaN | 12.761333 | 189.0 | NaN | NaN | 0.347443 | 0.026455 | 0.305115 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
std | NaN | 0.117815 | 0.0 | NaN | NaN | 0.029141 | 0.009164 | 0.069054 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
min | NaN | 12.626000 | 189.0 | NaN | NaN | 0.317460 | 0.015873 | 0.227513 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
25% | NaN | 12.721500 | 189.0 | NaN | NaN | 0.333333 | 0.023810 | 0.277778 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
50% | NaN | 12.817000 | 189.0 | NaN | NaN | 0.349206 | 0.031746 | 0.328042 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
75% | NaN | 12.829000 | 189.0 | NaN | NaN | 0.362434 | 0.031746 | 0.343915 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
max | NaN | 12.841000 | 189.0 | NaN | NaN | 0.375661 | 0.031746 | 0.359788 | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Find unique ligands and count them:
[34]:
entries_data_frame['ligand'].value_counts()
[34]:
ligand
[GTP-binding, Nucleotide-binding] 3
Name: count, dtype: int64
[35]:
# Filter the search results where we have a ligand interacting with the protein
df_results_with_ligand=entries_data_frame[entries_data_frame['ligand'].notna()]
Count columns without ‘nan’ entries
[36]:
#Number of proteins with pdb data
df_results_with_ligand["pdb_id"].notna().sum()
[36]:
3
Coevolution matrix analysis
Load the matrix
Lets get the coevolution matrix for the filtered dataframe df_results_with_ligand
using the load_matrices()
from obj_pycom
[37]:
df_results_with_ligand=obj_pycom.load_matrices(df_results_with_ligand)
[38]:
df_results_with_ligand
[38]:
uniprot_id | neff | sequence_length | sequence | organism_id | helix_frac | turn_frac | strand_frac | has_ptm | has_pdb | has_substrate | matrix | enzyme_commission | cath_class | pdb_id | cofactor | ptm | disease_name | disease_id | ligand | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | P01111 | 12.817 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.349206 | 0.015873 | 0.227513 | 1 | 1 | 1 | [[0.0, 0.5163763761520386, 0.3219393491744995,... | [3.6.5.2] | [3.40.50.300] | [2N9C, 3CON, 5UHV, 6E6H, 6MPP, 6ULI, 6ULK, 6UL... | NaN | NaN | [Leukemia, juvenile myelomonocytic, Noonan syn... | [DI-01851, DI-02558, DI-03381, DI-04099, DI-04... | [GTP-binding, Nucleotide-binding] |
1 | P01112 | 12.841 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.317460 | 0.031746 | 0.359788 | 1 | 1 | 1 | [[0.0, 0.5560339689254761, 0.34521734714508057... | [3.6.5.2] | [3.40.50.300] | [121P, 1AA9, 1AGP, 1BKD, 1CLU, 1CRP, 1CRQ, 1CR... | NaN | NaN | [Costello syndrome, Congenital myopathy with e... | [DI-01437, DI-01411, DI-04532, DI-02612, DI-03... | [GTP-binding, Nucleotide-binding] |
2 | P01116 | 12.626 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.375661 | 0.031746 | 0.328042 | 1 | 1 | 1 | [[0.0, 0.38467222452163696, 0.3104382753372192... | [3.6.5.2] | [3.40.50.300] | [1D8D, 1D8E, 1KZO, 1KZP, 1N4P, 1N4Q, 1N4R, 1N4... | NaN | NaN | [Leukemia, acute myelogenous, Leukemia, juveni... | [DI-01171, DI-01851, DI-02073, DI-02971, DI-03... | [GTP-binding, Nucleotide-binding] |
Normalise/Scale the matrix
Scaled matrix (:math:`S_{i}`): Coevolution Matrices (\(C_{i}\)) have to be scaled by average \(\langle{C_{i}}\rangle\), all values < \(\langle{C_{i}}\rangle\) are set to 0.
Normalised matrix (:math:`N_{i}`): For comparing scaled coevolution scores across multiple proteins, we can normalise the values of all matrices \({S_{i}...S_{n}}\), by dividing them by the \(\max({S_{i}...S_{n}})\).
These operations can be performed by using object from CoMAnalysis
class.
[39]:
#initialise CoMAnalysis class object
obj_com_analysis=CoMAnalysis()
[40]:
df_results_with_ligand_matrix=obj_com_analysis.scale_and_normalise_coevolution_matrices(df_results_with_ligand)
[41]:
df_results_with_ligand_matrix.head()
[41]:
uniprot_id | neff | sequence_length | sequence | organism_id | helix_frac | turn_frac | strand_frac | has_ptm | has_pdb | ... | enzyme_commission | cath_class | pdb_id | cofactor | ptm | disease_name | disease_id | ligand | matrix_S | matrix_N | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | P01111 | 12.817 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.349206 | 0.015873 | 0.227513 | 1 | 1 | ... | [3.6.5.2] | [3.40.50.300] | [2N9C, 3CON, 5UHV, 6E6H, 6MPP, 6ULI, 6ULK, 6UL... | NaN | NaN | [Leukemia, juvenile myelomonocytic, Noonan syn... | [DI-01851, DI-02558, DI-03381, DI-04099, DI-04... | [GTP-binding, Nucleotide-binding] | [[0.0, 2.4990853333930714, 1.5580765172867141,... | [[0.0, 0.18836677642207234, 0.1174389073708615... |
1 | P01112 | 12.841 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.317460 | 0.031746 | 0.359788 | 1 | 1 | ... | [3.6.5.2] | [3.40.50.300] | [121P, 1AA9, 1AGP, 1BKD, 1CLU, 1CRP, 1CRQ, 1CR... | NaN | NaN | [Costello syndrome, Congenital myopathy with e... | [DI-01437, DI-01411, DI-04532, DI-02612, DI-03... | [GTP-binding, Nucleotide-binding] | [[0.0, 2.4841366766214805, 1.5422925960913767,... | [[0.0, 0.18724003206873668, 0.1162492055567066... |
2 | P01116 | 12.626 | 189 | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | 9606 | 0.375661 | 0.031746 | 0.328042 | 1 | 1 | ... | [3.6.5.2] | [3.40.50.300] | [1D8D, 1D8E, 1KZO, 1KZP, 1N4P, 1N4Q, 1N4R, 1N4... | NaN | NaN | [Leukemia, acute myelogenous, Leukemia, juveni... | [DI-01171, DI-01851, DI-02073, DI-02971, DI-03... | [GTP-binding, Nucleotide-binding] | [[0.0, 1.8285180440964264, 1.4756510916226846,... | [[0.0, 0.1378232447662731, 0.1112261496390277,... |
3 rows × 22 columns
matrix_S
column contains the \(S_{i}\) matrix and the matrix_N
column contains \(N_{i}\)
Plot the matrix and save it
Lets plot the matrix for the first (index is 0)
protein in the dataframe.
[42]:
plt.imshow(df_results_with_ligand_matrix.loc[0,'matrix_S'],cmap='Blues')
plt.colorbar()
file_name="output/%s_Scaled_Matrix.png"%(df_results_with_ligand_matrix.loc[0,'uniprot_id'])
plt.savefig(file_name,dpi=300)
[43]:
plt.imshow(df_results_with_ligand_matrix.loc[0,'matrix_N'],cmap='Blues')
plt.colorbar()
file_name="output/%s_Normalised_Matrix.png"%(df_results_with_ligand_matrix.loc[0,'uniprot_id'])
plt.savefig(file_name,dpi=300)
Save the list of top coevolution pairs
Residues with scores close to 1 have the strongest evolutionary relationship, and close to 0 may not be covarying during evolution.
The top scoring residues are sorted in descending order and saved to an ASCII file for further interpretation.
[44]:
obj_com_analysis.save_top_scoring_residue_pairs(df_results_with_ligand_matrix,data_folder="output",matrix_type="matrix_N",res_gap=5,percentile=95)
Saved file : output/P01111_Pairs.txt
Saved file : output/P01112_Pairs.txt
Saved file : output/P01116_Pairs.txt
Help on UniProt Controlled Vocabulary
For each biological feature category, UniProtKB/Swiss-Prot has a curated list of keywords. To search using those keywords or ID’s and help you find them we have some helper functions that will help you find them:
Function get_cofactor_list()
from pycom
will get you list of cofactors. You can either use the cofactorId
or the cofactorName
.
[45]:
# list of cofactors
cofactors = obj_pycom.get_cofactor_list()
cofactors
[45]:
cofactorId | cofactorName | |
---|---|---|
0 | CHEBI:597326 | pyridoxal 5'-phosphate |
1 | CHEBI:18420 | Mg(2+) |
2 | CHEBI:60240 | a divalent metal cation |
3 | CHEBI:30413 | heme |
4 | CHEBI:29105 | Zn(2+) |
... | ... | ... |
109 | CHEBI:61721 | chlorophyll b |
110 | CHEBI:73095 | divinyl chlorophyll a |
111 | CHEBI:73096 | divinyl chlorophyll b |
112 | CHEBI:57453 | (6S)-5,6,7,8-tetrahydrofolate |
113 | CHEBI:30402 | tungstopterin |
114 rows × 2 columns
Function get_disease_list()
from pycom
will get you list of diseases. You can either use the diseaseId
or the diseaseName
.
[46]:
# list of diseases
diseases = obj_pycom.get_disease_list()
diseases
[46]:
diseaseId | diseaseName | |
---|---|---|
0 | DI-04420 | Intellectual developmental disorder, autosomal... |
1 | DI-00085 | Alzheimer disease 1 |
2 | DI-00097 | Cerebral amyloid angiopathy, APP-related |
3 | DI-00262 | Chanarin-Dorfman syndrome |
4 | DI-01042 | Spastic paraplegia 42, autosomal dominant |
... | ... | ... |
6039 | DI-05800 | Wieacker-Wolff syndrome, female-restricted |
6040 | DI-01041 | Spastic paraplegia 33, autosomal dominant |
6041 | DI-05703 | Neurodevelopmental disorder with dysmorphic fa... |
6042 | DI-06050 | Intellectual developmental disorder, autosomal... |
6043 | DI-04662 | Paget disease of bone 6 |
6044 rows × 2 columns
Function get_organism_list()
from pycom
will get you list of diseases. You can either use the organismId
or the nameScientific
or nameCommon
or any categories in the taxonomy
.
[47]:
# list of organisms
organisms = obj_pycom.get_organism_list()
organisms
[47]:
organismId | nameScientific | nameCommon | taxonomy | |
---|---|---|---|---|
0 | 561445 | African swine fever virus (isolate Pig/Kenya/K... | ASFV | :Viruses:Varidnaviria:Bamfordvirae:Nucleocytov... |
1 | 10500 | African swine fever virus (isolate Tick/Malawi... | ASFV | :Viruses:Varidnaviria:Bamfordvirae:Nucleocytov... |
2 | 561443 | African swine fever virus (isolate Tick/South ... | ASFV | :Viruses:Varidnaviria:Bamfordvirae:Nucleocytov... |
3 | 561444 | African swine fever virus (isolate Warthog/Nam... | ASFV | :Viruses:Varidnaviria:Bamfordvirae:Nucleocytov... |
4 | 10498 | African swine fever virus (strain Badajoz 1971... | Ba71V | :Viruses:Varidnaviria:Bamfordvirae:Nucleocytov... |
... | ... | ... | ... | ... |
14316 | 31581 | Rotavirus A (isolate RVA/Pig/Australia/TFR-41/... | RV-A | :Viruses:Riboviria:Orthornavirae:Duplornaviric... |
14317 | 31579 | Rotavirus A (isolate RVA/Pig/Australia/BEN144/... | RV-A | :Viruses:Riboviria:Orthornavirae:Duplornaviric... |
14318 | 10918 | Rotavirus A (strain RVA/Pig/Russia/K/1987) | RV-A | :Viruses:Riboviria:Orthornavirae:Duplornaviric... |
14319 | 31580 | Rotavirus A (isolate RVA/Pig/Australia/BMI-1/1... | RV-A | :Viruses:Riboviria:Orthornavirae:Duplornaviric... |
14320 | 47664 | Populus tremula x Populus tremuloides | Hybrid aspen | :Eukaryota:Viridiplantae:Streptophyta:Embryoph... |
14321 rows × 4 columns
[48]:
#full list of helper functions to get searchable terms on other biological categories
obj_pycom.get_biological_process_list()
obj_pycom.get_cellular_component_list()
obj_pycom.get_developmental_stage_list()
obj_pycom.get_domain_list()
obj_pycom.get_ligand_list()
obj_pycom.get_ptm_list()
obj_pycom.get_molecular_function_list()
[48]:
name | |
---|---|
0 | Acetylcholine receptor inhibiting toxin |
1 | Actin-binding |
2 | Activator |
3 | Acyltransferase |
4 | Allosteric enzyme |
... | ... |
191 | Viral short tail ejection system |
192 | Viral exotoxin |
193 | Chloride channel impairing toxin |
194 | Proton-gated sodium channel impairing toxin |
195 | Translocase |
196 rows × 1 columns