{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting started: PyCoM Local\n",
"\n",
"PyCoM, the Python interface for PyCoMDB can both run locally and remotely.\n",
"- Run locally to run large-scale analyses. Requires 115GB of disk space for the database. \n",
"- Run remotely to run small-scale analyses. No disk space required. Follow this tutorial `00_Getting_Started_Remotely.ipynb`.\n",
"\n",
"This is a crash course on how to use the local variant of the Python interface for the PyCoM database.\n",
"1. [Installation](#installation)\n",
"2. [Initialise Pycom object](#initialise-pycom-object)\n",
"3. [Supported query keywords](#supported-query-keywords)\n",
"4. [Paginate the results](#paginate-the-results)\n",
"5. [Load coevolution matrices](#load-coevolution-matrices)\n",
"6. [Adding biological data to dataframe](#adding-biological-data-to-dataframe)\n",
"\n",
"More indepth tutorials are available here: [https://pycom.brunel.ac.uk/tutorials.html](https://pycom.brunel.ac.uk/tutorials.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Installation\n",
"\n",
"Install the PyCom package:\n",
"\n",
"`pip3 install git+https://github.com/scdantu/pycom`\n",
"\n",
"Note: **Requires Python 3.8 or higher**\n",
"\n",
"Download the `pycom.db` and `pycom.mat` files from [https://pycom.brunel.ac.uk/downloads](https://pycom.brunel.ac.uk/downloads)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialise PyCom object\n",
"\n",
"Import the required classes and create a PyCom object:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2023-07-15T16:43:49.476196Z",
"start_time": "2023-07-15T16:43:49.178452Z"
}
},
"outputs": [],
"source": [
"from pycom import PyCom, ProteinParams\n",
"\n",
"pyc = PyCom(db_path='~/docs/pycom.db', mat_path='~/docs/pycom.mat')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Query the database\n",
"\n",
"Query the database by passing a dictionary of keywords:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2023-07-15T16:43:49.543273Z",
"start_time": "2023-07-15T16:43:49.477430Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" uniprot_id | \n",
" neff | \n",
" sequence_length | \n",
" sequence | \n",
" organism_id | \n",
" helix_frac | \n",
" turn_frac | \n",
" strand_frac | \n",
" has_ptm | \n",
" has_pdb | \n",
" has_substrate | \n",
" matrix | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" P01111 | \n",
" 12.817 | \n",
" 189 | \n",
" MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | \n",
" 9606 | \n",
" 0.349206 | \n",
" 0.015873 | \n",
" 0.227513 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" P01112 | \n",
" 12.841 | \n",
" 189 | \n",
" MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | \n",
" 9606 | \n",
" 0.317460 | \n",
" 0.031746 | \n",
" 0.359788 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" P01116 | \n",
" 12.626 | \n",
" 189 | \n",
" MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | \n",
" 9606 | \n",
" 0.375661 | \n",
" 0.031746 | \n",
" 0.328042 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" P62070 | \n",
" 12.754 | \n",
" 204 | \n",
" MAAAGWRDGSGQEKYRLVVVGGGGVGKSALTIQFIQSYFVTDYDPT... | \n",
" 9606 | \n",
" 0.299020 | \n",
" 0.019608 | \n",
" 0.220588 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" None | \n",
"
\n",
" \n",
" 4 | \n",
" Q9UNW1 | \n",
" 9.554 | \n",
" 487 | \n",
" MLRAPGCLLRTSVAPAAALAAALLSSLARCSLLEPRDPVASSLSPY... | \n",
" 9606 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" uniprot_id neff sequence_length \\\n",
"0 P01111 12.817 189 \n",
"1 P01112 12.841 189 \n",
"2 P01116 12.626 189 \n",
"3 P62070 12.754 204 \n",
"4 Q9UNW1 9.554 487 \n",
"\n",
" sequence organism_id helix_frac \\\n",
"0 MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... 9606 0.349206 \n",
"1 MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... 9606 0.317460 \n",
"2 MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... 9606 0.375661 \n",
"3 MAAAGWRDGSGQEKYRLVVVGGGGVGKSALTIQFIQSYFVTDYDPT... 9606 0.299020 \n",
"4 MLRAPGCLLRTSVAPAAALAAALLSSLARCSLLEPRDPVASSLSPY... 9606 0.000000 \n",
"\n",
" turn_frac strand_frac has_ptm has_pdb has_substrate matrix \n",
"0 0.015873 0.227513 1 1 1 None \n",
"1 0.031746 0.359788 1 1 1 None \n",
"2 0.031746 0.328042 1 1 1 None \n",
"3 0.019608 0.220588 1 1 1 None \n",
"4 0.000000 0.000000 0 0 1 None "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entries = pyc.find({\n",
" ProteinParams.ENZYME: '3.*.*.*',\n",
" ProteinParams.DISEASE: 'cancer', # string search, case-insensitive\n",
"})\n",
"\n",
"entries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, query the database by passing keyword arguments:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2023-07-15T16:43:49.742472Z",
"start_time": "2023-07-15T16:43:49.544060Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" uniprot_id | \n",
" neff | \n",
" sequence_length | \n",
" sequence | \n",
" organism_id | \n",
" helix_frac | \n",
" turn_frac | \n",
" strand_frac | \n",
" has_ptm | \n",
" has_pdb | \n",
" has_substrate | \n",
" matrix | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" P11310 | \n",
" 9.930 | \n",
" 421 | \n",
" MAAGFGRCCRVLRSISRFHWRSQHTKANRQREPGLGFSFEFTEQQK... | \n",
" 9606 | \n",
" 0.517815 | \n",
" 0.016627 | \n",
" 0.180523 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" Q658P3 | \n",
" 9.677 | \n",
" 488 | \n",
" MPEEMDKPLISLHLVDSDSSLAKVPDEAPKVGILGSGDFARSLATR... | \n",
" 9606 | \n",
" 0.157787 | \n",
" 0.000000 | \n",
" 0.086066 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" Q16795 | \n",
" 10.997 | \n",
" 377 | \n",
" MAAAAQSRVVRVLSMSRSAITAIATSVCHGPPCRQLHHALMPHGKG... | \n",
" 9606 | \n",
" 0.363395 | \n",
" 0.037135 | \n",
" 0.124668 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" O95299 | \n",
" 9.244 | \n",
" 355 | \n",
" MALRLLKLAATSASARVVAAGAQRVRGIHSSVQCKLRYGMWHFLLG... | \n",
" 9606 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" None | \n",
"
\n",
" \n",
" 4 | \n",
" P13804 | \n",
" 8.627 | \n",
" 333 | \n",
" MFRAAAPGQLRRAASLLRFQSTLVIAEHANDSLAPITLNTITAATR... | \n",
" 9606 | \n",
" 0.300300 | \n",
" 0.027027 | \n",
" 0.333333 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" uniprot_id neff sequence_length \\\n",
"0 P11310 9.930 421 \n",
"1 Q658P3 9.677 488 \n",
"2 Q16795 10.997 377 \n",
"3 O95299 9.244 355 \n",
"4 P13804 8.627 333 \n",
"\n",
" sequence organism_id helix_frac \\\n",
"0 MAAGFGRCCRVLRSISRFHWRSQHTKANRQREPGLGFSFEFTEQQK... 9606 0.517815 \n",
"1 MPEEMDKPLISLHLVDSDSSLAKVPDEAPKVGILGSGDFARSLATR... 9606 0.157787 \n",
"2 MAAAAQSRVVRVLSMSRSAITAIATSVCHGPPCRQLHHALMPHGKG... 9606 0.363395 \n",
"3 MALRLLKLAATSASARVVAAGAQRVRGIHSSVQCKLRYGMWHFLLG... 9606 0.000000 \n",
"4 MFRAAAPGQLRRAASLLRFQSTLVIAEHANDSLAPITLNTITAATR... 9606 0.300300 \n",
"\n",
" turn_frac strand_frac has_ptm has_pdb has_substrate matrix \n",
"0 0.016627 0.180523 1 1 1 None \n",
"1 0.000000 0.086066 1 1 0 None \n",
"2 0.037135 0.124668 1 1 0 None \n",
"3 0.000000 0.000000 1 1 0 None \n",
"4 0.027027 0.333333 1 1 0 None "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entries = pyc.find(\n",
" cofactor='FAD', # string search, case-insensitive\n",
" has_ptm=True,\n",
" has_disease=True,\n",
")\n",
"\n",
"entries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Supported query keywords\n",
"* `uniprot_id`: The UniProt ID of the protein.\n",
"* `sequence`: The amino acid sequence of protein to search for. (full match)\n",
"* `min_length` / `max_length`: Min/Max number of residues in the protein.\n",
"* `min_helix` / `max_helix`: Min/Max percentage of helical structure in the protein.\n",
"* `min_turn` / `max_turn`: Min/Max percentage of turn structure in the protein.\n",
"* `min_strand` / `max_strand`: Min/Max percentage of beta strand structure in the protein.\n",
"* `organism`: Taxonomic name of the genus / species of the protein. (case-insensitive)\n",
" * Species name or any parent taxonomic level can be used. (`pyc.get_organism_list()` for full list)\n",
" * Surround with `:` to get precise results\n",
" * `:homo:` returns `Homo sapiens` & `Homo sapiens neanderthalensis`)\n",
" * `homo` also returns **homo**eomma, t**homo**mys, and *hundreds* others\n",
"* `organism_id`: Precise NCBI Taxonomy ID of the species of the protein. (prefer to use `organism` instead)\n",
"* `cath`: CATH classification of the protein (`3.40.50.360` or `3.40.*.*` or `3.*`).\n",
"* `enzyme`: Enzyme Commission number of the protein. (`1.3.1.3` or `1.3.*.*` or `1.*`).\n",
"* `has_substrate`: Whether the protein has a known substrate. (`True`/`False`)\n",
"* `has_ptm`: Whether the protein has a known post-translational modification. (`True`/`False`)\n",
"* `has_pbd`: Whether the protein has a known PDB structure. (`True`/`False`)\n",
"* `disease`: The disease associated with the protein. (name of disease, case-insensitive, e.g `cancer`)\n",
" * Use `pyc.get_disease_list()` for full list.\n",
" * `cancer` searches for `Ovarian cancer`, `Lung cancer`, ...\n",
"* `disease_id`: The ID of the disease associated with the protein. (`DI-02205`, get_disease_list()\n",
"* `has_disease`: Whether the protein is associated with a disease. (`True`/`False`)\n",
"* `cofactor`: The cofactor associated with the protein. (name of cofactor, case-insensitive, e.g `Zn(2+)`])\n",
"* `cofactor_id`: The ID of the cofactor associated with the protein. (`CHEBI:00001`, get_cofactor_list())\n",
"* `biological_process`: Biological process associated with the protein. (e.g `antiviral defense`, use `pyc.get_biological_process_list()` for full list)\n",
"* `cellular_component`: Cellular component associated with the protein. (e.g `nucleus`, use `pyc.get_cellular_component_list()` for full list\n",
"* `domain`: Domain associated with the protein. (e.g `zinc-finger`, use `pyc.get_domain_list()` for full list)\n",
"* `ligand`: Ligand associated with the protein. (e.g `zinc`, use `pyc.get_ligand_list()` for full list\n",
"* `molecular_function`: Molecular function associated with the protein. (e.g `antioxidant activity`, use `pyc.get_molecular_function_list()` for full list\n",
"* `ptm`: Post-translational modification associated with the protein. (e.g `phosphoprotein`, use `pyc.get_ptm_list()` for full list\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Paginate the results\n",
"\n",
"Before loading coevolution matrices, it is recommended to paginate the results, as the matrices can take up a lot of memory.\n",
"\n",
"Here is an example of making a large query, then paginating the results:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2023-07-15T16:43:49.766123Z",
"start_time": "2023-07-15T16:43:49.742886Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found 2958 entries with length <= 20\n",
"Found 100 entries on page 1\n"
]
}
],
"source": [
"entries = pyc.find(max_length=20)\n",
"print(f'Found {len(entries)} entries with length <= 20')\n",
"\n",
"page = pyc.paginate(entries, page=1, per_page=100) # get first n entries (default 100)\n",
"print(f'Found {len(page)} entries on page 1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load coevolution matrices\n",
"\n",
"Now the coevolution matrices can be loaded for the paginated results.\n",
"\n",
"This loads them into the `matrix` column of the dataframe."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2023-07-15T16:43:49.885056Z",
"start_time": "2023-07-15T16:43:49.782638Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.00000000e+00, 2.16066837e-07, 1.56462193e-07, 0.00000000e+00,\n",
" 0.00000000e+00],\n",
" [2.16066837e-07, 0.00000000e+00, 4.61935997e-07, 4.54485416e-07,\n",
" 4.54485416e-07],\n",
" [1.56462193e-07, 4.61935997e-07, 0.00000000e+00, 2.98023224e-07,\n",
" 2.98023224e-07],\n",
" [0.00000000e+00, 4.54485416e-07, 2.98023224e-07, 0.00000000e+00,\n",
" 2.23517418e-07],\n",
" [0.00000000e+00, 4.54485416e-07, 2.98023224e-07, 2.23517418e-07,\n",
" 0.00000000e+00]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pyc.load_matrices(page)\n",
"\n",
"page.iloc[0].matrix # show the coevolution matrix for the first entry"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, the matrices are loaded as a `numpy.ndarray`. Different formats can be specified.\n",
"\n",
"Here is an example of the matrices being loaded as Pandas DataFrames and 2d-lists:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-01T14:27:07.547109Z",
"start_time": "2024-02-01T14:27:07.384618Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Numpy: \n",
"Pandas: \n",
"List: \n"
]
}
],
"source": [
"from pycom import MatrixFormat\n",
"\n",
"resultsNumpy = pyc.load_matrices(page, mat_format=MatrixFormat.NUMPY) # default\n",
"resultsPandas = pyc.load_matrices(page, mat_format=MatrixFormat.PANDAS)\n",
"resultsList = pyc.load_matrices(page, mat_format=MatrixFormat.LIST)\n",
"\n",
"print(f'Numpy: {type(resultsNumpy.iloc[0].matrix)}')\n",
"print(f'Pandas: {type(resultsPandas.iloc[0].matrix)}')\n",
"print(f'List: {type(resultsList.iloc[0].matrix)}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding biological data to dataframe\n",
"\n",
"**This is supported in the local variant only!**\n",
"\n",
"PyCom contains a lot of additional protein annotation info. This is not loaded by default, but can be added it needed.\n",
"\n",
"The list of cofactors, diseases, and organisms can loaded by calling:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2023-07-15T16:43:49.927594Z",
"start_time": "2023-07-15T16:43:49.911153Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" cofactorId | \n",
" cofactorName | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" CHEBI:597326 | \n",
" pyridoxal 5'-phosphate | \n",
"
\n",
" \n",
" 1 | \n",
" CHEBI:18420 | \n",
" Mg(2+) | \n",
"
\n",
" \n",
" 2 | \n",
" CHEBI:60240 | \n",
" a divalent metal cation | \n",
"
\n",
" \n",
" 3 | \n",
" CHEBI:30413 | \n",
" heme | \n",
"
\n",
" \n",
" 4 | \n",
" CHEBI:29105 | \n",
" Zn(2+) | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 109 | \n",
" CHEBI:61721 | \n",
" chlorophyll b | \n",
"
\n",
" \n",
" 110 | \n",
" CHEBI:73095 | \n",
" divinyl chlorophyll a | \n",
"
\n",
" \n",
" 111 | \n",
" CHEBI:73096 | \n",
" divinyl chlorophyll b | \n",
"
\n",
" \n",
" 112 | \n",
" CHEBI:57453 | \n",
" (6S)-5,6,7,8-tetrahydrofolate | \n",
"
\n",
" \n",
" 113 | \n",
" CHEBI:30402 | \n",
" tungstopterin | \n",
"
\n",
" \n",
"
\n",
"
114 rows × 2 columns
\n",
"
"
],
"text/plain": [
" cofactorId cofactorName\n",
"0 CHEBI:597326 pyridoxal 5'-phosphate\n",
"1 CHEBI:18420 Mg(2+)\n",
"2 CHEBI:60240 a divalent metal cation\n",
"3 CHEBI:30413 heme\n",
"4 CHEBI:29105 Zn(2+)\n",
".. ... ...\n",
"109 CHEBI:61721 chlorophyll b\n",
"110 CHEBI:73095 divinyl chlorophyll a\n",
"111 CHEBI:73096 divinyl chlorophyll b\n",
"112 CHEBI:57453 (6S)-5,6,7,8-tetrahydrofolate\n",
"113 CHEBI:30402 tungstopterin\n",
"\n",
"[114 rows x 2 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cofactors = pyc.get_cofactor_list()\n",
"diseases = pyc.get_disease_list()\n",
"organisms = pyc.get_organism_list()\n",
"\n",
"cofactors"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-01T14:47:02.321024Z",
"start_time": "2024-02-01T14:46:47.586969Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"uniprot_id P15291\n",
"neff 7.854\n",
"sequence_length 398\n",
"sequence MRLREPLLSGSAAMPGASLQRACRLLVAVCALHLGVTLVYYLAGRD...\n",
"organism_id 9606\n",
"helix_frac 0.198492\n",
"turn_frac 0.030151\n",
"strand_frac 0.163317\n",
"has_ptm 1\n",
"has_pdb 1\n",
"has_substrate 1\n",
"matrix None\n",
"cofactor_x [Mn(2+)]\n",
"biological_process [Lipid metabolism]\n",
"cath_class 3.90.550.10\n",
"coding_sequence_diversity [Alternative initiation]\n",
"cofactor_y [Mn(2+)]\n",
"developmental_stage NaN\n",
"disease_name [Congenital disorder of glycosylation 2D]\n",
"disease_id [DI-00349]\n",
"enzyme_commission 2.4.1.-\n",
"ligand [Manganese, Metal-binding]\n",
"molecular_function [Glycosyltransferase, Transferase]\n",
"organism_name Homo sapiens\n",
"taxonomy [Eukaryota, Metazoa, Chordata, Craniata, Verte...\n",
"pdb_id [2AE7, 2AEC, 2AES, 2AGD, 2AH9, 2FY7, 2FYA, 2FY...\n",
"cellular_component [Cell membrane, Cell projection, Golgi apparat...\n",
"domain [Signal-anchor, Transmembrane, Transmembrane h...\n",
"ptm NaN\n",
"substrate [D-glucose + UDP-alpha-D-galactose = H(+) + la...\n",
"Name: 0, dtype: object"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader = pyc.get_data_loader()\n",
"\n",
"entries = pyc.find(uniprot_id='P15291')\n",
"\n",
"# Add the protein's cofactors to the dataframe\n",
"entries = loader.add_cofactors(entries)\n",
"\n",
"# The following functions are supported, data taken directly from UniProt\n",
"entries = loader.add_biological_processes(entries)\n",
"entries = loader.add_cath_class(entries) # Protein's CATH\n",
"entries = loader.add_coding_sequence_diversity(entries) # https://www.uniprot.org/help/keywords\n",
"entries = loader.add_cofactors(entries) # Cofactors\n",
"entries = loader.add_developmental_stage(entries)\n",
"entries = loader.add_diseases(entries) # The diseases associated with the protein\n",
"entries = loader.add_enzyme_commission(entries) # Protein's EC\n",
"entries = loader.add_ligand(entries) # Ligands\n",
"entries = loader.add_molecular_function(entries)\n",
"entries = loader.add_organism_name(entries)\n",
"entries = loader.add_organism_taxonomy(entries)\n",
"entries = loader.add_pdbs(entries) # Experimental PDB IDs of protein\n",
"entries = loader.add_protein_cellular_component(entries)\n",
"entries = loader.add_protein_domain(entries)\n",
"entries = loader.add_ptm(entries) # Protein's Post-translational modifications\n",
"entries = loader.add_substrates(entries) # Protein's substrates\n",
"\n",
"entries.iloc[0]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}