Coevolution Matrix Analysis
Warning
The documentation site is a work in progress.
Quick Summary
- add_contact_predictions: Takes in a dataframe containing coevolution matrices ('matrix' column) and adds a column with contact predictions ('contact_matrix' column).
- scaled_matrix_to_contact_predictions: Takes in a scaled coevolution matrix and returns a contact prediction matrix.
- get_top_contacts_from_coevolution: Returns the 'N' top scoring residues as a dataframe.
- get_residue_frequencies: Calculates residue frequency counts from the list of top-scoring residue pairs.
- calculate_scaled_coevolution_matrix: Scales a coevolution matrix by its average.
- get_top_scoring_residues: Returns top scoring residues using a percentile cut-off.
- scale_and_normalise_coevolution_matrices: Scales each coevolution matrix by its average, sets all values below the average to 0 (giving S), scales S by the maximum of all S in the data frame and adds a column of normalised matrices.
- save_top_scoring_residue_pairs: Extracts the top percentile (default: 90) pairs from the chosen matrix and writes them in sorted order to an ASCII txt file.
Documentation
- class pycom.analysis.CoMAnalysis[source]
CoMAnalysis provides functions to manipulate coevolution matrices stored in a pandas dataframe and to save the top-scoring residue pairs to an ASCII file.
- add_contact_predictions(df: DataFrame, contact_factor: int = 1.5)[source]
Takes in a dataframe containing coevolution matrices (‘matrix’ column) and adds a column with contact predictions (‘contact_matrix’ column).
By default, the top 1.5xL pairs are returned as contact predictions, L being the length of the protein. This can be modified by changing the contact_factor parameter.
- Parameters:
df – dataframe with coevolution matrices (matrix column)
contact_factor – multiplier applied to the protein length L to set the number of predicted contacts (default: 1.5)
- Returns:
dataframe with contact predictions (contact_matrix column)
- static calculate_scaled_coevolution_matrix(matrix) ndarray [source]
Scales a coevolution matrix by its average.
- Parameters:
matrix – coevolution matrix
- Returns:
scaled_matrix
- get_residue_frequencies(top_residue_pairs)[source]
Calculates residue frequency counts from the list of top-scoring residue pairs.
- Parameters:
top_residue_pairs – list of top-scoring residue pairs
- Returns:
dataframe (df_res_count) with residueID and count
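The counting step can be sketched with plain pandas. This is only an illustration, not the pycom implementation: the helper name `count_residue_frequencies` and the assumption that pairs arrive as `(i, j)` tuples are hypothetical, while the `residueID` and `count` column names follow the return description above.

```python
import pandas as pd


def count_residue_frequencies(pairs):
    """Hypothetical helper: count how often each residue appears in (i, j) pairs."""
    # Flatten the pairs so every residue occurrence becomes one entry.
    flat = [residue for pair in pairs for residue in pair]
    counts = pd.Series(flat).value_counts()
    df_res_count = counts.rename_axis("residueID").reset_index(name="count")
    # Most frequent residues first; ties broken by residue ID for a stable order.
    return df_res_count.sort_values(
        ["count", "residueID"], ascending=[False, True]
    ).reset_index(drop=True)


pairs = [(3, 10), (3, 25), (10, 40)]  # assumed input format
df_res_count = count_residue_frequencies(pairs)
print(df_res_count)
```

Residues that take part in many top-scoring pairs end up at the top of the table.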
- get_top_contacts_from_coevolution(scaled_matrix, num_contact_factor=1.5)[source]
Returns the ‘N’ top scoring residues as a dataframe. By default, consistent with GREMLIN, the top 1.5xL pairs are returned, L being the length of the protein.
- Parameters:
scaled_matrix – scaled coevolution matrix
num_contact_factor – 1.5 (default)
- Returns:
data frame with top residue pairs
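A minimal sketch of selecting the top 1.5xL pairs, assuming a symmetric L x L matrix and 0-based residue indices. The function name `top_pairs` and the column names `residue_i`, `residue_j` and `score` are placeholders chosen for this example and may differ from what pycom returns.

```python
import numpy as np
import pandas as pd


def top_pairs(scaled_matrix, num_contact_factor=1.5):
    """Hypothetical helper: return the highest-scoring (i, j) pairs with i < j."""
    m = np.asarray(scaled_matrix, dtype=float)
    length = m.shape[0]
    n_pairs = int(num_contact_factor * length)
    # Upper triangle without the diagonal, so each pair is considered once.
    i_idx, j_idx = np.triu_indices(length, k=1)
    scores = m[i_idx, j_idx]
    order = np.argsort(scores)[::-1][:n_pairs]  # indices of the largest scores
    return pd.DataFrame(
        {"residue_i": i_idx[order], "residue_j": j_idx[order], "score": scores[order]}
    )


example = np.zeros((4, 4))
example[0, 3] = example[3, 0] = 0.9
example[1, 2] = example[2, 1] = 0.7
example[0, 1] = example[1, 0] = 0.2
top = top_pairs(example, num_contact_factor=0.5)  # 0.5 x 4 = 2 pairs
print(top)
```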
- get_top_scoring_residues(matrix, res_gap=5, percentile=90)[source]
Returns top scoring residues using a percentile cut-off.
- Parameters:
matrix – coevolution matrix
res_gap – 5 (default)
percentile – 90 (default)
- Returns:
data frame with top residue pairs
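The percentile cut-off might look roughly like the sketch below. It rests on two assumptions that the documentation does not confirm: that res_gap is a minimum sequence separation (j - i >= res_gap), and that the percentile is computed over the pairs remaining after that filter. The function and column names are hypothetical.

```python
import numpy as np
import pandas as pd


def pairs_above_percentile(matrix, res_gap=5, percentile=90):
    """Hypothetical helper: keep pairs scoring at or above a percentile cut-off."""
    m = np.asarray(matrix, dtype=float)
    # Assumption: res_gap is the minimum separation j - i between paired residues.
    i_idx, j_idx = np.triu_indices(m.shape[0], k=res_gap)
    scores = m[i_idx, j_idx]
    if scores.size == 0:  # matrix too small for the requested gap
        return pd.DataFrame(columns=["residue_i", "residue_j", "score"])
    cutoff = np.percentile(scores, percentile)
    keep = scores >= cutoff
    df = pd.DataFrame(
        {"residue_i": i_idx[keep], "residue_j": j_idx[keep], "score": scores[keep]}
    )
    return df.sort_values("score", ascending=False).reset_index(drop=True)


rng = np.random.default_rng(0)
raw = rng.random((20, 20))
symmetric = (raw + raw.T) / 2  # toy symmetric matrix standing in for real data
top = pairs_above_percentile(symmetric, res_gap=5, percentile=90)
print(top.head())
```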
- save_top_scoring_residue_pairs(df: DataFrame, data_folder='output', matrix_type='matrix_S', res_gap=5, percentile=90)[source]
Extract top percentile (default: 90) pairs from the chosen matrix and write them in sorted order to an ASCII txt file
- Parameters:
df – data frame with coevolution matrices
data_folder – output folder for all files
matrix_type – column to use: matrix, matrix_S or matrix_N (default: matrix_S)
res_gap – 5 (default)
percentile – 90 (default)
- Returns:
None
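Writing sorted pairs to an ASCII text file can be sketched as follows. The file name, the one-pair-per-line layout and the column names are assumptions made for this example; the actual output format of save_top_scoring_residue_pairs is not described here.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_pairs(top_pairs, path):
    """Hypothetical helper: write pairs to a text file, highest score first."""
    ordered = top_pairs.sort_values("score", ascending=False)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the output folder
    with path.open("w", encoding="ascii") as handle:
        for row in ordered.itertuples(index=False):
            handle.write(f"{row.residue_i} {row.residue_j} {row.score:.4f}\n")
    return path


top = pd.DataFrame({"residue_i": [2, 0], "residue_j": [9, 7], "score": [0.4, 0.8]})
out_file = write_pairs(top, Path(tempfile.mkdtemp()) / "output" / "example_pairs.txt")
lines = out_file.read_text(encoding="ascii").splitlines()
print(lines)
```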
- static scale_and_normalise_coevolution_matrices(df: DataFrame, normalise_matrix=True) DataFrame [source]
Scales each coevolution matrix by its average and sets all values below the average to 0, giving S. S is then scaled by the maximum of all S in the data frame, and an additional column of normalised matrices is added.
- Parameters:
df – data frame with coevolution matrices
normalise_matrix – True (default)
- Returns:
data frame with normalised and scaled coevolution matrix columns
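One possible reading of the scaling and normalisation steps is sketched below. It assumes that "scaling by average" means dividing by the matrix mean, that scores are non-negative, and that the results are stored in matrix_S and matrix_N columns (names taken from the matrix_type options of save_top_scoring_residue_pairs). The helper names are hypothetical and the real implementation may differ.

```python
import numpy as np
import pandas as pd


def scale_matrix(matrix):
    """Assumed scaling: divide by the mean and zero every entry below the mean."""
    m = np.asarray(matrix, dtype=float)
    mean = m.mean()
    if mean == 0:
        return np.zeros_like(m)
    scaled = m / mean
    scaled[m < mean] = 0.0
    return scaled


def scale_and_normalise(df, normalise_matrix=True):
    """Hypothetical helper: add matrix_S and, optionally, matrix_N columns."""
    out = df.copy()
    out["matrix_S"] = out["matrix"].apply(scale_matrix)
    if normalise_matrix:
        # One global maximum across all scaled matrices in the data frame.
        global_max = max(s.max() for s in out["matrix_S"])
        out["matrix_N"] = out["matrix_S"].apply(
            lambda s: s / global_max if global_max > 0 else s.copy()
        )
    return out


df = pd.DataFrame(
    {"matrix": [np.array([[0.0, 2.0], [2.0, 4.0]]), np.array([[0.0, 0.0], [0.0, 8.0]])]}
)
result = scale_and_normalise(df)
print(result["matrix_N"][0])
```

Dividing by a single global maximum keeps the normalised values of different proteins on a common 0 to 1 scale.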
- scaled_matrix_to_contact_predictions(scaled_matrix, num_contact_factor=1.5)[source]
Takes in a scaled coevolution matrix and returns a contact prediction matrix
By default, the leading diagonal (self contacts) is not set to 1.
- Parameters:
scaled_matrix – scaled coevolution matrix
num_contact_factor – 1.5 (default)
- Returns:
contact prediction matrix
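As an illustration of turning scores into a binary contact map, the sketch below marks the top num_contact_factor x L pairs with 1 and leaves everything else, including the diagonal, at 0. It assumes a symmetric input matrix and reads only its upper triangle; `contact_matrix_from_scaled` is a hypothetical name, not the pycom method.

```python
import numpy as np


def contact_matrix_from_scaled(scaled_matrix, num_contact_factor=1.5):
    """Hypothetical helper: binary L x L matrix flagging the top-scoring pairs."""
    m = np.asarray(scaled_matrix, dtype=float)
    length = m.shape[0]
    n_pairs = int(num_contact_factor * length)
    # k=1 skips the leading diagonal, so self contacts are never selected.
    i_idx, j_idx = np.triu_indices(length, k=1)
    order = np.argsort(m[i_idx, j_idx])[::-1][:n_pairs]
    contacts = np.zeros((length, length), dtype=int)
    contacts[i_idx[order], j_idx[order]] = 1
    contacts[j_idx[order], i_idx[order]] = 1  # mirror to keep the map symmetric
    return contacts


scaled = np.zeros((4, 4))
scaled[0, 3] = scaled[3, 0] = 0.9
scaled[1, 2] = scaled[2, 1] = 0.7
scaled[0, 1] = scaled[1, 0] = 0.2
contacts = contact_matrix_from_scaled(scaled, num_contact_factor=0.5)
print(contacts)
```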