Coevolution Matrix Analysis

Warning

This documentation is a work in progress.

Quick Summary

`pycom.analysis.CoMAnalysis.add_contact_predictions`(df)	Takes in a dataframe containing coevolution matrices ('matrix' column) and adds a column with contact predictions ('contact_matrix' column).
`pycom.analysis.CoMAnalysis.scaled_matrix_to_contact_predictions`(...)	Takes in a scaled coevolution matrix and returns a contact prediction matrix.
`pycom.analysis.CoMAnalysis.get_top_contacts_from_coevolution`(...)	Returns the 'N' top-scoring residue pairs as a dataframe.
`pycom.analysis.CoMAnalysis.get_residue_frequencies`(...)	Calculates residue frequency counts from the list of top-scoring residue pairs.
`pycom.analysis.CoMAnalysis.calculate_scaled_coevolution_matrix`(matrix)	Scales the coevolution matrix by its average.
`pycom.analysis.CoMAnalysis.get_top_scoring_residues`(matrix)	Returns the top-scoring residue pairs using a percentile cut-off.
`pycom.analysis.CoMAnalysis.scale_and_normalise_coevolution_matrices`(df)	Scales each coevolution matrix by its average, sets all values below the average to 0 (giving S), and adds a column of matrices normalised by the maximum of all S in the data frame.
`pycom.analysis.CoMAnalysis.save_top_scoring_residue_pairs`(df)	Extracts the top-percentile (default: 90) pairs from the chosen matrix and writes them in sorted order to an ASCII text file.

Documentation

class pycom.analysis.CoMAnalysis[source]

CoMAnalysis provides functions to manipulate the coevolution matrices held in a pandas dataframe and to save the residue pairs of interest to an ASCII file.

add_contact_predictions(df: DataFrame, contact_factor: int = 1.5)[source]

Takes in a dataframe containing coevolution matrices (‘matrix’ column) and adds a column with contact predictions (‘contact_matrix’ column).

By default, the top 1.5xL pairs are returned as contact predictions, L being the length of the protein. This can be modified by changing the contact_factor parameter.

Parameters:

df – dataframe with coevolution matrices (matrix column)
contact_factor – multiplier applied to the protein length L to set how many pairs are returned (default: 1.5)

Returns:

dataframe with contact predictions (contact_matrix column)

static calculate_scaled_coevolution_matrix(matrix) → ndarray[source]

Scales the coevolution matrix by its average.

Parameters:

matrix

Returns:

scaled_matrix
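As an illustration of scaling by the average, here is a minimal NumPy sketch. It assumes "average" means the arithmetic mean over all matrix entries. The helper name `scale_by_average` is hypothetical and is not part of pycom; the library's own implementation may differ.

```python
import numpy as np


def scale_by_average(matrix):
    """Hypothetical helper: divide every entry by the mean of the matrix."""
    matrix = np.asarray(matrix, dtype=float)
    average = matrix.mean()
    if average == 0:
        raise ValueError("cannot scale a matrix whose average is zero")
    return matrix / average


# Small symmetric example matrix (made-up values).
example = np.array([[0.0, 1.0, 3.0],
                    [1.0, 0.0, 4.0],
                    [3.0, 4.0, 0.0]])
scaled_example = scale_by_average(example)
```

After scaling, the mean of the matrix is 1, so values above 1 mark pairs that score higher than average.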

get_residue_frequencies(top_residue_pairs)[source]

Calculates how often each residue occurs in the list of top-scoring residue pairs.

Parameters:

top_residue_pairs

Returns:

dataframe with residueID and count (df_res_count)
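The counting step can be sketched with the standard library alone. The input format is an assumption: the pairs are taken to be `(residue_i, residue_j)` tuples, and the result is a plain list rather than the dataframe that the method returns. `count_residues` is a hypothetical name, not a pycom function.

```python
from collections import Counter


def count_residues(pairs):
    """Hypothetical helper: count how often each residue occurs in the pairs."""
    counts = Counter()
    for res_i, res_j in pairs:
        counts[res_i] += 1
        counts[res_j] += 1
    # Most frequent residues first, as (residue_id, count) tuples.
    return counts.most_common()


# Made-up residue pairs for illustration.
top_pairs = [(3, 40), (3, 57), (12, 40), (3, 88)]
residue_counts = count_residues(top_pairs)
```

Residues that appear in many top-scoring pairs rise to the top of the list.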

get_top_contacts_from_coevolution(scaled_matrix, num_contact_factor=1.5)[source]

Returns the ‘N’ top-scoring residue pairs as a dataframe. By default, consistent with GREMLIN, the top 1.5xL pairs are returned, L being the length of the protein.

Parameters:

scaled_matrix
num_contact_factor

Returns:

data frame with top residue pairs
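The top 1.5xL selection can be sketched as follows. This assumes a square, symmetric matrix in which each pair is counted once (i < j) and the diagonal is ignored. `select_top_pairs` is a hypothetical helper that returns a list of tuples, not the dataframe produced by pycom.

```python
import numpy as np


def select_top_pairs(scaled_matrix, num_contact_factor=1.5):
    """Hypothetical helper: return the int(factor * L) best-scoring (i, j, score)."""
    scaled_matrix = np.asarray(scaled_matrix, dtype=float)
    length = scaled_matrix.shape[0]
    n_pairs = int(num_contact_factor * length)
    # Upper triangle without the diagonal: every pair once, with i < j.
    rows, cols = np.triu_indices(length, k=1)
    scores = scaled_matrix[rows, cols]
    order = np.argsort(scores)[::-1][:n_pairs]  # highest scores first
    return [(int(rows[k]), int(cols[k]), float(scores[k])) for k in order]


# Random symmetric matrix standing in for a scaled coevolution matrix.
rng = np.random.default_rng(0)
raw = rng.random((6, 6))
demo_matrix = (raw + raw.T) / 2
top = select_top_pairs(demo_matrix)
```

For a protein of length 6, the default factor of 1.5 yields 9 pairs.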

get_top_scoring_residues(matrix, res_gap=5, percentile=90)[source]

Returns the top-scoring residue pairs using a percentile cut-off.

Parameters:

matrix
res_gap
percentile

Returns:

data frame with top residue pairs
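A percentile cut-off can be sketched as below. Two assumptions are made that the text above does not confirm: `res_gap` is treated as a minimum sequence separation (only pairs with j - i >= res_gap are considered), and the percentile is computed over those remaining pairs. `pairs_above_percentile` is a hypothetical helper, not the pycom implementation.

```python
import numpy as np


def pairs_above_percentile(matrix, res_gap=5, percentile=90):
    """Hypothetical helper: keep pairs scoring at or above the given percentile."""
    matrix = np.asarray(matrix, dtype=float)
    # Assumption: res_gap is the minimum separation j - i between residues.
    rows, cols = np.triu_indices(matrix.shape[0], k=res_gap)
    scores = matrix[rows, cols]
    if scores.size == 0:
        return []
    cutoff = np.percentile(scores, percentile)
    keep = scores >= cutoff
    rows, cols, scores = rows[keep], cols[keep], scores[keep]
    order = np.argsort(scores)[::-1]  # highest scores first
    return [(int(rows[k]), int(cols[k]), float(scores[k])) for k in order]


# Random symmetric matrix standing in for a coevolution matrix.
rng = np.random.default_rng(0)
raw = rng.random((20, 20))
demo_matrix = (raw + raw.T) / 2
selected = pairs_above_percentile(demo_matrix)
```

Raising `percentile` tightens the selection, while raising `res_gap` removes more pairs that are close in sequence.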

save_top_scoring_residue_pairs(df: DataFrame, data_folder='output', matrix_type='matrix_S', res_gap=5, percentile=90)[source]

Extracts the top-percentile (default: 90) pairs from the chosen matrix and writes them in sorted order to an ASCII text file.

Parameters:

df – data frame with coevolution matrices
data_folder – output folder for all files
matrix_type – one of matrix, matrix_S or matrix_N (default: matrix_S)
res_gap – 5 (default)
percentile – 90 (default)

Returns:

None

static scale_and_normalise_coevolution_matrices(df: DataFrame, normalise_matrix=True) → DataFrame[source]

Scales each coevolution matrix by its average and sets all values below the average to 0, giving S. S is then scaled by the maximum of all S in the data frame, and the normalised matrices are added as an additional column.

Parameters:

df
normalise_matrix

Returns:

data frame with normalised and scaled coevolution matrix columns
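The two steps, scaling with a threshold and then normalising by a global maximum, can be sketched on a plain list of matrices. This is one interpretation, not the library's code: each matrix is divided by its own mean, entries below the average (below 1 after scaling) are set to 0, and every result is divided by the largest value found across all matrices. Both function names are hypothetical.

```python
import numpy as np


def scale_and_threshold(matrix):
    """Hypothetical helper: scale by the mean, then zero the below-average entries."""
    matrix = np.asarray(matrix, dtype=float)
    scaled = matrix / matrix.mean()
    scaled[scaled < 1.0] = 0.0  # after scaling, the average equals 1
    return scaled


def normalise_all(matrices):
    """Hypothetical helper: return (scaled, normalised) lists for all matrices."""
    scaled = [scale_and_threshold(m) for m in matrices]
    global_max = max(s.max() for s in scaled)
    normalised = [s / global_max for s in scaled]
    return scaled, normalised


# Two made-up matrices of different sizes.
matrices = [
    np.array([[0.0, 2.0],
              [2.0, 0.0]]),
    np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 10.0],
              [1.0, 10.0, 1.0]]),
]
scaled_list, normalised_list = normalise_all(matrices)
```

Dividing by a single maximum shared across all matrices puts them on a common 0 to 1 scale, so values can be compared between proteins.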

scaled_matrix_to_contact_predictions(scaled_matrix, num_contact_factor=1.5)[source]

Takes in a scaled coevolution matrix and returns a contact prediction matrix.

By default, the leading diagonal (self contacts) is not set to 1.

Parameters:

scaled_matrix – scaled coevolution matrix
num_contact_factor – 1.5 (default)

Returns:

contact prediction matrix
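Converting scores into a binary contact matrix can be sketched as below. It assumes a symmetric input, marks the top int(factor x L) pairs with 1 in both triangles, and leaves the diagonal at 0. `to_contact_matrix` is a hypothetical helper and may not match what pycom does internally.

```python
import numpy as np


def to_contact_matrix(scaled_matrix, num_contact_factor=1.5):
    """Hypothetical helper: mark the top int(factor * L) pairs as contacts (1)."""
    scaled_matrix = np.asarray(scaled_matrix, dtype=float)
    length = scaled_matrix.shape[0]
    rows, cols = np.triu_indices(length, k=1)  # i < j, diagonal excluded
    n_pairs = min(int(num_contact_factor * length), rows.size)
    order = np.argsort(scaled_matrix[rows, cols])[::-1][:n_pairs]
    contacts = np.zeros((length, length), dtype=int)
    contacts[rows[order], cols[order]] = 1
    contacts[cols[order], rows[order]] = 1  # mirror to keep the matrix symmetric
    return contacts


# Random symmetric matrix standing in for a scaled coevolution matrix.
rng = np.random.default_rng(1)
raw = rng.random((6, 6))
demo_scaled = (raw + raw.T) / 2
contact_matrix = to_contact_matrix(demo_scaled)
```

Each predicted contact appears twice, once in each triangle, so the matrix sums to twice the number of selected pairs.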