Coevolution Matrix Analysis

Warning

This documentation is a work in progress.

Quick Summary

pycom.analysis.CoMAnalysis.add_contact_predictions(df)

Takes a dataframe containing coevolution matrices ('matrix' column) and adds a column with contact predictions ('contact_matrix' column).

pycom.analysis.CoMAnalysis.scaled_matrix_to_contact_predictions(...)

Takes a scaled coevolution matrix and returns a contact prediction matrix.

pycom.analysis.CoMAnalysis.get_top_contacts_from_coevolution(...)

Returns the 'N' top-scoring residue pairs as a dataframe.

pycom.analysis.CoMAnalysis.get_residue_frequencies(...)

Calculates residue frequency counts from the list of top-scoring residue pairs.

pycom.analysis.CoMAnalysis.calculate_scaled_coevolution_matrix(matrix)

Scales a coevolution matrix by its average and returns the scaled matrix.

pycom.analysis.CoMAnalysis.get_top_scoring_residues(matrix)

Returns the top-scoring residue pairs using a percentile cut-off.

pycom.analysis.CoMAnalysis.scale_and_normalise_coevolution_matrices(df)

Scales each coevolution matrix by its average and sets all values below the average to 0, giving S. S is then scaled by the maximum of all S in the data frame, and the normalised matrices are added as an additional column. Returns the data frame with normalised and scaled coevolution matrix columns.

pycom.analysis.CoMAnalysis.save_top_scoring_residue_pairs(df)

Extracts the top-percentile (default: 90) pairs from the chosen matrix and writes them in sorted order to an ASCII text file.

Documentation

class pycom.analysis.CoMAnalysis[source]

CoMAnalysis provides functions to manipulate coevolution matrices held in a pandas dataframe and to save the top-scoring residue pairs to an ASCII file.

add_contact_predictions(df: DataFrame, contact_factor: int = 1.5)[source]

Takes a dataframe containing coevolution matrices (‘matrix’ column) and adds a column with contact predictions (‘contact_matrix’ column).

By default, the top 1.5xL pairs are returned as contact predictions, L being the length of the protein. This can be modified by changing the contact_factor parameter.

Parameters:
  • df – dataframe with coevolution matrices (matrix column)

  • contact_factor – multiplier applied to L to set the number of predicted contacts (default: 1.5)

Returns:

dataframe with contact predictions (contact_matrix column)

static calculate_scaled_coevolution_matrix(matrix) ndarray[source]

Scales a coevolution matrix by its average.

Parameters:

matrix

Returns:

scaled_matrix
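The scaling step can be illustrated with plain NumPy. This is a minimal sketch, not the library's code: it assumes that "scaling by average" means dividing every entry by the mean over all entries of the matrix, and the function name scale_by_average is hypothetical.

```python
import numpy as np


def scale_by_average(matrix):
    """Divide every entry by the matrix mean (assumed reading of 'scale by average')."""
    matrix = np.asarray(matrix, dtype=float)
    average = matrix.mean()
    if average == 0:
        raise ValueError("matrix mean is zero; cannot scale")
    return matrix / average


example = np.array([[0.0, 2.0],
                    [2.0, 4.0]])       # toy matrix with mean 2.0
scaled = scale_by_average(example)     # entries become 0, 1, 1, 2
```

After this kind of scaling the matrix mean is 1, which makes scores from proteins of different lengths easier to compare.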

get_residue_frequencies(top_residue_pairs)[source]

Calculates residue frequency counts from the list of top-scoring residue pairs.

Parameters:

top_residue_pairs – list of top-scoring residue pairs

Returns:

dataframe of residue IDs and their counts (df_res_count)
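The counting idea can be sketched with the standard library alone. The sketch below assumes each pair is a tuple of two residue indices and returns a sorted list of (residue, count) tuples rather than a dataframe; count_residue_frequencies is a hypothetical name, not part of pycom.

```python
from collections import Counter


def count_residue_frequencies(pairs):
    """Count how many of the given (i, j) pairs each residue index appears in."""
    counts = Counter()
    for i, j in pairs:
        counts[i] += 1
        counts[j] += 1
    # Highest count first; ties broken by residue index for a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


pairs = [(3, 17), (3, 42), (17, 42), (3, 58)]      # made-up example pairs
frequencies = count_residue_frequencies(pairs)
# frequencies == [(3, 3), (17, 2), (42, 2), (58, 1)]
```

Residues that appear in many top-scoring pairs, such as residue 3 in this toy input, are the ones such a count highlights.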

get_top_contacts_from_coevolution(scaled_matrix, num_contact_factor=1.5)[source]

Returns the ‘N’ top-scoring residue pairs as a dataframe. By default, consistent with GREMLIN, the top 1.5xL pairs are returned, L being the length of the protein.

Parameters:
  • scaled_matrix – scaled coevolution matrix

  • num_contact_factor – 1.5 (default)

Returns:

data frame with top residue pairs
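Selecting the top 1.5xL pairs can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the matrix is taken to be symmetric, residue indices are 0-based, self-pairs are excluded, and top_contacts is a hypothetical helper that returns a list of tuples rather than a dataframe.

```python
import numpy as np


def top_contacts(scaled_matrix, num_contact_factor=1.5):
    """Return up to num_contact_factor * L pairs as (i, j, score), best first."""
    scaled_matrix = np.asarray(scaled_matrix, dtype=float)
    length = scaled_matrix.shape[0]                 # L, the number of residues
    n_pairs = int(num_contact_factor * length)
    # Upper triangle without the diagonal, so each residue pair appears once
    rows, cols = np.triu_indices(length, k=1)
    scores = scaled_matrix[rows, cols]
    order = np.argsort(scores)[::-1][:n_pairs]      # largest scores first
    return [(int(rows[k]), int(cols[k]), float(scores[k])) for k in order]


rng = np.random.default_rng(0)
raw = rng.random((10, 10))
symmetric = (raw + raw.T) / 2                       # toy symmetric scores, L = 10
pairs = top_contacts(symmetric)                     # 15 pairs for L = 10
```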

get_top_scoring_residues(matrix, res_gap=5, percentile=90)[source]

Returns the top-scoring residue pairs using a percentile cut-off.

Parameters:
  • matrix – coevolution matrix

  • res_gap – 5 (default)

  • percentile – 90 (default)

Returns:

data frame with top residue pairs
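A percentile cut-off of this kind might look like the sketch below. It rests on two assumptions that the documentation does not confirm: res_gap is read as the minimum sequence separation j - i between the two residues of a pair, and the percentile is computed only over pairs that satisfy that separation. The name top_scoring_pairs is hypothetical.

```python
import numpy as np


def top_scoring_pairs(matrix, res_gap=5, percentile=90):
    """Return (i, j, score) for pairs at or above the percentile cut-off, best first."""
    matrix = np.asarray(matrix, dtype=float)
    length = matrix.shape[0]
    # Upper-triangle pairs with j - i >= res_gap (assumed meaning of res_gap)
    rows, cols = np.triu_indices(length, k=res_gap)
    scores = matrix[rows, cols]
    if scores.size == 0:
        return []                                   # protein shorter than the gap
    cutoff = np.percentile(scores, percentile)
    keep = scores >= cutoff
    rows, cols, scores = rows[keep], cols[keep], scores[keep]
    order = np.argsort(scores)[::-1]                # sort kept pairs, largest first
    return [(int(rows[k]), int(cols[k]), float(scores[k])) for k in order]


rng = np.random.default_rng(1)
raw = rng.random((20, 20))
symmetric = (raw + raw.T) / 2                       # toy symmetric scores, L = 20
pairs = top_scoring_pairs(symmetric, res_gap=5, percentile=90)
```

Unlike a fixed 1.5xL count, the number of pairs kept here grows with the number of candidate pairs, because a fixed fraction of them passes the cut-off.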

save_top_scoring_residue_pairs(df: DataFrame, data_folder='output', matrix_type='matrix_S', res_gap=5, percentile=90)[source]

Extracts the top-percentile (default: 90) pairs from the chosen matrix and writes them in sorted order to an ASCII text file.

Parameters:
  • df – data frame with coevolution matrices

  • data_folder – output folder for all files

  • matrix_type – one of matrix, matrix_S or matrix_N (default: matrix_S)

  • res_gap – 5 (default)

  • percentile – 90 (default)

Returns:

None

static scale_and_normalise_coevolution_matrices(df: DataFrame, normalise_matrix=True) DataFrame[source]

Scales each coevolution matrix by its average and sets all values below the average to 0, giving S. S is then scaled by the maximum of all S in the data frame, and the normalised matrices are added as an additional column.

Parameters:
  • df

  • normalise_matrix

Returns:

data frame with normalised and scaled coevolution matrix columns
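The two-stage procedure can be sketched on a plain list of matrices. This is an illustration under assumptions rather than the library's code: "scaled by average" is read as division by each matrix's own mean, entries whose raw value is below that mean are zeroed to give S, and the normalised matrices are S divided by the largest value found in any S. The function scale_and_normalise is hypothetical and works on a list instead of dataframe columns.

```python
import numpy as np


def scale_and_normalise(matrices):
    """Return (scaled, normalised) lists following the assumed two-stage procedure."""
    scaled = []
    for matrix in matrices:
        matrix = np.asarray(matrix, dtype=float)
        average = matrix.mean()
        if average == 0:
            raise ValueError("matrix mean is zero; cannot scale")
        s = matrix / average
        s[matrix < average] = 0.0          # drop entries below the matrix average
        scaled.append(s)
    global_max = max(s.max() for s in scaled)      # maximum over every S
    normalised = [s / global_max for s in scaled]
    return scaled, normalised


m1 = np.array([[0.0, 2.0], [2.0, 4.0]])    # mean 2.0
m2 = np.array([[1.0, 1.0], [1.0, 5.0]])    # mean 2.0
scaled, normalised = scale_and_normalise([m1, m2])
# scaled[1] is [[0, 0], [0, 2.5]]; normalised[0] is [[0, 0.4], [0.4, 0.8]]
```

Dividing by a single maximum taken across all matrices puts every normalised matrix on a shared 0 to 1 scale, so values can be compared between proteins.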

scaled_matrix_to_contact_predictions(scaled_matrix, num_contact_factor=1.5)[source]

Takes a scaled coevolution matrix and returns a contact prediction matrix.

By default, the leading diagonal (self-contacts) is not set to 1.

Parameters:
  • scaled_matrix – scaled coevolution matrix

  • num_contact_factor – 1.5 (default)

Returns:

contact prediction matrix
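Turning scores into a binary contact matrix can be sketched as follows. The sketch assumes a symmetric input, marks the top num_contact_factor x L off-diagonal pairs with 1 in both triangles, and leaves the leading diagonal at 0; contact_matrix_from_scores is a hypothetical name and the library's output format may differ.

```python
import numpy as np


def contact_matrix_from_scores(scaled_matrix, num_contact_factor=1.5):
    """Binary L x L matrix with 1 at the top-scoring pairs and 0 elsewhere."""
    scaled_matrix = np.asarray(scaled_matrix, dtype=float)
    length = scaled_matrix.shape[0]
    n_pairs = int(num_contact_factor * length)
    # Off-diagonal upper triangle: self-contacts are never selected
    rows, cols = np.triu_indices(length, k=1)
    order = np.argsort(scaled_matrix[rows, cols])[::-1][:n_pairs]
    contacts = np.zeros((length, length), dtype=int)
    contacts[rows[order], cols[order]] = 1
    contacts[cols[order], rows[order]] = 1          # mirror to keep it symmetric
    return contacts


rng = np.random.default_rng(2)
raw = rng.random((10, 10))
symmetric = (raw + raw.T) / 2                       # toy symmetric scores, L = 10
contacts = contact_matrix_from_scores(symmetric)    # 15 pairs, 30 nonzero entries
```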