poli.objective_repository.gfp_cbas.cbas_alphabet_preprocessing

Functions

`convert_aas_to_idx_array`(X_aa)	Converts a list of amino acid sequences into an array of amino acid indices from AA_IDX.
`convert_idx_array_to_aas`(X_aa)	Converts an array containing indices of amino acids into the corresponding string amino acid sequences.
`convert_mutations_to_sequence`(base_seq, ...)	Given the wild type sequence and a formatted mutation string, returns the mutated sequence.
`get_argmax`(Xt_p)	Given a categorical probability distribution specifying the probability of amino acids at each position in a sequence, returns the most probable sequence
`get_balaji_predictions`(preds, Xt)	Given a set of predictors built according to the methods in the Balaji Lakshminarayanan paper 'Simple and scalable predictive uncertainty estimation using deep ensembles' (2017), returns the mean and variance of the total prediction.
`get_experimental_X_y`([random_state, ...])	For the GFP testing experiments.
`get_gfp_X_y_aa`(data_df[, functional_only, ...])	Converts the raw GFP data to a set of X and y values that are ready to use in a model
`get_gfp_base_seq`()	Returns the wild type GFP sequence
`get_samples`(Xt_p)	Samples from a categorical probability distribution specifying the probability of amino acids at each position in a sequence
`one_hot_encode_aa`(aa_str[, pad])	Returns a one hot encoded amino acid sequence
`one_hot_encode_aa_array`(X_aa)	One-hot encodes an array: (batch_size, L) -> (batch_size, L, alphabet_size)
`one_hot_encode_dna`(dna_str[, pad, base_order])	Convert length M string into M x 4 tokenized array
`partition_data`(X, y[, percentile, ...])	Partitions an (X, y) data set by a percentile of the y values
`read_gfp_data`([path, df_save_file])	Reads the GFP brightness data in a pandas DataFrame
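
The index, one-hot, argmax, and sampling helpers listed above all work on the same representations: a batch of sequences as strings, as an index array of shape (batch_size, L), or as a probability array of shape (batch_size, L, alphabet_size). The following is a minimal standalone sketch of those conversions in NumPy, not the module's own code. The alphabet ordering, the helper names, and the choice to return index arrays from the argmax and sampling helpers are all assumptions made for illustration; the module's `AA_IDX` mapping and return types may differ.

```python
import numpy as np

# Assumption: a 20-letter amino acid alphabet in this order. The module's own
# AA_IDX mapping may use a different ordering.
AA = "ARNDCQEGHILKMFPSTWYV"
AA_IDX = {aa: i for i, aa in enumerate(AA)}


def aas_to_idx_array(seqs):
    """Equal-length sequence strings -> int array of shape (batch_size, L)."""
    return np.array([[AA_IDX[aa] for aa in seq] for seq in seqs], dtype=int)


def idx_array_to_aas(X_idx):
    """Int array of shape (batch_size, L) -> list of sequence strings."""
    return ["".join(AA[i] for i in row) for row in X_idx]


def one_hot_encode_idx_array(X_idx):
    """(batch_size, L) -> (batch_size, L, alphabet_size)."""
    # Indexing the identity matrix with an index array selects one row per index.
    return np.eye(len(AA), dtype=int)[X_idx]


def argmax_indices(Xt_p):
    """Most probable residue index at each position of each sequence."""
    return np.argmax(Xt_p, axis=-1)


def sample_indices(Xt_p, rng):
    """Draw one residue index per position from its categorical distribution."""
    batch_size, length, alphabet_size = Xt_p.shape
    flat = Xt_p.reshape(-1, alphabet_size)
    draws = [rng.choice(alphabet_size, p=p) for p in flat]
    return np.array(draws).reshape(batch_size, length)


seqs = ["ACD", "WYV"]
X_idx = aas_to_idx_array(seqs)
X_oh = one_hot_encode_idx_array(X_idx)

# A one-hot array is a degenerate probability distribution, so both the argmax
# and a sample drawn from it recover the original indices.
X_argmax = argmax_indices(X_oh.astype(float))
X_sample = sample_indices(X_oh.astype(float), np.random.default_rng(0))
round_trip = idx_array_to_aas(X_argmax)
```

Keeping indices as the intermediate form makes the string and one-hot representations cheap to move between, since each is one lookup away.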
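
For `get_balaji_predictions`, the deep ensembles paper it cites treats the ensemble as a uniform mixture of Gaussians: the combined mean is the average of the member means, and the combined variance is the average of (member variance + squared member mean) minus the squared combined mean. The sketch below applies that formula to precomputed per-model means and variances. It is an illustration under that assumption rather than the function itself; `ensemble_mean_variance` is a hypothetical name, and the real function takes predictors and inputs (`preds`, `Xt`) instead of arrays.

```python
import numpy as np


def ensemble_mean_variance(mus, variances):
    """Combine per-model Gaussian predictions into one mean and variance.

    mus, variances: arrays of shape (n_models, n_points).
    Hypothetical helper: it assumes each model has already produced a
    predictive mean and variance for every input point.
    """
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = mus.mean(axis=0)
    # Mixture variance: E[sigma^2 + mu^2] - (E[mu])^2, averaged over models.
    var = (variances + mus ** 2).mean(axis=0) - mean ** 2
    return mean, var


# Two models, two input points.
mus = np.array([[1.0, 2.0], [3.0, 2.0]])
variances = np.array([[0.5, 0.1], [0.5, 0.3]])
mean, var = ensemble_mean_variance(mus, variances)
```

At the first point the models disagree (means 1 and 3), so the combined variance exceeds either member's variance; at the second point they agree, and the combined variance is just the average of the member variances.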