poli 🧪: a library of discrete objective functions

poli is a library of discrete objective functions for benchmarking optimization algorithms. It offers:

Isolation of black-box function calls inside conda environments. Don't worry about clashes with black-box requirements: poli will create the relevant conda environments for you.
Logging of each black-box call using observers.
A numpy interface: inputs are np.arrays of strings, outputs are np.arrays of floats.
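The sketch below shows the numpy interface and the observer idea together. It is self-contained and does not use poli. `ToyBlackBox`, `CallLogger` and `observe` are hypothetical names, not poli's API. The shapes are an assumption made for illustration: a 2D array of single-character strings goes in, and a column of floats comes out.

```python
from typing import Optional

import numpy as np


class CallLogger:
    """Hypothetical observer that records every (input, output) pair it is shown."""

    def __init__(self) -> None:
        self.history: list[tuple[np.ndarray, np.ndarray]] = []

    def observe(self, x: np.ndarray, y: np.ndarray) -> None:
        # Store copies so later changes to x or y do not alter the log.
        self.history.append((x.copy(), y.copy()))


class ToyBlackBox:
    """Hypothetical black box that scores each sequence by its number of "A"s."""

    def __init__(self, observer: Optional[CallLogger] = None) -> None:
        self.observer = observer

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: shape (batch_size, sequence_length), one character per entry.
        y = (x == "A").sum(axis=1, keepdims=True).astype(float)
        if self.observer is not None:
            self.observer.observe(x, y)  # log every call
        return y  # shape (batch_size, 1), floats


logger = CallLogger()
f = ToyBlackBox(observer=logger)
x = np.array([list("ALOHA"), list("BANAN")])
y = f(x)
print(y.ravel())            # [2. 2.]
print(len(logger.history))  # 1
```

Keeping the observer separate from the objective means logging can be switched on or off without touching the scoring code.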

We also provide poli-baselines, a collection of optimizers for these discrete black-box functions.

We are running a benchmark!

Using poli and poli-baselines, we are running a benchmark comparing high-dimensional Bayesian optimization algorithms for discrete sequences.

Getting started

A good place to start is the next chapter! Go to Getting Started.

To install poli and poli-baselines, we recommend creating a fresh conda environment:

conda create -n poli-base python=3.10
conda activate poli-base
pip install poli-core
pip install git+https://github.com/MachineLearningLifeScience/poli-baselines.git@main

poli also runs on Google Colab. Here is a small example of how to run one of the objective functions.

Black-box objective functions

For a full list, click here.

Toy problems

White noise

White noise drawn from a unit Gaussian

./using_poli/objective_repository/white_noise.html
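The sketch below shows what such an objective computes. Following the description, it assumes the output is a standard normal draw that ignores the input. The function name and its `seed` parameter are illustrative, not poli's API.

```python
from typing import Optional

import numpy as np


def white_noise(x: np.ndarray, seed: Optional[int] = None) -> np.ndarray:
    """Hypothetical white-noise objective: one N(0, 1) draw per input row."""
    rng = np.random.default_rng(seed)
    # The contents of x are ignored; only the batch size is used.
    return rng.standard_normal(size=(x.shape[0], 1))


x = np.array([list("ABC"), list("XYZ"), list("ABC")])
y = white_noise(x, seed=0)
print(y.shape)  # (3, 1)
```

Identical sequences (rows 0 and 2) receive independent draws, so no optimizer can beat chance on this problem. That makes it a useful sanity check.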

Aloha

A toy example about optimizing 5-letter words to spell “ALOHA”

./using_poli/objective_repository/aloha.html
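Below is a minimal sketch of one natural scoring rule for this toy problem: count the positions where a 5-letter word matches "ALOHA". That this is exactly poli's scoring rule is an assumption, and `aloha_score` is a hypothetical name.

```python
import numpy as np

TARGET = np.array(list("ALOHA"))


def aloha_score(x: np.ndarray) -> np.ndarray:
    """Count, for each row of x, the positions that match the target word."""
    # x has shape (batch_size, 5); the comparison broadcasts TARGET across rows.
    return (x == TARGET).sum(axis=1, keepdims=True).astype(float)


x = np.array([list("ALOHA"), list("ALOHO"), list("ZZZZZ")])
print(aloha_score(x).ravel())  # [5. 4. 0.]
```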

Toy continuous problems

The usual benchmark functions for continuous optimization (e.g. easom, or ackley_function_01)

./using_poli/objective_repository/toy_continuous_problems.html
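For reference, here are the textbook definitions of two of these functions, written as minimization problems and evaluated row-wise on a batch. poli may use a different sign convention (for example, negating them for maximization), different constants, or a different domain. Treat these as illustrations, not as poli's implementation.

```python
import numpy as np


def ackley(x: np.ndarray, a: float = 20.0, b: float = 0.2, c: float = 2 * np.pi) -> np.ndarray:
    """Textbook Ackley function; global minimum 0 at the origin."""
    d = x.shape[1]
    term1 = -a * np.exp(-b * np.sqrt(np.sum(x**2, axis=1) / d))
    term2 = -np.exp(np.sum(np.cos(c * x), axis=1) / d)
    return term1 + term2 + a + np.e


def easom(x: np.ndarray) -> np.ndarray:
    """Textbook 2D Easom function; global minimum -1 at (pi, pi)."""
    x1, x2 = x[:, 0], x[:, 1]
    return -np.cos(x1) * np.cos(x2) * np.exp(-((x1 - np.pi) ** 2 + (x2 - np.pi) ** 2))


origin = np.zeros((1, 3))
print(ackley(origin))  # close to 0, up to floating-point error
print(easom(np.array([[np.pi, np.pi]])))  # close to -1
```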

Small molecules

Albuterol Similarity (using tdc)

The Therapeutics Data Commons’ implementation of the Albuterol similarity oracle of GuacaMol.

./using_poli/objective_repository/albuterol_similarity.html
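GuacaMol's similarity and rediscovery oracles score a molecule by how similar its fingerprint is to that of a target drug. The usual measure is Tanimoto similarity. The sketch below shows only that core computation, with each fingerprint given as a set of on-bit indices. GuacaMol's choice of fingerprint type and its score modifiers are omitted, and the bit sets are made up for illustration.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention for two empty fingerprints (an assumption)
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Hypothetical bit sets standing in for real molecular fingerprints.
candidate = {1, 4, 7, 9}
target = {1, 4, 8, 9, 12}
print(tanimoto(candidate, target))  # 0.5 (3 shared bits out of 6 in total)
```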

Amlodipine MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Amlodipine MPO oracle of GuacaMol.

./using_poli/objective_repository/amlodipine_mpo.html
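GuacaMol's multi-property objective (MPO) tasks combine several per-property scores in [0, 1] into a single score, typically through a geometric mean. The sketch below shows only that aggregation step. The scores passed in are hypothetical, and the individual property scorers are not reproduced.

```python
import math


def geometric_mean(scores: list[float]) -> float:
    """Combine per-property scores in [0, 1] by their geometric mean."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    return math.prod(scores) ** (1 / len(scores))


# Hypothetical per-property scores for one molecule.
print(geometric_mean([0.25, 1.0]))
```

Unlike an arithmetic mean, a single zero score drives the geometric mean to zero. A molecule therefore has to be acceptable on every property at once.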

Celecoxib rediscovery (using tdc)

The Therapeutics Data Commons’ implementation of the Celecoxib rediscovery oracle of GuacaMol.

./using_poli/objective_repository/celecoxib_rediscovery.html

Decorator Hop (using tdc)

The Therapeutics Data Commons’ implementation of the “Deco Hop” oracle of GuacaMol.

./using_poli/objective_repository/deco_hop.html

dockstring for ligand design

Using dockstring to assess the docking score of a small molecule.

./using_poli/objective_repository/dockstring.html

DRD2 docking (using tdc)

The Therapeutics Data Commons’ implementation of the DRD2 docking oracle.

./using_poli/objective_repository/albuterol_similarity.html

DRD3 (or 3pbl) docking (using tdc)

A wrapper around the Therapeutics Data Commons implementation of 3pbl docking.

./using_poli/objective_repository/drd3_docking.html

Ehrlich Functions

A closed-form objective for discrete sequences.

./using_poli/objective_repository/ehrlich_functions.html
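The sketch below illustrates the general shape of such a closed-form objective. For each motif, it finds the best fraction of motif elements present at fixed offsets anywhere in the sequence. It then quantizes that fraction and multiplies the results across motifs. This follows our reading of the Ehrlich-function construction but omits its feasibility constraint, under which infeasible sequences score 0. All function names are hypothetical, and poli's implementation remains the reference.

```python
import math


def motif_presence(seq: str, motif: str, spacing: list[int]) -> float:
    """Best fraction of motif elements found at the given offsets, over all start positions.

    spacing[j] is the offset of motif[j] from the start; it must start at 0 and increase.
    """
    best = 0.0
    for start in range(len(seq) - spacing[-1]):
        hits = sum(seq[start + s] == a for a, s in zip(motif, spacing))
        best = max(best, hits / len(motif))
    return best


def quantize(h: float, levels: int) -> float:
    """Round a value in [0, 1] down to the nearest multiple of 1 / levels."""
    return math.floor(h * levels) / levels


def ehrlich_like(seq: str, motifs: list[str], spacings: list[list[int]], levels: int) -> float:
    """Product of quantized motif-presence scores (feasibility check omitted)."""
    value = 1.0
    for motif, spacing in zip(motifs, spacings):
        value *= quantize(motif_presence(seq, motif, spacing), levels)
    return value


# "AG" must appear with G two positions after A; "TT" with the Ts four positions apart.
print(ehrlich_like("ACGTACGT", ["AG", "TT"], [[0, 2], [0, 4]], levels=2))  # 1.0
print(ehrlich_like("ACGTACGA", ["AG", "TT"], [[0, 2], [0, 4]], levels=2))  # 0.5
```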

Fexofenadine MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Fexofenadine MPO oracle of GuacaMol.

./using_poli/objective_repository/fexofenadine_mpo.html

GSK3β (using tdc)

The Therapeutics Data Commons’ implementation of the GSK3β oracle.

./using_poli/objective_repository/gsk3_beta.html

Isomer C7H8N2O2 (using tdc)

The Therapeutics Data Commons’ implementation of the first isomer oracle of GuacaMol.

./using_poli/objective_repository/isomer_c7h8n2o2.html

Isomer C9H10N2O2PF2Cl (using tdc)

The Therapeutics Data Commons’ implementation of the second isomer oracle of GuacaMol.

./using_poli/objective_repository/isomer_c9h10n2o2pf2cl.html

JNK3 (using tdc)

The Therapeutics Data Commons’ implementation of the JNK3 oracle.

./using_poli/objective_repository/jnk3.html

Log-solubility (LogP)

Computing the log-quotient of solubilities using RDKit.

./using_poli/objective_repository/rdkit_logp.html
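For reference, LogP is the base-10 logarithm of the ratio of a compound's equilibrium concentrations in octanol and in water. RDKit estimates it from atom contributions (the Wildman-Crippen method) rather than measuring it. That poli calls RDKit's `Descriptors.MolLogP` specifically is an assumption.

```latex
\log P = \log_{10} \frac{[\text{solute}]_{\text{octanol}}}{[\text{solute}]_{\text{water}}}
```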

Median 1 (using tdc)

The Therapeutics Data Commons’ implementation of the “median 1” oracle of GuacaMol.

./using_poli/objective_repository/median_1.html

Median 2 (using tdc)

The Therapeutics Data Commons’ implementation of the “median 2” oracle of GuacaMol.

./using_poli/objective_repository/median_2.html

Mestranol Similarity (using tdc)

The Therapeutics Data Commons’ implementation of the Mestranol similarity oracle of GuacaMol.

./using_poli/objective_repository/albuterol_similarity.html

Osimertinib MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Osimertinib MPO oracle of GuacaMol.

./using_poli/objective_repository/osimetrinib_mpo.html

Penalized Log-solubility (LogP, using lambo)

Computing the penalized log-quotient of solubilities using lambo’s implementation.

./using_poli/objective_repository/penalized_logp_lambo.html
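One common definition of penalized LogP, used widely in the molecular-design literature, subtracts the synthetic accessibility score and a penalty for large rings from LogP. Here $\ell_{\max}(m)$ is the size of the largest ring in molecule $m$. Some implementations also standardize each term using dataset statistics. Whether lambo's version does so is not stated here, so treat the formula as a sketch.

```latex
\text{penalized-}\log P(m) = \log P(m) - \mathrm{SA}(m) - \mathrm{cycle}(m),
\qquad
\mathrm{cycle}(m) = \max\bigl(0,\; \ell_{\max}(m) - 6\bigr)
```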

Quantitative Estimate of Druglikeness (QED)

Computing the QED using RDKit.

./using_poli/objective_repository/rdkit_qed.html
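QED combines eight desirability functions $d_i \in (0, 1]$ into a weighted geometric mean. The eight properties are molecular weight, ALOGP, hydrogen-bond donors, hydrogen-bond acceptors, polar surface area, rotatable bonds, aromatic rings and structural alerts. RDKit's `rdkit.Chem.QED.qed` implements this with default weights $w_i$. That poli uses those defaults is an assumption.

```latex
\mathrm{QED} = \exp\!\left( \frac{\sum_{i=1}^{8} w_i \ln d_i}{\sum_{i=1}^{8} w_i} \right)
```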

Ranolazine MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Ranolazine MPO oracle of GuacaMol.

./using_poli/objective_repository/ranolazine_mpo.html

Scaffold Hop (using tdc)

The Therapeutics Data Commons’ implementation of the Scaffold Hop oracle of GuacaMol.

./using_poli/objective_repository/deco_hop.html

Sitagliptin MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Sitagliptin MPO oracle of GuacaMol.

./using_poli/objective_repository/sitagliptin_mpo.html

Synthetic Accessibility (SA, using tdc)

A wrapper around the Therapeutics Data Commons implementation of the synthetic accessibility oracle.

./using_poli/objective_repository/sa_tdc.html

Thiothixene rediscovery (using tdc)

The Therapeutics Data Commons’ implementation of the Thiothixene rediscovery oracle of GuacaMol.

./using_poli/objective_repository/thiothixene_rediscovery.html

Troglitazone rediscovery (using tdc)

The Therapeutics Data Commons’ implementation of the Troglitazone rediscovery oracle of GuacaMol.

./using_poli/objective_repository/troglitazone_rediscovery.html

Valsartan SMARTS (using tdc)

The Therapeutics Data Commons’ implementation of the Valsartan SMARTS oracle of GuacaMol.

./using_poli/objective_repository/valsartan_smarts.html

Zaleplon MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Zaleplon MPO oracle of GuacaMol.

./using_poli/objective_repository/zaleplon_mpo.html

Proteins

Protein Stability (using foldx)

Stability of mutations of a wildtype using foldx

./using_poli/objective_repository/foldx_stability.html

Protein SASA score (using foldx)

Solvent accessibility of mutations of a wildtype using foldx

./using_poli/objective_repository/foldx_sasa.html

Protein Stability (using RaSP)

Rapid Stability Predictions of single mutations from a wildtype.

./using_poli/objective_repository/RaSP.html
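In the single-mutation setting RaSP targets, the candidate inputs are single-site substitutions of the wildtype. The sketch below enumerates them for a protein sequence over the 20 standard amino acids. The label format (wildtype residue, 1-based position, new residue, e.g. "M1A") is a common notation used here for illustration. It is not necessarily what poli or RaSP expect as input.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_mutants(wildtype: str, alphabet: str = AMINO_ACIDS) -> list[tuple[str, str]]:
    """Return (label, sequence) pairs for every single-site substitution of the wildtype."""
    mutants = []
    for position, wt_residue in enumerate(wildtype):
        for residue in alphabet:
            if residue == wt_residue:
                continue  # skip the identity "mutation"
            label = f"{wt_residue}{position + 1}{residue}"  # 1-based position
            mutant = wildtype[:position] + residue + wildtype[position + 1:]
            mutants.append((label, mutant))
    return mutants


variants = single_mutants("MKT")
print(len(variants))  # 57 (3 positions x 19 substitutions)
print(variants[0])    # ('M1A', 'AKT')
```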

Protein Stability (using PyRosetta)

Stability predictions of variants of a wildtype.

./using_poli/objective_repository/Rosetta_energy.html

RFP Fluorescence Protein Stability (using lambo)

LaMBO's red fluorescent protein (RFP) task, scoring variants by stability and solvent-accessible surface area.

./using_poli/objective_repository/foldx_rfp_lambo.html

Black-box optimization algorithms

On top of poli, we provide poli-baselines, a collection of black-box optimization algorithms with a focus on discrete sequences. Examples include:

Discrete

Random Mutations

Optimizing a discrete sequence by performing random mutations

./using_poli_baselines/random_mutations.html
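The sketch below illustrates the idea behind random-mutation search. It is not poli-baselines' implementation. At each step it changes one random position of the current best sequence to a random letter, and keeps the candidate if the objective does not decrease. The function names are hypothetical, and the toy objective reuses the "ALOHA" matching count.

```python
import random
from typing import Callable


def random_mutation_search(
    objective: Callable[[str], float],
    x0: str,
    alphabet: str,
    n_iterations: int,
    seed: int = 0,
) -> tuple[str, float]:
    """Hill-climb by mutating one random position at a time, keeping non-worse candidates."""
    rng = random.Random(seed)
    best_x, best_y = x0, objective(x0)
    for _ in range(n_iterations):
        position = rng.randrange(len(best_x))
        candidate = best_x[:position] + rng.choice(alphabet) + best_x[position + 1:]
        candidate_y = objective(candidate)
        if candidate_y >= best_y:  # maximize; ties are accepted to allow neutral drift
            best_x, best_y = candidate, candidate_y
    return best_x, best_y


def matches_aloha(seq: str) -> float:
    """Toy objective: number of positions matching "ALOHA"."""
    return float(sum(a == b for a, b in zip(seq, "ALOHA")))


x, y = random_mutation_search(matches_aloha, "AAAAA", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", n_iterations=2000)
print(x, y)
```

Accepting ties lets the search wander across plateaus instead of stalling on them. Strict improvement would be the more conservative choice.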

LaMBO2

Optimizing protein sequences using guided discrete diffusion

./using_poli_baselines/lambo2.html

Increasingly high-dimensional combinatorial and continuous embeddings (Bounce)

Papenmeier et al.’s Bounce, using their official implementation.

./using_poli_baselines/bounce.html

Bayesian optimization with probabilistic reparametrization (ProbRep)

Daulton et al.’s ProbRep, using their official implementation.

./using_poli_baselines/probrep.html
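Roughly, probabilistic reparametrization changes what is optimized. Instead of maximizing the acquisition function $\alpha$ directly over discrete inputs $\mathbf{z}$, it maximizes the expectation of $\alpha$ under a distribution $p(\mathbf{z} \mid \theta)$ with continuous parameters $\theta$. Gradient-based optimizers can handle that expectation. Its gradient can be estimated with the score-function (likelihood-ratio) identity below. This is our summary of the idea, not a transcription of the official implementation.

```latex
\max_{\theta}\; \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z} \mid \theta)}\bigl[\alpha(\mathbf{z})\bigr],
\qquad
\nabla_{\theta}\, \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z} \mid \theta)}\bigl[\alpha(\mathbf{z})\bigr]
= \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z} \mid \theta)}\bigl[\alpha(\mathbf{z})\, \nabla_{\theta} \log p(\mathbf{z} \mid \theta)\bigr]
```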

Continuous

CMA-ES

An evolutionary strategy for continuous problems

./using_poli_baselines/cma_es.html

Line Bayesian Optimization

A version of Bayesian Optimization where the acquisition is optimized over a line.

./using_poli_baselines/latent_space_bo.html
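The sketch below shows the core restriction: the acquisition function is maximized only along a line. How poli-baselines chooses the line and optimizes along it is not described here. Drawing a random direction through the incumbent and using a grid search along it are assumptions. `toy_acquisition` is a placeholder, not a real acquisition function.

```python
from typing import Callable

import numpy as np


def maximize_on_line(
    acquisition: Callable[[np.ndarray], np.ndarray],
    incumbent: np.ndarray,
    lower: float,
    upper: float,
    n_grid: int = 201,
    seed: int = 0,
) -> np.ndarray:
    """Maximize `acquisition` over a random line through `incumbent`, inside [lower, upper]^d."""
    rng = np.random.default_rng(seed)
    # Random unit direction for the line (an assumption; other choices are possible).
    direction = rng.standard_normal(incumbent.shape[0])
    direction /= np.linalg.norm(direction)
    # Step sizes along the line; t = 0 (the incumbent itself) is always included.
    span = upper - lower
    ts = np.concatenate([[0.0], np.linspace(-span, span, n_grid)])
    candidates = incumbent[None, :] + ts[:, None] * direction[None, :]
    # Discard points that leave the search box.
    inside = np.all((candidates >= lower) & (candidates <= upper), axis=1)
    candidates = candidates[inside]
    # A grid search stands in for a proper one-dimensional optimizer.
    return candidates[np.argmax(acquisition(candidates))]


def toy_acquisition(points: np.ndarray) -> np.ndarray:
    # Placeholder acquisition that prefers points near the centre of [0, 1]^2.
    return -np.sum((points - 0.5) ** 2, axis=1)


incumbent = np.array([0.1, 0.9])
proposal = maximize_on_line(toy_acquisition, incumbent, lower=0.0, upper=1.0)
print(proposal)
```

Restricting the search to one dimension makes each acquisition optimization cheap, whatever the dimensionality of the full space.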

Hvarfner’s Vanilla Bayesian Optimization

Bayesian Optimization with log-expected improvement and a dimensionality-dependent prior over the lengthscales.

./using_poli_baselines/hvarfners_vanilla_bo.html
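As we understand the method, each of the $D$ lengthscales receives a log-normal prior whose location grows with $\log D$. This prior's median lengthscale is $e^{\sqrt{2}}\sqrt{D}$, so the model expects smoother functions as the dimension grows. These are also BoTorch's default constants. That poli-baselines uses exactly these values is an assumption. The acquisition is the logarithm of expected improvement, which is computed in a numerically stable way. In the formulas below, $\mu$ and $\sigma$ are the posterior mean and standard deviation, $f^{*}$ is the best observed value, and $\varphi$, $\Phi$ are the standard normal density and CDF.

```latex
\ell_i \sim \mathrm{LogNormal}\!\left(\sqrt{2} + \tfrac{1}{2}\log D,\; \sqrt{3}\right), \quad i = 1, \dots, D

\mathrm{LogEI}(\mathbf{x}) = \log\bigl(\sigma(\mathbf{x})\, h(z)\bigr), \qquad
z = \frac{\mu(\mathbf{x}) - f^{*}}{\sigma(\mathbf{x})}, \qquad
h(z) = \varphi(z) + z\,\Phi(z)
```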

Sparse Axis-Aligned Subspace Bayesian Optimization (SAASBO)

Eriksson and Jankowiak’s SAASBO, using Ax.

./using_poli_baselines/saasbo.html
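SAASBO's sparsity comes from a hierarchical half-Cauchy ($\mathcal{HC}$) prior on the inverse squared lengthscales $\rho_d$ of an RBF kernel. A global shrinkage scale $\tau$ pulls most $\rho_d$ toward zero, so most dimensions are effectively ignored unless the data support them. The scale 0.1 is the default from the SAASBO paper. Treat it as an assumption when comparing with the configuration Ax uses.

```latex
\tau \sim \mathcal{HC}(0.1), \qquad \rho_d \sim \mathcal{HC}(\tau), \quad d = 1, \dots, D

k(\mathbf{x}, \mathbf{x}') = \sigma_k^2 \exp\!\Bigl(-\tfrac{1}{2} \sum_{d=1}^{D} \rho_d\, (x_d - x'_d)^2\Bigr)
```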

Adaptive expanding subspaces (BAxUS)

Papenmeier et al.’s BAxUS, using their official implementation.

./using_poli_baselines/baxus.html

Cite us and other relevant work

If you use a black box that builds on existing work, please cite that work. The relevant references are listed in the documentation of each black box.

Contribute problems or solvers

The following guides explain how to contribute a new problem factory (i.e. a black-box objective function) or a new optimization algorithm.

Contribute a new problem

A guide to contributing a new problem to the repository.

./contributing/a_new_problem.html

Contribute a new solver

How to contribute a new black-box optimization algorithm.

./contributing/a_new_solver.html
