poli 🧪: a library of discrete objective functions#

poli is a library of discrete objective functions for benchmarking optimization algorithms. Examples include:

  • 🔬 stability of mutations from a wildtype protein (using foldx or rasp).

  • 🧪 docking scores of ligands to proteins (using dockstring, pyscreener and pytdc).

  • 💊 druglikeness or synthetic accessibility of small molecules (using rdkit and pytdc).

Some of poli’s features:

  • 🔲 isolation of black box function calls inside conda environments. Don’t worry about clashes with black box requirements: poli will create the relevant conda environments for you.

  • 🗒️ logging each black box call using observers.

  • A numpy interface. Inputs are np.arrays of strings, outputs are np.arrays of floats.

  • SMILES and SELFIES support for small molecule manipulation.
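
To make the array-in, array-out convention and the observer idea concrete, here is a minimal sketch that does not use poli itself. ToyBlackBox, ListObserver, the observe method and the (batch, 1) output shape are hypothetical stand-ins chosen for illustration, not poli's actual classes or signatures.

```python
import numpy as np


class ListObserver:
    """Hypothetical observer: records every (x, y) pair it is shown."""

    def __init__(self):
        self.records = []

    def observe(self, x, y):
        self.records.append((x.copy(), y.copy()))


class ToyBlackBox:
    """Hypothetical stand-in for a black box; it does not call poli."""

    def __init__(self, observer=None):
        self.observer = observer

    def __call__(self, x):
        x = np.asarray(x)
        # Inputs: a 2D array of strings, one row per sequence, one token per column.
        if x.ndim != 2 or x.dtype.kind != "U":
            raise ValueError("expected a 2D array of strings")
        # Toy score: how many tokens in each row equal "A".
        # The (batch, 1) output shape is an assumption of this sketch.
        y = (x == "A").sum(axis=1, keepdims=True).astype(float)
        # Log the call, if an observer was attached.
        if self.observer is not None:
            self.observer.observe(x, y)
        return y


observer = ListObserver()
f = ToyBlackBox(observer=observer)
x = np.array([list("ALOHA"), list("MIGHT")])
y = f(x)
print(y.shape, y.ravel())
```

Keeping the logging inside the call means every evaluation is recorded, whichever optimizer triggers it.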

This documentation also discusses poli-baselines, a collection of optimizers of these discrete black box functions.

Getting started#

A good place to start is the next chapter! Go to Getting Started.

To install poli and poli-baselines, we recommend creating a fresh conda environment:

conda create -n poli-base python=3.9
conda activate poli-base
pip install git+https://github.com/MachineLearningLifeScience/poli.git@dev
pip install git+https://github.com/MachineLearningLifeScience/poli-baselines.git@main

poli also runs on Google Colab. Here is a small example of how to run one of the objective functions.

Black-box objective functions#

For a full list, click here.

Toy problems#

White noise

White noise drawn from a unit Gaussian

Aloha

A toy example about optimizing 5-letter words to spell “ALOHA”
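
As a rough illustration of what such a toy objective could look like, the sketch below scores a word by the number of positions that match “ALOHA”. The scoring rule and the name aloha_score are assumptions made for this example; the actual black box may be defined differently, and this sketch uses plain strings rather than numpy arrays for brevity.

```python
TARGET = "ALOHA"


def aloha_score(word):
    """Count the positions at which `word` agrees with TARGET (assumed scoring rule)."""
    if len(word) != len(TARGET):
        raise ValueError("expected a 5-letter word")
    return float(sum(a == b for a, b in zip(word, TARGET)))


scores = [aloha_score(w) for w in ["ALOHA", "ALOOF", "MIGHT"]]
print(scores)  # [5.0, 3.0, 1.0]
```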

Toy continuous problems

The usual benchmark functions for continuous optimization (e.g. easom, or ackley_function_01)
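
For reference, here is a sketch of the textbook definitions of two such functions, Easom and Ackley, in plain Python. Whether the named variants in poli match these forms exactly, including the sign convention (minimization versus maximization), is an assumption; the function names below are illustrative only.

```python
import math


def easom(x, y):
    """Textbook Easom function; its global minimum is -1 at (pi, pi)."""
    return -math.cos(x) * math.cos(y) * math.exp(
        -((x - math.pi) ** 2 + (y - math.pi) ** 2)
    )


def ackley(xs, a=20.0, b=0.2, c=2 * math.pi):
    """Textbook Ackley function in len(xs) dimensions; its global minimum is 0 at the origin."""
    n = len(xs)
    mean_sq = sum(v * v for v in xs) / n
    mean_cos = sum(math.cos(c * v) for v in xs) / n
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e


print(easom(math.pi, math.pi))
print(ackley([0.0, 0.0]))
```

Both are written as minimization problems here; negating the output turns them into maximization problems.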

Small molecules#

Albuterol Similarity (using tdc)

The Therapeutics Data Commons’ implementation of the Albuterol similarity oracle of GuacaMol.

Amlodipine MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Amlodipine MPO oracle of GuacaMol.

Celecoxib rediscovery (using tdc)

The Therapeutics Data Commons’ implementation of the Celecoxib rediscovery oracle of GuacaMol.

Decorator Hop (using tdc)

The Therapeutics Data Commons’ implementation of the “deco Hop” oracle of GuacaMol.

dockstring for ligand design

Using dockstring to assess the docking score of a small molecule.

DRD2 docking (using tdc)

The Therapeutics Data Commons’ implementation of the DRD2 docking oracle.

DRD3 (or 3pbl) docking (using tdc)

A wrapper around the Therapeutics Data Commons implementation of 3pbl docking.

Fexofenadine MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Fexofenadine MPO oracle of GuacaMol.

GSK3β (using tdc)

The Therapeutics Data Commons’ implementation of the GSK3β oracle.

Isomer C7H8N2O2 (using tdc)

The Therapeutics Data Commons’ implementation of the first isomer oracle of GuacaMol.

Isomer C9H10N2O2PF2Cl (using tdc)

The Therapeutics Data Commons’ implementation of the second isomer oracle of GuacaMol.

JNK3 (using tdc)

The Therapeutics Data Commons’ implementation of the JNK3 oracle.

Log-solubility (LogP)

Computing the log-quotient of solubilities using RDKit.

Median 1 (using tdc)

The Therapeutics Data Commons’ implementation of the “median 1” oracle of GuacaMol.

Median 2 (using tdc)

The Therapeutics Data Commons’ implementation of the “median 2” oracle of GuacaMol.

Mestranol Similarity (using tdc)

The Therapeutics Data Commons’ implementation of the Mestranol similarity oracle of GuacaMol.

Osimertinib MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Osimertinib MPO oracle of GuacaMol.

Penalized Log-solubility (LogP, using lambo)

Computing the penalized log-quotient of solubilities using lambo’s implementation.

Quantitative Estimate of Druglikeness (QED)

Computing the QED using RDKit.

Ranolazine MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Ranolazine MPO oracle of GuacaMol.

Scaffold Hop (using tdc)

The Therapeutics Data Commons’ implementation of the scaffold Hop oracle of GuacaMol.

Sitagliptin MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Sitagliptin MPO oracle of GuacaMol.

Synthetic Accessibility (SA, using tdc)

A wrapper around the Therapeutics Data Commons implementation of the synthetic accessibility oracle.

Thiothixene rediscovery (using tdc)

The Therapeutics Data Commons’ implementation of the Thiothixene rediscovery oracle of GuacaMol.

Troglitazone rediscovery (using tdc)

The Therapeutics Data Commons’ implementation of the Troglitazone rediscovery oracle of GuacaMol.

Valsartan SMARTS (using tdc)

The Therapeutics Data Commons’ implementation of the Valsartan SMARTS oracle of GuacaMol.

Zaleplon MPO (using tdc)

The Therapeutics Data Commons’ implementation of the Zaleplon MPO oracle of GuacaMol.

Proteins#

Protein Stability (using foldx)

Stability of mutations of a wildtype using foldx

Protein SASA score (using foldx)

Solvent accessibility of mutations of a wildtype using foldx

Protein Stability (using RaSP)

Rapid Stability Predictions of single mutations from a wildtype.

RFP Fluorescence Protein Stability (using lambo)

LaMBO Fluorescence (RFP) by stability and solvent-accessible surface area.

Black-box optimization algorithms#

On top of poli, we provide poli-baselines, a collection of black-box optimization algorithms (focusing especially on discrete sequences). Examples include:

Discrete#

Random Mutations

Optimizing a discrete sequence by performing random mutations
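
The idea can be sketched in a few lines of plain Python. This is a generic illustration, not the poli-baselines implementation: the function name, the single-position mutation rule, the greedy acceptance step and the assumption that the objective is maximized are all choices made for this example.

```python
import random


def random_mutation_search(f, x0, alphabet, n_iterations=200, seed=0):
    """Greedy random-mutation search that maximizes f over fixed-length sequences."""
    rng = random.Random(seed)
    best_x, best_y = list(x0), f(x0)
    for _ in range(n_iterations):
        # Copy the incumbent and overwrite one random position with a random token.
        candidate = list(best_x)
        position = rng.randrange(len(candidate))
        candidate[position] = rng.choice(alphabet)
        y = f(candidate)
        # Keep the mutant only if it is at least as good as the incumbent.
        if y >= best_y:
            best_x, best_y = candidate, y
    return "".join(best_x), best_y


def matches_to_aloha(sequence):
    """Illustrative objective: number of positions that match "ALOHA"."""
    return float(sum(a == b for a, b in zip(sequence, "ALOHA")))


alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
best_x, best_y = random_mutation_search(matches_to_aloha, "MIGHT", alphabet)
print(best_x, best_y)
```

Accepting ties lets the search drift across plateaus, which matters when many mutations leave the score unchanged.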

Increasingly high-dimensional combinatorial and continuous embeddings (Bounce)

Papenmeier et al.’s Bounce, using their official implementation.

Bayesian optimization with probabilistic reparametrization (ProbRep)

Daulton et al.’s PR, using their official implementation.

Continuous#

CMA-ES

An evolutionary strategy for continuous problems

Line Bayesian Optimization

A version of Bayesian Optimization where the acquisition is optimized over a line.

Hvarfner’s Vanilla Bayesian Optimization

Bayesian Optimization with log-expected improvement and a dimensionality-dependent prior over the lengthscales.

Sparse Axis-Aligned Subspace Bayesian Optimization (SAASBO)

Eriksson and Jankowiak’s SAASBO, using Ax.

Adaptive expanding subspaces (BAxUS)

Papenmeier et al.’s BAxUS, using their official implementation.

Cite us and other relevant work#

If you use certain black boxes, we expect you to cite the relevant work. Check inside the documentation of each black box for the relevant references.

Contribute problems or solvers#

These are a couple of guides about how to contribute a new problem factory (i.e. black-box objective function), or a new optimization algorithm.

Contribute a new problem

A guide to contributing a new problem to the repository.

Contribute a new solver

How to contribute a new black-box optimization algorithm.