poli 🧪: a library of discrete objective functions#
poli is a library of discrete objective functions for benchmarking optimization algorithms. It offers:
Isolation of black box function calls inside conda environments. Don’t worry about clashes with black box requirements: poli will create the relevant conda environments for you.
Logging of each black box call using observers.
A numpy interface. Inputs are np.arrays of strings, outputs are np.arrays of floats.
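To make the last two points concrete, here is a minimal sketch of what a string-in, float-out black box with an attached observer could look like. It does not use poli itself: `CallLogger`, `count_a`, and `with_observer` are hypothetical names invented for this illustration, and the `(batch, 1)` output shape is an assumption rather than a documented guarantee.

```python
import numpy as np


class CallLogger:
    """Hypothetical observer: stores every (x, y) pair it is shown."""

    def __init__(self):
        self.history = []

    def observe(self, x, y):
        self.history.append((x.copy(), y.copy()))


def count_a(x):
    """Toy black box: counts the "A" tokens in each row of x.

    x is an array of single-character strings with shape (batch, length);
    the result is an array of floats with shape (batch, 1).
    """
    return (x == "A").sum(axis=1, keepdims=True).astype(float)


def with_observer(black_box, observer):
    """Wraps a black box so that each call is reported to the observer."""

    def wrapped(x):
        y = black_box(x)
        observer.observe(x, y)
        return y

    return wrapped


logger = CallLogger()
f = with_observer(count_a, logger)

x = np.array([list("ALOHA"), list("HELLO")])  # shape (2, 5), strings
y = f(x)                                      # shape (2, 1), floats
```

After the call, `logger.history` holds one entry with the inputs and outputs of that evaluation, which is the kind of record an observer is meant to keep.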
We also provide poli-baselines, a collection of optimizers for these discrete black box functions.
We are running a benchmark!
Using poli and poli-baselines, we are running a benchmark comparing high-dimensional Bayesian optimization algorithms for discrete sequences.
Getting started#
A good place to start is the next chapter! Go to Getting Started.
To install poli and poli-baselines, we recommend creating a fresh conda environment:
conda create -n poli-base python=3.10
conda activate poli-base
pip install poli-core
pip install git+https://github.com/MachineLearningLifeScience/poli-baselines.git@main
poli also runs on Google Colab. Here is a small example of how to run one of the objective functions.
Black-box objective functions#
Toy problems#
White noise drawn from a unit Gaussian
A toy example about optimizing 5-letter words to spell “ALOHA”
The usual benchmark functions for continuous optimization (e.g. easom or ackley_function_01)
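For reference, the two continuous functions named above can be written in a few lines of numpy. This is a sketch of their textbook definitions, assuming `ackley_function_01` refers to the standard Ackley form; poli's own implementations may differ in sign convention, scaling, or output shape.

```python
import numpy as np


def easom(x):
    """Easom function for rows of shape (2,); global minimum -1 at (pi, pi)."""
    x = np.atleast_2d(x)
    x1, x2 = x[:, 0], x[:, 1]
    distance_sq = (x1 - np.pi) ** 2 + (x2 - np.pi) ** 2
    return -np.cos(x1) * np.cos(x2) * np.exp(-distance_sq)


def ackley(x):
    """Standard Ackley function for rows of any length; global minimum 0 at 0."""
    x = np.atleast_2d(x)
    n = x.shape[1]
    term_1 = -20.0 * np.exp(-0.2 * np.sqrt((x ** 2).sum(axis=1) / n))
    term_2 = -np.exp(np.cos(2.0 * np.pi * x).sum(axis=1) / n)
    return term_1 + term_2 + 20.0 + np.e


easom_at_optimum = easom(np.array([np.pi, np.pi]))
ackley_at_optimum = ackley(np.zeros(3))
```

Both functions accept a batch of points (one per row) and return one value per row.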
Small molecules#
(tdc) The Therapeutics Data Commons’ implementation of the Albuterol similarity oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the Amlodipine MPO oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the Celecoxib rediscovery oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the “Deco Hop” oracle of GuacaMol.
dockstring for ligand design: using dockstring to assess the docking score of a small molecule.
(tdc) The Therapeutics Data Commons’ implementation of the DRD2 docking oracle.
(tdc) A wrapper around the Therapeutics Data Commons implementation of 3pbl docking.
A closed-form objective for discrete sequences.
(tdc) The Therapeutics Data Commons’ implementation of the Fexofenadine MPO oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the GSK3β oracle.
(tdc) The Therapeutics Data Commons’ implementation of the first isomer oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the second isomer oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the JNK3 oracle.
Computing the log-quotient of solubilities using RDKit.
(tdc) The Therapeutics Data Commons’ implementation of the “median 1” oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the “median 2” oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the Mestranol similarity oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the Osimertinib MPO oracle of GuacaMol.
(lambo) Computing the penalized log-quotient of solubilities using lambo’s implementation.
Computing the QED using RDKit.
(tdc) The Therapeutics Data Commons’ implementation of the Ranolazine MPO oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the “Scaffold Hop” oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the Sitagliptin MPO oracle of GuacaMol.
(tdc) A wrapper around the Therapeutics Data Commons implementation of the synthetic accessibility oracle.
(tdc) The Therapeutics Data Commons’ implementation of the Thiothixene rediscovery oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the Troglitazone rediscovery oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the Valsartan SMARTS oracle of GuacaMol.
(tdc) The Therapeutics Data Commons’ implementation of the Zaleplon MPO oracle of GuacaMol.
Proteins#
(foldx) Stability of mutations of a wildtype using foldx.
(foldx) Solvent accessibility of mutations of a wildtype using foldx.
(RaSP) Rapid Stability Predictions of single mutations from a wildtype.
(PyRosetta) Stability predictions of variants from a wildtype.
(lambo) LaMBO Fluorescence (RFP) by stability and solvent-accessible surface area.
Black-box optimization algorithms#
On top of poli, we provide poli-baselines, a collection of black-box optimization algorithms (focusing especially on discrete sequences). Examples include:
Discrete#
Optimizing a discrete sequence by performing random mutations
Optimizing protein sequences using guided discrete diffusion
Papenmeier et al.’s Bounce, using their official implementation.
Daulton et al.’s PR, using their official implementation.
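The simplest entry in this list, optimizing by random mutations, can be sketched in plain numpy. The code below is an illustration under stated assumptions, not the poli-baselines implementation: `aloha` is a stand-in for the “ALOHA” toy problem (it counts matching positions), and `random_mutation_search`, the 26-letter alphabet, and the accept-if-not-worse rule are all choices made for this example.

```python
import numpy as np

ALPHABET = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
TARGET = np.array(list("ALOHA"))


def aloha(x):
    """Stand-in objective: number of positions matching "ALOHA", shape (batch, 1)."""
    return (x == TARGET).sum(axis=1, keepdims=True).astype(float)


def random_mutation_search(f, x0, n_iterations, rng):
    """Mutates one random position per step and keeps the candidate if it is not worse."""
    best_x = x0.copy()
    best_y = f(best_x[None, :])[0, 0]
    for _ in range(n_iterations):
        candidate = best_x.copy()
        position = rng.integers(len(candidate))
        candidate[position] = rng.choice(ALPHABET)
        y = f(candidate[None, :])[0, 0]
        if y >= best_y:
            best_x, best_y = candidate, y
    return best_x, best_y


rng = np.random.default_rng(0)
x0 = np.array(list("HELLO"))
best_x, best_y = random_mutation_search(aloha, x0, n_iterations=2000, rng=rng)
print("".join(best_x), best_y)
```

Because a candidate is only accepted when its score is at least as good as the current best, the best score never decreases over the run.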
Continuous#
An evolutionary strategy for continuous problems
A version of Bayesian Optimization where the acquisition is optimized over a line.
Bayesian Optimization with log-expected improvement and a dimensionality-dependent prior over the lengthscales.
Eriksson and Jankowiak’s SAASBO, using Ax.
Papenmeier et al.’s BAxUS, using their official implementation.
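As a rough picture of the first continuous entry, here is a minimal elitist evolution strategy in numpy. It is a generic sketch, not the algorithm shipped in poli-baselines: `sphere`, `evolution_strategy`, and every hyperparameter below are assumptions chosen for illustration, and the objective is maximized.

```python
import numpy as np


def sphere(x):
    """Toy objective to maximize: negative squared norm, shape (batch, 1)."""
    return -(x ** 2).sum(axis=1, keepdims=True)


def evolution_strategy(f, x0, n_generations=50, population_size=20,
                       n_parents=5, sigma=0.3, rng=None):
    """Elitist strategy: parents compete with their Gaussian-perturbed offspring."""
    rng = np.random.default_rng() if rng is None else rng
    parents = np.repeat(x0[None, :], n_parents, axis=0)
    for _ in range(n_generations):
        # Each offspring is a randomly chosen parent plus Gaussian noise.
        chosen = rng.integers(n_parents, size=population_size)
        noise = rng.normal(size=(population_size, x0.shape[0]))
        offspring = parents[chosen] + sigma * noise
        # Keep the n_parents fittest individuals among parents and offspring.
        pool = np.vstack([parents, offspring])
        fitness = f(pool)[:, 0]
        parents = pool[np.argsort(fitness)[-n_parents:]]
    best_x = parents[-1]  # argsort is ascending, so the last row is the fittest
    return best_x, f(best_x[None, :])[0, 0]


rng = np.random.default_rng(0)
x0 = np.full(3, 2.0)
best_x, best_y = evolution_strategy(sphere, x0, rng=rng)
```

Since the parents survive into the selection pool, the best value found can only stay the same or improve from one generation to the next.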
Cite us and other relevant work#
If you use certain black boxes, we expect you to cite the relevant work. Check inside the documentation of each black box for the relevant references.
Contribute problems or solvers#
The following guides explain how to contribute a new problem factory (i.e. a black-box objective function) or a new optimization algorithm.
A guide to contributing a new problem to the repository.
How to contribute a new black-box optimization algorithm.