Benchmarking HDBO
A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences
Abstract
Optimizing discrete black-box functions is key in several domains,
e.g. protein engineering and drug design. Due to the lack of gradient
information and the need for sample efficiency, Bayesian optimization
is an ideal candidate for these tasks. Several methods for
high-dimensional continuous and categorical Bayesian optimization have
been proposed recently. However, our survey of the field reveals
highly heterogeneous experimental set-ups across methods and technical
barriers for the replicability and application of published algorithms
to real-world tasks. To address these issues, we develop a unified
framework to test a vast array of high-dimensional Bayesian
optimization methods and a collection of standardized black-box
functions representing real-world application domains in chemistry and
biology. These two components of the benchmark are each supported by
flexible, scalable, and easily extendable software libraries (poli and poli-baselines), allowing practitioners to readily incorporate new optimization objectives or discrete optimizers.
Read the full paper on arXiv or continue reading below.
Optimizing an unknown and expensive-to-evaluate function is a frequent problem across disciplines (Shahriari et al., 2016): examples include finding the right parameters for machine learning models or simulators, drug discovery (Gómez-Bombarelli et al., 2018; Griffiths and Hernández-Lobato, 2020; Pyzer-Knapp, 2018), protein design (Stanton et al., 2022; Gruver et al., 2023), hyperparameter tuning in machine learning (Snoek et al., 2012; Turner et al., 2021), and train scheduling. In some scenarios, evaluating the black box involves an expensive process (e.g. training a large model, or running a physical simulation); Bayesian optimization (BO, Močkus (1975)) is a powerful method for sample-efficient black-box optimization. High-dimensional (discrete) problems have long been identified as a key challenge for Bayesian optimization algorithms (Wang et al., 2013; Snoek et al., 2012), given that these algorithms tend to scale poorly with both dataset size and input dimensionality.
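The BO loop described above can be made concrete with a minimal sketch: fit a Gaussian-process surrogate to the evaluations so far, then pick the next query point by maximizing an acquisition function (here, expected improvement) over a grid. The toy 1D continuous setting, the RBF kernel, and all names below are illustrative assumptions, not the benchmark's code.

```python
import numpy as np
from scipy.stats import norm


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """GP posterior mean and std at x_query, via a Cholesky solve."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization."""
    z = (mu - best) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))


def bayes_opt(f, n_init=3, n_iters=10, seed=0):
    """Maximize a vectorized function f on [0, 1] with a GP + EI loop."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_init)
    y = f(x)
    grid = np.linspace(0.0, 1.0, 200)
    for _ in range(n_iters):
        mu, sigma = gp_posterior(x, y, grid)
        ei = expected_improvement(mu, sigma, y.max())
        x_next = grid[np.argmax(ei)]  # acquisition is cheap: optimize it, not f
        x = np.append(x, x_next)
        y = np.append(y, f(np.array([x_next])))
    return x[np.argmax(y)], y.max()
```

The key property motivating BO for expensive objectives is visible here: the surrogate and acquisition are cheap to evaluate, so the only costly calls are the handful of queries to `f` itself.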
High-dimensional BO has been the focus of an entire research field (see Figure 1), in which methods are extended to address the curse of dimensionality and its consequences
(Binois and Wycoff, 2022; Santoni et al., 2023). Within this setting, discrete sequence optimization has received particular attention, due to its applicability to the optimization of molecules and proteins. However, prior work often focuses on sequence lengths and numbers of categories below the hundreds (see Fig. 2), making it difficult for practitioners to judge expected performance on real-world problems in these domains. We contribute (i) a survey of the field focused on real-world applications of high-dimensional discrete sequence optimization, (ii) a benchmark of several optimizers on established black-box functions, and (iii) an open-source, unified interface: poli and poli-baselines.
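The idea behind such a unified interface can be sketched as follows: each black box exposes only an alphabet, a sequence length, and an evaluation method, so any solver written against that contract can be benchmarked on any objective. The class and method names below are hypothetical illustrations of the pattern, not poli's or poli-baselines' actual API.

```python
from abc import ABC, abstractmethod

import numpy as np


class DiscreteBlackBox(ABC):
    """An (expensive) objective over fixed-length discrete sequences."""

    def __init__(self, alphabet, sequence_length):
        self.alphabet = list(alphabet)
        self.sequence_length = sequence_length

    @abstractmethod
    def evaluate(self, sequences):
        """Map a batch of sequences (lists of tokens) to one score each."""


class CountAs(DiscreteBlackBox):
    """Toy objective standing in for a chemistry/biology oracle:
    the score of a sequence is its number of 'A' tokens."""

    def evaluate(self, sequences):
        return np.array([sum(tok == "A" for tok in seq) for seq in sequences],
                        dtype=float)


class RandomSearch:
    """A baseline solver that sees only the black-box contract above."""

    def __init__(self, black_box, seed=0):
        self.black_box = black_box
        self.rng = np.random.default_rng(seed)

    def solve(self, budget):
        best_seq, best_val = None, -np.inf
        for _ in range(budget):
            seq = list(self.rng.choice(self.black_box.alphabet,
                                       size=self.black_box.sequence_length))
            val = self.black_box.evaluate([seq])[0]
            if val > best_val:
                best_seq, best_val = seq, val
        return best_seq, best_val
```

Because the solver never touches anything beyond `alphabet`, `sequence_length`, and `evaluate`, swapping the toy `CountAs` objective for a protein- or molecule-scoring oracle requires no change to the solver, which is the decoupling the benchmark's two libraries provide.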
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175.
Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276.
Griffiths, R.-R. and Hernández-Lobato, J. M. (2020). Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 11(2):577–586.
Pyzer-Knapp, E. O. (2018). Bayesian optimization for accelerated drug discovery. IBM Journal of Research and Development, 62(6):2:1–2:7.
Stanton, S., Maddox, W., Gruver, N., Maffettone, P., Delaney, E., Greenside, P., and Wilson, A. G. (2022). Accelerating Bayesian optimization for biological sequence design with denoising autoencoders. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S., editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 20459–20478. PMLR.
Gruver, N., Stanton, S., Frey, N., Rudner, T. G. J., Hotzel, I., Lafrance-Vanasse, J., Rajpal, A., Cho, K., and Wilson, A. G. (2023). Protein design with guided discrete diffusion. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., editors, Advances in Neural Information Processing Systems, volume 36, pages 12489–12517. Curran Associates, Inc.
Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K., editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc.
Turner, R., Eriksson, D., McCourt, M., Kiili, J., Laaksonen, E., Xu, Z., and Guyon, I. (2021). Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. In Escalante, H. J. and Hofmann, K., editors, Proceedings of the NeurIPS 2020 Competition and Demonstration Track, volume 133 of Proceedings of Machine Learning Research, pages 3–26. PMLR.
Močkus, J. (1975). On Bayesian methods for seeking the extremum. In Marchuk, G. I., editor, Optimization Techniques IFIP Technical Conference Novosibirsk, July 1–7, 1974, pages 400–404, Berlin, Heidelberg. Springer.
Wang, Z., Zoghi, M., Hutter, F., Matheson, D., and de Freitas, N. (2013). Bayesian optimization in high dimensions via random embeddings. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI '13, pages 1778–1784.
Binois, M. and Wycoff, N. (2022). A survey on high-dimensional Gaussian process modeling with application to Bayesian optimization. ACM Transactions on Evolutionary Learning and Optimization, 2(2).
Santoni, M. L., Raponi, E., De Leone, R., and Doerr, C. (2023). Comparison of high-dimensional Bayesian optimization algorithms on BBOB.