Benchmarking HDBO

A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences

Abstract

Optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design. Due to the lack of gradient information and the need for sample efficiency, Bayesian optimization is an ideal candidate for these tasks. Several methods for high-dimensional continuous and categorical Bayesian optimization have been proposed recently. However, our survey of the field reveals highly heterogeneous experimental set-ups across methods, as well as technical barriers to the replicability of published algorithms and to their application to real-world tasks. To address these issues, we develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black-box functions representing real-world application domains in chemistry and biology. These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries (poli and poli-baselines), allowing practitioners to readily incorporate new optimization objectives or discrete optimizers.

Read the full paper on arXiv or continue reading below.

Introduction

Optimizing an unknown and expensive-to-evaluate function is a frequent problem across disciplines (Shahriari et al., 2016). Examples include finding the right parameters for machine learning models or simulators, drug discovery (Gómez-Bombarelli et al., 2018; Griffiths and Hernández-Lobato, 2020; Pyzer-Knapp, 2018), protein design (Stanton et al., 2022; Gruver et al., 2023), hyperparameter tuning in machine learning (Snoek et al., 2012; Turner et al., 2021), and train scheduling. In these scenarios, evaluating the black box often involves an expensive process (e.g. training a large model or running a physical simulation), which makes sample efficiency essential. Bayesian optimization (BO; Močkus, 1975) is a powerful method for sample-efficient black-box optimization. High-dimensional (discrete) problems have long been identified as a key challenge for BO algorithms (Wang et al., 2013; Snoek et al., 2012), since these algorithms tend to scale poorly with both the size of the dataset and the dimensionality of the input.
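
The paragraph above describes BO only at a high level. As a rough illustration, the following is a minimal, generic BO loop over discrete sequences in pure Python; it is not the method of any work cited here, nor the poli or poli-baselines API. Everything in it is an assumption made for illustration: a toy alphabet, a hypothetical black_box that returns the negative Hamming distance to a hidden TARGET, a Gaussian-process surrogate with an exponentiated Hamming kernel, and an upper-confidence-bound (UCB) acquisition function maximized over random mutations of the best sequence found so far. The Cholesky factorization in fit_gp costs O(n^3) in the number of evaluations n, which is one concrete source of the poor scaling with dataset size mentioned above.

```python
import math
import random

ALPHABET = "ACDE"    # toy alphabet (hypothetical)
SEQ_LEN = 8          # toy sequence length (hypothetical)
TARGET = "CADEEACD"  # hidden optimum of the toy black box (hypothetical)


def black_box(seq):
    """Toy objective: negative Hamming distance to TARGET (maximum is 0)."""
    return -sum(a != b for a, b in zip(seq, TARGET))


def hamming_kernel(x, y, theta=0.5):
    """Exponentiated Hamming kernel exp(-theta * d_H(x, y)), which is PSD."""
    return math.exp(-theta * sum(a != b for a, b in zip(x, y)))


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_sub(low, b):
    """Solve low @ z = b for lower-triangular low."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def backward_sub(low, z):
    """Solve low.T @ x = z for lower-triangular low."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_gp(train_x, train_y, noise=1e-4):
    """Exact GP regression; returns a function mapping a query to (mean, variance)."""
    gram = [[hamming_kernel(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    low = cholesky(gram)  # O(n^3) in the number of observations n
    y_mean = sum(train_y) / len(train_y)
    alpha = backward_sub(low, forward_sub(low, [y - y_mean for y in train_y]))

    def predict(query):
        k_star = [hamming_kernel(query, x) for x in train_x]
        v = forward_sub(low, k_star)
        mu = y_mean + sum(k * a for k, a in zip(k_star, alpha))
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # prior k(q, q) = 1
        return mu, var

    return predict


def mutate(seq, rng, n_sites=1):
    """Copy of seq with n_sites random positions (possibly repeated) changed."""
    for _ in range(n_sites):
        i = rng.randrange(SEQ_LEN)
        new = rng.choice([c for c in ALPHABET if c != seq[i]])
        seq = seq[:i] + new + seq[i + 1:]
    return seq


def bayes_opt(n_init=5, n_iter=15, n_candidates=50, beta=2.0, seed=0):
    """GP-UCB loop over sequences; returns all queried sequences and their values."""
    rng = random.Random(seed)
    xs = []
    while len(xs) < n_init:  # distinct random initial design
        s = "".join(rng.choice(ALPHABET) for _ in range(SEQ_LEN))
        if s not in xs:
            xs.append(s)
    ys = [black_box(s) for s in xs]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)  # refit the surrogate on all data so far
        best = xs[ys.index(max(ys))]
        # |A|^L is usually too large to enumerate, so only score random one- or
        # two-site mutations of the incumbent (a simple local-search heuristic).
        pool = {mutate(best, rng, rng.choice((1, 2)))
                for _ in range(n_candidates)} - set(xs)
        if not pool:
            break

        def ucb(s):
            mu, var = predict(s)
            return mu + beta * math.sqrt(var)

        nxt = max(sorted(pool), key=ucb)  # sorting makes tie-breaking deterministic
        xs.append(nxt)
        ys.append(black_box(nxt))  # one (expensive) black-box query per iteration
    return xs, ys
```

Calling bayes_opt(seed=0) returns every queried sequence together with its value, so the best sequence found is xs[ys.index(max(ys))]. Scoring only mutations of the incumbent keeps the sketch cheap; practical high-dimensional methods typically use more elaborate strategies (e.g. trust regions or learned latent spaces) for this step.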

Figure 1: A timeline of high-dimensional Bayesian optimization methods, with arrows drawn between methods that explicitly augment or use each other.

High-dimensional BO has been the focus of an entire research field (see Figure 1), in which methods are extended to address the curse of dimensionality and its consequences (Binois and Wycoff, 2022; Santoni et al., 2023). Within this setting, discrete sequence optimization has received particular attention due to its applications to the optimization of molecules and proteins. However, prior work often considers sequence lengths and numbers of categories below one hundred (see Figure 2), making it difficult for practitioners to judge expected performance on real-world problems in these domains. We contribute (i) a survey of the field focused on real-world applications of high-dimensional discrete sequence optimization, (ii) a benchmark of several optimizers on established black-box functions, and (iii) an open-source, unified interface: poli and poli-baselines.
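
A back-of-the-envelope count shows why these problem sizes matter. For sequences of length L over an alphabet A, the search space X contains |A|^L candidates. As an illustrative calculation (not a result from the paper), a protein of 100 residues over the 20 standard amino acids already yields roughly 10^130 candidates:

```latex
|\mathcal{X}| = |\mathcal{A}|^{L},
\qquad
20^{100} = 10^{100 \log_{10} 20} \approx 10^{130.1} \approx 1.27 \times 10^{130}.
```

Exhaustive search is therefore out of the question, and each additional position multiplies the size of the space by |A|.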

Continue reading the paper on arXiv.

References

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016).

Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175.

Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. (2018).

Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276. PMID: 29532027.

Griffiths, R.-R. and Hernández-Lobato, J. M. (2020).

Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 11(2):577–586.

Pyzer-Knapp, E. O. (2018).

Bayesian optimization for accelerated drug discovery. IBM Journal of Research and Development, 62(6):2:1–2:7.

Stanton, S., Maddox, W., Gruver, N., Maffettone, P., Delaney, E., Greenside, P., and Wilson, A. G. (2022).

Accelerating Bayesian optimization for biological sequence design with denoising autoencoders. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S., editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 20459–20478. PMLR.

Gruver, N., Stanton, S., Frey, N., Rudner, T. G. J., Hotzel, I., Lafrance-Vanasse, J., Rajpal, A., Cho, K., and Wilson, A. G. (2023).

Protein design with guided discrete diffusion. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., editors, Advances in Neural Information Processing Systems, volume 36, pages 12489–12517. Curran Associates, Inc.

Snoek, J., Larochelle, H., and Adams, R. P. (2012).

Practical Bayesian optimization of machine learning algorithms. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K., editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc.

Turner, R., Eriksson, D., McCourt, M., Kiili, J., Laaksonen, E., Xu, Z., and Guyon, I. (2021).

Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020. In Escalante, H. J. and Hofmann, K., editors, Proceedings of the NeurIPS 2020 Competition and Demonstration Track, volume 133 of Proceedings of Machine Learning Research, pages 3–26. PMLR.

Močkus, J. (1975).

On Bayesian methods for seeking the extremum. In Marchuk, G. I., editor, Optimization Techniques IFIP Technical Conference Novosibirsk, July 1–7, 1974, pages 400–404, Berlin, Heidelberg. Springer.

Wang, Z., Zoghi, M., Hutter, F., Matheson, D., and De Freitas, N. (2013).

Bayesian optimization in high dimensions via random embeddings. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI ’13, pages 1778–1784.

Binois, M. and Wycoff, N. (2022).

A survey on high-dimensional Gaussian process modeling with application to Bayesian optimization. 2(2).

Santoni, M. L., Raponi, E., Leone, R. D., and Doerr, C. (2023).

Comparison of high-dimensional Bayesian optimization algorithms on BBOB.

This project is brought to you by the Center for Basic Machine Learning Research in Life Science (MLLS).
