High-dimensional Bayesian optimization using low-dimensional feature spaces


Riccardo Moriconi¹ · Marc Peter Deisenroth² · K. S. Sesh Kumar³

Received: 3 November 2019 / Revised: 29 July 2020 / Accepted: 11 August 2020 / Published online: 21 September 2020
© The Author(s) 2020

Abstract

Bayesian optimization (BO) is a powerful approach for seeking the global optimum of expensive black-box functions and has proven successful for fine-tuning hyper-parameters of machine learning models. However, BO is practically limited to optimizing 10–20 parameters. To scale BO to high dimensions, we usually make structural assumptions on the decomposition of the objective and/or exploit the intrinsic lower dimensionality of the problem, e.g. by using linear projections. We could achieve a higher compression rate with nonlinear projections, but learning these nonlinear embeddings typically requires large amounts of data. This conflicts with the BO setting, in which the evaluation budget is relatively small. To address this challenge, we propose to learn a low-dimensional feature space jointly with (a) the response surface and (b) a reconstruction mapping. Our approach allows for optimization of BO’s acquisition function in the lower-dimensional subspace, which significantly simplifies the optimization problem. To evaluate the black-box function, we map points from the lower-dimensional subspace back to the original parameter space using the reconstruction mapping. For meaningful exploration, we solve a constrained optimization problem.

Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

¹ Department of Computing, Imperial College London, London, UK
² Department of Computer Science, University College London, London, UK
³ Data Science Institute, Imperial College London, London, UK

Machine Learning (2020) 109:1925–1943

1 Introduction

Bayesian optimization (BO) is a useful model-based approach to global optimization of black-box functions that are expensive to evaluate (Kushner 1964; Jones et al. 1998). This sample-efficient optimization technique has been effective in the experimental design of machine learning algorithms (Bergstra et al. 2011), in robotics applications (Cully et al. 2015; Calandra et al. 2016b), and in medical therapies, where it has been used to optimize spinal-cord electro-stimulation (Sui et al. 2015). Despite its great success, BO is practically limited to optimizing 10–20 parameters. A large body of literature has been devoted to addressing these scalability issues so that BO can be applied to high-dimensional optimization problems, such as the discovery of chemical compounds (Gomez-Bombarelli et al. 2018) or automatic software configuration (Hutter et al. 2011). The standard BO routine consists of two key steps: (1) estimating the black-box function from data through a probabilistic surrogate model, usually a Gaussian process (GP), referred to as the response surface; (2) maximizing an acquisition function that trades