High-dimensional Bayesian optimization using low-dimensional feature spaces
Riccardo Moriconi¹ · Marc Peter Deisenroth² · K. S. Sesh Kumar³

Received: 3 November 2019 / Revised: 29 July 2020 / Accepted: 11 August 2020 / Published online: 21 September 2020

© The Author(s) 2020
Abstract

Bayesian optimization (BO) is a powerful approach for seeking the global optimum of expensive black-box functions and has proven successful for fine-tuning hyper-parameters of machine learning models. However, BO is practically limited to optimizing 10–20 parameters. To scale BO to high dimensions, we usually make structural assumptions on the decomposition of the objective and/or exploit the intrinsic lower dimensionality of the problem, e.g., by using linear projections. We could achieve a higher compression rate with nonlinear projections, but learning these nonlinear embeddings typically requires large amounts of data, which conflicts with BO's aim of a relatively small evaluation budget. To address this challenge, we propose to learn a low-dimensional feature space jointly with (a) the response surface and (b) a reconstruction mapping. Our approach allows for optimization of BO's acquisition function in the lower-dimensional subspace, which significantly simplifies the optimization problem. We reconstruct the original parameter space from the lower-dimensional subspace for evaluating the black-box function. For meaningful exploration, we solve a constrained optimization problem.
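To make the structure of this approach concrete, the following minimal Python sketch (not the paper's implementation) lays out the three components the abstract names: a feature map into a low-dimensional space, a response surface that would be fitted there, and a reconstruction mapping back to the original parameter space. The black box f, the projection P, and both maps are illustrative placeholders; in particular, the paper learns nonlinear maps jointly with the GP, whereas this sketch uses a fixed linear projection for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 20, 2  # original (D) and feature-space (d) dimensionality

# Hypothetical black box whose variation lives on a low-dimensional subspace.
A = rng.standard_normal((D, d))
f = lambda x: -np.sum((A.T @ x) ** 2)

# (a) a feature map g: R^D -> R^d and (b) a reconstruction map h: R^d -> R^D.
# The paper learns nonlinear maps jointly with the GP response surface;
# a fixed random linear projection stands in here purely for illustration.
P = rng.standard_normal((d, D)) / np.sqrt(D)
g = lambda x: P @ x                              # project to feature space
h = lambda z: P.T @ np.linalg.solve(P @ P.T, z)  # reconstruct (pseudo-inverse)

# The acquisition function would be optimized over z in R^d (cheap), and the
# black box is then evaluated at the reconstructed point h(z) in R^D.
z = rng.uniform(-1.0, 1.0, size=d)  # a candidate chosen in feature space
x = h(z)                            # map back to the original parameter space
print("f at reconstructed point:", f(x))
print("feature round-trip error:", np.linalg.norm(g(x) - z))  # ~0 here
```

Choosing the reconstruction as a right inverse of the feature map, as in this sketch, keeps reconstructed points consistent with the features they were selected at; this is loosely analogous to the role of the constrained optimization problem the abstract mentions for meaningful exploration.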
Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

* Riccardo Moriconi [email protected]
  Marc Peter Deisenroth [email protected]
  K. S. Sesh Kumar [email protected]

¹ Department of Computing, Imperial College London, London, UK
² Department of Computer Science, University College London, London, UK
³ Data Science Institute, Imperial College London, London, UK

1 Introduction

Bayesian optimization (BO) is a useful model-based approach to the global optimization of black-box functions that are expensive to evaluate (Kushner 1964; Jones et al. 1998). This sample-efficient optimization technique has been effective in the experimental
design of machine learning algorithms (Bergstra et al. 2011), robotics applications (Cully et al. 2015; Calandra et al. 2016b) and medical therapies (Sui et al. 2015) for the optimization of spinal-cord electro-stimulation. Despite its great success, BO is practically limited to optimizing 10–20 parameters. A large body of literature has been devoted to addressing scalability issues and elevating BO to high-dimensional optimization problems, such as the discovery of chemical compounds (Gómez-Bombarelli et al. 2018) or automatic software configuration (Hutter et al. 2011). The standard BO routine consists of two key steps: (1) estimating the black-box function from data through a probabilistic surrogate model, usually a Gaussian process (GP), referred to as the response surface; (2) maximizing an acquisition function that trades off exploration and exploitation to select the next evaluation point.
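As a point of reference, the two steps of this standard routine can be sketched in a few lines of Python with scikit-learn's GP implementation. The toy objective f, the bounds, and the choice of expected improvement as the acquisition function are illustrative assumptions, not the paper's setup; the point is the loop structure: fit the response surface, maximize the acquisition, evaluate the black box, repeat.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical expensive black box (to be minimized).
def f(x):
    return np.sin(3 * x[0]) + 0.5 * x[0] ** 2

bounds = np.array([[-2.0, 2.0]])

# A handful of initial evaluations.
X = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(5, 1))
y = np.array([f(x) for x in X])

for _ in range(10):
    # Step 1: response surface -- a GP fitted to the evaluations so far.
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)

    # Step 2: acquisition function (expected improvement), which trades off
    # exploitation (low predicted mean) against exploration (high std).
    y_best = y.min()

    def neg_ei(x):
        mu, std = gp.predict(x.reshape(1, -1), return_std=True)
        std = np.maximum(std, 1e-9)
        z = (y_best - mu) / std
        return -(std * (z * norm.cdf(z) + norm.pdf(z)))[0]

    # Maximize the acquisition over the box via multi-start local search.
    starts = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(10, 1))
    best = min((minimize(neg_ei, s, bounds=bounds) for s in starts),
               key=lambda r: r.fun)

    # Evaluate the black box at the proposed point and augment the data.
    x_next = best.x
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print("best value found:", y.min())
```

Note that step (2) is itself a non-convex optimization over the full input space; this inner problem is exactly what becomes difficult in high dimensions, and what the feature-space approach of this paper reduces to a low-dimensional one.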