Combining Bayesian optimization and Lipschitz optimization
Mohamed Osama Ahmed¹ · Sharan Vaswani¹ · Mark Schmidt¹

Received: 21 January 2019 / Revised: 13 June 2019 / Accepted: 20 July 2019
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019
Abstract
Bayesian optimization and Lipschitz optimization have developed alternative techniques for optimizing black-box functions. They each exploit a different form of prior knowledge about the function. In this work, we explore strategies to combine these techniques for better global optimization. In particular, we propose ways to use the Lipschitz continuity assumption within traditional BO algorithms, which we call Lipschitz Bayesian optimization (LBO). This approach does not increase the asymptotic runtime and in some cases drastically improves the performance (while in the worst case the performance is similar). Indeed, in a particular setting, we prove that using the Lipschitz information yields the same or a better bound on the regret compared to using Bayesian optimization on its own. Moreover, we propose a simple heuristic to estimate the Lipschitz constant, and prove that a growing estimate of the Lipschitz constant is in some sense “harmless”. Our experiments on 15 datasets with 4 acquisition functions show that in the worst case LBO performs similarly to the underlying BO method, while in some cases it performs substantially better. Thompson sampling in particular typically saw drastic improvements (as the Lipschitz information corrected for its well-known “over-exploration” phenomenon), and its LBO variant often outperformed other acquisition functions.

Keywords Bayesian optimization · Global optimization · Lipschitz optimization · Optimization
Editors: Karsten Borgwardt, Po-Ling Loh, Evimaria Terzi, and Antti Ukkonen.

Mohamed Osama Ahmed [email protected] · Sharan Vaswani [email protected] · Mark Schmidt [email protected]

¹ University of British Columbia, Vancouver, BC, Canada

1 Introduction

Bayesian optimization (BO) has a long history and has been used in a variety of fields (see Shahriari et al. 2016), with recent interest from the machine learning community in the context of automatic hyper-parameter tuning (Snoek et al. 2012; Golovin et al. 2017). BO is
an example of a global black-box optimization algorithm (Hendrix et al. 2010; Jones et al. 1998; Pintér 1996; Rios and Sahinidis 2013), which optimizes an unknown function that may not have nice properties such as convexity. In the typical setting, we assume that we only have access to a black box that evaluates the function, and that these evaluations are expensive. The objective is to find a global optimum of the unknown function with the minimum number of function evaluations. The global optimization of a real-valued function is impossible unless we make assumptions about the structure of the unknown function. Lipschitz continuity assumes that the function cannot change arbitrarily fast as we change the inputs. This is one of the weakest assumptions under which global optimization of an unknown function remains possible.
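To make the Lipschitz assumption concrete, the sketch below (our own illustration, not code from the paper) computes the bounds that a Lipschitz constant L implies given observed evaluations, namely f(x) ≤ min_i {f(x_i) + L‖x − x_i‖} and f(x) ≥ max_i {f(x_i) − L‖x − x_i‖}, together with a standard pairwise-slope estimate of L; the function names `estimate_lipschitz` and `lipschitz_bounds` are hypothetical.

```python
import numpy as np

def estimate_lipschitz(X, f_vals):
    # Pairwise-slope estimate of the Lipschitz constant:
    #   L_hat = max_{i != j} |f(x_i) - f(x_j)| / ||x_i - x_j||
    diffs = np.abs(f_vals[:, None] - f_vals[None, :])
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    mask = dists > 0  # exclude i == j pairs
    return np.max(diffs[mask] / dists[mask])

def lipschitz_bounds(x, X, f_vals, L):
    # Bounds on f(x) implied by Lipschitz continuity at a query point x:
    #   f(x) <= min_i f(x_i) + L * ||x - x_i||   (upper bound)
    #   f(x) >= max_i f(x_i) - L * ||x - x_i||   (lower bound)
    dists = np.linalg.norm(X - x, axis=1)
    return np.max(f_vals - L * dists), np.min(f_vals + L * dists)

# Example: bound f(x) = sin(3x) on [0, 2] from six evaluations.
X = np.linspace(0.0, 2.0, 6).reshape(-1, 1)
f_vals = np.sin(3.0 * X).ravel()
L_hat = estimate_lipschitz(X, f_vals)  # underestimates the true constant (3)
lower, upper = lipschitz_bounds(np.array([0.5]), X, f_vals, L_hat)
```

Note that a pairwise estimate computed from finitely many points can only underestimate the true constant, which is one reason the abstract emphasizes that a growing (over-)estimate of the Lipschitz constant is “harmless”.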