Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization



Ömer Deniz Akyildiz1,2 · Dan Crisan3 · Joaquín Míguez4,5

Received: 3 June 2019 / Accepted: 8 July 2020 © The Author(s) 2020

Abstract We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth-order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. The algorithm selects the best performing sampler and uses it to approximate a global minimum of the cost function. We prove analytically that the resulting estimator converges to a global minimum of the cost function almost surely and provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space. We show, by way of numerical examples, that the algorithm can tackle cost functions with multiple minima or with broad “flat” regions which are hard to minimize using gradient-based techniques.

Keywords Sequential Monte Carlo · Stochastic optimization · Nonconvex optimization · Gradient-free optimization · Sampling
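To make the idea in the abstract concrete, the following is a minimal sketch of a bank of particle samplers minimizing a finite-sum cost using only minibatch evaluations. It is an illustration under stated assumptions, not the algorithm analyzed in the paper. The toy components f_i(θ) = ||θ − c_i||², the weights exp(−minibatch cost), the random-walk jitter, the box [−5, 5]^d as the compact search space, and the rule for picking the best sampler (lowest full cost of the particle mean, affordable here only because n is small) are all hypothetical choices, as are the function names.

```python
import numpy as np

def minibatch_cost(theta, centers, idx):
    """Per-particle sum of f_i(theta) = ||theta - c_i||^2 over the components in idx."""
    diff = theta[:, None, :] - centers[idx][None, :, :]   # shape (N, K, d)
    return np.sum(diff ** 2, axis=(1, 2))                 # shape (N,)

def run_sampler(centers, rng, n_particles=200, n_iters=100,
                batch=10, step=0.2, bound=5.0):
    """One sampler: jitter, weight by exp(-minibatch cost), resample (all assumed choices)."""
    n, d = centers.shape
    theta = rng.uniform(-bound, bound, size=(n_particles, d))
    for _ in range(n_iters):
        # Random-walk jitter, clipped to the compact search space [-bound, bound]^d.
        theta = np.clip(theta + step * rng.standard_normal(theta.shape), -bound, bound)
        # Evaluate only a small random subset of the cost components.
        idx = rng.choice(n, size=batch, replace=False)
        cost = minibatch_cost(theta, centers, idx)
        # Density proportional to exp(-cost): its maxima sit at the minima of the cost.
        w = np.exp(-(cost - cost.min()))
        w /= w.sum()
        # Multinomial resampling according to the normalized weights.
        theta = theta[rng.choice(n_particles, size=n_particles, p=w)]
    return theta

def parallel_smc_minimize(centers, n_samplers=4, seed=0):
    """Run a bank of independent samplers and keep the best-performing estimate."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(centers))
    best_theta, best_cost = None, np.inf
    for _ in range(n_samplers):
        particles = run_sampler(centers, rng)
        candidate = particles.mean(axis=0, keepdims=True)
        # Assumed selection rule: full cost of the particle mean (feasible for small n).
        cost = minibatch_cost(candidate, centers, all_idx)[0]
        if cost < best_cost:
            best_theta, best_cost = candidate[0], cost
    return best_theta, best_cost

# Toy problem: the minimizer of the full cost is the mean of the centers.
data_rng = np.random.default_rng(1)
centers = data_rng.normal(loc=[1.0, -2.0], scale=1.0, size=(100, 2))
theta_hat, cost_hat = parallel_smc_minimize(centers)
print(theta_hat, centers.mean(axis=0))
```

In this toy setting the exact minimizer is known, so the printed estimate can be compared against the mean of the centers directly. The samplers never use gradients; they only compare minibatch cost values across particles.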

An important part of this work was carried out when Ö. D. A. was visiting the Department of Mathematics, Imperial College London. This work was partially supported by Agencia Estatal de Investigación of Spain (RTI2018-099655-B-I00 CLARA), and the regional government of Madrid (program CASICAM-CM S2013/ICE-2845). The work of the second author has been partially supported by a UC3M-Santander Chair of Excellence grant held at the Universidad Carlos III de Madrid.

Ömer Deniz Akyildiz [email protected]
Dan Crisan [email protected]
Joaquín Míguez [email protected]

1 The Alan Turing Institute, London, UK
2 University of Warwick, Coventry, UK
3 Imperial College London, London, England
4 Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
5 Instituto de Investigación Sanitaria Gregorio Marañón, Universidad Carlos III de Madrid, Leganes, Madrid, Spain

1 Introduction

In signal processing and machine learning, optimization problems of the form

$$\min_{\theta \in \Theta} f(\theta) = \sum_{i=1}^{n} f_i(\theta), \tag{1.1}$$

where $\Theta \subset \mathbb{R}^d$ is the compact d-dimensional search space, have attracted significant attention in recent years for problems where n is very large. Such problems often arise in big data settings, e.g., when one needs to estimate parameters given a large number of observations (Bottou et al. 2018). Because of their efficiency, stochastic gradient-based methods have been the main focus of the optimization community (Robbins and