Hyperparameter Optimization with Factorized Multilayer Perceptrons

In machine learning, hyperparameter optimization is a challenging task that is usually approached by experienced practitioners or in a computationally expensive brute-force manner such as grid search. Therefore, recent research proposes to use observed hyperparameter performance on other data sets to speed up this search.

Introduction

Unfortunately, machine learning models are very rarely parameter-free: they usually contain a set of hyperparameters which have to be chosen appropriately on validation data. As a simple example, the number of latent variables in a matrix factorization cannot be determined using gradient descent: firstly, it is not explicitly given in the objective function, and secondly, it is a discrete rather than a continuous parameter. Additionally, the choice of kernel function for an SVM can also be understood as a hyperparameter for which gradient descent approaches fail. Besides being parameters of the learned model, hyperparameters can also be part of the objective function, such as regularization constants. Moreover, they can also be part of
the learning algorithm that is used to optimize the model for the objective function, for example the step length of a gradient-based technique or the threshold of a stopping criterion. Finally, even the choice of preprocessing can be viewed as a hyperparameter. Some of these hyperparameters are continuous, some are categorical, but what they all have in common is that there is no efficient learning algorithm for them. Therefore, many researchers rely on searching them on a grid, which is computationally very expensive, as with growing data and growing model complexity the optimization usually requires a lot of time.

The performance of a model on test data trained with specific hyperparameters depends on the data set on which the machine learning model is learned, and therefore hyperparameter optimization is usually started from scratch for each new data set. Thus, possibly valuable information about past hyperparameter performance on other data sets is ignored. Recent work proposes to use this information to perform hyperparameter optimization more efficiently and faster than before [2]. To accomplish this, the sequential model-based optimization (SMBO) framework is applied: first, a surrogate model is learned to predict hyperparameter performances; then an acquisition function is queried to choose the next hyperparameter configuration to test while maintaining a reasonable trade-off between exploration and exploitation. As the surrogate model's prediction can be computed in constant time, hyperparameters can be optimized in a controlled way, resulting in fewer runs of the actual learning algorithm until a promising configuration is found.

This paper targets the problem of hyperparameter learning and, more generally, model selection across different data sets. We propose to use a multilayer perceptron as the surrogate model and show how it can be learned to also include hyperparameter performances observed on data sets in the past. Additionally, we propose a factorized multilayer perceptron that contains a factorization part in the first layer of the network to directly model interactions between data set and hyperparameter features.
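To make the SMBO loop described above concrete, the following Python sketch runs it over a finite grid of candidate configurations. The surrogate interface (fit/predict returning a mean and a standard deviation) and the expected-improvement acquisition function are illustrative assumptions for this sketch, not necessarily the exact choices made in this paper.

# Minimal sketch of the SMBO loop (illustrative only; surrogate interface
# and acquisition function are assumptions, not the paper's exact choices).
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far):
    # EI for minimization: prefers candidates with low predicted error
    # (exploitation) or high predictive uncertainty (exploration).
    sigma = np.maximum(sigma, 1e-9)
    z = (best_so_far - mu) / sigma
    return (best_so_far - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def smbo(objective, candidates, surrogate, n_init=3, n_iter=20, seed=0):
    # `objective(x)` trains the model with hyperparameters x and returns a
    # validation error; `surrogate` is assumed to expose fit(X, y) and
    # predict(X) -> (mean, std), e.g. an MLP ensemble or a Gaussian process.
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates, dtype=float)
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = [candidates[i] for i in idx]              # evaluated configurations
    y = [objective(x) for x in X]                 # observed validation errors
    for _ in range(n_iter):
        surrogate.fit(np.asarray(X), np.asarray(y))
        mu, sigma = surrogate.predict(candidates)
        ei = expected_improvement(mu, sigma, min(y))
        x_next = candidates[int(np.argmax(ei))]   # most promising candidate
        X.append(x_next)
        y.append(objective(x_next))               # the only expensive step
    best = int(np.argmin(y))
    return X[best], y[best]

Because the loop only reruns the expensive objective on configurations the acquisition function deems promising, it typically needs far fewer evaluations than exhaustive grid search.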
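The factorized first layer can be sketched as follows. This is one plausible reading, assuming each hidden unit augments its linear pre-activation with a factorization-machine-style pairwise interaction term over the joint input of data set meta-features and hyperparameters; the exact parameterization used in the paper may differ.

# Sketch of a factorized first layer: each hidden unit adds a
# factorization-machine-style pairwise-interaction term over the joint
# (data set meta-features, hyperparameters) input. One plausible reading;
# the paper's exact parameterization may differ.
import numpy as np

def fm_pairwise(x, V):
    # sum_{i<j} <v_i, v_j> x_i x_j, computed in O(n * k) via the standard
    # factorization-machine identity instead of the naive O(n^2 * k) loop.
    xv = x @ V                                     # shape (k,)
    return 0.5 * (np.sum(xv ** 2) - np.sum((x[:, None] * V) ** 2))

def factorized_first_layer(x, W, b, V_list):
    # Linear pre-activation plus a per-unit factorized interaction term,
    # followed by a tanh nonlinearity; deeper layers stay standard.
    linear = W @ x + b                             # shape (n_hidden,)
    interactions = np.array([fm_pairwise(x, V) for V in V_list])
    return np.tanh(linear + interactions)

# Toy usage with made-up sizes: 8 input features (meta-features plus
# hyperparameters), 4 hidden units, 3 latent factor dimensions.
rng = np.random.default_rng(0)
n_in, n_hidden, k = 8, 4, 3
x = rng.normal(size=n_in)
W = rng.normal(scale=0.1, size=(n_hidden, n_in))
b = np.zeros(n_hidden)
V_list = [rng.normal(scale=0.1, size=(n_in, k)) for _ in range(n_hidden)]
print(factorized_first_layer(x, W, b, V_list))    # 4 hidden activations

The factorized term lets the network model multiplicative interactions between data set and hyperparameter features directly in the first layer, rather than forcing deeper layers to approximate them.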