QIM: Quantifying Hyperparameter Importance for Deep Learning
Beihang University, Beijing, China
Shenzhen Institute of Advanced Technology, CAS, Shenzhen, China
[email protected]
Abstract. Deep Learning (DL) has recently attracted tremendous attention because of its breakthroughs in many areas such as image processing and face identification. The performance of DL models critically depends on hyperparameter settings. However, existing approaches that quantify the importance of these hyperparameters are time-consuming. In this paper, we propose a fast approach to quantifying the importance of DL hyperparameters, called QIM. It leverages the Plackett-Burman design to collect as little data as possible while still correctly quantifying hyperparameter importance. We conducted experiments on the popular deep learning framework Caffe with different datasets to evaluate QIM. The results show that QIM can rank the importance of DL hyperparameters correctly at very low cost.

Keywords: Deep learning · Plackett-Burman design · Hyperparameter

1 Introduction
Deep learning (DL) is a sub-field of machine learning (ML) that focuses on extracting features from data through multiple layers of abstraction. DL algorithms behave very differently across model variants such as deep belief networks [8], convolutional networks [13], and stacked denoising autoencoders [17], all of which have up to hundreds of hyperparameters that significantly affect performance. Because no single network generalizes best for all datasets, a necessary step before applying a DL algorithm to a new dataset is to select an appropriate set of hyperparameters. To address this issue, a number of approaches have been developed; the three most popular are (1) manual search, (2) grid search, and (3) random search [3]. These approaches have their respective advantages and disadvantages, but how to optimize hyperparameter settings for DL algorithms remains an open question. There has been a recent surge of interest in more sophisticated hyperparameter optimization methods [1,3,9,15]. For example, [3] applied Bayesian optimization techniques to designing convolutional vision architectures by learning

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing AG 2016. All Rights Reserved. G.R. Gao et al. (Eds.): NPC 2016, LNCS 9966, pp. 180–188, 2016. DOI: 10.1007/978-3-319-47099-3_15
Fig. 1. Deep learning architecture
a probabilistic model over the hyperparameter search space. However, none of these approaches provides scientists with answers to questions such as: how important is each hyperparameter, and how do its values affect performance? Answering such questions is key to scientific discovery, yet little work has been done on quantifying the relative importance of the hyperparameters that do matter. In this paper, we propose to quantify the importance of DL hyperparameters using the Plackett-Burman (PB) design [14], an approach we call QIM.
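To make the PB screening idea concrete, the sketch below builds the standard 8-run Plackett-Burman design for up to seven two-level factors and ranks factors by their main effects. The generator row is the classical PB construction; the `accuracy` function and its mapping of ±1 levels to learning rate, momentum, and batch size are illustrative assumptions for this sketch, not the paper's actual Caffe experiments.

```python
# Sketch of Plackett-Burman (PB) screening: run the model once per design
# row, then rank hyperparameters by the magnitude of their main effects.

def pb8_design():
    """8-run PB design: the seven cyclic shifts of a generator row plus a
    final all-minus row. Columns are balanced and mutually orthogonal."""
    gen = [+1, +1, +1, -1, +1, -1, -1]
    rows = [gen[i:] + gen[:i] for i in range(7)]
    rows.append([-1] * 7)
    return rows

def main_effects(design, responses):
    """Main effect of factor j: mean response at level +1 minus mean
    response at level -1."""
    k = len(design[0])
    effects = []
    for j in range(k):
        plus = [y for row, y in zip(design, responses) if row[j] == +1]
        minus = [y for row, y in zip(design, responses) if row[j] == -1]
        effects.append(sum(plus) / len(plus) - sum(minus) / len(minus))
    return effects

def accuracy(row):
    """Hypothetical stand-in for training a DL model and measuring test
    accuracy; columns 3-6 are unused dummy factors."""
    lr = 0.01 if row[0] == -1 else 0.1
    momentum = 0.9 if row[1] == +1 else 0.5
    batch = 32 if row[2] == -1 else 256
    # Toy response surface: learning rate dominates, momentum matters a
    # little, batch size barely matters.
    return 0.7 + 0.15 * (lr == 0.01) + 0.05 * (momentum == 0.9) \
               + 0.01 * (batch == 32)

design = pb8_design()
responses = [accuracy(row) for row in design]   # 8 runs instead of 2**7
effects = main_effects(design, responses)
ranking = sorted(range(7), key=lambda j: abs(effects[j]), reverse=True)
print("main effects:", [round(e, 3) for e in effects])
print("factors ranked by importance:", ranking)
```

Because the design matrix is orthogonal, eight runs suffice to estimate all seven main effects independently, which is the source of QIM's low cost relative to exhaustive search.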