QIM: Quantifying Hyperparameter Importance for Deep Learning
Beihang University, Beijing, China
Shenzhen Institute of Advanced Technology, CAS, Shenzhen, China
[email protected]
Abstract. Deep Learning (DL) has recently attracted tremendous attention because of its breakthroughs in many areas such as image processing and face identification. The performance of DL models critically depends on hyperparameter settings. However, existing approaches that quantify the importance of these hyperparameters are time-consuming. In this paper, we propose a fast approach to quantifying the importance of DL hyperparameters, called QIM. It leverages the Plackett-Burman design to collect as little data as possible while still correctly quantifying hyperparameter importance. We conducted experiments on the popular deep learning framework Caffe with different datasets to evaluate QIM. The results show that QIM can rank the importance of DL hyperparameters correctly at very low cost.

Keywords: Deep learning · Plackett-Burman design · Hyperparameter

1 Introduction
Deep learning (DL) is a sub-field of machine learning (ML) that focuses on extracting features from data through multiple layers of abstraction. DL algorithms behave very differently across model variants such as deep belief networks [8], convolutional networks [13], and stacked denoising autoencoders [17], all of which have up to hundreds of hyperparameters that significantly affect performance. Because no single network generalizes best for all datasets, a necessary step before applying a DL algorithm to a new dataset is to select an appropriate set of hyperparameters. To address this issue, a number of approaches have been developed; the three most popular are (1) manual search, (2) grid search, and (3) random search [3]. These approaches have their respective advantages and disadvantages, but how to optimize hyperparameter settings for DL algorithms remains an open question. There has been a recent surge of interest in more sophisticated hyperparameter optimization methods [1,3,9,15]. For example, [3] applied Bayesian optimization techniques to designing convolutional vision architectures by learning

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing AG 2016. All Rights Reserved. G.R. Gao et al. (Eds.): NPC 2016, LNCS 9966, pp. 180–188, 2016. DOI: 10.1007/978-3-319-47099-3_15
Fig. 1. Deep learning architecture
a probabilistic model over the hyperparameter search space. However, none of these approaches provides scientists with answers to questions such as: how important is each hyperparameter, and how do its values affect performance? Answering such questions is key to scientific discovery, yet little work has been done on quantifying the relative importance of the hyperparameters that do matter. In this paper, we propose to quantify the importance of DL hyperparameters using the Plackett-Burman (PB) design [14], an approach we call QIM.
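To make the PB screening idea concrete, the sketch below builds the standard 8-run Plackett-Burman design for up to seven two-level factors and ranks factors by their main effects. The generator row is the classical PB construction; the `accuracy` function and its mapping of ±1 levels to learning rate, momentum, and batch size are illustrative assumptions for this sketch, not the paper's actual Caffe experiments.

```python
# Sketch of Plackett-Burman (PB) screening: run the model once per design
# row, then rank hyperparameters by the magnitude of their main effects.

def pb8_design():
    """8-run PB design: the seven cyclic shifts of a generator row plus a
    final all-minus row. Columns are balanced and mutually orthogonal."""
    gen = [+1, +1, +1, -1, +1, -1, -1]
    rows = [gen[i:] + gen[:i] for i in range(7)]
    rows.append([-1] * 7)
    return rows

def main_effects(design, responses):
    """Main effect of factor j: mean response at level +1 minus mean
    response at level -1."""
    k = len(design[0])
    effects = []
    for j in range(k):
        plus = [y for row, y in zip(design, responses) if row[j] == +1]
        minus = [y for row, y in zip(design, responses) if row[j] == -1]
        effects.append(sum(plus) / len(plus) - sum(minus) / len(minus))
    return effects

def accuracy(row):
    """Hypothetical stand-in for training a DL model and measuring test
    accuracy; columns 3-6 are unused dummy factors."""
    lr = 0.01 if row[0] == -1 else 0.1
    momentum = 0.9 if row[1] == +1 else 0.5
    batch = 32 if row[2] == -1 else 256
    # Toy response surface: learning rate dominates, momentum matters a
    # little, batch size barely matters.
    return 0.7 + 0.15 * (lr == 0.01) + 0.05 * (momentum == 0.9) \
               + 0.01 * (batch == 32)

design = pb8_design()
responses = [accuracy(row) for row in design]   # 8 runs instead of 2**7
effects = main_effects(design, responses)
ranking = sorted(range(7), key=lambda j: abs(effects[j]), reverse=True)
print("main effects:", [round(e, 3) for e in effects])
print("factors ranked by importance:", ranking)
```

Because the design matrix is orthogonal, eight runs suffice to estimate all seven main effects independently, which is the source of QIM's low cost relative to exhaustive search.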