Impact of Hyperparameters on Model Development in Deep Learning



H. Shaziya and Raniah Zaheer

Abstract Deep learning has revolutionized the field of computer vision. To develop a deep learning model, one has to decide on optimal values for various hyperparameters, such as the learning rate. Hyperparameters are not learned by the model; rather, they are set by the user, and they control other parameters of the model, such as weights and biases. Parameter values are learned effectively by tuning the hyperparameters; hence, hyperparameters determine the values of the parameters of the model. Manual tuning is a tedious and time-consuming process, and automating the selection of hyperparameter values results in the development of effective models. Which combinations yield the optimum results has to be investigated. This work uses scikit-optimize library functions to study the impact of hyperparameters on the accuracy of the MNIST dataset classification problem. Across different combinations of learning rate, number of dense layers, number of nodes per dense layer, and activation function, the lowest and highest accuracies obtained were 8.68% and 98.96% for the gp_minimize function, 8.68% and 98.74% for the forest_minimize function, and 9.24% and 98.94% for the gbrt_minimize function, respectively.

Keywords Convolutional Neural Networks (CNN) · Deep learning · gp_minimize · forest_minimize · gbrt_minimize · Hyperparameters · Skopt
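The search space described in the abstract (learning rate, number of dense layers, nodes per layer, activation function) can be sketched as follows. This is a minimal illustration, not the paper's code: the numeric ranges are assumptions, and `dummy_accuracy` is a synthetic stand-in for the real objective, which would train a CNN on MNIST and return its test accuracy.

```python
import math
import random

# Illustrative search space; the bounds here are assumptions for demonstration,
# not the ranges used in the paper.
SPACE = {
    "learning_rate": (1e-6, 1e-1),      # sampled log-uniformly below
    "num_dense_layers": (1, 5),         # integer range
    "num_dense_nodes": (5, 512),        # integer range
    "activation": ["relu", "sigmoid"],  # categorical choice
}

def sample_config(rng):
    """Draw one hyperparameter configuration from the space."""
    lo, hi = SPACE["learning_rate"]
    # Learning rates are conventionally searched on a log scale.
    lr = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
    return {
        "learning_rate": lr,
        "num_dense_layers": rng.randint(*SPACE["num_dense_layers"]),
        "num_dense_nodes": rng.randint(*SPACE["num_dense_nodes"]),
        "activation": rng.choice(SPACE["activation"]),
    }

def dummy_accuracy(cfg):
    """Synthetic stand-in for training a CNN on MNIST and returning accuracy."""
    # Purely made-up shape: peak near lr = 1e-3, slight bonus for depth.
    lr_term = -abs(math.log10(cfg["learning_rate"]) + 3) * 0.1
    depth_term = 0.02 * cfg["num_dense_layers"]
    return max(0.0, min(1.0, 0.9 + lr_term + depth_term))

def random_search(n_trials=20, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        acc = dummy_accuracy(cfg)
        if best is None or acc > best[1]:
            best = (cfg, acc)
    return best
```

In the actual study, the objective function would be replaced by model training, and the loop by one of skopt's optimizers (gp_minimize, forest_minimize, or gbrt_minimize), which choose the next configuration based on past evaluations instead of sampling blindly.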

H. Shaziya (B)
Department of Informatics, Nizam College, Osmania University, Hyderabad, India
e-mail: [email protected]

R. Zaheer
Department of CS, Najran University, Najran, Saudi Arabia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
N. Chaki et al. (eds.), Proceedings of International Conference on Computational Intelligence and Data Engineering, Lecture Notes on Data Engineering and Communications Technologies 56, https://doi.org/10.1007/978-981-15-8767-2_5


1 Introduction

Hyperparameters determine the values of various attributes of the network structure and the training process. A few examples of hyperparameters are the learning rate, number of layers, number of hidden layers, number of filters, dropout rate, number of iterations, batch size, activation function, optimizer, and regularization. Tuning these parameters adds an extra step to deep learning model development. Hyperparameter tuning can be done manually or automatically. Manual tuning is not an effective way of choosing values, as it requires more time. Automatic methods are of three kinds: grid search, random search, and Bayesian optimization. Grid search considers every combination of the specified parameters, which often leads to an enormous number of configurations; sometimes it is not feasible to try every combination, as doing so degrades performance drastically. To overcome this problem, random search was introduced: as the name suggests, hyperparameter values are chosen randomly and evaluated. Finally, the Bayesian method uses an objective function whose past evaluations guide the choice of the next hyperparameter configuration to try.
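The contrast between grid and random search drawn above can be made concrete: grid search multiplies the number of configurations across every hyperparameter axis, whereas random search evaluates a fixed budget of trials regardless of grid size. The grids below are illustrative values, not those used in the paper.

```python
import random
from itertools import product

# Illustrative candidate values for each hyperparameter (assumed, not the paper's).
learning_rates = [1e-4, 1e-3, 1e-2]
num_layers = [1, 2, 3, 4]
num_nodes = [16, 64, 256]
activations = ["relu", "sigmoid"]

# Grid search: the cross product of all axes must be evaluated.
grid = list(product(learning_rates, num_layers, num_nodes, activations))
print(len(grid))  # 3 * 4 * 3 * 2 = 72 configurations

# Random search: a fixed trial budget, independent of how many values each axis has.
rng = random.Random(0)
budget = 10
trials = [
    (rng.choice(learning_rates), rng.choice(num_layers),
     rng.choice(num_nodes), rng.choice(activations))
    for _ in range(budget)
]
print(len(trials))  # 10 configurations
```

Adding one more value to any axis multiplies the grid's size but leaves the random-search budget unchanged, which is why random search scales better to high-dimensional hyperparameter spaces.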