Automatic model selection for fully connected neural networks


David Laredo¹ · Shangjie Frank Ma² · Ghazaale Leylaz² · Oliver Schütze¹ · Jian-Qiao Sun²

Received: 2 July 2020 / Revised: 16 September 2020 / Accepted: 28 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Neural networks and deep learning are changing the way artificial intelligence is done. Efficiently choosing a suitable network architecture and fine-tuning its hyper-parameters for a specific dataset is a time-consuming task, given the staggering number of possible alternatives. In this paper, we address the problem of model selection by means of a fully automated framework for efficiently selecting a neural network model for a given task, whether classification or regression. The algorithm, named Automatic Model Selection (AMS), is a modified micro-genetic algorithm that automatically and efficiently finds the most suitable fully connected neural network model for a given dataset. The main contributions of this method are: a simple, list-based encoding for neural networks, which is used as the genotype in our evolutionary algorithm; novel crossover and mutation operators; a fitness function that considers both the accuracy of the neural network and its complexity; and a method to measure the similarity between two neural networks. AMS is evaluated on two different datasets. By comparing models obtained with AMS to state-of-the-art models for each dataset, we show that AMS can automatically find efficient neural network models. Furthermore, AMS is computationally efficient and can make use of distributed computing paradigms to further boost its performance.

Keywords Artificial neural networks · Model selection · Hyperparameter tuning · Distributed computing · Evolutionary algorithms

Corresponding author: Jian-Qiao Sun

¹ Department of Computer Science, CINVESTAV, Mexico City, Mexico
² Department of Mechanical Engineering, University of California, Merced, CA 95343, USA

1 Introduction
Machine learning (ML) studies algorithms that can perform simple tasks such as image classification, noise classification and face recognition without the need to explicitly code the rules for performing them. Thanks to the maturity of the internet, the proliferation of “the cloud”, and the increase of the byte per USD ratio for storage systems, large amounts of data from many fields are now available. With this availability of data, along with the affordability of computational power (mainly through on-demand services such as AWS or Azure), machine learning has become available to mainstream users with very diverse backgrounds, such as mechanical engineering, bioengineering and finance. Among the main challenges of implementing an ML solution is the design of an efficient ML model, which involves the selection of a learning algorithm, hyper-parameters, fe