Does Removing Pooling Layers from Convolutional Neural Networks Improve Results?
ORIGINAL RESEARCH
Claudio Filipi Goncalves dos Santos¹ · Thierry Pinheiro Moreira² · Danilo Colombo³ · João Paulo Papa²
Received: 22 June 2020 / Accepted: 7 August 2020
© Springer Nature Singapore Pte Ltd 2020
Abstract
Due to their number of parameters, convolutional neural networks are known to require long training periods and extended inference time. Learning may take so much computational power that it requires a costly machine and, sometimes, weeks of training. In this context, there is a trend already in motion to replace convolutional pooling layers with a strided operation in the previous layer to save time. In this work, we evaluate the speedup of such an approach and how it trades off against accuracy loss across multiple computer vision domains, deep neural architectures, and datasets. The results show significant acceleration with an almost negligible loss in accuracy, if any, which is a further indication that convolutional pooling in deep learning performs redundant calculations.
Keywords Pooling · Convolutional neural networks · Gait recognition · Optical character recognition
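The substitution described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `conv2d` and `max_pool2d`, the single-channel 10 x 10 input, the 3 x 3 kernel, and the 2 x 2 pooling window are all assumptions chosen for illustration. It compares a stride-1 convolution followed by max pooling with a single stride-2 convolution, which produces a feature map of the same size while evaluating the kernel at a quarter of the positions.

```python
import numpy as np


def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.standard_normal((10, 10))   # hypothetical single-channel input
kernel = rng.standard_normal((3, 3))    # hypothetical 3 x 3 filter

# Baseline: stride-1 convolution (8 x 8 output) followed by 2 x 2 max pooling.
dense = conv2d(image, kernel, stride=1)
pooled = max_pool2d(dense, size=2)

# Alternative: drop the pooling layer and use stride 2 in the convolution itself.
strided = conv2d(image, kernel, stride=2)

# Number of kernel evaluations performed by each convolution.
dense_evals = dense.size
strided_evals = strided.size

print(pooled.shape, strided.shape)   # both feature maps have the same size
print(dense_evals, strided_evals)    # the strided version evaluates 4x fewer positions
```

The two outputs have the same shape but not the same values: the strided convolution keeps every second response of the stride-1 convolution, whereas max pooling keeps the largest response in each window. Whether that difference costs accuracy is the question the paper evaluates.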
¹ UFSCar, Federal University of São Carlos, São Carlos, Brazil
² UNESP, State University of Sao Paulo, Bauru, Brazil
³ Cenpes, Petróleo Brasileiro S.A., Petrobras, Rio de Janeiro, RJ, Brazil

Introduction
Convolutional neural networks (CNNs) were first designed based on the human visual cortex [22]. This brain region comprises two main types of cells: (1) simple cells, which are computationally emulated by CNN kernels; and (2) complex cells, which can be found in the primary visual cortex [12], the secondary visual cortex, and Brodmann area 19 of the human brain [14]. The former are located in the primary visual cortex and respond mainly to edges and bars [13]. The latter respond to edges and gratings, like simple cells, but also exhibit spatial invariance: they react to light patterns of a given orientation anywhere within a large receptive field.

Based on these biological properties, LeCun et al. [22] developed the first successful CNN model. Its structure consists of seven layers in total: two pairs of a convolution followed by an average pooling, two multi-layer perceptron layers, and a final layer responsible for classification.

Convolutional pooling is a dimensionality reduction technique that preserves locality in feature maps, i.e., it reduces the size of an internal representation with little loss of information. These characteristics are ideal for convolutional neural networks because they dramatically reduce the number of parameters, which lets the networks run faster, fit in GPU memory, and search a smaller space. Since CNNs are acknowledged to require high computational cost for training and,