A Sparse Pyramid Pooling Strategy
Abstract. In this paper, we introduce a more principled pooling strategy for the Convolutional Restricted Boltzmann Machine. To address the information loss caused by the pooling operation, and inspired by the idea of the spatial pyramid, we replace probabilistic max-pooling with our sparse pyramid pooling, which produces outputs of different sizes at different pyramid levels. We then use a sparse coding method to aggregate the multi-level feature maps. Experimental results on the KTH action dataset and the Maryland dynamic scenes dataset show that sparse pyramid pooling achieves superior performance to conventional probabilistic max-pooling. In addition, our pooling strategy can effectively improve the performance of deep neural networks on video classification.

Keywords: Probabilistic max-pooling · Spatial pyramid pooling · Sparse coding · Deep neural network
1
Introduction
Deep learning has emerged as a new area of machine learning research and has achieved significant success in many artificial intelligence applications, such as speech recognition [1] and image processing [2]. In recent years, as more and more high-tech companies invest resources in the development of deep learning, new architectures and algorithms appear every few weeks. As a branch of machine learning, deep learning refers to feature learning methods that use unsupervised and/or supervised strategies to learn abstract feature representations in each layer of a deep architecture, with the layers forming a hierarchy from low-level to high-level features [3,4,5]. To learn good feature representations of data, a deep neural network focuses on end-to-end feature learning from raw inputs, regardless of label information during training, and can compactly represent complex functions with a number of hidden units that is polynomial in the number of inputs.

One important branch of deep neural networks, the convolutional neural network (CNN), uses a combination of supervised and unsupervised methods to learn multiple stages of invariant features. Each stage of a CNN includes a convolution layer and a pooling/subsampling layer. In the convolution step, the same filter is applied at different locations, exploiting the stationarity of natural images; in other words, the convolutional layer extracts patterns common to local regions of the input. In the pooling step, responses over nearby locations are summarized to make the representation invariant to small spatial shifts and geometric distortions.

© Springer-Verlag Berlin Heidelberg 2015
H. Zha et al. (Eds.): CCCV 2015, Part II, CCIS 547, pp. 406–416, 2015. DOI: 10.1007/978-3-662-48570-5_39
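The pooling step described above can be sketched in a few lines. The following is an illustrative NumPy implementation of non-overlapping max-pooling (the function name `max_pool` and the `pool_size` parameter are our own, not from the paper); taking the maximum over each local region is what makes the output insensitive to small shifts within a region.

```python
import numpy as np

def max_pool(feature_map, pool_size=2):
    """Summarize responses over non-overlapping pool_size x pool_size
    regions by taking their maximum, giving invariance to small shifts."""
    h, w = feature_map.shape
    h_out, w_out = h // pool_size, w // pool_size
    # Crop to a multiple of pool_size, then reshape into blocks and reduce.
    cropped = feature_map[:h_out * pool_size, :w_out * pool_size]
    blocks = cropped.reshape(h_out, pool_size, w_out, pool_size)
    return blocks.max(axis=(1, 3))

fm = np.array([[1., 2., 0., 1.],
               [3., 0., 1., 0.],
               [0., 1., 4., 2.],
               [1., 0., 0., 3.]])
print(max_pool(fm))  # [[3. 1.]
                     #  [1. 4.]]
```

Note that each 2 × 2 output value summarizes a 2 × 2 input region, so the spatial resolution is halved; this is the information loss that the paper's sparse pyramid pooling aims to mitigate.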
The Convolutional Restricted Boltzmann Machine (CRBM) is very similar in structure to a stage of a conventional convolutional network. A CRBM can be trained in an unsupervised way, similar to the training of a Restricted Boltzmann Machine. In this work, a novel pooling strategy, called sparse pyramid pooling, is proposed for the CRBM.
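To make the multi-level idea concrete, here is a minimal sketch of pyramid pooling over a single feature map: level n partitions the map into an n × n grid and max-pools each bin, so coarser levels keep less spatial detail and finer levels keep more. This is only an illustration of the pyramid structure (the function and level choices are hypothetical); the paper additionally aggregates the multi-level feature maps with sparse coding, which is not shown here.

```python
import numpy as np

def pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool one feature map at several pyramid levels; level n yields an
    n x n grid of max responses, one per spatial bin."""
    h, w = feature_map.shape
    outputs = []
    for n in levels:
        pooled = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                # Bin boundaries that cover the whole map even when
                # h and w are not divisible by n.
                r0, r1 = i * h // n, (i + 1) * h // n
                c0, c1 = j * w // n, (j + 1) * w // n
                pooled[i, j] = feature_map[r0:r1, c0:c1].max()
        outputs.append(pooled)
    return outputs

fm = np.arange(16.0).reshape(4, 4)
outs = pyramid_pool(fm)
print([o.shape for o in outs])  # [(1, 1), (2, 2), (4, 4)]
```

At the coarsest level the entire map collapses to a single value, while the finest level here preserves every location, which is why the levels carry complementary amounts of spatial information.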