Transformation-invariant Gabor convolutional networks
ORIGINAL PAPER
Lei Zhuang1,2 · Feipeng Da1,2,3 · Shaoyan Gai1,2 · Mengxiang Li4

Received: 28 October 2019 / Revised: 17 February 2020 / Accepted: 30 March 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020
Abstract

Although deep convolutional neural networks (DCNNs) have a powerful capability to learn complex feature representations, they are limited by a poor ability to handle large rotations and scale transformations. In this paper, we propose a novel alternative to the conventional convolutional layer, named the Gabor convolutional layer (GCL), to enhance robustness to such transformations. The GCL is a simple yet efficient combination of Gabor prior knowledge and parameter learning. A GCL is composed of three components: a Gabor extraction module, a weight-sharing convolution module, and a transformation pooling module. DCNNs integrated with GCLs, referred to as transformation-invariant Gabor convolutional networks (TI-GCNs), can be easily built by replacing standard convolutional layers with the designed GCLs. Our experimental results on various real-world recognition tasks indicate that encoding traditional hand-crafted Gabor filters with dominant orientation and scale information into DCNNs is of great importance for learning compact feature representations and reinforcing resistance to scale changes and orientation variations. The source code can be found at https://github.com/GuichenLv.

Keywords Gabor filters · Convolutional neural networks · Rotation · Character recognition
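The abstract's three-module decomposition can be illustrated in code. Below is a minimal PyTorch sketch of a GCL, assuming the Gabor extraction module modulates a set of shared learned weights with a fixed real-valued Gabor bank, the weight-sharing convolution module reuses those weights across orientations, and the transformation pooling module takes a max over the orientation responses. The filter parameters (sigma, lambd, gamma), the modulation scheme, and the class name GaborConvLayer are illustrative assumptions, not the authors' exact implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def gabor_bank(ksize, n_orientations, sigma=2.0, lambd=4.0, gamma=0.5):
    """Real-valued Gabor filters at evenly spaced angles, shape (O, k, k)."""
    half = (ksize - 1) / 2.0
    ys, xs = torch.meshgrid(torch.arange(ksize) - half,
                            torch.arange(ksize) - half, indexing="ij")
    filters = []
    for i in range(n_orientations):
        theta = math.pi * i / n_orientations
        x_t = xs * math.cos(theta) + ys * math.sin(theta)
        y_t = -xs * math.sin(theta) + ys * math.cos(theta)
        g = torch.exp(-(x_t ** 2 + (gamma * y_t) ** 2) / (2 * sigma ** 2)) \
            * torch.cos(2 * math.pi * x_t / lambd)
        filters.append(g)
    return torch.stack(filters)


class GaborConvLayer(nn.Module):
    """Hypothetical GCL: Gabor extraction, weight-sharing convolution,
    and transformation pooling over orientations."""

    def __init__(self, in_ch, out_ch, ksize=5, n_orientations=4):
        super().__init__()
        self.n_orientations = n_orientations
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, ksize, ksize) * 0.05)
        # Fixed Gabor prior; a buffer, so it is saved but never trained.
        self.register_buffer("gabor", gabor_bank(ksize, n_orientations))

    def forward(self, x):
        responses = []
        for o in range(self.n_orientations):
            # Gabor extraction: modulate the shared weights with one filter.
            w = self.weight * self.gabor[o]
            # Weight-sharing convolution: the same learned weights per angle.
            responses.append(F.conv2d(x, w, padding=self.weight.shape[-1] // 2))
        # Transformation pooling: keep the strongest orientation response.
        return torch.stack(responses, dim=0).max(dim=0).values
```

Under this reading, a TI-GCN-style network is obtained simply by swapping nn.Conv2d layers for GaborConvLayer blocks in an existing DCNN.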
1 Introduction

Deep convolutional neural networks (DCNNs) have led to a range of breakthroughs in various fields such as character recognition, object detection, face recognition, and semantic segmentation. However, the learned features are not robust enough to spatial geometric transformations, owing to the lack of modules specifically designed to handle transformations. Although the max pooling layer [2] endows DCNNs with the capacity to cope with scale changes and moderate rotations, the problem of large rotations and scale transformations cannot be fully solved without a transformation-encoding mechanism [15].
Corresponding author: Feipeng Da ([email protected])

1 The School of Automation, Southeast University, Nanjing, China
2 The Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing, China
3 Shenzhen Research Institute, Southeast University, Shenzhen, China
4 The National Research Center of Overseas Sinology, Beijing Foreign Studies University, Beijing, China
Numerous state-of-the-art approaches have been developed to encode transformation invariance into DCNNs; they can be roughly divided into two categories: transforming the input feature maps and transforming the filters. In [10], a localization network was introduced in spatial transformer networks to predict transformation parameters, which were then used to produce the transformed output. Randomly transforming the feature maps during training was also introduced as a strategy of the first kind.
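To make the first category concrete, here is a hedged PyTorch sketch of the spatial transformer pattern described above: a small localization network regresses six affine parameters, which then drive a sampling grid that produces the transformed output. The layer sizes and the use of affine_grid/grid_sample are illustrative assumptions, not the implementation from [10].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTransformer(nn.Module):
    def __init__(self, in_ch=1):
        super().__init__()
        # Localization network: regresses the 6 parameters of a 2x3 affine map.
        self.loc = nn.Sequential(
            nn.Conv2d(in_ch, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(10, 6))
        # Start at the identity transform so early training is stable.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)               # predicted parameters
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)  # transformed output
```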