Active weighted mapping-based residual convolutional neural network for image classification


Hyungho Jung¹ · Ryong Lee² · Sang-Hwan Lee² · Wonjun Hwang¹

Received: 19 March 2020 / Revised: 15 July 2020 / Accepted: 2 September 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

In visual recognition, the key to ResNet's performance gains is its success in stacking deep sequential convolutional layers by means of identity mapping through shortcut connections. This design creates multiple paths of data flow through the network, and the paths are merged with equal weights. However, it is questionable whether fixed, predefined weights are appropriate at the mapping units of all paths. In this paper, we introduce the active weighted mapping method, which infers suitable weight values on the fly from the characteristics of the input data. The weight values of each mapping unit are not fixed; they change with the input image, so that the most suitable values for each mapping unit are derived per input. For this purpose, channel-wise information is embedded from both the shortcut connection and the convolutional block, and fully connected layers then estimate the weight values for the mapping units. We train the backbone network and the proposed module alternately to stabilize learning. Extensive experiments show that the proposed method works on various backbone architectures, from ResNet to DenseNet. We also verify the superiority and generality of the proposed method on various datasets in comparison with the baseline.

Keywords Deep learning · Object recognition · Convolutional neural network · Residual convolutional network

Corresponding author: Wonjun Hwang

¹ Department of Artificial Intelligence, Ajou University, San 5-1, Woncheon-dong, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16499, Korea

² Research Data Sharing Center, Korea Institute of Science and Technology Information, Daejeon 34141, South Korea

Multimedia Tools and Applications
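
The abstract describes the merge step only in words: channel-wise information from the shortcut and the convolutional block is embedded, and fully connected layers estimate the merge weights. The pure-Python sketch below illustrates one way such a unit could look. It is not the authors' implementation. The use of global average pooling for the embedding, a single hidden ReLU layer, a softmax over exactly two weights, and all function and parameter names (`active_weighted_merge`, `init_params`, `w1`, `w2`, and so on) are assumptions made for illustration.

```python
import math
import random


def global_avg_pool(feature_map):
    """Collapse a C x H x W nested list into C per-channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def active_weighted_merge(shortcut, residual, params):
    """Merge two C x H x W maps with weights inferred from the input itself."""
    # Assumed embedding: per-channel means of both paths, concatenated (length 2C).
    embedding = global_avg_pool(shortcut) + global_avg_pool(residual)
    # Assumed estimator: one hidden ReLU layer, then two logits.
    hidden = [max(0.0, h) for h in dense(embedding, params["w1"], params["b1"])]
    # Assumed normalisation: softmax, so the two path weights sum to 1.
    w_short, w_res = softmax(dense(hidden, params["w2"], params["b2"]))
    merged = [
        [[w_short * s + w_res * r for s, r in zip(s_row, r_row)]
         for s_row, r_row in zip(s_ch, r_ch)]
        for s_ch, r_ch in zip(shortcut, residual)
    ]
    return merged, (w_short, w_res)


def init_params(channels, hidden_units, seed=0):
    """Hypothetical random initialisation for the two fully connected layers."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": mat(hidden_units, 2 * channels), "b1": [0.0] * hidden_units,
        "w2": mat(2, hidden_units), "b2": [0.0] * 2,
    }
```

With all fully connected weights set to zero, the sketch falls back to merging both paths with equal weights, which corresponds to the fixed merge the abstract questions (up to a constant scale, because of the softmax assumption). With non-zero weights, the two merge coefficients change whenever the input feature maps change.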

1 Introduction

It has recently been noted that stacking the layers of a convolutional neural network more deeply leads to better accuracy in visual recognition. A key challenge in visual recognition has therefore been how to stack a larger number of layers efficiently, and several studies [4, 10, 17, 19, 22] have addressed it. ResNet [4], which adds shortcut connections to implement identity mapping, offers a simple and effective way to stack convolutional layers more deeply without the vanishing gradient problem. After ResNet [4], many studies [2, 7, 8, 30] have focused on developing efficient network architectures to achieve better accuracy in visual recognition. However, most of these methods have modified only the main architecture of the neural network based on the identity mapping using the shortcut connec