Image robust recognition based on feature-entropy-oriented differential fusion capsule network


Kui Qian¹ · Lei Tian¹ · Yiting Liu¹ · Xiulan Wen¹ · Jiatong Bao²

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

In addressing the black-box nature of neural networks, how to extract feature information from data and generalize the data's inherent features is a central focus of artificial intelligence research. To address the weak generalization of deep convolutional networks under large image transformations, a new method for robust image recognition based on a feature-entropy-oriented differential fusion capsule network (DFC) is proposed; its core is feature entropy approximation. First, convolution feature entropy is introduced as the transformation metric at the feature-extraction level, and a convolution difference scale space is constructed with a residual network to approximate similar entropy. Then, based on this scale feature, convolution features are extracted in a lower scale space and fused with the previous scale's features to form a convolution differential fusion feature. Finally, a capsule network clusters the resulting high-dimensional features autonomously via dynamic routing to complete their semantic learning, further enhancing recognition robustness. Experimental results show that feature entropy can effectively evaluate recognition performance on transformed images and that the DFC is effective for robust recognition under large image transformations such as translation, rotation, and scaling.

Keywords Robust recognition · Capsule network · Differential fusion · Feature-entropy-oriented
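As one possible illustration of the "convolution feature entropy" mentioned in the abstract, the sketch below computes the Shannon entropy of a feature map's activation histogram. The function name, the histogram-based estimator, and the bin count are assumptions made for illustration and may differ from the paper's exact definition.

```python
import numpy as np

def feature_entropy(feature_map, bins=32):
    """Shannon entropy (bits) of a feature map's activation distribution.

    A hypothetical reading of 'convolution feature entropy': flatten the
    map, histogram the activations, and compute the entropy of the
    resulting empirical distribution. Higher values indicate a richer,
    less degenerate feature response.
    """
    values = np.asarray(feature_map, dtype=np.float64).ravel()
    hist, _ = np.histogram(values, bins=bins)
    p = hist / max(hist.sum(), 1)  # normalize counts to probabilities
    p = p[p > 0]                   # 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())

# A constant map carries no information; a noisy map has higher entropy.
flat = np.zeros((8, 8))
noisy = np.random.default_rng(0).normal(size=(8, 8))
print(feature_entropy(flat), feature_entropy(noisy))
```

Under this reading, comparing the entropy of feature maps from an original image and its transformed version gives a scalar measure of how much the transformation disturbs the extracted features, which is the role the abstract assigns to feature entropy.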

Correspondence: Kui Qian ([email protected])

1 School of Automation, Nanjing Institute of Technology, Nanjing, China

2 School of Electrical, Energy and Power Engineering, Yangzhou University, Yangzhou, China

1 Introduction

Due to the popularity of deep convolutional neural networks (CNNs), computer vision technology has advanced rapidly and has achieved great success in image recognition [1–4], text processing [5, 6], voice processing [7, 8], and video analysis [9, 10]. Because of the combined effect of convolution and pooling, CNNs have long been considered to possess a degree of invariance to transformations such as image translation, scaling, and rotation [11]. Recently, however, several authors have shown that this is not the case: small translations or rescalings of the input image can drastically change the network's prediction [12–15]. To improve the generalization performance of CNNs, many studies have focused on data enhancement or

network improvement. Miko [16] proposed multiple data augmentation methods for the image classification task, so that any particular input image is seen by the network at different shifts and rescalings during training. Lenc and Vedaldi [17] estimated the linear relationships between representations of original and transformed images to ensure that the CNN would learn a discriminant that is invariant to resiz