Deep interactive encoding with capsule networks for image classification



Rita Pucci1 · Christian Micheloni1 · Gian Luca Foresti1 · Niki Martinel1

Received: 24 January 2020 / Revised: 2 July 2020 / Accepted: 28 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

With new architectures providing astonishing performance on many vision tasks, interest in Convolutional Neural Networks (CNNs) has grown exponentially in recent years. Such architectures, however, are not problem-free: among other issues, they require a huge amount of labeled data and are unable to encode pose and deformation information. Capsule Networks (CapsNets) have recently been proposed as a solution to these issues, and CapsNet achieved interesting results in image recognition by addressing the pose and deformation encoding challenges. Despite this success, CapsNets remain an under-investigated architecture compared with the more classical CNNs. Following the ideas of CapsNet, we propose the Residual Capsule Network (ResNetCaps) and the Dense Capsule Network (DenseNetCaps) to tackle the image recognition problem. With these two architectures, we expand the encoding phase of CapsNet by adding residual convolutional and densely connected convolutional blocks. In addition, we investigate feature interaction methods between capsules to promote their cooperation when dealing with complex data. Experiments on four benchmark datasets demonstrate that the proposed approach performs better than existing solutions.

Keywords Machine learning · Capsule network · Image classification · Bilinear function · Feature interaction
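To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch — not the authors' implementation. The `squash` non-linearity is the standard one from the original CapsNet formulation (it maps a capsule's output vector so its length lies in [0, 1) and can be read as an existence probability), and `bilinear_interaction` is a generic outer-product form of feature interaction between two capsule vectors; the function names and toy vectors are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # CapsNet squashing non-linearity: short vectors shrink toward zero,
    # long vectors approach unit length, preserving orientation.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def bilinear_interaction(x, y):
    # Generic bilinear (outer-product) interaction between two feature
    # vectors, flattened into one vector of all pairwise products.
    return np.outer(x, y).ravel()

# Toy usage (illustrative values, not from the paper):
u = squash(np.array([3.0, 4.0]))   # long input -> length just below 1
v = squash(np.array([0.1, 0.2]))   # short input -> length near 0
z = bilinear_interaction(u, v)     # 2x2 pairwise products, flattened to 4-dim
```

The squash keeps a capsule's direction (the encoded pose) intact while normalizing its magnitude, which is what lets capsule length act as a detection score; the bilinear term is one way to let two capsules' features "cooperate" by exposing every cross-product of their components.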

 Rita Pucci
[email protected]

1 Department of Math, Computer Science, and Physics, University of Udine, Udine, Italy

Multimedia Tools and Applications

1 Introduction

Human beings have astonishing visual recognition abilities. Transferring the same ability to a vision-based machine, in order to solve everyday tasks, would open the door to a new plethora of intelligent systems. Yet the many challenges of such a task have so far prevented the development of a machine matching human capabilities. To date, Deep Neural Networks (DNNs) [30, 32, 36, 48, 55] represent the most common approach to visual recognition challenges. The most famous DNN architecture for visual recognition is the Convolutional Neural Network (CNN) [29]. CNNs dominate visual recognition tasks, providing excellent results and improving the state of the art in many research fields, including medicine, ecology, and surveillance. For instance, in the medical field, CNNs made autonomous skin problem analysis possible [5, 11, 12, 15] and helped to diagnose organ diseases [1, 4, 20, 25]. It is also worth noting a growing interest in CNNs in biology, in particular bioinformatics: CNNs have been used for protein discovery [3, 23, 40] and in conservation ecology research projects [6, 9, 13, 37, 50, 54]. Automatic analysis