Convolutional Neural Networks for Pose Recognition in Binary Omni-directional Images




1 Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
{spirosgeorg,vpp}@dib.uth.gr, [email protected], [email protected]
2 Department of Digital Systems, University of Piraeus, Piraeus, Greece
[email protected]

Abstract. In this work, we present a methodology for pose classification of silhouettes using convolutional neural networks. The training set consists exclusively of synthetic images generated from three-dimensional (3D) human models, using the calibration of an omni-directional (fish-eye) camera. This allows us to generate the large volume of training data that Convolutional Neural Networks (CNNs) usually require. Testing is performed on synthetically generated silhouettes as well as on real silhouettes. This work is closely related to our previous work, which utilized Zernike image descriptors designed specifically for a calibrated fish-eye camera. Results show that the proposed method improves pose classification accuracy on synthetic images, but it is outperformed by our previously proposed Zernike descriptors on real silhouettes. The computational complexity of the proposed methodology is also examined, and the corresponding results are provided.

Keywords: Computer vision · Convolutional neural networks (CNNs) · Omnidirectional image · Fish-eye camera calibration · Pose classification · Synthetic silhouette
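The abstract rests on rendering synthetic binary silhouettes of 3D human models through a calibrated fish-eye camera. The sketch below illustrates that idea under loudly stated assumptions: it uses the generic equidistant fish-eye model r = f·θ rather than the calibration model used in this work, and a solid upright cylinder (the hypothetical helper `cylinder_points`) as a crude stand-in for a 3D human model. The names `project_equidistant` and `render_silhouette` are likewise illustrative, not the authors' code.

```python
import math


def project_equidistant(point, f, cx, cy):
    """Project a camera-frame 3D point with the equidistant fish-eye model r = f * theta.

    Assumption: z is the optical axis, pointing down from a ceiling-mounted camera.
    """
    x, y, z = point
    theta = math.atan2(math.hypot(x, y), z)  # angle between the viewing ray and the optical axis
    phi = math.atan2(y, x)                    # azimuth of the ray around the optical axis
    r = f * theta                             # equidistant mapping: image radius grows linearly with theta
    return cx + r * math.cos(phi), cy + r * math.sin(phi)


def cylinder_points(x0, y0, radius, height, cam_height, n_h=60, n_r=8, n_a=36):
    """Hypothetical stand-in for a 3D human model: a solid upright cylinder on the floor.

    (x0, y0) is the floor position relative to the point directly below the camera;
    the floor lies at z = cam_height in camera coordinates.
    """
    pts = []
    for i in range(n_h):
        h = height * i / (n_h - 1)            # height above the floor
        for j in range(n_r + 1):
            rr = radius * j / n_r             # radial distance from the cylinder axis
            for k in range(n_a):
                a = 2 * math.pi * k / n_a
                pts.append((x0 + rr * math.cos(a), y0 + rr * math.sin(a), cam_height - h))
    return pts


def render_silhouette(points, size=64, f=None):
    """Splat projected points into a size x size binary mask (1 = silhouette, 0 = background)."""
    cx = cy = (size - 1) / 2
    if f is None:
        f = size / math.pi                    # maps theta = pi/2 (180-degree field of view) to roughly the frame edge
    mask = [[0] * size for _ in range(size)]
    for p in points:
        u, v = project_equidistant(p, f, cx, cy)
        col, row = round(u), round(v)
        if 0 <= row < size and 0 <= col < size:  # ignore points falling outside the frame
            mask[row][col] = 1
    return mask


if __name__ == "__main__":
    # A hypothetical person standing 1 m to the side of the point below a 3 m high camera.
    mask = render_silhouette(cylinder_points(1.0, 0.0, 0.2, 1.7, cam_height=3.0))
    print(sum(map(sum, mask)), "silhouette pixels")
```

Sweeping the floor position (x0, y0) shows how one model can yield many differently distorted silhouettes, which is the property that makes a large synthetic CNN training set cheap to produce. Point splatting is the simplest possible rasterisation and can leave small holes; a practical pipeline would more likely render posed human meshes with the actual calibrated camera model.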

 

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing Switzerland 2016. All Rights Reserved. L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 106–116, 2016. DOI: 10.1007/978-3-319-44944-9_10

1 Introduction

Several computer vision and Artificial Intelligence applications require the classification of segmented objects in digital images and videos. A conventional approach to object recognition is to compute object descriptors and pass them to one of a variety of classifiers. Recently, many reports have shown that Convolutional Neural Networks (CNNs) can extract features automatically and achieve high classification accuracy in many generic object recognition tasks, without the need for user-defined features. This approach is often referred to as deep learning. More specifically, CNNs are state-of-the-art classification methods for several computer vision problems. They have been proposed for pattern recognition [2], object localization [3], object classification in large-scale databases of real-world images [4], and malignancy detection in medical images [5–7].

Several reports in the literature address the problem of human pose estimation, and they can be categorized into two approaches. The first approach leverages local image descriptors (HoG [8], SIFT [9], Zernike [1, 10, 11, 12]) to extract features and subsequently constructs a classification model. The second approach is based on model fitting [13, 14]. CNNs are trainable multistage architectures that belong to the first approach o