3D Object Recognition Based on Volumetric Representation Using Convolutional Neural Networks

Following the success of Convolutional Neural Networks on object recognition and image classification using 2D images; in this work the framework has been extended to process 3D data. However, many current systems require huge amount of computation cost f

  • PDF / 1,457,200 Bytes
  • 10 Pages / 439.37 x 666.142 pts Page_size
  • 85 Downloads / 250 Views

DOWNLOAD

REPORT


Movidius Ltd., 1st Floor, O’Connell Br House, D’Olier St, Dublin, Ireland {xiaofan.xu,alireza.dehghani,sam.caulfield,david.moloney}@movidius.com 2 Trinity College Dublin, College Green, Dublin, Ireland [email protected]

Abstract. Following the success of Convolutional Neural Networks on object recognition and image classification using 2D images; in this work the framework has been extended to process 3D data. However, many current systems require huge amount of computation cost for dealing with large amount of data. In this work, we introduce an efficient 3D volumetric representation for training and testing CNNs and we also build several datasets based on the volumetric representation of 3D digits, different rotations along the x, y and z axis are also taken into account. Unlike the normal volumetric representation, our datasets are much less memory usage. Finally, we introduce a model based on the combination of CNN models, the structure of the model is based on the classical LeNet. The accuracy result achieved is beyond the state of art and it can classify a 3D digit in around 9 ms. Keywords: 3D object recognition · Volumetric representation · 3D digit dataset · CNN

1

Introduction

Object Recognition (OR) is widely used in our daily life for the purposes of inspection, registration, and manipulation [1]. The well-known applications such as Google, Facebook and Baidu are probably the most famous websites which use OR on a large scale. Generally, object classification is performed using colour based segmentation methods or from grayscale images using classification methods such as HoG [2]/SVM [3] or other classifiers. However, nowadays deep learning is becoming ubiquitous. We are now able to solve some of the problems once considered impossible in fields such as computer vision, natural language processing, and robotics with recent advancements in deep learning algorithms. Machine learning techniques use data (images, signals, text) to train a model (or machine) to perform image classification or object detection. Although classical machine learning techniques are still being used to solve challenging image c Springer International Publishing Switzerland 2016  F.J. Perales and J. Kittler (Eds.): AMDO 2016, LNCS 9756, pp. 147–156, 2016. DOI: 10.1007/978-3-319-41778-3 15

148

X. Xu et al.

classification problems, they don’t work well when applied directly to images. This is because they ignore the structure and compositional nature of images. The state-of-art CNN technique which is a specific type of deep learning algorithm, addresses the gaps in traditional machine learning techniques. CNNs not only perform classification, but they can also learn to extract features directly from raw images which eliminates the need for manual feature extraction. The performance of deep learning which uses CNNs has rapidly grown to over 95 % accuracy (GoogLeNet [4], VGG [5], AlexNet [6] etc.) in recent years with the availability of large labelled datasets and powerful GPUs. These methods, while appealing from an accuracy standpoint, are