An Improved Image Classification Method Considering Rotation Based on Convolutional Neural Network


Abstract. The Convolutional Neural Network (CNN) is one of the most popular deep learning methods of recent years and has achieved great success in image classification. In this paper, an improved CNN-based image classification method that takes rotation into account is proposed. Essentially, convolution is only a method for smoothing the image, and it does not account for the effect of image rotation. It can be shown that some images a CNN fails to recognize are recognized well after being rotated by 180°. Rotation is therefore an effective way to improve object recognition. Four typical CNNs are adopted in this paper: CaffeNet, VGG16, VGG19 and GoogLeNet. The results show that the accuracy increases regardless of which of the four is used. The proposed method can recognize dangerous objects automatically with good performance.

Keywords: Convolutional Neural Network · Rotation · Object recognition

1 Introduction

The feedforward neural network is one of the most popular methods for object recognition. However, as the depth and width of the network increase, the number of parameters to train grows rapidly. Compared with standard feedforward neural networks, Convolutional Neural Networks (CNNs) have far fewer connections and parameters, so they are easier to train (LeCun et al. 1990, 2004; Lee et al. 2009; Pinto et al. 2009; Jarrett et al. 2009; Turaga et al. 2010). ConvNets have recently achieved great success in large-scale image and video recognition (Krizhevsky et al. 2012; Zeiler and Fergus 2014; Sermanet et al. 2014; Simonyan and Zisserman 2014), which has become possible due to large public image repositories and high-performance computing systems such as GPUs or large-scale distributed clusters (Dean et al. 2012). With CaffeNet becoming more of a commodity, a number of attempts have been made to improve the original architecture to achieve better accuracy. Simonyan and Zisserman presented a thorough evaluation of networks of increasing depth using an architecture with very small convolution filters, showing that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers; the resulting architecture is called VGG net (Simonyan and Zisserman 2015). Google Inc. proposed a 22-layer deep convolutional neural network architecture for computer vision called GoogLeNet. To optimize quality, its architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing (Szegedy et al. 2014).

J. Qu. © Springer International Publishing Switzerland 2016. Y. Wang et al. (Eds.): BigCom 2016, LNCS 9784, pp. 421–429, 2016. DOI: 10.1007/978-3-319-42553-5_36

Training the deeper and wider CNNs mentioned above requires larger datasets. Previous datasets of labeled images were relatively small, only tens of thousands of images, such as NORB (LeCun et al. 2004), Caltech-101/256 (Fei et al. 2007; Griffin et al. 2007) and CIFAR-10/100 (Krizhevsky 2009). Simple recognition tasks can be solved quite we