From On-Road to Off: Transfer Learning Within a Deep Convolutional Neural Network for Segmentation and Classification of




1 Institute for Infocomm Research, Singapore, Singapore
2 School of Engineering and Computer Sciences, Durham University, Durham, UK

Abstract. Real-time road-scene understanding is a challenging computer vision task, with recent advances in convolutional neural networks (CNNs) achieving results that notably surpass prior traditional feature-driven approaches. Here, we take an existing CNN architecture, pre-trained for urban road-scene understanding, and retrain it towards the task of classifying off-road scenes, assessing the network performance within the training cycle. Within the paradigm of transfer learning, we analyse the effects on CNN classification of varying levels of prior training and varying sub-sets of our off-road training data. For each of these configurations, we evaluate the network at multiple points during its training cycle, allowing us to analyse in depth exactly how the training process is affected by these variations. Finally, we compare this CNN to a more traditional approach using a feature-driven Support Vector Machine (SVM) classifier and demonstrate state-of-the-art results on this particularly challenging problem of off-road scene understanding.

1 Introduction

Scene understanding is a vital step in an autonomous vehicle processing pipeline, but it can be especially challenging in an off-road, unstructured environment. Knowledge about upcoming terrain and obstacles is necessary for deciding on the optimum path through such an environment, and can also be used to inform vehicle driving parameters to improve traction and efficiency and to maximise passenger comfort and safety.

Whole scene understanding is a well-discussed problem with applications in many domains [1, 2]. Recent contributions have used convolutional neural network (CNN) based approaches to achieve state-of-the-art results [3], while approaches combining hand-crafted features with linear classifiers have been somewhat side-lined [4]. Work in the domain of scene understanding for autonomous vehicles has followed this trend [5, 6]; however, there is very little work applying deep-learning techniques to the more challenging off-road environment. This paper aims to assess the applicability to such an environment of a state-of-the-art CNN architecture that was originally designed and trained to perform per-pixel classification on urban road scene images [6].

© Springer International Publishing Switzerland 2016 G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part I, LNCS 9913, pp. 149–162, 2016. DOI: 10.1007/978-3-319-46604-0_11


C.J. Holder et al.

Fig. 1. Architecture of the SegNet convolutional neural network [6]. The encoder network, consisting of convolution and pooling layers, is followed by a mirror-image decoder network, consisting of convolution and up-sampling layers.
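The mirror-image relationship described in the caption can be illustrated with a minimal sketch of one pooling step and its matching up-sampling step. This is an illustration under stated assumptions rather than the network's actual implementation: it assumes 2x2 max-pooling with a stride of 2, assumes the decoder restores resolution by placing each pooled value back at the position where the maximum was found, assumes even input dimensions, and omits the convolution layers entirely. The function names `max_pool_2x2` and `upsample_2x2` are hypothetical.

```python
# Illustrative sketch only: a 2x2 max-pool that records where each maximum
# came from, and an up-sampling step that uses those positions to restore
# the original resolution. Names are hypothetical, not from any SegNet code.

def max_pool_2x2(image):
    """Halve height and width; return pooled values and argmax positions."""
    rows, cols = len(image), len(image[0])  # assumed to be even
    pooled, indices = [], []
    for r in range(0, rows, 2):
        pooled_row, index_row = [], []
        for c in range(0, cols, 2):
            # Collect the four values in the window with their coordinates.
            window = [(image[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(2) for dc in range(2)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def upsample_2x2(pooled, indices):
    """Double height and width, placing each value at its recorded position."""
    rows, cols = 2 * len(pooled), 2 * len(pooled[0])
    output = [[0] * cols for _ in range(rows)]  # all other positions stay zero
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            output[r][c] = value
    return output


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_2x2(feature_map)   # encoder side: 4x4 -> 2x2
restored = upsample_2x2(pooled, indices)      # decoder side: 2x2 -> 4x4
```

In this sketch the encoder step halves the spatial resolution and the decoder step restores it, producing a sparse map that the decoder's convolution layers would then densify. Stacking several such pairs gives the symmetric encoder-decoder shape that the caption describes.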

Within this work we perform transfer learning, taking a CNN architecture that has already been trained on a large, often more generic data set and re-training it from this initialization to a more s