DAVE: A Unified Framework for Fast Vehicle Detection and Annotation

1 Northumbria University, Newcastle upon Tyne NE1 8ST, UK
{yi2.zhou,li2.liu,ling.shao}@northumbria.ac.uk
2 Createc, Cockermouth, Cumbria CA13 0HT, UK
[email protected]

Abstract. Vehicle detection and annotation for streaming video data with complex scenes is an interesting but challenging task for urban traffic surveillance. In this paper, we present a fast framework of Detection and Annotation for Vehicles (DAVE), which effectively combines vehicle detection and attributes annotation. DAVE consists of two convolutional neural networks (CNNs): a fast vehicle proposal network (FVPN) for extracting vehicle-like objects, and an attributes learning network (ALN) that verifies each proposal and simultaneously infers each vehicle’s pose, color and type. The two nets are jointly optimized so that abundant latent knowledge learned by the ALN can be exploited to guide FVPN training. Once the system is trained, it achieves efficient vehicle detection and annotation for real-world traffic surveillance data. We evaluate DAVE on a new self-collected UTS dataset and on the public PASCAL VOC2007 car and LISA 2010 datasets, with consistent improvements over existing algorithms.

Keywords: Vehicle detection · Attributes annotation · Latent knowledge guidance · Joint learning · Deep networks
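
The layer configurations of the two networks are not given in this excerpt, so the following is only a minimal PyTorch sketch of the two-branch design described in the abstract; the class names, layer sizes, attribute class counts and the simple summed loss are illustrative assumptions, not the published DAVE architecture.

import torch.nn as nn
import torch.nn.functional as F

class FVPN(nn.Module):
    # Fast vehicle proposal network (sketch): a small fully convolutional net
    # that scores each location of a downsampled map as vehicle-like or not.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.score = nn.Conv2d(64, 2, 1)  # vehicle vs. background logits
        self.bbox = nn.Conv2d(64, 4, 1)   # coarse bounding-box regression

    def forward(self, img):
        f = self.features(img)
        return self.score(f), self.bbox(f)

class ALN(nn.Module):
    # Attributes learning network (sketch): verifies a cropped proposal and
    # predicts pose, color and type; the class counts are placeholders.
    def __init__(self, n_pose=8, n_color=6, n_type=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.verify = nn.Linear(64, 2)
        self.pose = nn.Linear(64, n_pose)
        self.color = nn.Linear(64, n_color)
        self.vtype = nn.Linear(64, n_type)

    def forward(self, patch):
        z = self.backbone(patch)
        return self.verify(z), self.pose(z), self.color(z), self.vtype(z)

def joint_loss(fvpn_score, aln_out, proposal_map, attrs):
    # fvpn_score: (N, 2, H, W) logits; proposal_map: (N, H, W) 0/1 targets.
    # aln_out: (verify, pose, color, type) logits; attrs: dict of class targets.
    verify, pose, color, vtype = aln_out
    return (F.cross_entropy(fvpn_score, proposal_map)     # proposal branch
            + F.cross_entropy(verify, attrs["vehicle"])   # proposal verification
            + F.cross_entropy(pose, attrs["pose"])        # attribute heads whose
            + F.cross_entropy(color, attrs["color"])      # supervision acts as the
            + F.cross_entropy(vtype, attrs["type"]))      # "latent knowledge" guide

The shared objective above is only meant to mimic the coupling stated in the abstract: the FVPN proposes vehicle-like regions cheaply, the ALN verifies each proposal and predicts its attributes, and joint optimization lets the ALN's richer supervision guide the FVPN's training.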

1 Introduction and Related Work

Automatic analysis of urban traffic activities is an urgent need, driven by the demands of traffic management and the increase in vehicle violations. Among many traffic surveillance techniques, computer vision-based methods have attracted a great deal of attention and made great contributions to realistic applications such as vehicle counting, target vehicle retrieval and behavior analysis. In these research areas, efficient and accurate vehicle detection and attributes annotation is the most important component of traffic surveillance systems.

Vehicle detection is a fundamental objective of traffic surveillance. Traditional vehicle detection methods can be categorized into frame-based and motion-based approaches [1,2]. For motion-based approaches, frame subtraction [3], adaptive background modeling [4] and optical flow [5,6] are often utilized. However, motion-based approaches frequently detect non-vehicle moving objects as false positives, since they exploit little visual appearance information.
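
As a concrete illustration of the motion-based pipeline referenced above (model the background, then extract moving blobs), here is a minimal OpenCV sketch; the subtractor parameters, morphology kernel and minimum-area filter are arbitrary illustrative values, and "traffic.mp4" is a placeholder path. The area-only filter also exposes the weakness just noted: any moving blob, vehicle or not, is reported as a candidate.

# Sketch of motion-based candidate detection (OpenCV >= 4); all thresholds
# are illustrative, not tuned values from the paper.
import cv2

cap = cv2.VideoCapture("traffic.mp4")  # placeholder input video
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                        detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                                      # adaptive background model
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # remove speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 400:          # area filter only: pedestrians and
            x, y, w, h = cv2.boundingRect(c)  # cyclists also pass, i.e. the
            cv2.rectangle(frame, (x, y), (x + w, y + h),
                          (0, 255, 0), 2)     # false positives noted above
    cv2.imshow("moving-object candidates", frame)
    if cv2.waitKey(1) == 27:                  # Esc to stop
        break

cap.release()
cv2.destroyAllWindows()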

To achieve higher detection performance, the recently proposed deformable part-based model (DPM) [7] employs a star-structured architecture consisting of root and part filters with associated deformation models for object detection. DPM can successfully handle deformable object detection even when the target is partially occluded. However, it incurs heavy computational costs because a sliding-window procedure is used for appearance feature extraction and classification. With the wide success of deep networks on image classification