Video object detection algorithm based on dynamic combination of sparse feature propagation and dense feature aggregation

Danyang Cao 1,2 & Jinfeng Ma 1 & Zhixin Chen 1

Received: 5 November 2019 / Revised: 4 August 2020 / Accepted: 3 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Compared with object detection in static images, video object detection has greater research significance for realizing intelligent monitoring and automatic anomaly detection. However, applying the most advanced image recognition networks to video data can be challenging: a video contains a very large number of frames, which makes evaluation slow, and video frames also suffer from motion blur, low resolution, occlusion, and object deformation. In the present study, to mitigate these deficiencies, we applied sparse feature propagation to improve the detection speed and dense feature aggregation to refine the detection accuracy. Moreover, we used a key frame scheduling strategy that relies on the consistency of feature information. Together, these techniques steadily improved both the detection speed and the detection accuracy. To verify the applicability of the optimized video detection strategy proposed in this paper, we selected one part of the video data in the ImageNet VID training dataset and used the other part of this dataset to conduct the experiments, including the calculation and comparison of mean average precision (MAP) and frames per second (FPS).

Keywords Deep learning · Object detection · Sparse feature propagation · Intensive feature aggregation

Danyang Cao, Jinfeng Ma and Zhixin Chen contributed equally to this work.

* Danyang Cao [email protected]

1 School of Information Science and Technology, North China University of Technology, No 5, Jin Yuan Zhuang Road, Beijing 100144, China

2 Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing 100144, China

Multimedia Tools and Applications

1 Introduction

Video object detection has a wider range of applications than static image object detection, and it has greater research value for implementing intelligent monitoring and automatic anomaly detection. In recent years, research on video object detection based on deep convolutional neural networks (CNNs) has attracted increasing attention from scholars, particularly in connection with the ImageNet VID competition. Because video and image information have different characteristics, many studies have focused on developing and improving object detection methods specifically for video data. In general, object detection in images still provides the foundation for video object detection: the features of a video frame are extracted using a backbone network, and the detection network then outputs the object type and the location information. The structure of an image feature extraction network is usually consider