A review of object detection based on deep learning



Youzi Xiao1 · Zhiqiang Tian1 · Jiachen Yu1 · Yinshu Zhang1 · Shuai Liu1 · Shaoyi Du2 · Xuguang Lan2

Received: 25 April 2019 / Revised: 14 February 2020 / Accepted: 22 April 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract With the rapid development of deep learning techniques, deep convolutional neural networks (DCNNs) have become increasingly important for object detection. Compared with traditional handcrafted feature-based methods, deep learning-based object detection methods can learn both low-level and high-level image features, and the learned features are more representative than handcrafted ones. This review therefore focuses on object detection algorithms based on deep convolutional neural networks, while traditional object detection algorithms are introduced only briefly. Drawing on a review and analysis of deep learning-based object detection techniques from recent years, this work covers the following parts: backbone networks, loss functions and training strategies, classical object detection architectures, complex problems, datasets and evaluation metrics, applications, and future development directions. We hope this review will be helpful for researchers in the field of object detection.

Keywords Object detection · Deep learning · Deep convolutional neural networks · Computer vision

Zhiqiang Tian
[email protected]

1 School of Software Engineering, Xi’an Jiaotong University, Xi’an, China
2 Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China

Multimedia Tools and Applications

1 Introduction

The essence of object detection is to locate and classify objects: rectangular bounding boxes locate the detected objects, and each detected object is assigned a category. Object detection is related to object classification, semantic segmentation and instance segmentation; the differences are illustrated in Fig. 1. Object detection is an important area of computer vision and has important applications in scientific research and practical industrial production, such as face detection [215], text detection [94, 282], pedestrian detection [170, 274], logo detection [87, 108], video detection [102, 103], vehicle detection [23, 54], and medical image detection [145]. The details are shown in Fig. 2.

Fig. 1 a Object classification needs to identify the category of the objects in an image. b Object detection not only needs to identify the category of the objects, but also needs to locate them with rectangular bounding boxes. c Semantic segmentation only needs to predict the category of each pixel, and does not need to distinguish object instances. d Instance segmentation needs to predict both the category of each pixel and the object instance it belongs to

Limitations in computing resources, datasets, and basic theory have restricted the development and application of deep neural networks in recent de