Pyramid context learning for object detection


Pengxin Ding1,2 · Jianping Zhang2 · Huan Zhou1 · Xiang Zou2 · Minghui Wang1

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Contextual information in complex scenarios is critical for accurate object detection. Existing state-of-the-art detectors have greatly improved detection performance by using the context around objects. However, these detectors consider the local and global contexts separately, which limits the improvement in detection accuracy. In this paper, we propose a pyramid context learning module (PCL) for object detection, which makes full use of the feature context at different levels. Specifically, two operators, named aggregation and distribution, are designed to assemble and synthesize contextual information at different levels. In addition, a channel context learning operator is used to capture the channel context. PCL is a universal module, so it can be easily integrated into most detection frameworks. To evaluate PCL, we apply it to several popular detectors, e.g., SSD, Faster R-CNN and RetinaNet, and conduct extensive experiments on the PASCAL VOC and MS COCO datasets. Experimental results show that PCL yields competitive performance gains and significantly improves on the baselines.

Keywords  Object detection · Contextual learning · Aggregation operation · Distribution operation
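The three operators named in the abstract (aggregation, distribution and channel context learning) can be pictured with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify how the operators are built, so the nearest-neighbour resizing, the plain averaging, the parameter-free sigmoid gate and all function names below are placeholder assumptions chosen only to show the data flow across pyramid levels. Square feature maps are assumed for brevity.

```python
import numpy as np

def resize_nearest(x, size):
    """Nearest-neighbour resize of a (C, H, W) map to (C, size, size)."""
    c, h, w = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[:, rows[:, None], cols[None, :]]

def aggregate(levels, size):
    """Assumed aggregation: bring every level to one resolution and average."""
    return np.mean([resize_nearest(f, size) for f in levels], axis=0)

def channel_context(x):
    """Assumed channel context: gate each channel by its pooled response."""
    pooled = x.mean(axis=(1, 2))              # one value per channel, shape (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))      # sigmoid, no learned weights here
    return x * gate[:, None, None]

def distribute(levels, context):
    """Assumed distribution: resize the shared context to each level and add it."""
    return [f + resize_nearest(context, f.shape[1]) for f in levels]

# Toy three-level pyramid with 8 channels (random stand-in features).
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((8, s, s)) for s in (32, 16, 8)]

context = channel_context(aggregate(pyramid, size=16))
enhanced = distribute(pyramid, context)
shapes = [f.shape for f in enhanced]
```

Each enhanced level keeps its original shape, which is what would let a module of this kind be dropped into an existing detector without changing the layers that follow. A learned version would replace the averaging and the gate with trainable layers.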

* Minghui Wang [email protected]

1 College of Computer Science, Sichuan University, Chengdu, China

2 The Second Research Institute of CAAC, Chengdu, China

1 Introduction

Object detection is one of the fundamental research fields in computer vision. In general, object detection predicts the location of each object with a rectangular bounding box and assigns a class label to the content of that box. In recent years, deep neural networks have achieved success in object detection tasks [28, 30, 32, 35]. Detectors based on deep neural networks can be coarsely divided into two categories: one-stage and two-stage. Two-stage detectors, such as Faster R-CNN [35], Mask R-CNN [14] and Cascade R-CNN [2], have achieved impressive performance on the public PASCAL VOC [10] and MS COCO [29] datasets. Unlike two-stage detectors, which focus on detection accuracy, one-stage detectors [20, 34, 44] strike a better balance between detection accuracy and detection speed. Despite this success, most advanced detectors cannot deal well with complex scenarios, such as those containing small, occluded, variably sized or densely packed objects. To address these issues, much effort has been devoted to improving detection performance, including context augmentation [3, 6, 23, 39], training strategies [36, 41], structure optimization [5, 7, 19, 26, 46], multi-task learning [14] and attention mechanisms [13, 23]. Recently, many studies have attempted to exploit contextual information for object detection, which have ach