Pyramid context learning for object detection


Pengxin Ding1,2 · Jianping Zhang2 · Huan Zhou1 · Xiang Zou2 · Minghui Wang1

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Contextual information in complex scenarios is critical for accurate object detection. Existing state-of-the-art detectors have greatly improved detection performance by using the context around objects. However, these detectors consider the local and global contexts separately, which limits the improvement in detection accuracy. In this paper, we propose a pyramid context learning module (PCL) for object detection, which makes full use of the feature context at different levels. Specifically, two operators, named aggregation and distribution, are designed to assemble and synthesize contextual information at different levels. In addition, a channel context learning operator is used to capture the channel context. PCL is a universal module, so it can be easily integrated into most detection frameworks. To evaluate PCL, we apply it to several popular detectors, e.g., SSD, Faster R-CNN and RetinaNet, and conduct extensive experiments on the PASCAL VOC and MS COCO datasets. Experimental results show that PCL produces competitive performance gains and significantly improves on the baselines.

Keywords Object detection · Contextual learning · Aggregation operation · Distribution operation
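The abstract names three operators (aggregation, distribution and channel context learning) without giving their definitions in this excerpt. The following is only an illustrative sketch of what such a pipeline could look like, not the authors' implementation. It assumes nearest-neighbour resizing, element-wise averaging for aggregation, a parameter-free sigmoid gate on the per-channel mean for the channel context, and residual addition for distribution. All function names (`resize`, `aggregate`, `channel_gate`, `distribute`) are hypothetical.

```python
from math import exp

# Assumption: a pyramid level is a list of channels, and each channel is a
# list of rows (plain nested lists, no learned weights anywhere).

def resize(channel, h, w):
    """Nearest-neighbour resize of one channel to h x w."""
    src_h, src_w = len(channel), len(channel[0])
    return [[channel[i * src_h // h][j * src_w // w] for j in range(w)]
            for i in range(h)]

def aggregate(levels, h, w):
    """Hypothetical aggregation: resize every level to h x w and average."""
    resized = [[resize(ch, h, w) for ch in level] for level in levels]
    n, channels = len(levels), len(levels[0])
    return [[[sum(r[c][i][j] for r in resized) / n for j in range(w)]
             for i in range(h)]
            for c in range(channels)]

def channel_gate(context):
    """Hypothetical channel context: scale each channel by sigmoid(mean)."""
    gated = []
    for ch in context:
        mean = sum(map(sum, ch)) / (len(ch) * len(ch[0]))
        gate = 1.0 / (1.0 + exp(-mean))
        gated.append([[v * gate for v in row] for row in ch])
    return gated

def distribute(levels, context):
    """Hypothetical distribution: resize the shared context to each level's
    size and add it to that level's features."""
    out = []
    for level in levels:
        h, w = len(level[0]), len(level[0][0])
        out.append([[[a + b for a, b in zip(row, ctx_row)]
                     for row, ctx_row in zip(ch, resize(ctx, h, w))]
                    for ch, ctx in zip(level, context)])
    return out

# Usage: a two-level, single-channel pyramid (4x4 and 2x2).
pyramid = [[[[1.0] * 4 for _ in range(4)]],
           [[[3.0] * 2 for _ in range(2)]]]
shared = channel_gate(aggregate(pyramid, 2, 2))
enhanced = distribute(pyramid, shared)  # each level keeps its own resolution
```

In this sketch every level receives the same shared context, so information from coarse and fine levels is mixed before being written back. A learned version would replace the averaging and the fixed gate with trainable layers.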

* Minghui Wang [email protected]

1 College of Computer Science, Sichuan University, Chengdu, China

2 The Second Research Institute of CAAC, Chengdu, China

1 Introduction

Object detection is one of the fundamental research fields in computer vision. In general, object detection predicts the location of each object with a rectangular bounding box and assigns a class label to the content of that box. In recent years, deep neural networks have achieved success in the object detection task [28, 30, 32, 35]. Detectors based on deep neural networks can be coarsely divided into two categories: one-stage and two-stage. Two-stage detectors, such as Faster R-CNN [35], Mask R-CNN [14] and Cascade R-CNN [2], have achieved impressive performance on the public PASCAL VOC [10] and MS COCO [29] datasets. Unlike two-stage detectors, which focus all attention on detection accuracy, one-stage detectors [20, 34, 44] strike a better balance between detection accuracy and detection speed. Despite this success, most advanced detectors cannot deal with complex scenarios, such as those containing small, occluded, variably sized or densely packed objects. To address these issues, much effort has been made to improve detection performance, including context augmentation [3, 6, 23, 39], training strategy [36, 41], structure optimization [5, 7, 19, 26, 46], multi-task learning [14] and attention mechanisms [13, 23]. Recently, many studies have attempted to exploit contextual information for object detection, which have ach
