Gated Bi-directional CNN for Object Detection

1 The Chinese University of Hong Kong, Hong Kong, China
{xyzeng,wlouyang,xgwang}@ee.cuhk.edu.hk
2 Sensetime Group Limited, Sha Tin, Hong Kong
{yangbin,yanjunjie}@sensetime.com

Abstract. The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. How to effectively integrate local and contextual visual cues from these regions has become a fundamental problem in object detection. Most existing works simply concatenate features or scores obtained from support regions. In this paper, we propose a novel gated bi-directional CNN (GBD-Net) to pass messages between features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate the existence of each other by learning their nonlinear relationships, and their close interactions are modeled in a much more complex way. It is also shown that message passing is not always helpful, depending on individual samples. Gated functions are further introduced to control message transmission, and whether each gate is on or off is determined by extra visual evidence from the input sample. GBD-Net is implemented under the Fast RCNN detection framework. Its effectiveness is shown through experiments on three object detection datasets: ImageNet, Pascal VOC2007 and Microsoft COCO.
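The gated, two-direction message passing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 3x3 kernels, ReLU nonlinearity, elementwise sigmoid gates computed from the sending region's input features, and the final fusion by channel concatenation plus convolution are all assumptions made for illustration, as are the hypothetical names `conv2d_same`, `directional_pass` and `gbd_sketch`. For brevity, the sketch also shares one set of weights across all regions, which the paper does not necessarily do. It further assumes that the features of every support region have already been pooled to the same spatial size (as RoI pooling does in Fast RCNN) and are ordered from the smallest to the largest region.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Zero-padded ('same') 2D cross-correlation.

    x: (C_in, H, W) feature map, w: (C_out, C_in, k, k) kernel, b: (C_out,) bias.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(c, k=3, seed=0):
    """Hypothetical parameter set: one (weight, bias) pair per named convolution."""
    rng = np.random.default_rng(seed)

    def conv(c_out, c_in):
        return rng.normal(0.0, 0.1, (c_out, c_in, k, k)), np.zeros(c_out)

    return {"self": conv(c, c), "msg_f": conv(c, c), "msg_b": conv(c, c),
            "gate_f": conv(c, c), "gate_b": conv(c, c), "fuse": conv(c, 2 * c)}


def directional_pass(feats, p, msg_key, gate_key):
    """Pass messages along the list feats, from index 0 to index n-1."""
    out = []
    for i, x in enumerate(feats):
        h = relu(conv2d_same(x, *p["self"]))  # the region's own features
        if i > 0:
            # Gate in (0, 1), computed from the sending region's input features.
            gate = sigmoid(conv2d_same(feats[i - 1], *p[gate_key]))
            h = h + gate * relu(conv2d_same(out[i - 1], *p[msg_key]))
        out.append(h)
    return out


def gbd_sketch(feats, p):
    """feats: equally sized (C, H, W) maps, ordered from smallest to largest region."""
    fwd = directional_pass(feats, p, "msg_f", "gate_f")              # small -> large
    bwd = directional_pass(feats[::-1], p, "msg_b", "gate_b")[::-1]  # large -> small
    # Fuse both directions per region: concatenate channels, then convolve.
    return [relu(conv2d_same(np.concatenate([f, b]), *p["fuse"])) for f, b in zip(fwd, bwd)]


rng = np.random.default_rng(1)
feats = [rng.normal(size=(4, 7, 7)) for _ in range(3)]  # 3 regions, 4 channels, 7x7
params = init_params(c=4)
fused = gbd_sketch(feats, params)
print([f.shape for f in fused])  # [(4, 7, 7), (4, 7, 7), (4, 7, 7)]
```

Setting a gate's bias to a large negative value drives the gate toward zero, so the receiving region keeps only its own features. In this sketch, that is how message passing can be switched off for samples where it does not help.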

1 Introduction

Object detection is one of the fundamental vision problems. It provides basic information for the semantic understanding of images and videos and has attracted a lot of attention. Detection is regarded as a problem of classifying candidate boxes. Due to large variations in viewpoints, poses, occlusions, lighting conditions and background, object detection is challenging. Recently, convolutional neural networks (CNNs) have proven to be effective for object detection [1–4] because of their power in learning features. In object detection, a candidate box is counted as a true positive for an object category if the intersection-over-union (IOU) between the candidate box and the ground-truth box is greater than a threshold. When a candidate box covers only part of the ground-truth region, some potential problems arise.

– Visual cues in this candidate box may not be sufficient to distinguish object categories. Take the candidate boxes in Fig. 1(a) for example: they cover parts of bodies and have similar visual cues, but have different ground-truth class labels. It is hard to distinguish their class labels without information from larger surrounding regions of the candidate boxes.
– Classification of the candidate boxes depends on the occlusion status, which has to be inferred from larger surrounding regions. Because of occlusion, the cand

© Springer International Publishing AG 2016
B. Leibe et al. (Eds.): ECCV 2016, Part VII, LNCS 9911, pp. 354–369, 2016.
DOI: 10.1007/978-3-319-46478-7_22
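The IOU-based true-positive criterion, and the idea that a larger surrounding region contains more of a partially covered object, can be made concrete with a few lines of Python. The 0.5 threshold (the value commonly used for PASCAL VOC), the box coordinates, and the hypothetical `support_region` helper, which pads a box by a fraction of its width and height on each side, are assumptions for illustration only.

```python
def area(r):
    """Area of an (x1, y1, x2, y2) box; 0 for an empty box."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection(a, b):
    """Area of overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def iou(a, b):
    """Intersection-over-union of two boxes."""
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)


def coverage(region, gt):
    """Fraction of the ground-truth box's area that lies inside region."""
    return intersection(region, gt) / area(gt)


def support_region(box, pad):
    """Hypothetical helper: grow a box by pad * width (height) on each side."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - pad * w, y1 - pad * h, x2 + pad * w, y2 + pad * h)


IOU_THRESHOLD = 0.5  # assumption: PASCAL VOC-style threshold

gt = (0.0, 0.0, 100.0, 200.0)    # full object
cand = (10.0, 0.0, 90.0, 80.0)   # candidate covering only the upper part
ctx = support_region(cand, 0.5)  # larger surrounding region

print(iou(cand, gt))                          # 0.32
print(iou(cand, gt) > IOU_THRESHOLD)          # False
print(ctx)                                    # (-30.0, -40.0, 130.0, 120.0)
print(coverage(cand, gt), coverage(ctx, gt))  # 0.32 0.6
```

The candidate that covers only the upper part of the object falls below the threshold, while the enlarged region contains 60% of the ground-truth box's area, so it carries more of the evidence needed to classify the candidate. A real pipeline would also clip the enlarged region to the image boundaries; the sketch omits that.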
