Mask-guided SSD for small-object detection
Chang Sun¹ · Yibo Ai¹ · Sheng Wang² · Weidong Zhang¹
Accepted: 13 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Detecting small objects is challenging for the single-shot multibox detector (SSD) model because the features carry limited information and complex backgrounds interfere with detection. Here, we improved the performance of the SSD on small target objects by enhancing detection features with contextual information and introducing a segmentation mask to eliminate background regions. The proposed model is referred to as a “mask-guided SSD” (Mask-SSD) and includes two branches: a detection branch and a segmentation branch. We created a feature-fusion module that allows the detection branch to exploit contextual information for high-resolution feature maps, and built the segmentation branch primarily with atrous convolution to provide additional contextual information to the detection branch. The segmentation branch takes the output of the detection branch as its input, and its output segmentation features are fused with the detection features to classify and locate target objects. Additionally, the segmentation features are used to generate the mask, which guides the detection branch to search for objects in potential foreground regions. Evaluation of Mask-SSD on the Tsinghua-Tencent 100K and Caltech pedestrian datasets demonstrated its effectiveness at detecting small objects and performance comparable to that of other state-of-the-art methods.

Keywords Deep learning · Neural network · Object detection · Atrous convolution · Feature fusion
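Two operations named in the abstract, atrous convolution and mask guidance, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names `atrous_conv2d` and `apply_mask`, the hard threshold of 0.5, and the toy inputs are all assumptions made for illustration; the abstract does not state how the mask is applied to the detection features.

```python
from typing import List

Grid = List[List[float]]


def atrous_conv2d(image: Grid, kernel: Grid, rate: int) -> Grid:
    """Valid-mode 2D atrous (dilated) convolution.

    Kernel taps are sampled `rate` pixels apart, so a k x k kernel covers
    k + (k - 1) * (rate - 1) pixels per side without adding weights.
    As in most deep learning frameworks, the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    eff_h = kh + (kh - 1) * (rate - 1)  # effective kernel height
    eff_w = kw + (kw - 1) * (rate - 1)  # effective kernel width
    out_h = len(image) - eff_h + 1
    out_w = len(image[0]) - eff_w + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            out[y][x] = sum(
                kernel[i][j] * image[y + i * rate][x + j * rate]
                for i in range(kh)
                for j in range(kw)
            )
    return out


def apply_mask(features: Grid, foreground_prob: Grid, threshold: float = 0.5) -> Grid:
    """Assumed hard gating: zero out cells judged to be background."""
    return [
        [f if p >= threshold else 0.0 for f, p in zip(f_row, p_row)]
        for f_row, p_row in zip(features, foreground_prob)
    ]


# Toy 5 x 5 input whose value at (row, col) is 5 * row + col.
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
ones = [[1.0] * 3 for _ in range(3)]

dense = atrous_conv2d(image, ones, rate=1)   # 3 x 3 output, 3 x 3 receptive field
dilated = atrous_conv2d(image, ones, rate=2)  # 1 x 1 output, 5 x 5 receptive field

gated = apply_mask([[1.0, 2.0], [3.0, 4.0]], [[0.9, 0.1], [0.4, 0.5]])
```

With `rate=2` the same nine weights span the whole 5 x 5 input, which is the sense in which atrous convolution adds context without adding parameters. A trained model could equally use a soft mask (multiplying features by the foreground probability) instead of the hard threshold assumed here.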
Weidong Zhang [email protected]
Sheng Wang [email protected]
¹ National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing, China
² AI Lab, UCAR, 118 East Zhongguancun Road, Haidian Dist., Beijing, China

1 Introduction
With the development of convolutional neural networks (CNNs), significant improvements in object detection have been achieved in both research and application areas. Object detection is a fundamental task in computer vision and is widely used in disease diagnosis [1], intelligent security [2], and autonomous driving [3]. CNN-based object-detection models are usually divided into two groups: single-stage [4–6] and two-stage methods [7–9]. These methods have been trained and evaluated on several open-source datasets, including PASCAL VOC [10] and COCO [11]; however, objects in VOC and COCO are usually large. Under the definition of small objects used in COCO, images containing small objects make up only 51.82% of the dataset, and small objects account for 41.43% of all objects, while large objects account for 24.24% [12]. Most of the objects in the VOC dataset occupy more than 20% of the entire image [13]. Detection methods trained on large-object datasets might therefore not be suitable for small objects. To better evaluate the performance of methods at detecting small objects, the T