Using Deep Learning to Find Victims in Unknown Cluttered Urban Search and Rescue Environments
DEFENSE, MILITARY, AND SURVEILLANCE ROBOTICS (S FERRARI AND P ZHU, SECTION EDITORS)
Angus Fung 1 · Long Yu Wang 1 · Kaicheng Zhang 1 · Goldie Nejat 1 · Beno Benhabib 1
© Springer Nature Switzerland AG 2020
Abstract

Purpose of Review: We investigate the first use of deep networks for victim identification in urban search and rescue (USAR). Moreover, we provide the first experimental comparison of single-stage and two-stage networks for body part detection, for cases of partial occlusion and varying illumination, on an RGB-D dataset obtained by a mobile robot navigating cluttered USAR-like environments.

Recent Findings: We considered the single-stage detectors Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO), and RetinaNet, and the two-stage Feature Pyramid Network (FPN) detector. Experimental results show that RetinaNet has the highest mean average precision (77.66%) and recall (86.98%) for detecting victims with body part occlusions under different lighting conditions.

Summary: End-to-end deep networks can be used for finding victims in USAR by autonomously extracting RGB-D image features from sensory data. We show that RetinaNet using RGB-D input is robust to body part occlusions and low-lighting conditions, and that it outperforms the other detectors regardless of the image input type.

Keywords: Urban search and rescue · Victim identification · Body part occlusion · Low-lighting conditions · Deep learning
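To make the detector comparison concrete, the following is a minimal sketch of running an off-the-shelf single-stage detector (RetinaNet with a ResNet-50 FPN backbone, as provided by torchvision) on a single RGB frame. It is not the authors' trained victim-detection model: the COCO-pretrained weights, the 0.5 score threshold, and the dummy input frame are illustrative stand-ins, and the RGB-D fusion studied in the paper is not shown.

# Minimal sketch: off-the-shelf single-stage detection with torchvision's RetinaNet.
# NOT the authors' victim/body-part model; COCO weights and the threshold below are
# stand-ins for illustration only (requires torchvision >= 0.13 for weights="DEFAULT").
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

model = retinanet_resnet50_fpn(weights="DEFAULT")  # COCO-pretrained stand-in
model.eval()

# Dummy 480x640 RGB frame in [0, 1]; in practice this would be the robot's camera image.
frame = torch.rand(3, 480, 640)

with torch.no_grad():
    detections = model([frame])[0]  # dict with 'boxes', 'scores', 'labels'

keep = detections["scores"] > 0.5  # illustrative confidence threshold
for box, label, score in zip(detections["boxes"][keep],
                             detections["labels"][keep],
                             detections["scores"][keep]):
    print(f"class {label.item()} at {box.tolist()} (score {score:.2f})")

The output is a per-image set of boxes, class labels, and confidence scores, which is the form of detection output over which metrics such as mean average precision and recall are computed.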
This article is part of the Topical Collection on Defense, Military, and Surveillance Robotics.

* Corresponding author: Angus Fung

1 Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada

Introduction

Autonomous victim identification in urban search and rescue (USAR) scenes is challenging due to the occlusion of body parts in cluttered environments, variations in body poses and sensory viewpoints, and sensor noise [1]. The majority of classical learning approaches developed to detect human body parts in cluttered USAR environments have focused on first extracting a set of handcrafted features, such as human geometric and skin-region features [1] or histograms of oriented gradients (HOG) [2], and then training a supervised learning model (e.g., a support vector machine (SVM)) on these features. The manual design of features often requires empirical selection and validation [3], which can be time-consuming and demands expert knowledge. Furthermore, these approaches also rely on pre-defined rules to analyze groupings of human body parts; however, in USAR scenes, occlusions mean that multiple body parts of a person may not be visible at the same time for such groupings to occur. Deep networks have the potential to be used in USAR to autonomously extract features directly from sensory data. While they have been applied to human body […]
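As a concrete illustration of the classical pipeline described above, the sketch below pairs handcrafted HOG descriptors with a supervised SVM classifier. The window size, HOG parameters, and the randomly generated labelled crops are illustrative placeholders, not the settings used in the cited works.

# Minimal sketch of the classical pipeline: handcrafted HOG features + SVM classifier.
# Window size, HOG parameters, and the random labelled crops are placeholders.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(gray_crop):
    # 128x64 grayscale crop -> fixed-length HOG feature vector
    return hog(gray_crop, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Placeholder training data: random crops standing in for labelled
# body-part / background windows.
rng = np.random.default_rng(0)
crops = rng.random((40, 128, 64))        # 40 grayscale windows (H=128, W=64)
labels = rng.integers(0, 2, size=40)     # 1 = body part, 0 = background

X = np.stack([hog_descriptor(c) for c in crops])
clf = LinearSVC(C=1.0).fit(X, labels)    # train the SVM on handcrafted features

# Classify a new window
print(clf.predict(hog_descriptor(crops[0]).reshape(1, -1)))

In practice such a classifier is scanned over candidate windows of the scene, which is exactly where the fixed grouping rules and sensitivity to occluded body parts noted above become limiting.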