Multi-scale ResNet for real-time underwater object detection
ORIGINAL PAPER
Tien-Szu Pan¹ · Huang-Chu Huang² · Jen-Chun Lee² · Chung-Hsien Chen³
Received: 16 July 2020 / Revised: 2 November 2020 / Accepted: 6 November 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020
Abstract
An automatic underwater object recognition system is essential to reduce the costs of underwater inspection. In this study, we propose a novel convolutional neural network architecture that is trained on underwater video frames. This method is based on a modified residual neural network (ResNet) for underwater object detection. Multi-scale ResNet (M-ResNet), the modified method, improves efficiency by utilizing multi-scale operations for the accurate detection of objects of various sizes, especially small objects. The experimental results show that the proposed method yields an accuracy of 96.5% (mAP) in recognition performance. As a consequence, we propose a novel system for automatic object detection as an application for marine environments.
Keywords Marine object recognition · Deep learning · Convolutional neural network · Residual neural network
Corresponding author: Huang-Chu Huang [email protected]
1 Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
2 Department of Telecommunication Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
3 Metal Industries Research & Development Centre (MIRDC), Kaohsiung, Taiwan
1 Introduction
Object recognition is now a popular research topic in the computer vision field. The traditional process of object recognition includes preprocessing, feature extraction, and classification. Before 2012, most work on object detection was based on such traditional algorithms. With the increasing amount of data and computing resources, traditional object detection methods have been replaced by artificial intelligence (AI) [1, 2]. Various convolutional neural network (CNN) architectures, such as region-based convolutional neural networks (R-CNN) [3], single shot multibox detectors (SSD) [4], you only look once (YOLO) [5], and deep residual networks (ResNet) [6], have been applied broadly to computer vision because of their success in object recognition tasks. In this paper, we focus on a fast and efficient neural network architecture from the YOLO and ResNet families. The newer YOLOv4 [7] architecture features residual shortcut connections and upsampling. Upsampled layers are concatenated with previous layers, which helps preserve fine-grained features that aid in detecting small objects (objects occupying less than 1% of the total image area). Moreover, ResNet [6] makes the training process faster and is more accurate than equivalent neural networks; it achieves this improvement by adding simple shortcut connections that allow a signal to bypass a layer and move to the next layer in the sequence. However, even though ResNet methods offer better detection performance than others, they still do not detect small objects, especially in v
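The two mechanisms mentioned in the introduction, shortcut connections and the concatenation of upsampled layers with earlier layers, can be illustrated with a minimal pure-Python sketch. This is not the M-ResNet or YOLOv4 implementation: plain lists stand in for tensors, and every name here (`residual_block`, `upsample_nearest_2x`, `concat_channels`, `is_small_object`) is hypothetical and chosen only for illustration. Nearest-neighbour upsampling is likewise an assumption; the 1% area threshold is the small-object definition given in the text.

```python
from typing import Callable, List

Vector = List[float]
FeatureMap = List[List[float]]  # a single channel of size H x W


def residual_block(x: Vector, layer: Callable[[Vector], Vector]) -> Vector:
    """Return layer(x) + x. The shortcut lets the input bypass the layer."""
    fx = layer(x)
    if len(fx) != len(x):
        raise ValueError("the shortcut needs matching input and output sizes")
    return [f + xi for f, xi in zip(fx, x)]


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Double height and width by repeating each value in a 2 x 2 patch."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat along the width
        out.append(wide)
        out.append(list(wide))  # repeat along the height
    return out


def concat_channels(a: List[FeatureMap], b: List[FeatureMap]) -> List[FeatureMap]:
    """Stack two non-empty channel lists that share the same spatial size."""
    if not a or not b:
        raise ValueError("both inputs need at least one channel")
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes must match before concatenation")
    return a + b


def is_small_object(box_w: float, box_h: float,
                    img_w: float, img_h: float,
                    threshold: float = 0.01) -> bool:
    """True if the box covers less than `threshold` of the image area."""
    return (box_w * box_h) / (img_w * img_h) < threshold


if __name__ == "__main__":
    # Shortcut connection: the toy layer halves its input, the shortcut adds x back.
    print(residual_block([1.0, 2.0], lambda v: [0.5 * t for t in v]))

    # A coarse 1 x 1 map is upsampled to 2 x 2 and stacked with a finer 2 x 2 map.
    coarse = [[9.0]]
    fine = [[5.0, 6.0], [7.0, 8.0]]
    merged = concat_channels([upsample_nearest_2x(coarse)], [fine])
    print(len(merged), "channels of size", len(merged[0]), "x", len(merged[0][0]))

    # A 20 x 20 box in a 640 x 480 frame covers about 0.13% of the image.
    print(is_small_object(20, 20, 640, 480))
```

In a real detector the same ideas would be expressed with tensor operations in a deep learning framework. The sketch only shows why the shapes must agree: the shortcut adds element-wise, while concatenation stacks channels of equal spatial size, so the coarse map has to be upsampled first.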