Enhanced 3D residual network for video event recognition in shipping monitoring

Hong Zhang 1,2 & Jiexiong Rong 1,2

Received: 18 November 2019 / Revised: 3 July 2020 / Accepted: 6 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Three-dimensional convolutional neural networks are widely used in video recognition, action recognition and related tasks because they extract temporal and spatial features directly. Because of their large parameter counts, high computational cost, and training difficulty, such networks are generally shallow. For example, the traditional C3D [17] method uses only an 11-layer VGGNet structure, and the traditional Res3D [18] method adopts residual networks of 18 and 34 layers. Experience with two-dimensional convolutional neural networks suggests that deeper networks achieve higher recognition accuracy. This paper therefore proposes a new method, 3D ResNet-66, which combines a 50-layer 3D residual network with four-layer residual blocks, reducing the number of parameters while increasing the depth of the network, and which yields a better video recognition model in our experiments. We evaluate the method on shipping event datasets. Compared with the traditional C3D and Res3D methods, it improves accuracy from 91.48% to 96.33%, reduces the model size from 561 MB to 135 MB, and halves the average processing time.

Keywords Three-dimensional convolutional neural network · Residual network · Residual blocks · Video recognition
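The claim that a network can become deeper while using fewer parameters can be illustrated with a rough weight count. The sketch below compares a basic residual block (two 3x3x3 convolutions, the style used in 18- and 34-layer residual networks) with a bottleneck block (1x1x1, 3x3x3, 1x1x1, the style used in 50-layer residual networks). The channel width of 256, the reduction factor of 4, and all function names are illustrative assumptions, not values taken from this paper; batch normalization and shortcut projections are ignored.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), bias=False):
    """Number of learnable weights in one 3D convolution layer."""
    kt, kh, kw = kernel
    weights = c_in * c_out * kt * kh * kw
    return weights + (c_out if bias else 0)


def basic_block_params(channels):
    """Two stacked 3x3x3 convolutions at constant width."""
    return 2 * conv3d_params(channels, channels)


def bottleneck_block_params(channels, reduction=4):
    """1x1x1 reduce -> 3x3x3 at reduced width -> 1x1x1 expand."""
    mid = channels // reduction
    return (
        conv3d_params(channels, mid, kernel=(1, 1, 1))
        + conv3d_params(mid, mid)
        + conv3d_params(mid, channels, kernel=(1, 1, 1))
    )


if __name__ == "__main__":
    channels = 256  # hypothetical width, chosen only for illustration
    basic = basic_block_params(channels)
    bottleneck = bottleneck_block_params(channels)
    print(f"basic block (2 conv layers):      {basic:,} weights")
    print(f"bottleneck block (3 conv layers): {bottleneck:,} weights")
    print(f"ratio: {basic / bottleneck:.1f}x")
```

Under these assumptions the bottleneck block has one more convolution layer than the basic block yet far fewer weights, because the expensive 3x3x3 convolution runs at a quarter of the channel width.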

* Hong Zhang [email protected]

1 College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan 430081, China

2 Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China

Multimedia Tools and Applications

1 Introduction

Ship video monitoring tracks the state of a ship. It can judge whether the ship is under way or berthed, and it can also provide early warning of abnormal states, such as loading and unloading while berthed, or a rain cloth that is blowing loose or left uncovered while under way. With the rapid development of artificial intelligence, researchers have tried to apply it to ship video monitoring to achieve intelligent monitoring of shipping. With deep learning, a network can be trained on a large training dataset to recognize the ship state, which both reduces labor cost and makes early warning of abnormal states timelier.

Two-dimensional convolutional neural networks (2DCNN) achieved the early breakthroughs in image recognition. From LeNet and AlexNet to VGGNet [12], GoogLeNet [15, 7, 16, 14] and today's ResNet [4, 5, 10], DenseNet [6], etc., new networks appear every few years, th