Integrating Gaussian mixture model and dilated residual network for action recognition in videos
Ming Fang1 · Xiaoying Bai1 · Jianwei Zhao1 · Fengqin Yang1 · Chih‑Cheng Hung2 · Shuhua Liu1
Received: 4 May 2020 / Accepted: 10 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Action recognition in video is one of the important applications of computer vision. In recent years, the two-stream architecture has made significant progress in action recognition, but it does not systematically explore spatial–temporal features. Therefore, this paper proposes an approach that integrates a Gaussian mixture model (GMM) with a dilated-convolution residual network (GD-RN) for action recognition. The method uses ResNet-101 as both the spatial- and the temporal-stream ConvNet. For the spatial stream, the action video is first passed through the GMM for background subtraction, and the resulting video, in which the action silhouette is marked, is then fed to ResNet-101 for classification. Compared with a baseline ConvNet that takes the original RGB images as input, this reduces both the complexity of the video background and the computation required to learn spatial information. For the temporal stream, stacked optical flow images are fed to a ResNet-101 augmented with dilated convolutions, which enlarges the convolutional receptive field without lowering the resolution of the optical flow images and thereby improves classification accuracy. The two ConvNets of GD-RN learn spatial and temporal features independently; the resulting spatio-temporal features are then fine-tuned and fused to produce the final action recognition result. The proposed method is evaluated on the challenging UCF101 and HMDB51 datasets, where it achieves accuracies of 91.3% and 62.4%, respectively, demonstrating that it delivers competitive results.

Keywords Dilated convolution · GMM · Residual network · Video action recognition
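To make the two ideas in the abstract concrete, the sketch below illustrates (a) GMM-based background subtraction producing silhouette-marked frames for the spatial stream and (b) a dilated 3x3 convolution that enlarges the receptive field while preserving feature-map resolution. This is a minimal illustration, not the authors' implementation: OpenCV's MOG2 subtractor is assumed as a stand-in for the paper's GMM, the function name extract_foreground_frames and all parameter values (history, var_threshold, channel counts, dilation rate) are hypothetical, and the full ResNet-101 backbone is omitted.

# Illustrative sketch only; parameters and helper names are assumptions, not the paper's settings.
import cv2
import torch
import torch.nn as nn

def extract_foreground_frames(video_path, history=500, var_threshold=16):
    """Apply a Gaussian-mixture background subtractor (OpenCV MOG2) to a video
    and yield frames in which the background is suppressed."""
    gmm = cv2.createBackgroundSubtractorMOG2(history=history,
                                             varThreshold=var_threshold,
                                             detectShadows=False)
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = gmm.apply(frame)                         # per-pixel foreground mask
        yield cv2.bitwise_and(frame, frame, mask=mask)  # keep the actor, zero the background
    cap.release()

# A 3x3 convolution with dilation rate 2 covers a 5x5 receptive field at stride 1,
# so the resolution of the optical-flow feature maps is preserved.
dilated_conv = nn.Conv2d(in_channels=256, out_channels=256,
                         kernel_size=3, padding=2, dilation=2, stride=1)
flow_features = torch.randn(1, 256, 28, 28)             # placeholder optical-flow feature map
assert dilated_conv(flow_features).shape[-2:] == flow_features.shape[-2:]

In a two-stream setup of this kind, the silhouette-marked frames would feed the spatial ConvNet while stacked optical flow, processed through dilated residual blocks, would feed the temporal ConvNet before the two predictions are fused.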
Communicated by H. Lin.
* Shuhua Liu [email protected]
Ming Fang [email protected]
Xiaoying Bai [email protected]
Jianwei Zhao [email protected]
Fengqin Yang [email protected]
Chih‑Cheng Hung [email protected]
1 School of Information Science and Technology, Northeast Normal University, Changchun, China
2 College of Computing and Software Engineering, Kennesaw State University, Marietta, USA

1 Introduction
Video action recognition is a challenging computer vision task. With the rapid development of deep learning, video action recognition has received great attention [1–6]. In recent years, deep neural networks have achieved remarkable results in image classification and object detection [7–10], and several researchers have begun to apply deep neural networks to video action recognition. Compared with static images, video additionally contains temporal features: the relationship between consecutive frames provides important temporal information. Many actions can be effectively identified on the