Hierarchical Dynamic Parsing and Encoding for Action Recognition




1 Science and Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
[email protected]
2 Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA
{jzt011,yingwu}@eecs.northwestern.edu
3 State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
[email protected]
4 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
[email protected]

Abstract. A video action generally exhibits quite complex rhythms and non-stationary dynamics. To model such non-uniform dynamics, this paper describes a novel hierarchical dynamic encoding method that captures both the locally smooth dynamics and the globally drastic dynamic changes, providing a multi-layer joint representation for temporal modeling in action recognition. At the first layer, the action sequence is parsed in an unsupervised manner into several smoothly changing stages corresponding to different key poses or temporal structures. The dynamics within each stage are encoded by mean-pooling or by learning-to-rank-based encoding. At the second layer, the temporal information of the ordered dynamics extracted from the previous layer is encoded again to form the overall representation. Extensive experiments on a gesture dataset (ChaLearn) and several generic action datasets (Olympic Sports and Hollywood2) demonstrate the effectiveness of the proposed method.

Keywords: Action recognition · Hierarchical modeling · Dynamic encoding

1 Introduction
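As a concrete illustration of the two-layer scheme summarized in the abstract, the sketch below parses a feature sequence into stages, mean-pools each stage (layer one), and then encodes the temporal order of the stage descriptors with a learning-to-rank-style pooling (layer two). This is illustrative only: the change-point heuristic, the least-squares rank pooling, and all function names are assumptions, not the authors' implementation.

```python
import numpy as np

def parse_into_stages(frames, n_stages=3):
    """Unsupervised temporal parsing (illustrative heuristic): cut the
    sequence at the (n_stages - 1) largest frame-to-frame feature jumps."""
    diffs = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    cuts = np.sort(np.argsort(diffs)[-(n_stages - 1):] + 1)
    return np.split(frames, cuts)

def rank_pool(vectors):
    """Learning-to-rank style encoding: fit w so that w . v_t grows with t
    (least squares); w itself serves as the sequence representation."""
    t = np.arange(1, len(vectors) + 1, dtype=float)
    w, *_ = np.linalg.lstsq(np.asarray(vectors), t, rcond=None)
    return w

def hierarchical_encode(frames, n_stages=3):
    # Layer 1: parse into smooth stages and mean-pool within each stage.
    stages = parse_into_stages(frames, n_stages)
    stage_descs = [s.mean(axis=0) for s in stages]
    # Layer 2: encode the temporal order of the ordered stage descriptors.
    return rank_pool(stage_descs)

rng = np.random.default_rng(0)
frames = rng.normal(size=(30, 8))   # 30 frames of 8-D features
rep = hierarchical_encode(frames, n_stages=3)
print(rep.shape)                    # -> (8,)
```

The final representation has the same dimensionality as a frame feature, so it can be fed directly to a standard classifier; within-stage mean-pooling could be swapped for rank pooling as the abstract suggests.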

The performance of action recognition methods depends heavily on the representation of the video data. For this reason, many recent efforts focus on developing action representations at different levels. The state-of-the-art action representation is based on the Bag-of-Visual-Words (BoW) [1] framework, which comprises three steps: local descriptor extraction, codebook learning, and descriptor encoding. The raw local descriptors themselves are noisy, and the discriminative power of the distributed BoW representation comes from the efficient coding of these local descriptors. As a result, the temporal dependencies and dynamics of the video are seriously neglected.

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part IV, LNCS 9908, pp. 202–217, 2016. DOI: 10.1007/978-3-319-46493-0_13

Fig. 1. The action “jump” can be roughly parsed into three divisions: the running approach, flying in the air, and touching down. Each division can in turn be parsed into different sub-divisions.

Dynamics characterize the inherent global temporal dependencies of actions. Existing dynamics-based approaches generally view the video as a sequence of observations and model it with temporal models. The models can either