Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challe


1 University of Queensland, Brisbane, Australia
2 Queensland University of Technology, Brisbane, Australia
3 NICTA, Brisbane, Australia
[email protected]
4 Data61, CSIRO, Canberra, Australia

Abstract. We present a comparative evaluation of various techniques for action recognition while keeping as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against traditional action recognition techniques based on Gaussian mixture models and Fisher vectors (FVs). We evaluate these action recognition techniques under ideal conditions, as well as their sensitivity in more challenging conditions (variations in scale and translation). Despite recent advances in handling manifolds, manifold-based techniques obtain the lowest performance, and their kernel representations are more unstable in the presence of challenging conditions. The FV approach obtains the highest accuracy under ideal conditions. Moreover, FV copes best with moderate scale and translation changes.

1 Introduction

Recently, there has been increasing interest in action recognition using Riemannian manifolds. Such recognition systems can be roughly placed into two main categories: (i) those based on linear subspaces (LS), and (ii) those based on symmetric positive definite (SPD) matrices. The space of m-dimensional LS in $\mathbb{R}^n$ can be viewed as a special case of Riemannian manifolds, known as Grassmann manifolds [1]. Other techniques have also been applied to the action recognition problem, among them Gaussian mixture models (GMMs), bag-of-features (BoF), and Fisher vectors (FVs). In [2,3] each action is represented by a combination of GMMs, and the decision is made by selecting the most probable action according to Bayes’ theorem [4]. The FV representation can be thought of as an evolution of the BoF representation, encoding additional information [5]. Rather than encoding the frequency of the descriptors for a given video, FV encodes the deviations from a probabilistic version of the visual dictionary (which is typically a GMM) [6].

© Springer International Publishing Switzerland 2016. H. Cao et al. (Eds.): PAKDD 2016 Workshops, LNAI 9794, pp. 88–100, 2016. DOI: 10.1007/978-3-319-42996-0_8

Several review papers have compared various techniques for human action recognition [7–11]. The reviews show how this research area has progressed throughout the years, discuss the current advantages and limitations of the state-of-the-art, and provide potential directions for addressing the limitations. However, none of them focuses on how well various action recognition systems work on the same datasets with the same extracted features. An earlier comparison of classifiers for human activity recognition was presented in [12]. The performance comparison w