DAPs: Deep Action Proposals for Action Understanding
1 King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia {victor.escorcia,fabian.caba,bernard.ghanem}@kaust.edu.sa
2 Stanford University, Stanford, USA [email protected]
3 Universidad del Norte, Barranquilla, Colombia
Abstract. Object proposals have contributed significantly to recent advances in object understanding in images. Inspired by the success of this approach, we introduce Deep Action Proposals (DAPs), an effective and efficient algorithm for generating temporal action proposals from long videos. We show how to take advantage of the vast capacity of deep learning models and memory cells to retrieve, from untrimmed videos, temporal segments that are likely to contain actions. A comprehensive evaluation indicates that our approach outperforms previous work on a large-scale action benchmark, runs at 134 FPS, making it practical for large-scale scenarios, and exhibits an appealing ability to generalize, i.e., to retrieve good-quality temporal proposals of actions unseen in training.
Keywords: Action proposals · Action detection · Long-short term memory
1 Introduction
Nowadays, the ubiquity of digital cameras and social networks has increased the amount of visual media content (especially videos) generated and shared by people. In the face of this data deluge, it becomes crucial to develop efficient and scalable algorithms that can intelligently parse/browse visual data to discover semantic information. In this paper, we focus on the task of quickly localizing temporal chunks in untrimmed videos that are likely to contain human activities of interest. This is the well-known task of temporal action proposal generation. The detected temporal proposals can facilitate and speed up activity detection, indexing, and retrieval in long videos. For example, a “good” action proposal method can retrieve video snippets of a home run being scored within a large corpus of baseball games or extract important moments during the construction of a new skyscraper. Motivated by the large-scale nature of the problem, we develop a temporal proposal algorithm that retrieves high-fidelity proposals at a much smaller computational cost than previous methods (refer to Fig. 1).
Fig. 1. An effective and efficient action proposal algorithm can localize segments of varied duration around actions occurring along a video without exhaustively exploring multiple temporal scales. This work shows how to produce high-quality temporal proposals that are likely to contain actions, and to do so 10x faster than the state-of-the-art approach.
Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46487-9_47) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part III, LNCS 9907, pp. 768–784, 2016. DOI: 10.1007/978-3-319-46487-9_47
The idea of extracting regions with semantic content is not new in the computer vision community. Object pro