Boundary discrimination and proposal evaluation for temporal action proposal generation


Tianyu Li¹ · Bing Bing¹ · Xinxiao Wu¹

Received: 7 May 2020 / Revised: 15 August 2020 / Accepted: 24 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Temporal action proposal generation for temporal action localization aims to capture temporal intervals in untrimmed videos that are likely to contain actions. Prevailing bottom-up proposal generation methods locate action boundaries (the start and the end) at positions with high boundary-classification probabilities. For many actions, however, the motions at the boundaries are not discriminative, so segments inside actions and background segments are misclassified as boundaries, which produces proposals with low overlap with the ground truth. In this work, we propose a novel method that generates proposals by evaluating the continuity of video frames and locating the start and the end at positions of low continuity. Our method consists of two modules: boundary discrimination and proposal evaluation. The boundary discrimination module trains a model to understand the relationship between two frames and uses the continuity of frames to generate proposals. The proposal evaluation module removes background proposals with a classification network and evaluates the integrity of proposals from probability features with an integrity network. Extensive experiments on two challenging datasets, THUMOS14 and ActivityNet 1.3, demonstrate that our method outperforms state-of-the-art proposal generation methods.

Keywords Temporal action proposal generation · Temporal action localization · Action proposal evaluation
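The abstract's core idea, placing boundaries where adjacent frames are not continuous rather than where a boundary classifier fires, can be illustrated with a toy sketch. It assumes the frame-to-frame continuity scores are already given (in the paper they come from the learned boundary discrimination model, which is not reproduced here). The function names `cut_points` and `generate_proposals`, the 0.5 threshold, the `max_len` cap, and the exhaustive pairing of candidate boundaries are hypothetical choices for illustration, not the authors' procedure. The `temporal_iou` helper uses the standard temporal intersection-over-union to show what a "low-overlap" proposal means.

```python
from itertools import combinations
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end) in frame units, end exclusive


def cut_points(continuity: List[float], threshold: float) -> List[int]:
    """Candidate boundaries: positions between frames whose continuity is low.

    Assumption: continuity[i] in [0, 1] scores how continuous frame i is with
    frame i + 1, so a video of T frames has T - 1 scores.
    """
    num_frames = len(continuity) + 1
    points = {0, num_frames}  # the video start and end are always candidates
    for i, score in enumerate(continuity):
        if score < threshold:
            points.add(i + 1)  # cut between frame i and frame i + 1
    return sorted(points)


def generate_proposals(continuity: List[float], threshold: float = 0.5,
                       max_len: int = 10) -> List[Segment]:
    """Pair every earlier candidate (start) with every later one (end).

    max_len is a hypothetical cap on proposal length, in frames.
    """
    points = cut_points(continuity, threshold)
    # combinations() on a sorted list always yields s < e
    return [(s, e) for s, e in combinations(points, 2) if e - s <= max_len]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # 12 frames; continuity drops between frames 2/3 and frames 7/8.
    scores = [0.9, 0.9, 0.1, 0.9, 0.8, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9]
    proposals = generate_proposals(scores, threshold=0.5, max_len=10)
    print(proposals)  # [(0, 3), (0, 8), (3, 8), (3, 12), (8, 12)]
    ground_truth = (3, 8)
    print([round(temporal_iou(p, ground_truth), 3) for p in proposals])
    # [0.0, 0.625, 1.0, 0.556, 0.0]
```

In this toy run only the proposal bounded by both low-continuity cuts matches the action exactly; the others overlap it partially or not at all, which is why a separate evaluation step that scores and filters proposals is still needed.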

¹ Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology (BIT), Beijing, 100081, People's Republic of China

Multimedia Tools and Applications

1 Introduction
Nowadays, with the rapid development of the Internet, the number of videos is increasing explosively, leading to wide applications of automatic video analysis in fields such as live broadcasting and security. As an important task in automatic video analysis, temporal action detection [2, 20, 32, 34, 40] aims to detect action instances in long untrimmed videos, i.e., to predict both the temporal boundaries and the category of each action. Akin to object detection [10, 11, 26, 28, 37], temporal action detection can be divided into two stages: temporal action proposal generation and proposal classification. Proposal generation produces candidate temporal regions that may contain action instances, and it has a greater impact on the detection results than proposal classification does. Temporal action proposal generation [16, 30, 35] has therefore attracted growing attention, and the key issue is how to generate high-quality proposals that improve detection performance. Existing methods for action proposal generation can be roughly categorized into two main types: Top-down me