Complementary Boundary Estimation Network for Temporal Action Proposal Generation
Jinding Wang¹ · Haifeng Hu¹
Accepted: 5 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Temporal action detection is an important yet challenging task, and temporal action proposal generation plays a key part in it. Because the temporal boundaries of action instances in videos are often ambiguous, they are difficult to locate precisely. Boundary Sensitive Network (BSN) (Lin et al. in ECCV, 2018) is a state-of-the-art corner-based method that generates high-quality proposals with a high recall rate. It contains a temporal evaluation network and a proposal evaluation network that generate and evaluate proposals separately. The former locates the temporal boundaries of action instances directly, producing proposals with flexible temporal intervals, and the latter evaluates the quality of those proposals. However, BSN still has two issues: (1) because the temporal evaluation network has a small receptive field, it often generates many false temporal boundaries; (2) evaluating the quality of proposals is a difficult task that is not well solved in that paper. To address these issues, we propose the Complementary Boundary Estimation Network (CBEN), an improved approach to temporal action proposal generation built on the BSN framework. Specifically, we improve BSN in two aspects. First, the temporal evaluation network of BSN can only capture local information and tends to respond strongly at background segments, so we combine it with a new network that has a larger receptive field to better identify false temporal action boundaries. Second, to evaluate the quality of temporal action proposals more accurately, we propose a class-based proposal evaluation network and combine it with a tIoU-based proposal evaluation network to filter out low-quality proposals. Extensive experiments on the THUMOS14 and ActivityNet-1.3 datasets indicate that CBEN achieves better performance than current mainstream methods on temporal action proposal generation. We further combine CBEN with an off-the-shelf action classifier and show consistent performance improvements on the THUMOS14 dataset.
Keywords Temporal action proposal generation · Temporal boundary evaluation · Proposal evaluation · Network fusion
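The proposal pipeline summarized in the abstract can be sketched in Python as a minimal, hypothetical illustration.

In the spirit of BSN, per-snippet start and end probabilities are first turned into candidate boundaries. A snippet qualifies if it is a local peak or lies above a fraction of the maximum. The candidates are then paired into proposals. A tIoU-based evaluator regresses each proposal's best tIoU with the ground truth.

The two CBEN ideas appear only as placeholders:
- `fuse_boundary_probs` averages a small-receptive-field branch with a large-receptive-field branch.
- `fuse_scores` combines a tIoU-based score with a class-based score.

All function names, both fusion rules, the 0.9 ratio and the 0.3 threshold are assumptions chosen for illustration. They are not the networks or hyperparameters used in the paper.

```python
# Hypothetical sketch of a BSN-style proposal pipeline extended with the two
# complementary ideas from the abstract. Function names, fusion rules and
# default values are illustrative assumptions, not CBEN's implementation.
from itertools import product


def receptive_field(kernel_sizes, dilations=None):
    """Receptive field (in snippets) of stacked stride-1 1D convolutions."""
    dilations = dilations or [1] * len(kernel_sizes)
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


def temporal_iou(a, b):
    """Temporal IoU of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def tiou_target(proposal, ground_truths):
    """Regression target for a tIoU-based evaluator: best overlap with any GT."""
    return max((temporal_iou(proposal, gt) for gt in ground_truths), default=0.0)


def fuse_boundary_probs(local_probs, context_probs, w=0.5):
    """Assumed fusion rule: per-snippet weighted average of the boundary
    probabilities from a small- and a large-receptive-field branch."""
    return [w * p + (1 - w) * q for p, q in zip(local_probs, context_probs)]


def candidate_locations(probs, ratio=0.9):
    """Keep snippet t if it is a local peak or exceeds ratio * max(probs)."""
    top = max(probs)
    keep = []
    for t, p in enumerate(probs):
        left = probs[t - 1] if t > 0 else float("-inf")
        right = probs[t + 1] if t + 1 < len(probs) else float("-inf")
        if p > ratio * top or (p > left and p > right):
            keep.append(t)
    return keep


def generate_proposals(start_probs, end_probs):
    """Pair every candidate start with every later candidate end."""
    starts = candidate_locations(start_probs)
    ends = candidate_locations(end_probs)
    return [(s, e) for s, e in product(starts, ends) if s < e]


def fuse_scores(tiou_score, class_score, alpha=0.5):
    """Assumed combination of the two evaluators: weighted geometric mean."""
    return (tiou_score ** alpha) * (class_score ** (1 - alpha))


def filter_proposals(scored, min_score=0.3):
    """Drop (segment, score) pairs below min_score; return the rest best-first."""
    kept = [(seg, sc) for seg, sc in scored if sc >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For instance, `receptive_field([3, 3, 3])` evaluates to 7 snippets. The same three layers with dilations `[1, 2, 4]` cover 15 snippets. This is one (assumed) way a complementary branch could gain the wider temporal context the abstract calls for.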
Haifeng Hu [email protected]
Jinding Wang [email protected]
¹ School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510275, China
1 Introduction
In recent years, with the continuous development of electronic devices and the Internet, the amount of video data has been growing explosively. Compared with images, videos are a more natural and semantically richer information medium. Most current research on video understanding focuses on action recognition, which aims to classify manually trimmed video clips according to their action categories. Although action recognition has achieved considerable success, real-world videos are usually untrimmed and the actions of interest are typicall