Dilated temporal relational adversarial network for generic video summarization
Yujia Zhang1,2 · Michael Kampffmeyer3 · Xiaodan Liang4 · Dingwen Zhang5 · Min Tan1,2 · Eric P. Xing4

Received: 6 November 2018 / Revised: 7 June 2019 / Accepted: 2 September 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
The large number of videos popping up every day makes it increasingly critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames that still conveys the whole story of a given video, is thus of great significance for improving the efficiency of video understanding. We propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it selects the set of key frames that contains the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with a three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced to enhance the capture of temporal representations. The generator uses this unit to effectively exploit global multi-scale temporal context when selecting key frames, complementing the commonly used Bi-LSTM. To ensure that summaries capture enough key video representation from a global perspective, rather than being a trivial randomly shortened sequence, we present a discriminator that learns to enforce both the information completeness and the compactness of summaries via a three-player loss. The loss comprises the generated-summary loss, the random-summary loss, and the real-summary (ground-truth) loss, which together better regularize the learned model so that it produces useful summaries. Comprehensive experiments on three public datasets show the effectiveness of the proposed approach.

Keywords Video summarization · Dilated temporal relation · Generative adversarial network · Three-player loss

Work done while Yujia Zhang was at CMU.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11042-019-08175-y) contains supplementary material, which is available to authorized users.

Yujia Zhang
[email protected]
Extended author information available on the last page of the article.
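For readers who want a concrete picture of the two components described in the abstract, the following is a minimal, hypothetical PyTorch sketch: a DTR-style unit that aggregates frame features through parallel dilated temporal convolutions, and a three-player discriminator loss over ground-truth, generated, and random summaries. The dilation rates, layer shapes, and module names here are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DTRUnit(nn.Module):
    """Illustrative DTR-style unit: multi-scale temporal context via
    parallel dilated 1-D convolutions over frame-level features."""

    def __init__(self, feat_dim, dilations=(1, 2, 4)):  # rates are assumptions
        super().__init__()
        # One temporal convolution per dilation rate; with kernel size 3,
        # padding = dilation keeps the sequence length T unchanged.
        self.branches = nn.ModuleList([
            nn.Conv1d(feat_dim, feat_dim, kernel_size=3,
                      dilation=d, padding=d)
            for d in dilations
        ])
        # 1x1 convolution fuses the concatenated multi-scale responses.
        self.fuse = nn.Conv1d(feat_dim * len(dilations), feat_dim,
                              kernel_size=1)

    def forward(self, x):
        # x: (batch, T, feat_dim) frame-level features.
        x = x.transpose(1, 2)                      # -> (batch, feat_dim, T)
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(F.relu(multi_scale)).transpose(1, 2)


def three_player_d_loss(d_real, d_generated, d_random):
    # Sketch of the three-player idea: the discriminator should accept
    # ground-truth summaries while rejecting both generated summaries and
    # trivially shortened random-frame summaries. Inputs are scores in (0, 1).
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy(d_generated,
                                     torch.zeros_like(d_generated))
            + F.binary_cross_entropy(d_random, torch.zeros_like(d_random)))


# Example usage with made-up dimensions (2 videos, 300 frames, 1024-d features):
dtr = DTRUnit(feat_dim=1024)
feats = torch.randn(2, 300, 1024)
context = dtr(feats)   # same shape, now carrying multi-scale temporal context
```

In the full model the generator combines such a unit with a Bi-LSTM, and it is trained adversarially to fool the discriminator that applies this three-player loss.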
1 Introduction

Driven by the large number of videos that are being produced every day, video summarization [30, 38, 50] plays an important role in extracting and analyzing the key contents within videos. Owing to this promise, video summarization techniques have recently gained increasing attention as a way to facilitate large-scale video distilling [27, 32, 33, 47]. They aim to generate summaries by selecting a small set of key frames/shots from the video while still conveying the whole story, and can thus improve the efficiency of key-information extraction and understanding. Essentially, video summarization techniques need to address two