A bottom-up summarization algorithm for videos in the wild


EURASIP Journal on Advances in Signal Processing

RESEARCH

Open Access

Gang Pan1, Yaoxian Zheng1, Rufei Zhang2, Zhenjun Han3, Di Sun4,1 and Xingming Qu1,5*

Abstract
Video summarization aims to provide a compact video representation while preserving the essential activities of the original video. Most existing video summarization approaches rely on identifying important frames and optimizing a target energy with a globally optimal solution. However, a global optimum may fail to express continuous actions or to realistically reflect how human beings perceive a story. In this paper, we present a bottom-up approach to video summarization, named clip growing, which allows users to customize the quality of the video summaries. The proposed approach first uses clustering to over-segment video frames into video clips based on their similarity and temporal proximity. Simultaneously, the importance of frames and clips is evaluated from their corresponding dissimilarity and representativeness. Then, video clips and frames are gradually selected according to their energy rank until the target length is reached. Experimental results on the SumMe dataset show that our algorithm produces promising results compared to existing algorithms. Several video summarization results are presented in the supplementary material.

Keywords: Video summarization, Clip growing, Bottom-up
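The three stages named in the abstract (over-segmentation into clips, frame and clip scoring, energy-ranked selection up to a target length) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the adjacent-merge clustering, the scoring formula (mean distance to temporal neighbours for dissimilarity, mean exp(-distance) to all frames for representativeness, mixed by a weight `alpha`), the `threshold` stopping rule, and the function names `oversegment`, `frame_scores`, and `clip_growing` are all hypothetical stand-ins. Frame features are assumed to be precomputed numeric vectors.

```python
import math
from typing import List, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vec(vs: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(vs) for col in zip(*vs)]


def oversegment(feats: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Hypothetical over-segmentation: repeatedly merge the pair of temporally
    adjacent clips with the closest mean features, stopping once the closest
    pair is at least `threshold` apart. Only neighbours are merged, so every
    clip stays a contiguous run of frames (proximity); the distance test
    groups visually similar frames (similarity)."""
    clips = [[i] for i in range(len(feats))]
    while len(clips) > 1:
        centers = [mean_vec([feats[i] for i in c]) for c in clips]
        gaps = [dist(centers[k], centers[k + 1]) for k in range(len(clips) - 1)]
        k = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[k] >= threshold:
            break
        clips[k:k + 2] = [clips[k] + clips[k + 1]]  # merge clip k with clip k+1
    return clips


def frame_scores(feats: List[Sequence[float]], alpha: float = 0.5) -> List[float]:
    """Hypothetical importance score mixing two cues:
    dissimilarity = mean distance to the temporal neighbours (novelty);
    representativeness = mean similarity exp(-d) to every frame in the video."""
    n = len(feats)
    scores = []
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        dissim = sum(dist(feats[i], feats[j]) for j in nbrs) / len(nbrs) if nbrs else 0.0
        rep = sum(math.exp(-dist(feats[i], feats[j])) for j in range(n)) / n
        scores.append(alpha * dissim + (1 - alpha) * rep)
    return scores


def clip_growing(feats: List[Sequence[float]], target_len: int,
                 threshold: float = 1.0, alpha: float = 0.5) -> List[int]:
    """Greedy bottom-up selection: take whole clips in descending order of
    mean frame score; when the next clip no longer fits, fill the remaining
    budget with that clip's highest-scoring frames."""
    scores = frame_scores(feats, alpha)
    clips = oversegment(feats, threshold)
    ranked = sorted(clips, key=lambda c: sum(scores[i] for i in c) / len(c), reverse=True)
    selected: List[int] = []
    for clip in ranked:
        budget = target_len - len(selected)
        if budget <= 0:
            break
        if len(clip) <= budget:
            selected.extend(clip)
        else:
            best = sorted(clip, key=lambda i: scores[i], reverse=True)[:budget]
            selected.extend(best)
    return sorted(selected)  # frame indices in temporal order


if __name__ == "__main__":
    # Toy 2-D "features": two visually distinct shots of three frames each.
    feats = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
    print(oversegment(feats, threshold=1.0))  # → [[0, 1, 2], [3, 4, 5]]
    print(clip_growing(feats, target_len=4))
```

In this sketch, clips are grown from contiguous runs of frames and taken whole whenever they fit the budget, so the summary tends to keep continuous actions intact, which is the motivation the abstract gives for a bottom-up rather than a globally optimal selection.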

*Correspondence: [email protected]
1 College of Intelligence and Computing, Tianjin University, Yaguan Road, Tianjin, China
5 Bobby B. Lyle School of Engineering, Southern Methodist University, Boaz Lane, Dallas 75205, USA
Full list of author information is available at the end of the article

1 Introduction
Videos in the wild are abundant in personal collections as well as on the web, and the demand for processing them has been increasing rapidly. A number of related works have been proposed over the past decade [1–3]. Such videos typically have cluttered backgrounds and abundant human actions. Moreover, most of them remain unedited and contain a large quantity of redundant information. Therefore, video processing tasks such as video summarization are needed: a summary not only presents audiences with a compact version that captures the most informative parts of the video but also benefits companies engaged in video processing and search. According to [4], there are two fundamental types of video summarization: unsupervised methods [5–16] and supervised methods [17–25]. However, these tasks are usually treated as independent. Through experiments, we found that they are actually related. The main idea shared by these tasks is to first measure the significance of video frames and then select appropriate frames according to the different needs of users. Previous summarization methods seek a global optimum over the input frames under certain criteria, but this idealized formulation seldom leads to satisfactory results. One possible reason is that people watch and understand videos from a local perspective rather than from a global