Show me where the action is! Automatic capturing and timeline generation for reality TV

Timothy Callemein1 · Tom Roussel1 · Ali Diba1 · Floris De Feyter1 · Wim Boes1 · Luc Van Eycken1 · Luc Van Gool1 · Hugo Van hamme1 · Tinne Tuytelaars1 · Toon Goedemé1

Received: 9 September 2019 / Revised: 26 June 2020 / Accepted: 12 August 2020 / © The Author(s) 2020

Abstract

Reality TV shows have gained popularity, motivating many production houses to bring out new variants for us to watch. Compared to traditional TV shows, reality TV shows consist of spontaneous, unscripted footage. Computer vision techniques could partially replace the manual labour needed to record and process this spontaneity. However, automated real-world video recording and editing is a challenging topic. In this paper, we propose a system that utilises state-of-the-art video and audio processing algorithms to, on the one hand, automatically steer cameras, replacing camera operators, and on the other hand, detect all audiovisual action cues in the recorded video, to ease the job of the film editor. This publication hence has two main contributions. The first is automating the steering of multiple Pan-Tilt-Zoom (PTZ) cameras to take aesthetically pleasing medium shots of all the people present. These shots need to comply with cinematographic rules and are based on the poses acquired by a pose detector. Secondly, once a huge amount of audiovisual data has been collected, it becomes labour intensive for a human editor to retrieve the relevant fragments. As a second contribution, we combine state-of-the-art audio and video processing techniques for sound activity detection, action recognition, face recognition, and pose detection to decrease the required manual labour during and after recording. Applied during postprocessing, these techniques produce metadata that allows for footage filtering, decreasing the search space. We extended our system further by producing timelines that unite the generated metadata, giving the editor a quick overview. We evaluated our system on three in-the-wild reality TV recording sessions of 24 hours (× 8 cameras) each, taken in real households.

Keywords Autonomous PTZ steering · Event timeline · Sound recognition · Facial recognition · Action recognition · Reality TV

This work was made possible by the Belgian production house Geronimo, the KULeuven GOA project CAMETRON and the Research Foundation Flanders (FWO-Vlaanderen).

Timothy Callemein
[email protected]

1 Katholieke Universiteit Leuven, Sint-Katelijne-Waver, Belgium

Multimedia Tools and Applications

1 Introduction

These days, collecting audiovisual data is easier and cheaper than ever before. Cameras and microphones have become ubiquitous and the storage for the data they collect has expanded. This allows for devices that continuously record their surroundings. To generate entertaining reality TV shows from these continuous recordings, however, one needs to be able to separate the wheat from the chaff. Recent progress in machine learning offers solutions for this info
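To make the metadata-driven filtering described above concrete, the following is a minimal illustrative sketch (not the paper's actual implementation; the `Event` structure, labels, and merge threshold are assumptions for demonstration). Detectors emit timestamped events per camera; nearby events are merged into timeline segments so an editor sees a few highlighted spans instead of thousands of fragments:

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: float   # seconds from session start
    end: float
    label: str     # e.g. "speech", "face", "action" (hypothetical labels)
    camera: int

def filter_events(events, labels):
    """Keep only events whose label is in the requested set,
    shrinking the editor's search space."""
    return [e for e in events if e.label in labels]

def merge_into_timeline(events, gap=2.0):
    """Merge events (across cameras) that lie within `gap` seconds
    of each other into single timeline segments."""
    events = sorted(events, key=lambda e: e.start)
    segments = []
    for e in events:
        if segments and e.start - segments[-1][1] <= gap:
            segments[-1][1] = max(segments[-1][1], e.end)
        else:
            segments.append([e.start, e.end])
    return [(s, t) for s, t in segments]

events = [
    Event(10.0, 14.0, "speech", camera=1),
    Event(13.0, 20.0, "action", camera=2),
    Event(120.0, 125.0, "speech", camera=1),
]
print(merge_into_timeline(events))  # [(10.0, 20.0), (120.0, 125.0)]
```

The overlapping speech and action cues around t = 10–20 s collapse into one segment, while the isolated later event stays separate; the editor then inspects two spans rather than three raw detections.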