Coherent video generation for multiple hand-held cameras with dynamic foreground



Fang-Lue Zhang1, Connelly Barnes2, Hao-Tian Zhang3, Junhong Zhao1, and Gabriel Salas1. © The Author(s) 2020.

Abstract

For many social events such as public performances, multiple hand-held cameras may capture the same event. This footage is often collected by amateur cinematographers who typically have little control over the scene and may not pay close attention to the camera. For these reasons, each individually captured video may fail to cover the whole time of the event, or may lose track of interesting foreground content such as a performer. We introduce a new algorithm that can synthesize a single smooth video sequence of moving foreground objects captured by multiple hand-held cameras. This allows later viewers to gain a cohesive narrative experience that can transition between different cameras, even though the input footage may be less than ideal. We first introduce a graph-based method for selecting a good transition route. This allows us to automatically select good cut points for the hand-held videos, so that smooth transitions can be created between the resulting video shots. We also propose a method to synthesize a smooth photorealistic transition video between each pair of hand-held cameras, which preserves dynamic foreground content during this transition. Our experiments demonstrate that our method outperforms previous state-of-the-art methods, which struggle to preserve dynamic foreground content.
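The graph-based transition-route selection mentioned above can be viewed as a shortest-path problem: candidate cut points form graph nodes, and edge weights penalize visually jarring transitions. The sketch below illustrates this framing only; the node layout, edge costs, and the choice of Dijkstra's algorithm are illustrative assumptions, not the paper's actual formulation.

```python
import heapq

# Hypothetical cut-point graph: nodes are (camera_id, frame) pairs.
# Edge weights approximate the visual discontinuity of cutting between
# two shots; zero-cost edges mean "keep filming on the same camera".
# All values below are made up for illustration.
edges = {
    ("A", 0):   [(("A", 90), 0.0)],
    ("A", 90):  [(("B", 95), 2.5), (("A", 180), 0.0)],
    ("A", 180): [(("B", 200), 4.0)],
    ("B", 95):  [(("B", 200), 0.0)],
    ("B", 200): [],
}

def min_cost_route(start, goal):
    """Dijkstra over the cut-point graph; returns (total_cost, route)."""
    pq = [(0.0, start, [start])]   # (cost so far, node, path taken)
    seen = set()
    while pq:
        cost, node, route = heapq.heappop(pq)
        if node == goal:
            return cost, route
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in edges.get(node, []):
            if nxt not in seen:
                heapq.heappush(pq, (cost + w, nxt, route + [nxt]))
    return float("inf"), []

cost, route = min_cost_route(("A", 0), ("B", 200))
# Cuts from camera A to camera B at the cheaper of the two candidate
# transitions (cost 2.5 rather than 4.0).
```

Once such a route is found, each non-zero-cost edge marks a point where a synthesized transition video (as described in the abstract) would bridge the two cameras.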

Introduction

With modern camera technology, people can easily capture high-quality videos with hand-held cameras and smartphones. Capturing daily events and sharing them on social networks, or storing them privately, has become an important part of everyday life for many people. However, the photographers making such casual captures often have little control over the scene. Additionally, the photographer is often an amateur who may lack photographic skill, attention, or time, and thus may fail to capture the desired object or the full time range of the event. A viewer watching the video from any single camera may therefore find that the foreground object of interest simply moves out of the camera's view, or that the camera stops capturing entirely at an inopportune moment. These issues can be quite frustrating for the viewer. However, in many cases, such as public events or performances, multiple cameras may capture the same event, so if a foreground object of interest moves out of the frame of one camera, it may still be captured by another camera.

In this paper, we investigate the problem of producing a single coherent video with smooth transitions, taking such a casually captured multi-camera feed as input. Specifically, we assume that the input consists of different videos of the same dynamic foreground objects, captured over different but overlapping ranges of time. Researchers in computer vision and computer graphics have proposed powerful video processing an