Progressive Multi-granularity Analysis for Video Prediction
Jingwei Xu1 · Bingbing Ni1 · Xiaokang Yang1,2
Received: 6 June 2019 / Accepted: 26 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Communicated by Ivan Laptev.
Bingbing Ni (corresponding author), [email protected]; Jingwei Xu, [email protected]
1 Shanghai Jiao Tong University, Shanghai 200240, China
2 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Abstract
Video prediction is challenging because real-world motion dynamics are usually multi-modally distributed. Existing stochastic methods commonly draw random noise input from a simple prior distribution, which is insufficient to model highly complex motion dynamics. This work proposes a progressive multiple-granularity analysis framework to tackle this difficulty. First, to achieve coarse alignment, the input sequence is matched to prototype motion dynamics in the training set, based on self-supervised auto-encoder learning with motion/appearance disentanglement. Second, motion dynamics are transferred from the matched prototype sequence to the input sequence via an adaptively learned kernel, and the predicted frames are further refined by a motion-aware prediction model. Extensive qualitative and quantitative experiments on three widely used video prediction datasets demonstrate that (1) the proposed framework decomposes a hard task into a series of more approachable sub-tasks, for which better solutions are easier to find, and (2) the proposed method performs favorably against state-of-the-art prediction methods.
Keywords Video prediction · Multiple granularity analysis
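To make the first, coarse-alignment stage concrete, the following is a minimal sketch (not the authors' implementation) of matching an input sequence to a training prototype by nearest-neighbor search over motion embeddings; the 128-dimensional embedding size, the cosine-similarity criterion, and the function names are illustrative assumptions.

import numpy as np

# Sketch of coarse alignment: retrieve the training prototype whose
# motion embedding is most similar to that of the input sequence.
# Embedding size and similarity measure are illustrative assumptions.
def match_prototype(query_emb, prototype_embs):
    q = query_emb / np.linalg.norm(query_emb)
    p = prototype_embs / np.linalg.norm(prototype_embs, axis=1, keepdims=True)
    return int(np.argmax(p @ q))  # index of the best-matching prototype

rng = np.random.default_rng(0)
prototypes = rng.standard_normal((1000, 128))  # 1000 stored motion embeddings
query = rng.standard_normal(128)               # motion embedding of the input
print(match_prototype(query, prototypes))

In the paper's pipeline, the embeddings would come from the motion branch of the disentangled auto-encoder, so that matching is driven by dynamics rather than appearance.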
1 Introduction
As a naturally data-driven way to model the dynamics of a sophisticated system, video prediction has demonstrated tremendous potential value in many downstream applications (Pathak et al. 2017; Kurutach et al. 2018; Nair et al. 2018), such as model-based reinforcement learning, driving path planning, and robot manipulation. Given several consecutive frames as input, the goal of video prediction is to generate the raw pixels of future frames. Unlike conventional video semantic prediction, whose output is a relatively low-dimensional label vector, this task requires pixel-level prediction, usually over multiple timestamps. Pixel-wise prediction makes the solution space grow exponentially with the spatial and temporal size of the predicted frames. Conventional methods with recurrent, deterministic architectures (Finn et al. 2016; Jia et al. 2016; Denton and Birodkar 2017) often fail to predict high-quality video frames; shape deformation and prediction mismatch are typical issues yet to be solved. All these problems relate closely to one fundamental challenge of this task, namely error accumulation (Pathak et al. 2017; Kurutach et al. 2018; Nair et al. 2018). The main reason for this challenge is that long-term prediction leads to a highly complex and multi-modal distribution of future frames, but a deterministic architecture can capture only a single mode of that distribution.
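As a back-of-the-envelope illustration of this growth (with assumed toy sizes, not values from the paper): predicting T frames of H × W RGB pixels means regressing T·H·W·C values, and the number of distinct 8-bit pixel outputs is 256^(T·H·W·C).

import math

# Toy illustration (assumed sizes) of how the prediction space explodes
# with the spatial and temporal extent of the predicted frames.
T, H, W, C = 10, 64, 64, 3       # prediction horizon and frame size (assumed)
dims = T * H * W * C             # values to regress per predicted sequence
print(dims)                      # 122880
print(f"~10^{int(dims * math.log10(256))} distinct 8-bit pixel outputs")

Even at this modest resolution the output has over a hundred thousand dimensions, which is why error accumulation across timestamps is so damaging.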