Learnable spatiotemporal feature pyramid for prediction of future optical flow in videos


ORIGINAL PAPER

Laisha Wadhwa¹ · Snehasis Mukherjee²

Received: 4 April 2020 / Revised: 25 August 2020 / Accepted: 15 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

The success of deep learning-based techniques in solving various computer vision problems has motivated researchers to apply deep learning to predicting the optical flow of a video in the next frame. However, predicting the motion of an object over the next few frames remains an unsolved and little explored problem. Given a sequence of frames, predicting the motion in the next few frames of the video becomes difficult when the displacement of the optical flow vectors across frames is large. Traditional CNNs often fail to learn the dynamics of objects across frames when the objects undergo large displacements between consecutive frames. In this paper, we present an efficient CNN based on the concept of a feature pyramid for extracting spatial features from a few consecutive frames. The spatial features extracted from consecutive frames by a modified PWC-Net architecture are fed into a bidirectional LSTM to obtain the temporal features. The proposed spatiotemporal feature pyramid is able to capture the abrupt motion of moving objects in a video, especially when the displacement of an object across consecutive frames is large. Further, the proposed spatiotemporal pyramidal feature can effectively predict the optical flow in the next few frames, instead of predicting only the next frame. The proposed method of predicting optical flow outperforms the state of the art on challenging datasets such as “MPI Sintel Final Pass,” “Monkaa” and “Flying Chairs,” where abrupt and large displacement of the moving objects in consecutive frames is the main challenge.

Keywords Motion prediction · Optical flow prediction · Feature pyramid · LSTM

✉ Snehasis Mukherjee [email protected]

1 IIIT SriCity, Chittoor, India
2 Shiv Nadar University, Greater Noida, India

1 Introduction

Optical flow estimation in videos is an increasingly popular computer vision problem because of its potential applications in several research areas, including action recognition [1], autonomous driving [2] and video editing [3]. Several efforts have been made to estimate optical flow, leading to impressive performance on challenging benchmark datasets [4,5]. However, predicting optical flow in future frames is a much more challenging and less explored research problem. Prediction of optical flow in future frames is important because of its potential applications in event prediction, autonomous driving, robot navigation and many other areas. There have been some recent efforts toward optical flow prediction [6–11]. Some of the existing approaches to optical flow prediction rely on reinforcement learning, where an object [2] or a patch [12] is modeled as an agent that performs actions based on its current state and
