Temporal capsule networks for video motion estimation and error concealment



ORIGINAL PAPER

Temporal capsule networks for video motion estimation and error concealment

Arun Sankisa1 · Arjun Punjabi1 · Aggelos K. Katsaggelos1

1 Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA

Received: 9 August 2019 / Revised: 6 January 2020 / Accepted: 6 March 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

In this paper, we present a temporal capsule network architecture that encodes motion in videos as an instantiation parameter. The extracted motion is used to perform motion-compensated error concealment. We modify the original capsule architecture and use a carefully curated dataset to enable training the capsules both spatially and temporally. First, we add the temporal dimension by taking co-located "patches" from three consecutive frames of standard video sequences to form input data "cubes." Second, the network is designed with an initial feature extraction layer that operates on all three dimensions to generate spatiotemporal features. Additionally, we implement the PrimaryCaps module with a recurrent layer, instead of a conventional convolutional layer, to extract short-term motion-related temporal dependencies and encode them as activation vectors in the capsule output. Finally, the capsule output is combined with the most recent past frame and passed through a fully connected reconstruction network to perform motion-compensated error concealment. We study the effectiveness of temporal capsules by comparing the proposed model with architectures that do not include capsules. Although the quality of the reconstruction shows room for improvement, we successfully demonstrate that capsule-based architectures can be designed to operate in the temporal dimension to encode motion-related attributes as instantiation parameters. The accuracy of motion estimation is evaluated by comparing both the reconstructed frame outputs and the corresponding optical flow estimates with ground truth data.

Keywords Capsule networks · Conv3D · ConvLSTM · Error concealment · Motion estimation
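The pipeline described in the abstract (three-frame input cube, Conv3D feature extraction, a recurrent PrimaryCaps stage, and a fully connected reconstruction network) can be summarized in a short sketch. The following is a minimal, illustrative Keras/TensorFlow version; the patch size (32 x 32), single-channel input, capsule layout (16 capsules of dimension 8), layer widths, and the squash helper are assumptions chosen for illustration, not the paper's exact hyperparameters.

    # Minimal sketch of the described architecture (Keras / TensorFlow 2.x).
    # All sizes below are illustrative assumptions, not the paper's settings.
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    T, H, W, C = 3, 32, 32, 1          # three co-located frame patches form a data "cube"
    NUM_CAPS, CAPS_DIM = 16, 8         # assumed capsule layout

    def squash(v, axis=-1):
        # Standard capsule squashing nonlinearity (Sabour et al., 2017).
        s2 = tf.reduce_sum(tf.square(v), axis=axis, keepdims=True)
        return (s2 / (1.0 + s2)) * v / tf.sqrt(s2 + 1e-8)

    inp = layers.Input(shape=(T, H, W, C))            # spatiotemporal input cube

    # Initial feature extraction operating on all three dimensions.
    x = layers.Conv3D(64, kernel_size=(3, 3, 3), padding="same",
                      activation="relu")(inp)

    # Recurrent PrimaryCaps: a ConvLSTM in place of the usual convolution,
    # so short-term motion dependencies end up in the capsule activations.
    x = layers.ConvLSTM2D(NUM_CAPS * CAPS_DIM, kernel_size=3, padding="same",
                          return_sequences=False)(x)  # -> (H, W, NUM_CAPS*CAPS_DIM)
    caps = layers.Reshape((-1, CAPS_DIM))(x)
    caps = layers.Lambda(squash)(caps)                # capsule instantiation vectors

    # Combine the capsule output with the most recent past frame,
    # then reconstruct via a fully connected network.
    last_frame = layers.Lambda(lambda t: t[:, -1])(inp)
    feat = layers.Concatenate()([layers.Flatten()(caps),
                                 layers.Flatten()(last_frame)])
    out = layers.Dense(512, activation="relu")(feat)
    out = layers.Dense(H * W * C, activation="sigmoid")(out)
    out = layers.Reshape((H, W, C))(out)              # concealed/estimated patch

    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mse")

In this sketch the dense decoder stands in for the paper's fully connected reconstruction network, and the ConvLSTM output plays the role of the PrimaryCaps activations whose vectors are intended to carry motion as an instantiation parameter.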

1 Introduction

Since the introduction of convolutional neural networks (CNN or ConvNet) [1, 2] and deep CNN architectures [3], numerous works have highlighted their effectiveness in processing natural signals, particularly their ability to learn hierarchical relationships of objects in images, i.e., low-level features such as edges that progressively build up to more complex, composite structures such as motifs and objects. This ability has been utilized in training networks to perform a wide variety of tasks such as classification [5, 6], object recognition [7, 8], or inpainting [9, 10], and, more generally, in extracting the spatial correlations typical of natural images. Recent models such as recurrent neural networks (RNNs), in particular long short-term memory (LSTM) modules, have gained popularity in solving problems that require networks to understand