Video Super-Resolution with Recurrent Structure-Detail Network

1 Department of Electronic Engineering, Tsinghua University, Beijing, China
  [email protected], [email protected]
2 Noah's Ark Lab, Huawei Technologies, Shenzhen, China
  {x.jia,songjiang.li,tian.qi1}@huawei.com
3 School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia
  [email protected]

T. Isobe—The work was done in Noah's Ark Lab, Huawei Technologies.

Abstract. Most video super-resolution methods super-resolve a single reference frame with the help of neighboring frames in a temporal sliding window. They are less efficient than recurrent-based methods. In this work, we propose a novel recurrent video super-resolution method which is both effective and efficient in exploiting previous frames to super-resolve the current frame. It divides the input into structure and detail components, which are fed to a recurrent unit composed of several proposed two-stream structure-detail blocks. In addition, a hidden state adaptation module that allows the current frame to selectively use information from the hidden state is introduced to enhance robustness to appearance change and error accumulation. Extensive ablation studies validate the effectiveness of the proposed modules. Experiments on several benchmark datasets demonstrate the superior performance of the proposed method compared to state-of-the-art video super-resolution methods. Code is available at https://github.com/junpan19/RSDN.

Keywords: Video super-resolution · Recurrent neural network · Two-stream block
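To make the pipeline described above concrete, below is a minimal PyTorch-style sketch of the three ingredients named in the abstract: splitting a frame into structure and detail components, a two-stream block that processes the two components with cross-talk, and a hidden state adaptation step that gates the previous hidden state. The choice of low-pass filter, the names (split_structure_detail, TwoStreamSDBlock, HiddenStateAdaptation), and all channel sizes are illustrative assumptions rather than the authors' exact design; the official implementation is at https://github.com/junpan19/RSDN.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def split_structure_detail(frame):
    """Split an LR frame (N, C, H, W) into a smooth 'structure' component
    (here: a simple 3x3 average-pooling low-pass filter, an assumed choice)
    and the residual 'detail' component."""
    structure = F.avg_pool2d(frame, kernel_size=3, stride=1, padding=1)
    detail = frame - structure
    return structure, detail


class TwoStreamSDBlock(nn.Module):
    """One two-stream block: separate convolutions for the structure and detail
    streams, with the fused features added back to both streams as a shared residual."""

    def __init__(self, channels=64):
        super().__init__()
        self.conv_s = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_d = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, s, d):
        shared = F.relu(self.fuse(torch.cat([self.conv_s(s), self.conv_d(d)], dim=1)))
        return s + shared, d + shared


class HiddenStateAdaptation(nn.Module):
    """Predict a per-pixel gate from the current frame and the previous hidden
    state, so the recurrence can selectively trust past information."""

    def __init__(self, in_channels, hidden_channels=64):
        super().__init__()
        self.gate = nn.Conv2d(in_channels + hidden_channels, hidden_channels, 3, padding=1)

    def forward(self, frame, hidden):
        g = torch.sigmoid(self.gate(torch.cat([frame, hidden], dim=1)))
        return hidden * g


if __name__ == "__main__":
    lr = torch.rand(1, 3, 64, 64)          # a single LR frame
    s_img, d_img = split_structure_detail(lr)
    stem = nn.Conv2d(3, 64, 3, padding=1)  # lift both components to feature space
    block = TwoStreamSDBlock(64)
    hsa = HiddenStateAdaptation(in_channels=3, hidden_channels=64)
    hidden = torch.zeros(1, 64, 64, 64)    # previous hidden state (zeros at t = 0)
    s_feat, d_feat = block(stem(s_img), stem(d_img))
    adapted_hidden = hsa(lr, hidden)
    print(s_feat.shape, d_feat.shape, adapted_hidden.shape)
```

A full recurrent cell would stack several such blocks, carry the adapted hidden state across time steps, and reconstruct the HR frame from the two streams; those details are deferred to the method section and the official code.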

1 Introduction

Super-resolution is one of the fundamental problems in image processing. It aims at reconstructing a high-resolution (HR) image from a single low-resolution (LR) image or from a sequence of LR images. According to the number of input frames, the field of SR can be divided into two categories, i.e., single image super-resolution (SISR) and multi-frame super-resolution (MFSR). For SISR, the key issue is to exploit natural image priors to compensate for missing details, while for MFSR, how to take full advantage of the additional temporal information is of pivotal importance. In this work, we focus on the video super-resolution (VSR) task, which belongs to MFSR. It has drawn much attention in both the research and industrial communities because of its great value in computational photography and surveillance (Fig. 1).

Fig. 1. VSR results on the City sequence in Vid4. Our method produces finer details and stronger edges with a better balance between speed and performance than both temporal sliding-window based [7, 12, 26, 27, 29] and recurrent based methods [4, 23]. Blue boxes denote recurrent-based methods and green boxes denote sliding-window based methods. Runtimes (ms) are calculated on an HR image of size 704 × 576. (Color figure online)

In the last several years, great efforts have been made to exploit multi-frame information for VSR. One category of approaches utilizes multi-frame