Deformable Kernel Networks for Joint Image Filtering



Beomjun Kim¹ · Jean Ponce² · Bumsub Ham¹

Received: 16 October 2019 / Accepted: 15 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Joint image filters are used to transfer structural details from a guidance image used as a prior to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640 × 480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.

Keywords Joint filtering · Convolutional neural networks · Depth map upsampling · Cross-modality image restoration · Texture removal · Semantic segmentation
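The abstract's core operation, a weighted average over a small set of sparsely sampled neighbors whose positions and weights are predicted per pixel, can be sketched with NumPy. This is an illustrative aggregation step only, not the paper's implementation; the function name, argument shapes, and the assumption that weights sum to one per pixel are all choices made here for clarity.

```python
import numpy as np

def aggregate_sparse_kernels(target, offsets, weights):
    """Weighted average over K sparsely sampled neighbors per pixel.

    target  : (H, W) image to filter (e.g. a low-resolution depth map).
    offsets : (H, W, K, 2) integer (dy, dx) neighbor offsets per pixel.
    weights : (H, W, K) per-pixel aggregation weights (assumed normalized).
    Names and shapes are illustrative, not the paper's API.
    """
    H, W, K, _ = offsets.shape
    ys, xs = np.mgrid[0:H, 0:W]                      # base pixel grid
    # Absolute neighbor coordinates, clamped to the image border.
    ny = np.clip(ys[..., None] + offsets[..., 0], 0, H - 1)
    nx = np.clip(xs[..., None] + offsets[..., 1], 0, W - 1)
    neighbors = target[ny, nx]                       # (H, W, K) sampled values
    return (weights * neighbors).sum(axis=-1)        # per-pixel weighted average
```

With zero offsets and a one-hot weight on the first sample, the operation reduces to the identity, which is a convenient sanity check when wiring such an aggregation into a larger network.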

Bumsub Ham [email protected]
Beomjun Kim [email protected]
Jean Ponce [email protected]

1 School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea

2 Inria and DI-ENS, Département d'Informatique de l'ENS, CNRS, PSL University, Paris, France

1 Introduction

Image filtering with a guidance signal, a process called guided or joint filtering, has been used in a variety of computer vision and graphics tasks, including depth map upsampling (Ferstl et al. 2013; Ham et al. 2018; Kopf et al. 2007; Li et al. 2016; Park et al. 2011; Yang et al. 2007), cross-modality image restoration (He et al. 2013; Shen et al. 2015; Yan et al. 2013), texture removal (Ham et al. 2018; Karacan et al. 2013; Xu et al. 2012; Zhang et al. 2014), scale-space filtering (Ham et al. 2018), dense correspondence (Ham et al. 2016; Hosni et al. 2013) and semantic segmentation (Barron and Poole 2016). For example, high-resolution color images can be used as guidance to enhance the spatial resolution of depth maps (Kopf et al. 2007). The basic idea behind joint image filtering is to transfer structural details from the guidance image to the target one, typically by estimating spatially-variant kernels from the guidance. Concretely, given the target image f and the guidance image g, the filtering output f̂ at position p = (x, y) is expressed as a weighted average (He et al. 2013; Kopf et al. 2007; Tomasi and Manduchi 1998):

f̂_p = Σ_{q∈N(p)} W_pq(f, g) f_q,   (1)
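A classical instance of the weighted average in Eq. (1) is the joint bilateral filter, where the weights W_pq are built from spatial distance and from intensity differences in the guidance g rather than in the target f. The sketch below is a minimal NumPy version for single-channel images; the window radius and the sigma values are illustrative defaults, not parameters from the paper.

```python
import numpy as np

def joint_bilateral_filter(f, g, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Joint bilateral filtering of target f guided by g (both (H, W) float).

    Implements f_hat[p] = sum_{q in N(p)} W[p,q] * f[q], with W combining
    a spatial Gaussian and a range Gaussian on guidance differences.
    Illustrative sketch only; parameter values are arbitrary defaults.
    """
    H, W = f.shape
    fp = np.pad(f, radius, mode="edge")
    gp = np.pad(g, radius, mode="edge")
    num = np.zeros((H, W))
    den = np.zeros((H, W))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Shifted views give the neighbor q = p + (dy, dx) for every p.
            fq = fp[radius + dy: radius + dy + H, radius + dx: radius + dx + W]
            gq = gp[radius + dy: radius + dy + H, radius + dx: radius + dx + W]
            w = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2)
                       - (g - gq) ** 2 / (2.0 * sigma_r ** 2))
            num += w * fq
            den += w
    return num / den  # normalization makes the weights sum to one per pixel
```

Because the weights depend on the guidance, edges present in g are preserved in the output even when they are blurred or missing in f, which is exactly the structure-transfer behavior the DKN learns to produce with far sparser kernels.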
