Deep Self-correlation Descriptor for Dense Cross-Modal Correspondence



1 Yonsei University, Seoul, South Korea
  {srkim89,khsohn}@yonsei.ac.kr
2 Chungnam National University, Daejeon, South Korea
  [email protected]
3 Microsoft Research, Beijing, China
  [email protected]

Abstract. We present a novel descriptor, called deep self-correlation (DSC), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. Motivated by local self-similarity (LSS), we formulate a novel descriptor by leveraging LSS in a deep architecture, leading to better discriminative power and greater robustness to non-rigid image deformations than state-of-the-art descriptors. The DSC first computes self-correlation surfaces over a local support window for randomly sampled patches, and then builds hierarchical self-correlation surfaces by performing an average pooling within a deep architecture. Finally, the feature responses on the self-correlation surfaces are encoded through a spatial pyramid pooling in a circular configuration. In contrast to convolutional neural network (CNN) based descriptors, the DSC is training-free, is robust to cross-modal imaging, and can be densely computed in an efficient manner that significantly reduces computational redundancy. The state-of-the-art performance of DSC on challenging cases of cross-modal image pairs is demonstrated through extensive experiments.

Keywords: Cross-modal correspondence · Deep architecture · Self-correlation · Local self-similarity · Non-rigid deformation
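As a rough illustration of the pipeline described in the abstract, the sketch below computes a self-correlation surface for one sampled patch, applies one level of hierarchical average pooling, and encodes the pooled surface with a circular (ring-and-sector) spatial pooling layout. All function names, window sizes, and the use of normalized cross-correlation here are illustrative assumptions; the paper's exact patch sampling, correlation measure, and pooling parameters differ.

```python
import numpy as np

def self_correlation_surface(img, center, patch=3, support=7):
    """Correlate the patch at `center` with every same-size patch inside a
    (support x support) local window, using normalized cross-correlation
    (an assumed similarity measure for this sketch)."""
    cy, cx = center
    r, s = patch // 2, support // 2
    ref = img[cy - r:cy + r + 1, cx - r:cx + r + 1].ravel()
    ref = (ref - ref.mean()) / (ref.std() + 1e-8)
    surf = np.zeros((support, support))
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            y, x = cy + dy, cx + dx
            p = img[y - r:y + r + 1, x - r:x + r + 1].ravel()
            p = (p - p.mean()) / (p.std() + 1e-8)
            surf[dy + s, dx + s] = np.dot(ref, p) / ref.size  # NCC in [-1, 1]
    return surf

def average_pool(surf, k=2):
    """One level of the hierarchical average pooling over the surface."""
    h, w = (surf.shape[0] // k) * k, (surf.shape[1] // k) * k
    return surf[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def circular_pool(surf, n_rings=2, n_sectors=4):
    """Pool responses into ring/sector bins arranged around the surface
    center, mimicking a spatial pyramid in a circular configuration."""
    h, w = surf.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    rad = np.hypot(ys - cy, xs - cx)
    ang = np.arctan2(ys - cy, xs - cx) % (2 * np.pi)
    r_edges = np.linspace(0, rad.max() + 1e-8, n_rings + 1)
    feats = []
    for i in range(n_rings):
        for j in range(n_sectors):
            mask = ((rad >= r_edges[i]) & (rad < r_edges[i + 1]) &
                    (ang >= 2 * np.pi * j / n_sectors) &
                    (ang < 2 * np.pi * (j + 1) / n_sectors))
            feats.append(surf[mask].mean() if mask.any() else 0.0)
    return np.array(feats)
```

A dense descriptor in this spirit would concatenate such pooled vectors over many randomly sampled patches around each pixel, which is what allows redundant computation to be shared across neighboring positions.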

1 Introduction

This work was done while Seungryong Kim was an intern at Microsoft Research.
© Springer International Publishing AG 2016
B. Leibe et al. (Eds.): ECCV 2016, Part VIII, LNCS 9912, pp. 679–695, 2016.
DOI: 10.1007/978-3-319-46484-8_41

In many computer vision and computational photography applications, images captured under different imaging modalities are used to supplement the data provided in color images. Typical examples of other imaging modalities include near-infrared [1–3] and dark flash [4] photography. More broadly, photos taken under different imaging conditions, such as different exposure settings [5], blur levels [6,7], and illumination [8], can also be considered as cross-modal [9,10]. Establishing dense correspondences between cross-modal image pairs is essential for combining their disparate information. Although powerful global optimizers may help to improve the accuracy of correspondence estimation to some

[Figure: three panels plotting matching cost against a search range of −15 to 15, with curves for SIFT, CNN, DASC, and DSC and the ground-truth position marked in each panel.]

Fig. 1. Examples of matching cost profiles, computed with different descriptors along the scan lines of A, B, and C for image pairs under severe non-rigid deformations and illumination changes. Unlike other descriptors, DSC yields reliable global minima.

extent [11,12], they face inherent limitations