Deep Self-correlation Descriptor for Dense Cross-Modal Correspondence
1 Yonsei University, Seoul, South Korea
  {srkim89,khsohn}@yonsei.ac.kr
2 Chungnam National University, Daejeon, South Korea
  [email protected]
3 Microsoft Research, Beijing, China
  [email protected]
Abstract. We present a novel descriptor, called deep self-correlation (DSC), designed for establishing dense correspondences between images taken under different imaging modalities, such as different spectral ranges or lighting conditions. Motivated by local self-similarity (LSS), we formulate a novel descriptor by leveraging LSS in a deep architecture, leading to better discriminative power and greater robustness to non-rigid image deformations than state-of-the-art descriptors. DSC first computes self-correlation surfaces over a local support window for randomly sampled patches, then builds hierarchical self-correlation surfaces by performing average pooling within a deep architecture. Finally, the feature responses on the self-correlation surfaces are encoded through spatial pyramid pooling in a circular configuration. In contrast to descriptors based on convolutional neural networks (CNNs), DSC is training-free, is robust to cross-modal imaging, and can be densely computed in an efficient manner that significantly reduces computational redundancy. The state-of-the-art performance of DSC on challenging cases of cross-modal image pairs is demonstrated through extensive experiments.

Keywords: Cross-modal correspondence · Deep architecture · Self-correlation · Local self-similarity · Non-rigid deformation

1 Introduction
This work was done while Seungryong Kim was an intern at Microsoft Research.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VIII, LNCS 9912, pp. 679–695, 2016. DOI: 10.1007/978-3-319-46484-8_41

In many computer vision and computational photography applications, images captured under different imaging modalities are used to supplement the data provided in color images. Typical examples of other imaging modalities include near-infrared [1–3] and dark flash [4] photography. More broadly, photos taken under different imaging conditions, such as different exposure settings [5], blur levels [6,7], and illumination [8], can also be considered cross-modal [9,10]. Establishing dense correspondences between cross-modal image pairs is essential for combining their disparate information. Although powerful global optimizers may help to improve the accuracy of correspondence estimation to some
[Fig. 1 graphic: three panels plotting matching cost against search range (−15 to 15) along scan lines A, B, and C, with curves for SIFT, CNN, DASC, and DSC and the ground-truth offset marked.]
Fig. 1. Examples of matching cost profiles, computed with different descriptors along the scan lines of A, B, and C for image pairs under severe non-rigid deformations and illumination changes. Unlike other descriptors, DSC yields reliable global minima.
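The pipeline the abstract describes, and the matching-cost profiles in Fig. 1, can be illustrated with a minimal NumPy sketch. All function names, window sizes, the pooling depth, and the circular sampling pattern below are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def self_correlation_surface(img, center, patch=3, support=7):
    """Normalized cross-correlation of the patch at `center` with every
    same-size patch inside a (support x support) local window."""
    cy, cx = center
    r, s = patch // 2, support // 2
    ref = img[cy - r:cy + r + 1, cx - r:cx + r + 1].ravel()
    ref = ref - ref.mean()
    surf = np.empty((support, support))
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            y, x = cy + dy, cx + dx
            cand = img[y - r:y + r + 1, x - r:x + r + 1].ravel()
            cand = cand - cand.mean()
            denom = np.linalg.norm(ref) * np.linalg.norm(cand) + 1e-8
            surf[dy + s, dx + s] = ref @ cand / denom
    return surf

def average_pool(surf):
    """2x2 average pooling, the building block of the hierarchy."""
    h, w = (surf.shape[0] // 2) * 2, (surf.shape[1] // 2) * 2
    return surf[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def circular_encode(surf, radii=(1, 2), angles=8):
    """Sample the surface at its center and on concentric circles,
    a crude stand-in for spatial pyramid pooling in a circular
    configuration."""
    cy, cx = surf.shape[0] // 2, surf.shape[1] // 2
    feats = [surf[cy, cx]]
    for rad in radii:
        for k in range(angles):
            t = 2.0 * np.pi * k / angles
            y = min(max(int(np.rint(cy + rad * np.sin(t))), 0), surf.shape[0] - 1)
            x = min(max(int(np.rint(cx + rad * np.cos(t))), 0), surf.shape[1] - 1)
            feats.append(surf[y, x])
    return np.array(feats)

def dsc_descriptor(img, center):
    """Toy DSC-style descriptor: a self-correlation surface, one level
    of average pooling, and circular encoding of both levels."""
    surf = self_correlation_surface(img, center)
    pooled = average_pool(surf)
    return np.concatenate([circular_encode(surf), circular_encode(pooled)])
```

A matching-cost profile like those in Fig. 1 then follows by sweeping a candidate position along a scan line and plotting the descriptor distance; because the self-correlation surface depends only on intra-patch structure, the profile is unchanged when both images undergo the same local appearance shift, which is the intuition behind the descriptor's cross-modal robustness.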
extent [11,12], they face inherent limitations …