Cross-modal subspace learning via kernel correlation maximization and discriminative structure-preserving



Multimedia Tools and Applications

Jun Yu¹,² · Xiao-Jun Wu¹,² (✉)

Received: 23 April 2019 / Revised: 10 December 2019 / Accepted: 23 April 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

¹ The School of Artificial Intelligence and Computer Science, Jiangnan University, 214122 Wuxi, China
² The Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, 214122 Wuxi, China

Abstract

How to measure the distance between heterogeneous data is still an open problem. Many works have been developed to learn a common subspace in which the similarity between different modalities can be calculated directly. However, most existing works focus on learning a latent subspace while the semantically structural information is not well preserved, so these approaches cannot achieve the desired results. In this paper, we propose a novel framework, termed Cross-modal subspace learning via Kernel correlation maximization and Discriminative structure-preserving (CKD), to solve this problem in two respects. First, we construct a shared semantic graph so that the data of each modality preserve the semantic neighbor relationship. Second, we introduce the Hilbert-Schmidt Independence Criterion (HSIC) to ensure the consistency between the feature similarity and the semantic similarity of samples. Our model not only considers the inter-modality correlation by maximizing the kernel correlation but also preserves the semantically structural information within each modality. Extensive experiments on three public datasets demonstrate that the proposed CKD is competitive with classic subspace learning methods.

Keywords Cross-modal retrieval · Subspace learning · Kernel correlation · Discriminative · HSIC
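For reference, the HSIC mentioned in the abstract has a standard (biased) empirical estimator, HSIC(K, L) = (n − 1)⁻² tr(KHLH), where K and L are kernel Gram matrices of the two views and H = I − (1/n)11ᵀ is the centering matrix (Gretton et al., 2005). The following is a minimal sketch of that estimator; the RBF kernel, its width, and the toy data are illustrative assumptions, not the configuration used in this paper.

import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel over the rows of X.
    sq = np.sum(X**2, axis=1, keepdims=True) + np.sum(X**2, axis=1) - 2.0 * X @ X.T
    return np.exp(-sq / (2.0 * sigma**2))

def hsic(K, L):
    # Biased empirical HSIC: tr(K H L H) / (n - 1)^2, with H = I - (1/n) 1 1^T.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy check on hypothetical data: HSIC grows with statistical dependence
# between two views and stays near zero for independent views.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # e.g. image features
Y_dep = X @ rng.normal(size=(10, 5))     # view dependent on X
Y_ind = rng.normal(size=(200, 5))        # view independent of X
K = rbf_kernel(X, sigma=3.0)
print(hsic(K, rbf_kernel(Y_dep, sigma=3.0)))   # comparatively large
print(hsic(K, rbf_kernel(Y_ind, sigma=3.0)))   # near zero

Maximizing such a term between a learned representation and a semantic (label) kernel is one way to enforce the feature-similarity/semantic-similarity consistency the abstract describes.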

1 Introduction

Recently, the rapid development of the Internet and the explosive growth of multimedia data, including text, images, video, and audio, have greatly enriched people's lives but have also magnified the challenge of information retrieval. Representative image retrieval methods, such as region-based image retrieval [43], color-based image retrieval [4], the Contour Points Distribution Histogram (CPDH) [28], Inverse Document Frequency (IDF) [44], and content-based image retrieval [19], cannot be applied directly to multimodal retrieval. Multimodal data refers to data of different types that share the same semantic content, for example, the video clips, music, photos, and tweets recording a concert. Cross-modal retrieval, which takes one type of data as the query to return relevant data of another type, has attracted much attention. Cross-modal retrieval methods need to solve a basic problem, i.e., how to measure the relevance between heterogeneous modalities. There are two strategies to solve this problem: one is to directly calculate t