
Semantic ranking structure preserving for cross-modal retrieval

Hui Liu 1,2 · Yong Feng 1,2 · Mingliang Zhou 1,3 · Baohua Qiang 4,5

Accepted: 3 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Cross-modal retrieval not only needs to eliminate the heterogeneity between modalities, but also needs to constrain the return order of retrieval results. Accordingly, in this paper we propose a novel common representation space learning method for cross-modal retrieval, called Semantic Ranking Structure Preserving (SRSP). First, the dependency relationships between labels are used to minimize the discriminative loss of multi-modal data and to mine potential relationships between samples, yielding richer semantic information in the common space. Second, we constrain the correlation ranking of representations in the common space, so as to bridge the modality gap and promote multi-modal correlation learning. Comprehensive experimental comparisons show that our algorithm substantially enhances retrieval performance and consistently outperforms very recent algorithms on widely used cross-modal benchmark datasets.

Keywords Cross-modal retrieval · Common space learning · Graph convolutional · Semantic structure preserving
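The precise SRSP objective is developed later in the paper; purely as an illustration of what "constraining the correlation ranking" of common-space representations means, the sketch below shows a generic margin-based (triplet-style) ranking loss. All names and values here are our own assumptions, not taken from the paper.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two common-space embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def margin_ranking_loss(query: np.ndarray,
                        positive: np.ndarray,
                        negative: np.ndarray,
                        margin: float = 0.2) -> float:
    """Generic triplet-style ranking loss: a semantically matching item
    (positive) must score higher than a non-matching one (negative) by
    at least `margin`, which enforces the desired return order."""
    return max(0.0, margin - cosine(query, positive) + cosine(query, negative))

# Toy example: a text embedding, a matching image, and a mismatched image.
rng = np.random.default_rng(1)
txt = rng.standard_normal(128)
img_pos = txt + 0.1 * rng.standard_normal(128)   # close to the query
img_neg = rng.standard_normal(128)               # unrelated

print("loss:", margin_ranking_loss(txt, img_pos, img_neg))
```

Minimizing a loss of this kind pushes matched cross-modal pairs above mismatched ones in the similarity ranking, which is the intuition behind preserving the semantic ranking structure.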

✉ Yong Feng
[email protected]

Hui Liu
[email protected]

Mingliang Zhou
[email protected]

1  College of Computer Science, Chongqing University, Chongqing 400030, China
2  Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University, Chongqing 400030, China
3  State Key Lab of Internet of Things for Smart City, University of Macau, Taipa, Macau 999078, China
4  Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
5  Guangxi Key Laboratory of Optoelectronic Information Processing, Guilin University of Electronic Technology, Guilin 541004, China

1 Introduction

Due to the rapid growth of multimedia data, information retrieval is no longer limited to single-modal data, and cross-modal retrieval has therefore drawn increasing interest from researchers in recent years [1–3]. Cross-modal retrieval, which takes data of one modality (e.g., text) as a query and retrieves semantically similar data of another modality (e.g., image), is more flexible than single-modal retrieval [4–6]. Because the feature representations and distributions of data from different modalities are inconsistent, the similarity of multi-modal data cannot be measured directly. To eliminate the heterogeneity between modalities, various cross-modal methods have been proposed, such as common space learning [7, 8], cross-modal similarity measurement [9, 10], and relevance feedback analysis. Common space learning methods map the features of each modality into a common semantic representation space, in which the similarity between data of different modalities can be measured directly with standard distance metrics. In particular, in order to increase t
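To make the common-space formulation concrete, here is a minimal sketch (our illustration, not the authors' code) in which two hypothetical linear projections, W_img and W_txt, stand in for learned modality-specific encoders; retrieval then reduces to ranking database items by cosine similarity to the query in the shared space.

```python
import numpy as np

# Hypothetical, randomly initialized projections standing in for learned
# modality-specific encoders that map each modality into a common space.
rng = np.random.default_rng(0)
d_img, d_txt, d_common = 4096, 300, 128
W_img = rng.standard_normal((d_img, d_common)) * 0.01
W_txt = rng.standard_normal((d_txt, d_common)) * 0.01

def to_common_space(features: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project modality-specific features into the common space and
    L2-normalize, so that dot products equal cosine similarities."""
    z = features @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Toy database of image features and one text query (random placeholders).
image_feats = rng.standard_normal((1000, d_img))
text_query = rng.standard_normal((1, d_txt))

img_emb = to_common_space(image_feats, W_img)    # (1000, 128)
txt_emb = to_common_space(text_query, W_txt)     # (1, 128)

# Cross-modal retrieval: rank all images by cosine similarity to the query.
scores = (txt_emb @ img_emb.T).ravel()           # (1000,)
ranking = np.argsort(-scores)                    # indices, best match first
print("top-5 retrieved image indices:", ranking[:5])
```

Concrete methods differ mainly in how the two projections are learned; SRSP, as summarized in the abstract, additionally constrains the correlation ranking among these common-space representations.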