Semantics-preserving hashing based on multi-scale fusion for cross-modal retrieval

Hong Zhang 1,2 · Min Pan 1,2

Received: 13 March 2020 / Revised: 18 August 2020 / Accepted: 11 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Hash-based cross-modal retrieval has become a research hotspot in the field of content-based multimedia retrieval. Most deep cross-modal hashing methods consider only the inter-modal loss, which preserves the local information of the training data, and ignore the loss within data samples of the same modality, which preserves the global information of the dataset. They also ignore the fact that different scales of single-modal data carry different semantic information, which affects the representation of data features. In this paper, we propose a semantics-preserving hashing method based on multi-scale fusion. More concretely, a multi-scale fusion pooling model is introduced into both the image feature training network and the text feature training network, so that we can extract multi-scale features of the image dataset and alleviate the sparsity of the text bag-of-words (BOW) vectors. When constructing the loss function, we consider the intra-modal loss alongside the inter-modal loss, so the output hash codes retain both the global and the local underlying semantic correlations when the image and text feature training networks are trained. Experimental results on NUS-WIDE and MIRFlickr-25K show that our algorithm improves cross-modal retrieval accuracy over existing methods.

Keywords Cross-modal retrieval · Multi-scale fusion · Hash learning · Semantics preserving · Deep learning
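To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch, assuming a DCMH-style pairwise likelihood objective; the module structure, pooling scales, dimensions, and the weight alpha are hypothetical illustrations, not the paper's exact architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusionPooling(nn.Module):
    """Pools a feature map at several grid sizes and fuses the results.

    Hypothetical module: the scales, channel count, and fusion layer are
    illustrative, not taken from the paper.
    """
    def __init__(self, in_channels=512, scales=(1, 2, 4), out_dim=1024):
        super().__init__()
        self.scales = scales
        fused_dim = in_channels * sum(s * s for s in scales)
        self.fuse = nn.Linear(fused_dim, out_dim)

    def forward(self, x):                          # x: (B, C, H, W)
        pooled = [F.adaptive_avg_pool2d(x, s).flatten(1) for s in self.scales]
        return self.fuse(torch.cat(pooled, dim=1))

def pairwise_nll(f, g, S):
    """Negative log-likelihood of the pairwise similarity S (1 = shared label)."""
    theta = 0.5 * f @ g.t()                        # pairwise inner products
    return (F.softplus(theta) - S * theta).mean()  # -log sigmoid likelihood

def total_loss(img_feat, txt_feat, S, alpha=1.0):
    inter = pairwise_nll(img_feat, txt_feat, S)       # image <-> text (local)
    intra = (pairwise_nll(img_feat, img_feat, S)
             + pairwise_nll(txt_feat, txt_feat, S))   # within each modality (global)
    return inter + alpha * intra
```

The intra-modal terms constrain pairs drawn from the same modality, which is one way to inject the dataset-level (global) structure the abstract refers to alongside the cross-modal (local) pairwise constraints.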

1 Introduction

Development in information technology has led to the explosive growth of multimedia data. At the same time, people's demand for information search that returns diverse results is increasing.

* Hong Zhang
  [email protected]

1 College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan 430081, China

2 Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China


Therefore, research on multimedia data analysis and cross-modal retrieval technology [18, 23, 33, 19, 31, 20, 32, 15] continues to grow. Cross-modal retrieval means that all relevant data of other modalities are accurately and quickly retrieved using the data of one modality as the query. Hash learning is widely used in cross-modal retrieval models [27, 21, 29, 1] because of its low storage cost and efficient retrieval. Over the past few decades, many hashing methods have been developed for single-modal retrieval [25, 22, 16, 14, 8, 13, 35]. However, these methods are not suitable for cross-modal hash retrieval because of the semantic gap between data of different modalities. Most existing cross-modal hashing methods [34] bridge the semantic gap by mining the correlations among data of different modalities. The main cross-modal hashing methods can be divided into two categories: deep cross-modal hashing
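As a toy illustration of why hash codes give low storage and efficient retrieval (generic to hashing-based retrieval, not any particular method cited above): a 64-bit code stores an item in 8 bytes, and ranking a database reduces to cheap Hamming-distance comparisons. All names and sizes below are hypothetical.

```python
import numpy as np

# Rank database items by Hamming distance between binary hash codes.
def hamming_distances(query_code, db_codes):
    """query_code: (k,) 0/1 array; db_codes: (N, k) 0/1 array."""
    return np.count_nonzero(db_codes != query_code, axis=1)

rng = np.random.default_rng(0)
db_codes = rng.integers(0, 2, size=(100_000, 64), dtype=np.uint8)  # 100k items, 64-bit codes
query = rng.integers(0, 2, size=64, dtype=np.uint8)
top10 = np.argsort(hamming_distances(query, db_codes))[:10]        # ten nearest codes
```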