Deep Hashing with Hash-Consistent Large Margin Proxy Embeddings


Pedro Morgado¹ · Yunsheng Li¹ · Jose Costa Pereira² · Mohammad Saberian³ · Nuno Vasconcelos¹

Received: 20 April 2019 / Accepted: 21 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Image hash codes are produced by binarizing the embeddings of convolutional neural networks (CNNs) trained for either classification or retrieval. While proxy embeddings achieve good performance on both tasks, they are non-trivial to binarize, due to a rotational ambiguity that encourages non-binary embeddings. The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity, and a procedure to design proxy sets that are nearly optimal for both classification and hashing is introduced. The resulting hash-consistent large margin (HCLM) proxies are shown to encourage saturation of hashing units, thus guaranteeing a small binarization error, while producing highly discriminative hash codes. A semantic extension (sHCLM), aimed at improving hashing performance in a transfer scenario, is also proposed. Extensive experiments show that sHCLM embeddings achieve significant improvements over state-of-the-art hashing procedures on several small and large datasets, both within and beyond the set of training classes.

Keywords Proxy embeddings · Metric learning · Image retrieval · Hashing · Transfer learning

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Pedro Morgado [email protected]

¹ Department of Electrical and Computer Engineering, University of California, San Diego, USA
² Huawei Technologies, Noah’s Ark Lab, London, UK
³ Netflix, Scotts Valley, USA

1 Introduction

Image retrieval is a classic problem in computer vision. Given a query image, a nearest-neighbor search is performed on an image database, using a suitable image representation and similarity function (Smeulders et al. 2000). Hashing methods enable efficient search by representing each image with a binary string, known as the hash code. This enables efficient indexing mechanisms, such as hash tables, or similarity functions, such as Hamming distances, implementable with logical operations. The goal is thus to guarantee that similar images are represented by similar hash codes (Andoni and Indyk 2006; Datar et al. 2004; Mu and Yan 2010).

Early hashing techniques approximated nearest neighbor searches between low-level features (Datar et al. 2004; Mu and Yan 2010; Gong et al. 2013; Weiss et al. 2009). However, humans judge similarity based on image semantics, such as scenes, objects, and attributes. This inspired the use of semantic representations for image retrieval (Lampert et al. 2009; Rasiwasia et al. 2007; Li et al. 2010) and, by extension, hashing (Xia et al. 2014; Lin et al. 2015; Zhang et al. 2015; Zhong et al. 2016). Modern hashing techniques rely on semantic embeddings implemented with convolutional neural networks (CNNs), as illustrated in Fig. 1 (right). A CNN featu