Fine-Grained Instance-Level Sketch-Based Image Retrieval


Qian Yu1,2 · Jifei Song2 · Yi-Zhe Song2 · Tao Xiang2 · Timothy M. Hospedales2,3

Received: 29 March 2019 / Accepted: 5 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract The problem of fine-grained sketch-based image retrieval (FG-SBIR) is defined and investigated in this paper. In FG-SBIR, free-hand human sketch images are used as queries to retrieve photo images containing the same object instances. It is thus a cross-domain (sketch to photo) instance-level retrieval task. It is an extremely challenging problem because (i) visual comparison and matching must be performed across a large domain gap, i.e., from black and white line drawing sketches to colour photos; (ii) it requires capturing the fine-grained (dis)similarities between sketches and photo images, while free-hand sketches drawn by different people exhibit different levels of deformation and expressive interpretation; and (iii) annotated cross-domain fine-grained SBIR datasets are scarce, challenging many state-of-the-art machine learning techniques, particularly those based on deep learning. In this paper, for the first time, we address all these challenges, providing a step towards the capabilities that would underpin a commercial sketch-based object instance retrieval application. Specifically, a new large-scale FG-SBIR database is introduced, which is carefully designed to reflect real-world application scenarios. A deep cross-domain matching model is then formulated to address the intrinsic drawing style variability and the large domain gap, and to capture instance-level discriminative features. It distinguishes itself by a carefully designed attention module. Extensive experiments on the new dataset demonstrate the effectiveness of the proposed model and validate the need for a rigorous definition of the FG-SBIR problem and for collecting suitable datasets.

Keywords Fine-grained · Sketch understanding · Image retrieval · Cross-modality · Deep learning

Communicated by Patrick Perez. Q. Yu and J. Song have contributed equally to this work.

1 Beihang University, Beijing, China
2 SketchX, CVSSP, University of Surrey, Surrey, UK
3 University of Edinburgh, Edinburgh, UK

1 Introduction

Existing image retrieval paradigms are still dominated by methods that use text or exemplar images as input (Krizhevsky and Hinton 2011; Moulin et al. 2014; Johnson et al. 2015; Noh et al. 2017). Since the main application of image retrieval is to find specific object instances (e.g., a particular shoe worn by a pedestrian that one just saw on the street), the two modalities have different strengths and weaknesses: textual queries are easy to obtain (just involving typing some words), but are often unable to accurately describe the visual appearance of the object instance (e.g., it can be a tall order for a non-fashion-e