Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-Based Image Retrieval



Anjan Dutta¹ · Zeynep Akata²

Received: 14 May 2019 / Accepted: 19 June 2020 / © The Author(s) 2020

Abstract

Low-shot sketch-based image retrieval is an emerging task in computer vision that aims to retrieve natural images relevant to hand-drawn sketch queries that are rarely seen during the training phase. Related prior work either requires aligned sketch–image pairs, which are costly to obtain, or relies on an inefficient memory fusion layer for mapping the visual information to a semantic space. In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks, where we introduce the few-shot setting for SBIR. For solving these tasks, we propose a semantically aligned paired cycle-consistent generative adversarial network (SEM-PCYC) for any-shot SBIR, where each branch of the generative adversarial network maps the visual information from sketch and image to a common semantic space via adversarial training. Each of these branches maintains cycle consistency, which requires supervision only at the category level and avoids the need for aligned sketch–image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is class-specific. Furthermore, we propose to combine textual and hierarchical side information via an auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in any-shot SBIR performance over the state of the art on the extended versions of the challenging Sketchy, TU-Berlin and QuickDraw datasets.
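The architecture outlined in the abstract can be made concrete with a short, hedged example. The following minimal PyTorch sketch shows one branch of such a model: a generator embeds visual features into the semantic space, a second generator maps back to the visual space to enforce cycle consistency, and a classifier on the semantic embedding keeps the mapping class-specific. All module names, dimensions and loss weights are illustrative assumptions rather than the authors' released implementation, and the adversarial discriminators on the semantic space are omitted for brevity.

```python
# Minimal sketch of one SEM-PCYC-style branch (assumptions: 512-d visual
# features, 300-d semantic space, 100 training classes; adversarial terms
# are omitted for brevity).
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=1024):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class SemPCYCBranch(nn.Module):
    """One branch (sketch or image) of a paired cycle-consistent model."""
    def __init__(self, vis_dim=512, sem_dim=300, n_classes=100):
        super().__init__()
        self.to_sem = mlp(vis_dim, sem_dim)              # visual -> semantic
        self.to_vis = mlp(sem_dim, vis_dim)              # semantic -> visual
        self.classifier = nn.Linear(sem_dim, n_classes)  # class-level supervision

    def forward(self, x, labels, lam_cyc=10.0, lam_cls=1.0):
        s = self.to_sem(x)              # embed into the common semantic space
        x_rec = self.to_vis(s)          # cycle back to the visual space
        loss_cyc = F.l1_loss(x_rec, x)  # cycle consistency: category-level only
        loss_cls = F.cross_entropy(self.classifier(s), labels)
        return s, lam_cyc * loss_cyc + lam_cls * loss_cls

# Toy usage on random features; one branch per modality, shared semantic space.
sketch_branch, image_branch = SemPCYCBranch(), SemPCYCBranch()
x_sketch = torch.randn(8, 512)
y = torch.randint(0, 100, (8,))
s_sketch, loss_sketch = sketch_branch(x_sketch, y)
```

At retrieval time, both sketches and images would be embedded by their respective branches and matched by nearest-neighbor search in the shared semantic space.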

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11263-020-01350-x) contains supplementary material, which is available to authorized users.

Anjan Dutta
[email protected]

Zeynep Akata
[email protected]

¹ Department of Computer Science, Innovation Centre, University of Exeter, Streatham Campus, Exeter EX4 4RN, UK

² Cluster of Excellence Machine Learning, Tübingen AI Center, University of Tübingen, 72076 Tübingen, Germany

1 Introduction

Matching natural images with free-hand sketches, i.e. sketch-based image retrieval (SBIR) (Yu et al. 2015, 2016a; Liu et al. 2017; Pang et al. 2017; Song et al. 2017b; Shen et al. 2018; Zhang et al. 2018; Chen and Fang 2018; Kiran Yelamarthi et al. 2018; Dutta and Akata 2019; Dey et al. 2019), has received a lot of attention. Since sketches can effectively express the shape, pose and some fine-grained details of the target images, SBIR offers a favorable scenario that is complementary to conventional text–image cross-modal retrieval and the classical content-based image retrieval protocol. This may be because, in some situations, it is difficult to provide a textual description or a suitable example image of the desired query, whereas a user can easily draw a sketch of the desired object on a touch screen.