Learning efficient text-to-image synthesis via interstage cross-sample similarity distillation
RESEARCH PAPER
Sci China Inf Sci, February 2021, Vol. 64, 120102:1–120102:12, https://doi.org/10.1007/s11432-020-2900-x
Special Focus on Deep Learning for Computer Vision
Fengling MAO1,2, Bingpeng MA3*, Hong CHANG2,3, Shiguang SHAN2,3,4 & Xilin CHEN2,3

1 School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China;
2 Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
3 University of Chinese Academy of Sciences, Beijing 100049, China;
4 CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China
Received 21 January 2020/Revised 8 March 2020/Accepted 26 April 2020/Published online 17 November 2020
Abstract For a given text, previous text-to-image synthesis methods commonly utilize a multistage generation model to produce images with high resolution in a coarse-to-fine manner. However, these methods ignore the interaction among stages, and they do not constrain the consistent cross-sample relations of images generated in different stages. These deficiencies result in inefficient generation and discrimination. In this study, we propose an interstage cross-sample similarity distillation model based on a generative adversarial network (GAN) for learning efficient text-to-image synthesis. To strengthen the interaction among stages, we achieve interstage knowledge distillation from the refined stage to the coarse stages with novel interstage cross-sample similarity distillation blocks. To enhance the constraint on the cross-sample relations of the images generated at different stages, we conduct cross-sample similarity distillation among the stages. Extensive experiments on the Oxford-102 and Caltech-UCSD Birds-200-2011 (CUB) datasets show that our model generates visually pleasing images and achieves quantitatively comparable performance with state-of-the-art methods.

Keywords generative adversarial network (GAN), text-to-image synthesis, knowledge distillation
Citation Mao F L, Ma B P, Chang H, et al. Learning efficient text-to-image synthesis via interstage cross-sample similarity distillation. Sci China Inf Sci, 2021, 64(2): 120102, https://doi.org/10.1007/s11432-020-2900-x
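To make the abstract's core idea concrete: interstage cross-sample similarity distillation can be read as matching the pairwise similarity matrix computed over a batch of coarse-stage image features to the matrix computed over the corresponding refined-stage (teacher) features. The following is a minimal NumPy sketch under that reading; the function names, the choice of cosine similarity, and the mean-squared-error objective are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cross_sample_similarity(features):
    # features: (batch, dim), one feature vector per generated image.
    # Normalize rows, then take all pairwise dot products, giving a
    # (batch, batch) cosine-similarity matrix over the samples.
    norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    return norm @ norm.T

def similarity_distillation_loss(coarse_feats, refined_feats):
    # Distillation signal: push the coarse stage's cross-sample
    # similarity matrix toward the refined (teacher) stage's matrix.
    s_coarse = cross_sample_similarity(coarse_feats)
    s_refined = cross_sample_similarity(refined_feats)
    return float(np.mean((s_coarse - s_refined) ** 2))
```

Because the loss compares relations between samples rather than individual features, the coarse stage is only asked to preserve the teacher's similarity structure, not to reproduce its features exactly.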
1 Introduction
Image generation [1–3] has achieved remarkable progress owing to the flourishing development of deep learning. Many applications of image generation [4–11], such as style transfer [5], video generation [6], image-to-image translation [8,9], image inpainting [7], and text-to-image synthesis [12–17], have attracted increasing attention. For a given text, the text-to-image synthesis task aims at producing images that are of high quality and semantically consistent with the given text.

Several methods [12–17] for text-to-image synthesis have been proposed. Reed et al. [12] proposed the classic single-stage generative adversarial network (GAN) framework bas