Automatic Attribute Discovery with Neural Activations
University of North Carolina at Chapel Hill, Chapel Hill, USA [email protected] · NTT Media Intelligence Laboratories, Yokosuka, Japan · Tohoku University, Sendai, Japan
Abstract. How can a machine learn to recognize visual attributes emerging from online communities without a definitive supervised dataset? This paper proposes an automatic approach to discover and analyze visual attributes from a noisy collection of image-text data on the Web. Our approach is based on the relationship between attributes and neural activations in a deep network. We characterize the visual property of an attribute word as a divergence within a weakly-annotated set of images. We show that neural activations are useful for discovering, and learning a classifier for, attributes that agree well with human perception, from noisy real-world Web data. Our empirical study suggests that the layered structure of deep neural networks also gives us insight into the perceptual depth of a given word. Finally, we demonstrate that we can utilize highly-activating neurons to find semantically relevant regions.
Keywords: Concept discovery · Attribute discovery · Saliency detection

1 Introduction
In a social photo sharing service such as Flickr, Pinterest, or Instagram, a new word can emerge at any moment, and even an existing word can change its semantics and transform our vocabulary at any time. For instance, the word wicked (literally meaning evil or morally wrong) has in recent years often been used among teenagers as a synonym of really: "Wow, that game is wicked awesome!". In such a dynamic environment, how can we discover emerging visual concepts and build a visual classifier for each concept without a concrete dataset? It is unrealistic to manually build a high-quality dataset for learning every visual concept in every application domain, even if some of the difficulty can be mitigated by human-in-the-loop approaches [2,3]. All we have are observations, not definitions, provided in the form of co-occurring words and images. In this paper, we consider an automatic approach to learn visual attributes from the open-world vocabulary on the Web. There have been numerous attempts at learning novel concepts from the Web in the past [1,5,6,9,37].

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part IV, LNCS 9908, pp. 252–268, 2016. DOI: 10.1007/978-3-319-46493-0_16
What distinguishes our work from previous efforts is that we try to understand potential attribute words in terms of perception inside neural networks. Deep networks demonstrate outstanding performance in object recognition [11,15,27] and have been successfully applied to a wide range of tasks, including learning from noisy data [30,31] and sentiment analysis [13,34]. In this paper, we focus on the analysis of neural activations to identify the degree to which a given attribute is visually perceptible, namely its visualness, and take advantage of the layered structure of the deep model to determine the perceptual depth of the given word.
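The idea of characterizing visualness as a divergence in neural activations can be illustrated with a minimal sketch. The following is an assumption-laden toy example, not the paper's exact formulation: we treat the mean activation vectors of a tagged image set and a random background set as distributions over units, and score the attribute by their symmetric KL divergence.

```python
import numpy as np

def visualness_score(pos_acts, neg_acts, eps=1e-8):
    """Toy visualness score (an illustrative assumption, not the paper's method).

    pos_acts: (n_pos, d) activations for images weakly tagged with the word.
    neg_acts: (n_neg, d) activations for a random background image set.
    Returns a symmetric KL divergence between the mean activation
    distributions; larger values suggest a more visually perceptible word.
    """
    # Normalize the mean activations into distributions over units.
    p = pos_acts.mean(axis=0) + eps
    q = neg_acts.mean(axis=0) + eps
    p, q = p / p.sum(), q / q.sum()
    # Symmetric KL divergence between the two distributions.
    return float((p * np.log(p / q)).sum() + (q * np.log(q / p)).sum())

# Synthetic activations stand in for a real deep network's features.
rng = np.random.default_rng(0)
background = rng.random((200, 512))   # random Web images
visual = rng.random((200, 512))
visual[:, :32] += 2.0                 # a visual word consistently fires some units
nonvisual = rng.random((200, 512))    # a non-visual word looks like background

s_vis = visualness_score(visual, background)
s_non = visualness_score(nonvisual, background)
print(s_vis > s_non)  # the visual word should score higher
```

In this sketch a word whose tagged images consistently activate a subset of units diverges strongly from the background and scores as "visual", while a word whose images are indistinguishable from random Web images does not.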