Toward Optimized Multimodal Concept Indexing



Abstract. Information retrieval on the (social) web is moving from a purely term-frequency-based approach to an enhanced method that includes conceptual multimodal features on a semantic level. In this paper, we present an approach for semantic-based keyword search and focus especially on its optimization, so that it scales to real-world-sized collections in the social media domain. Furthermore, we present a faceted indexing framework and architecture that relates content to semantic concepts so that it can be indexed and searched semantically. We study the use of textual concepts in a social media domain and observe a significant improvement from using a concept-based solution for keyword searching. We address the problem of time complexity, which is a critical issue for concept-based methods, by focusing on optimization to enable larger and more realistic applications.

Keywords: Semantic indexing · Concept · Social web · Word2Vec

1 Introduction

The past decade has witnessed the massive growth of the social web, the continued impact and expansion of the world wide web, and the increasing importance and synergy of content modalities such as text, images, videos, opinions, and other data. There are currently about 200 active social networks¹ that each attract hundreds of millions of visitors every month. Online visitors spend considerable amounts of time on social network platforms, where they constantly contribute, consume, and implicitly evaluate content. The Facebook community alone, with over 1.2 billion members, shares an impressive 30 billion pieces of content every month [15]. The knowledge contained in these massive data networks is unprecedented and, when harvested, can be made useful for many applications. Although research has started to automatically mine information from these rich sources, the problem of knowledge extraction from multimedia content remains difficult. The main challenges are the heterogeneity of the data, the scalability of the processing methods, and the reliability of their predictions.

¹ http://en.wikipedia.org/wiki/List_of_social_networking_websites

© Springer International Publishing Switzerland 2015. J. Cardoso et al. (Eds.): KEYWORD 2015, LNCS 9398, pp. 141–152, 2015. DOI: 10.1007/978-3-319-27932-9_13


N. Rekabsaz et al.

To address these challenges in the social web domain, recent research exploits semantics in multimodal information retrieval, especially in image retrieval [11]. However, the focus has been on image processing and, so far, the methods used for text similarity in multimodal retrieval are fairly mainstream [22]. In this work, we focus on semantic-based keyword search while specifically considering the optimization of processing time, thus making our approach manageable in an information system. This paper makes two contributions. As the first contribution, we explore the effect of semantic similarity and optimization methods on text-based image retrieval in social media by applying Word2Vec [16] and Random Indexing (RI) [
