Information Abstraction from Crises Related Tweets Using Recurrent Neural Network
Abstract. Social media has become an important open communication medium during crises. The information shared about a crisis in social media is massive, complex, informal and heterogeneous, which makes extracting useful information a difficult task. This paper presents a first step towards an approach for information extraction from large Twitter data. In brief, we propose a Recurrent Neural Network based model for text generation able to produce a single text capturing the general consensus of a large collection of Twitter messages. The generated text captures information about different crises from tens of thousands of tweets, summarized in a text of only 2000 characters.
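As an illustrative sketch only (not the paper's implementation), the character-by-character generation loop described in the abstract can be written as follows. The vocabulary, layer sizes, and randomly initialised weights are all assumptions; in the proposed approach the weights would be learned from the tweet collection, and sampling would then yield a 2000-character summary text.

```python
import numpy as np

# Toy character vocabulary; a real model would build this from the tweets.
vocab = list("abcdefghijklmnopqrstuvwxyz .,!#@")
V, H = len(vocab), 64  # vocabulary size and hidden-state size (illustrative)
rng = np.random.default_rng(0)

# Randomly initialised parameters stand in for weights trained on tweet data.
Wxh = rng.normal(0, 0.01, (H, V))  # input-to-hidden
Whh = rng.normal(0, 0.01, (H, H))  # hidden-to-hidden (recurrence)
Why = rng.normal(0, 0.01, (V, H))  # hidden-to-output
bh, by = np.zeros(H), np.zeros(V)

def sample(seed_idx, length=2000):
    """Generate `length` characters, feeding each sample back as the next input."""
    h = np.zeros(H)
    x = np.zeros(V)
    x[seed_idx] = 1
    out = []
    for _ in range(length):
        h = np.tanh(Wxh @ x + Whh @ h + bh)              # recurrent state update
        logits = Why @ h + by
        p = np.exp(logits - logits.max())                # numerically stable softmax
        p /= p.sum()
        idx = rng.choice(V, p=p)                         # sample next character
        out.append(vocab[idx])
        x = np.zeros(V)
        x[idx] = 1                                       # sampled char becomes next input
    return "".join(out)

text = sample(vocab.index("a"))  # a 2000-character generated text
```

With untrained weights the output is of course gibberish; the sketch only shows the generation mechanism that, after training, produces the consensus text.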
Keywords: Information abstraction · Twitter data · Crisis management · Recurrent neural network

1 Introduction
Social media has become the de facto open crisis communication medium [1]. It plays a pivotal role in most crises today, from getting signs of life from people affected to communicating with responders [2]. However, processing and extracting useful information, and inferring valuable knowledge, from such social media messages is difficult for several reasons. The messages are typically brief, informal, and heterogeneous (a mix of languages, acronyms, and misspellings) with varying quality, and the context of a message may be needed to understand its meaning. Moreover, people also post information about other, mundane events, which introduces additional noise into the data.

The state of the art in information discovery using machine learning centres mostly on supervised learning techniques. These techniques train an algorithm on sets of texts from each topic to learn a predictive function, which in turn is used to classify new texts into a previously learnt topic [3]. A limitation of this approach is the scope of the topics: if a new text about an unforeseen topic, such as a new crisis, is presented to the algorithm, it will wrongly classify it as one of the existing topics. Another challenge is that crises are diverse, and the number of topics discussed in social media during a single crisis is large, dynamic, and changing from crisis to crisis. Moreover, applying a classifier trained on data from previous disasters to the next disaster may not perform well in practice. This can be explained by the fact that the next disaster will typically be more or less unique compared to the previous ones. Accordingly, a loss of accuracy occurs even if the crises have many similarities. Alternatively, unsupervised techniques look for co-occurrences of terms in the text as a metric of similarity [4], or infer the word distribution of a set of words the text contains and use it for document clustering [5]. Moreover, differen

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing Switzerland 2016. All Rights Reserved. M. Ben Lazreg et al. In: L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 441–452, 2016. DOI: 10.1007/978-3-319-44944-9_38
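The term co-occurrence idea mentioned above can be illustrated with a minimal sketch: tweets that share more terms are scored as more similar, so crisis-related tweets group together while unrelated chatter scores low. The example tweets and the Jaccard overlap measure are illustrative choices, not the exact formulation of the cited methods.

```python
from itertools import combinations

# Hypothetical example tweets: two about the same flood, one unrelated.
tweets = [
    "flood water rising in downtown area",
    "downtown flood water levels still rising",
    "concert tickets on sale tonight",
]

def jaccard(a: str, b: str) -> float:
    """Similarity as the overlap of term sets (shared terms / all terms)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

# Pairwise similarity between all tweets.
sims = {(i, j): jaccard(tweets[i], tweets[j])
        for i, j in combinations(range(len(tweets)), 2)}
```

Here the two flood tweets share four of eight distinct terms (similarity 0.5), while the concert tweet shares none with either, so a clustering step on these scores would separate the crisis topic from the noise.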