Interactive Big Data Visualization Model Based on Hot Issues (Online News Articles)

1 Advanced Analytics Engineering Center (AAEC), Shah Alam, Malaysia [email protected]
2 Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia {zaleha,nasiroh}@tmsk.uitm.edu.my
3 Faculty of Computer and Information Technology, Al Madinah International University, Shah Alam, Malaysia [email protected]

Abstract. Big data is a popular term used to describe a massive volume of data, which is a key component of the current information age. Such data is complex and difficult to understand, and therefore may not be useful to users in that state. News extraction, aggregation, clustering, news topic detection and tracking, and social network analysis are among the approaches that have been used to manage the massive data in social media. Current visualization tools are difficult to adapt to the constant growth of big data, specifically in online news articles. Therefore, this paper proposes the Interactive Big Data Visualization Model Based on Hot Issues (IBDVM). IBDVM can be used to visualize hot issues in daily news articles. It is based on textual data clusters in textual databases that improve the performance, accuracy, and quality of big data visualization. This model is useful for online news readers, news agencies, editors, and researchers who work in textual document domains.

Keywords: Big data · Visual analytics · Interactive visualization · Clustering · Information extraction

1 Introduction

The pervasive use of computers in daily life has led to the creation of big data. Big data refers to a vast collection of data stored together, and the term is used to describe the exponential growth of structured and unstructured data. Big data has three characteristics: volume, velocity, and variety [1, 2]. Volume refers to the data size, velocity represents the data speed, and variety denotes the data type (e.g., text, video, or music). Such a dataset cannot be analyzed by the relational database tools, statistical analysis tools, and visualization aids that have become popular in the past 20 years, during which digitized sensor data began to grow rapidly. As such, computer scientists, economists, political scientists, bio-informaticists, and other scholars are seeking efficient ways to access massive quantities of information and represent these data in meaningful ways.

© Springer Nature Singapore Pte Ltd. 2016 M.W. Berry et al. (Eds.): SCDS 2016, CCIS 652, pp. 89–99, 2016. DOI: 10.1007/978-981-10-2777-2_8

W.M.S. Yafooz et al.

Handling big data involves four main issues: storage, management, processing, and visualization. For storage, big data requires a huge capacity. Big data management, which encompasses data organization, is the most important among these issues. A relational database is typically used for small, structured data. Big data is difficult to process using traditional database techniques because the process of loading the data into a traditional relational database is too time-con