Text classification algorithms for mining unstructured data: a SWOT analysis


ORIGINAL RESEARCH

Akshi Kumar1 • Vikrant Dabas2 • Parul Hooda1

Received: 14 March 2017 / Accepted: 21 December 2017
© Bharati Vidyapeeth’s Institute of Computer Applications and Management 2018

Abstract It has become increasingly crucial to facilitate knowledge extraction for decision support and to deliver targeted information to analysts across a wide range of application domains. Interestingly, “big data”, which is estimated to be 90% unstructured, makes it even more difficult to tap and analyze information with traditional tools. Text mining entails defining a process that transforms this unstructured data into a structured form from which knowledge can be discovered. The use of classification algorithms to intelligently mine text has been studied extensively in the literature. This study surveys the text classification algorithms employed in mining unstructured data and reports a conclusive analysis of the trend in their use in terms of their respective strengths, weaknesses, opportunities and threats (SWOT). The scope of these algorithms is then explored with respect to sentiment analysis, a typical text classification task. A mapping that identifies the unexplored social media technologies and the extent to which these algorithms are used within each social medium is offered to give insight into the amount of work done in machine learning based sentiment analysis on social media.

1 Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India

2 Department of Computer Science, College of Computing and Informatics, University of North Carolina, Charlotte, USA

Keywords Web mining • Text mining • Text classification • SWOT • Sentiment analysis • Social media

1 Introduction

Since its inception, the World Wide Web (WWW or Web) has been recognized as the largest transformable-information construct. It is characterized as an interactive, hyperlinked, heterogeneous, distributed and dynamic channel for disseminating information. By all measures the Web is vast, and its infusion into day-to-day e-activities has compelled the transformation and expansion of the existing technologies of the information-based era to establish a novel knowledge-based era. It has become increasingly crucial to provide users with technologies for distilling this untapped source of information. Recent literature reports that approximately 80% of the information available on the Web is in unstructured formats, such as email, news articles and Web pages [1]. Interestingly, the term “big data” refers to extremely large datasets that are complex and hard to analyze with conventional tools. Though big data can include both structured and unstructured data, IDC [2] estimates t
