Text classification algorithms for mining unstructured data: a SWOT analysis


ORIGINAL RESEARCH

Akshi Kumar¹ • Vikrant Dabas² • Parul Hooda¹

Received: 14 March 2017 / Accepted: 21 December 2017 / © Bharati Vidyapeeth’s Institute of Computer Applications and Management 2018

Abstract It has become increasingly crucial to facilitate knowledge extraction for decision support and to deliver targeted information to analysts across a wide range of application domains. The buzzword “big data”, estimated to be 90% unstructured, makes it even harder to tap and analyze information with traditional tools. Text mining defines a process that transforms this unstructured data into a structured form in order to discover knowledge. The use of classification algorithms to mine text intelligently has been studied extensively in the literature. This study primarily surveys the text classification algorithms employed in mining unstructured data and reports a conclusive analysis of the trends in their use in terms of their respective strengths, weaknesses, opportunities and threats (SWOT). The scope of these algorithms is then explored with respect to the application area of sentiment analysis, a typical text classification task. A mapping is proposed that identifies the unexplored social media technologies and the extent to which these algorithms have been used within each social medium, giving insight into the amount of work done on machine learning based sentiment analysis in social media.

Corresponding author: Akshi Kumar

¹ Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India

² Department of Computer Science, College of Computing and Informatics, University of North Carolina, Charlotte, USA

Keywords Web mining · Text mining · Text classification · SWOT · Sentiment analysis · Social media
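The abstract describes text mining as transforming unstructured text into a structured representation, and it treats sentiment analysis as a typical text classification task. The sketch below illustrates both ideas under stated assumptions: raw sentences are turned into bag-of-words term counts (the structured form), and a multinomial Naive Bayes classifier with add-one (Laplace) smoothing assigns a sentiment label. Naive Bayes is chosen here only as one common, easy-to-read text classifier; it is not presented as the authors' method. The helper names (`tokenize`, `bag_of_words`, `NaiveBayesClassifier`) and the four toy training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical illustration: bag-of-words + multinomial Naive Bayes for sentiment.


def tokenize(text):
    """Turn raw, unstructured text into a list of lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Structured representation of a document: a term -> count mapping."""
    return Counter(tokenize(text))


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_doc_counts = Counter(labels)   # label -> number of documents
        self.term_counts = defaultdict(Counter)   # label -> term -> count
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            bow = bag_of_words(doc)
            self.term_counts[label].update(bow)
            self.vocabulary.update(bow)
        self.total_docs = len(documents)
        return self

    def log_posterior(self, text, label):
        # Unnormalized log P(label | text) = log P(label) + sum of log P(term | label)
        score = math.log(self.class_doc_counts[label] / self.total_docs)
        counts = self.term_counts[label]
        denominator = sum(counts.values()) + len(self.vocabulary)
        for term, freq in bag_of_words(text).items():
            if term not in self.vocabulary:
                continue  # ignore words never seen during training
            score += freq * math.log((counts[term] + 1) / denominator)
        return score

    def predict(self, text):
        # Pick the label with the highest (unnormalized) log posterior
        return max(self.class_doc_counts, key=lambda label: self.log_posterior(text, label))


# Toy, made-up training data for demonstration only
train_texts = [
    "I love this phone, the camera is great",
    "great battery and a lovely screen",
    "terrible service, I hate waiting",
    "the update is awful and the app is slow",
]
train_labels = ["positive", "positive", "negative", "negative"]

clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("the camera is great"))     # -> positive
print(clf.predict("awful and slow service"))  # -> negative
```

Working in log space avoids numerical underflow when many term probabilities are multiplied, and the add-one smoothing keeps a term that was never seen with a given label from driving that label's probability to zero.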

1 Introduction

Since its inception, the World Wide Web (WWW or Web) has been recognized as the largest transformable-information construct. It is characterized as an interactive, hyperlinked, heterogeneous, distributed and dynamic channel for disseminating information. By any measure, the Web is vast, and its infusion into day-to-day e-activities has compelled the existing technologies of the information-based era to be transformed and expanded, establishing a new knowledge-based era. It has become increasingly crucial to provide users with technologies for distilling this untapped source of information. Recent literature reports that approximately 80% of the information available on the Web is in unstructured formats, such as email, news articles and Web pages [1]. The buzzword “big data” refers to extremely large datasets that are complex and hard to analyze with conventional tools. Though big data can include both structured and unstructured data, IDC [2] estimates t