Text classification algorithms for mining unstructured data: a SWOT analysis
ORIGINAL RESEARCH
Akshi Kumar¹ • Vikrant Dabas² • Parul Hooda¹
Received: 14 March 2017 / Accepted: 21 December 2017
© Bharati Vidyapeeth’s Institute of Computer Applications and Management 2018
Abstract It has become increasingly important to facilitate knowledge extraction for decision support and to deliver targeted information to analysts across a wide range of application domains. “Big data”, which is estimated to be 90% unstructured, is difficult to tap and analyze with traditional tools. Text mining defines a process that transforms this unstructured data into structured data in order to discover knowledge. The use of classification algorithms to intelligently mine text has been studied extensively in the literature. This study surveys the text classification algorithms employed in mining unstructured data and reports an analysis of the trend in their use in terms of their respective strengths, weaknesses, opportunities and threats (SWOT). The scope of these algorithms is then explored with respect to sentiment analysis, a typical text classification task. A mapping that identifies the unexplored social media technologies, and the extent to which these algorithms are used within each social medium, is offered to give insight into the amount of work done on machine learning based sentiment analysis on social media.

¹ Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India
² Department of Computer Science, College of Computing and Informatics, University of North Carolina, Charlotte, USA
Keywords Web mining · Text mining · Text classification · SWOT · Sentiment analysis · Social media
1 Introduction

Since its inception, the World Wide Web (WWW or Web) has been recognized as the largest transformable-information construct. It is characterized as an interactive, hyperlinked, heterogeneous, distributed and dynamic channel for disseminating information. By all measures the Web is vast, and its infusion into day-to-day e-activities has compelled the transformation and expansion of the technologies of the information-based era to establish a novel knowledge-based era. It has become increasingly important to provide users with technologies for distilling this untapped source of information. Recent literature reports that approximately 80% of the information available on the Web is in unstructured formats, such as email, news articles and Web pages [1]. The term “big data” refers to extremely large datasets that are complex and hard to analyze with conventional tools. Though big data can include both structured and unstructured data, IDC [2] estimates t