An integrated semi-automated framework for domain-based polarity words extraction from an unannotated non-English corpus


Mohammed Kaity¹ · Vimala Balakrishnan¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Building sentiment analysis resources is a fundamental step before developing any sentiment analysis model, and sentiment lexicons are among the most critical of these resources. However, many non-English languages suffer from a severe shortage of such resources and lexicons. This study proposes an integrated framework for extracting domain-based polarity words from a massive unannotated non-English corpus. The framework consists of three layers: lexicon-based, corpus-based and human-based. The first two layers automatically recognize and extract new polarity words from a massive unannotated corpus using initial seed lexicons. A key advantage of the proposed framework is that it needs only an initial seed lexicon and an unannotated corpus to start the extraction process. Therefore, the framework is semi-automated due to the use of seed lexicons. Experiments on three languages indicate that the proposed framework outperformed existing lexicons, achieving F-scores of 77.8%, 83.8% and 68.6% for the Arabic, French and Malay lexicons, respectively.

Keywords Multilingual sentiment analysis · Sentiment lexicon · Polarity words · Social media analysis · Unannotated corpus

* Vimala Balakrishnan [email protected]

¹ Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

1 Introduction

Over the past two decades, sentiment analysis of social media data has received increasing interest. The primary aim of sentiment analysis is to extract the opinions embedded in given data, such as opinions on products, services, news, and social and political events [1, 2]. Many techniques have been developed to classify opinions, one of which is the sentiment lexicon-based approach. A sentiment lexicon is described as a list of opinion or opinionated words and phrases with their sentiment categories or orientations [3–5]. The sentiment orientations indicate the polarity and strength of the words and phrases in the sentiment lexicon (e.g. positive, negative, 1, −1). Sentiment lexicons can be employed for lexicon-based classification to calculate text polarity (i.e. positive, negative, or neutral) by collecting the orientation values of the polarity or sentiment words in context [3, 6]. Furthermore, sentiment lexicons have been shown to be extremely helpful when used to extract features for machine learning algorithms [5, 7].

Although several researchers have studied the problem of building and expanding sentiment lexicons, many limitations remain unresolved. For example, the majority of those studies focused on English-based sentiment lexicons, while for many other languages, such as Arabic, French and Malay, these lexicons are either limited or unavailable. More oft
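As a minimal sketch of the lexicon-based classification described in the introduction, the snippet below sums the orientation values of the polarity words found in a text and maps the total to positive, negative, or neutral. The lexicon entries, the `classify_polarity` name, and the whitespace tokenization are illustrative assumptions, not the authors' implementation or resources.

```python
from typing import Dict, List

# Hypothetical toy seed lexicon: word -> orientation (+1 positive, -1 negative).
# These entries are illustrative only and do not come from the paper.
TOY_LEXICON: Dict[str, int] = {
    "good": 1,
    "excellent": 1,
    "bad": -1,
    "terrible": -1,
}


def classify_polarity(tokens: List[str], lexicon: Dict[str, int]) -> str:
    """Sum the orientation values of known polarity words and label the text."""
    # Words missing from the lexicon contribute 0 to the score.
    score = sum(lexicon.get(token.lower(), 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Assumed tokenization: a plain whitespace split.
    print(classify_polarity("The service was excellent".split(), TOY_LEXICON))
```

A plain sum treats every polarity word equally; a lexicon that also encodes strength would use graded values in place of +1 and −1.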
