A novel semi supervised approach for text classification

PDF / 701,162 Bytes
11 Pages / 595.276 x 790.866 pts Page_size
40 Downloads / 245 Views

ORIGINAL RESEARCH

A novel semi supervised approach for text classification Debaditya Barman1 • Nirmalya Chowdhury2

Received: 25 December 2017 / Accepted: 2 April 2018 Ó Bharati Vidyapeeth’s Institute of Computer Applications and Management 2018

Abstract Text categorization, also known as text classification is a supervised classification problem. It aims to assign a predefined class label or group to a new or unknown text document. Most of the time we need a collection of large data from each class to train the classifier. It may be noted that, it is very hard or expensive to collect labelled text data. In most cases we assign the label manually which is neither cost effective nor efficient. In this paper, we have introduced a semi-supervised classification approach where the learner needs very small amount of labelled data with a large amount of unlabeled data to assign a class label to a new or unknown text document. The proposed method uses Kohonen self organizing map (SOM) for labelling the unlabeled data and three classifiers namely support vector machine (SVM), Naı¨ve Bayes (NB), and decision tree (DT): classification and regression tree (CART) for observing the accuracy of classification. The experimental results obtained show the effectiveness of our proposed method. Keywords Text categorization Semi supervised learning Kohonen self organizing map Naı¨ve Bayes Decision tree Support vector machine

& Nirmalya Chowdhury [email protected] Debaditya Barman [email protected] 1

Department of Computer and System Sciences, VisvaBharati, Santiniketan 731235, India

2

Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India

1 Introduction Advancement of social networking, blogging, micro-blogging has provided the opportunity to develop various applications in natural language processing (NLP). Text classification is a very popular research area of NLP. Over the last decade it grew exponentially [1–6] due to easy accessibility of the digital text documents. Text classification implies automated assignment of textual data to predefined classes. Sometimes either the number of such classes is not known or the class labels are not known. In this case initially some clustering technique is employed to obtain the appropriate grouping of a given set of text documents, then such groups are labelled based on some criteria or heuristic. Several machine learning algorithms had been applied successfully to categorize text documents based on their content. Perhaps Naive Bayes (NB) algorithm is the frequently used classifier to solve text categorization problem. Researchers used two different generative models: multivariate Bernoulli [7–11] and multinomial [12–16] event model while designing the NB classifier. Algorithms based on artificial neural network (ANN) [17–19], decision tree (DT) [20–23], k-nearest neighbor (KNN) [24–27], and SVM [28, 29] had been employed frequently to solve the text categorization problem. Ruiz and Srinivasan [5] proposed a method based on princi

Data Loading...

A novel semi supervised approach for text classification

Recommend Documents

Semi-Supervised Classification

HydraMix-Net: A Deep Multi-task Semi-supervised Learning Approach for Cell Detection and Classification

Semi-supervised Classification of Chest Radiographs

Joint Semi-supervised Similarity Learning for Linear Classification

Joint consensus and diversity for multi-view semi-supervised classification

Expanding Training Set for Graph-Based Semi-supervised Classification

MixLab: An Informative Semi-supervised Method for Multi-label Classification

Semi-Supervised Consensus Clustering for ECG Pathology Classification

Hierarchical graph attention networks for semi-supervised node classification

Semi-supervised Weighted Ternary Decision Structure for Multi-category Classification

Semi-Supervised

Fine-grained Multi-label Sexism Classification Using Semi-supervised Learning