A weighted ensemble-based active learning model to label microarray data


ORIGINAL ARTICLE

Rajonya De 1 & Anuran Chakraborty 1 & Agneet Chatterjee 1 & Ram Sarkar 1

Received: 12 November 2019 / Accepted: 26 July 2020 © International Federation for Medical and Biological Engineering 2020

Abstract

Classification of cancerous genes from microarray data is an important research area in bioinformatics. Large amounts of microarray data are available, but labeling them is very costly. This paper proposes an active learning model, a semi-supervised classification approach, to label microarray data so that predictions can be made even with a smaller amount of labeled data. Initially, a pool of unlabeled instances is given, from which some instances are randomly chosen for labeling. The subsequent selection of instances to be labeled from the unlabeled pool is determined by selection algorithms. The proposed method follows an ensemble approach: it combines the decisions of three classifiers to arrive at a consensus that provides a more accurate prediction of the class label, while ensuring that each individual classifier learns in an uncorrelated manner. Our method combines the heuristic techniques used by an active learning algorithm to choose training samples with the multiple-learner paradigm of an ensemble, optimizing the search space by choosing efficiently from an already sparse learning pool. On evaluating the proposed method on 10 microarray datasets, we achieve performance comparable with state-of-the-art methods. The code and datasets are given at https://github.com/anuran-Chakraborty/Active-learning.

Keywords Active learning · Classifier ensemble · Gene expression · Cancer classification
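The pool-based loop outlined in the abstract (random initial labeling, then repeated selection from the unlabeled pool guided by an ensemble of three classifiers) can be illustrated with a minimal sketch. It is not the authors' implementation: the three base learners (nearest centroid, 1-NN, 3-NN), the fixed `weights`, the smallest-weighted-vote-margin selection rule, the toy data, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fit_centroid(X, y):
    """Nearest-centroid classifier (assumed base learner); returns predict(x)."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    cents = {c: [sum(col) / len(rows) for col in zip(*rows)]
             for c, rows in groups.items()}
    return lambda x: min(cents, key=lambda c: dist2(x, cents[c]))

def make_knn(k):
    """k-nearest-neighbour fitter (assumed base learner); fit returns predict(x)."""
    def fit(X, y):
        def predict(x):
            near = sorted(range(len(X)), key=lambda i: dist2(x, X[i]))[:k]
            return Counter(y[i] for i in near).most_common(1)[0][0]
        return predict
    return fit

def weighted_votes(models, weights, x):
    """Sum each classifier's weight onto the label it predicts for x."""
    votes = Counter()
    for model, w in zip(models, weights):
        votes[model(x)] += w
    return votes

def margin(votes):
    """Gap between the two largest vote totals; small means disagreement."""
    ranked = sorted(votes.values(), reverse=True)
    return ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)

def active_learn(X, oracle, n_init, n_queries, weights=(1.0, 1.0, 1.0), seed=0):
    """Pool-based active learning with a weighted three-classifier ensemble.

    `oracle[i]` stands in for the costly human label of instance i.
    The weights are hypothetical inputs, not the paper's weighting scheme.
    """
    rng = random.Random(seed)
    labeled = rng.sample(range(len(X)), n_init)      # random initial labels
    pool = [i for i in range(len(X)) if i not in labeled]
    fitters = [fit_centroid, make_knn(1), make_knn(3)]

    def train():
        Xl = [X[i] for i in labeled]
        yl = [oracle[i] for i in labeled]
        return [fit(Xl, yl) for fit in fitters]

    for _ in range(n_queries):
        if not pool:
            break
        models = train()
        # Query the pool instance on which the ensemble is least decided.
        pick = min(pool, key=lambda i: margin(weighted_votes(models, weights, X[i])))
        pool.remove(pick)
        labeled.append(pick)

    models = train()
    predict = lambda x: weighted_votes(models, weights, x).most_common(1)[0][0]
    return predict, labeled

# Toy data (hypothetical): two well-separated clusters, classes interleaved.
data_rng = random.Random(1)
X = [[data_rng.gauss(10.0 * (i % 2), 0.5) for _ in range(5)] for i in range(40)]
oracle = [i % 2 for i in range(40)]
predict, labeled = active_learn(X, oracle, n_init=4, n_queries=6)
accuracy = sum(predict(x) == t for x, t in zip(X, oracle)) / len(X)
```

In this sketch only ten of the forty toy instances are ever shown to the oracle; the ensemble is retrained after each query, so the labeled set grows by one instance per iteration.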

* Anuran Chakraborty [email protected]

1 Computer Science and Engineering, Jadavpur University, 188, Raja Subodh Chandra Mallick Road, Kolkata 700032, India

1 Introduction

Machine learning algorithms can be broadly classified into two categories: supervised learning algorithms and unsupervised learning algorithms. Supervised learning algorithms learn to classify samples from their class labels, so the success of any supervised learning model depends on the availability of a large amount of labeled data. In a traditional supervised learning algorithm, the system receives all the training samples at once and develops a model from them. The motivation for developing an active learning framework stems from this very ground. In the modern world, cheap, unlabeled data are plentiful, but obtaining their labels is generally costly. An active learner, unlike its passive counterpart, not only trains its model but also interacts with the unlabeled data to choose the most useful samples, those which make the model more robust in terms of classification ability, and updates the model with every such interaction. An active learner does not work with a fixed-size training set; the set grows at every iteration of the algorithm. It is assumed that this freedom reduces the learner’s need for large quantitie