An instance and variable selection approach in pixel-based classification for automatic white blood cells segmentation



SHORT PAPER

Nesma Settouti1 · Meryem Saidi1 · Mohammed El Amine Bechar1 · Mostafa El Habib Daho1 · Mohamed Amine Chikh1
Received: 7 February 2019 / Accepted: 4 February 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Instance and variable selection consists of identifying a subset of instances and variables so that the learning process uses only this subset, with better performance and lower cost. Given the huge amount of data available in many fields, data reduction is considered an NP-hard problem. In this paper, we present a simultaneous instance and variable selection approach based on the Random Forest-RI ensemble method, with the aim of discarding noisy and useless information from the original data set. We propose a selection principle based on two concepts: the ensemble margin and the variable importance measure of Random Forest-RI. Experiments were conducted on cytological images for the automatic segmentation and recognition of white blood cells (WBC), covering both nucleus and cytoplasm. Moreover, to explore the performance of the proposed approach, experiments were carried out on standardized datasets from the UCI and ASU repositories, and the results obtained for instance and variable selection with the Random Forest classifier are very encouraging.

Keywords  Instance and variable selection · Random Forest · Data reduction · Small target detection · Automatic segmentation · Pixel-based classification · White blood cells
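To make the notion of ensemble margin concrete, the following minimal sketch may help. It assumes the common supervised definition, in which the margin of an instance is the number of votes for its true class minus the largest number of votes for any other class, divided by the number of trees; the exact variant used in this paper is not stated in this section. The function name `ensemble_margins`, the toy votes and the `background` class are illustrative assumptions.

```python
from collections import Counter


def ensemble_margins(votes, labels):
    """Return one margin in [-1, 1] per instance.

    votes[t][i] is the class predicted by tree t for instance i.
    labels[i] is the true class of instance i.
    Assumed definition: (votes for true class - max votes for another class) / n_trees.
    """
    n_trees = len(votes)
    margins = []
    for i, true_class in enumerate(labels):
        # Count how many trees voted for each class on instance i.
        counts = Counter(tree_votes[i] for tree_votes in votes)
        v_true = counts.get(true_class, 0)
        # Largest vote count among the other classes (0 if there is none).
        v_other = max(
            (c for cls, c in counts.items() if cls != true_class), default=0
        )
        margins.append((v_true - v_other) / n_trees)
    return margins


# Toy example: 5 hypothetical trees voting on 3 pixels.
votes = [
    ["nucleus", "cytoplasm", "background"],
    ["nucleus", "cytoplasm", "nucleus"],
    ["nucleus", "nucleus", "nucleus"],
    ["nucleus", "cytoplasm", "background"],
    ["nucleus", "nucleus", "cytoplasm"],
]
labels = ["nucleus", "cytoplasm", "background"]
margins = ensemble_margins(votes, labels)
print(margins)
```

Under this definition, a margin near 1 indicates an instance that the ensemble classifies confidently and correctly, while a margin near 0 or below indicates an instance on which the trees disagree or err. How such values would be thresholded for instance selection is a design choice not covered by this sketch.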

* Nesma Settouti, nesma.settouti@univ-tlemcen.dz
1 Biomedical Engineering Laboratory GBM, University of Tlemcen, Tlemcen, Algeria

1 Introduction
Nowadays, the huge amount of data available in many fields makes the search for an optimal subset of a large dataset an NP-hard problem. The data reduction process aims to clean the original dataset by removing redundant, missing and useless instances and/or features. A classifier built on the reduced dataset should be as good, or nearly as good, as one built from the whole dataset. In the context of medical image segmentation, the aim is to build an algorithm that takes an image as its input and outputs the segmentation of the region of interest (ROI). Small target detection problems have held the attention of many researchers [19, 27, 31, 32, 61]. Generally, image segmentation has been performed with several techniques, such as thresholding, edge-based segmentation, region-based segmentation or segmentation based on pixel-based classification. However, segmentation based on pixel-based classification is time-consuming because of the high number of instances and variables (features) that represent the characteristics of each pixel. It is quite clear that we do not need all the variables to classify all pixels in an image. Specifically, certain relevant features can be conveniently summarized by looking at the relative positioning, color or texture of various ROIs. However, in image classification, many other potentia