Mining Outliers from Medical Datasets Using Neighbourhood Rough Set and Data Classification with Neural Network


Abstract

In this paper, a neighbourhood rough set is modified and applied as a data pre-processing method to select samples from a data set before training a radial basis function neural network (RBFN). Data samples that are not selected for training are considered outliers. Four medical datasets from a well-known repository were used, and the proposed model was compared with RBFN in terms of the number of training samples and classification accuracy. The results are encouraging: the classification accuracy of the proposed model improves after outlier removal. On one medical dataset, the results are also compared with those of other classification models, against which the proposed model is competitive, achieving high classification accuracy.

Keywords: Neural network ⋅ Rough set ⋅ Outlier ⋅ Medical data
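The abstract describes NRS-based sample selection only at a high level, and this excerpt does not spell out how the authors modified NRS. The following is a minimal Python sketch under stated assumptions: it builds a Euclidean δ-neighbourhood in the spirit of Hu et al. [10] (all samples within distance `delta` after min-max normalisation) and keeps a sample when at least a fraction `beta` of its neighbourhood shares its class label, a variable-precision variant of the NRS lower approximation. With `beta=1.0` it reduces to the classic NRS positive region. The function names, the toy data and the values of `delta` and `beta` are hypothetical and chosen for illustration; this is not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def minmax_normalise(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every feature to [0, 1] so that a single delta suits all features."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # Constant features carry no distance information, so map them to 0.0.
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in X
    ]


def delta_neighbourhood(X: Sequence[Sequence[float]], i: int, delta: float) -> List[int]:
    """Indices of samples within Euclidean distance delta of X[i] (X[i] included)."""
    return [j for j, xj in enumerate(X) if math.dist(X[i], xj) <= delta]


def select_samples(
    X: Sequence[Sequence[float]],
    y: Sequence[str],
    delta: float = 0.2,
    beta: float = 0.5,
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (selected, outliers).

    Assumption (hypothetical criterion, not the paper's exact modification):
    a sample is selected when at least a fraction `beta` of its
    delta-neighbourhood shares its class label. beta=1.0 gives the classic
    NRS lower approximation, i.e. membership of the positive region.
    """
    Xn = minmax_normalise(X)
    selected: List[int] = []
    outliers: List[int] = []
    for i, label in enumerate(y):
        nbrs = delta_neighbourhood(Xn, i, delta)
        agreement = sum(1 for j in nbrs if y[j] == label) / len(nbrs)
        (selected if agreement >= beta else outliers).append(i)
    return selected, outliers


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters, where sample 6
    # is labelled "B" but lies inside the "A" cluster.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [1.0, 1.0], [0.9, 1.0], [1.0, 0.9],
         [0.05, 0.05]]
    y = ["A", "A", "A", "B", "B", "B", "B"]
    selected, outliers = select_samples(X, y, delta=0.2, beta=0.5)
    print(selected, outliers)  # [0, 1, 2, 3, 4, 5] [6]
```

In the pipeline described in the abstract, only the `selected` indices would be passed on to RBFN training, and the `outliers` indices would be the samples excluded. With `beta=1.0` the same sketch would also discard genuine class members lying next to the mislabelled point (samples 0 to 2 here), which is why a relaxed threshold is assumed in this illustration.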

P.Y. Goh (✉) ⋅ S.C. Tan ⋅ W.P. Cheah
Multimedia University, Jln. Ayer Keroh Lama, 75450 Melaka, Malaysia

© Springer Nature Singapore Pte Ltd. 2017
A. Bhatti et al. (eds.), Emerging Trends in Neuro Engineering and Neural Computation, Series in BioEngineering, DOI 10.1007/978-981-10-3957-7_12

1 Background of the Study

Over the past decades, many machine-learning and data-mining techniques have been introduced to support decision-making in diagnostic and prognostic tasks. However, many open questions still await more effective solutions, for instance, concerning the predictive capability of machine learning and its power for knowledge discovery in medical decision support [1–3]. One of the challenging tasks in mining medical data is finding outliers [2]. Outliers may carry information that deviates so far from the norm of the data that it raises doubt as to whether they were generated by a different mechanism [4]. Outliers can affect the predictive capability of a data-mining model. Pre-processing medical data with an outlier detection method may therefore help to reduce the negative impact of outliers on the model's predictive capability.

Rough set theory (RST) is efficient for finding hidden patterns in data [5, 6]. It has been actively applied in various domains, such as medicine, finance and image processing [5]. These hidden patterns could be outliers; applications of RST to outlier detection can be found in [6–9]. A limitation of RST is that it is designed to deal with categorical data only. Hu et al. [10] proposed a generalized model, the neighbourhood rough set (NRS), which extends RST to numerical data. In [10], NRS was proposed to identify a subset of features effective for data classification. In this paper, NRS is modified to determine outliers in a data set. It is assumed that pre-processing input samples with the proposed outlier detection method could make the classification performance of a predictive model, such as a neural network, better and more reliable. Neural networks are well known and popular as one of the classificatio

