Mining Outliers from Medical Datasets Using Neighbourhood Rough Set and Data Classification with Neural Network
Abstract In this paper, a neighbourhood rough set is modified and applied as a data pre-processing method to select samples from a data set before training a radial basis function neural network (RBFN). Data samples that are not selected for training are considered outliers. Four medical datasets from a well-known repository were used, and the proposed model was compared with a plain RBFN in terms of number of training samples and classification accuracy. The results are encouraging: the classification accuracy of the proposed model improves after outlier removal. Results are also compared with other classification models on a medical dataset, where the proposed model is competitive in classification accuracy.
Keywords Neural network ⋅ Rough set ⋅ Outlier ⋅ Medical data
P.Y. Goh (✉) ⋅ S.C. Tan ⋅ W.P. Cheah
Multimedia University, Jln. Ayer Keroh Lama, 75450 Melaka, Malaysia
© Springer Nature Singapore Pte Ltd. 2017
A. Bhatti et al. (eds.), Emerging Trends in Neuro Engineering and Neural Computation, Series in BioEngineering, DOI 10.1007/978-981-10-3957-7_12

1 Background of the Study
For several decades, many machine-learning and data-mining technologies have been introduced to support decisions in diagnostic and prognostic tasks. However, many open questions remain that call for more effective solutions, for instance regarding the prediction capability of machine learning and its power in knowledge discovery for decision support in the medical domain [1–3]. One of the challenging tasks in mining medical data is to find outliers [2]. Outliers may carry information that deviates from the norm of the data, which raises the suspicion that they were generated by a different mechanism [4]. Outliers could affect the prediction capability of a data-mining model. As such, pre-processing medical data with an outlier detection method may help to reduce the negative impact of outliers on the model’s prediction capability.

Rough set theory (RST) is efficient for finding hidden patterns in data [5, 6]. It has been actively applied in various domains, such as medicine, finance and image processing [5]. These hidden patterns could be outliers. Applications of RST to outlier detection can be seen in [6–9]. A limitation of RST is that it is designed to deal with categorical data only. A generalized model, the neighbourhood rough set (NRS), was proposed by Hu et al. [10], who extended RST to process numerical data. In [10], NRS was used to identify a subset of features that is effective for data classification. In this paper, NRS is modified to determine outliers in a data set. It is assumed that by pre-processing input samples with the proposed outlier detection method, the classification performance of a predictive model such as a neural network could be better and more reliable.

Neural network is well known and popular as one of the classificatio
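
As a rough illustration of the neighbourhood idea, the following Python sketch marks a sample as consistent when every sample within a distance delta of it carries the same class label, which corresponds to the usual neighbourhood lower approximation. It is only a sketch under stated assumptions: the function name, the Euclidean metric, the value of delta and the toy data are all hypothetical, and the modified NRS criterion proposed in this paper is not reproduced here.

```python
import numpy as np

def neighbourhood_consistent_mask(X, y, delta):
    """Hypothetical helper: True where all samples within Euclidean
    distance `delta` of a sample share that sample's class label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances, shape (n, n)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbours = dist <= delta            # each sample is its own neighbour
    same_class = y[:, None] == y[None, :]
    # A sample is consistent if no neighbour has a different class
    return ~(neighbours & ~same_class).any(axis=1)

# Toy data, made up for illustration only
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [1.0, 1.0], [1.1, 1.0], [0.05, 0.05]])
y = np.array([0, 0, 0, 1, 1, 1])

mask = neighbourhood_consistent_mask(X, y, delta=0.2)
selected = np.flatnonzero(mask)    # candidates for training
flagged = np.flatnonzero(~mask)    # boundary-region samples
print(selected, flagged)
```

On this toy data the class-1 point near the origin and the three class-0 points it overlaps all fall in the boundary region, so only the two well-separated class-1 samples are kept. How a modified NRS would decide among such boundary samples is left open in this sketch.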