An Empirical Analysis of Unsupervised Learning Approach on Medical Databases



Abstract
The early prediction of disease from diverse clinical features is one of the critical tasks for health care practitioners. Clinical databases are generally integrated from various sources such as electronic health records, administrative health records, and monitoring facilities, including CT scans and ultrasound images. Discovering knowledge from such large and complex clinical databases for future medical diagnosis therefore requires considerable effort from both clinical and data mining specialists. Because the data are voluminous and complex, the discovery of patterns should be an automated process, and a sound methodological technique is needed to uncover hidden and novel information. In this article we focus on a diabetes mellitus II dataset with the aim of discovering clusters of varying shape and size. In the current approach, the dataset is first preprocessed to handle missing, noisy, and inconsistent values. The preprocessed data are then clustered using the Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, and the key parameters of DBSCAN are varied so that the results can be compared for efficient discovery of clusters of varying shape and size. The study thus compares the clusters of varying shape and size found in diabetes mellitus II datasets, with a view to future medical diagnosis of the disease.
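The abstract describes two steps without giving details. The first is preprocessing to handle missing, noisy, and inconsistent values. The second is DBSCAN clustering with its key parameters varied. The Python sketch below illustrates both, under assumptions that are ours and not the authors':

- Missing values are filled with the column median, and each feature is min-max scaled to [0, 1].
- Distances are Euclidean.
- The two DBSCAN parameters are named `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size, counting the point itself, for a core point).
- `records` is a hypothetical two-feature toy set, labelled plasma glucose and BMI purely for illustration. It is not the study's data.

DBSCAN is written out by hand rather than taken from a library, so that the effect of each parameter stays visible.

```python
import math
from statistics import median

NOISE = -1  # label for points that belong to no cluster


def impute_and_scale(rows):
    """Fill missing values (None) with the column median, then min-max scale each column to [0, 1].

    Assumed preprocessing for illustration only; the paper does not specify its method.
    """
    n_cols = len(rows[0])
    scaled_cols = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        med = median(present)
        col = [med if r[j] is None else r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([0.0 if span == 0 else (v - lo) / span for v in col])
    # Transpose the scaled columns back into one list per record.
    return [list(values) for values in zip(*scaled_cols)]


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one label per point (0, 1, ... for clusters, NOISE for outliers)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point, so keep growing
        cluster_id += 1
    return labels


# Hypothetical toy records: [plasma glucose, BMI]; None marks a missing value.
records = [
    [80, 20.0], [86, 21.0], [92, 20.0], [86, 22.0], [80, 22.0], [89, None],
    [188, 38.0], [194, 40.0], [200, 38.0],
    [140, 30.0],
]

points = impute_and_scale(records)
for eps, min_pts in [(0.15, 3), (0.15, 4), (0.6, 3)]:
    labels = dbscan(points, eps, min_pts)
    n_clusters = len({label for label in labels if label != NOISE})
    print(f"eps={eps}, min_pts={min_pts}: {n_clusters} cluster(s), {labels.count(NOISE)} noise point(s)")
# eps=0.15, min_pts=3: 2 cluster(s), 1 noise point(s)
# eps=0.15, min_pts=4: 1 cluster(s), 4 noise point(s)
# eps=0.6, min_pts=3: 1 cluster(s), 0 noise point(s)
```

On this toy data the three parameter settings behave quite differently:

- `eps=0.15, min_pts=3` separates the two groups and flags the isolated record as noise.
- Raising `min_pts` to 4 turns the three-record group into noise.
- A large `eps=0.6` lets the isolated record act as a bridge that merges everything into a single cluster.

This kind of sensitivity is what a comparison across parameter settings is meant to expose. For real data, a library implementation such as scikit-learn's `DBSCAN` (parameters `eps` and `min_samples`) would normally be preferred over this hand-written sketch.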



Keywords: Data mining · Density based spatial clustering of applications with noise (DBSCAN) · Preprocessing · Diabetes mellitus II · Clustering







R. Chauhan (✉), Amity University, Noida, India. e-mail: [email protected]
H. Kaur (✉), Hamdard University, Delhi, India. e-mail: [email protected]
R. Puri (✉), College of Educators, Ontario, Canada. e-mail: [email protected]

© Springer Science+Business Media Singapore 2017
K.R. Attele et al. (eds.), Emerging Trends in Electrical, Communications and Information Technologies, Lecture Notes in Electrical Engineering 394, DOI 10.1007/978-981-10-1540-3_7


1 Introduction
In India, the nationwide prevalence of diabetes is as high as 9 %, and in some parts it is as high as 20 %, according to the International Diabetes Federation (IDF) and the Madras Diabetes Research Foundation [1]. Diabetes is one of the major causes of several other conditions, such as heart attack, kidney failure, eye disease, and other complications. To address this situation, the Indian government has allocated a significant amount of funding to diabetes. In addition, India's National Program for Prevention and Control of Diabetes, and Cardiovascular Diseases and Stroke (NPCDS), launched in 2008, emphasizes prevention techniques and plans to educate people about the risk factors for diabetes. Moreover, diabetes is a lifelong disease, and the data generated for individual cases may be too voluminous and complex for humans to interpret unaided. To overcome the complexity of medical databases, several data mining techniques are i