Towards Predicting Risk of Coronary Artery Disease from Semi-Structured Dataset



ORIGINAL RESEARCH ARTICLE

Smita Roy1 · Asif Ekbal1 · Samrat Mondal1 · Maunendra Sankar Desarkar2 · Shubham Chattopadhyay3

Received: 3 August 2019 / Revised: 17 January 2020 / Accepted: 21 February 2020
© International Association of Scientists in the Interdisciplinary Areas 2020

Abstract
Many kinds of disease-related data are now available, and researchers are constantly attempting to mine useful information from them. Medical data are not always homogeneous or structured, and most of them are time-stamped. Special care is therefore required to prevent information loss when mining such data. Mining medical data is challenging because inaccurate predictions are often unacceptable in this domain. In this paper, we analyze a partially annotated coronary artery disease (CAD) dataset that was originally in semi-structured form. We create a set of well-defined features from the dataset and then build predictive models for CAD risk identification using different supervised learning algorithms. We further improve the performance of the models using a feature selection technique. Experiments show interesting results that are expected to help medical practitioners investigate CAD risk in patients.

Keywords  CAD · Decision tree · Feature selection · Precision · SVM

* Smita Roy

1 Department of Computer Science and Engineering, Indian Institute of Technology Patna, Bihar, India
2 Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Telangana, India
3 Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, West Bengal, India

1 Introduction

Data mining plays a significant role in discovering important and previously unknown information from huge collections of data in all sectors. In the medical domain, millions of complex data records are generated every day, and data mining techniques can be successfully applied to them to extract non-trivial information. Several stakeholders benefit: medical practitioners can use this information for diagnosing and treating patients, hospital authorities can manage resources efficiently, medical insurance companies can decide their policies and detect fraudulent claims, and governments can plan proper healthcare infrastructure, to mention a few major points. Among the many diseases of major concern in every country, coronary artery disease (CAD) is a silent killer and is quite common nowadays. CAD is a kind of heart disease that is highly prevalent in the United States and can lead to heart attacks in many people. It is caused by a