A nonparametric copula-based decision tree for two random variables using MIC as a classification index




FOCUS

Y. A. Khan1,3 • Q. S. Shan1 • Q. Liu1 • S. Z. Abbas2,3

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract The copula is well known for capturing scale-free measures of dependence among variables and has attracted much interest in recent years. At the heart of copula theory is Sklar's well-known theorem, which states that any multivariate distribution function can be decomposed into its marginal distributions and a copula that captures the dependence between the variables. The decision tree, on the other hand, is a widely used nonparametric modeling approach for both regression and classification problems. A decision tree represents a tree-structured partition of the data into distinct classes for the purposes of interpretation and prediction. In this paper, we propose a novel nonparametric copula-based decision tree that uses a measure of dependence, the maximal information coefficient (MIC), as the classification index for two related variables. The method not only classifies the data with respect to the factors under study but also ranks the factors according to their influence. In addition, we pre-test the value of the splitting criterion to anticipate the growth of branches at each child node. As an illustration, we applied the proposed method to credit card data from Taiwan and heart disease data from Pakistan and obtained satisfactory results. The proposed method of constructing two-variable decision trees thus offers a useful tool for classification, prediction, and identifying critical factors in statistics, finance, health sciences, machine learning, and many other related fields.

Keywords Nonparametric • Copula • Dependence • Decision tree • Maximal information coefficient • Classification

Communicated by Kannan.

Q. S. Shan [email protected]
S. Z. Abbas [email protected]
Y. A. Khan [email protected]
Q. Liu [email protected]

1 School of Statistics, Jiangxi University of Finance and Economics, Nanchang 330013, Jiangxi, People's Republic of China

2 School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China

3 Department of Mathematics and Statistics, Hazara University, Mansehra, Pakistan

1 Introduction

Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. It is also the distillation of large databases into useful data or information, which is then called knowledge. Data mining consistently involves techniques for finding and describing structural patterns in data as a tool for helping to explain the data and make predictions. Data mining consists of five major elements: 1. Extract, transform, and load transaction data onto the data warehouse system. 2. Store and manage the data in a multidimensional database devi