Hybrid adaptive index model for binary response data


Ke Wan¹ · Kensuke Tanioka³ · Hiroyuki Minami¹ · Masahiro Mizuta¹ · Toshio Shimokawa²

Received: 2 May 2020 / Accepted: 15 October 2020
© Japanese Federation of Statistical Science Associations 2020

Abstract

In data analysis, the explanatory variables can often be divided into two groups: variables that researchers consider controllable and variables that they consider uncontrollable. We call them controllable and uncontrollable variables, respectively. In this study, we deal with binary response data and aim to estimate the relationship between the binary response and the controllable variables. The logistic regression model is typically used for binary response data. In addition, AIM (Adaptive Index Model; Tian and Tibshirani, Biostatistics 12:68–86, 2010) can also be applied to binary response data. In contrast to the logistic regression model, AIM makes the result easier to explain by using binary rules, but its prediction accuracy has been shown to be worse than that of the logistic regression model. Considering both interpretability and accuracy, it is better to apply AIM to the controllable variables and to adjust for the effect of the uncontrollable variables using a logistic regression model. We therefore propose a method that combines AIM and the logistic regression model, called the hybrid adaptive index model (HAIM), to balance the two.

Keywords: Production rule · Logistic regression model · Controllable explanatory variables · Uncontrollable explanatory variables

1 Introduction

In this paper, we focus on the situation in which the explanatory variables in binary response data can be divided into controllable variables and uncontrollable variables. Controllable variables are those that can be changed or controlled, such as salt intake and alcohol intake. Uncontrollable variables are those that generally cannot be controlled, such as sex and age. For example, in the medical field, the best way to prevent cerebral infarction is to manage each risk


Japanese Journal of Statistics and Data Science

factor and keep it in good condition. However, not all risk factors are controllable; for example, age is a known risk factor but cannot be controlled. Managing the controllable variables, such as salt intake and alcohol intake, therefore seems much more important. Consequently, it is important to derive interpretable attributes of subjects based on the controllable variables and to detect the relationship between the response and these attributes.

To detect this relationship, classification methods or subgroup identification methods can be used. For example, tree-based methods such as CART (Classification and Regression Trees; Breiman et al. 1984), IT (Interaction Trees; Su et al. 2009, 2011) and SIDES (Subgroup Identification Based on Difference Effe