Modal linear regression using log-concave distributions


Online ISSN 2005-2863 Print ISSN 1226-3192

RESEARCH ARTICLE

Sunyul Kim¹ · Byungtae Seo¹

Received: 31 March 2020 / Accepted: 23 September 2020
© Korean Statistical Society 2020

Abstract The modal linear regression suggested by Yao and Li (Scand J Stat 41(3):656–671, 2014) models the conditional mode of a response Y given a vector of covariates 𝐳 as a linear function of 𝐳. To identify the conditional mode of Y given 𝐳, existing methods use a kernel density estimator to obtain the distribution of Y given 𝐳. Like other kernel-based methods, they require a suitable choice of tuning parameters, and no unified objective function exists for estimating the regression parameters. In this paper, we propose a model-based modal linear regression using a family of log-concave distributions. The proposed method requires no tuning parameters and yields an explicit likelihood function. To estimate the regression parameters with an estimated log-concave density, we turn the log-likelihood into a sum of affine functions using a dual representation of piecewise linear concave functions, so that well-known linear programming techniques can be adopted. Simulation studies show that the proposed method produces more efficient estimators than kernel-based methods. A real data example is also presented to illustrate the applicability of the proposed method.

Keywords Log-concave distribution · Modal regression · Maximum likelihood · Linear programming · Mode-constrained log-concave distribution
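The computational idea in the abstract can be illustrated on a toy example. A concave function that is piecewise linear between knots x₁ < … < x_m coincides, on [x₁, x_m], with the minimum of the lines joining consecutive knots: φ(x) = min_j (a_j + b_j x). Maximizing Σ_i φ(y_i − 𝐳_i⊤𝜷) over 𝜷 can therefore be written as the linear program "maximize Σ_i t_i over (𝜷, t) subject to t_i ≤ a_j + b_j(y_i − 𝐳_i⊤𝜷) for all i, j". The minimal Python sketch below is not the authors' implementation. It assumes the knots and (unnormalized) log-density values of the estimated log-concave density are already given, since estimating them is a separate step not shown here, and that every residual stays inside the support. The hypothetical values peak at 0, mirroring an error density with mode zero so that Mode(Y|𝐳) = 𝐳⊤𝜷. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Piece = Tuple[float, float]  # (intercept a_j, slope b_j) of one affine piece


def affine_pieces(knots: Sequence[float], values: Sequence[float]) -> List[Piece]:
    """Lines through consecutive knots of a concave piecewise linear function.

    Assumes strictly increasing knots; raises ValueError if the slopes are
    not non-increasing, i.e. if the function is not concave.
    """
    pieces: List[Piece] = []
    for k in range(len(knots) - 1):
        slope = (values[k + 1] - values[k]) / (knots[k + 1] - knots[k])
        pieces.append((values[k] - slope * knots[k], slope))
    for (_, b0), (_, b1) in zip(pieces, pieces[1:]):
        if b1 > b0 + 1e-12:
            raise ValueError("values do not define a concave function")
    return pieces


def phi_dual(x: float, pieces: Sequence[Piece]) -> float:
    """Dual representation: phi(x) = min_j (a_j + b_j * x)."""
    return min(a + b * x for a, b in pieces)


def phi_interp(x: float, knots: Sequence[float], values: Sequence[float]) -> float:
    """Direct definition by linear interpolation, only on [knots[0], knots[-1]]."""
    for k in range(len(knots) - 1):
        if knots[k] <= x <= knots[k + 1]:
            t = (x - knots[k]) / (knots[k + 1] - knots[k])
            return (1 - t) * values[k] + t * values[k + 1]
    raise ValueError("x lies outside the support")


def residuals(beta, Z, y):
    """r_i = y_i - z_i^T beta for every observation."""
    return [y_i - sum(z * b for z, b in zip(z_i, beta)) for z_i, y_i in zip(Z, y)]


def log_likelihood(beta, Z, y, pieces):
    """sum_i phi(y_i - z_i^T beta); valid only if all residuals lie in the support."""
    return sum(phi_dual(r, pieces) for r in residuals(beta, Z, y))


def lp_constraints(Z, y, pieces):
    """Rows of A x <= rhs for x = (beta, t): t_i + b_j z_i^T beta <= a_j + b_j y_i."""
    n = len(Z)
    A, rhs = [], []
    for i, (z_i, y_i) in enumerate(zip(Z, y)):
        for a, b in pieces:
            A.append([b * z for z in z_i] + [1.0 if k == i else 0.0 for k in range(n)])
            rhs.append(a + b * y_i)
    return A, rhs


# Hypothetical, unnormalized concave log-density with its maximum at 0
knots = [-2.0, -1.0, 0.0, 1.0, 3.0]
values = [-4.0, -2.0, -1.0, -1.5, -4.0]
pieces = affine_pieces(knots, values)
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # intercept plus one covariate
y = [0.5, 1.0, 3.0]
print(log_likelihood([0.0, 1.0], Z, y, pieces))  # -3.75
```

The rows returned by `lp_constraints` are in the form A x ≤ rhs with x = (𝜷, t), so they could be handed to a generic LP solver, for example `scipy.optimize.linprog` with objective coefficients (0, …, 0, −1, …, −1) and `bounds=(None, None)` to leave all variables free. Constraints that keep the residuals inside the support of the estimated density are omitted from this sketch; outside the support the log-concave estimate is −∞, whereas the minimum of lines simply extends linearly.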

1 Introduction

Typical linear regression models assume that the mean response of Y is a linear function of a vector of covariates 𝐳, i.e., E(Y|𝐳) = 𝐳⊤𝜷, where 𝜷 is an unknown vector of regression coefficients. Because the mean is the most popular and intuitive summary of a random variable Y or of a data set, modeling E(Y|𝐳) seems the most natural and desirable choice in many applications.

* Byungtae Seo
Sunyul Kim
¹ Sungkyunkwan University, Seoul, Korea
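As a point of reference for the mean model E(Y|𝐳) = 𝐳⊤𝜷, the minimal sketch below fits it by ordinary least squares in the simplest case 𝐳 = (1, x)⊤, i.e., one covariate plus an intercept. This is the textbook estimator rather than anything proposed in the paper, and the function name `ols_simple` is hypothetical.

```python
from typing import Sequence, Tuple


def ols_simple(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of (intercept, slope) for E(Y | x) = b0 + b1 * x."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and centered sum of squares of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


print(ols_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (1.0, 2.0)
```

Least squares targets the conditional mean, so when the error distribution is skewed the fitted line need not pass near the conditional mode of Y.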




Journal of the Korean Statistical Society

Alternatively, one can consider a linear regression model for the median, that is, Median(Y|𝐳) = 𝐳⊤𝜷. This median regression can be viewed as a special case of quantile regression, namely at the 0.5 quantile. It is well known that median regression is less efficient than mean regression when the errors are normally distributed, but it is more robust to outliers. These regression models let us extract representative information from the data by assuming a model for the center of the data, and mean and median regression have been used successfully for this purpose. However, when the data are not symmetrically distributed, neither the mean nor the median may provide representative information about the data. In this case, the mode is an important alternative to the mean or median. Motivated by this, Yao and Li (2014) proposed a modal linear regression model assuming Mode(Y|𝐳) = 𝐳⊤𝜷 to capture the conditional mode in regre