Estimation and clustering for partially heterogeneous single index model


Fangfang Wang1 · Lu Lin2,3 · Lei Liu4 · Kangning Wang2

Received: 13 October 2019 / Revised: 4 August 2020 © Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

In this paper, our goal is to estimate the homogeneous parameter and cluster the heterogeneous parameters in a partially heterogeneous single index model (PHSIM). To achieve the goal, the minimization criterion for such a single index model is first transformed into a least-squares optimization problem in the population form. Based on the least-squares objective function, we introduce an empirical version for the PHSIM. By minimizing such an empirical version, we estimate the homogeneous parameter and the subgroup-averages of the heterogeneous index directions, and then use a fusion penalized method to identify the subgroup structure of the PHSIM. By the proposed methodologies, the homogeneous parameter and the heterogeneous index directions can be consistently estimated, and the heterogeneous parameters can be consistently clustered. Moreover, the new clustering procedure is simple and robust. Simulation studies are carried out to examine the performance of the proposed methodologies.

Keywords Single index model · Homogeneity · Convex clustering · Subgroup-average · Estimation consistency

The research was supported by NNSF Projects (11971265, 11901356) of China.

✉ Lu Lin [email protected]; [email protected]

1 Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, China

2 School of Statistics, Shandong Technology and Business University, Yantai, China

3 School of Statistics, Qufu Normal University, Qufu, China

4 Division of Biostatistics, Washington University in St. Louis, St. Louis, USA


1 Introduction

In this paper, we are interested in the following heterogeneous model:

$$y_i = x_i^T \beta + g_i(z_i) + \varepsilon_i, \quad i = 1, \cdots, n, \tag{1.1}$$

where $y_i$, $i = 1, \cdots, n$, are independent observations of the response variable $Y \in \mathbb{R}$; $x_i$ and $z_i$, $i = 1, \cdots, n$, are independent and identically distributed (i.i.d.) observations of the covariates $X \in \mathbb{R}^p$ and $Z \in \mathbb{R}^q$, respectively; $\varepsilon_i$, $i = 1, \cdots, n$, are i.i.d. random errors with $E(\varepsilon_i \mid x_i, z_i) = 0$ and $\mathrm{Var}(\varepsilon_i \mid x_i, z_i) = \sigma^2$; $g_i(\cdot)$, $i = 1, \cdots, n$, are unknown subject-specific functions; and $\beta = (\beta_1, \cdots, \beta_p)^T$ is an unknown coefficient vector for the covariates $X$. Such a model is of practical significance in several areas. For example, household electricity consumption may change significantly with the seasons. In winter and summer, most people may use air conditioning, which consumes a large amount of power, while in spring and autumn electricity consumption drops. The two cases may therefore follow different functions of the season (or air temperature). In this example, $Y$ could denote quarterly electricity consumption, $X$ denotes some observed covariates such as electricity price, income, living space, etc.,