


ORIGINAL RESEARCH

Selective Cascade of Residual ExtraTrees

Qimin Liu1,* · Fang Liu2

Received: 18 May 2020 / Accepted: 30 September 2020
© Springer Nature Singapore Pte Ltd 2020

* Qimin Liu: [email protected]; Fang Liu: [email protected]

1 Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA
2 Department of Applied and Computational Mathematics & Statistics, University of Notre Dame, Notre Dame, IN, USA

Abstract

We propose a novel tree-based ensemble method named Selective Cascade of Residual ExtraTrees (SCORE). SCORE draws inspiration from representation learning, incorporates regularized regression with variable selection, and uses boosting to improve prediction and reduce generalization error. We also develop a variable importance measure to increase the explainability of SCORE. Our computer experiments show that SCORE delivers predictive performance comparable or superior to ExtraTrees, random forest, gradient boosting machine, and neural networks, and that its variable importance measure is comparable to the benchmark methods studied. Finally, the predictive performance of SCORE remains stable across hyper-parameter values, suggesting potential robustness to hyper-parameter specification.

Keywords: Extremely randomized trees · Boosting · Ensemble learning · Regularized regression · Variable importance measure · Explainability
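The abstract names three ingredients: extremely randomized trees, regularized regression with variable selection, and boosting on residuals. The Python sketch below illustrates, in general terms, how such ingredients can be combined; it is not the authors' SCORE algorithm. The stage structure, the Lasso-based selection rule, the learning rate, and all hyper-parameter values are assumptions made for illustration only.

```python
# Illustrative sketch only: a residual-boosting cascade of ExtraTrees stages with
# Lasso-based variable selection between stages. This is NOT the SCORE algorithm
# from the paper; the stage structure and selection rule are assumptions.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LassoCV

def fit_residual_cascade(X, y, n_stages=3, learning_rate=0.5, random_state=0):
    """Fit a toy cascade on numpy arrays: each stage selects variables with a
    Lasso fit to the current residuals, then fits ExtraTrees on those variables."""
    stages, residual = [], y.astype(float).copy()
    for _ in range(n_stages):
        # Regularized regression with variable selection (Lasso) on the residuals.
        lasso = LassoCV(cv=5, random_state=random_state).fit(X, residual)
        selected = np.flatnonzero(lasso.coef_ != 0)
        if selected.size == 0:          # fall back to all features if none survive
            selected = np.arange(X.shape[1])
        # Extremely randomized trees fit to the residuals on the selected features.
        trees = ExtraTreesRegressor(n_estimators=100, random_state=random_state)
        trees.fit(X[:, selected], residual)
        residual = residual - learning_rate * trees.predict(X[:, selected])
        stages.append((selected, trees))
    return stages

def predict_residual_cascade(stages, X, learning_rate=0.5):
    """Sum the shrunken stage predictions, mirroring the boosting update above."""
    pred = np.zeros(X.shape[0])
    for selected, trees in stages:
        pred += learning_rate * trees.predict(X[:, selected])
    return pred
```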

Introduction

Ensemble learning methods combine multiple learning algorithms to achieve better performance than any of the individual algorithms alone. The success of ensemble methods can be attributed, in part, to the diversity among the constituent members, which helps mitigate over-fitting and reduce the generalization error [1]. We focus on tree-based ensemble methods for regression. Tree-based ensemble methods construct more than one decision tree and, by exploiting the diversity among the trees, increase the generalizability of the ensemble. Such diversity often comes from perturbation in the optimization process of individual trees. For example, tree bagging generates bootstrap samples, from which decision trees are obtained and averaged [2]. Random subspace selects a pseudo-random subset of features when building trees [3]. Random forest (RF) selects a random subset of the features for each candidate split and trains the trees on bootstrap samples [4].

The computational cost of tree-based ensemble approaches increases drastically with the number of trees because of both the optimization and the perturbation processes. One way to reduce this burden is to replace optimization with randomization. C4.5, for example, randomly chooses from the best 20 splits when constructing individual trees [1]; extremely randomized trees (ExtraTrees) randomly select a cut-point for each candidate split variable [5]; and [6] proposes "randomly" choosing a non-tested feature at each level of the tree without using any training data. However, the extreme randomness and lack of optimization
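To make the contrast between optimization and randomization concrete, the snippet below compares a random forest, which optimizes cut-points and draws bootstrap samples, with ExtraTrees, which draws cut-points at random and by default trains each tree on the whole sample. It is a minimal sketch using scikit-learn; the synthetic dataset, number of trees, and scoring choice are arbitrary illustrative settings, not from the paper.

```python
# Minimal sketch: random forest (optimized cut-points, bootstrap samples) versus
# ExtraTrees (random cut-points, whole sample by default) on a synthetic task.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_friedman1(n_samples=1000, n_features=10, noise=1.0, random_state=0)

models = {
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "ExtraTrees": ExtraTreesRegressor(n_estimators=200, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```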