Coalition Feature Interpretation and Attribution in Algorithmic Trading Models
James V. Hansen
Accepted: 20 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
The ability to correctly interpret a prediction model's output is critically important in many problem domains. Accurate interpretation generates user trust in the model, provides insight into how the model may be improved, and supports understanding of the process being modeled. The absence of this capability has constrained algorithmic trading from making full use of more powerful predictive models, such as XGBoost and Random Forests. Recently, the adaptation of coalitional game theory has led to the development of consistent methods of determining feature importance for these models (SHAP). This study designs and tests a novel method of integrating the capabilities of SHAP into predictive models for algorithmic trading.

Keywords SHAP · Feature importance · Algorithmic trading · Back-testing · Portfolio optimization
1 Introduction

Shapley values have been applied in market trading research for some time. One of the foremost purposes has been the analysis of portfolio risk (Mussard and Terraza 2008). More recently, as data sets have grown in size and complexity, advances in machine learning (ML) technology have unlocked new and promising possibilities. Rida (2019) applies ML to credit scoring, with encouraging results. That study's XGBoost ensemble attained better results than the methods conventionally used by commercial banks. Central to this improvement was the integration of SHAP values to determine feature importance, which enabled users to interpret and improve the resulting model. Other benefits included promoting trust in the model, generating hypotheses about causal relationships, and assessing whether legal requirements were being satisfied.
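As context for the approach developed later in this paper, the following is a minimal, hypothetical sketch of this kind of workflow, written with the Python shap and xgboost libraries on synthetic data; the model settings and features are placeholder assumptions, not those of the cited study.

```python
# A minimal, hypothetical sketch (not code from the paper or from Rida 2019):
# fit an XGBoost classifier and use SHAP's TreeExplainer to attribute each
# prediction to its input features. The data set here is synthetic.
import numpy as np
import shap
import xgboost
from sklearn.datasets import make_classification

# Synthetic stand-in for a tabular credit-scoring or trading data set.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)

model = xgboost.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # one attribution per sample per feature

# Global importance: mean absolute SHAP value for each feature.
global_importance = np.abs(shap_values).mean(axis=0)
for idx in np.argsort(global_importance)[::-1]:
    print(f"feature_{idx}: mean |SHAP| = {global_importance[idx]:.4f}")
```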
* James V. Hansen
[email protected]
1 Marriott School, Brigham Young University, Provo, UT 84602, USA
This is noteworthy, as limitations on interpretability have been a constraining factor in the use of powerful methods such as deep learning and ensemble models in banking, insurance, healthcare, and other industries (Lundberg and Erion 2018). The issue has been underscored by the growing availability of big data, which has boosted the potential for complex models (Ribeiro et al. 2016). This has incentivized accelerated interest in developing robust methods for interpreting machine learning models and measuring feature importance (Bach 2015; Hall and Gill 2018; Jansen 2018; Koshiyama et al. 2020; Lipovetsky and Conklin 2001; Shrikumar et al. 2017; Shrikumar 2016). Yet Lundberg and Lee (2017) show that all of these methods can be inconsistent. That is to say, the features that are actually most important may not always be given the highest feature importance score. In fact, a model can change such that it relies more on a given feature, yet the importance estimate assigned to that feature decreases. This casts doubt on any comparison of feature importance scores between models.
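To make the inconsistency concern concrete, the hedged sketch below contrasts XGBoost's built-in split-based feature importance with mean absolute SHAP values on the same fitted model; the data and model settings are illustrative assumptions, not results from the paper.

```python
# Hypothetical illustration of why a consistent attribution method matters:
# the same fitted model yields one feature ranking from its built-in
# importance and possibly a different ranking from SHAP values.
import numpy as np
import shap
import xgboost
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=1)
model = xgboost.XGBClassifier(n_estimators=100, max_depth=4).fit(X, y)

# Built-in importance: derived from how each feature is used in tree splits.
builtin_rank = np.argsort(model.feature_importances_)[::-1]

# SHAP importance: mean absolute contribution to individual predictions.
shap_values = shap.TreeExplainer(model).shap_values(X)
shap_rank = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]

print("ranking by built-in importance:", builtin_rank)
print("ranking by mean |SHAP| value:  ", shap_rank)
```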