ORIGINAL ARTICLE
Effect of machine learning re-sampling techniques for imbalanced datasets in 18F-FDG PET-based radiomics model on prognostication performance in cohorts of head and neck cancer patients

Chenyi Xie 1 & Richard Du 1 & Joshua WK Ho 2 & Herbert H Pang 3 & Keith WH Chiu 1 & Elaine YP Lee 1 & Varut Vardhanabhuti 1
Received: 11 November 2019 / Accepted: 3 March 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract

Purpose Biomedical data are frequently imbalanced, which makes achieving good predictive performance with data-driven machine learning approaches challenging. In this study, we investigated the impact of re-sampling techniques for imbalanced datasets on a PET radiomics-based prognostication model in head and neck cancer (HNC) patients.

Methods Radiomics analysis was performed in two cohorts of patients: 166 patients newly diagnosed with nasopharyngeal carcinoma (NPC) at our centre and 182 HNC patients from an open database. Conventional PET parameters and robust radiomics features were extracted for correlation analysis with overall survival (OS) and disease progression-free survival (DFS). We investigated a cross-combination of 10 re-sampling methods (oversampling, undersampling, and hybrid sampling) with 4 machine learning classifiers for survival prediction. Diagnostic performance was assessed in hold-out test sets. Statistical differences were analysed using Monte Carlo cross-validation with post hoc Nemenyi analysis.

Results Oversampling techniques such as ADASYN and SMOTE could improve prediction performance in terms of G-mean and F-measure in the minority class, without significant loss of F-measure in the majority class. We identified the optimal PET radiomics-based prediction model of OS (AUC of 0.82, G-mean of 0.77) for our NPC cohort. Oversampling techniques similarly improved prediction performance when tested on an external dataset, indicating generalisability.

Conclusion Our study showed that applying re-sampling techniques has a significant positive impact on prediction performance in imbalanced datasets. We have created an open-source solution for automated calculation and comparison of multiple re-sampling techniques and machine learning classifiers for easy replication in future studies.

Keywords 18F-FDG PET · Radiomics · Re-sampling techniques · Imbalanced datasets · Head and neck cancer
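
The abstract's two central ideas, oversampling the minority class and scoring with G-mean and per-class F-measure, can be illustrated with a minimal sketch. This is not the authors' open-source solution. The function names are hypothetical, `smote_like_oversample` is a simplified interpolation in the spirit of SMOTE rather than the full algorithm, and G-mean is assumed to follow its usual definition as the geometric mean of sensitivity and specificity. In practice a maintained library such as imbalanced-learn would normally be used instead.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (simplified, SMOTE-style).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def _counts(y_true, y_pred, label):
    """Return (tp, fp, fn, tn) treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    return tp, fp, fn, tn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, fp, fn, tn = _counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, label):
    """F1 score for one class, so minority and majority can be scored separately."""
    tp, fp, fn, _ = _counts(y_true, y_pred, label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels: 9 majority (0) and 1 minority (1); a classifier that
    # always predicts the majority class is 90% accurate but useless.
    y_true = [0] * 9 + [1]
    y_pred = [0] * 10
    print("G-mean:", g_mean(y_true, y_pred))
    print("Minority F-measure:", f_measure(y_true, y_pred, label=1))

    # Toy 2-D minority samples, oversampled with five synthetic points.
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smote_like_oversample(minority, n_new=5, k=2))
```

The toy labels show why accuracy alone misleads on imbalanced data: the always-majority classifier scores zero on both G-mean and minority-class F-measure. Synthetic samples should be generated from training data only, so that the hold-out test set keeps its original class distribution.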
This article is part of the Topical Collection on Oncology - Head and Neck

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00259-020-04756-4) contains supplementary material, which is available to authorized users.

* Varut Vardhanabhuti [email protected]

1 Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Queen Mary Hospital, Hong Kong SAR, China
2 School of Biomedical Science, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
3 School of Public Health, Li Ka Shing Faculty

Introduction