Comparison of Jackknife and Hybrid-Boost Model Averaging to Predict Surgery Durations: A Case Study



ORIGINAL RESEARCH

K. W. Soh1 · C. Walker1 · M. O'Sullivan1 · J. Wallace2

Received: 12 June 2020 / Accepted: 18 September 2020
© Springer Nature Singapore Pte Ltd 2020

Abstract

We apply jackknife model averaging (JMA) and a new prediction technique, hybrid-boost model averaging (HbMA), to a surgical dataset that includes categorical explanatory variables. The model requirements for HbMA differ from those for JMA: HbMA generally does not require well-performing models to be included in the model average. However, the utility of HbMA is limited by the possibility of multiple solutions for the HbMA weights. Both model averaging approaches are comparable under appropriate conditions. Among all the model averages considered, the best jackknife model average gives slightly better predictions of the surgery durations than the best hybrid-boost model average when evaluated on our surgical dataset. Finally, we discuss several methods that may further improve the performance of HbMA.

Keywords: Model averaging · Jackknife · Hybrid model · Linear regression
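For readers unfamiliar with the first of the two techniques, the following is a minimal sketch of how jackknife model-averaging weights can be computed for a set of linear candidate models: each model's leave-one-out predictions are stacked into a matrix, and the weights minimise the jackknife criterion over the simplex. The data, candidate set and variable names below are synthetic assumptions for illustration only; this is not the authors' implementation.

# A minimal, illustrative sketch of jackknife model averaging (JMA) for
# linear candidate models. Data and model set are synthetic placeholders.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

# Three nested candidate models: intercept plus the first k columns of X.
candidates = [X[:, :k] for k in (1, 2, 3)]

def loo_predictions(Z, y):
    # Leave-one-out (jackknife) fitted values for OLS via the hat-matrix
    # shortcut: y_hat(-i) = y_hat_i - h_ii * e_i / (1 - h_ii).
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    H = Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T)
    fitted = H @ y
    h = np.diag(H)
    return fitted - h * (y - fitted) / (1.0 - h)

# n-by-K matrix of leave-one-out predictions, one column per candidate model.
P = np.column_stack([loo_predictions(Z, y) for Z in candidates])

# JMA chooses weights minimising the jackknife criterion ||y - P w||^2
# over the simplex (non-negative weights summing to one).
K = P.shape[1]
res = minimize(
    fun=lambda w: np.sum((y - P @ w) ** 2),
    x0=np.full(K, 1.0 / K),
    bounds=[(0.0, 1.0)] * K,
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
)
print("JMA weights:", np.round(res.x, 3))

The averaged prediction for a new case is then the weighted combination of the candidate models' predictions. Because the weights lie on a simplex, a poor candidate can simply receive weight zero, which is one reason model averaging degrades gracefully when some candidates fit badly.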

* K. W. Soh
[email protected]; [email protected]

1 Department of Engineering Science, University of Auckland, Auckland, New Zealand

2 North Shore Hospital, Auckland, New Zealand

Introduction

One of the many considerations when scheduling an elective surgery into an operating theatre session is its predicted duration. Accurate predictions facilitate the planning of surgical lists without under- or over-booking, thereby avoiding the consequences of under-utilised theatre resources or theatre overruns [1]. The consequences of under-utilisation are lower earnings and reduced patient throughput, while the consequences of overruns are either expensive overtime for surgical teams or surgery cancellations, which also reduce patient throughput. In the context of predicting surgery durations, numerous statistical and machine learning tools, such as Bayesian methods [2], neural networks [3], random forests [4] and regression techniques [3–6], have been implemented. These tools rely on model selection, which finds the best model for the dataset according to a particular criterion. Model selection assumes that the chosen prediction model correctly identifies the relationship that generates the dataset. This assumption may not hold, and we often have a set of candidate models that fit the data almost equally well. Therefore, an alternative approach to model selection is to consider model averaging.

The key distinction between model selection and model averaging may be illustrated with the following supervised learning example on surgery duration prediction. Consider a set of training surgeries whose paired input–output data are used to construct several models separately, say three. Each model's parameters are determined by minimising its in-sample prediction errors. In model selection, the performance of each model is evaluated by t