Comparative evaluation of machine learning models for groundwater quality assessment
Shine Bedi · Ashok Samal · Chittaranjan Ray · Daniel Snow
Received: 27 February 2020 / Accepted: 20 October 2020 / Published online: 21 November 2020 © Springer Nature Switzerland AG 2020
S. Bedi · A. Samal: Computer Science and Engineering, University of Nebraska, Lincoln, NE, USA
C. Ray: Nebraska Water Center, University of Nebraska, Lincoln, NE, USA
D. Snow: Water Sciences Laboratory, University of Nebraska, Lincoln, NE, USA

Abstract
Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and agriculturally intensive regions in particular. Three widely used machine learning models, namely, artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two, three, and four class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques: oversampling, weighting, and oversampling and weighting, for all the scenarios. The models’ performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.

Keywords
Artificial neural networks (ANN) · Support vector machines (SVM) · XGBoost · Data imbalance · Feature importance · Groundwater quality
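The abstract's distinction between accuracy and the imbalance-aware metrics (F1 score and MCC) can be illustrated with a minimal sketch. The labels below are invented for illustration and are not the study's well data; the helper names are hypothetical, and defining F1 and MCC as 0 when their denominators vanish is a common convention assumed here, not something stated in the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN); assumed to be 0 when undefined.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; assumed to be 0 when undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical imbalanced sample: 10 contaminated wells (1), 90 clean wells (0).
y_true = [1] * 10 + [0] * 90

# Classifier A always predicts the majority class.
y_majority = [0] * 100

# Classifier B finds 5 of the 10 contaminated wells and raises 5 false alarms.
y_useful = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85

counts_a = confusion_counts(y_true, y_majority)
counts_b = confusion_counts(y_true, y_useful)

for name, counts in (("majority", counts_a), ("useful", counts_b)):
    print(name, accuracy(*counts), f1_score(*counts), round(mcc(*counts), 3))
```

Both hypothetical classifiers reach the same accuracy, yet only the second identifies any contaminated wells, and only F1 and MCC show that difference.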
Introduction
Groundwater comprises almost 30% of the world’s freshwater, with the remaining 69% found in glaciers, ice sheets, ice caps, and icebergs, and in rivers and lakes (DeSimone et al. 2015). Groundwater now supplies drinking water for 51% of the total US population and 99% of the rural population (Maupin et al. 2010). Other countries show a similar reliance: around 160 million people depend on drinking water supplied by a single aquifer beneath the Huang-Huai-Hai plain in eastern China (Sampat 2000). Nitrate in groundwater can be derived from many sources, but nitrate concentrations in groundwater underlying agricultural and urban areas commonly
are higher than in other areas because of contributions from sources associated with human activities (DeSimone et al. 2015). Once a contaminant is introduced to groundwater, it is transported throughout the aquifer by the groundwater f