Proposed formulation of surface water quality and modelling using gene expression, machine learning, and regression tech


RESEARCH ARTICLE

Proposed formulation of surface water quality and modelling using gene expression, machine learning, and regression techniques

Muhammad Izhar Shah 1 & Muhammad Faisal Javed 1 & Taher Abunama 2

Received: 4 August 2020 / Accepted: 30 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Rising water pollution from anthropogenic factors motivates further research into water quality prediction models. The available models are limited by short data timespans and by their inability to provide empirical expressions. This study models and derives empirical equations for the surface water quality of the upper Indus River basin using a 30-year dataset and machine learning techniques, and then determines the most reliable model for accurately predicting river water quality. Total dissolved solids (TDS) and electrical conductivity (EC) were used as dependent variables, whereas eight parameters were used as independent variables, with 70% and 30% of the data used for model training and testing, respectively. Several evaluation criteria, i.e., Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), coefficient of determination (R²), and mean absolute error (MAE), were used to assess model performance. The models were also validated by k-fold cross-validation using R² and RMSE. The results indicated a strong correlation, with NSE and R² both above 0.85 for all the developed models. Gene expression programming (GEP) outperformed the artificial neural network (ANN) and the linear and non-linear regression models for both TDS and EC. The sensitivity and parametric analyses revealed that bicarbonate is the most sensitive parameter influencing both the TDS and EC models. Two equations were derived and formulated to represent the novel results of the GEP model and to help authorities monitor river water quality effectively.

Keywords Surface water quality · Machine learning algorithms · Regression · Sensitivity and parametric analyses · k-fold cross-validation
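The four evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the sample values are hypothetical, and the coefficient of determination is assumed here to be the squared Pearson correlation between observed and predicted values, a convention common in hydrology that keeps it distinct from NSE. The paper may define it differently.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / (variation of obs about its mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, pred):
    """Squared Pearson correlation (assumed definition, see lead-in)."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


if __name__ == "__main__":
    # Hypothetical TDS values (mg/L), for illustration only.
    observed = [210.0, 250.0, 190.0, 300.0, 275.0]
    predicted = [205.0, 260.0, 200.0, 290.0, 280.0]
    print("RMSE:", rmse(observed, predicted))
    print("MAE: ", mae(observed, predicted))
    print("NSE: ", nse(observed, predicted))
    print("R2:  ", r_squared(observed, predicted))
```

Under these definitions a prediction with a constant bias can keep a squared correlation of 1 while its NSE drops sharply, which is one reason to report both alongside RMSE and MAE.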

Responsible Editor: Marcus Schulz

* Muhammad Izhar Shah [email protected]
Muhammad Faisal Javed [email protected]
Taher Abunama [email protected]

1 Department of Civil Engineering, COMSATS University Islamabad, Abbottabad Campus, Abbottabad 22060, Pakistan
2 Institute for Water and Wastewater Technology, Durban University of Technology, PO Box 1334, Durban, South Africa

Introduction

Surface water is a vital resource that is necessary for all aspects of life. The quality of water is affected by pollutants and their distribution with the flow (Kargar et al. 2020). Due to a lack of facilities and infrastructure in developing countries, a major portion of liquid waste is disposed of in various surface water bodies. Moreover, rapid industrialization and population growth adversely affect the quality of surface water bodies. The term water quality is used to define the condition of water, covering its physical, chemical, and biological properties (Alizadeh et al. 2018). The quality of wat