Data-driven models for predicting microbial water quality in the drinking water source using E. coli monitoring and hydrometeorological data
Journal article, 2022

Rapid changes in microbial water quality in surface waters pose challenges for the production of safe drinking water. If not reduced to an acceptable level by treatment, microbial pathogens present in drinking water can have severe consequences for public health. The aim of this paper was to evaluate the suitability of data-driven models of different complexity for predicting the concentrations of E. coli in the river Göta älv at the water intake of the drinking water treatment plant in Gothenburg, Sweden. The objectives were to (i) assess how the complexity of the model affects the model performance; and (ii) identify relevant factors and assess their effect as predictors of E. coli levels. To forecast E. coli levels one day ahead, data on laboratory measurements of E. coli and total coliforms, Colifast measurements of E. coli, water temperature, turbidity, precipitation, and water flow were used. The baseline approaches included Exponential Smoothing and ARIMA (Autoregressive Integrated Moving Average), which are commonly used univariate methods, and a naive baseline that used the previous observed value as its next prediction. Models common in the machine learning domain were also included: LASSO (Least Absolute Shrinkage and Selection Operator) Regression, Random Forest, and TPOT (Tree-based Pipeline Optimization Tool), a tool for optimising machine learning pipelines. In addition, a multivariate autoregressive model, VAR (Vector Autoregression), was included. The models that included multiple predictors performed better than the univariate models. Random Forest and TPOT achieved higher performance but showed a tendency to overfit. Water temperature, microbial concentrations upstream and at the water intake, and precipitation upstream were shown to be important predictors.
Data-driven modelling enables water producers to interpret measurements in the context of the concentrations that can be expected based on recent historical data, and thus to identify unexplained deviations that warrant further investigation of their origin.
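
To make the univariate baselines mentioned in the abstract concrete, the following is a minimal sketch of one-step-ahead forecasting with a naive (previous value) baseline and simple exponential smoothing, scored by root mean squared error. It is not the authors' implementation: the function names, the smoothing factor `alpha`, the choice of RMSE as the error measure, and the example values are all assumptions introduced for illustration, and the values are made up rather than taken from the study.

```python
import math


def naive_forecast(series):
    """Persistence baseline: the forecast for step t is the value at t - 1.

    Returns forecasts aligned with series[1:].
    """
    return list(series[:-1])


def exp_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing, one step ahead.

    The smoothed level is updated as
        level = alpha * value + (1 - alpha) * level
    and the level available before each new value is used as its forecast.
    Returns forecasts aligned with series[1:].
    """
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)  # forecast issued before `value` is seen
        level = alpha * value + (1 - alpha) * level
    return forecasts


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up illustrative daily values, NOT data from the study.
counts = [10.0, 12.0, 30.0, 25.0, 14.0, 11.0]
actual = counts[1:]

naive_rmse = rmse(actual, naive_forecast(counts))
ses_rmse = rmse(actual, exp_smoothing_forecast(counts, alpha=0.5))
print(f"naive RMSE: {naive_rmse:.2f}, exponential smoothing RMSE: {ses_rmse:.2f}")
```

With `alpha=1.0` the smoothed level equals the latest observation, so exponential smoothing reduces to the naive baseline. A multivariate model such as those named in the abstract would be judged against these baselines on the same held-out period.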

Microbial water quality

Drinking water

E. coli

Machine learning

Artificial intelligence

Author

Ekaterina Sokolova

Chalmers, Architecture and Civil Engineering, Water Environment Technology

Oscar Ivarsson

Chalmers, Computer Science and Engineering (Chalmers), CSE Verksamhetsstöd, Data Science Research Engineers

Ann Lillieström

Chalmers, Computer Science and Engineering (Chalmers), CSE Verksamhetsstöd, Data Science Research Engineers

Nora Speicher

Chalmers, Computer Science and Engineering (Chalmers), CSE Verksamhetsstöd, Data Science Research Engineers

Henrik Rydberg

City of Gothenburg

Mia Bondelind

Chalmers, Architecture and Civil Engineering, Water Environment Technology

Science of the Total Environment

0048-9697 (ISSN)

Vol. 802, article 149798

Subject Categories

Water Engineering

Water Treatment

Oceanography, Hydrology, Water Resources

DOI

10.1016/j.scitotenv.2021.149798

PubMed

34454142

More information

Latest update

9/14/2021