Data-driven models for predicting microbial water quality in the drinking water source using E. coli monitoring and hydrometeorological data
Journal article, 2022

Rapid changes in microbial water quality in surface waters pose challenges for the production of safe drinking water. If the water is not treated to an acceptable level, microbial pathogens present in the drinking water can have severe consequences for public health. The aim of this paper was to evaluate the suitability of data-driven models of different complexity for predicting the concentrations of E. coli in the river Göta älv at the water intake of the drinking water treatment plant in Gothenburg, Sweden. The objectives were to (i) assess how the complexity of the model affects the model performance; and (ii) identify relevant factors and assess their effect as predictors of E. coli levels. To forecast E. coli levels one day ahead, data on laboratory measurements of E. coli and total coliforms, Colifast measurements of E. coli, water temperature, turbidity, precipitation, and water flow were used. The baseline approaches were Exponential Smoothing and ARIMA (Autoregressive Integrated Moving Average), which are commonly used univariate methods, and a naive baseline that used the previous observed value as its next prediction. Models common in the machine learning domain were also included: LASSO (Least Absolute Shrinkage and Selection Operator) Regression and Random Forest, together with TPOT (Tree-based Pipeline Optimization Tool), a tool for optimising machine learning pipelines. In addition, a multivariate autoregressive model, VAR (Vector Autoregression), was included. The models that included multiple predictors performed better than the univariate models. Random Forest and TPOT achieved higher performance but showed a tendency to overfit. Water temperature, microbial concentrations upstream and at the water intake, and precipitation upstream were shown to be important predictors.
Data-driven modelling enables water producers to interpret the measurements in the context of the concentrations that can be expected based on recent historical data, and thus to identify unexplained deviations that warrant further investigation of their origin.
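
The two simplest baselines named in the abstract, the naive previous-value forecast and Exponential Smoothing, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothing parameter `alpha`, and the short series of log10 E. coli values are all hypothetical and serve only to show how one-day-ahead forecasts and their error could be computed.

```python
from math import sqrt


def naive_forecast(series):
    """Persistence baseline: the forecast for day t is the value observed on day t-1."""
    return series[:-1]


def exp_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing; returns one-step-ahead forecasts for series[1:].

    alpha is an assumed smoothing parameter in (0, 1]; alpha = 1 reduces
    to the naive forecast.
    """
    level = series[0]
    forecasts = []
    for observed in series[1:]:
        forecasts.append(level)  # forecast is made before the day's value is seen
        level = alpha * observed + (1 - alpha) * level
    return forecasts


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical daily log10 E. coli concentrations, for illustration only.
log_ecoli = [1.2, 1.4, 1.3, 2.1, 1.9, 1.5, 1.4]
actual = log_ecoli[1:]

naive_rmse = rmse(actual, naive_forecast(log_ecoli))
ses_rmse = rmse(actual, exp_smoothing_forecast(log_ecoli, alpha=0.5))
print(f"naive RMSE: {naive_rmse:.3f}, exponential smoothing RMSE: {ses_rmse:.3f}")
```

Both functions use only the E. coli series itself, which is what makes them univariate; the multivariate models in the study additionally use predictors such as water temperature and precipitation.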

Microbial water quality

Drinking water

E. coli

Machine learning

Artificial intelligence


Ekaterina Sokolova

Chalmers, Architecture and Civil Engineering, Water Environment Technology

Oscar Ivarsson

Chalmers, Computer Science and Engineering, CSE Verksamhetsstöd

Ann Lillieström

Chalmers, Computer Science and Engineering, CSE Verksamhetsstöd

Nora Speicher

Chalmers, Computer Science and Engineering, CSE Verksamhetsstöd

Henrik Rydberg

City of Gothenburg

Mia Bondelind

Chalmers, Architecture and Civil Engineering, Water Environment Technology

Science of the Total Environment

0048-9697 (ISSN) 1879-1026 (eISSN)

Vol. 802 149798




Oceanography, Hydrology and Water Resources




