Data-driven modeling of hydraulic head time series: results and lessons learned from the 2022 groundwater modeling challenge
Preprint, 2024

This paper presents the results of the 2022 groundwater time series modeling challenge, in which 15 teams from different institutes applied various data-driven models to simulate hydraulic head time series at four monitoring wells. Three of the wells were located in Europe and one in the USA, in different hydrogeological settings but all in temperate or continental climates. Participants were provided with approximately 15 years of measured heads at (almost) regular time intervals, together with daily weather data starting about 10 years before the first head measurement and extending about 5 years beyond the last one. The participants were asked to simulate the measured heads (the calibration period), to provide a forecast for around 5 years after the last measurement (the validation period, for which weather data were provided but head measurements were not), and to include an uncertainty estimate. Three groups of models were identified among the submissions: lumped-parameter models (3 teams), machine learning models (4 teams), and deep learning models (8 teams). Lumped-parameter models apply relatively simple response functions with few parameters, while the artificial intelligence models (machine learning and deep learning) varied in complexity and generally used more parameters and more input, including input engineered from the provided data (e.g., multi-day averages).
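To make "relatively simple response functions with few parameters" concrete, the sketch below convolves a daily recharge series with an exponential block response, the simplest member of the family of response functions (such as the Gamma function) used in lumped-parameter models. The function names, the three parameters (gain `A`, time scale `a`, base level `d`), and the use of recharge as the only input are assumptions for illustration; the participating teams' models may use other response functions and additional stresses.

```python
import math


def exponential_block_response(A: float, a: float, n_days: int) -> list[float]:
    """Daily block response of an exponential response function (hypothetical helper).

    Step response: s(t) = A * (1 - exp(-t / a)).
    Block response for a one-day unit pulse: b(t) = s(t) - s(t - 1).
    """
    step = [A * (1.0 - math.exp(-t / a)) for t in range(n_days + 1)]
    return [step[t] - step[t - 1] for t in range(1, n_days + 1)]


def simulate_heads(recharge: list[float], A: float, a: float, d: float,
                   memory: int = 3650) -> list[float]:
    """Hypothetical model: head(t) = d + sum_k b(k) * recharge(t - k).

    The convolution is truncated after `memory` days of system memory.
    """
    block = exponential_block_response(A, a, memory)
    heads = []
    for t in range(len(recharge)):
        n_terms = min(t + 1, memory)
        contribution = sum(block[k] * recharge[t - k] for k in range(n_terms))
        heads.append(d + contribution)
    return heads


# Usage: constant recharge of 2 mm/day (0.002 m/day) for 500 days.
heads = simulate_heads([0.002] * 500, A=100.0, a=20.0, d=10.0)
print(round(heads[-1], 4))  # prints 10.2, i.e. d + A * r
```

With a constant recharge r, the simulated head rises toward d + A * r, so `A` acts as the gain of the response and `a` controls how quickly that equilibrium is reached.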

The models were evaluated on their ability to simulate the heads in both the calibration period and the validation period. Several metrics were used to assess performance, including metrics for average relative fit, average absolute fit, fit of extreme (high or low) heads, and coverage of the uncertainty interval. For all wells, reasonable performance was obtained by at least one team from each of the three groups. However, performance was not consistent across submissions within each group, which implies that applying any of these methods to an individual site requires significant effort and experience. Estimates of the uncertainty interval in particular varied widely between teams; note that some teams submitted confidence intervals rather than prediction intervals. No single team, let alone a single method, performed best for all wells and all performance metrics. Lumped-parameter models generally performed as well as artificial intelligence models, except at the well in the USA, where the lumped-parameter models did not use the provided river stage data (or did not use it to full benefit), and these data were crucial for obtaining a good model. In conclusion, the challenge was a successful initiative for comparing different models and for participants to learn from each other. Future challenges are needed to investigate, for example, model performance in more variable climatic settings, the simulation of head series with significant gaps, or the estimation of the effect of drought periods.
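The four metric categories named above can be sketched as follows. Using the Nash-Sutcliffe efficiency for relative fit, the root-mean-square error (RMSE) for absolute fit, RMSE on the lowest observed heads for extremes (high heads would be handled analogously), and the fraction of observations inside the interval for coverage are assumptions chosen for illustration, not necessarily the exact metrics used in the challenge; the function names are hypothetical.

```python
import math


def nse(obs: list[float], sim: list[float]) -> float:
    """Nash-Sutcliffe efficiency (relative fit): 1 - SSE / total sum of squares."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs: list[float], sim: list[float]) -> float:
    """Root-mean-square error (absolute fit), in the units of the heads."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def rmse_low(obs: list[float], sim: list[float], quantile: float = 0.2) -> float:
    """RMSE restricted to observed heads at or below a simple empirical quantile."""
    threshold = sorted(obs)[int(quantile * (len(obs) - 1))]
    pairs = [(o, s) for o, s in zip(obs, sim) if o <= threshold]
    return rmse([o for o, _ in pairs], [s for _, s in pairs])


def coverage(obs: list[float], lower: list[float], upper: list[float]) -> float:
    """Fraction of observations that fall inside the interval [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(nse(obs, sim), 3), round(rmse(obs, sim), 3))  # prints 0.99 0.141
print(coverage(obs, [0.9, 1.8, 3.3, 3.7, 4.9], [1.2, 2.1, 3.5, 3.9, 5.1]))  # prints 0.6
```

A coverage well below the nominal level (e.g., 0.95 for a 95% interval) is what one would expect when a confidence interval, which reflects only parameter uncertainty, is scored as if it were a prediction interval.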


Raoul Collenteur

Eawag - Swiss Federal Institute of Aquatic Science and Technology

Ezra Haaf

Chalmers, Architecture and Civil Engineering, Geology and Geotechnics

Mark Bakker

TU Delft

Tanja Liesch

Karlsruher Institut für Technologie (KIT)

A. Wunsch

Subject areas: Other Computer and Information Science; Oceanography, Hydrology and Water Resources; Sustainable Development