FEDAMON: A fully automated framework for communication-efficient continuous distributed monitoring with error guarantees
Journal article, 2026

Efficient monitoring of large distributed systems is critical for applications such as data center load balancing, fleet management, and smart grid energy optimization. This paper addresses the continuous distributed monitoring problem, in which a central coordinator tracks statistics from numerous distributed nodes in real time. We present FEDAMON, a novel Forecast-based, Error-bounded, and Data-Aware approach to continuous distributed Monitoring that significantly reduces communication costs while maintaining accuracy. Instead of transmitting all values to the coordinator, our event-based monitoring runs lightweight forecasting models at both the coordinator and the distributed nodes to predict the evolution of observations, and communicates only when deviations exceed an error threshold. To adapt to dynamically changing data streams, we introduce a data-aware model selection strategy that optimizes the trade-off between communication and accuracy. Our solution is communication-efficient, fully automated, and equipped with dynamic error control, reducing system parametrization to a single error tolerance chosen from three preset levels. On average across all streams of diverse datasets, FEDAMON reduces communication to 10% of the baseline with less than 2% error. Moreover, the standard parameter solution surpasses even the best calibrated single models across all error bounds, achieving up to 33% improvement in communication efficiency with identical error guarantees. Further gains of 25% in accuracy are obtained by tuning the data-aware control factor without extra cost. In addition, our framework generalizes effectively to previously unseen datasets. Finally, our dynamic error control achieves performance comparable to fixed bounds. The results highlight the scalability and robustness of FEDAMON, enabling fully automatic, real-time monitoring with large communication savings and marginal error.
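The forecast-based, event-triggered idea described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the FEDAMON source code. The class LinearForecaster, the function monitor, and the choice of linear extrapolation as the lightweight model are all assumptions made for illustration, and the sketch omits data-aware model selection and dynamic error control. It only shows that when a node and the coordinator run the same deterministic forecaster, the node needs to transmit only when its observation deviates from the shared prediction by more than the error threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class LinearForecaster:
    """Hypothetical lightweight model (an assumption, not FEDAMON's own):
    extrapolates one step ahead from the last estimate and its slope."""
    last: float = 0.0
    slope: float = 0.0

    def predict(self) -> float:
        return self.last + self.slope

    def advance(self) -> None:
        # No message this step: roll forward on the model's own prediction.
        self.last = self.predict()

    def sync(self, value: float) -> None:
        # Message this step: reset the model from the true observation.
        self.slope = value - self.last
        self.last = value


def monitor(stream: Iterable[float], epsilon: float) -> Tuple[List[float], int]:
    """Simulate one node and the coordinator; return the coordinator's
    estimates and the number of messages the node had to send."""
    node_model = LinearForecaster()
    coord_model = LinearForecaster()  # identical copy kept at the coordinator
    estimates: List[float] = []
    messages = 0
    for x in stream:
        if abs(x - node_model.predict()) > epsilon:
            # Deviation exceeds the threshold: transmit the observation.
            node_model.sync(x)
            coord_model.sync(x)  # stands in for the network message
            messages += 1
        else:
            # Prediction is good enough: stay silent, both sides extrapolate.
            node_model.advance()
            coord_model.advance()
        estimates.append(coord_model.last)
    return estimates, messages
```

In this sketch the coordinator's estimate at every step is either the transmitted observation or a prediction that the node has checked to be within epsilon of it, so the per-step error never exceeds epsilon. How many messages are saved depends on how well the assumed forecaster matches the stream.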

Distributed tracking

Continuous monitoring

Distributed data streams

Data-aware approaches

Network monitoring

Authors

Yixing Zhang

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

University of Gothenburg

Romaric Duvignau

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

University of Gothenburg

Information Systems

0306-4379 (ISSN)

Vol. 141 102751

READY: Rethinking Monitoring for Large Distributed Systems

Computer Science and Engineering (Chalmers), 2024-03-01 -- 2029-03-01.

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2025)

Computer Sciences

Networked, Parallel and Distributed Computing

Infrastructure

C3SE (-2020, Chalmers Centre for Computational Science and Engineering)

DOI

10.1016/j.is.2026.102751

Related datasets

Source code [dataset]

URI: https://github.com/yixingzhang11/fedamon/tree/v1.0

DOI: https://doi.org/10.5281/zenodo.15310950

More information

Latest update

6/11/2026