FEDAMON: A Forecast-Based, Error-Bounded and Data-Aware Approach to Continuous Distributed Monitoring
Paper in proceeding, 2025

Efficiently monitoring distributed systems is critical for applications such as data center load balancing, fleet management, and smart grid energy optimization. Traditional continuous monitoring solutions often incur significant communication overhead, straining network resources. This paper addresses the continuous distributed monitoring problem, in which a central coordinator must track statistics from numerous distributed nodes in real time. We propose a novel forecast-based, error-bounded, and data-aware approach that significantly reduces communication costs while maintaining accurate monitoring. Instead of transmitting every observed value to the central coordinator, our event-based monitoring leverages lightweight forecasting models at edge nodes. Both the coordinator and the distributed nodes predict the evolution of local values, and a node communicates only when the deviation exceeds a predefined error threshold. To adapt to dynamically changing trends in data streams, we introduce a data-aware model selection strategy that optimizes the balance between communication frequency and monitoring accuracy. We evaluate our solution on diverse datasets, and the results demonstrate a substantial reduction in communication overhead with minimal impact on accuracy, outperforming baseline monitoring in communication complexity: on average, our approach sends only 10% of the baseline's update events while keeping the average error below 2% across all monitored streams. Furthermore, we show that our solution with standard parameters even surpasses the best-calibrated single models, achieving up to a 17% improvement in communication overhead with identical guarantees on maximum error. Optimizing the control factor of the data-aware approach leads to a 13% improvement in performance, reducing error by 1%, without incurring additional communication costs. We believe our approach offers a scalable and efficient solution, enabling fully automatic, real-time monitoring with optimized performance.
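The event-based, error-bounded scheme summarized in the abstract can be illustrated with a minimal Python sketch of one node and the coordinator. This is not the authors' implementation: the two candidate forecasters (`constant_model`, `linear_model`), the window-based selection rule `select_model`, and the parameters `eps` and `window_size` are hypothetical stand-ins chosen only to show the mechanism. Both sides share the same anchor (last synchronized value, slope, and model name), so the coordinator can forecast without messages; the node sends an update only when its observed value deviates from that shared forecast by more than `eps`, which keeps every coordinator estimate within `eps` of the true value.

```python
from typing import List, Tuple

# Hypothetical candidate forecasters. Each extrapolates from an anchor
# (t0, v0) and a slope that the node ships with every update message.
def constant_model(t0: int, v0: float, slope: float, t: int) -> float:
    return v0  # ignores the trend: predicts the last synchronized value

def linear_model(t0: int, v0: float, slope: float, t: int) -> float:
    return v0 + slope * (t - t0)  # extrapolates the last observed trend

MODELS = {"constant": constant_model, "linear": linear_model}

def select_model(window: List[float]) -> str:
    """Hypothetical data-aware rule: pick the model with the lowest total
    one-step-ahead absolute error over the node's recent window."""
    if len(window) < 3:
        return "constant"  # too little data to judge a trend
    errors = {}
    for name, model in MODELS.items():
        err = 0.0
        for i in range(1, len(window) - 1):
            slope = window[i] - window[i - 1]
            err += abs(model(i, window[i], slope, i + 1) - window[i + 1])
        errors[name] = err
    return min(errors, key=errors.get)  # ties go to the first model listed

def monitor(stream: List[float], eps: float,
            window_size: int = 5) -> Tuple[List[float], int]:
    """Simulate one node and the coordinator over a stream.
    Returns the coordinator's estimates and the number of updates sent."""
    estimates: List[float] = []
    messages = 0
    anchor = None  # (t0, v0, slope, model name), identical on both sides
    for t, value in enumerate(stream):
        if anchor is not None:
            t0, v0, slope, name = anchor
            predicted = MODELS[name](t0, v0, slope, t)
        if anchor is None or abs(value - predicted) > eps:
            # Deviation exceeds eps: the node sends an update and both
            # sides resynchronize on a freshly selected model.
            history = stream[max(0, t - window_size + 1): t + 1]
            slope = history[-1] - history[-2] if len(history) >= 2 else 0.0
            anchor = (t, value, slope, select_model(history))
            messages += 1
            predicted = value
        estimates.append(predicted)
    return estimates, messages

if __name__ == "__main__":
    ramp = [float(t) for t in range(20)]
    estimates, messages = monitor(ramp, eps=0.5)
    print(messages)  # 3 updates instead of 20 for a send-every-value baseline
```

On a steady ramp, the sketch switches to the linear forecaster after the third observation and sends nothing afterwards, while a send-every-value baseline would send all 20 values. A real deployment would use richer forecasters and a tuned selection rule; the point here is only that the error bound follows from the resynchronization rule, independently of which model is selected.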

network monitoring

distributed tracking

continuous monitoring

distributed data streams

data-aware approaches

Author

Yixing Zhang

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

University of Gothenburg

Romaric Duvignau

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

DEBS 2025 - Proceedings of the 19th ACM International Conference on Distributed and Event-based Systems

39-50
979-8-4007-1332-3 (ISBN)

19th ACM International Conference on Distributed and Event-based Systems
Gothenburg, Sweden,

READY: Rethinking Monitoring for Large Distributed Systems

Computer Science and Engineering (Chalmers), 2024-03-01 -- 2029-03-01.

Subject Categories (SSIF 2025)

Computer Sciences

DOI

10.1145/3701717.3730544

ISBN

9798400713323

More information

Latest update

8/22/2025