FEDAMON: A Forecast-Based, Error-Bounded and Data-Aware Approach to Continuous Distributed Monitoring
Paper in proceedings, 2025

Efficiently monitoring distributed systems is critical for applications such as data center load balancing, fleet management, and smart grid energy optimization. Traditional continuous monitoring solutions often incur significant communication overhead, straining network resources. This paper addresses the continuous distributed monitoring problem, in which a central coordinator must track statistics from numerous distributed nodes in real time. We propose a novel forecast-based, error-bounded, and data-aware approach that significantly reduces communication costs while maintaining accurate monitoring. Instead of transmitting every observed value to the central coordinator, our event-based monitoring leverages lightweight forecasting models at edge nodes. Both the coordinator and the distributed nodes predict the evolution of local values, and they communicate only when deviations exceed a predefined error threshold. To adapt to dynamically changing trends in data streams, we introduce a data-aware model selection strategy that optimizes the balance between communication frequency and monitoring accuracy. We evaluate our solution on diverse datasets, and the results demonstrate a substantial reduction in communication overhead with minimal impact on accuracy. Our approach outperforms baseline monitoring in terms of communication complexity, for example sending, on average, only 10% of the baseline's update events while keeping the average error below 2% across all monitored streams. Furthermore, we show that our solution with standard parameters even surpasses the best-calibrated single models, achieving up to a 17% improvement in communication overhead with identical guarantees on maximum error. Optimizing the control factor in the data-aware approach yields a 13% improvement in performance, reducing error by 1% without incurring additional communication costs. We believe our approach offers a scalable and efficient solution, enabling fully automatic, real-time monitoring with optimized performance.
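The dual-prediction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two forecasters (`last_value`, `linear_trend`), the window-based selection rule, the `Node`/`Coordinator` classes, and the choice to ship the last two values plus a model name with each update are all hypothetical, chosen only for illustration. What it does show is the invariant behind the error bound. The node and the coordinator evaluate the same forecast from the same shared anchor, so the node transmits only when that forecast would miss the true value by more than `eps`, and the coordinator's estimate therefore never deviates by more than `eps`.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lightweight forecasters: (recent values, steps ahead) -> prediction.
def last_value(history: List[float], h: int) -> float:
    return history[-1]

def linear_trend(history: List[float], h: int) -> float:
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2]) * h

MODELS: Dict[str, Callable[[List[float], int], float]] = {
    "last_value": last_value,
    "linear_trend": linear_trend,
}

# Anchor shared by node and coordinator: (values sent, model name, sync time).
Anchor = Tuple[List[float], str, int]

@dataclass
class Coordinator:
    anchors: Dict[str, Anchor] = field(default_factory=dict)

    def on_update(self, node: str, anchor: Anchor) -> None:
        self.anchors[node] = anchor

    def estimate(self, node: str, t: int) -> float:
        # Forecast from the last anchor; no message is needed between updates.
        values, model, t0 = self.anchors[node]
        return MODELS[model](values, t - t0)

@dataclass
class Node:
    name: str
    eps: float       # maximum tolerated deviation at the coordinator
    window: int = 8  # recent steps used for the (assumed) data-aware choice
    history: List[float] = field(default_factory=list)
    anchor: Optional[Anchor] = None
    sent: int = 0

    def select_model(self) -> str:
        """Assumed rule: lowest mean one-step-ahead error on recent data."""
        n = len(self.history)
        if n < 2:
            return "last_value"
        start = max(1, n - self.window)

        def recent_error(name: str) -> float:
            f = MODELS[name]
            errs = [abs(f(self.history[max(0, i - 2):i], 1) - self.history[i])
                    for i in range(start, n)]
            return sum(errs) / len(errs)

        return min(MODELS, key=recent_error)  # ties go to the first model

    def observe(self, x: float, t: int, coord: Coordinator) -> None:
        self.history.append(x)
        if self.anchor is not None:
            values, model, t0 = self.anchor
            # Same computation the coordinator performs; stay silent while
            # the shared forecast is within eps of the observed value.
            if abs(MODELS[model](values, t - t0) - x) <= self.eps:
                return
        # First value or deviation too large: resynchronize the coordinator.
        self.anchor = (self.history[-2:], self.select_model(), t)
        coord.on_update(self.name, self.anchor)
        self.sent += 1

def run(stream: List[float], eps: float) -> Tuple[int, float]:
    """Feed one stream through a node; return (#updates, max coordinator error)."""
    coord, node = Coordinator(), Node("n1", eps)
    worst = 0.0
    for t, x in enumerate(stream):
        node.observe(x, t, coord)
        worst = max(worst, abs(coord.estimate("n1", t) - x))
    return node.sent, worst

sent, worst = run([10 * math.sin(t / 10) for t in range(200)], eps=0.5)
print(f"{sent} updates for 200 observations, max error {worst:.3f}")
```

Because a resynchronization resets the forecast horizon to zero, the coordinator's estimate equals the observed value at every update step, so the bound holds without extra messages. On a perfectly linear stream, this sketch sends three updates and then switches permanently to `linear_trend`.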

network monitoring

distributed tracking

data-aware approaches

continuous monitoring

distributed data streams

Authors

Yixing Zhang

Network and Systems

Romaric Duvignau

Network and Systems

DEBS 2025 - Proceedings of the 19th ACM International Conference on Distributed and Event-based Systems

39-50
979-8-4007-1332-3 (ISBN)

19th ACM International Conference on Distributed and Event-based Systems
Gothenburg, Sweden

READY: Rethinking Monitoring for Large Distributed Systems

Computer Science and Engineering (Chalmers), 2024-03-01 -- 2029-03-01.

Subject Categories (SSIF 2025)

Computer Sciences

DOI

10.1145/3701717.3730544

Latest update

6/10/2025