FEDAMON: A Forecast-Based, Error-Bounded and Data-Aware Approach to Continuous Distributed Monitoring
Paper in proceedings, 2025

Efficiently monitoring distributed systems is critical for applications
such as data center load balancing, fleet management, and smart
grid energy optimization. Traditional continuous monitoring solutions
often require significant communication overhead, straining
network resources. This paper addresses the continuous distributed
monitoring problem, where a central coordinator needs to track
statistics from numerous distributed nodes in real-time. We propose
a novel forecast-based, error-bounded, and data-aware approach
that significantly reduces communication costs while maintaining
accurate monitoring. Instead of transmitting all observed values
to the central coordinator, our event-based monitoring leverages
lightweight forecasting models at edge nodes. Both the coordinator
and distributed nodes predict the evolution of local values,
communicating only when deviations exceed a predefined error
threshold. To adapt to dynamically changing trends in data streams,
we introduce a data-aware model selection strategy that optimizes
the balance between communication frequency and monitoring
accuracy. Our solution is evaluated on diverse datasets, and the results
demonstrate a substantial reduction in communication overhead
with minimal impact on accuracy, outperforming baseline monitoring
in terms of communication complexity, e.g., sending, on average,
only 10% of baseline update events while maintaining less than
2% average error across all monitored streams. Furthermore, we
show that our standard parameter solution even surpasses the best
calibrated single models, achieving up to a 17% improvement in
communication overhead with identical guarantees on maximum
error. Optimizing the control factor in the data-aware approach leads to
a 13% improvement in performance, reducing error by 1%, without
incurring additional communication costs. We believe our approach
offers a scalable and efficient solution, enabling fully automatic,
real-time monitoring with optimized performance.
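
The event-based scheme described in the abstract, where node and coordinator run the same forecast and communicate only when the deviation exceeds the error threshold, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `LinearForecaster` model, the `monitor` function, and the choice of linear extrapolation as the lightweight forecast. The data-aware model selection step is not shown.

```python
from dataclasses import dataclass


@dataclass
class LinearForecaster:
    """Hypothetical lightweight model: extrapolates linearly from the
    last two synchronized points. Not the model used in the paper."""
    last_t: int = 0
    last_value: float = 0.0
    slope: float = 0.0

    def predict(self, t: int) -> float:
        # Forecast the value at time t from the last synchronized state.
        return self.last_value + self.slope * (t - self.last_t)

    def sync(self, t: int, value: float) -> None:
        # Refresh the model with an exact observation.
        if t > self.last_t:
            self.slope = (value - self.last_value) / (t - self.last_t)
        self.last_t, self.last_value = t, value


def monitor(stream, epsilon):
    """Simulate one node and the coordinator over a stream of values.

    Returns the coordinator's estimates and the number of update
    messages the node had to send.
    """
    node_model = LinearForecaster()
    coord_model = LinearForecaster()  # identical copy held by the coordinator
    estimates, messages = [], 0
    for t, value in enumerate(stream):
        # The node sends only if its own forecast misses by more than epsilon.
        if t == 0 or abs(value - node_model.predict(t)) > epsilon:
            node_model.sync(t, value)
            coord_model.sync(t, value)  # stands in for a network message
            messages += 1
        estimates.append(coord_model.predict(t))
    return estimates, messages


if __name__ == "__main__":
    stream = [0, 1, 2, 3, 4, 5, 10, 11, 12]
    estimates, messages = monitor(stream, epsilon=0.5)
    print(f"sent {messages} of {len(stream)} values")
```

Because both sides hold the same model state, the coordinator's estimate is exact right after an update and stays within `epsilon` of the true value whenever the node remains silent, which is the error-bound property the abstract refers to.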

network monitoring

distributed tracking

data-aware approaches

continuous monitoring

distributed data streams

Authors

Yixing Zhang

Networks and Systems

Romaric Duvignau

Networks and Systems

DEBS 2025 - Proceedings of the 19th ACM International Conference on Distributed and Event-based Systems

39-50
979-8-4007-1332-3 (ISBN)

19th ACM International Conference on Distributed and Event-based Systems
Gothenburg, Sweden,

READY: Rethinking Monitoring for Large Distributed Systems

Computer Science and Engineering, 2024-03-01 -- 2029-03-01.

Subject categories (SSIF 2025)

Computer Sciences

DOI

10.1145/3701717.3730544

More information

Last updated

2025-06-10