Robust and Interpretable Machine Learning for Network Quality Prediction with Noisy and Incomplete Data
Journal article, 2025
Accurate classification of optical communication signal quality is crucial for maintaining the reliability and performance of high-speed communication networks. Existing supervised learning approaches achieve high accuracy on laboratory-collected datasets, but they often generalize poorly to real-world conditions because controlled experimental data lack the variability and noise found in deployment. In this study, we propose a targeted data augmentation framework designed to improve the robustness and generalization of binary optical signal quality classifiers. Using the OptiCom Signal Quality Dataset, we systematically inject controlled perturbations into the training data, including label boundary flipping, Gaussian noise addition, and missing-value simulation. To further approximate real-world deployment scenarios, the test set is subjected to additional distribution shifts, including feature drift and scaling. Experiments are conducted under 5-fold cross-validation to evaluate the individual and combined impacts of the augmentation strategies. Results show that the optimal augmentation setting (flip_rate = 0.10, noise_level = 0.50, missing_rate = 0.20) substantially improves robustness to unseen distributions, raising accuracy from 0.863 to 0.950, precision from 0.384 to 0.632, F1 from 0.551 to 0.771, and ROC-AUC from 0.926 to 0.999 compared to the model trained without augmentation. This work illustrates how augmentation intensity can be balanced to improve generalization without unduly compromising accuracy on clean data.
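The three training-set perturbations and the test-set shift described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `augment_training_data` and `shift_test_data` are hypothetical, labels are flipped uniformly at random (the abstract does not specify how the boundary-aware flipping selects samples), `noise_level` is interpreted as a multiple of each feature's standard deviation, missing values are assumed to be missing completely at random, and the `drift` and `scale` parameters are placeholders with no values taken from the paper.

```python
import numpy as np


def augment_training_data(X, y, flip_rate=0.10, noise_level=0.50,
                          missing_rate=0.20, seed=0):
    """Return perturbed copies of (X, y); the inputs are left unchanged.

    Assumes y holds binary labels encoded as 0/1.
    """
    rng = np.random.default_rng(seed)
    X_aug = np.asarray(X, dtype=float).copy()
    y_aug = np.asarray(y).copy()
    n_samples = X_aug.shape[0]

    # 1) Label flipping: invert a fraction `flip_rate` of the labels.
    #    Assumption: uniform random selection stands in for the
    #    boundary-aware selection named in the abstract.
    n_flip = int(round(flip_rate * n_samples))
    flip_idx = rng.choice(n_samples, size=n_flip, replace=False)
    y_aug[flip_idx] = 1 - y_aug[flip_idx]

    # 2) Gaussian noise: zero-mean noise whose standard deviation is
    #    `noise_level` times each feature's own standard deviation
    #    (an assumed interpretation of noise_level).
    feature_std = X_aug.std(axis=0)
    X_aug += rng.normal(0.0, 1.0, size=X_aug.shape) * noise_level * feature_std

    # 3) Missing values: blank out each entry independently with
    #    probability `missing_rate` (assumed missing completely at random).
    missing_mask = rng.random(X_aug.shape) < missing_rate
    X_aug[missing_mask] = np.nan

    return X_aug, y_aug


def shift_test_data(X, drift=0.1, scale=1.1):
    """Apply a simple distribution shift to test features.

    `scale` multiplies every feature; `drift` adds an offset expressed in
    units of each feature's standard deviation. Both are hypothetical.
    """
    X = np.asarray(X, dtype=float)
    return X * scale + drift * X.std(axis=0)
```

Because the augmented features contain NaN entries, a downstream classifier would need either an imputation step or native missing-value support; the abstract does not say which approach the study uses.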
optical networks
robust machine learning
SHAP interpretability
data augmentation
quality of transmission (QoT) estimation
label noise