Robust and Interpretable Machine Learning for Network Quality Prediction with Noisy and Incomplete Data
Journal article, 2025

Accurate classification of optical communication signal quality is crucial for maintaining the reliability and performance of high-speed communication networks. While existing supervised learning approaches achieve high accuracy on laboratory-collected datasets, they often generalize poorly to real-world conditions because controlled experimental data lack variability and noise. In this study, we propose a targeted data augmentation framework designed to improve the robustness and generalization of binary optical signal quality classifiers. Using the OptiCom Signal Quality Dataset, we systematically inject controlled perturbations into the training data, including label boundary flipping, Gaussian noise addition, and missing-value simulation. To further approximate real-world deployment scenarios, the test set is subjected to additional distribution shifts, including feature drift and scaling. Experiments are conducted under 5-fold cross-validation to evaluate the individual and combined impacts of the augmentation strategies. Results show that the optimal augmentation setting (flip_rate = 0.10, noise_level = 0.50, missing_rate = 0.20) substantially improves robustness to unseen distributions, raising accuracy from 0.863 to 0.950, precision from 0.384 to 0.632, F1 from 0.551 to 0.771, and ROC-AUC from 0.926 to 0.999 compared with a model trained without augmentation. This work illustrates how to balance data augmentation intensity so that generalization improves without unduly compromising accuracy on clean data.
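The three training-set perturbations named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how they are implemented, so everything below is an assumption: the function name `augment_training_data` is hypothetical, labels are flipped uniformly at random (a simplification of the "label boundary flipping" described above), Gaussian noise is scaled by each feature's standard deviation, and missing values are represented as NaN. Only the default parameter values are taken from the reported optimal setting.

```python
import numpy as np


def augment_training_data(X, y, flip_rate=0.10, noise_level=0.50,
                          missing_rate=0.20, seed=0):
    """Return perturbed copies of (X, y); the inputs are left unchanged.

    Illustrative sketch only, not the authors' implementation.
    X: (n_samples, n_features) numeric array, y: binary labels in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    X_aug = np.asarray(X, dtype=float).copy()
    y_aug = np.asarray(y).copy()

    # 1) Label flipping (assumption: uniform random subset, not boundary-aware).
    n_flip = int(round(flip_rate * len(y_aug)))
    flip_idx = rng.choice(len(y_aug), size=n_flip, replace=False)
    y_aug[flip_idx] = 1 - y_aug[flip_idx]

    # 2) Gaussian noise (assumption: zero mean, std = noise_level * feature std).
    feature_std = X_aug.std(axis=0)
    X_aug += rng.normal(0.0, 1.0, size=X_aug.shape) * noise_level * feature_std

    # 3) Missing-value simulation (assumption: entries dropped independently
    #    at random and marked as NaN).
    missing_mask = rng.random(X_aug.shape) < missing_rate
    X_aug[missing_mask] = np.nan

    return X_aug, y_aug
```

Because the output contains NaN entries, a downstream classifier would need either native missing-value support or an imputation step; which of the two the study uses is not stated in the abstract.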

optical networks

robust machine learning

SHAP interpretability

data augmentation

quality of transmission (QoT) estimation

label noise

Authors

Pei Huang

Student at Chalmers

Yicheng Li

Aarhus Universitet

Hai Gong

Zhejiang University

Herman Koara

Zhejiang University

Photonics

2304-6732 (eISSN)

Vol. 12, Issue 10, 965

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2025)

Computer Sciences

Artificial Intelligence

DOI

10.3390/photonics12100965

More information

Last updated

2025-11-05