Human-Grounded Evaluation of Large Language Models for Optical Network Automation
Other conference contribution, 2026

Large language models (LLMs) are increasingly adopted for network automation, yet their output quality and inference cost can vary substantially across LLM families. We present HuGLEN, a stepwise evaluation pipeline that uses an LLM-as-a-judge together with a small set of expert ratings to enable scalable and reproducible comparison of candidate LLMs, and to rank them using a quality-efficiency score (QES). We demonstrate HuGLEN on translating outputs from an explainable artificial intelligence (XAI) model for an optical network quality of transmission (QoT) estimation task into operator-friendly explanations. Our results show that a medium-sized LLM (12B parameters) achieves the highest QES, indicating the best trade-off between explanation quality and efficiency. Overall, HuGLEN reduces the human-labeling burden while supporting consistent model selection for operator-facing automation tasks.

LLM evaluation

Quality-Efficiency Score (QES)

Quality of Transmission (QoT)

human-grounded evaluation

network automation

energy efficiency

large language models (LLMs)

explainable AI (XAI)

Author

Kiarash Rezaei

Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks

Omran Ayoub

University of Applied Sciences and Arts of Southern Switzerland

Paolo Monti

Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks

Carlos Natalino Da Silva

Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks

26th International Conference on Transparent Optical Networks ICTON 2026
Prague, Czech Republic

Sustainable Technologies for Advanced Resilient and Energy-Efficient Networks - Advance

VINNOVA (2025-02987), 2025-12-01 -- 2028-11-17.

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2025)

Communication Systems

Telecommunications

Computer and Information Sciences

Infrastructure

C3SE (-2020, Chalmers Centre for Computational Science and Engineering)

More information

Latest update

6/2/2026