Human-Grounded Evaluation of Large Language Models for Optical Network Automation
Other conference contribution, 2026

Large language models (LLMs) are increasingly adopted for network automation, yet their output quality and inference cost can vary substantially across LLM families. We present HuGLEN, a stepwise evaluation pipeline that uses an LLM-as-a-judge together with a small set of expert ratings to enable scalable and reproducible comparison of candidate LLMs, and to rank them using a quality-efficiency score (QES). We demonstrate HuGLEN on the task of translating outputs from an explainable artificial intelligence (XAI) model for optical network quality of transmission (QoT) estimation into operator-friendly explanations. Our results show that a medium-sized LLM (12B parameters) achieves the highest QES, indicating the best trade-off between explanation quality and efficiency. Overall, HuGLEN reduces the human-labeling burden while supporting consistent model selection for operator-facing automation tasks.

LLM evaluation

Quality-Efficiency Score (QES)

Quality of Transmission (QoT)

human-grounded evaluation

network automation

energy efficiency

large language models (LLMs)

explainable AI (XAI)

Authors

Kiarash Rezaei

Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks

Omran Ayoub

University of Applied Sciences and Arts of Southern Switzerland

Paolo Monti

Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks

Carlos Natalino Da Silva

Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks

26th International Conference on Transparent Optical Networks ICTON 2026
Prague, Czech Republic,

Sustainable technologies for advanced, resilient and energy-efficient networks - Advance

VINNOVA (2025-02987), 2025-12-01 -- 2028-11-17.

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2025)

Communication Systems

Telecommunications

Computer and Information Sciences (Computer Engineering)

Infrastructure

C3SE (-2020, Chalmers Centre for Computational Science and Engineering)

More information

Last updated

2026-06-02