A Reference Model for Empirically Comparing LLMs with Humans
Paper in proceedings, 2025

Large Language Models (LLMs) have shown stunning abilities to carry out tasks that were previously performed by humans. The future role of humans and the responsibilities assigned to non-human LLMs fundamentally affect society. In that context, LLMs have often been compared to humans. However, it is surprisingly difficult to make a fair empirical comparison between humans and LLMs. To address those difficulties, we aim to establish a systematic approach that guides researchers in comparing LLMs with humans across various tasks. In a literature review, we examined key differences and similarities among several existing studies. Based on that literature exploration, we developed a reference model of the information flow. We propose a framework to support researchers in designing and executing studies and in assessing LLMs with respect to humans. Future studies can use the reference model as guidance for designing and reporting their own unique study designs. We want to support researchers and society in taking a maturation step in this emerging and constantly growing field.

LLM

empirical evaluation

Reference Model

Authors

Kurt Schneider

Leibniz Universität Hannover

Farnaz Fotrousi

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

University of Gothenburg

Rebekka Wohlrab

Carnegie Mellon University (CMU)

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

Proceedings - International Conference on Software Engineering

0270-5257 (ISSN)

130-134
9798331537074 (ISBN)

47th IEEE/ACM International Conference on Software Engineering: Software Engineering in Society, ICSE-SEIS 2025
Ottawa, Canada

WASP SAS

Wallenberg AI, Autonomous Systems and Software Program, 2018-01-01 -- 2023-01-01.

Subject categories (SSIF 2025)

Software Engineering

Human-Computer Interaction (Interaction Design)

DOI

10.1109/ICSE-SEIS66351.2025.00018


Last updated

2025-07-02