A Reference Model for Empirically Comparing LLMs with Humans
Paper in proceedings, 2025

Large Language Models (LLMs) have shown stunning abilities to carry out tasks that were previously performed by humans. The future role of humans and the responsibilities assigned to non-human LLMs affect society fundamentally. In that context, LLMs have often been compared to humans. However, making a fair empirical comparison between humans and LLMs is surprisingly difficult. To address these difficulties, we aim to establish a systematic approach that guides researchers in comparing LLMs with humans across various tasks. In a literature review, we examined key differences and similarities among several existing studies. Based on that exploration, we developed a reference model of the information flow. We propose a framework that supports researchers in designing and executing studies, and in assessing LLMs with respect to humans. Future studies can use the reference model as guidance for designing and reporting their own unique study designs. We want to support researchers and society in taking a maturation step in this emerging and constantly growing field.

LLM

empirical evaluation

Reference Model

Authors

Kurt Schneider

University of Hanover

Farnaz Fotrousi

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

University of Gothenburg

Rebekka Wohlrab

Carnegie Mellon University (CMU)

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

Proceedings - International Conference on Software Engineering

0270-5257 (ISSN)

130-134

9798331537074 (ISBN)

47th IEEE/ACM International Conference on Software Engineering: Software Engineering in Society, ICSE-SEIS 2025
Ottawa, Canada

WASP SAS: Structuring data for continuous processing and ML systems

Wallenberg AI, Autonomous Systems and Software Program, 2018-01-01 -- 2023-01-01.

Subject Categories (SSIF 2025)

Software Engineering

Human Computer Interaction

DOI

10.1109/ICSE-SEIS66351.2025.00018

More information

Latest update

7/2/2025