Human versus GPT-4 in qualitative analysis: A comparative reanalysis of patient interview data following anterior cruciate ligament injury rehabilitation
Journal article, 2026
Results: While the human-made analysis produced one overarching theme supported by three main categories and nine sub-categories, GPT-4's analysis resulted in four themes, six main categories, and 15 sub-categories. Both analyses captured uncertainty and the impact of knee-related symptoms. GPT-4's results showed a suspiciously equal distribution of codes across sub-categories, and introduced a theme not grounded in the source data. Multiple prompts were required to produce and organize the material.
Conclusion: The analyses performed by humans and GPT-4 showed both similarities and differences. The use of GPT-4 for qualitative analysis in its present form is challenging and needs to be performed across several steps. Currently, GPT-4 should not be used as the only tool in a qualitative analysis of interview data.
Qualitative research
Language processing
Rehabilitation
Author
Ramana Piussi
Sahlgrenska University Hospital
University of Gothenburg
Justin Schneiderman
University of Gothenburg
Yinan Yu
Chalmers, Computer Science and Engineering (Chalmers), Functional Programming
Kristian Samuelsson
Sahlgrenska University Hospital
University of Gothenburg
Eric Hamrin Senorski
University of Gothenburg
Sahlgrenska University Hospital
Knee
0968-0160 (ISSN) 1873-5800 (eISSN)
Vol. 60 104388
Subject Categories (SSIF 2025)
Orthopaedics
Artificial Intelligence
DOI
10.1016/j.knee.2026.104388
PubMed
41707572