A practical guide to the implementation of AI in orthopaedic research, Part 6: How to evaluate the performance of AI research?

Felix C. Oettl; Ayoosh Pareek; Philipp W. Winkler; Bálint Zsidai; James Pruneski; Eric Hamrin Senorski; Sebastian Kopf; Christophe Ley; Elmar Herbst; Jacob F. Oeding; Alberto Grassi; Michael T. Hirschmann; Volker Musahl; Kristian Samuelsson; Thomas Tischer; Robert Feldt

doi:10.1002/jeo2.12039

Review article, 2024

The accelerating progress of artificial intelligence (AI) demands rigorous evaluation standards to ensure its safe, effective integration into high-stakes healthcare decisions. As AI increasingly provides prediction, analysis and judgement capabilities relevant to medicine, proper evaluation and interpretation are indispensable. Erroneous AI could endanger patients; developing, validating and deploying medical AI therefore demands adherence to strict, transparent standards centred on safety, ethics and responsible oversight. Core considerations include assessing performance on diverse real-world data, collaborating with domain experts, confirming model reliability and limitations, and advancing interpretability. Thoughtful selection of evaluation metrics suited to the clinical context, along with testing on diverse data sets representing different populations, improves generalisability. Partnerships between software engineers, data scientists and medical practitioners ground assessment in real needs. Journals must uphold reporting standards that match AI's societal impact. With rigorous, holistic evaluation frameworks, AI can progress towards expanding healthcare access and quality. Level of Evidence: Level V.

digitalization

healthcare

performance metrics

Author

Felix C. Oettl

Hospital for Special Surgery - New York

Schulthess Klinik

Ayoosh Pareek

Hospital for Special Surgery - New York

Philipp W. Winkler

Sahlgrenska University Hospital

Johannes Kepler University of Linz (JKU)

University of Gothenburg

Bálint Zsidai

Sahlgrenska University Hospital

University of Gothenburg

James Pruneski

Tripler Regional Med Center

Eric Hamrin Senorski

University of Gothenburg

Sahlgrenska University Hospital

Sebastian Kopf

Medizinische Hochschule Brandenburg Theodor Fontane

Christophe Ley

University of Luxembourg

Elmar Herbst

Division of General Internal Medicine

Jacob F. Oeding

University of Gothenburg

Mayo Clinic Alix School of Medicine

Alberto Grassi

IRCCS Istituto Ortopedico Rizzoli, Bologna

Michael T. Hirschmann

Canton Hospital Basel-Land

University of Basel

Volker Musahl

UPMC Sports Medicine

Kristian Samuelsson

University of Gothenburg

Sahlgrenska University Hospital

Thomas Tischer

Malteser Waldkrankenhaus Erlangen

University Medicine Rostock

Robert Feldt

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

Journal of Experimental Orthopaedics

2197-1153 (eISSN)

Vol. 11, Issue 3, e12039

Subject Categories (SSIF 2011)

Social Sciences Interdisciplinary

DOI

10.1002/jeo2.12039

More information

Latest update

8/22/2025
