A practical guide to the implementation of AI in orthopaedic research, Part 6: How to evaluate the performance of AI research?
Review article, 2024

The accelerating progress of artificial intelligence (AI) demands rigorous evaluation standards to ensure its safe, effective integration into high-stakes healthcare decisions. As AI increasingly enables prediction, analysis and judgement capabilities relevant to medicine, proper evaluation and interpretation are indispensable. Erroneous AI could endanger patients; thus, developing, validating and deploying medical AI demands adherence to strict, transparent standards centred on safety, ethics and responsible oversight. Core considerations include assessing performance on diverse real-world data, collaborating with domain experts, confirming model reliability and limitations, and advancing interpretability. Thoughtful selection of evaluation metrics suited to the clinical context, along with testing on diverse data sets representing different populations, improves generalisability. Partnerships among software engineers, data scientists and medical practitioners ground assessment in real needs. Journals must uphold reporting standards that match AI's societal impact. With rigorous, holistic evaluation frameworks, AI can progress towards expanding healthcare access and quality. Level of Evidence: Level V.
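As an illustration of reporting several metrics and checking them across populations, the following is a minimal sketch rather than a method taken from the article. It assumes a binary classifier with labels coded 0/1, and the function names (`metrics`, `metrics_by_group`), the toy labels and the `site` grouping variable are all hypothetical. It computes sensitivity, specificity, positive predictive value and accuracy for the whole data set and for each subgroup.

```python
from collections import defaultdict


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return the ratio, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def metrics(y_true, y_pred):
    """Compute a small set of common classification metrics."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
    }


def metrics_by_group(y_true, y_pred, groups):
    """Compute the same metrics separately for each subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: metrics(ts, ps) for g, (ts, ps) in buckets.items()}


# Hypothetical toy data: two sites with four cases each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
site = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall = metrics(y_true, y_pred)
per_site = metrics_by_group(y_true, y_pred, site)

print("overall:", overall)
for name, values in sorted(per_site.items()):
    print(name, values)
```

In this toy example the pooled figures hide a clear difference between the two sites, which is the kind of gap that testing on data from different populations is meant to surface.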

performance metrics

AI

digitalization

ML

healthcare

Author

Felix C. Oettl

Hospital for Special Surgery - New York

Schulthess Klinik

Ayoosh Pareek

Hospital for Special Surgery - New York

Philipp W. Winkler

Sahlgrenska University Hospital

Johannes Kepler University of Linz (JKU)

University of Gothenburg

Bálint Zsidai

University of Gothenburg

Sahlgrenska University Hospital

James Pruneski

Tripler Regional Med Center

Eric Hamrin Senorski

Sahlgrenska University Hospital

University of Gothenburg

Sebastian Kopf

Medizinische Hochschule Brandenburg Theodor Fontane

Christophe Ley

University of Luxembourg

Elmar Herbst

Division of General Internal Medicine

Jacob F. Oeding

Mayo Clinic Alix School of Medicine

University of Gothenburg

Alberto Grassi

IRCCS Istituto Ortopedico Rizzoli, Bologna

Michael T. Hirschmann

Kantonsspital Baselland

University of Basel

Volker Musahl

UPMC Sports Medicine

Kristian Samuelsson

University of Gothenburg

Sahlgrenska University Hospital

Thomas Tischer

Malteser Waldkrankenhaus Erlangen

Universitymedicine Rostock

Robert Feldt

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

Journal of Experimental Orthopaedics

2197-1153 (eISSN)

Vol. 11, Issue 3, e12039

Subject Categories (SSIF 2011)

Social Sciences Interdisciplinary

DOI

10.1002/jeo2.12039

More information

Latest update

7/3/2024