A New AI Evaluation Cosmos: Ready to Play the Game?

J. Hernandez-Orallo; M. Baroni; J. Bieger; N. Chmait; D. L. Dowe; K. Hofmann; F. Martinez-Plumed; Claes Strannegård; K. R. Thorisson

doi:10.1609/aimag.v38i3.2748

Journal article, 2017

We report on a series of new platforms and events dealing with AI evaluation that may change the way in which AI systems are compared and their progress is measured. The introduction of a more diverse and challenging set of tasks in these platforms can feed AI research in the years to come, shaping the notion of success and the directions of the field. However, the playground of tasks and challenges presented there may misdirect the field without some meaningful structure and systematic guidelines for its organization and use. Anticipating this issue, we also report on several initiatives and workshops that focus on analyzing the similarities and dependencies between tasks, their difficulty, and what capabilities they really measure, and ultimately on developing new concepts and tools that can arrange tasks and benchmarks into a meaningful taxonomy.

Authors

J. Hernandez-Orallo

Polytechnic University of Valencia (UPV)

M. Baroni

Facebook, Inc.

University of Trento

J. Bieger

Reykjavik University

N. Chmait

Monash University

D. L. Dowe

Monash University

K. Hofmann

Microsoft Corporation

F. Martinez-Plumed

Polytechnic University of Valencia (UPV)

Claes Strannegård

Chalmers, Computer Science and Engineering (Chalmers), Computing Science (Chalmers)

K. R. Thorisson

Reykjavik University

Icelandic Institute of Intelligent Machines

AI Magazine

0738-4602 (ISSN)

Vol. 38, No. 3, pp. 66-69

Subject Categories (SSIF 2011)

Computer Science

DOI

10.1609/aimag.v38i3.2748

More information

Latest update

6/1/2026
