Does the dataset meet your expectations? Explaining sample representation in image data
Paper in proceedings, 2020

Since the behavior of a neural network model is adversely affected by a lack of diversity in its training data, we present a method that identifies and explains such deficiencies. When a dataset is labeled, we note that the annotations alone can provide a human-interpretable summary of sample diversity. Any lack of diversity can then be explained as the mismatch between the actual distribution of annotations in the dataset and an expected distribution of annotations, specified manually to capture essential label diversity. While, in many practical cases, labeling (samples → annotations) is expensive, its inverse, simulation (annotations → samples), can be cheaper. By mapping the expected distribution of annotations into test samples using parametric simulation, we present a method that explains sample representation through the mismatch in diversity between simulated and collected data. We then apply the method to a dataset of geometric shapes, explaining sample representation qualitatively and quantitatively in terms of comprehensible aspects such as size, position, and pixel brightness.
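To illustrate the idea, the sketch below is a minimal, hypothetical example (not the paper's implementation): annotations for geometric shapes (size, position, brightness) are mapped to images by a toy parametric simulator, and the diversity mismatch between an expected and a collected annotation distribution is quantified with a simple histogram-based distance. All names and parameter choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_shape(size, cx, cy, brightness, img_dim=32):
    """Toy annotations -> sample map: render a filled square with the
    given size, center position, and pixel brightness on a blank image."""
    img = np.zeros((img_dim, img_dim))
    half = size // 2
    x0, x1 = max(cx - half, 0), min(cx + half + 1, img_dim)
    y0, y1 = max(cy - half, 0), min(cy + half + 1, img_dim)
    img[y0:y1, x0:x1] = brightness
    return img

# Expected annotation distribution: shape sizes uniform over a wide range.
expected_sizes = rng.integers(3, 15, size=1000)
# Collected dataset (hypothetical): sizes biased toward small shapes,
# i.e. a diversity deficiency the method should surface.
collected_sizes = rng.integers(3, 8, size=1000)

# Explain the mismatch as the total variation distance between the
# expected and actual size histograms (unit-width bins).
bins = np.arange(3, 16)
p, _ = np.histogram(expected_sizes, bins=bins, density=True)
q, _ = np.histogram(collected_sizes, bins=bins, density=True)
total_variation = 0.5 * np.abs(p - q).sum()
```

A large `total_variation` flags that the collected annotations under-represent part of the expected range (here, large shapes); rendering samples from the expected distribution with `simulate_shape` then gives concrete test images for the under-represented region.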


Dhasarathy Parthasarathy

Volvo Group

Chalmers, Computer Science and Engineering, Functional Programming

Anton Johansson

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

Proceedings of the 32nd Benelux Conference, BNAIC/Benelearn 2020



Computer Science
