The ontological politics of synthetic data: Normalities, outliers, and intersectional hallucinations

Francis Lee; Saghi Hajisharif; Ericka Johnson

doi:10.1177/20539517251318289

Journal article, 2025

Synthetic data is increasingly used as a substitute for real data for ethical, legal, and logistical reasons. However, the rise of synthetic data also raises critical questions about its entanglement with the politics of classification and the reproduction of social norms and categories. This paper aims to problematize the use of synthetic data by examining how its production is intertwined with the maintenance of certain worldviews and classifications. We argue that synthetic data, like real data, is embedded with societal biases and power structures, leading to the reproduction of existing social inequalities. Through empirical examples, we demonstrate that synthetic data tends to highlight majority elements as the “normal” and minimize minority elements, and that the slight changes to the data structures through which synthetic data is created inevitably result in what we term “intersectional hallucinations.” These hallucinations are inherent to synthetic data and cannot be entirely eliminated without compromising the purpose of creating synthetic datasets. We contend that decisions about synthetic data involve determining which intersections are essential and which can be disregarded, a practice that imbues these decisions with norms and values. Our study underscores the need for critical engagement with the mathematical and statistical choices in synthetic data production and advocates careful consideration of the ontological and political implications of these choices during the curatorial-style production of synthetic structured data.
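The abstract makes two mechanical claims: rare intersections shrink, and new, never-observed intersections appear. The toy sketch below shows both. It is not the authors' method or data. The dataset, attribute values (`A1`, `A2`, `B1`, `B2`) and the function names are hypothetical. The "synthesizer" is a deliberately crude stand-in. It keeps each attribute's marginal distribution but assumes the attributes are independent. Practical generators preserve much more joint structure than this. However, a model that does not reproduce every observed intersection exactly must smooth some of them in a similar way.

```python
from collections import Counter
from itertools import product

# Hypothetical toy "real" dataset: 10 records with two categorical attributes.
# The rare value "A2" only ever co-occurs with the rare value "B2".
REAL = [("A1", "B1")] * 9 + [("A2", "B2")]


def joint_freq(rows):
    """Empirical joint distribution of (attr1, attr2) tuples in the real data."""
    counts = Counter(rows)
    n = len(rows)
    return {combo: c / n for combo, c in counts.items()}


def independent_marginals_model(rows):
    """Hypothetical, deliberately crude synthesizer (not the paper's method).

    It keeps each attribute's marginal distribution but assumes the attributes
    are independent, so P(a, b) = P(a) * P(b) for every pair of observed values.
    """
    n = len(rows)
    m1 = Counter(a for a, _ in rows)
    m2 = Counter(b for _, b in rows)
    return {(a, b): (m1[a] / n) * (m2[b] / n) for a, b in product(m1, m2)}


real = joint_freq(REAL)
synthetic = independent_marginals_model(REAL)

# "Intersectional hallucinations" in this toy sense: combinations the model
# gives positive probability although they never occur in the real data.
hallucinated = sorted(c for c, p in synthetic.items() if p > 0 and c not in real)

for combo in sorted(synthetic):
    print(combo, "real:", round(real.get(combo, 0.0), 3),
          "synthetic:", round(synthetic[combo], 3))
print("hallucinated:", hallucinated)
```

In this toy case, the rare `("A2", "B2")` intersection drops from 0.1 in the real data to 0.01 in the model. Meanwhile, `("A1", "B2")` and `("A2", "B1")`, which never occur in the real data, each receive probability 0.09.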

intersectionality

data ethics

classification

data bias

synthetic structured data

ontological politics

Authors

Francis Lee

Linköpings universitet

Chalmers, Technology Management and Economics, Science, Technology and Society

Saghi Hajisharif

Linköpings universitet

Ericka Johnson

Linköpings universitet

Big Data and Society

20539517 (eISSN)

Vol. 12, No. 2

Subject categories (SSIF 2025)

Social and Economic Geography

DOI

10.1177/20539517251318289

Last updated

2025-04-23