Data Labeling: An Empirical Investigation into Industrial Challenges and Mitigation Strategies
Paper in proceedings, 2020

Labeling is a cornerstone of supervised machine learning. However, in industrial applications, data is often not labeled, which complicates using this data for machine learning. Although there are well-established labeling techniques such as crowdsourcing, active learning, and semi-supervised learning, these still do not provide accurate and reliable labels for every machine learning use case in the industry. In this context, the industry still relies heavily on manually annotating and labeling their data. This study investigates the challenges that companies experience when annotating and labeling their data. We performed a case study using a semi-structured interview with data scientists at two companies to explore their problems when labeling and annotating their data. This paper provides two contributions. We identify industry challenges in the labeling process, and then we propose mitigation strategies for these challenges.

Machine learning

Data labeling

Case study

Authors

Teodor Fredriksson

Chalmers, Computer Science and Engineering, Software Engineering

David Issa Mattos

Chalmers, Computer Science and Engineering, Software Engineering

Jan Bosch

Chalmers, Computer Science and Engineering, Software Engineering

Helena Holmstrom Olsson

Malmö University

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 12562 LNCS 202-216
9783030641474 (ISBN)

Product-Focused Software Process Improvement
Turin, Italy

Subject Categories

Other Computer and Information Science

Learning

Information Science

DOI

10.1007/978-3-030-64148-1_13

More information

Latest update

2021-03-10