Data Labeling: An Empirical Investigation into Industrial Challenges and Mitigation Strategies

Teodor Fredriksson; David Issa Mattos; Jan Bosch; Helena Holmstrom Olsson

doi:10.1007/978-3-030-64148-1_13

Paper in proceedings, 2020

Labeling is a cornerstone of supervised machine learning. However, in industrial applications data is often unlabeled, which complicates its use for machine learning. Although there are well-established labeling techniques such as crowdsourcing, active learning, and semi-supervised learning, these do not yet provide accurate and reliable labels for every industrial machine learning use case. In this context, industry still relies heavily on manual annotation and labeling of data. This study investigates the challenges that companies experience when annotating and labeling their data. We performed a case study using semi-structured interviews with data scientists at two companies to explore the problems they encounter when labeling and annotating their data. This paper makes two contributions: we identify industrial challenges in the labeling process, and we propose mitigation strategies for these challenges.

Machine learning

Data labeling

Case study

Authors

Teodor Fredriksson

Chalmers, Computer Science and Engineering, Software Engineering

David Issa Mattos

Chalmers, Computer Science and Engineering, Software Engineering

Jan Bosch

Chalmers, Computer Science and Engineering, Software Engineering

Helena Holmstrom Olsson

Malmö University

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ISSN 0302-9743; eISSN 1611-3349

Vol. 12562 LNCS, pp. 202-216
ISBN 978-3-030-64147-4

Product-Focused Software Process Improvement
Turin, Italy

Subject categories (SSIF 2011)

Other Computer and Information Science

Learning

Systems Science

Last updated

2021-03-10
