Data Labeling: An Empirical Investigation into Industrial Challenges and Mitigation Strategies
Paper in proceedings, 2020

Labeling is a cornerstone of supervised machine learning. However, in industrial applications, data is often unlabeled, which complicates its use for machine learning. Although there are well-established labeling techniques such as crowdsourcing, active learning, and semi-supervised learning, these still do not provide accurate and reliable labels for every industrial machine learning use case. As a result, industry still relies heavily on manually annotating and labeling its data. This study investigates the challenges that companies experience when annotating and labeling their data. We performed a case study using a semi-structured interview with data scientists at two companies to explore the problems they face when labeling and annotating their data. This paper provides two contributions: we identify industry challenges in the labeling process, and we propose mitigation strategies for these challenges.

Machine learning

Case study

Data labeling

Authors

Teodor Fredriksson

Chalmers, Computer Science and Engineering, Software Engineering, Software Engineering for Testing, Requirements, Innovation and Psychology

David Issa Mattos

Chalmers, Computer Science and Engineering, Software Engineering, Software Engineering for Cyber Physical Systems

Jan Bosch

Chalmers, Computer Science and Engineering, Software Engineering, Software Engineering for Testing, Requirements, Innovation and Psychology

Helena Holmstrom Olsson

Malmö University

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

03029743 (ISSN) 16113349 (eISSN)

Vol. 12562 LNCS 202-216

Product-Focused Software Process Improvement
Turin, Italy

Subject categories

Other Computer and Information Science

Learning

Systems Science

DOI

10.1007/978-3-030-64148-1_13

ISBN

9783030641474

More information

Last updated

2021-01-04