On the Impact of ML use cases on Industrial Data Pipelines

Aiswarya Raj Munappy; Jan Bosch; Helena Holmström Olsson; Anders Jansson

doi:10.1109/APSEC53868.2021.00053

Paper in proceedings, 2021

The impact of the Artificial Intelligence revolution on society, daily life, firms, and employment is undoubtedly substantial. With data being a critical element, organizations are working towards obtaining high-quality data to train their AI models. Although data, data management, and data pipelines were part of industrial practice even before the introduction of ML models, the significance of data has increased further with the advent of ML models, which force data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that data pipelines that serve ML models are given more importance than conventional data pipelines. We report on a study that we conducted by observing software teams at three companies as they develop both conventional (non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.

conventional

determinants

Data Pipelines

ML-influenced

criticality

ML characteristics

Authors

Aiswarya Raj Munappy

Testing, Requirements, Innovation and Psychology

Jan Bosch

Testing, Requirements, Innovation and Psychology

Helena Holmström Olsson

Malmö University

Anders Jansson

China-Euro Vehicle Technology (CEVT) AB

Proceedings - Asia-Pacific Software Engineering Conference, APSEC

15301362 (ISSN)

Vol. 2021-December 463-472
9781665437844 (ISBN)

28th Asia-Pacific Software Engineering Conference, APSEC 2021
Virtual, Online, Taiwan

Subject Categories (SSIF 2011)

Other Computer and Information Science

Media Engineering

Bioinformatics and Systems Biology

DOI

10.1109/APSEC53868.2021.00053

More information

Last updated

2022-03-24
