On the Impact of ML use cases on Industrial Data Pipelines
Paper in proceedings, 2021

The Artificial Intelligence revolution has had a substantial impact on society, daily life, firms, and employment. Because data is a critical element, organizations are working to obtain high-quality data to train their AI models. Although data, data management, and data pipelines were part of industrial practice even before the introduction of ML models, the significance of data has increased further with their advent, forcing data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that data pipelines that serve ML models are given more importance than conventional data pipelines. We report on a study conducted by observing software teams at three companies as they develop both conventional (non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from these three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.



Data Pipelines



ML characteristics


Aiswarya Raj Munappy

Testing, Requirements, Innovation and Psychology

Jan Bosch

Testing, Requirements, Innovation and Psychology

Helena Holmström Olsson

Malmö universitet

Anders Jansson

China-Euro Vehicle Technology (CEVT) AB

Proceedings - Asia-Pacific Software Engineering Conference, APSEC

15301362 (ISSN)

Vol. 2021-December 463-472
9781665437844 (ISBN)

28th Asia-Pacific Software Engineering Conference, APSEC 2021
Virtual, Online, Taiwan,


Other Computer and Information Science


Bioinformatics and Systems Biology


