On the Impact of ML use cases on Industrial Data Pipelines
Paper in proceeding, 2021

The impact of the Artificial Intelligence revolution is undoubtedly substantial in our society, life, firms, and employment. With data being a critical element, organizations are working towards obtaining high-quality data to train their AI models. Although data, data management, and data pipelines were part of industrial practice even before the introduction of ML models, the significance of data has increased further with the advent of ML models, which force data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that data pipelines that serve ML models are given more importance than conventional data pipelines. We report on a study that we conducted by observing software teams at three companies as they develop both conventional (non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.

conventional

determinants

Data Pipelines

ML-influenced

criticality

ML characteristics

Authors

Aiswarya Raj Munappy

Testing, Requirements, Innovation and Psychology

Jan Bosch

Testing, Requirements, Innovation and Psychology

Helena Holmström Olsson

Malmö University

Anders Jansson

China-Euro Vehicle Technology (CEVT) AB

Proceedings - Asia-Pacific Software Engineering Conference, APSEC

15301362 (ISSN)

Vol. 2021-December 463-472
9781665437844 (ISBN)

28th Asia-Pacific Software Engineering Conference, APSEC 2021
Virtual, Online, Taiwan

Subject Categories

Other Computer and Information Science

Media Engineering

Bioinformatics and Systems Biology

DOI

10.1109/APSEC53868.2021.00053

More information

Latest update

3/24/2022