Data management and Data Pipelines: An empirical investigation in the embedded systems domain
Licentiate thesis, 2021
Objective: The overall goal of this thesis is to empirically identify the data management challenges encountered during the development and maintenance of AI-enhanced embedded systems, to propose an improved data management approach, and to validate that approach empirically.
Method: To achieve the goal, we conducted this research in close collaboration with Software Center companies using a combination of different empirical research methods: case studies, literature reviews, and action research.
Results and conclusions: This research provides five main results. First, it identifies key data management challenges specific to Deep Learning models developed at embedded system companies. Second, it examines practices, such as DataOps and data pipelines, that help address data management challenges. We observed that DataOps is the best data management practice for improving data quality and reducing the time to develop data products. The data pipeline is the critical component of DataOps that manages the data life cycle activities. The study also identifies the potential faults at each step of the data pipeline and the corresponding mitigation strategies. Finally, we realized the data pipeline model in a small part of a data pipeline and calculated the percentage of data dumps saved through the implementation.
Future work: As future work, we plan to realize the conceptual data pipeline model so that companies can build customized, robust data pipelines. We also plan to analyze the impact and value of data pipelines in cross-domain AI systems and data applications, and to develop an AI-based fault detection and mitigation system suitable for data pipelines.
data management
empirical investigation
artificial intelligence
data pipelines
embedded systems
software engineering
machine learning
Author
Aiswarya Raj Munappy
Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)
Towards automated detection of data pipeline faults
Proceedings - Asia-Pacific Software Engineering Conference, APSEC, Vol. 2020-December (2020), p. 346-355
Paper in proceeding
Modelling Data Pipelines
2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), (2020), p. 13-20
Paper in proceeding
From Ad-Hoc Data Analytics to DataOps
ICSSP '20: Proceedings of the International Conference on Software and System Processes, (2020), p. 165-174
Paper in proceeding
Data Management Challenges for Deep Learning
Proceedings - 45th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2019, (2019), p. 140-147
Paper in proceeding
Data Pipeline Management in Practice: Challenges and Opportunities
Lecture Notes in Computer Science, Vol. 12562 (2020), p. 168-184
Paper in proceeding
Software Engineering for AI/ML/DL
Chalmers AI Research Centre (CHAIR), 2019-11-01 -- 2022-11-01.
HoliDev - Holistic DevOps Framework
VINNOVA (2017-05218), 2018-01-01 -- 2019-12-31.
Subject Categories
Other Computer and Information Science
Software Engineering
Computer Science
Areas of Advance
Information and Communication Technology
Publisher
Chalmers
CSE Jupiter 473, Jupiter building, Hörselgången 5, floor 4
Opponent: Daniela Soares Cruzes, NTNU, Norway