Data management and Data Pipelines: An empirical investigation in the embedded systems domain

Aiswarya Raj Munappy

Data management and Data Pipelines: An empirical investigation in the embedded systems domain
Licentiate thesis, 2021

Context: Companies are increasingly collecting data from all possible sources to extract insights that support data-driven decision-making. Increased data volume, variety, and velocity, together with the impact of poor-quality data on the development of data products, are leading companies to look for an improved data management approach that can accelerate the development of high-quality data products. Further, AI is being applied in a growing number of fields and is thus evolving into a horizontal technology. Consequently, AI components are increasingly being integrated into embedded systems along with electronics and software. We refer to these systems as AI-enhanced embedded systems. Given the strong dependence of AI on data, this expansion also creates a new space for applying data management techniques.
Objective: The overall goal of this thesis is to empirically identify the data management challenges encountered during the development and maintenance of AI-enhanced embedded systems, propose an improved data management approach, and empirically validate the proposed approach.
Method: To achieve the goal, we conducted this research in close collaboration with Software Center companies using a combination of different empirical research methods: case studies, literature reviews, and action research.
Results and conclusions: This research provides five main results. First, it identifies key data management challenges specific to Deep Learning models developed at embedded system companies. Second, it examines practices such as DataOps and data pipelines that help to address data management challenges. We observed that DataOps is the best data management practice for improving data quality and reducing the time to develop data products. The data pipeline is the critical component of DataOps that manages the data life cycle activities. The study also identifies the potential faults at each step of the data pipeline and the corresponding mitigation strategies. Finally, the data pipeline model is realized in a small part of a data pipeline, and the percentage of data dumps saved through the implementation is calculated.
Future work: As future work, we plan to realize the conceptual data pipeline model so that companies can build customized, robust data pipelines. We also plan to analyze the impact and value of data pipelines in cross-domain AI systems and data applications, and to develop an AI-based fault detection and mitigation system suitable for data pipelines.

data management

empirical investigation

artificial intelligence

data pipelines

embedded systems

software engineering

machine learning

CSE Jupiter 473, Jupiter building, Hörselgången 5, floor 4

Opponent: Daniela Soares Cruzes, NTNU, Norway

Online disputation

Author

Aiswarya Raj Munappy

Chalmers, Computer Science and Engineering, Software Engineering

Other publications

Towards automated detection of data pipeline faults

Proceedings - Asia-Pacific Software Engineering Conference, APSEC, Vol. 2020-December (2020), p. 346-355

Paper in proceedings

Modelling Data Pipelines

Proceedings - 46th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020 (2020), p. 13-20

Paper in proceedings

From Ad-Hoc Data Analytics to DataOps

Proceedings - 2020 IEEE/ACM International Conference on Software and System Processes, ICSSP 2020 (2020), p. 165-174

Paper in proceedings

Data Management Challenges for Deep Learning

Proceedings - 45th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2019 (2019), p. 140-147

Paper in proceedings

Data Pipeline Management in Practice: Challenges and Opportunities

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12562 LNCS (2020), p. 168-184

Paper in proceedings

Software Engineering for AI/ML/DL

CHAIR, 2019-11-01 -- 2022-11-01.

HoliDev - Holistic DevOps Framework

VINNOVA (2017-05218), 2018-01-01 -- 2019-12-31.

Subject categories (SSIF 2011)

Other Computer and Information Science

Software Engineering

Computer Science

Areas of Advance

Information and Communication Technology

Publisher

Chalmers

More information

Last updated

2021-12-10
