Synergizing Data Management, DataOps, and Data Pipelines for AI-Enhanced Embedded Systems

Aiswarya Raj Munappy

Doctoral thesis, 2024

Context: Data management is a critical aspect of any artificial intelligence (AI) initiative, playing a pivotal role in the development, training, and deployment of AI models. A well-structured approach to data management ensures that AI models are trained on reliable data, comply with ethical standards, and contribute positively to decision-making processes in embedded systems.

Objectives: This thesis is structured around three primary objectives. The first objective is to comprehensively understand and address the data management challenges associated with embedded systems. Building upon this understanding, the second objective is to explore data management practices that can help alleviate these challenges. Finally, the third objective is to develop and validate implementation approaches for enhanced data management.

Method: To achieve these objectives, we conducted research in close collaboration with industry and used a combination of empirical research methods, including interpretive case studies, literature reviews, and action research.

Results: This thesis presents six main results. First, it identifies and categorizes data management challenges, solutions, and limitations. Second, it presents a stairway model delineating the stages of the evolution towards DataOps. Third, it proposes a model for evaluating the maturity of data pipelines and identifies determinants to assess the impact of machine learning (ML) on data pipelines. Fourth, it identifies the differences between unidirectional and bidirectional data pipelines and the significance, benefits, and challenges of bidirectional data pipelines. The thesis also provides a roadmap for the smooth migration from unidirectional to bidirectional data pipelines. Fifth, it presents and validates the conceptual model of an end-to-end data pipeline for ML/DL models. Finally, it presents and validates fault-tolerant data pipelines and an AI-powered 4-stage model for automated fault recovery in data pipelines.

Conclusion: This thesis demonstrates that a well-structured approach to data management in AI-enhanced embedded systems, supported by innovative practices and robust implementation approaches, is essential for ensuring the reliability and effectiveness of data in decision-making processes.

Robustness

Bidirectional

Fault-Tolerance

Automated Fault Recovery

Data Pipelines

Data Management Challenges

DataOps Evolution

Mötesrum 473 is located on Campus Lindholmen. Go to building Jupiter. Entrance from Hörselgången 5. Go to floor 4.

Opponent: Xiaofeng Wang, University of Bolzano, Italy

Online defence

Author

Aiswarya Raj Munappy

Software Engineering 1


In today's digital age, data is the backbone of decision-making and innovation for businesses worldwide. However, the journey from raw data to actionable insights is riddled with challenges, ranging from data silos to quality inconsistencies, leading to unreliable and ineffective insights. Enter the domain of data management—a crucial aspect of harnessing the full potential of data.

Our research aims to improve data management practices at an industrial scale. Through an in-depth exploration, we uncovered a multitude of challenges in managing data for deep learning applications. By combining insights from real-world case studies and literature reviews, we examined the current state of data management approaches, paving the way for a more robust and efficient methodology. With the surge of Artificial Intelligence (AI) technologies, our study takes a step forward by modelling a resilient data pipeline tailored for AI-enhanced embedded systems. This pipeline guides the flow of data through complex networks, supporting reliability and accuracy at every stage. But our journey doesn't end there. We draw attention to faults in data pipelines that frequently go unnoticed. By identifying these faults and implementing proactive mitigation strategies, we help industries navigate the challenges of data management with confidence, minimizing human intervention and maximizing productivity.

This thesis serves as a guide for both academia and industry. Researchers are invited to delve deeper into the practical challenges of data management and data pipelines that remain unexplored. Meanwhile, industry practitioners are encouraged to reflect on the pivotal role of adopting tailored data management and data pipeline practices, particularly in the domain of AI-enhanced embedded systems. As we embark on this transformative journey, let us embrace the power of data management as a catalyst for innovation and progress, propelling us towards a future where data-driven decisions shape a brighter tomorrow.

Software Engineering for AI/ML/DL

CHAIR, 2019-11-01 -- 2022-11-01.


Infrastructure

C3SE (-2020, Chalmers Centre for Computational Science and Engineering)

Subject Categories (SSIF 2011)

Software Engineering

ISBN

978-91-8103-052-5

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5510

Publisher

Chalmers