Synergizing Data Management, DataOps, and Data Pipelines for AI-Enhanced Embedded Systems
Doctoral thesis, 2024

Context: Data management is a critical aspect of any artificial intelligence (AI) initiative, playing a pivotal role in the development, training, and deployment of AI models. A well-structured approach to data management ensures that AI models are trained on reliable data, comply with ethical standards, and contribute positively to decision-making processes in embedded systems.

Objectives: This thesis has three primary objectives. The first is to comprehensively understand and address the data management challenges associated with embedded systems. Building on this understanding, the second is to explore data management practices that can help alleviate these challenges. The third is to develop and validate implementation approaches for enhanced data management.

Method: To achieve these objectives, we conducted research in close collaboration with industry, combining several empirical research methods, including interpretive case studies, literature reviews, and action research.

Results: This thesis presents six main results. First, it identifies and categorizes data management challenges, solutions, and limitations. Second, it presents a stairway model delineating the stages of the evolution towards DataOps. Third, it proposes a model for evaluating the maturity of data pipelines and identifies determinants to assess the impact of machine learning (ML) on data pipelines. Fourth, it identifies the differences between unidirectional and bidirectional data pipelines and the significance, benefits, and challenges of bidirectional data pipelines. The thesis also provides a roadmap for the smooth migration from unidirectional to bidirectional data pipelines. Fifth, it presents and validates the conceptual model of an end-to-end data pipeline for ML/DL models. Finally, it presents and validates fault-tolerant data pipelines and an AI-powered 4-stage model for automated fault recovery in data pipelines.

Conclusion: This thesis demonstrates that a well-structured approach to data management in AI-enhanced embedded systems, supported by innovative practices and robust implementation approaches, is essential for ensuring the reliability and effectiveness of data in decision-making processes.

Keywords

Automated Fault Recovery

Bidirectional

Data Management Challenges

DataOps Evolution

Robustness

Fault-Tolerance

Data Pipelines

Meeting room 473 (Mötesrum 473) is located on Campus Lindholmen. Go to building Jupiter. Entrance from Hörselgången 5. Go to floor 4.
Opponent: Xiaofeng Wang, University of Bolzano, Italy

Author

Aiswarya Raj Munappy

Software Engineering 1

In today's digital age, data is the backbone of decision-making and innovation for businesses worldwide. However, the journey from raw data to actionable insights is riddled with challenges, ranging from data silos to quality inconsistencies, which lead to unreliable and ineffective insights. This is where data management comes in: it is crucial to harnessing the full potential of data.

My research aims to improve data management practices at an industrial scale. Through an in-depth exploration, we uncovered a multitude of challenges in managing data for deep learning applications. By combining insights from real-world case studies and literature reviews, we examined the current state of data management approaches, paving the way for a more robust and efficient methodology. With the surge of Artificial Intelligence (AI) technologies, our study goes a step further by modelling a resilient data pipeline tailored for AI-enhanced embedded systems. This pipeline guides the flow of data through complex networks, supporting reliability and accuracy at every stage. We also draw attention to faults in data pipelines that frequently go unnoticed. By identifying these faults and implementing proactive mitigation strategies, we help industries handle the challenges of data management with confidence, minimizing human intervention and maximizing productivity.

This thesis serves as a guide for both academia and industry. Researchers are invited to delve deeper into the practical challenges of data management and data pipelines left unexplored. Meanwhile, industry practitioners are encouraged to reflect on the pivotal role of adopting tailored data management and data pipeline practices, particularly in the domain of AI-enhanced embedded systems. As we embark on this transformative journey, let us embrace the power of data management as a catalyst for innovation and progress, propelling us towards a future where data-driven decisions shape a brighter tomorrow.

Software Engineering for AI/ML/DL

Chalmers AI Research Centre (CHAIR), 2019-11-01 -- 2022-11-01.

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Subject Categories

Software Engineering

ISBN

978-91-8103-052-5

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5510

Publisher

Chalmers

Online

More information

Latest update

5/20/2024