Synergizing Data Management, DataOps, and Data Pipelines for AI-Enhanced Embedded Systems
Doctoral thesis, 2024
initiative, playing a pivotal role in the development, training, and deployment
of AI models. A well-structured approach to data management ensures that
AI models are trained on reliable data, comply with ethical standards, and
contribute positively to decision-making processes in embedded systems.
Objectives: This thesis is structured around three primary objectives. The
first objective is to comprehensively understand and address the data management
challenges associated with embedded systems. Building upon this
understanding, the second objective is to explore data management practices
that can help alleviate these challenges. Finally, the
third objective is to develop and validate implementation approaches
for enhanced data management.
Method: To achieve these objectives, we conducted research in close collaboration
with industry, using a combination of empirical research
methods such as interpretive case studies, literature reviews, and action research.
Results: This thesis presents six main results. First, it identifies and categorizes
data management challenges, solutions, and limitations. Second, it presents
a stairway model delineating the stages of the evolution towards DataOps.
Third, it proposes a model for evaluating the maturity of data pipelines and
identifies determinants for assessing the impact of machine learning (ML) on
data pipelines. Fourth, it identifies the differences between unidirectional and
bidirectional data pipelines and describes the significance, benefits, and challenges of
bidirectional data pipelines. It also provides a roadmap for a smooth
migration from unidirectional to bidirectional data pipelines. Fifth, it presents
and validates a conceptual model of an end-to-end data pipeline for ML and deep learning (DL)
models. Finally, it presents and validates fault-tolerant data pipelines and an
AI-powered four-stage model for automated fault recovery in data pipelines.
Conclusion: This thesis demonstrates a well-structured approach
to data management in AI-enhanced embedded systems, supported by
innovative practices and robust implementation approaches, which is essential for
ensuring the reliability and effectiveness of data in decision-making processes.
Keywords
Automated Fault Recovery
Bidirectional
Data Management Challenges
DataOps Evolution
Robustness
Fault-Tolerance
Data Pipelines
Author
Aiswarya Raj Munappy
Software Engineering
My research aims to revolutionize data management practices at an industrial scale. Through an in-depth exploration, we uncovered a multitude of challenges in managing data for deep learning applications. By combining insights from real-world case studies and reviews of the latest literature, we dissected the current state of data management approaches, paving the way for a more robust and efficient methodology. With the surge of artificial intelligence (AI) technologies, our study takes a bold step forward by modelling a resilient data pipeline tailored for AI-enhanced embedded systems. This pipeline acts as a lifeline, guiding data seamlessly through complex networks and ensuring reliability and accuracy at every stage. But our journey doesn't end there. We draw attention to faults that exist in data pipelines but are frequently overlooked. By identifying these faults and implementing proactive mitigation strategies, we empower industries to navigate the challenges of data management with confidence, minimizing human intervention and maximizing productivity.
This thesis serves as a guide for both academia and industry. Researchers are invited to delve deeper into the practical challenges of data management and data pipelines that remain unexplored. Meanwhile, industry practitioners are encouraged to reflect on the pivotal role of adopting tailored data management and data pipeline practices, particularly in the domain of AI-enhanced embedded systems. As we embark on this transformative journey, let us embrace data management as a catalyst for innovation and progress, propelling us towards a future shaped by data-driven decisions.
Research Project
Software Engineering for AI/ML/DL
Chalmers AI Research Centre (CHAIR), 2019-11-01 – 2022-11-01.
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)
Subject Categories
Software Engineering
ISBN
978-91-8103-052-5
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie (Doctoral theses at Chalmers University of Technology. New series): 5510
Publisher
Chalmers
Mötesrum 473 (meeting room 473) is located on Campus Lindholmen, in the Jupiter building. Enter from Hörselgången 5 and go to floor 4.
Opponent: Xiaofeng Wang, Free University of Bozen-Bolzano, Italy