An exploratory study on automatic identification of assumptions in the development of deep learning frameworks
Journal article, 2025

Context: Stakeholders constantly make assumptions when developing deep learning (DL) frameworks. These assumptions relate to various types of software artifacts (e.g., requirements, design decisions, and technical debt) and can turn out to be invalid, leading to system failures. Existing approaches and tools for assumption management usually depend on manual identification of assumptions. However, assumptions are scattered across various sources of DL framework development (e.g., code comments, commits, pull requests, and issues), and identifying them manually is costly in time and resources.
Objective: The study evaluates different classification models for identifying assumptions, from the point of view of developers and users, in the context of DL framework projects on GitHub (i.e., their issues, pull requests, and commits).
Method: First, we constructed AssuEval, a new dataset of assumptions and the largest such dataset, collected from the TensorFlow and Keras repositories on GitHub. We then explored the performance of seven non-transformer-based models (e.g., Support Vector Machine and Classification and Regression Trees), the ALBERT model, and three decoder-only models (i.e., ChatGPT, Claude, and Gemini) for identifying assumptions on the AssuEval dataset.
Results: ALBERT achieves the best performance for identifying assumptions on the AssuEval dataset (F1-score: 0.9584), clearly outperforming the other models (the second-best F1-score, 0.8858, is achieved by Claude 3.5 Sonnet). Although ChatGPT, Claude, and Gemini are popular models, we do not recommend using them to identify assumptions in DL framework development because of their low performance. Fine-tuning ChatGPT, Claude, Gemini, or other language models (e.g., Llama3, Falcon, and BLOOM) specifically for assumptions might improve their performance on assumption identification.
Conclusions: This study provides researchers with the largest dataset of assumptions for further research (e.g., assumption classification, evaluation, and reasoning), and it helps researchers and practitioners better understand assumptions and how to manage them in their projects (e.g., when selecting classification models for identifying assumptions).
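To make the task and the reported metric concrete, here is a minimal sketch of binary assumption identification (1 = the text states an assumption, 0 = it does not) scored with the F1-score. It assumes a multinomial Naive Bayes classifier with add-one smoothing as a stand-in for a non-transformer baseline. The abstract names only Support Vector Machine and Classification and Regression Trees among the seven such models, so Naive Bayes is not claimed to be one of them. The tokenizer, the hypothetical class `NaiveBayes`, the helper `f1_score`, and the toy sentences are all invented for illustration and are not taken from the study or from AssuEval.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into word-character runs (letters, digits, underscore).
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing.

    An illustrative stand-in for a non-transformer baseline, not the study's code.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {}
        self.word_counts = {}
        self.totals = {}
        self.vocab = set()
        for c in self.classes:
            docs = [tokenize(t) for t, y in zip(texts, labels) if y == c]
            self.log_priors[c] = math.log(len(docs) / len(texts))
            counts = Counter(tok for doc in docs for tok in doc)
            self.word_counts[c] = counts
            self.totals[c] = sum(counts.values())
            self.vocab.update(counts)
        return self

    def predict(self, text):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


def f1_score(y_true, y_pred, positive=1):
    # F1 = 2PR / (P + R), computed for the positive ("assumption") class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented toy sentences (1 = assumption, 0 = not); NOT taken from AssuEval.
train_texts = [
    "we assume the input tensor is always float32",
    "this assumes the batch size is divisible by the number of devices",
    "presumably the user has already called build on the model",
    "fix typo in docstring",
    "update readme with install instructions",
    "bump version to 2.1",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = [
    "we assume the shape is static",
    "fix typo in readme",
    "this assumes devices are available",
    "update the docstring",
]
test_labels = [1, 0, 1, 0]

model = NaiveBayes().fit(train_texts, train_labels)
y_pred = [model.predict(t) for t in test_texts]
print(y_pred)                         # [1, 0, 1, 0]
print(f1_score(test_labels, y_pred))  # 1.0
```

Naive Bayes is used here only because it fits in a few lines without third-party libraries. The perfect score on four invented sentences says nothing about real-world performance; the point is how precision, recall, and F1 are derived from the predictions, which is the same metric the study uses to rank its models.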

Keywords

TensorFlow

Keras

Assumption

Automatic identification

Deep learning framework

Authors

Chen Yang

Nanjing University

Shenzhen Polytechnic University

Peng Liang

Wuhan University

Zinan Ma

Student at Chalmers

Science of Computer Programming

0167-6423 (ISSN)

Vol. 240, article 103218

Subject Categories

Software Engineering

Computer Science

DOI

10.1016/j.scico.2024.103218

More information

Latest update

10/7/2024