An empirical evaluation of pre-trained large language models for repairing declarative formal specifications
Journal article, 2025

Automatic Program Repair (APR) has garnered significant attention as a practical research domain focused on automatically fixing bugs in programs. While existing APR techniques primarily target imperative programming languages such as C and Java, there is a growing need for effective solutions applicable to declarative software specification languages. This paper systematically investigates the capacity of Large Language Models (LLMs) to repair declarative specifications written in Alloy, a formal language used for software specification. We designed six repair settings, encompassing single-agent and dual-agent paradigms and utilizing various LLMs. These configurations also incorporate different levels of feedback, including an auto-prompting mechanism in which LLMs generate the prompts autonomously. Our study reveals that the dual-agent setup with auto-prompting outperforms the other settings, albeit with a marginal increase in the number of iterations and token usage. This dual-agent setup also proved more effective than state-of-the-art Alloy APR techniques when evaluated on a comprehensive set of benchmarks. This work is the first to empirically evaluate the capability of LLMs to repair declarative specifications while taking into account recent LLM concepts such as LLM-based agents, feedback, auto-prompting, and tools, thus paving the way for future agent-based techniques in software engineering.

Declarative specification

Formal methods

Automatic program repair

Alloy language

LLMs

Authors

Mohannad Alhanahnah

University of Gothenburg

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

Md Rashedul Hasan

University of Nebraska - Lincoln

Lisong Xu

University of Nebraska - Lincoln

Hamid Bagheri

University of Nebraska - Lincoln

Empirical Software Engineering

1382-3256 (ISSN) 1573-7616 (eISSN)

Vol. 30, issue 5, article 149

Subject categories (SSIF 2025)

Software Engineering

Computer Sciences

DOI

10.1007/s10664-025-10687-1

Related datasets

AlloySpecRepair [dataset]

URI: https://github.com/Mohannadcse/AlloySpecRepair

More information

Last updated

2025-08-05