What question did this study set out to answer?

January 21, 2026

Exploring Large Language Models in Resolving Environment-Related Crash Bugs: Localizing and Repairing

Key Points

To evaluate the effectiveness of large language models in resolving environment-related crash bugs compared to code-related ones.
Constructed a dataset of 100 environment-related crash bugs for evaluation.
Compared performance of LLMs in resolving different types of crash bugs with varying contextual information.
Implemented various prompting strategies, including active inquiry for enhanced bug resolution.
Evaluated IntDiagSolver on 41 crash bugs using multiple LLMs, measuring improvements in accuracy.
LLMs showed better performance on code-related crash bugs than environment-related ones.
Localization challenges were more significant for code-related crashes, while repairing was tougher for environment-related crashes.
IntDiagSolver improved resolution accuracy consistently across LLMs, by 9.1% to 43.3% in localization and 9.1% to 53.3% in repair.
The methodology also performed strongly on an expanded multilingual dataset of 42 previously unseen crash bugs, indicating that it generalizes.

Abstract

Software crash bugs cause unexpected program behaviors or even abrupt termination, thus demanding immediate resolution. However, resolving crash bugs can be challenging due to their complex root causes, which can originate from issues in the source code or external factors like third-party library dependencies. Large language models (LLMs) have shown promise in software engineering tasks, leveraging their extensive pre-training on text and code corpora. However, existing research predominantly focuses on the capability of LLMs to localize and repair code-related crash bugs, leaving their effectiveness in resolving environment-related crash bugs in real-world software unexplored. To fill this gap, we conducted the first comprehensive study to assess the capability of LLMs in resolving real-world environment-related crash bugs, using a newly constructed dataset of 100 representative crash bugs. We first systematically compare LLMs’ performance in resolving code-related and environment-related crash bugs with varying levels of crash contextual information. Our findings reveal that the LLM performs better in resolving code-related crash bugs compared to environment-related ones. Specifically, localization is the primary challenge for resolving code-related crashes, while repair poses a greater challenge for environment-related crashes. Furthermore, we investigate the impact of different prompt strategies on improving the resolution of environment-related crash bugs, incorporating different prompt templates and multi-round interactions. Building on this, we further explore an advanced active inquiry prompting strategy, which leverages the self-planning capabilities of LLMs to conduct systematic and continuous questioning aimed at identifying potential environmental factors that contribute to crashes. Based on these explorations, we propose IntDiagSolver, an interactive methodology designed to enable precise crash bug resolution through ongoing engagement with LLMs. 
Extensive evaluations of IntDiagSolver on a dataset of 41 crash bugs across multiple LLMs (including GPT-3.5, GPT-4, Claude, CodeLlama, DeepSeek-R1, and Qwen-3-Coder) demonstrate consistent improvements in resolution accuracy, with substantial enhancements ranging from 9.1% to 43.3% in localization and 9.1% to 53.3% in repair. Furthermore, the strong performance of IntDiagSolver on the latest expanded multilingual dataset of 42 crash bugs highlights its strong generalizability and effectiveness on previously unseen data.
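
The active inquiry idea described in the abstract, where the model keeps asking about the environment until it can commit to a diagnosis, can be illustrated with a small loop. This is only a sketch under stated assumptions, not the authors' IntDiagSolver implementation: the function names, the `DIAGNOSIS:` reply convention, the round limit, and the stubbed model and user are all hypothetical, and the crash message is a made-up example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical message format: (role, text) pairs. Not taken from the paper.
Message = Tuple[str, str]

# Assumed convention: the model marks its final answer with this prefix.
PREFIX = "DIAGNOSIS:"


@dataclass
class InquiryResult:
    diagnosis: str
    rounds: int
    transcript: List[Message] = field(default_factory=list)


def active_inquiry(
    crash_report: str,
    ask_model: Callable[[List[Message]], str],
    ask_user: Callable[[str], str],
    max_rounds: int = 5,
) -> InquiryResult:
    """Let the model ask environment questions until it commits to a diagnosis."""
    transcript: List[Message] = [
        ("system", "Ask one question about the environment per turn. "
                   "When confident, reply starting with 'DIAGNOSIS:'."),
        ("user", crash_report),
    ]
    for round_no in range(1, max_rounds + 1):
        reply = ask_model(transcript)
        transcript.append(("assistant", reply))
        if reply.startswith(PREFIX):
            return InquiryResult(reply[len(PREFIX):].strip(), round_no, transcript)
        # Otherwise treat the reply as a question and record the user's answer.
        transcript.append(("user", ask_user(reply)))
    return InquiryResult("unresolved", max_rounds, transcript)


def stub_model(transcript: List[Message]) -> str:
    # Toy stand-in for an LLM: asks for a library version once, then answers.
    answers = [text for role, text in transcript[2:] if role == "user"]
    if not answers:
        return "Which version of numpy is installed?"
    return f"DIAGNOSIS: incompatible dependency ({answers[-1]}); pin a compatible version."


def stub_user(question: str) -> str:
    # Toy stand-in for the developer answering the model's question.
    return "numpy 2.0.0"


result = active_inquiry(
    "ImportError: numpy.core.multiarray failed to import",
    stub_model,
    stub_user,
)
print(result.rounds, result.diagnosis)
```

With the stubs above, the loop ends in the second round, once the model has the version it asked for. In a real setting `ask_model` would wrap a call to an actual LLM and `ask_user` would collect answers from the developer or from the environment itself; the round limit keeps the dialogue from running indefinitely.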

Cite This Study

Du et al. studied this question.

synapsesocial.com/papers/69706ce9b6488063ad5c1ad2 https://doi.org/10.1145/3788866
