Software crash bugs cause unexpected program behaviors or even abrupt termination, thus demanding immediate resolution. However, resolving crash bugs can be challenging due to their complex root causes, which can originate from issues in the source code or external factors like third-party library dependencies. Large language models (LLMs) have shown promise in software engineering tasks, leveraging their extensive pre-training on text and code corpora. However, existing research predominantly focuses on the capability of LLMs to localize and repair code-related crash bugs, leaving their effectiveness in resolving environment-related crash bugs in real-world software unexplored. To fill this gap, we conducted the first comprehensive study to assess the capability of LLMs in resolving real-world environment-related crash bugs, using a newly constructed dataset of 100 representative crash bugs. We first systematically compare LLMs’ performance in resolving code-related and environment-related crash bugs with varying levels of crash contextual information. Our findings reveal that the LLM performs better in resolving code-related crash bugs compared to environment-related ones. Specifically, localization is the primary challenge for resolving code-related crashes, while repair poses a greater challenge for environment-related crashes. Furthermore, we investigate the impact of different prompt strategies on improving the resolution of environment-related crash bugs, incorporating different prompt templates and multi-round interactions. Building on this, we further explore an advanced active inquiry prompting strategy, which leverages the self-planning capabilities of LLMs to conduct systematic and continuous questioning aimed at identifying potential environmental factors that contribute to crashes. Based on these explorations, we propose IntDiagSolver, an interactive methodology designed to enable precise crash bug resolution through ongoing engagement with LLMs. Extensive evaluations of IntDiagSolver on a dataset of 41 crash bugs across multiple LLMs (including GPT-3.5, GPT-4, Claude, CodeLlama, DeepSeek-R1, and Qwen-3-Coder) demonstrate consistent improvements in resolution accuracy, with substantial enhancements ranging from 9.1% to 43.3% in localization and 9.1% to 53.3% in repair. Furthermore, the strong performance of IntDiagSolver on the latest expanded multilingual dataset of 42 crash bugs highlights its strong generalizability and effectiveness on previously unseen data.
Du et al. (Mon,) studied this question.