What question did this study set out to answer?

The study aims to systematically review how large language models are used to generate test cases, a task vital to ensuring software reliability.

April 1, 2026, Open Access

Test case generation using large language models: a systematic literature review

Key Points

The study systematically reviews the use of large language models (LLMs) in test case generation, a task vital to software reliability.
The review covers 38 peer-reviewed articles published between 2020 and 2025.
It analyzes the datasets, training techniques, and integration strategies used in test case generation.
Articles were drawn from databases such as ScienceDirect, IEEE Xplore, ACM Digital Library, and SpringerLink.
LLMs can increase the speed and coverage of automated test case generation.
The main challenges identified are dataset quality and integration complexity.
The review suggests potential solutions to these challenges.

Abstract

Test case generation is a time-consuming and labor-intensive task vital to ensuring software reliability. Automating this process is critical for increasing efficiency and reducing human error. This study systematically examined the applications of, and motivations for using, Large Language Models (LLMs) in test case generation. The Systematic Literature Review (SLR) method was chosen to identify gaps in the existing literature and to comprehensively evaluate the impact of LLMs in this field. Thirty-eight peer-reviewed articles addressing the use of LLMs in test case generation, published between 2020 and 2025 and indexed in databases such as ScienceDirect, IEEE Xplore, ACM Digital Library, and SpringerLink, were systematically analyzed. The review evaluated the datasets used, LLM training and test generation techniques, targeted programming languages, preprocessing and postprocessing methods, and strategies for integration with existing software workflows. The findings highlight the ability of LLMs to increase the speed and coverage of test case generation in test automation, identify challenges such as dataset quality and integration complexity, and suggest potential solutions to these issues. This review is a resource for researchers using LLMs in automated test generation: it offers insights into their capabilities and encourages further research in this area.

Authors

Murat Tasarsu

Ahmet Vedat Tokmak

Cagatay Catal

Journals

Cluster Computing
