What question did this study set out to answer?

This research explores the application of AI in automating the generation of system requirements and test cases to enhance software engineering processes.

April 30, 2026 | Open Access

AI-Driven Approaches to System Requirements and Test Case Generation: A New Paradigm in Software Engineering

Key Points

The study investigates AI-based automation of system requirements and test case generation in software engineering.
It uses Extended Finite State Machine (EFSM) models as formal inputs for large language models.
Five system models are used to assess the proposed AI-driven framework.
Outputs from several AI models are compared against a human-made reference standard.
With EFSM-based prompting, AI models exceeded 90% accuracy.
Claude Sonnet produced the most reliable results among the AI models.
ChatGPT showed high flexibility, while DeepSeek showed exceptional runtime efficiency.

Abstract

Artificial intelligence (AI) is a new paradigm in software engineering that automates key phases of the development cycle. Creating test cases and specifying requirements, however, are still mostly manual and error-prone. Unclear requirements can lead to expensive rework and undiscovered defects during development. In complex systems, scalability and dependability are crucial concerns. These shortcomings highlight the need for methods that improve accuracy and consistency throughout these critical phases. To generate well-organized system requirements, this article outlines a strategy that uses Extended Finite State Machine (EFSM) models as formal inputs for large language models (LLMs). Five system models are used to assess the proposed framework. A comparative analysis evaluates the accuracy, completeness, test coverage, and runtime efficiency of the generated artifacts. The study evaluates the performance of LLMs such as ChatGPT-5, Claude Sonnet 4.5, and DeepSeek V3.2 and compares their outputs with a human-made reference standard. The findings show that, with EFSM-based prompting, AI models can achieve human-comparable accuracy exceeding 90%. Claude Sonnet produced the most reliable results, ChatGPT showed exceptional flexibility, and DeepSeek showed exceptional runtime efficiency. These findings show that human–AI workflows provide a new paradigm for scalable, traceable, and reproducible system engineering.
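
To make the idea of an EFSM serving as a formal input to an LLM more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's actual framework: the `Transition` and `EFSM` classes, the `efsm_to_prompt` function, the prompt wording, and the `DoorLock` example model are all hypothetical and do not come from the article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    """One EFSM transition (hypothetical representation)."""
    source: str
    target: str
    event: str
    guard: str = "true"   # condition over the context variables
    action: str = ""      # variable update performed when the transition fires


@dataclass
class EFSM:
    """A small EFSM container: states, context variables, transitions."""
    name: str
    states: list[str]
    initial: str
    variables: dict[str, int] = field(default_factory=dict)
    transitions: list[Transition] = field(default_factory=list)


def efsm_to_prompt(model: EFSM) -> str:
    """Render the model as plain text that could be sent to an LLM."""
    lines = [
        f"System model: {model.name}",
        f"States: {', '.join(model.states)}",
        f"Initial state: {model.initial}",
        "Variables: " + ", ".join(f"{k}={v}" for k, v in model.variables.items()),
        "Transitions:",
    ]
    for i, t in enumerate(model.transitions, start=1):
        action = f" / {t.action}" if t.action else ""
        lines.append(f"  T{i}: {t.source} --{t.event} [{t.guard}]{action}--> {t.target}")
    # Assumed task wording; the article does not publish its prompts here.
    lines.append("Task: write one numbered requirement and one test case per transition.")
    return "\n".join(lines)


# Hypothetical example model, not one of the five systems in the study.
door = EFSM(
    name="DoorLock",
    states=["Locked", "Unlocked"],
    initial="Locked",
    variables={"attempts": 0},
    transitions=[
        Transition("Locked", "Unlocked", "enter_pin", "pin_ok", "attempts = 0"),
        Transition("Locked", "Locked", "enter_pin", "not pin_ok", "attempts = attempts + 1"),
        Transition("Unlocked", "Locked", "lock"),
    ],
)

prompt = efsm_to_prompt(door)
print(prompt)
```

Numbering the transitions (`T1`, `T2`, ...) in the prompt is one plausible way to support traceability, since each generated requirement or test case can then be checked against the transition it claims to cover.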
