Recent advances in artificial intelligence (AI), particularly in large language models (LLMs), offer new possibilities for automating requirements generation from elicitation interviews. This study compares the performance of ChatGPT-4 and DeepSeek-V3 in generating software requirements from transcribed stakeholder interviews. In two case studies, the LLMs were tasked with identifying functional and non-functional requirements. The results indicate that ChatGPT-4 performed better at extracting precise requirements, particularly non-functional ones, while DeepSeek-V3 demonstrated advantages in efficiency. However, both models showed limitations in handling ambiguity and in categorizing requirements correctly. This study highlights the potential of LLMs in Requirements Engineering while emphasizing the need for improved prompting and dialogue techniques and for human supervision. Future research should explore hybrid AI-human approaches and domain-specific fine-tuning to improve the accuracy of requirements extraction.
Almeida et al. studied this question.