What type of study is this?

This is a Case Report study (also classified as: Observational).

September 10, 2025

From Elicitation Interviews to Software Requirements: Evaluating LLM Performance in Requirement Generation

Key Points

ChatGPT-4 outperformed DeepSeek-V3 in extracting precise requirements, particularly non-functional ones, indicating that LLMs vary in their requirement-generation capabilities.
DeepSeek-V3 was more efficient during requirement extraction, suggesting a trade-off between efficiency and accuracy across the two models.
Both models struggled with ambiguity and categorizing requirements, highlighting existing limitations in current LLM capabilities.
The study suggests future research explore hybrid AI-human approaches to enhance requirement extraction accuracy and effectiveness.

Abstract

Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs), offer new possibilities for automating requirements generation from elicitation interviews. This study compares the performance of ChatGPT-4 and DeepSeek-V3 in generating software requirements based on transcribed stakeholder interviews. Using two case studies, the LLMs were tasked with identifying functional and non-functional requirements. The results indicate that ChatGPT-4 performed better in extracting precise requirements, particularly non-functional ones, while DeepSeek-V3 demonstrated advantages in efficiency. However, both models exhibited limitations in handling ambiguity and properly categorizing requirements. This study highlights the potential of LLMs in Requirements Engineering while emphasizing the need for improved prompting and dialogue techniques and for human supervision. Future research should explore hybrid AI-human approaches and domain-specific fine-tuning to enhance requirement extraction accuracy.
