In today’s competitive digital landscape, application usability plays a critical role in user satisfaction and retention. Negative user reviews offer valuable insight into real-world usability issues, yet traditional analysis methods scale poorly and capture little of the context in which a complaint is made. This paper proposes an intelligent framework that uses large language models (LLMs), including GPT-4, Gemini, and BLOOM, to automate the extraction of actionable usability recommendations from negative app reviews. By applying prompting and fine-tuning techniques, the framework transforms unstructured feedback into suggestions aligned with three core usability dimensions: correctness, completeness, and satisfaction. A manually annotated dataset of negative Instagram reviews was used to evaluate model performance. Results show that GPT-4 consistently outperformed the other models, achieving BLEU scores up to 0.64, ROUGE scores up to 0.80, and METEOR scores up to 0.90, which indicates high semantic accuracy and contextual relevance in the generated recommendations. Gemini and BLOOM improved with fine-tuning but still performed significantly worse than GPT-4. This study also introduces a practical, web-based tool that enables real-time review analysis and recommendation generation, supporting data-driven, user-centered software development. These findings illustrate the potential of LLM-based frameworks to enhance software usability analysis and accelerate feedback-driven design processes.
Alsaleh et al. (Mon,) studied this question.