What question did this study set out to answer?

The aim is to evaluate how well large language models support qualitative research and meet methodological standards.

April 23, 2026 | Open Access

Generative artificial intelligence in qualitative analysis: a critical examination of tools, trust and rigor

Key Points

Aimed to evaluate how well large language models support qualitative research and meet established standards of methodological rigor.
Conducted a literature review of academic papers from 2020 to 2025.
Performed a proof-of-concept experiment evaluating five large language models on qualitative analysis tasks.
Developed an evaluation framework aligning LLM outputs with qualitative research quality criteria.
Found significant variation in performance across the models, with inconsistencies in contextual comprehension and coding accuracy.
Models performed well at retrieving literature, summarizing content and generating insights but struggled with depth of critical analysis.
Recommended that practitioners exercise critical oversight of LLM outputs and that researchers address ethical concerns and refine evaluation rubrics for responsible AI integration.

Abstract

This study addresses a critical gap in existing research by systematically comparing the performance of five popular large language models (LLMs) in supporting high-quality qualitative research. Our methodology combines a literature review of academic papers from 2020 to 2025 with a proof-of-concept experiment evaluating ScholarAI, ChatGPT-4o, Claude 3.5 Sonnet, NotebookLM and Perplexity on key qualitative analysis tasks. We sought to determine how well these generative artificial intelligence (AI) models meet established standards of methodological rigor in qualitative analysis. Findings reveal significant variation in LLM performance: the models excelled at efficiently retrieving relevant literature, summarizing content and generating insights, but exhibited inconsistencies in contextual comprehension, coding accuracy and depth of critical analysis. These results informed a novel evaluation framework aligning LLM outputs with qualitative research quality criteria, contributing guidance for researchers and practitioners. We recommend that practitioners leverage LLMs to improve productivity while exercising critical oversight of their outputs, and that researchers address ethical concerns and refine evaluation rubrics to ensure responsible AI integration. Overall, this work establishes a foundation for responsible human–AI collaboration in qualitative research by highlighting both the opportunities and challenges of using generative AI to enhance methodological rigor and accessibility.
