August 17, 2025 | Open Access

Context-Grounded Factuality Enhancement in LLM Responses via Multi-Stage Critique and Refinement

Key Points

The framework significantly improves the Fact Consistency Score (FCS) and Context Grounding Score (CGS) of LLM responses.
Evaluated on HotpotQA and ELI5, it outperforms baseline LLMs and simple self-correction strategies.
A multi-stage chain-of-thought reasoning process identifies, classifies, and corrects factual inconsistencies and ungrounded statements against the source documents.
The explicit reasoning raises computational cost, but the approach improves the reliability of LLM-generated content.

Abstract

Large Language Models (LLMs) often suffer from factual hallucinations and contextual detachment, significantly limiting their reliability in critical applications. To address these issues, we propose an innovative automated framework, "Context-Grounded Factuality Enhancement in LLM Responses via Multi-Stage Critique and Refinement." Our method leverages the inherent reasoning capabilities of pre-trained LLMs themselves, operating in a zero-shot manner without requiring any fine-tuning. It simulates a "Fact Verifier-Content Reviser" role within the LLM, guiding it through a multi-stage Chain-of-Thought (CoT) reasoning process to systematically identify, classify, and correct factual inconsistencies and ungrounded statements against provided source documents. Evaluated on challenging datasets, HotpotQA and ELI5, our framework significantly outperforms baseline LLMs and existing simple self-correction strategies in terms of Fact Consistency Score (FCS) and Context Grounding Score (CGS). Notably, our CoT-guided prompting strategy consistently yields superior results, achieving state-of-the-art performance with Llama 3 70B. Human evaluations further corroborate the enhanced factual accuracy and contextual grounding, alongside maintained fluency. While involving increased computational cost due to explicit reasoning, our framework demonstrates a robust and effective approach to improving the trustworthiness of LLM-generated content.
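
To make the identify, classify, and correct stages concrete, the following is a minimal sketch of how such a zero-shot pipeline might be wired together. It is not the authors' implementation: the stage instructions, the label names (SUPPORTED, CONTRADICTED, UNGROUNDED), the function names, and the `llm` callable are all hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical stage instructions; the paper's actual prompts are not given here.
STAGES: List[Tuple[str, str]] = [
    ("identify", "List every factual claim made in the DRAFT, one per line."),
    ("classify", "For each claim, reason step by step and label it SUPPORTED, "
                 "CONTRADICTED, or UNGROUNDED with respect to the SOURCE."),
    ("correct", "Rewrite the DRAFT so that every claim is supported by the "
                "SOURCE; fix or remove the claims that are not."),
]


def build_prompt(instruction: str, source: str, draft: str,
                 notes: List[Tuple[str, str]]) -> str:
    """Assemble one stage prompt from the source, the draft, and earlier stage outputs."""
    parts = [
        "You are a fact verifier and content reviser.",
        f"SOURCE:\n{source}",
        f"DRAFT:\n{draft}",
    ]
    # Feed each earlier stage's output forward so the reasoning is chained.
    for name, output in notes:
        parts.append(f"{name.upper()} OUTPUT:\n{output}")
    parts.append(f"TASK: {instruction}")
    return "\n\n".join(parts)


def critique_and_refine(llm: Callable[[str], str], source: str,
                        draft: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Run the stages in order; return the revised text and the full stage trace."""
    notes: List[Tuple[str, str]] = []
    for name, instruction in STAGES:
        prompt = build_prompt(instruction, source, draft, notes)
        notes.append((name, llm(prompt)))
    # The output of the last stage ("correct") is the revised response.
    return notes[-1][1], notes
```

Here `llm` stands for any function that maps a prompt string to a completion string, so the same sketch works with any pre-trained model and requires no fine-tuning. Each stage costs one additional model call, which is consistent with the increased computational cost noted in the abstract.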
