Large Language Models (LLMs) often produce factual hallucinations and responses detached from their provided context, which limits their reliability in critical applications. To address these issues, we propose an automated framework, "Context-Grounded Factuality Enhancement in LLM Responses via Multi-Stage Critique and Refinement." The method relies on the reasoning capabilities of pre-trained LLMs themselves and operates zero-shot, with no fine-tuning. It prompts the LLM to act as a combined "Fact Verifier-Content Reviser" and guides it through a multi-stage Chain-of-Thought (CoT) reasoning process that systematically identifies, classifies, and corrects factual inconsistencies and ungrounded statements against the provided source documents. Evaluated on two challenging datasets, HotpotQA and ELI5, our framework significantly outperforms baseline LLMs and existing simple self-correction strategies on Fact Consistency Score (FCS) and Context Grounding Score (CGS). Notably, the CoT-guided prompting strategy consistently yields the best results, achieving state-of-the-art performance with Llama 3 70B. Human evaluations corroborate the improved factual accuracy and contextual grounding and indicate that fluency is maintained. Although the explicit reasoning increases computational cost, the framework offers a robust and effective approach to improving the trustworthiness of LLM-generated content.
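The identify, classify, and correct stages described above can be pictured as three chained prompts. The following is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the function name `critique_and_refine`, and the `llm` callable (any function mapping a prompt string to a response string) are all hypothetical, since the abstract does not specify them.

```python
from typing import Callable, Dict

# Hypothetical prompt templates; the abstract does not give the actual wording.
IDENTIFY = (
    "You are a Fact Verifier. Think step by step.\n"
    "Source documents:\n{context}\n\nDraft answer:\n{draft}\n\n"
    "List every statement in the draft that the sources contradict "
    "or do not support."
)
CLASSIFY = (
    "Source documents:\n{context}\n\nFlagged statements:\n{issues}\n\n"
    "Label each statement as CONTRADICTED or UNGROUNDED and explain why."
)
REVISE = (
    "You are a Content Reviser.\n"
    "Source documents:\n{context}\n\nDraft answer:\n{draft}\n\n"
    "Critique:\n{critique}\n\n"
    "Rewrite the draft so every statement is supported by the sources."
)


def critique_and_refine(
    llm: Callable[[str], str], context: str, draft: str
) -> Dict[str, str]:
    """Run three zero-shot prompts in sequence: identify, classify, revise."""
    # Stage 1: flag statements that conflict with or go beyond the sources.
    issues = llm(IDENTIFY.format(context=context, draft=draft))
    # Stage 2: classify each flagged statement by the kind of problem.
    critique = llm(CLASSIFY.format(context=context, issues=issues))
    # Stage 3: rewrite the draft using the critique as guidance.
    revised = llm(REVISE.format(context=context, draft=draft, critique=critique))
    return {"issues": issues, "critique": critique, "revised": revised}
```

Because `llm` is just a callable, the same sketch works with any model client wrapped as a string-to-string function, and no fine-tuning step is involved. The three sequential calls also make the extra computational cost of explicit reasoning visible: each response requires three model invocations instead of one.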
Salma Ali (Tue,) studied this question.