What question did this study set out to answer?

The aim is to improve medical question answering by enhancing factual reliability and efficiency using a resource-constrained framework.

February 19, 2026

Augmenting Small Language Model for Better Medical Question Answering through Source Authentication

Key Points

Integrates a fine-tuned 1.5B-parameter Small Language Model (SLM) with retrieval-augmented generation (RAG) and a calibrated fact-checking module.
Evaluated on PubMedQA and other standard medical benchmarks.
Introduces internal metrics, notably the Factual Consistency Score (FCS), to predict answer quality.
Achieves competitive performance on a single T4 GPU with a 7GB memory footprint, using 85% fewer resources than comparable 7B-parameter medical LLMs.
Reduces hallucinations by 37% (measured by FCS) compared with non-fact-checked baselines.
Shows that internal metrics are effective Query Performance Prediction (QPP) signals for answer reliability.
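The retrieve, generate, then fact-check flow listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say how FCS is computed, so `factual_consistency_score` is a hypothetical stand-in that scores the share of answer sentences whose content words are mostly covered by a single retrieved passage. The lexical `retrieve` function, the 0.6 support ratio, the 0.5 flagging threshold, and the pluggable `generate` callable (where the fine-tuned SLM would sit) are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of",
             "on", "or", "the", "to", "was", "were", "with"}


def content_tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Hypothetical retriever: rank passages by token overlap with the query."""
    q = content_tokens(query)
    return sorted(corpus, key=lambda p: len(q & content_tokens(p)), reverse=True)[:k]


def factual_consistency_score(answer: str, passages: List[str], support: float = 0.6) -> float:
    """Hypothetical FCS: fraction of answer sentences whose content tokens are
    at least `support` covered by some single retrieved passage."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    passage_tokens = [content_tokens(p) for p in passages]
    supported = 0
    for sentence in sentences:
        toks = content_tokens(sentence)
        if toks and any(len(toks & p) / len(toks) >= support for p in passage_tokens):
            supported += 1
    return supported / len(sentences)


@dataclass
class QAResult:
    answer: str
    fcs: float
    flagged: bool  # True when FCS falls below the threshold (candidate for abstention)


def answer_question(
    query: str,
    corpus: List[str],
    generate: Callable[[str, List[str]], str],
    threshold: float = 0.5,
) -> QAResult:
    """Retrieve -> generate -> fact-check; `generate` stands in for the fine-tuned SLM."""
    passages = retrieve(query, corpus)
    answer = generate(query, passages)
    fcs = factual_consistency_score(answer, passages)
    return QAResult(answer, fcs, flagged=fcs < threshold)


if __name__ == "__main__":
    corpus = [
        "Metformin is a first-line therapy for type 2 diabetes.",
        "Aspirin reduces fever.",
        "Insulin regulates blood glucose.",
    ]
    query = "What is the first-line therapy for type 2 diabetes?"
    grounded = answer_question(query, corpus, lambda q, ps: ps[0])
    made_up = answer_question(query, corpus, lambda q, ps: "Penicillin cures diabetes overnight.")
    print(grounded.fcs, grounded.flagged)  # 1.0 False
    print(made_up.fcs, made_up.flagged)    # 0.0 True
```

The token-overlap check only keeps the sketch self-contained; an entailment (NLI) model would be a closer stand-in for a real fact-checker, and the flagging threshold would then need calibrating on held-out data.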

Abstract

Medical question answering systems require both factual reliability and computational efficiency for real-world deployment. We address this dual challenge with a resource-constrained framework that integrates a fine-tuned 1.5B-parameter Small Language Model with retrieval-augmented generation (RAG) and a calibrated fact-checking module optimized for the constraints of the medical domain. Our key contribution lies not merely in component integration but in showing that internal system metrics, particularly our Factual Consistency Score (FCS), serve as effective Query Performance Prediction (QPP) indicators of answer reliability in complex RAG pipelines, addressing a critical gap in current IR paradigms. Evaluated on PubMedQA and standard medical benchmarks, our approach achieves competitive performance while running on a single T4 GPU with a 7GB memory footprint, requiring 85% fewer resources than comparable 7B-parameter medical LLMs. The framework reduces hallucinations by 37% (measured by FCS) compared with non-fact-checked baselines, although it remains limited in multi-hop medical reasoning and cross-sentence verification. Our work provides a practical blueprint for building accessible, trustworthy medical QA systems that balance performance against infrastructure constraints, and establishes that internal consistency metrics can predict answer quality where traditional QPP methods fall short. The implementation demonstrates that resource-constrained environments need not sacrifice reliability when well-designed verification mechanisms are integrated into the retrieval-generation pipeline.
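A QPP predictor such as FCS is commonly judged by how well its per-query scores rank-correlate with actual per-query effectiveness (Spearman's rho or Kendall's tau). The sketch below shows that evaluation under stated assumptions: it presumes one FCS value and one binary correctness label per question, and the function names and toy numbers are hypothetical, for illustration only, not results from the paper. Spearman's rho is computed as Pearson correlation on average ranks, which handles the ties that 0/1 correctness labels inevitably produce.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for idx in order[i : j + 1]:
            ranks[idx] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(predicted), average_ranks(actual))


if __name__ == "__main__":
    # Toy, made-up data: per-question FCS and whether the answer was correct (1/0).
    fcs_scores = [0.9, 0.8, 0.3, 0.6, 0.1]
    correct = [1, 1, 0, 1, 0]
    print(f"Spearman rho(FCS, correctness) = {spearman(fcs_scores, correct):.3f}")  # 0.866
```

A higher rho means the predictor ranks reliable answers above unreliable ones more consistently; on Python 3.12 and later, `statistics.correlation(x, y, method="ranked")` offers a built-in equivalent.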
