Medical question answering (QA) systems must be both factually reliable and computationally efficient to be deployable in practice. We address this dual challenge with a resource-constrained framework that integrates a fine-tuned 1.5B-parameter Small Language Model (SLM), retrieval-augmented generation (RAG), and a calibrated fact-checking module optimized for the constraints of the medical domain. Our key contribution lies not merely in integrating these components, but in showing that internal system metrics, particularly our Factual Consistency Score (FCS), serve as effective Query Performance Prediction (QPP) indicators of answer reliability in complex RAG pipelines. This addresses a critical gap in current information retrieval (IR) paradigms. Evaluated on PubMedQA and standard medical benchmarks, our approach achieves competitive performance while running on a single T4 GPU with a 7 GB memory footprint, requiring 85% fewer resources than comparable 7B-parameter medical LLMs. The framework reduces hallucinations by 37% (as measured by FCS) relative to non-fact-checked baselines, although it remains limited in handling multi-hop medical reasoning and cross-sentence verification. Our work offers a practical blueprint for accessible, trustworthy medical QA systems that balance performance against infrastructure constraints, and it shows that internal consistency metrics can predict answer quality where traditional QPP methods fall short. More broadly, the implementation demonstrates that resource-constrained environments need not sacrifice reliability when carefully designed verification mechanisms are integrated into the retrieval-generation pipeline.
Das et al. (Mon,) studied this question.
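The abstract does not define FCS or the protocol used to judge it as a QPP indicator. As a minimal sketch, assume FCS is the fraction of an answer's claims that the fact-checker marks as supported by retrieved evidence. This is a hypothetical definition, not the paper's. QPP effectiveness is then conventionally assessed by correlating per-query predictor scores with per-query answer quality. The helper names `factual_consistency_score` and `pearson`, and the toy inputs, are illustrative only. They are not the authors' implementation or data.

```python
from math import sqrt
from typing import Sequence


def factual_consistency_score(claim_supported: Sequence[bool]) -> float:
    """Hypothetical FCS: fraction of answer claims judged supported by the
    retrieved evidence (an assumed definition, not the paper's)."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predictor scores and answer quality.
    With binary quality labels this is the point-biserial coefficient."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation undefined for a constant series
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy per-query fact-check verdicts (one bool per answer claim).
    verdicts = [
        [True, True, True],
        [True, False, True],
        [False, False, True],
        [True, True, False, False],
    ]
    fcs = [factual_consistency_score(v) for v in verdicts]
    correct = [1, 1, 0, 0]  # toy per-query correctness labels
    print("QPP correlation (FCS vs. correctness):", pearson(fcs, correct))
```

A positive correlation means higher FCS tends to accompany correct answers, which is what "FCS as a QPP indicator" would require. QPP studies also commonly report rank correlations such as Kendall's tau, which could be substituted for `pearson` here.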