Pre-registered methodology for a within-subject paired observational audit of quantization-induced inconsistency in three sub-4B clinical small language models (Qwen-MediCare-BD 3.1B, Gemma-3-4B-it, Phi-4-mini-instruct 3.8B) across four GGUF K-quant levels (Q6_K, Q5_K_M, Q4_K_M, Q3_K_M) on 100 stratified ACI-Bench encounters with 20 seeds per condition (24,000 generations, 228,000 pairwise metric evaluations). Reference-free metric pipeline: BERTScore-F1, ROUGE-L, MEDCON-F1, numerical Jaccard with paired extraction-volume control, and bidirectional NLI contradiction (three aggregations). Statistical analysis: Friedman omnibus test, pairwise Wilcoxon signed-rank tests with Pratt zero handling, BCa bootstrap with tie-corrected z0, Holm-Bonferroni correction, and simulation-based power analysis (MDE δ ≥ 0.246 at 80% power). Three-judge LLM convergent validity panel (Claude Opus 4.7, Gemini 3.1 Pro, DeepSeek Expert Mode) on two parallel tracks. Dual-criterion deployment-readiness matrix
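A quick consistency check on the design arithmetic can be written in a few lines. This is a minimal sketch that assumes each pairwise count is one evaluation per unordered seed pair within a model × quantization level × encounter cell; that interpretation is an assumption, not stated in the abstract, although it reproduces the 228,000 figure exactly. The sketch also shows the standard Holm-Bonferroni step-down procedure named above. The function names `design_counts` and `holm_bonferroni` and the p-values in the usage example are hypothetical illustrations, not the authors' code or results.

```python
from math import comb

# Design dimensions as stated in the abstract.
MODELS = ["Qwen-MediCare-BD 3.1B", "Gemma-3-4B-it", "Phi-4-mini-instruct 3.8B"]
QUANTS = ["Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M"]
N_ENCOUNTERS = 100
N_SEEDS = 20


def design_counts(n_models, n_quants, n_encounters, n_seeds):
    """Return (generations, seed pairs per cell, pairwise evaluations).

    Assumes (hypothetically) one evaluation per unordered seed pair
    within each model x quant x encounter cell.
    """
    cells = n_models * n_quants * n_encounters
    generations = cells * n_seeds
    pairs_per_cell = comb(n_seeds, 2)  # unordered seed pairs: n(n-1)/2
    return generations, pairs_per_cell, cells * pairs_per_cell


def holm_bonferroni(pvalues, alpha=0.05):
    """Holm step-down correction; returns a reject flag per input p-value."""
    m = len(pvalues)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (0-based rank) is compared with alpha / (m - rank).
        if pvalues[i] > alpha / (m - rank):
            break  # first failure: this and all larger p-values are retained
        reject[i] = True
    return reject


if __name__ == "__main__":
    gens, pairs, evals = design_counts(len(MODELS), len(QUANTS), N_ENCOUNTERS, N_SEEDS)
    print(gens, pairs, evals)  # 24000 190 228000
    # Hypothetical p-values for six pairwise quantization-level comparisons.
    print(holm_bonferroni([0.001, 0.04, 0.03, 0.2, 0.008, 0.012]))
    # [True, False, False, False, True, True]
```

Holm's procedure is used here because it controls the family-wise error rate like plain Bonferroni while being uniformly more powerful, which matters when several pairwise quantization contrasts are tested per metric.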
Md. Hasibul Islam Shanto
Abul Bashar Saurov
Anupom Bhowmik
American International University-Bangladesh
Shanto et al. studied this question.
www.synapsesocial.com/papers/6a17dd923fad632b0f9da3bb — DOI: https://doi.org/10.5281/zenodo.20389426