This work investigates the effectiveness of 4-bit quantization and parameter-efficient fine-tuning (QLoRA) applied to the Llama-3-8B language model for clinical biomedical reasoning tasks. The study demonstrates that expert-level reasoning performance can be achieved on consumer-grade hardware by carefully balancing quantization precision, adapter configuration, and domain-specific instruction tuning. Results highlight the trade-offs between memory efficiency, accuracy, and deployability, supporting broader access to advanced medical AI systems under strict computational constraints.
Aditya Verma studied this question.
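To make the memory-efficiency trade-off concrete, the following back-of-the-envelope sketch estimates weight storage at 16-bit versus 4-bit precision and counts the trainable parameters that a low-rank adapter adds to a single linear layer. The nominal parameter count of 8e9, the 4096 x 4096 layer shape, and the rank of 16 are illustrative assumptions, not values reported in this work. The estimate covers weights only and ignores quantization constants, activations, optimizer state, and the KV cache.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a rank-`rank` LoRA adapter adds to one linear layer.

    The adapter factors the weight update into two matrices of shape
    (d_in x rank) and (rank x d_out), so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Assumption: a nominal 8e9 parameters for an "8B" model.
NOMINAL_PARAMS = 8e9

fp16_gib = weight_memory_gib(NOMINAL_PARAMS, 16)
int4_gib = weight_memory_gib(NOMINAL_PARAMS, 4)

# Assumption: a hypothetical 4096 x 4096 projection adapted at rank 16.
adapter_params = lora_params(4096, 4096, 16)
full_layer_params = 4096 * 4096

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights:  {int4_gib:.2f} GiB")
print(f"adapter params per layer: {adapter_params}")
print(f"fraction of full layer:   {adapter_params / full_layer_params:.4f}")
```

Under these assumptions the 4-bit weights occupy one quarter of the 16-bit footprint, and the adapter trains under one percent of the parameters in the layer it modifies. This is the arithmetic behind fitting such a model on consumer-grade hardware; actual usage during fine-tuning will be higher once activations and optimizer state are included.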