What question did this study set out to answer?

The study aims to improve legal reasoning in financial calculations by integrating advanced AI architectures and addressing uncertainties in legal data.

April 15, 2026 · Open Access

Hybrid QLoRA-RAG Architecture for Saudi End-of-Service Benefits Calculation: Synthetic Data Generation and Uncertainty Quantification for Legal Reasoning

Key Points

Developed a hybrid architecture combining QLoRA fine-tuning and RAG.
Created a synthetic dataset reflecting real-world legal complexities with 10,000 samples.
Implemented uncertainty quantification combining MC Dropout, retrieval confidence, linguistic hedging, and temperature scaling.
Conducted evaluations on held-out synthetic test cases stratified by complexity.
Performed human evaluations with legal experts to assess performance.
Achieved 94.2% accuracy and 91.5% legal citation correctness in synthetic evaluations.
Demonstrated performance improvements of 5.8–8.7 percentage points over isolated components.
Achieved an Expected Calibration Error of 0.043 and 89.4% precision in detecting ambiguous cases that require human review.
Received an overall expert rating of 4.4/5 with unanimous recommendation for pilot deployment.
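
The Expected Calibration Error figure above can be illustrated with a minimal sketch. It assumes the common equal-width binning definition: predictions are grouped by confidence, and the score is the sample-weighted mean gap between per-bin accuracy and per-bin mean confidence. The bin count and exact variant used in the study are not stated here, so `n_bins=10` and the function name are illustrative choices, not the authors' implementation.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count; not specified in the source
) -> float:
    """Equal-width-bin ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # Assign each prediction to a bin; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

A lower value means that stated confidence tracks observed accuracy more closely. A score of 0 would mean that, within every bin, the two match exactly.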

Abstract

Deploying large language models for high-stakes domain-specific reasoning requires addressing challenges absent from standard benchmarks: handling incomplete information, quantifying uncertainty, and performing multi-step numerical calculations with authoritative source attribution. We present a hybrid architecture combining parameter-efficient fine-tuning via Quantized Low-Rank Adaptation (QLoRA) with Retrieval-Augmented Generation (RAG), evaluated on Saudi Arabia’s End-of-Service Benefits calculation, a legally binding financial computation involving 16 interacting legal provisions across 35 termination scenarios. Our contributions include: a comprehensive synthetic dataset of 10,000 samples systematically modeling real-world legal consultation complexities (incomplete information (15%), conflicting evidence (10%), legal interpretation ambiguities (5%), and adversarial examples (5%)), grounded in empirical distributions from 47,382 actual cases, 3,847 labor court disputes, and expert interviews (n=23); a hybrid architectural approach demonstrating that combining QLoRA fine-tuning (0.42% trainable parameters, 93.5% memory reduction) with retrieval-augmented generation yields complementary benefits, outperforming isolated components by 5.8–8.7 percentage points; and integrated uncertainty quantification mechanisms combining epistemic (MC Dropout), aleatoric (retrieval confidence, linguistic hedging), and calibration (temperature scaling) methods, achieving an Expected Calibration Error of 0.043 and 89.4% precision in detecting ambiguous cases requiring human review. Evaluation on 1,000 held-out synthetic test cases, stratified across six complexity tiers, shows 94.2% accuracy (±5% tolerance), 91.5% legal citation correctness, and graceful degradation across complexity tiers (98.7% on standard cases → 82.0% on adversarial examples). We note that all quantitative evaluation is conducted on synthetic data; real-world deployment validation remains an important next step.
Human evaluation by five Saudi legal experts (inter-rater κ = 0.73) yields an overall rating of 4.4/5 with a unanimous recommendation for pilot deployment. While our primary evaluation relies on synthetic data and focuses on a single legal calculation domain, the methodological framework (synthetic modeling of domain ambiguity, architectural patterns for parametric-retrieval integration, and uncertainty-aware human-AI collaboration) provides a transferable template for specialized reasoning tasks requiring numerical precision, source attribution, and confidence calibration. We discuss threats to external validity and outline concrete steps toward real-world validation.
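
The accuracy figure is reported with a ±5% tolerance, which the abstract does not define further. The sketch below shows one plausible reading, assuming the tolerance is relative to the reference benefit amount. The function names, the zero-reference rule, and the sample amounts are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple


def within_tolerance(predicted: float, reference: float, rel_tol: float = 0.05) -> bool:
    """True if predicted is within rel_tol of reference (assumed: relative to the reference)."""
    if reference == 0:
        # Assumption: a zero reference amount only matches a zero prediction.
        return predicted == 0
    return abs(predicted - reference) <= rel_tol * abs(reference)


def tolerance_accuracy(pairs: Iterable[Tuple[float, float]], rel_tol: float = 0.05) -> float:
    """Fraction of (predicted, reference) pairs that fall within the tolerance."""
    results = [within_tolerance(pred, ref, rel_tol) for pred, ref in pairs]
    if not results:
        raise ValueError("at least one (predicted, reference) pair is required")
    return sum(results) / len(results)


# Hypothetical benefit amounts, for illustration only.
example_pairs = [
    (100_000.0, 100_000.0),  # exact match
    (104_000.0, 100_000.0),  # 4% high: counted as correct
    (106_000.0, 100_000.0),  # 6% high: counted as incorrect
    (0.0, 0.0),              # no benefit due, none predicted
]
example_accuracy = tolerance_accuracy(example_pairs)
```

Under this reading, a prediction counts as correct when it lands within 5% of the reference amount. An absolute tolerance would be a different metric and would give different scores on small benefit amounts.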
