This paper presents EMRE (Epistemic Mixture of Reinforced Experts), an epistemic calibration component for BDI agents that learns when to trust the LLM sensor for a specific tenant. The problem: LLMs deployed as typed sensors in BDI architectures assign confidence scores that are agnostic to the tenant's document distribution. A new tenant and one with 500 months of history both receive confidence=0.95, which is structurally incorrect. EMRE combines a Mixture of Experts architecture (per-type k-NN experts over OpenAI text-embedding-3-large, 3072 dimensions) with a per-tenant gating network trained via RLHF. Reward signals derive from EVR Gate outcomes (automatic) and human reviewer verdicts (RLHF), so no labeled dataset is required.

Main contributions:

1. EMRE architecture: k-NN experts with adaptive per-tenant gating updated by bandit gradient ascent.
2. Learning Boundary Theorem: formal proof that EMRE training preserves all five HADD invariants unconditionally. The BDI engine, HTN planner, and Tribunal remain deterministic regardless of how many documents EMRE has processed.
3. Adaptive escalation threshold τ(n): automatically calibrates human oversight as tenant history accumulates, from τ=0.95 at n=0 to τ=0.80 at n≥100.
4. Production validation: 31 legal documents (actas de disconformidad), mode transition cold_start→knn at n=10, τ=0.903 at n=31, ECR=0.0, embeddings verified at 3072 dims in PostgreSQL.

To our knowledge, this is the first application of RLHF with a MoE architecture to the epistemic boundary of a formally verified deterministic BDI system, positioning MINERVA as a Kautz Type-2 neurosymbolic agent that is simultaneously adaptive and certifiable under EU AI Act Article 9. EMRE is implemented and validated in production as part of the MINERVA HADD architecture.
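The threshold schedule and mode transition described above can be illustrated with a minimal sketch. The abstract gives only the endpoints (τ=0.95 at n=0, τ=0.80 at n≥100) and the transition point (n=10); the linear interpolation between the endpoints, the function names, and the rule "escalate when confidence falls below τ" are all assumptions made for illustration, not the paper's stated implementation.

```python
# Values taken from the abstract.
TAU_COLD = 0.95    # escalation threshold at n = 0
TAU_WARM = 0.80    # escalation threshold at n >= 100
N_WARM = 100       # history size at which the threshold stops decreasing
KNN_MIN_DOCS = 10  # cold_start -> knn transition point


def escalation_threshold(n: int) -> float:
    """Return tau(n), assuming a linear decay between the two endpoints."""
    if n < 0:
        raise ValueError("n must be non-negative")
    frac = min(n, N_WARM) / N_WARM  # clamp so tau stays at TAU_WARM beyond N_WARM
    return TAU_COLD - (TAU_COLD - TAU_WARM) * frac


def expert_mode(n: int) -> str:
    """Return the expert mode for a tenant with n processed documents."""
    return "knn" if n >= KNN_MIN_DOCS else "cold_start"


def should_escalate(confidence: float, n: int) -> bool:
    """Hypothetical rule: send to human review when confidence is below tau(n)."""
    return confidence < escalation_threshold(n)
```

Under this assumed linear form, `escalation_threshold(31)` evaluates to approximately 0.9035, which is close to the reported τ=0.903 at n=31. The actual functional form used by EMRE is not stated in this section.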
Jaime et al. (Sun,) studied this question.