May 5, 2026 · Open Access

EMRE: Epistemic Confidence Calibration in BDI Agents via RLHF and Mixture of Experts

Key Points

This research aims to improve how BDI agents calibrate their confidence in LLM sensor outputs for different tenants.
Developed the EMRE architecture, which combines per-type k-NN experts with adaptive per-tenant gating.
Trained the gating network using Reinforcement Learning from Human Feedback (RLHF).
Validated the system in production on 31 legal documents while monitoring the escalation threshold.
At n=31 documents, the adaptive escalation threshold had decreased to τ=0.903, down from τ=0.95 for a new tenant.
The Learning Boundary Theorem proves that EMRE training preserves all five HADD invariants.
Production validation confirmed the adaptive gating mechanism's effectiveness with zero classification errors.
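A minimal Python sketch of how per-type k-NN experts and a per-tenant gate trained by bandit gradient ascent could fit together. Everything here is an assumption for illustration, not the paper's implementation: the class names `KNNExpert` and `GradientBanditGate`, the cosine-similarity neighbour search, the fallback to a fixed prior below a cold-start size, the textbook gradient-bandit update (softmax over preferences with a running-mean reward baseline), and the reward encoding. Toy 2-dimensional vectors stand in for the 3072-dimensional text-embedding-3-large embeddings.

```python
import math
import random


class KNNExpert:
    """Hypothetical per-document-type expert: estimates how often the LLM
    sensor was right on similar past documents for one tenant."""

    def __init__(self, k=5, prior=0.95, cold_start_n=10):
        self.k = k
        self.prior = prior              # confidence used while history is short
        self.cold_start_n = cold_start_n
        self.memory = []                # (embedding, was_correct) pairs

    def add(self, embedding, was_correct):
        self.memory.append((embedding, was_correct))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        if na == 0.0 or nb == 0.0:
            return 0.0
        return dot / (na * nb)

    def confidence(self, embedding):
        # Cold start (assumed rule): too little history, use the fixed prior.
        if len(self.memory) < self.cold_start_n:
            return self.prior
        ranked = sorted(self.memory,
                        key=lambda m: self._cosine(embedding, m[0]),
                        reverse=True)
        neighbours = ranked[: self.k]
        # Fraction of the k most similar past documents that were verified correct.
        return sum(1.0 for _, ok in neighbours if ok) / len(neighbours)


class GradientBanditGate:
    """Hypothetical per-tenant gate: softmax over expert preferences,
    updated with the gradient-bandit rule (reward minus baseline)."""

    def __init__(self, n_experts, alpha=0.1, seed=0):
        self.h = [0.0] * n_experts      # preferences H_i
        self.alpha = alpha              # step size
        self.baseline = 0.0             # mean of previous rewards
        self.t = 0
        self.rng = random.Random(seed)

    def probs(self):
        m = max(self.h)                 # subtract max for numerical stability
        exps = [math.exp(x - m) for x in self.h]
        z = sum(exps)
        return [e / z for e in exps]

    def choose(self):
        # Sample one expert index according to the current gate probabilities.
        return self.rng.choices(range(len(self.h)), weights=self.probs())[0]

    def update(self, action, reward):
        pi = self.probs()
        adv = reward - self.baseline    # advantage against previous rewards
        self.t += 1
        self.baseline += (reward - self.baseline) / self.t
        for i in range(len(self.h)):
            if i == action:
                self.h[i] += self.alpha * adv * (1.0 - pi[i])
            else:
                self.h[i] -= self.alpha * adv * pi[i]

    def mix(self, confidences):
        # Gate-weighted mixture of the experts' confidence estimates.
        return sum(p * c for p, c in zip(self.probs(), confidences))
```

In this sketch, each reward would come from an EVR Gate outcome or a human reviewer verdict (for example 1.0 when the chosen expert's call agreed with the verdict, 0.0 otherwise). Keeping all learned state inside expert memories and gate preferences mirrors the abstract's separation between the adaptive EMRE layer and the deterministic BDI engine, HTN planner, and Tribunal.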

Abstract

This paper presents EMRE (Epistemic Mixture of Reinforced Experts), an epistemic calibration component for BDI agents that learns when to trust the LLM sensor for a specific tenant. The problem: LLMs deployed as typed sensors in BDI architectures assign confidence scores that ignore the tenant's document distribution. A new tenant and one with 500 months of history both receive confidence = 0.95, which is structurally incorrect. EMRE combines a Mixture of Experts architecture (per-type k-NN experts over OpenAI text-embedding-3-large, 3072 dimensions) with a per-tenant gating network trained via RLHF. Reward signals derive from EVR Gate outcomes (automatic) and human reviewer verdicts (RLHF), so no labeled dataset is required.

Main contributions:

1. EMRE architecture: k-NN experts with adaptive per-tenant gating updated by bandit gradient ascent.
2. Learning Boundary Theorem: a formal proof that EMRE training preserves all five HADD invariants unconditionally. The BDI engine, HTN planner, and Tribunal remain deterministic regardless of how many documents EMRE has processed.
3. Adaptive escalation threshold τ(n): automatically calibrates human oversight as tenant history accumulates, from τ = 0.95 at n = 0 to τ = 0.80 at n ≥ 100.
4. Production validation: 31 legal documents (actas de disconformidad, i.e., records of disagreement), mode transition cold_start → knn at n = 10, τ = 0.903 at n = 31, ECR = 0.0, and embeddings verified at 3072 dimensions in PostgreSQL.

To our knowledge, this is the first application of RLHF with a MoE architecture to the epistemic boundary of a formally verified deterministic BDI system, positioning MINERVA as a Kautz Type-2 neurosymbolic agent that is simultaneously adaptive and certifiable under EU AI Act Article 9. EMRE is implemented and validated in production as part of the MINERVA HADD architecture.
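The abstract reports only the endpoints of τ(n) (0.95 at n = 0, 0.80 at n ≥ 100) and one intermediate value (0.903 at n = 31). Linear interpolation between the endpoints is one schedule close to that value, since 0.95 − 0.15 · 31/100 = 0.9035. The paper's actual functional form is not given here, so the linear schedule below is an assumption, as are the function names and the rule that a document is escalated to human review when its calibrated confidence falls below τ(n).

```python
def escalation_threshold(n: int, tau_start: float = 0.95,
                         tau_floor: float = 0.80, n_full: int = 100) -> float:
    """Hypothetical linear schedule for tau(n): tau_start at n = 0,
    decreasing linearly to tau_floor at n = n_full, constant afterwards."""
    if n >= n_full:
        return tau_floor
    return tau_start - (tau_start - tau_floor) * n / n_full


def needs_human_review(confidence: float, n: int) -> bool:
    """Hypothetical escalation rule: route the document to a human reviewer
    when its calibrated confidence is below the current threshold."""
    return confidence < escalation_threshold(n)
```

Under this assumed schedule, a document scored at confidence 0.92 is escalated for a brand-new tenant (0.92 < 0.95) but not once the tenant has 31 processed documents (0.92 ≥ 0.9035), which is the sense in which human oversight relaxes as history accumulates.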
