This paper presents the architecture, implementation, and empirical analysis of AI Consilium, a production multi-model debate system that orchestrates iterative discussions among three to eight large language models through structured rounds of independent reasoning, cross-model critique, and synthesis. The system runs on a custom Node.js/Express server with SQLite persistence and integrates eight commercial LLM APIs. We document the state propagation mechanism between rounds, analyze token cost scaling, identify failure modes, and present the first cross-model ReIQ (Reincarnational Intelligence Quotient) audit results. In production data from 14 sessions, multi-round debate reduced hallucination rates and produced actionable outputs that were rated higher than single-model responses.
Maris Dreshmanis (Tue,) studied this question.
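
The round structure named in the abstract (independent reasoning, then cross-model critique, then synthesis) can be illustrated with a minimal sketch. This is not the AI Consilium implementation: the `Model` type, `runDebate`, `synthesize`, and the prompt wording are all hypothetical names chosen for illustration, and the assumption that each model sees only its peers' answers from the immediately preceding round is ours, not something the abstract states. Model calls are synchronous stubs here; real API calls would be asynchronous.

```typescript
// Hypothetical types: the paper's real interfaces are not shown in this section.
type Model = { name: string; respond: (prompt: string) => string };

interface RoundState {
  round: number;
  answers: { [modelName: string]: string }; // answer each model gave this round
}

// Run `rounds` debate rounds and return the state recorded after each one.
// Assumption: state propagates by feeding each model its peers' previous answers.
function runDebate(question: string, models: Model[], rounds: number): RoundState[] {
  const history: RoundState[] = [];
  for (let r = 1; r <= rounds; r++) {
    const previous: RoundState | undefined = history[history.length - 1];
    const answers: { [modelName: string]: string } = {};
    for (const m of models) {
      let prompt = question; // round 1: independent reasoning, no peer context
      if (previous) {
        // Later rounds: show every other model's previous answer for critique.
        const peers = Object.keys(previous.answers)
          .filter((name) => name !== m.name)
          .map((name) => `${name}: ${previous.answers[name]}`)
          .join("\n");
        prompt = `${question}\n\nPeer answers from round ${previous.round}:\n${peers}\n\nCritique and revise.`;
      }
      answers[m.name] = m.respond(prompt);
    }
    history.push({ round: r, answers });
  }
  return history;
}

// Synthesis step: a single model merges the final-round answers into one output.
function synthesize(question: string, finalRound: RoundState, synthesizer: Model): string {
  const all = Object.keys(finalRound.answers)
    .map((name) => `${name}: ${finalRound.answers[name]}`)
    .join("\n");
  return synthesizer.respond(`${question}\n\nFinal answers:\n${all}\n\nSynthesize one response.`);
}

// Usage with stub models that only report how much context they received.
const demoModels: Model[] = ["alpha", "beta", "gamma"].map((name) => ({
  name,
  respond: (prompt: string) => `${name} read ${prompt.length} chars`,
}));
const demoHistory = runDebate("Is the cache design sound?", demoModels, 2);
console.log(synthesize("Is the cache design sound?", demoHistory[1], demoModels[0]));
```

Under this assumed design, each later-round prompt carries one answer per peer, so input size per round grows with both the number of models and the length of their answers, which is one plausible source of the token cost scaling the abstract mentions.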