This deposit contains a working research draft describing a multi-agent deliberation architecture for long-horizon sequential decision making in large language model–based agents. Using Zork I as a challenging interactive fiction testbed, the work argues that single-pass inference places excessive cognitive burden on a single model call, leading to looping behavior and poor arbitration between competing objectives. The paper proposes an explicit separation between proposal generation and decision selection through specialized mission agents, a dedicated explorer agent, and a distinct arbitration step. The contribution is architectural and methodological rather than performance-driven; results are preliminary and intended to motivate further investigation into long-horizon agent control, exploration–exploitation tradeoffs, and reasoning transparency.
Michael D. Lane studied this question.
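The separation between proposal generation and decision selection described above can be illustrated with a minimal sketch. This is not the draft's implementation: the names `Proposal`, `treasure_mission`, `explorer`, `arbitrate`, and `step`, the numeric scores, and the repetition penalty are all hypothetical, and simple rule-based functions stand in for what would be separate language model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Proposal:
    agent: str    # which agent produced the proposal
    action: str   # game command, e.g. "take lamp"
    score: float  # hypothetical self-reported priority in [0, 1]


# Hypothetical signature: (observation, action history) -> Proposal.
Proposer = Callable[[str, List[str]], Proposal]


def treasure_mission(observation: str, history: List[str]) -> Proposal:
    """Stand-in for a mission agent pursuing one objective."""
    if "lamp" in observation:
        return Proposal("treasure", "take lamp", 0.8)
    return Proposal("treasure", "look", 0.2)


def explorer(observation: str, history: List[str]) -> Proposal:
    """Stand-in for the explorer agent: prefer directions not yet tried."""
    for direction in ("north", "east", "south", "west"):
        if direction not in history:
            return Proposal("explorer", direction, 0.5)
    return Proposal("explorer", "look", 0.1)


def arbitrate(proposals: List[Proposal], history: List[str]) -> Proposal:
    """Separate decision step: pick one proposal, penalizing recent repeats.

    The penalty (0.3 per repeat in the last four actions) is an assumed
    heuristic for discouraging loops, not a value from the draft.
    """
    def adjusted(p: Proposal) -> float:
        return p.score - 0.3 * history[-4:].count(p.action)

    return max(proposals, key=adjusted)


def step(observation: str, history: List[str],
         proposers: List[Proposer]) -> Proposal:
    """One turn: every agent proposes, then arbitration selects."""
    proposals = [propose(observation, history) for propose in proposers]
    choice = arbitrate(proposals, history)
    history.append(choice.action)
    return choice
```

In this sketch no proposer decides on its own. Each one only suggests an action, and the looping safeguard lives entirely in `arbitrate`, which makes the basis for each selection easy to inspect.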