The second rule of Descartes’ Method calls for dividing every difficulty into as many parts as possible for its better resolution. Frontier language models, for all the scale of their training, do not spontaneously do this decomposition work when faced with a complex analytical question. They write directly. Our hypothesis is that a mid-tier model can play the role of a preliminary analytical draft (framing the question, identifying its axes, naming its tensions, anticipating its failure modes), provided the draft is sufficiently deepened before it is passed on. We test this with a pipeline in which six instances of a Qwen 3.5 9B each produce a plan corresponding to a specific perspective, and in which each plan critiques and rewrites itself until it has nothing left to add. A Gemini Flash 3 then consumes these plans, after forming its own analysis, to write the answer. On 25 pre-registered analytical questions from Arena-Hard, judged blind by three frontier judges from distinct families, assisted answers win at 74–76 % on thematic coverage, with no loss of accuracy and 8 % less length. The gain flows entirely through refinement: a first- pass plan actively degrades the domain substance of the answer, and neutral padding degrades its coverage. A planner-size ablation (9B, 4B, 2B, 0.8B) shows two things. First, the expected result: the coverage help decreases with size. Then, a paradoxical effect on accuracy: the 2B significantly degrades it, whereas the 0.8B, weaker still, does not. The 0.8B is too visibly bad at this task to be taken seriously. Its plans degenerate, and the executor filters them out. The 2B is just convincing enough to be integrated without suspicion, and too unreliable to be integrated without damage.
Henry Sauphar (Wed,) studied this question.