Autonomous experiment loops are scaling rapidly: Karpathy's AutoResearch established the paradigm; SkyPilot parallelised it; Bilevel Autoresearch meta-optimised search mechanisms; Centaur hybridised LLM and classical search; Sibyl introduced self-evolving harness architecture; and AutoResearchClaw added verifiable reporting, failure-to-information conversion, and human-in-the-loop intervention modes. We identify a recurring pattern: each system improves search, execution, or memory, but none resolves two persistent bottlenecks — research-evaluator legitimacy (whether the metric captures the research objective) and judgment preservation (whether mechanistic context, failed trials, and structural insight survive compression across agent interfaces). We propose a four-part architectural model: search prior, execution evaluator, research evaluator, and judgment-preservation channel. Scaling improves the first two more easily than the latter two. We explain this asymmetry using the Constraint Inheritance Lemma from the representational theory of grounding (Badkur & Dak, 2026b): universally quantified constraints are robust under composition, while existentially quantified constraints are fragile. We propose a benchmark protocol comparing seven interface architectures on three outcome dimensions — metric progress, discovery quality, and judgment preservation — including an evaluator-audit condition that tests whether human review at high-leverage points improves research validity. Companion to: "What Survives Recursive Training: Three Bridges, the Evaluator Regress, and the Path to AGI" (Dak & Badkur, 2026).
Prashi Badkur
Indian Institute of Technology Bombay
Mohit Dak
Birla Institute of Technology and Science, Pilani
Columbia University
London Business School
Indian Institute of Technology Bombay
Badkur et al. studied this question.
synapsesocial.com/papers/6a1a82b80307b78509434726 — DOI: https://doi.org/10.5281/zenodo.20422120