What question did this study set out to answer?

This campaign aims to show that smaller, constrained neural architectures can outperform larger models in task accuracy and interpretability.

May 4, 2026 · Open Access

Project GlassBox: Structure Over Scale in Neural Reasoning — A 101-Phase Campaign on Architectural Transparency, Antifragile Adaptation, and the AGI Horizon

Key Points

The study is a 101-phase experimental campaign that uses ARC-AGI as its benchmark for abstract visual reasoning.
It compares a 77K-parameter graph neural network (the "GlassBox Agent") against a 1.45M-parameter Transformer baseline.
The techniques explored include Monte Carlo Tree Search, learned latent dynamics, and self-prediction.
Without adaptation, the GlassBox Agent reaches 56.8% full-match accuracy against 43.9% for the larger baseline; with test-time adaptation and Model Soup inference it reaches 88.9%.
Monte Carlo Tree Search with meta-initialization reaches the campaign's peak accuracy of 91.95% using only 8 rollouts.
A probe on the model's internal states predicts whether it will succeed with 99.2% accuracy.
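
As a quick sanity check on the headline ratios, the short Python script below recomputes them from figures quoted on this page (parameter counts, accuracies, and the 55s vs 6449s timing reported in the abstract). Nothing here is newly measured; the variable names are illustrative only.

```python
# Figures copied from the Key Points and the abstract; no new measurements.
glassbox_params = 77_000       # "77K-parameter" graph neural network
baseline_params = 1_450_000    # "1.45M-parameter" Transformer baseline

# How many times larger the baseline is than the GlassBox Agent.
param_ratio = baseline_params / glassbox_params

# Accuracy gaps in percentage points (pp).
base_gap_pp = 56.8 - 43.9      # both models without test-time adaptation
adapted_gap_pp = 88.9 - 43.9   # adapted GlassBox vs. the unadapted baseline

# Latent-dynamics speedup: real model execution vs. learned latent simulator.
real_seconds = 6449
latent_seconds = 55
speedup = real_seconds / latent_seconds

print(f"parameter ratio:   {param_ratio:.1f}x")
print(f"base gap:          {base_gap_pp:.1f} pp")
print(f"adapted gap:       {adapted_gap_pp:.1f} pp")
print(f"latent speedup:    {speedup:.1f}x")
```

The parameter ratio comes out at about 18.8, which the abstract rounds to 19×, and the timing ratio at about 117.3, consistent with the reported 117× speedup. Note that the 45.0 pp gap compares an adapted model with an unadapted baseline, so the 12.9 pp gap is the like-for-like architectural comparison.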

Abstract

Project GlassBox is a systematic 101-phase experimental campaign demonstrating that small, structurally constrained neural architectures can simultaneously achieve superior task performance and unprecedented interpretability compared to large unconstrained models. Using ARC-AGI as a benchmark for abstract visual reasoning, a 77K-parameter Graph Neural Network with Pointer attention (the "GlassBox Agent") outperforms a 1.45M-parameter Transformer baseline (56.8% vs 43.9% full match accuracy). Through test-time gradient adaptation with geometric data augmentation, accuracy reaches 87.4%, and the Ultimate Configuration (L2 ablation at 20%, adaptation LR of 0.1, and Model Soup inference with K=5) achieves 88.9% accuracy with 2.0% standard deviation across 3 seeds.

In v4, I report a 20-phase extension (82–101) pushing GlassBox to its frontier:
(1) Monte Carlo Tree Search with meta-initialization achieves 91.95%, the campaign's peak accuracy, with just 8 rollouts (P87).
(2) MuZero-style Latent Dynamics replaces real model execution with a learned latent simulator, achieving 117× speedup while maintaining 88.5% accuracy (P94).
(3) Latent Self-Prediction reveals that the AI can predict its own success from internal states with 99.2% accuracy (P97).
(4) The Latent Verifier operationalizes self-prediction as an inference-time candidate selector, achieving 89.7% and outperforming hand-crafted demo loss heuristics (P100).
(5) Unified V-MCTS (P101) integrates continuous dynamics with the latent verifier, achieving equivalent accuracy at half the computation cost.

What's new in v4

Chapter XIV, Test-Time Compute Frontier (P82–90): Dynamic pondering, 10-step TTT sufficiency via Reptile, Meta-MCTS peak of 91.95%, and PRM-guided scaling laws.
Chapter XV, The AlphaZero Paradigm (P91–97): Expert iteration, macro-action discovery, MuZero latent dynamics (117× speedup), and 99.2% self-prediction probes.
Chapter XVI, The Latent Liberation (P98–101): Continuous action embeddings, Latent Verifier (89.7%), and unified Verifier-Guided MCTS.
Updated figures: 101-phase journey, v4 waterfall recipe, v4 breakthrough map, plus 10 new experiment figures.

Key Results

Structure > Scale: 77K structured parameters outperform 1.45M unstructured parameters (19× smaller, higher accuracy).
MCTS Peak: 91.95% via Meta-MCTS with 8 rollouts, the campaign's highest accuracy.
MuZero Speedup: 117× faster inference via learned latent dynamics (55s vs 6449s).
AI Self-Knowledge: 99.2% success prediction from internal states; the AI "knows when it knows".
Latent Verifier: Self-prediction outperforms hand-crafted heuristics (+3.5pp).
82.8% Attribution: Full causal path tracing for 82.8% of predictions (3.3× LLM state-of-the-art).
Hydra Self-Repair: 95.8% recovery after 50% neuron destruction.
Variance Regularization: 4.3× variance reduction via gradient-based ablation.

Source code

https://github.com/hafufu-stack/glassbox

Acknowledgments

This research was conducted entirely independently, without institutional affiliation or corporate funding. The author currently faces financial constraints that make it increasingly difficult to maintain subscriptions to AI services essential for this line of research. To sustain and improve the quality of future work, the author is actively seeking community sponsorship. Details are available at https://github.com/sponsors/hafufu-stack.
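
Returning to the "geometric data augmentation" used during test-time adaptation: the abstract does not say which transforms are applied. A common choice for ARC-style grids, assumed here purely for illustration, is the eight rotations and reflections of a grid, with the same transform applied to the input and output of each demonstration pair. The function names below (`rotate90`, `flip_horizontal`, `dihedral_views`, `augment_pairs`) are hypothetical and are not taken from the GlassBox source code.

```python
from typing import List, Tuple

Grid = List[List[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in grid]


def dihedral_views(grid: Grid) -> List[Grid]:
    """Return the 8 rotation/reflection views of a grid, identity first."""
    views: List[Grid] = []
    current = [list(row) for row in grid]
    for _ in range(4):
        views.append(current)
        views.append(flip_horizontal(current))
        current = rotate90(current)
    return views


def augment_pairs(pairs: List[Tuple[Grid, Grid]]) -> List[Tuple[Grid, Grid]]:
    """Apply each of the 8 views jointly to the input and output of every pair."""
    augmented: List[Tuple[Grid, Grid]] = []
    for inp, out in pairs:
        augmented.extend(zip(dihedral_views(inp), dihedral_views(out)))
    return augmented


if __name__ == "__main__":
    # Toy demonstration pair (assumed data, not from ARC-AGI).
    demo = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
    for inp, out in augment_pairs(demo):
        print(inp, "->", out)
```

Applying the transform jointly to input and output preserves the input-to-output relationship only for tasks whose rule is itself symmetric under rotation and reflection, so whether all eight views are appropriate for every task is a design choice the abstract leaves open.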
