What question did this study set out to answer?

To demonstrate that small neural architectures achieve better task performance and interpretability than larger models.

May 7, 2026 · Open Access

Project GlassBox: Structure Over Scale in Neural Reasoning — A 123-Phase Campaign on Architectural Transparency, Cross-Domain Transfer, and the AGI Horizon

Key Points

Conducted a 123-phase experimental campaign
Compared a 77K-parameter Graph Neural Network against a 1.45M-parameter Transformer baseline on ARC-AGI, a benchmark for abstract visual reasoning
Applied test-time gradient adaptation with geometric data augmentation, raising accuracy to 87.4%
The Graph Neural Network reached 88.9% accuracy in its best configuration, against 43.9% for the Transformer baseline
The model predicted its own success from internal states with 99.2% accuracy
Symbolic regression recovered Kepler's a^1.5 exponent from the frozen backbone
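
The key points mention geometric data augmentation without listing the transforms used. A minimal sketch, assuming the augmentation consists of the eight rotations and reflections of a grid (the dihedral group D4), which is a common choice for grid puzzles such as ARC. The function names are hypothetical and are not taken from the GlassBox source code.

```python
from typing import List, Tuple

Grid = List[List[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def dihedral_variants(grid: Grid) -> List[Grid]:
    """Return the 8 rotations and reflections of a grid.

    Assumption: these are the "geometric" transforms; the source
    does not say which transforms were actually used.
    """
    variants = []
    current = [list(row) for row in grid]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


def augment_pair(inp: Grid, out: Grid) -> List[Tuple[Grid, Grid]]:
    """Apply the same transform to both halves of a demonstration pair."""
    return list(zip(dihedral_variants(inp), dihedral_variants(out)))
```

Applying the same transform to the input and the output keeps each augmented pair a valid example of the same underlying rule, so a handful of demonstrations can yield eight times as many adaptation examples.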

Abstract

Project GlassBox is a systematic 123-phase experimental campaign demonstrating that small, structurally constrained neural architectures can simultaneously achieve superior task performance and unprecedented interpretability compared to large unconstrained models. Using ARC-AGI as a benchmark for abstract visual reasoning, a 77K-parameter Graph Neural Network with Pointer attention (the "GlassBox Agent") outperforms a 1.45M-parameter Transformer baseline (56.8% vs 43.9%). Through test-time gradient adaptation with geometric data augmentation, accuracy reaches 87.4%, and the Ultimate Configuration (L2 ablation at 20%, adaptation LR of 0.1, and Model Soup inference with K=5) achieves 88.9% accuracy with 2.0% standard deviation across 3 seeds.

In v4:
(1) Monte Carlo Tree Search with meta-initialization achieves 91.95%, the campaign's peak accuracy, with just 8 rollouts (P87).
(2) MuZero-style Latent Dynamics replaces real model execution with a learned latent simulator, achieving 117× speedup while maintaining 88.5% accuracy (P94).
(3) Latent Self-Prediction reveals that the AI can predict its own success from internal states with 99.2% accuracy (P97).
(4) The Latent Verifier operationalizes self-prediction as an inference-time candidate selector, achieving 89.7% and outperforming hand-crafted demo loss heuristics (P100).
(5) Unified V-MCTS (P101) integrates continuous dynamics with the latent verifier, achieving equivalent accuracy at half the computation cost.

What's new in v5:
Chapter XVII, The Kaggle Gauntlet (P102–108): Real-world deployment achieves 85.1% within competition time budgets. Latent Decompilation reverse-compiles 87.2% of neural reasoning trajectories into human-readable symbolic programs.
Chapter XVIII, The Universal Foundation (P109–111): A frozen ARC backbone solves Sokoban puzzles at 94.0% with only 1K trainable parameters, proving universal physical law internalization.
Chapter XIX, The Singularity Engine (P112–117): 150× parameter expansion yields exactly 0% accuracy gain (P112), definitively proving structure over scale. Three LLM-proposed architecture mutations all lose to the original 36K design (P114). Symbolic regression discovers Kepler's a^1.5 exponent from the frozen backbone (P115). Theory of Mind achieves 100% prediction on deterministic strategies (P116).
Chapter XX, The Epilogue (P118–123): Sensory-motor adapter enables embodied robotics (0%→10%), Latent Atlas visualizes the concept universe, DeepDream reveals what the AI "dreams," and Latent Sonification generates a 51-second symphony from thought trajectories.
Updated figures: 123-phase journey, v5 waterfall recipe, v5 breakthrough map, and definitive structure-vs-scale evidence.

Key Results

Structure > Scale: 77K structured parameters outperform 1.45M unstructured (19× smaller, higher accuracy); 150× scaling yields 0% gain
MCTS Peak: 91.95% via Meta-MCTS with 8 rollouts
Universal Transfer: Frozen ARC backbone solves Sokoban at 94.0%
MuZero Speedup: 117× faster inference via learned latent dynamics
AI Self-Knowledge: 99.2% success prediction from internal states
87.2% Decompilation: Neural reasoning reverse-compiled to symbolic programs
82.8% Attribution: Full causal path tracing (3.3× LLM state-of-the-art)
Global Optimum: All LLM-proposed architecture mutations lose to original design

Source code

https://github.com/hafufu-stack/glassbox

Acknowledgments

This research was conducted entirely independently, without institutional affiliation or corporate funding. The author currently faces financial constraints that make it increasingly difficult to maintain subscriptions to AI services essential for this line of research. To sustain and improve the quality of future work, the author is actively seeking community sponsorship. Details are available at https://github.com/sponsors/hafufu-stack.
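
The abstract names Model Soup inference (K=5) as part of the Ultimate Configuration but does not describe how the soup is formed. A minimal sketch, assuming the usual meaning of the term: K separately adapted copies of the model are merged by uniformly averaging their weights, and the averaged model is used for inference. The `Weights` representation and the `uniform_soup` name are illustrative assumptions, not the project's API.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def uniform_soup(checkpoints: List[Weights]) -> Weights:
    """Average each named parameter element-wise across checkpoints.

    Assumption: all checkpoints share the same parameter names and shapes,
    as they would if they were adapted copies of one model.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    k = len(checkpoints)
    soup: Weights = {}
    for name in checkpoints[0]:
        # Group the i-th element of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(col) / k for col in columns]
    return soup


# Usage: merge K=5 toy checkpoints that differ only in their values.
adapted = [{"w": [float(i), 2.0 * i]} for i in range(1, 6)]
merged = uniform_soup(adapted)
```

Averaging in weight space yields a single model, so inference cost stays that of one forward pass rather than K, which is the usual reason for preferring a soup over an output-level ensemble.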
