What question did this study set out to answer?

The study investigates whether stochastic resonance can improve multi-task generalization in large language models (LLMs) without relying on an oracle.

April 11, 2026 | Open Access

SNN-Synthesis v5: Oracle-Free LLM Self-Evolution—From 16% to 100% via Stochastic Resonance ExIt

Key Points

Extended the stochastic resonance investigation from architecture-invariant validation to multi-task generalization.
Implemented Noisy Beam Search (NBS) and QLoRA fine-tuning on Mistral-7B.
Ran three new experiment phases (31, 31b, and 32b) covering noise optimization and iterative self-improvement.
Achieved 89.5% accuracy on GSM8K math reasoning at K=11 with σ=0.01, up from a 53% baseline.
The LLM-ExIt method reached a 100% solve rate on Modified Hanoi in three iterations (16% → 94% → 98% → 100%) without an oracle.
The optimal noise level σ* scaled inversely with task complexity: σ*=0.01 for GSM8K versus σ*=0.15 for Modified Hanoi.
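
The summary does not spell out how Noisy Beam Search is implemented, so the following is only a minimal sketch of the underlying idea: run K noise-perturbed rollouts and count the episode as solved if any of them passes the task's own success check. The toy policy, the choice to add noise to the logits, and all names (`LOGITS`, `noisy_rollout`, `solve_rate`) are assumptions made for illustration and are not taken from the paper or its repository.

```python
import random

# Hypothetical toy policy: one row of two action logits per step.
# Action 0 is always the correct one. Steps 0-3 favor it by a margin of 1.0;
# step 4 is a "bottleneck" where the greedy choice is wrong by a margin of 0.2.
LOGITS = [[1.0, 0.0] for _ in range(4)] + [[0.0, 0.2]]


def noisy_rollout(sigma, rng):
    """Pick the argmax action at each step after adding N(0, sigma^2) noise to every logit."""
    actions = []
    for row in LOGITS:
        perturbed = [logit + rng.gauss(0.0, sigma) for logit in row]
        actions.append(perturbed.index(max(perturbed)))
    return actions


def solved(actions):
    """Task-side success check (assumed): every step must pick action 0."""
    return all(a == 0 for a in actions)


def solve_rate(sigma, k, episodes=200, seed=0):
    """Fraction of episodes in which at least one of k noisy rollouts succeeds."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(episodes):
        if any(solved(noisy_rollout(sigma, rng)) for _ in range(k)):
            wins += 1
    return wins / episodes


if __name__ == "__main__":
    # Compare no noise, moderate noise, and heavy noise at K=1 and K=11.
    for sigma in (0.0, 0.3, 3.0):
        print(sigma, solve_rate(sigma, k=1), solve_rate(sigma, k=11))
```

In this toy, σ=0 can never solve the task because the greedy choice at the bottleneck step is wrong, while a moderate σ occasionally flips that step and a larger K gives more chances for such a flip. The specific σ values here belong to the toy only and say nothing about the values reported for GSM8K or Modified Hanoi.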

Abstract

SNN-Synthesis v5 extends the investigation of stochastic resonance in neural networks from architecture-invariant validation (v4) to multi-task generalization and Oracle-free LLM self-evolution.

v5 Headline Results

GSM8K Multi-Task NBS (Phase 31/31b): Noisy Beam Search on Mistral-7B with GSM8K math reasoning achieves 89.5% accuracy at K=11 (from a 53% baseline, +36.5pp). The optimal noise is σ*=0.01, an order of magnitude smaller than for Hanoi (σ*=0.15), revealing that σ* scales inversely with task complexity.

LLM-ExIt (Phase 32b): Combining NBS miracle collection with QLoRA fine-tuning, Mistral-7B achieves a 100% solve rate in 3 iterations (16% → 94% → 98% → 100%) on Modified Hanoi, without any Oracle, reward shaping, or human demonstrations. This completes the pipeline from CNN ExIt (v3) to full Oracle-free LLM self-evolution.

Three New Experiments (Phases 31–32b)

(I) GSM8K NBS with σ=0.15 (Phase 31): The Hanoi-optimal σ destroys math reasoning (53% → 7% at K=1). K scaling partially recovers (34% at K=11), but remains below baseline, proving σ* is task-dependent.
(II) GSM8K σ Optimization (Phase 31b): Testing σ ∈ {0.01, 0.03, 0.05} reveals a broad optimal band at small σ. σ=0.01 with K=11 achieves 89.5% (+36.5pp). All small σ values yield similar K=11 performance (87.5–89.5%).
(III) LLM-ExIt (Phase 32b): The culminating experiment. NBS collects miracle trajectories (K=11, σ=0.15), and QLoRA distills them into the LLM. Three iterations: 16% → 94% → 98% → 100%. LLM-ExIt converges faster than CNN ExIt (3 vs. 5 iterations) due to the LLM's richer representational capacity.

v4 Foundations (Phases 27–30)

(IV) LLM Noisy Beam Search (Phase 29): K=11 achieves a 100% solve rate on Mistral-7B Modified Hanoi (from a 16% baseline). Architecture-invariant across 63K CNN → 7B LLM.
(V–VII) Frame Stacking learnability threshold, Extended Two-Condition Map (7 ARC-AGI-3 games), Dynamic K scheduling (fixed K optimal).
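
The collect-then-distill loop described for LLM-ExIt (experiment III) can be illustrated with a toy sketch. This is not the paper's implementation: the two-action policy, the `distill` update (a crude stand-in for QLoRA fine-tuning), the success check, and every parameter value below are assumptions chosen only to show the shape of the loop.

```python
import random


def initial_logits():
    """Hypothetical toy policy: action 0 is always correct; the last step is a
    bottleneck where the greedy choice (action 1) is wrong."""
    return [[1.0, 0.0] for _ in range(4)] + [[0.0, 0.2]]


def rollout(logits, sigma, rng):
    """Argmax actions after adding N(0, sigma^2) noise to each logit (sigma=0 is greedy)."""
    return [max((0, 1), key=lambda a: row[a] + rng.gauss(0.0, sigma)) for row in logits]


def solved(actions):
    """Task-side success check (assumed): every step must pick action 0."""
    return all(a == 0 for a in actions)


def collect_miracles(logits, sigma, k, episodes, rng):
    """Keep the first successful rollout, if any, out of k noisy attempts per episode."""
    miracles = []
    for _ in range(episodes):
        for _ in range(k):
            actions = rollout(logits, sigma, rng)
            if solved(actions):
                miracles.append(actions)
                break
    return miracles


def distill(logits, miracles, lr=0.15):
    """Stand-in for fine-tuning: nudge logits toward actions seen in successful rollouts."""
    for actions in miracles:
        for step, action in enumerate(actions):
            logits[step][action] += lr / len(miracles)


def run_exit(iterations=3, sigma=0.3, k=11, episodes=50, seed=0):
    """Return whether the noise-free greedy policy solves the task, before and after each iteration."""
    rng = random.Random(seed)
    logits = initial_logits()
    history = [solved(rollout(logits, 0.0, rng))]
    for _ in range(iterations):
        distill(logits, collect_miracles(logits, sigma, k, episodes, rng))
        history.append(solved(rollout(logits, 0.0, rng)))
    return history


if __name__ == "__main__":
    print(run_exit())
```

The only supervision signal in this sketch is the task's own success check on the policy's noisy rollouts, with no expert labels or shaped reward. The greedy policy fails at first and succeeds once the accumulated nudges flip the bottleneck step.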
v3 Foundations (Phases 1–26)

(I) Noisy Beam Search: 78% clear rate on ARC-AGI-3.
(II) SNN-ExIt: 99% clear rate from zero knowledge.
(III) Two-Condition Theory: Activation energy + state-action learnability.

Key Findings (v5)

NBS generalizes to math reasoning: GSM8K 53% → 89.5% at K=11 with σ=0.01
σ* is task-dependent: Constrained puzzles (σ*=0.15–0.20) vs. open-ended math (σ*=0.01)
LLM-ExIt achieves 100%: Oracle-free self-evolution in 3 iterations
ExIt transfers from CNN to LLM: Only the policy representation changes (MLP → QLoRA)
Scale amplification: +76pp gain on 7B LLM vs. +66pp on 63K CNN
Static noise remains optimal: All dynamic strategies fail (Phases 8ext, 30)
ExIt is self-healing: Fewer seed miracles can paradoxically improve performance

33 experiments spanning 63K–7B parameters, CNNs to Transformers, 8 task domains, and 7 ARC-AGI-3 interactive environments.

Code and data: https://github.com/hafufu-stack/SNN-Synthesis

Acknowledgments

This research was conducted entirely independently, without institutional affiliation or corporate funding. The author currently faces financial constraints that make it increasingly difficult to maintain subscriptions to AI services essential for this line of research. To sustain and improve the quality of future work, the author is actively seeking community sponsorship. Details are available at https://github.com/sponsors/hafufu-stack.
