What question did this study set out to answer?

This study investigates the phenomenon of grokking in quantum circuits and its relationship with entanglement dynamics.

May 5, 2026 | Open Access

Unitarity Enables Grokking: Entanglement Phase Transitions in Quantum Circuits on Modular Arithmetic

Key Points

Examined 9-layer parameterized quantum circuits (463 parameters) and a classical transformer (14,304 parameters) under varying weight decay conditions.
Tracked bipartite entanglement entropy (EE) to assess correlation with grokking success.
Conducted multiple trials (20 seeds) to compare grok rates and test accuracies between architectures.
Without weight decay, quantum circuits achieved a 25% grok rate, compared with 0% for the classical model.
Removing weight decay improved quantum test accuracy (68.7% vs. 42.3%; Mann-Whitney p=0.002) while eliminating classical grokking entirely.
Observed a "grok-then-ungrok" phenomenon when entanglement entropy overshot 4.2: test accuracy rose to 93.7%, then collapsed to 11.3%.
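
As an illustration of the quantity tracked in the key points, the following is a minimal sketch of how the bipartite entanglement entropy of a pure state can be computed from its amplitude vector. It is not the authors' code: the function name `bipartite_entropy`, the choice of cut, and the base-2 logarithm (entropy in bits) are assumptions made for this example, and the summary does not state which units or partition the study used.

```python
import numpy as np

def bipartite_entropy(state, n_qubits, n_left):
    """Von Neumann entropy (in bits, an assumption) of the first n_left qubits.

    `state` is the amplitude vector of a pure n_qubits state.
    """
    psi = np.asarray(state, dtype=complex)
    psi = psi / np.linalg.norm(psi)  # guard against unnormalized input
    # Reshape the amplitudes into a (left subsystem) x (right subsystem) matrix.
    mat = psi.reshape(2 ** n_left, 2 ** (n_qubits - n_left))
    # The squared singular values are the Schmidt coefficients of the cut.
    singular_values = np.linalg.svd(mat, compute_uv=False)
    probs = singular_values ** 2
    probs = probs[probs > 1e-12]  # drop numerical zeros before taking the log
    return float(-np.sum(probs * np.log2(probs)))

# Two-qubit sanity checks: a product state and a maximally entangled Bell state.
product = np.array([1, 0, 0, 0])
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
ee_product = bipartite_entropy(product, n_qubits=2, n_left=1)
ee_bell = bipartite_entropy(bell, n_qubits=2, n_left=1)
```

The product state has no entanglement across the cut, while the Bell state carries one bit, which is the maximum for a single-qubit subsystem.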

Abstract

Grokking, delayed generalization long after memorization, has been observed almost exclusively in classical networks trained with weight decay. We show that 9-layer parameterized quantum circuits (PQCs, 463 parameters) grok modular addition ((a + b) mod 23) without weight decay, achieving a 25% grok rate across 20 seeds. A classical transformer (14,304 parameters) requires a weight decay of 1.0 for 100% grokking and fails without it. Weight decay has opposite effects on the two architectures: removing it improves quantum test accuracy (68.7% vs. 42.3%; Mann-Whitney p=0.002) while eliminating classical grokking entirely. Tracking the bipartite entanglement entropy (EE) of the circuit state, we find that successful grokking occurs in a sweet spot (EE between 3.1 and 4.1), while EE overshoot above 4.2 triggers a "grok-then-ungrok" phenomenon: test accuracy rises to 93.7%, then collapses to 11.3% as entanglement saturates. Ablations confirm that entanglement is necessary (product-state circuits reach 0.2%) and that PQCs are more parameter-efficient than size-matched classical models. These results connect quantum unitarity to implicit regularization and entanglement dynamics to generalization transitions.
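
To make the task in the abstract concrete, here is a minimal sketch of a modular addition dataset for modulus 23. The helper names, the 50% train fraction, and the seed are assumptions for illustration only; the summary does not report how the authors split or encoded the data.

```python
import random

P = 23  # modulus stated in the abstract

def modular_addition_pairs(p=P):
    """Enumerate every input pair (a, b) with its label (a + b) mod p."""
    return [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

def split(data, train_fraction=0.5, seed=0):
    """Shuffle and split; train_fraction and seed are hypothetical choices."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = modular_addition_pairs()
train, test = split(data)
```

Grokking is measured on the held-out pairs: a model that has only memorized `train` scores near chance on `test`, and test accuracy rises later if it learns the underlying rule.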

Cite This Study

Liang Wang (Sun,) studied this question.

synapsesocial.com/papers/69f988e215588823dae17d7d https://doi.org/10.5281/zenodo.19995436