We introduce Thermodynamic Natural Gradient Descent (NGD-T), an optimizer that enforces a physical speed-cost constraint by combining Fisher-preconditioned updates with a dissipation-aware step-size regulator. While natural gradient methods are known to follow the steepest descent direction in information geometry, we provide a thermodynamic reinterpretation: Natural Gradient Flow uniquely minimizes instantaneous irreversible dissipation for a fixed loss decrease. NGD-T implements this principle in discrete updates by (i) preconditioning gradients with an approximate inverse Fisher, (ii) computing the geometric norm $\|\nabla L\|_{F^{-1}}^2 = \nabla L^\top F^{-1} \nabla L$, and (iii) mapping a user-specified dissipation budget $Q_{\text{budget}}$ to a step size $\eta_t$ that saturates the speed-cost bound. We present numerically stable constructions for rank-deficient Fisher estimates, a hybrid nullspace fallback, and scalable K-FAC integration with eigendecomposition caching. On CIFAR-10, ImageNet, and transformer architectures, NGD-T matches or exceeds Adam in convergence while substantially reducing predicted irreversible dissipation and maintaining comparable wall-clock time. NGD-T provides a principled, tunable trade-off between learning speed and thermodynamic cost with theoretical convergence guarantees.
Jie You (Mon,) studied this question.
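The three steps listed in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `ngd_t_step`, the `damping` and `eps` parameters, and in particular the budget-to-step-size rule are assumptions made for illustration. The abstract does not state the exact form of the speed-cost bound, so the sketch assumes the per-step dissipation is modelled as $\eta_t^2 \, \nabla L^\top F^{-1} \nabla L$, which gives $\eta_t = \sqrt{Q_{\text{budget}} / \nabla L^\top F^{-1} \nabla L}$. A dense, damped Fisher matrix stands in for the K-FAC approximation and the rank-deficient handling described above.

```python
import numpy as np


def ngd_t_step(theta, grad, fisher, q_budget, damping=1e-8, eps=1e-12):
    """One hypothetical NGD-T-style update (illustrative sketch only)."""
    # (i) Precondition the gradient: solve (F + damping * I) d = grad
    #     instead of forming the inverse Fisher explicitly.
    damped = fisher + damping * np.eye(fisher.shape[0])
    nat_grad = np.linalg.solve(damped, grad)

    # (ii) Geometric norm: grad^T F^{-1} grad.
    sq_norm = float(grad @ nat_grad)

    # (iii) ASSUMPTION (not from the source): dissipation per step is
    #       modelled as eta^2 * sq_norm, so the budget is saturated by
    #       eta = sqrt(q_budget / sq_norm). eps guards against division
    #       by zero when the gradient vanishes.
    eta = np.sqrt(q_budget / max(sq_norm, eps))

    return theta - eta * nat_grad, eta, sq_norm


if __name__ == "__main__":
    theta = np.zeros(2)
    grad = np.array([2.0, 1.0])
    fisher = np.diag([2.0, 0.5])
    new_theta, eta, sq_norm = ngd_t_step(theta, grad, fisher, q_budget=0.04)
    print(new_theta, eta, sq_norm)
```

Under this assumed rule, a larger budget yields a proportionally longer step along the natural gradient direction, and the step shrinks automatically where the geometric norm is large.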