What question did this study set out to answer?

This work presents new reinforcement learning architectures that use quantum wave dynamics to improve policy generation and stability.

May 19, 2026 · Open Access

Gradient Descent Through Quantum Wave Dynamics: WDDS-v3 and WDDS-v3.1 for Physics-Native Reinforcement Learning

Key points

Developed WDDS-v3 and WDDS-v3.1 architectures.
Implemented learnable potential fields and FFT-based wave evolution for gradient propagation.
Incorporated stabilization mechanisms like action smoothing and reward normalization in WDDS-v3.1.
On HalfCheetah-v4, WDDS-v3.1 achieved a mean return of -11.46 over 3 seeds and 100 episodes, compared to -48.76 for WDDS-v3-Phase3 and -60.40 for a random baseline.
WDDS-v3 reached a best episode score of 259.7 on SignalNav-v0, indicating strong best-case policy discovery.
Across discrete control, signal navigation, and continuous-control benchmarks, WDDS showed learning gains, and WDDS-v3.1 improved continuous-control stability.
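
To make the "learnable potential field and FFT-based wave evolution" point concrete, here is a minimal sketch of one common way to evolve a wave field with FFTs: the split-step Fourier method for a 1D Schrödinger-type equation. This is not the WDDS implementation, which the listing does not describe in detail. The 1D grid, the units (hbar = m = 1), the harmonic `potential` standing in for learnable parameters, and the use of |psi|^2 as features are all assumptions for illustration. The sketch uses NumPy for portability; every operation in it (FFT, elementwise complex exponential, multiplication) is differentiable, so the same loop written in an autodiff framework would let gradients flow back to the potential.

```python
import numpy as np

def split_step_evolve(psi, potential, dx, dt, n_steps):
    """Evolve psi under i dpsi/dt = -(1/2) d2psi/dx2 + V(x) psi (hbar = m = 1).

    Uses the symmetric (Strang) split-step Fourier scheme: half a potential
    step in position space, a full kinetic step in Fourier space, then the
    second half potential step.
    """
    k = 2.0 * np.pi * np.fft.fftfreq(psi.size, d=dx)  # angular wavenumbers
    half_potential = np.exp(-0.5j * potential * dt)   # phase from V over dt/2
    kinetic = np.exp(-0.5j * k**2 * dt)               # phase from k^2/2 over dt
    for _ in range(n_steps):
        psi = half_potential * psi
        psi = np.fft.ifft(kinetic * np.fft.fft(psi))
        psi = half_potential * psi
    return psi

# Hypothetical setup: a periodic grid and a harmonic potential that stands in
# for the learnable potential field (in training, these values would be parameters).
n = 256
x = np.linspace(-10.0, 10.0, n, endpoint=False)
dx = x[1] - x[0]
potential = 0.5 * x**2

# Initial wave packet, normalized so that sum(|psi|^2) * dx = 1.
psi0 = np.exp(-(x - 1.0) ** 2).astype(complex)
psi0 /= np.sqrt(np.sum(np.abs(psi0) ** 2) * dx)

psi_t = split_step_evolve(psi0, potential, dx, dt=0.01, n_steps=200)

# Assumed readout: the probability density as a policy-relevant feature vector.
features = np.abs(psi_t) ** 2
```

Each step multiplies by unit-modulus phases, so the evolution preserves the norm of the wave field up to floating-point error, which is one reason this scheme is a convenient substrate for repeated differentiable updates.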

Abstract

This work presents WDDS-v3 and WDDS-v3.1, physics-native reinforcement-learning architectures in which policy-relevant representations are generated through differentiable Schrödinger-type wave dynamics. The framework uses a learnable potential field and FFT-based wave evolution, allowing gradients to propagate through wave-field computation while preserving the wave-dynamical substrate as the core processing mechanism. WDDS-v3.1 adds stabilization mechanisms for continuous control, including action smoothing, reward normalization, checkpoint restoration, behavioral-cloning replay from high-return trajectories, and exploration-noise decay. Across discrete control, signal navigation, and continuous-control benchmarks, WDDS demonstrates strong learning gains, high best-case discovery, and improved continuous-control stability. On HalfCheetah-v4, WDDS-v3.1 achieves a mean return of -11.46 over 3 seeds and 100 episodes, compared with -48.76 for WDDS-v3-Phase3 and -60.40 for the random baseline. On SignalNav-v0, WDDS-v3 reaches a best episode score of 259.7, indicating strong best-case wave-field policy discovery. These results support WDDS as a promising wave-based adaptive computation framework for physics-inspired reinforcement learning.
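
Three of the WDDS-v3.1 stabilization mechanisms named above (action smoothing, reward normalization, and exploration-noise decay) can be illustrated with generic helpers. The sketch below shows standard forms of these techniques, not the authors' code: the exponential-moving-average smoothing, the running-statistics normalizer, the geometric decay schedule, and every name and default value are assumptions. Checkpoint restoration and behavioral-cloning replay are omitted because they depend on training details the abstract does not give.

```python
import numpy as np

class RunningRewardNormalizer:
    """Normalize rewards with a running mean and variance (Welford's algorithm)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero when the variance is tiny

    def update(self, reward):
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward):
        # Fall back to unit variance until at least two rewards have been seen.
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (reward - self.mean) / np.sqrt(var + self.eps)

def smooth_action(previous, proposed, alpha=0.8):
    """Blend the new action with the previous one (exponential moving average).

    alpha is the weight kept on the previous action; a higher alpha gives
    smoother, slower-changing control signals.
    """
    return alpha * previous + (1.0 - alpha) * proposed

def decayed_noise_scale(initial, decay, episode, floor=0.01):
    """Geometric decay of the exploration-noise scale, clipped at a floor."""
    return max(floor, initial * decay**episode)
```

In a training loop, one would typically call `update` on each raw reward before `normalize`, apply `smooth_action` to the policy output at every step, and recompute the noise scale once per episode.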
