What type of study is this?

This is a quantitative study.

October 8, 2025 (Open Access)

Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear-Quadratic Reinforcement Learning Problems

Key points

Adaptive exploration improves learning efficiency while achieving a sublinear regret bound in reinforcement learning.
This approach contrasts with fixed exploration schedules, reducing the need for extensive tuning during training.
Numerical experiments show that adaptive strategies enhance convergence and performance over non-adaptive methods.
The method dynamically adjusts entropy regularization and policy variance, offering flexibility in learning control problems.

Abstract

We study reinforcement learning (RL) for the same class of continuous-time stochastic linear-quadratic (LQ) control problems as in Huang et al. (2024), where volatilities depend on both states and controls, states are scalar-valued, and running control rewards are absent. We propose a model-free, data-driven exploration mechanism in which the critic adaptively adjusts the entropy regularization and the actor adaptively adjusts the policy variance. Unlike the constant or deterministic exploration schedules employed in Huang et al. (2024), which require extensive tuning in implementation and ignore learning progress across iterations, our adaptive exploratory approach boosts learning efficiency with minimal tuning. Despite its flexibility, our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems, which were previously derived only with fixed exploration schedules. Numerical experiments demonstrate that adaptive exploration accelerates convergence and improves regret performance compared to the non-adaptive model-free and model-based counterparts.
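
To make the problem class in the abstract concrete, here is a minimal Python sketch of a scalar state whose drift and volatility both depend on the state and the control, driven by a Gaussian exploratory policy. It is only an illustration under stated assumptions, not the paper's algorithm: the function name `simulate_lq_path`, the coefficient names `a`, `b`, `c`, `d`, the linear-Gaussian policy form, and the Euler-Maruyama discretization are all hypothetical choices made for this example. The adaptive update of the entropy regularization and policy variance is not shown.

```python
import math
import random


def simulate_lq_path(a, b, c, d, policy_gain, policy_std,
                     x0=1.0, horizon=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama simulation of the scalar SDE

        dX = (a*X + b*u) dt + (c*X + d*u) dW,

    with actions u ~ N(policy_gain * X, policy_std**2).
    All names are illustrative assumptions, not the paper's notation.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Exploratory action: the mean is linear in the state,
        # and the standard deviation sets the amount of exploration.
        u = rng.gauss(policy_gain * x, policy_std)
        # Brownian increment over one time step.
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Both drift and volatility depend on the state and the control.
        x = x + (a * x + b * u) * dt + (c * x + d * u) * dw
        path.append(x)
    return path


if __name__ == "__main__":
    path = simulate_lq_path(a=0.1, b=0.5, c=0.2, d=0.3,
                            policy_gain=-0.5, policy_std=0.4)
    print(f"terminal state: {path[-1]:.4f}")
```

In this sketch, `policy_std` is held fixed for the whole path, which corresponds to the non-adaptive setting the abstract contrasts against. An adaptive scheme would instead update it from data between iterations.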

Cite This Study

Huang et al. studied this question.

synapsesocial.com/papers/68e6494525bc5bdb98713958 https://doi.org/10.48550/arxiv.2507.00358
