What question did this study set out to answer?

The central aim is to develop a reinforcement learning algorithm for optimal execution in trading environments.

March 13, 2026

Open Access

Reinforcement learning for continuous-time optimal execution: actor–critic algorithm and error analysis

Key Points

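Developed an actor–critic reinforcement learning algorithm for the optimal execution problem.
Used a mean–quadratic variation objective regularised by Shannon entropy under the Almgren–Chriss model, allowing stochastic policies.
Introduced a recalibration step alongside the standard policy evaluation and policy gradient updates; this step turns out to be critical for convergence.
Performed a finite-time error analysis of the algorithm.
Tested the algorithm in three types of market simulators: one built on the Almgren–Chriss model, one on historical order-flow data, and one on a stochastic limit order book model.
Obtained the optimal value function and the optimal feedback policy, which is Gaussian, in closed form.
Showed linear convergence under suitable conditions on the learning rates.
Empirical tests showed advantages over the classical statistical approach and a deep-learning-based RL algorithm.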
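For orientation only, the block below recalls the well-known solution of the classical continuous-time Almgren–Chriss problem without entropy regularisation. The notation is assumed here rather than taken from the paper: inventory $x_t$ with $dx_t = -v_t\,dt$, selling rate $v_t$, linear temporary impact coefficient $\eta$, volatility $\sigma$ and risk-aversion weight $\lambda$. The objective is written up to the constant contributed by linear permanent impact. The entropy-regularised closed form derived in the paper is not reproduced here.

```latex
\min_{v}\; \int_0^T \left(\eta\, v_t^2 + \lambda \sigma^2 x_t^2\right) dt,
\qquad x_0 = X,\quad x_T = 0,
\\[4pt]
\kappa = \sqrt{\lambda \sigma^2 / \eta},
\qquad
x_t^{*} = X\,\frac{\sinh\big(\kappa (T-t)\big)}{\sinh(\kappa T)},
\qquad
v^{*}(t,x) = \kappa \coth\big(\kappa (T-t)\big)\, x .
```

The feedback form follows from $v^{*}_t = -\dot{x}^{*}_t = X\kappa\cosh(\kappa(T-t))/\sinh(\kappa T)$ divided by $x^{*}_t$.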

Abstract

We propose an actor–critic reinforcement learning (RL) algorithm for the optimal execution problem. We formulate a mean–quadratic variation objective regularised by Shannon entropy under the celebrated Almgren–Chriss model by allowing stochastic policies. We obtain in closed form the optimal value function and the optimal feedback policy, which is Gaussian. We then utilise these analytical results to parametrise our value function and control policy for RL. While standard actor–critic RL algorithms alternate between policy evaluation updates and policy gradient updates, we introduce a recalibration step in addition to these two updates, which turns out to be critical for convergence. We develop a finite-time error analysis of our algorithm and show that it converges linearly under suitable conditions on the learning rates. We test our algorithm in three different types of market simulators, built on the Almgren–Chriss model, on historical order-flow data, and on a stochastic model of limit order books. Empirical results demonstrate the advantages of our algorithm over the classical statistical approach and a deep-learning-based RL algorithm.
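The Python sketch below illustrates two ingredients the abstract names, a Gaussian feedback policy and a market simulator built on the Almgren–Chriss model, under loudly stated assumptions. It is not the authors' algorithm. Assumptions: the policy mean is taken to be the classical Almgren–Chriss feedback rate κ coth(κ(T−t)) x with κ = sqrt(λσ²/η); the exploration standard deviation `policy_std`, the class `ACParams`, the functions `feedback_rate` and `simulate`, and all parameter values are hypothetical. The toy simulator uses arithmetic Brownian mid-prices with linear temporary and permanent impact and liquidates any remainder at the last step. The actor–critic updates and the recalibration step are not implemented, because the abstract does not specify them.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ACParams:
    # Hypothetical parameter values, for illustration only.
    X: float = 1.0       # initial inventory to sell
    T: float = 1.0       # trading horizon
    sigma: float = 0.3   # price volatility
    eta: float = 0.1     # temporary impact coefficient
    gamma: float = 0.0   # permanent impact coefficient
    lam: float = 1.0     # weight on quadratic variation (risk aversion)

    @property
    def kappa(self) -> float:
        return math.sqrt(self.lam * self.sigma ** 2 / self.eta)


def feedback_rate(t: float, x: float, p: ACParams) -> float:
    """Classical (unregularised) Almgren-Chriss selling rate kappa*coth(kappa*(T-t))*x."""
    k = p.kappa
    return k * x / math.tanh(k * (p.T - t))


def simulate(p: ACParams, n_steps: int = 1000, policy_std: float = 0.0,
             seed: int = 0, s0: float = 100.0):
    """Run one toy execution episode; return (inventory path, implementation shortfall)."""
    rng = random.Random(seed)
    dt = p.T / n_steps
    x, s, proceeds = p.X, s0, 0.0
    path = [x]
    for k in range(n_steps):
        t = k * dt  # T - t >= dt > 0, so coth stays finite
        # Assumed Gaussian policy centred on the classical feedback rate.
        v = rng.gauss(feedback_rate(t, x, p), policy_std)
        # Sell v*dt shares, or the whole remainder at the final step.
        q = v * dt if k < n_steps - 1 else x
        # Temporary impact lowers the execution price by eta * (q / dt).
        proceeds += q * (s - p.eta * q / dt)
        # Mid-price: Brownian increment plus linear permanent impact.
        s += p.sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) - p.gamma * q
        x -= q
        path.append(x)
    # Implementation shortfall relative to the arrival price s0.
    return path, p.X * s0 - proceeds


p = ACParams()
path, shortfall = simulate(p, policy_std=0.0, seed=1)
print(round(path[500], 3), path[-1])
```

With `policy_std=0.0` the inventory follows an Euler discretisation of the deterministic Almgren–Chriss trajectory regardless of the price noise, since the rate depends only on time and inventory; a positive `policy_std` produces randomised trading of the kind a stochastic (Gaussian) policy explores with.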

Cite this study

Wang et al. (2026) studied this question.

synapsesocial.com/papers/69b3abf602a1e69014ccd3ef — DOI: https://doi.org/10.1007/s00780-026-00589-5

Authors

Boyu Wang

Beijing Institute of Technology

Xuefeng Gao

Chinese University of Hong Kong

Lingfei Li

Journals

Finance and Stochastics

Institutions

Chinese University of Hong Kong
