Model-free deep reinforcement learning (DRL) offers a flexible framework for sequential decision-making in finance but faces unique challenges from the stochastic, non-stationary nature of financial markets. We examine the proximal policy optimization (PPO) algorithm for dynamic trading with time-varying alpha and price impact. Using a simulated environment with a closed-form optimal policy, we benchmark PPO’s efficiency and accuracy. We demonstrate how methods of bounding and rescaling its continuous action space significantly impact the training and performance of the DRL agent. We find that clipping the action space yields faster in-sample convergence, and rescaling actions to match the actual range of possible trades is essential for unbiased convergence to the optimal solution. In empirical tests on Dow Jones Industrial Average data with bootstrapped alphas, we show that PPO performance improves when signals are stronger and forecasts span multiple horizons. Our findings highlight the importance of domain-specific adaptations, particularly action space engineering and informative state design, when applying DRL to trading.
Brini et al. studied this question.
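The abstract's distinction between clipping and rescaling the action space can be illustrated with a small sketch. The abstract does not give the authors' exact transformations, so everything below is an assumption made for illustration: the function names `clip_action` and `rescale_action`, the trade bound `MAX_TRADE`, and the premise that the policy's raw output is meant to lie in [-1, 1].

```python
# Illustrative sketch only, not the paper's implementation.
# Assumption: the policy emits a raw scalar intended to lie in [-1, 1],
# and the environment accepts trades in [-MAX_TRADE, MAX_TRADE].

MAX_TRADE = 100.0  # hypothetical maximum trade size per step


def clip_action(raw, low, high):
    """Bound an action by truncating it to [low, high]."""
    return max(low, min(high, raw))


def rescale_action(raw, low, high):
    """Truncate raw to [-1, 1], then map it linearly onto [low, high]."""
    bounded = max(-1.0, min(1.0, raw))
    return low + 0.5 * (bounded + 1.0) * (high - low)


raw_outputs = [-2.0, -1.0, 0.0, 0.5, 3.0]

# Clipping alone leaves small raw outputs as small trades.
clipped = [clip_action(a, -MAX_TRADE, MAX_TRADE) for a in raw_outputs]

# Rescaling stretches the unit interval over the full feasible trade range.
rescaled = [rescale_action(a, -MAX_TRADE, MAX_TRADE) for a in raw_outputs]

print(clipped)
print(rescaled)
```

With clipping alone, a policy whose outputs stay near the unit interval can only express trades that are small relative to the feasible range. The linear rescaling lets the same outputs reach every feasible trade, which is one plausible reading of why the abstract calls rescaling essential for unbiased convergence.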