This study proposes a hierarchical signal-to-policy learning framework for risk-aware portfolio optimization that integrates model-based return forecasting, explainable machine learning, and deep reinforcement learning (DRL) within a unified architecture. In the first stage, next-period returns are estimated with gradient-boosted tree models, and SHAP-based feature attributions are extracted to provide transparent, factor-level explanations of the predictive signals. In the second stage, a Proximal Policy Optimization (PPO) agent incorporates both the forecasts and the explanatory signals into its state representation and learns dynamic allocation policies under a mean–CVaR reward function that explicitly penalizes tail risk while controlling trading frictions. By separating signal extraction from policy learning, the architecture feeds economically interpretable predictive signals into the policy's state representation while preserving the flexibility and adaptability of reinforcement learning. Empirical evaluations on U.S. sector ETFs and Dow Jones Industrial Average constituents show that the hierarchical framework delivers higher and more stable out-of-sample risk-adjusted returns than a single-layer DRL agent trained solely on technical indicators, a mean–CVaR optimized portfolio using the same parameters as the hierarchical model, and standard equal-weight and index-based benchmarks. These results demonstrate that integrating explainable predictive signals with risk-sensitive reinforcement learning improves the robustness and stability of data-driven portfolio strategies.
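The following is a minimal, hypothetical sketch of a per-step mean–CVaR reward of the kind the abstract describes. It is not the authors' implementation. It assumes three things: CVaR is estimated empirically over a rolling window of recent portfolio returns, trading frictions are modeled as a proportional cost on turnover, and the parameter names `risk_aversion`, `alpha`, and `cost_rate` are illustrative choices only.

```python
import numpy as np


def empirical_cvar(returns, alpha=0.95):
    """Empirical CVaR (expected shortfall) of losses at confidence level alpha.

    Losses are negative returns; CVaR is the mean of losses at or beyond
    the alpha-quantile (the Value-at-Risk threshold).
    """
    losses = -np.asarray(returns, dtype=float)
    var = np.quantile(losses, alpha)   # Value-at-Risk threshold on losses
    tail = losses[losses >= var]       # never empty: the max loss is >= var
    return float(tail.mean())


def mean_cvar_reward(weights_new, weights_old, asset_returns, past_port_returns,
                     risk_aversion=0.5, alpha=0.95, cost_rate=0.001):
    """Hypothetical reward: portfolio return - trading cost - lambda * CVaR.

    weights_new / weights_old: allocations after / before rebalancing.
    asset_returns: next-period asset returns realized under weights_new.
    past_port_returns: rolling window of earlier portfolio returns (assumed).
    """
    w_new = np.asarray(weights_new, dtype=float)
    w_old = np.asarray(weights_old, dtype=float)
    port_ret = float(w_new @ np.asarray(asset_returns, dtype=float))
    # Proportional transaction cost on total absolute weight change (assumed model).
    cost = cost_rate * float(np.abs(w_new - w_old).sum())
    # Tail-risk penalty estimated on the window including the current return.
    window = np.append(np.asarray(past_port_returns, dtype=float), port_ret)
    cvar = empirical_cvar(window, alpha)
    return port_ret - cost - risk_aversion * cvar


if __name__ == "__main__":
    history = np.full(19, 0.01)
    print(mean_cvar_reward([0.5, 0.5], [0.5, 0.5], [0.01, 0.01], history))
```

In this sketch the reward is computed per step so a PPO agent can optimize it directly. Increasing `risk_aversion` shifts the learned policy toward allocations with thinner left tails, and increasing `cost_rate` discourages frequent rebalancing.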
Yu et al. (Fri,) studied this question.