What question did this study set out to answer?

To develop a hierarchical framework for optimizing portfolios with risk-awareness and explainable models.

March 15, 2026Open Access

A Hierarchical Signal-to-Policy Learning Framework for Risk-Aware Portfolio Optimization

Key Points

To develop a hierarchical framework for optimizing portfolios with risk-awareness and explainable models.
Integrated model-based return forecasting and explainable machine learning
Utilized Proximal Policy Optimization (PPO) for dynamic allocation
Employed mean–CVaR reward function to penalize tail risk
Achieved higher out-of-sample risk-adjusted returns compared to conventional methods
Demonstrated improved robustness and stability in portfolio strategies

Abstract

This study proposes a hierarchical signal-to-policy learning framework for risk-aware portfolio optimization that integrates model-based return forecasting, explainable machine learning, and deep reinforcement learning (DRL) within a unified architecture. In the first stage, next-period returns are estimated using gradient-boosted tree models, and SHAP-based feature attributions are extracted to provide transparent, factor-level explanations of the predictive signals. In the second stage, a Proximal Policy Optimization (PPO) agent incorporates both predictive forecasts and explanatory signals into its state representation and learns dynamic allocation policies under a mean–CVaR reward function that explicitly penalizes tail risk while controlling trading frictions. By separating signal extraction from policy learning, the proposed architecture allows the use of economically interpretable predictive signals to incorporate into the policy’s state representation while preserving the flexibility and adaptability of reinforcement learning. Empirical evaluations on U.S. sector ETFs and Dow Jones Industrial Average constituents show that the hierarchical framework delivers higher and stable out-of-sample risk-adjusted returns relative to both a single-layer DRL agent trained solely on technical indicators, a mean–CVaR optimized portfolio using the same parameters used in the proposed hierarchical model and standard equal weight as well as index-based benchmarks. These results demonstrate that integrating explainable predictive signals with risk-sensitive reinforcement learning improves the robustness and stability of data-driven portfolio strategies.

A Hierarchical Signal-to-Policy Learning Framework for Risk-Aware Portfolio Optimization

Key Points

Abstract

Cite This Study

Also Consider

Also Consider