Offline reinforcement learning shows enormous potential in medical decision support systems. It does not rely on real-time interaction and can instead directly leverage historical medical data to optimize policies. This paper focuses on a critical issue in this field, the estimation of counterfactual rewards, and reviews the main technical approaches based on causal graph modeling from recent years. The paper systematically examines three mainstream categories of methods. The first involves causal correction mechanisms for unobserved confounding factors, which help to enhance the robustness of policy evaluation. The second utilizes structural causal models to generate counterfactual samples, improving both the model's reasoning capabilities and its generalizability. The third introduces policy optimization frameworks with accountability and safety constraints, increasing the explainability and trustworthiness of medical AI in clinical environments. Additionally, the paper analyzes the limitations of current methods in terms of modeling assumptions, data integrity, and policy transfer, and discusses future research prospects in causal modeling, privacy protection, and multimodal data integration.
Deng et al. (Wed,) studied this question.