• A dynamic RL framework with DQN/PPO coupling is proposed for policy assessment
• Enterprise heterogeneity is quantified within the assessment framework
• The policy mix shows synergy, boosting emission reduction by 12.2% and cutting cost by 22.1%
• A dynamic reward-weighting mechanism reduces policy effect decay from 3% to 1.2%
• A government-enterprise interaction framework captures the core dynamics of environmental regulation

The assessment of enterprise pollution discharge policies (EPDP) is an important prerequisite for refining the environmental governance regime. Traditional assessment methods, which are based on static assumptions and linear models, are inefficient for policy implementation. This study develops a reinforcement learning assessment framework for EPDP by coupling the deep Q-network (DQN) and proximal policy optimization (PPO) algorithms. A government-enterprise-environment (GEE) interaction model is constructed to integrate enterprise characteristics, discharge states, policy instruments, and environmental feedback. Within the GEE interaction model, a multi-objective reward function is designed to quantify the EPDP assessment by balancing environmental, economic, and social benefits. The model is validated against high-frequency discharge monitoring data, watershed water quality data, and policy enforcement records. Compared with traditional methods, the proposed framework shows superior performance in both predictive accuracy and dynamic responsiveness. It can characterize how firm heterogeneity, differentiated by scale and industry, affects policy effectiveness. The policy mix integrating the “constraint-incentive-support” dimensions demonstrates complementary coupling effects. This multi-dimensional approach enhances the efficiency of discharge reduction while alleviating compliance costs for enterprises.
The efficacy of the clipping mechanism within the PPO algorithm is further validated using the GEE interaction model, which confirms its role in enhancing policy adaptability and robustness. This dynamic adjustment mechanism enables adaptive and robust decision-making for enterprises, allowing them to withstand economic fluctuations while maintaining systemic stability under extreme scenarios. The framework developed in this study outlines an actionable path for the green transition of enterprises, from micro-level decision-making optimization to macro-level governance. The methods and results are expected to offer governments differentiated regulatory guidance and dynamically optimized policy guidance for advancing the modernization of environmental governance.
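For readers unfamiliar with the clipping mechanism mentioned above, the following is a minimal sketch of the standard PPO clipped surrogate objective, not the implementation used in this study. The function name `ppo_clipped_objective`, the plain-list inputs, and the default `epsilon=0.2` are illustrative assumptions; the abstract does not state the clipping range or any other hyperparameter.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    epsilon:    clipping range (assumed value; not taken from the study)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Taking the minimum keeps the pessimistic estimate, so a single
        # update cannot gain from moving the policy far from the old one.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

With a positive advantage, a ratio above 1 + epsilon earns no additional objective value; with a negative advantage, a ratio below 1 - epsilon is still penalized at the clipped level. This bound on each policy update is the general property that the robustness claim above relies on.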
Feng et al. (Sun,) studied this question.