What question did this study set out to answer?

The aim is to develop a reinforcement learning framework for assessing enterprise pollution discharge policies (EPDP) that accounts for dynamic government-enterprise interactions.

March 5, 2026 · Open Access

A framework of DQN/PPO coupling reinforcement learning for dynamic environmental policy decision assessment via multi-agent government-enterprise interaction

Key Points

Developed a DQN/PPO coupling framework for policy assessment.
Constructed a government-enterprise-environment interaction model.
Designed a multi-objective reward function to quantify EPDP assessments.
Verified the model with high-frequency monitoring and water quality data.
Achieved a 12.2% boost in emission reduction and a 22.1% cost reduction through the policy mix.
Reduced policy effect decay from 3% to 1.2% using a dynamic reward-weighting mechanism.
Showed superior performance compared to traditional methods in predictive accuracy and responsiveness.
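The key points name a DQN/PPO coupling but do not describe how the two learners are combined. For orientation on the DQN side only: a standard deep Q-network is trained by regressing Q(s, a) toward the bootstrapped target y = r + γ · max_a' Q_target(s', a'), with no bootstrap term on terminal transitions. The sketch below computes that target for a small batch in plain Python. The function name `dqn_targets`, the discount `gamma`, and the reading of the discrete actions as policy instruments are illustrative assumptions, not details taken from the study.

```python
from typing import Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> list[float]:
    """Standard DQN regression targets: y = r + gamma * max_a' Q_target(s', a').

    rewards[i]: reward observed after transition i
    next_q_values[i]: target-network Q-values for every discrete action in s'_i
        (hypothetically, one entry per policy instrument)
    dones[i]: True if s'_i is terminal, in which case no bootstrap term is added
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


# Hypothetical batch: two transitions, three discrete actions each.
print(dqn_targets([1.0, -0.5], [[0.2, 0.8, 0.1], [0.3, 0.4, 0.9]], [False, True], gamma=0.9))
```

In a full DQN these targets would be compared with the online network's Q(s, a) under a squared or Huber loss; that training loop is omitted here.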

Abstract

• A dynamic RL framework with DQN/PPO coupling is proposed for policy assessment
• Enterprise heterogeneity is quantified within the assessment framework
• The policy mix shows synergy, boosting emission reduction by 12.2% and cutting cost by 22.1%
• A dynamic reward-weighting mechanism reduces policy effect decay from 3% to 1.2%
• The government-enterprise interaction framework captures the core dynamics of environmental regulation

Assessment of enterprise pollution discharge policies (EPDP) is an important prerequisite for refining the environmental governance regime. Traditional assessment methods, which rest on static assumptions and linear models, are inefficient for guiding policy implementation. This study develops a reinforcement learning assessment framework for EPDP by coupling the deep Q-network (DQN) and proximal policy optimization (PPO) algorithms. A government-enterprise-environment (GEE) interaction model is constructed that integrates enterprise characteristics, discharge states, policy instruments, and environmental feedback. Within the GEE interaction model, a multi-objective reward function that balances environmental, economic, and social benefits is designed to quantify EPDP assessment. The model's validity is verified by comparison with high-frequency discharge monitoring data, watershed water quality, and policy enforcement records. Compared with traditional methods, the proposed framework performs better in both predictive accuracy and dynamic responsiveness. It can characterize how firm heterogeneity, by scale and by industry, differentiates policy effectiveness. A policy mix integrating “constraint-incentive-support” dimensions demonstrates complementary coupling effects. This multi-dimensional approach raises the efficiency of discharge reduction while alleviating compliance costs for enterprises.
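The abstract states that the reward balances environmental, economic, and social benefits, and the highlights mention a dynamic reward-weighting mechanism, but neither gives a formula. The following is a minimal sketch assuming a linear weighted sum of three normalized component scores, plus a hypothetical reweighting rule that shifts weight toward whichever objective falls below its target. The `Objectives` fields, the targets, the step size `eta`, and the update rule itself are illustrative assumptions, not the published mechanism.

```python
from dataclasses import dataclass


@dataclass
class Objectives:
    """Hypothetical per-step scores, each assumed normalized to [0, 1]."""
    environmental: float  # e.g. emission reduction achieved
    economic: float       # e.g. 1 - normalized compliance cost
    social: float         # e.g. a public-satisfaction proxy


def weighted_reward(obj: Objectives, weights: tuple[float, float, float]) -> float:
    """Linear scalarization: r = w_env*env + w_eco*eco + w_soc*soc."""
    w_env, w_eco, w_soc = weights
    return w_env * obj.environmental + w_eco * obj.economic + w_soc * obj.social


def update_weights(
    weights: tuple[float, float, float],
    obj: Objectives,
    targets: tuple[float, float, float],
    eta: float = 0.1,
) -> tuple[float, float, float]:
    """Hypothetical reweighting rule: add eta * (shortfall below target) to each
    weight, then renormalize so the weights sum to 1 (weights assumed positive)."""
    scores = (obj.environmental, obj.economic, obj.social)
    raw = [w + eta * max(0.0, t - s) for w, s, t in zip(weights, scores, targets)]
    total = sum(raw)
    return (raw[0] / total, raw[1] / total, raw[2] / total)


# Usage with made-up numbers: the environmental score misses its target,
# so its weight grows for the next step.
w = (0.5, 0.3, 0.2)
step = Objectives(environmental=0.4, economic=0.7, social=0.6)
print(weighted_reward(step, w))
w = update_weights(w, step, targets=(0.6, 0.6, 0.6))
print(w)
```

Linear scalarization is used here only because it is the simplest option; it cannot reach solutions on non-convex parts of a Pareto front, so the study's actual reward may be structured differently.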
The efficacy of the clipping mechanism in the PPO algorithm is further validated with the GEE interaction model, confirming its role in enhancing policy adaptability and robustness. This dynamic adjustment mechanism enables adaptive and robust decision-making for enterprises, allowing them to withstand economic fluctuations while maintaining systemic stability under extreme scenarios. The framework developed in this study outlines an actionable path for the green transition of enterprises, from micro-level decision-making optimization to macro-level governance. The methods and results are expected to offer governments differentiated regulatory and dynamically optimized policy guidance for advancing the modernization of environmental governance.
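The clipping mechanism mentioned above is, in standard PPO, the clipped surrogate objective: with probability ratio r_t = π_new(a_t | s_t) / π_old(a_t | s_t) and advantage estimate A_t, PPO maximizes the mean of min(r_t · A_t, clip(r_t, 1 − ε, 1 + ε) · A_t). Clipping removes the incentive to move the new policy far from the old one in a single update, which is the stabilizing effect the abstract credits it with. A minimal plain-Python sketch of that objective follows; the function name is hypothetical, and ε = 0.2 is a common default rather than a value reported by this study.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    ratio_t = exp(log pi_new(a_t|s_t) - log pi_old(a_t|s_t))
    L_clip  = mean_t min(ratio_t * A_t, clip(ratio_t, 1 - eps, 1 + eps) * A_t)
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum makes the objective a pessimistic bound.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# One sample: the new policy makes the action 1.5x as likely, advantage +1.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))  # capped at 1 + epsilon
```

Because of the minimum, a large ratio earns no extra credit when the advantage is positive, while a harmful change (negative advantage) is still penalized in full; in practice the negative of this value is minimized with a gradient-based optimizer.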