October 18, 2025 · Open Access

Optimizing Spatial Scales for Evaluating High-Resolution CO2 Fossil Fuel Emissions: Multi-Source Data and Machine Learning Approach

Key Points

The multiple proxy Extra-Trees model achieved R2 = 0.96 in cross-validation and R2 = 0.92 on the independent test set, outperforming the single-proxy nighttime light linear regression model at the county scale.
Feature importance analysis identified nighttime light brightness (40.70%) and heavy industrial density (21.11%) as the most important spatial proxies.
The evaluation compared seven machine learning algorithms, which significantly improved CO2 emission estimates relative to single-proxy models.
The study proposes a transferable framework, addressing drawbacks of traditional top-down methods for emissions mapping.
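
The two accuracy measures quoted in these key points, R2 and RMSE, can be computed as in the minimal sketch below. It assumes R2 means the coefficient of determination (the summary does not state the exact definition used), and the sample values are made up for illustration; they are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. MtCO2)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical county-level emissions (MtCO2): inventory values vs. model estimates.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"R2 = {r_squared(observed, predicted):.2f}, RMSE = {rmse(observed, predicted):.3f} MtCO2")
```

RMSE carries the units of the emissions themselves, so it is only comparable between models evaluated at the same spatial scale, whereas R2 is unitless.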

Abstract

High-resolution CO2 fossil fuel emission data are critical for developing targeted mitigation policies. As a key approach for estimating the spatial distribution of CO2 emissions, top-down methods typically rely on spatial proxies to disaggregate administrative-level emissions to finer spatial scales. However, conventional linear regression models may fail to capture complex non-linear relationships between proxies and emissions. Furthermore, methods that rely on nighttime light data alone often represent emissions in industrial and rural zones poorly. To address these limitations, this study developed a multiple proxy framework integrating nighttime light, points of interest (POIs), population, road networks, and impervious surface area data. Seven machine learning algorithms (Extra-Trees, Random Forest, XGBoost, CatBoost, Gradient Boosting Decision Trees, LightGBM, and Support Vector Regression) were applied to estimate high-resolution CO2 fossil fuel emissions. The evaluation showed that the multiple proxy Extra-Trees model significantly outperformed the single-proxy nighttime light linear regression model at the county scale, achieving R2 = 0.96 (RMSE = 0.52 MtCO2) in cross-validation and R2 = 0.92 (RMSE = 0.54 MtCO2) on the independent test set. Feature importance analysis identified nighttime light brightness (40.70%) and heavy industrial density (21.11%) as the most critical spatial proxies. The proposed approach also showed strong spatial consistency with the Multi-resolution Emission Inventory for China, with correlation coefficients of 0.82–0.84. This study demonstrates that integrating local multiple proxy data with machine learning corrects spatial biases inherent in traditional top-down approaches, establishing a transferable framework for high-resolution emissions mapping.
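
The top-down idea described in the abstract, splitting an administrative total across finer cells in proportion to spatial proxies, can be illustrated with a minimal sketch. This is not the authors' method: the study learns the proxy-emission relationship with machine learning, whereas the function `disaggregate`, the fixed `weights`, and all cell values below are hypothetical and chosen only to show why a nighttime-light-only proxy can under-allocate emissions to a dim industrial cell.

```python
def disaggregate(total, cells, weights):
    """Split one administrative total across cells in proportion to a weighted proxy score.

    `weights` maps proxy name -> hypothetical weight; these are NOT values from the study.
    """
    scores = [sum(w * cell[name] for name, w in weights.items()) for cell in cells]
    score_sum = sum(scores)
    if score_sum <= 0:
        # No proxy signal anywhere: fall back to a uniform split.
        return [total / len(cells)] * len(cells)
    return [total * s / score_sum for s in scores]


# Three hypothetical grid cells: a bright city core, a suburb, and a dim industrial zone.
cells = [
    {"ntl": 60.0, "industry": 5.0},
    {"ntl": 30.0, "industry": 0.0},
    {"ntl": 10.0, "industry": 15.0},
]
county_total = 10.0  # MtCO2, made-up administrative-level total

ntl_only = disaggregate(county_total, cells, {"ntl": 1.0})
multi_proxy = disaggregate(county_total, cells, {"ntl": 0.5, "industry": 2.5})

print("nighttime light only:", ntl_only)
print("multiple proxies:    ", multi_proxy)
```

Both allocations sum to the county total, so the choice of proxies changes only where emissions are placed, not how much is emitted. With nighttime light alone the industrial cell receives the smallest share; adding the industrial density proxy raises it substantially.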

Cite This Study

Fang et al. (2025) studied this question.

synapsesocial.com/papers/68f35bfc73f0a7d050f47f11 https://doi.org/10.3390/su17209009
