Point data, such as population, disease incidence, and greenhouse gas emissions, are commonly aggregated to a uniform raster grid for storage and representation. In many remote sensing applications, however, the spatial basis for analysis is a set of polygons describing regions of interest (e.g., countries and cities), and the value associated with each polygon is estimated by aggregating the underlying gridded raster data within its boundary. The conventional approach to this aggregation includes a grid cell only if its centroid lies within the polygon. This limits accuracy and can have severe consequences. In this work, we quantify the cost of sub-optimally aggregating gridded raster data to polygons and show that relying on the centroid alone is rarely the most accurate approach. We compare centroid aggregation with proportional aggregation and with bilinear interpolation using 2× and 10× upsampling. Across real-world population, greenhouse gas emissions, and snowfall datasets, we further show that differences between aggregation methods arise systematically for commonly used geographic boundaries, particularly when coarse-resolution raster datasets are aggregated to county- and city-scale polygons. The centroid method generally underperforms the alternative methods when the polygon area is larger than the grid cell area. Worse, when the polygon area is no larger than the grid cell area, the centroid method can exhibit up to 100× the error of the other methods. Our findings suggest that centroid aggregation is often sub-optimal relative to alternative approaches, particularly in settings with a low polygon-to-grid area ratio (PGR).
Markakis et al. studied this question.
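The difference between centroid and proportional aggregation can be illustrated with a minimal Python sketch. It rests on several simplifying assumptions that are not taken from the study. The grid is made of square cells anchored at the origin. Each cell holds an extensive value (e.g., a population count) spread uniformly over the cell. The polygon is an axis-aligned rectangle, so overlap areas are exact; real boundaries would need a general polygon-clipping routine. All function names are hypothetical. `upsample_uniform` splits each cell evenly into sub-cells (nearest-neighbour upsampling). It does not perform the bilinear interpolation compared in the study.

```python
"""Toy comparison of centroid vs. area-proportional aggregation.

Hypothetical simplifications, for illustration only:
- square grid cells of side `cell`, with the grid origin at (0, 0);
- cell values are extensive quantities spread uniformly within each cell;
- the polygon is an axis-aligned rectangle (xmin, ymin, xmax, ymax).
"""

from typing import List, Tuple

Grid = List[List[float]]                  # grid[row][col]; row r spans y in [r*cell, (r+1)*cell)
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def centroid_aggregate(grid: Grid, rect: Rect, cell: float = 1.0) -> float:
    """Sum the full value of every cell whose centroid falls inside the rectangle."""
    xmin, ymin, xmax, ymax = rect
    total = 0.0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            cx, cy = (c + 0.5) * cell, (r + 0.5) * cell
            if xmin <= cx < xmax and ymin <= cy < ymax:
                total += value
    return total


def proportional_aggregate(grid: Grid, rect: Rect, cell: float = 1.0) -> float:
    """Weight each cell's value by the fraction of its area covered by the rectangle."""
    xmin, ymin, xmax, ymax = rect
    total = 0.0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            # Width and height of the intersection between this cell and the rectangle.
            dx = min(xmax, (c + 1) * cell) - max(xmin, c * cell)
            dy = min(ymax, (r + 1) * cell) - max(ymin, r * cell)
            if dx > 0 and dy > 0:
                total += value * (dx * dy) / (cell * cell)
    return total


def upsample_uniform(grid: Grid, k: int) -> Grid:
    """Split each cell into k x k sub-cells, dividing its value evenly.

    Nearest-neighbour (uniform-split) upsampling; NOT bilinear interpolation.
    """
    out: Grid = []
    for row in grid:
        fine_row = [v / (k * k) for v in row for _ in range(k)]
        for _ in range(k):
            out.append(list(fine_row))
    return out


if __name__ == "__main__":
    grid = [
        [10.0, 20.0],
        [30.0, 40.0],
    ]
    # A rectangle smaller than one cell that covers no cell centroid (low PGR).
    small = (0.1, 0.1, 0.4, 0.4)
    print("centroid:     ", centroid_aggregate(grid, small))
    print("proportional: ", proportional_aggregate(grid, small))
    print("centroid, 2x: ", centroid_aggregate(upsample_uniform(grid, 2), small, cell=0.5))
    print("centroid, 10x:", centroid_aggregate(upsample_uniform(grid, 10), small, cell=0.1))
```

In this toy case the 0.3 × 0.3 rectangle lies inside one cell without covering its centroid. The centroid method therefore returns 0, while proportional aggregation returns about 0.9. Centroid aggregation on the 2× upsampled grid overshoots (2.5), and on the 10× grid it recovers roughly the proportional value. This is only a miniature illustration of why centroid aggregation struggles when the polygon area is no larger than the grid cell area.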