Abstract
The proliferation of Social Media (SM) platforms has reshaped the landscape of public discourse and opinion formation, offering rich sources of user-generated content for socio-economic and political studies. With vast digital records of public discussions available, studies may draw on datasets with millions of documents and sources spanning multiple regions. At this scale, data must be aggregated so that human analysts can describe and interpret the results. This paper addresses some challenges of effectively aggregating views expressed in SM discussions. We emphasize the need to choose the aggregation level carefully to avoid unwanted information loss and enhance interpretability. Leveraging measures from information theory, we assess and compare geo-aggregations that capture public sentiment on the contentious issue of migration in Europe. Unlike previous studies, our approach centers on SM arguments, analyzing the aggregation of sources focused on migration rather than the aggregation of target areas by their mentions in the media. Through geotagging of posts and user profiles, we gain insights into user sentiment distributions across different zones. Furthermore, we contrast different levels of policy-driven regional aggregations against data-driven clustering techniques that represent information more compactly while preserving its underlying distribution. Our intra-country analysis can provide nuanced insights into local mood homogeneity, enabling decision-makers to tailor interventions to specific communities. This study contributes to a more comprehensive understanding of public perceptions of controversial issues at a regional level (e.g., the EU), emphasizing the importance of accurate and context-aware data aggregation.
Elejalde et al. (Mon,) studied this question.