• We build a contextualized Twitter-based language model to construct poverty metrics.
• Twitter discourse predicts village-level poverty.
• Less affluent communities focus more on concrete, local development concerns.
• Kriging interpolation improves sparse social media data with uncertainty estimates.
• We propose a novel data pipeline for poverty assessment which utilizes citizen participation.

We present a novel pipeline for poverty assessment using social media and apply it to a dataset of 1.2 million geotagged tweets from Zambia. Leveraging mixed-methods topic modeling with domain-guided feature selection, we develop an interpretable language model that explains more than 60% of the variation in village-level wealth. Our findings show that the tweets from poorer villages emphasize local, concrete needs, whereas those from wealthier villages focus on abstract development concepts. We also compare imputation methods for data-sparse contexts and find that kriging improves predictive accuracy by 15% over standard approaches, while providing uncertainty quantification for adaptive sampling. This work demonstrates the viability of social media discourse as a participatory, scalable poverty monitoring tool in regions with limited data.
Jung et al. studied this question.
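To make the kriging step concrete, the following is a minimal sketch of ordinary kriging in pure Python. It returns both an interpolated value and a kriging variance, which is the kind of uncertainty estimate that could guide adaptive sampling. The source does not state which kriging variant, covariance model, or parameters were used, so the exponential covariance, the `sill` and `corr_range` parameters, and the function names here are illustrative assumptions, not the authors' implementation.

```python
import math


def exp_cov(h, sill=1.0, corr_range=1.0):
    """Exponential covariance model (an assumed choice, not from the source)."""
    return sill * math.exp(-h / corr_range)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(coords, values, target, sill=1.0, corr_range=1.0):
    """Return (prediction, kriging variance) at `target`.

    Solves the ordinary kriging system
        [C  1] [w ]   [c0]
        [1' 0] [mu] = [1 ]
    where C holds covariances between observed points, c0 holds covariances
    between observed points and the target, and mu is the Lagrange multiplier
    that forces the weights w to sum to one.
    """
    n = len(coords)
    a = [
        [exp_cov(math.dist(coords[i], coords[j]), sill, corr_range) for j in range(n)] + [1.0]
        for i in range(n)
    ]
    a.append([1.0] * n + [0.0])
    c0 = [exp_cov(math.dist(p, target), sill, corr_range) for p in coords]
    sol = solve(a, c0 + [1.0])
    w, mu = sol[:n], sol[n]
    pred = sum(wi * zi for wi, zi in zip(w, values))
    var = sill - sum(wi * ci for wi, ci in zip(w, c0)) - mu
    return pred, max(var, 0.0)  # clamp tiny negative values from rounding
```

Calling `ordinary_kriging(coords, values, target)` with a list of village coordinates and a tweet-derived score per village would give an estimate for an unobserved location. The variance grows as the target moves away from observed villages, so locations with high variance are natural candidates for additional data collection.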