This study examines the performance of three multi-label text classification models in automatically mapping text to the Global Goals and Targets of the Sustainable Development Goals (SDGs) on the basis of semantic content: the Robustly optimized BERT (Bidirectional Encoder Representations from Transformers) approach (RoBERTa), Decoding-enhanced BERT with disentangled attention (DeBERTa), and Language Understanding with Knowledge-based Embeddings (LUKE). Trained on a diverse, multisectoral corpus, the models demonstrate high generalizability; the LUKE model in particular achieved strong recall on the OSDG Community Dataset. Applying this model to a large corpus of scientific articles on sustainable development, we found an uneven distribution of scientific attention that favors environmental sustainability over social issues such as equality and justice. Network analysis identified key nexuses across the environmental, social, and economic dimensions of sustainability. The bridging research areas identified by the model suggest that sustainable consumption and production for all genders and biomass energy for a circular bioeconomy can provide integrated approaches to achieving the SDGs. We suggest that future research integrate generative language models to enhance SDG classifier development.
Miyashita et al. studied this question.
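To make the link between multi-label classification and network analysis concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that a classifier has already produced one score per SDG label for each document, that labels are assigned with a fixed threshold of 0.5, and that nexuses are approximated by counting label pairs that co-occur on the same document. The function names, the threshold, and the scores are all hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def assign_labels(scores, threshold=0.5):
    """Return the sorted labels whose score meets the threshold.

    Multi-label setting: a document may receive zero, one, or several labels.
    The threshold value is an assumption, not taken from the study.
    """
    return sorted(label for label, s in scores.items() if s >= threshold)


def cooccurrence_edges(label_sets):
    """Count how often each unordered pair of labels appears on one document."""
    edges = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical per-label scores for three documents (illustrative values only).
docs = [
    {"SDG7": 0.9, "SDG12": 0.7, "SDG5": 0.2},
    {"SDG5": 0.8, "SDG12": 0.6, "SDG7": 0.1},
    {"SDG7": 0.55, "SDG12": 0.95, "SDG5": 0.3},
]

label_sets = [assign_labels(d) for d in docs]
edges = cooccurrence_edges(label_sets)

# Heavier edges indicate label pairs that are frequently assigned together.
for pair, weight in edges.most_common():
    print(pair, weight)
```

In this sketch, the edge weights form a weighted undirected graph over SDG labels; labels that connect otherwise separate groups of labels would be candidates for bridging research areas.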