Multilingual learning is key in natural language processing, but it is challenged by the transfer–interference trade-off: positive transfer benefits some languages, while negative interference degrades others. Prior methods, including linguistic-based and embedding-based language clustering, have attempted to address this trade-off, yet they remain constrained by their static design and their lack of task-specific feedback. In this study, we propose a novel computational strategy inspired by molecular design, in which molecules are constructed to have targeted properties. Languages are modeled as nodes in an undirected graph, with edges representing transfer strength. This language molecule is optimized via reinforcement learning, which adjusts edge connections and weights to enhance positive transfer and minimize interference. Graph clustering is applied to the language molecule, and the resulting clusters are evaluated on Named Entity Recognition (NER) and part-of-speech (POS) tagging, using 25 languages from the WikiANN dataset and 12 typologically diverse languages from the UDPOS dataset. Compared with linguistic and embedding-based language clustering baselines, our method yields substantial improvements, especially for low-resource languages, some of which show an increase of over 35% in F1 score, while high-resource languages benefit moderately, confirming a reduced transfer–interference trade-off. Our atom–language model offers a novel path for multilingual learning, inspired by molecular principles from the physical sciences.
Bekuretsion et al. studied this question.
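The abstract describes the pipeline only at a high level: a weighted undirected "language molecule", a reinforcement-learning step that edits its edges, and a graph-clustering step. The sketch below illustrates those three ingredients under loudly stated assumptions and is not the authors' implementation. The language codes, the signed edge weights (positive for transfer, negative for interference), the toy reward (the sum of kept edge weights, standing in for task feedback such as NER F1), the REINFORCE keep/drop policy, and the connected-components clustering are all hypothetical choices made for illustration. For simplicity it learns only which edges to keep, not the edge weights themselves.

```python
import math
import random

# HYPOTHETICAL "language molecule": undirected edges between language codes.
# Signed weights are made-up stand-ins for transfer strength
# (positive = transfer, negative = interference); they are NOT from the paper.
EDGES = {
    frozenset(("es", "pt")): 0.8,
    frozenset(("es", "it")): 0.7,
    frozenset(("pt", "it")): 0.6,
    frozenset(("hi", "ur")): 0.9,
    frozenset(("ur", "sw")): 0.3,
    frozenset(("it", "hi")): -0.5,
    frozenset(("es", "sw")): -0.4,
}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(kept):
    # HYPOTHETICAL proxy reward: sum of signed weights of the kept edges.
    # A real system would instead use feedback from downstream tasks.
    return sum(EDGES[e] for e in kept)


def train_edge_policy(steps=2000, lr=0.1, seed=0):
    """REINFORCE over independent Bernoulli keep/drop decisions, one per edge."""
    rng = random.Random(seed)
    logits = {e: 0.0 for e in EDGES}
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        probs = {e: sigmoid(logits[e]) for e in EDGES}
        actions = {e: 1 if rng.random() < probs[e] else 0 for e in EDGES}
        reward = toy_reward([e for e, a in actions.items() if a])
        advantage = reward - baseline
        for e in EDGES:
            # Gradient of log Bernoulli(a | sigmoid(logit)) w.r.t. logit is a - p.
            logits[e] += lr * advantage * (actions[e] - probs[e])
        baseline = 0.9 * baseline + 0.1 * reward
    return {e: sigmoid(l) for e, l in logits.items()}


def cluster(languages, edges):
    """Connected components via union-find (a simple stand-in clustering)."""
    parent = {lang: lang for lang in languages}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for e in edges:
        u, v = tuple(e)
        parent[find(u)] = find(v)
    groups = {}
    for lang in languages:
        groups.setdefault(find(lang), []).append(lang)
    # Deterministic output: sorted members, clusters sorted by their members.
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    keep_probs = train_edge_policy()
    kept_edges = [e for e, p in keep_probs.items() if p > 0.5]
    all_langs = sorted({lang for e in EDGES for lang in e})
    print(cluster(all_langs, kept_edges))
```

With these made-up weights, the policy should learn to drop the two negative (interference) edges, leaving two clusters, `["es", "it", "pt"]` and `["hi", "sw", "ur"]`. In a realistic setting the reward would presumably come from downstream task performance on the proposed clusters, which is far costlier to evaluate than this toy proxy, and a community-detection algorithm could replace the connected-components step.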