This paper presents a multi-task, language-based framework for procedural content generation via reinforcement learning (PCGRL) that aims to improve the semantic alignment between linguistic commands and quantitative game surface features. Whereas most previous PCGRL methods rely on numerical conditioning, the proposed approach combines a DeBERTa encoder with a multi-objective training scheme (regression, contrastive alignment, and hybrid learning) to extract meaningful, generalizable, and structured representations of natural-language commands. To evaluate the framework, a structured dataset of over 14,000 command-level pairs in the Super Mario environment is constructed, which allows single-task, collective, combinatorial, paraphrase, and out-of-domain generalization to be examined. Experimental results show that the proposed model outperforms BERT-based methods in command following, semantic stability, and structural diversity of generated levels. The findings suggest that separating the semantic components of language and training with multiple objectives can be an effective step toward producing controllable, interpretable content that is aligned with human intent in PCGRL systems.
Nekahdari et al. (Mon,) studied this question.
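As a rough illustration of how a regression objective and a contrastive alignment objective might be combined into a hybrid loss, the sketch below uses plain Python. It is not the authors' implementation: the function names, the InfoNCE-style contrastive term, the weighting parameter `alpha`, and the `temperature` value are all assumptions made for illustration, and the abstract does not specify how the objectives are actually weighted or formulated.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_embs, level_embs, temperature=0.1):
    """InfoNCE-style loss (assumed form): the i-th command embedding should
    be most similar to the i-th level embedding within the batch."""
    n = len(text_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(text_embs[i], level_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n


def hybrid_loss(pred_feats, true_feats, text_embs, level_embs,
                alpha=0.5, temperature=0.1):
    """Hypothetical hybrid objective: a convex combination of the
    regression term and the contrastive term, weighted by alpha."""
    reg = sum(mse(p, t) for p, t in zip(pred_feats, true_feats)) / len(pred_feats)
    con = info_nce(text_embs, level_embs, temperature)
    return alpha * reg + (1 - alpha) * con
```

With `alpha=1.0` the sketch reduces to pure regression, and with `alpha=0.0` to pure contrastive alignment, so intermediate values correspond to a hybrid setting under these assumptions.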