August 26, 2025 | Open Access

Code Generation in Unseen Robot Programming Languages using LLMs

Key Points

Code generation with MMR-based retrieval achieved 88.5% Pass@1 for documentation-based retrieval on an unseen robot programming language, showing LLM potential in unseen languages.
Maximal Marginal Relevance enhanced retrieval relevance and reduced redundancy, contributing to better code generation outcomes.
Inadequate corpora in low-resource settings highlight the importance of using documentation effectively for code generation.
Advanced retrieval methods like Contextual Compression may unintentionally increase LLM bias towards popular languages like Python.

Abstract

While Large Language Models (LLMs) have demonstrated impressive code generation capabilities in widely used programming languages like Python, their effectiveness in domain-specific languages (DSLs) introduced after their training cutoff remains limited. In particular, code generation in unseen robot programming languages remains largely unexplored. Recent research has explored the use of in-context learning (ICL) combined with retrieval-augmented generation (RAG) to bridge this gap for code generation in unseen languages. In low-resource settings, the available code corpus is often inadequate; in such scenarios, documentation serves as a valuable resource. By dynamically balancing query relevance and reducing redundancy in retrieved demonstrations, we show that Maximal Marginal Relevance (MMR) based retrieval enables more contextually aligned and syntactically correct code generation, achieving 88.5% Pass@1 for documentation-based retrieval and 85.9% for code-based retrieval on an unseen robot programming language. We also demonstrate how advanced retrieval strategies like Contextual Compression inadvertently amplify LLMs' inherent bias towards generating Python-like code.
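To make the relevance/redundancy trade-off concrete, the following is a minimal sketch of greedy MMR selection over embedding vectors. It is not the paper's implementation: the function names (`cosine`, `mmr_select`), the `lambda_mult` weighting parameter, and the toy two-dimensional vectors are all assumptions chosen for illustration, and a real pipeline would use embeddings of documentation or code snippets instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick up to k document indices by Maximal Marginal Relevance.

    Hypothetical helper for illustration. Each step scores a candidate as
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected)
    so lambda_mult = 1.0 is pure relevance ranking and lower values favor diversity.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy data (assumed): docs 0 and 1 are near-duplicates; doc 2 is less relevant but different.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.11], [0.6, 0.8]]

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [0, 1]: relevance only
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # [0, 2]: near-duplicate skipped
```

With the weight shifted towards diversity, the second slot goes to the less similar but non-redundant demonstration, which is the behavior the abstract credits for better-aligned prompts.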
