While Large Language Models (LLMs) have demonstrated impressive code generation capabilities in widely used programming languages like Python, their effectiveness in domain-specific languages (DSLs) introduced after their training cutoff remains limited. In particular, code generation in unseen robot programming languages remains largely unexplored. Recent research has explored in-context learning (ICL) combined with retrieval-augmented generation (RAG) to bridge this gap for code generation in unseen languages. In low-resource settings the available code corpus is often inadequate, and in such scenarios documentation serves as a valuable resource. We show that Maximal Marginal Relevance (MMR) based retrieval, which dynamically balances query relevance against redundancy among the retrieved demonstrations, enables more contextually aligned and syntactically correct code generation, achieving a Pass@1 of 88.5% with documentation-based retrieval and 85.9% with code-based retrieval on an unseen robot programming language. We also demonstrate how advanced retrieval strategies such as Contextual Compression inadvertently amplify LLMs' inherent bias towards generating Python-like code.
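The relevance-versus-redundancy trade-off behind MMR can be illustrated with a minimal sketch. It uses the standard greedy formulation, in which each step picks the candidate maximizing `lam * sim(candidate, query) - (1 - lam) * max sim(candidate, already selected)`. The abstract does not specify the authors' embedding model, similarity measure, or weighting, so the cosine similarity, the `lam` parameter, and the function names `cosine` and `mmr_select` are assumptions made here for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(query_vec, doc_vecs, k, lam=0.5):
    """Greedily pick up to k document indices by Maximal Marginal Relevance.

    lam = 1.0 ranks purely by relevance to the query;
    lower values penalize similarity to already selected documents more strongly.
    (Hypothetical helper; not the authors' implementation.)
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))

    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)

    return selected
```

With two near-duplicate demonstrations and one dissimilar demonstration, a low `lam` tends to skip the duplicate in favor of the dissimilar one, while `lam=1.0` reduces to plain top-k similarity search.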
Rani Venkata Satya Anirudh
B. Akshay
Adak Rajdeep
Anirudh et al. (Tue,) studied this question.
www.synapsesocial.com/papers/68af5f19ad7bf08b1eae23c4 — DOI: https://doi.org/10.36227/techrxiv.175616871.14339508/v1