What question did this study set out to answer?

This research aims to evaluate how different prompting strategies influence the performance of LLM-controlled robots in production systems.

May 15, 2026Open Access

LLM-Based Multimodal Prompting for Adaptive Robot Control in Production Systems

Puntos clave

This research aims to evaluate how different prompting strategies influence the performance of LLM-controlled robots in production systems.
Laboratory demonstrator setup to test LLM-controlled pick-and-place robot.
Comparison of zero-shot versus multimodal few-shot prompting strategies using visual examples.
Dedicated evaluation model utilizing metrics for plan success, action success, and plan optimality.
Multimodal few-shot prompting improves planning accuracy by a significant margin.
Robustness and adaptability of the LLM-controlled robot increase with multimodal prompting compared to zero-shot.
Quantitative metrics show enhanced performance in real-world task execution.

Resumen

Large Language Models (LLMs) open new opportunities for adaptive automation in production systems by enabling robots to interpret human instructions and generate context-aware actions. In contrast to conventional robot programming, which requires expert knowledge and frequent reconfiguration, LLM-based control promises greater flexibility and easier interaction between humans and machines. However, generic LLMs still face major challenges when applied to manufacturing environments, as they lack grounding in real-world perception and may produce infeasible or unsafe actions. This paper presents a laboratory demonstrator that evaluates how different prompting strategies affect the performance of an LLM-controlled pick-and-place robot. The study systematically compares zero-shot and multimodal few-shot prompting, where visual examples such as annotated video frames and image captions are integrated into the LLM input. A dedicated evaluation model with metrics for plan success, action success, and plan optimality is used to quantify system behavior. The experimental results demonstrate that multimodal few-shot prompting significantly improves planning accuracy, robustness, and adaptability compared to a zero-shot baseline. These findings illustrate the potential of LLM-driven control for future intelligent production systems that combine semantic reasoning, multimodal perception, and human-interpretable automation.

Preguntar a la IA

Me gusta

Guardar

Ver artículo completo

Cite This Study

Koch et al. (Thu,) studied this question.

synapsesocial.com/papers/6a06b83de7dec685947aab35 https://doi.org/https://doi.org/10.5445/ir/1000193181

Also Consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Preguntar a la IA

Me gusta

Guardar

Ver artículo completo