What question did this study set out to answer?

This work investigates the adaptability of vision-language-action models for robotic inspection tasks.

May 31, 2026 · Open Access

Assessment of a Fine-Tuned Vision-Language-Action Model for Robotic Feature-Following Inspection

Key Points

A manipulation-pretrained vision-language-action (VLA) model is fine-tuned for a feature-following task.
Trajectory-based and action-level metrics are introduced for evaluation.
Assessments are conducted in a real-world robotic inspection setup.
The model executes feature-following trajectories with performance comparable to that of human operators.
The results demonstrate effective transfer of VLA capabilities from manipulation to inspection tasks.

Abstract

Modern manufacturing increasingly relies on robotics to achieve high throughput and quality, especially as production lines become more flexible and parts more customized. Robotic inspection is a critical enabler for quality assurance because it supports repeatable measurements while reducing human workload and variability. Recent vision-language-action (VLA) models have advanced robotic manipulation by integrating visual perception and language understanding for autonomous control. However, their application to robotic inspection, which requires accurate movement without altering the environment, remains underexplored. This work investigates the feasibility of adapting manipulation-pretrained VLA models to an inspection-oriented feature-following task and makes two contributions. First, two approaches for assessing VLA performance, tailored to the requirements of inspection problems, are introduced: a trajectory-based evaluation metric that quantifies performance in rollouts, and an action-level metric that is useful during fine-tuning. Second, an open-source, manipulation-pretrained VLA model is fine-tuned for a feature-following task. This task represents a simplified 2D inspection setting designed to capture core aspects of inspection problems encountered in domains such as manufacturing and infrastructure. In a real-world robotic setup, the model executes complex feature-following trajectories with performance competitive with that of a human operator, demonstrating effective transfer from manipulation to this class of inspection tasks. While the study is limited in scale, the results provide initial evidence that VLA models can be extended beyond manipulation to support feature perception and motion generation in automated robotic inspection. This suggests their potential to support more consistent and automated inspection processes and motivates further investigation into robustness and generalization.
