What question did this study set out to answer?

The aim is to develop a unified framework for defect detection across different products in manufacturing, using a fine-tuned Multimodal Large Language Model (MLLM).

February 12, 2026. Open Access

A unified vision-language model for cross-product defect detection in glove manufacturing

Key points

Developed a two-stage fine-tuning strategy: Supervised Fine-Tuning (SFT) followed by Reinforcement Fine-Tuning (RFT).
Implemented a multi-faceted verifiable reward function covering localization accuracy, classification correctness, and output structure.
Tested on a challenging real-world glove manufacturing dataset to validate the model's effectiveness.
Achieved a mean Average Precision (mAP) of 0.63, comparable to a specialized YOLO baseline (mAP 0.62).
Maintained competitive performance (mAP 0.61) with a single unified model trained on a mixed-product dataset.
Demonstrated scalability by handling multiple product lines and defect types through natural language prompts.
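The mAP figures above are means of per-class Average Precision (AP). The page does not say which AP variant the authors used, so the following is a minimal Python sketch assuming PASCAL VOC-style all-point interpolation at an IoU threshold of 0.5. The function names and data layout (detections as `(image_id, confidence, box)` tuples, ground truth keyed by image id) are hypothetical, not the authors' evaluation code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed layout
Detection = Tuple[str, float, Box]        # (image_id, confidence, box), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Detection],
                      ground_truth: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """AP for one defect class, all-point interpolation (assumed VOC-style)."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Walk detections from most to least confident, marking each TP (1) or FP (0).
    hits: List[int] = []
    for img, _conf, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True   # each ground-truth box is matched at most once
            hits.append(1)
        else:
            hits.append(0)             # duplicate, low-overlap, or spurious detection

    # Precision and recall after each detection in ranked order.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Precision envelope (non-increasing), then area under the stepwise PR curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(dets_by_class: Dict[str, List[Detection]],
                           gt_by_class: Dict[str, Dict[str, List[Box]]],
                           iou_thr: float = 0.5) -> float:
    """mAP: unweighted mean of per-class AP over classes that have ground truth."""
    aps = [average_precision(dets_by_class.get(c, []), gt_by_class[c], iou_thr)
           for c in gt_by_class]
    return sum(aps) / len(aps) if aps else 0.0
```

COCO-style evaluation instead averages AP over IoU thresholds from 0.50 to 0.95, which usually yields lower numbers for the same detector, so mAP values are only comparable when computed with the same protocol.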

Abstract

Automated anomaly detection is vital to industrial quality control, yet conventional deep learning detectors often struggle with scalability. These models, typically following a rigid “one-model-per-task” paradigm, require separate systems for each product line, increasing operational complexity and cost in diverse manufacturing environments. To address this limitation, we propose a unified defect detection framework based on a Multimodal Large Language Model (MLLM). Our approach uses a two-stage fine-tuning strategy: Supervised Fine-Tuning (SFT) to impart domain-specific knowledge, followed by a novel Reinforcement Fine-Tuning (RFT) process that refines visual reasoning. This RFT stage is guided by a multi-faceted verifiable reward function designed to optimize localization accuracy, classification correctness, and output structure. On a challenging real-world glove manufacturing dataset, our RFT-enhanced MLLM achieves a mean Average Precision (mAP) of 0.63, comparable to a highly specialized YOLO baseline (0.62). More importantly, a single, unified MLLM trained on a mixed-product dataset maintains competitive performance (mAP 0.61), demonstrating its ability to handle different products and defect types dynamically via natural language prompts. This study validates the feasibility of using a single, flexible MLLM to replace multiple rigid models in complex industrial inspection, offering a scalable and cost-effective paradigm for future intelligent quality control systems. The open-source code will be released at https://github.com/GloamXun/Glove-MLLM.
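The abstract names the three components of the RFT reward but not how they are computed or weighted. As a minimal Python sketch, assuming the model answers with a JSON list of `{"label": ..., "bbox": [x1, y1, x2, y2]}` objects and that the three terms are equally weighted, a verifiable reward of this kind could combine a format check, an IoU-based localization term, and a label-accuracy term. The output schema, the greedy matching, the weights, and all function names are hypothetical, not the authors' implementation.

```python
import json
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates
Detection = Tuple[str, Box]              # (defect label, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def parse_output(text: str) -> Optional[List[Detection]]:
    """Format check against a hypothetical schema; None means malformed output."""
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(items, list):
        return None
    detections: List[Detection] = []
    for item in items:
        if not isinstance(item, dict):
            return None
        label, bbox = item.get("label"), item.get("bbox")
        if not isinstance(label, str) or not isinstance(bbox, list) or len(bbox) != 4:
            return None
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox):
            return None
        detections.append((label, (float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3]))))
    return detections


def reward(text: str, ground_truth: List[Detection],
           w_format: float = 1.0, w_loc: float = 1.0, w_cls: float = 1.0) -> float:
    """Weighted mean of format, localization and classification rewards, in [0, 1]."""
    preds = parse_output(text)
    if preds is None:
        return 0.0  # malformed output earns no reward at all
    r_format = 1.0

    # Greedy one-to-one matching: each prediction claims the unmatched
    # ground-truth box it overlaps most (a simplification of optimal matching).
    unmatched = list(range(len(ground_truth)))
    iou_sum, correct = 0.0, 0
    for label, box in preds:
        best_j, best_iou = -1, 0.0
        for j in unmatched:
            overlap = iou(box, ground_truth[j][1])
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            unmatched.remove(best_j)
            iou_sum += best_iou
            if label == ground_truth[best_j][0]:
                correct += 1

    # Normalizing by the larger count penalizes both spurious boxes and misses;
    # an empty answer on a defect-free image counts as fully correct.
    n = max(len(preds), len(ground_truth))
    r_loc = iou_sum / n if n else 1.0
    r_cls = correct / n if n else 1.0
    total_w = w_format + w_loc + w_cls
    return (w_format * r_format + w_loc * r_loc + w_cls * r_cls) / total_w
```

For example, with one ground-truth box labeled `"hole"`, a response that finds the box but names it `"stain"` scores 2/3 under equal weights. Returning zero for unparseable responses is one simple way to make output structure a hard requirement rather than one term among three.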

Cite This Study

Zhao et al. (2026) studied this question.

synapsesocial.com/papers/698d6efe5be6419ac0d5502e
https://doi.org/10.1371/journal.pone.0339867
