Abstract
Helical tomotherapy systems (HTS) integrate multiple complex subsystems. Their equipment faults vary widely in severity, so automated diagnosis must be highly interpretable. As HTS are deployed across more distributed clinical environments, there is a growing need for cloud-enabled intelligent fault detection that supports scalable monitoring and centralized analysis. This study develops an interpretable visual classification framework for HTS fault detection based on cloud-edge collaboration between Large Language Models (LLMs) and Vision-Language Models (VLMs). From our institution's adverse event records for 2023 to 2025, we constructed a multi-category image dataset covering common HTS fault patterns: Surface Damage, Deformation, Material Loss, Contamination, Texture Anomaly, Intensity Abnormality, and Structural Breakage. We propose a medical-oriented Retrieval-driven Reasoning framework (RdR-Med) for fault classification. LLMs deployed in the cloud extract category-relevant fault descriptors and structural abnormality patterns, which form a structured retrieval database. During inference, edge devices perform lightweight visual feature extraction, while the cloud performs retrieval and multi-step deliberative reasoning with VLMs, so that the model reasons explicitly before producing its final fault category prediction. Experiments on the constructed HTS fault dataset and two public industrial inspection datasets show that RdR-Med consistently achieves higher classification accuracy than conventional direct VLM matching approaches and standard convolutional neural network baselines. By combining cloud-based knowledge sharing with real-time perception at the edge, the proposed method improves the scalability, robustness, generalization, and interpretability of HTS fault detection.
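The inference path summarized in the abstract (a descriptor database built by LLMs in the cloud, feature extraction at the edge, then retrieval and reasoning in the cloud) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RdR-Med implementation. The `Descriptor` record, the toy 3-dimensional embeddings, and the functions `cosine`, `retrieve`, and `classify` are all hypothetical. The sketch also assumes that descriptors and image features share one embedding space. The VLM's multi-step deliberative reasoning is replaced by a simple per-category vote over the retrieved evidence.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Descriptor:
    """One entry of a hypothetical cloud-side retrieval database."""
    category: str           # fault category, e.g. "Contamination"
    text: str               # descriptor text, assumed to be LLM-generated
    embedding: List[float]  # vector assumed to share a space with edge image features


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(db: List[Descriptor], query: List[float], k: int) -> List[Descriptor]:
    """Cloud-side retrieval: the k descriptors most similar to the edge feature vector."""
    return sorted(db, key=lambda d: cosine(d.embedding, query), reverse=True)[:k]


def classify(db: List[Descriptor], query: List[float], k: int = 3) -> Tuple[str, List[str]]:
    """Hypothetical stand-in for VLM reasoning: sum similarities per category over
    the retrieved evidence, pick the top category, and return it with its evidence."""
    hits = retrieve(db, query, k)
    scores: Dict[str, float] = {}
    for d in hits:
        scores[d.category] = scores.get(d.category, 0.0) + cosine(d.embedding, query)
    best = max(scores, key=scores.get)
    evidence = [d.text for d in hits if d.category == best]
    return best, evidence


# Toy database with hand-written 3-d embeddings (illustrative values only).
EXAMPLE_DB = [
    Descriptor("Surface Damage", "scratches or pits on an exterior surface", [1.0, 0.0, 0.0]),
    Descriptor("Contamination", "foreign residue or stains on a component", [0.0, 1.0, 0.0]),
    Descriptor("Contamination", "liquid droplets on a component surface", [0.1, 0.9, 0.0]),
    Descriptor("Structural Breakage", "cracked or snapped structural part", [0.0, 0.0, 1.0]),
]

# Feature vector assumed to come from an edge device for one inspection image.
example_query = [0.05, 0.95, 0.1]

label, evidence = classify(EXAMPLE_DB, example_query, k=2)
print(label, evidence)
```

Returning the matched descriptor texts alongside the label reflects the interpretability goal, because each prediction carries the retrieved evidence that supports it. In a real deployment the vectors would come from the edge feature extractor and the LLM-built descriptor database rather than being written by hand.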
Yang et al. studied this question.