June 5, 2026 · Open Access

FM-LLM: A Lightweight Multimodal LLM with Simulation-Augmented Data for Mechanical Fault Diagnosis

Key Points

This study aims to improve bearing fault diagnosis by integrating simulation data with multimodal fine-tuning.
Generated simulated bearing fault datasets to augment scarce original samples.
Applied envelope analysis for noise suppression and fault feature enhancement.
Extracted time-frequency images and integrated them with text-modal data for multimodal fine-tuning.
FM-LLM showed improved accuracy in fault classification compared to traditional methods.
Parameter-efficient fine-tuning using LoRA and QLoRA significantly reduced computational costs.
Enhanced generalization performance across different operating conditions and equipment types.
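
The envelope-analysis step listed above can be illustrated with a minimal, standard-library-only sketch. It is not the paper's pipeline: the envelope here is taken by simple rectification rather than the more common Hilbert-transform approach, and the signal model (a 3 kHz resonance amplitude-modulated at a 100 Hz fault frequency), the sampling rate, and the function names `simulate_fault_signal` and `envelope_spectrum` are all assumptions chosen for illustration.

```python
import math


def simulate_fault_signal(fs, n, f_res, f_fault, depth=0.8):
    """Toy signal (assumed model): resonance at f_res, amplitude-modulated at f_fault."""
    two_pi = 2.0 * math.pi
    return [
        (1.0 + depth * math.cos(two_pi * f_fault * i / fs))
        * math.sin(two_pi * f_res * i / fs)
        for i in range(n)
    ]


def envelope_spectrum(x, fs, f_max, f_step):
    """Rectify the signal, remove its mean, and evaluate DFT magnitudes on a coarse grid."""
    env = [abs(v) for v in x]              # rectification as a crude envelope detector
    mean = sum(env) / len(env)
    env = [v - mean for v in env]          # drop the DC term so it cannot mask the peak
    spectrum = {}
    for f in range(f_step, f_max + 1, f_step):
        w = 2.0 * math.pi * f / fs
        re = sum(v * math.cos(w * i) for i, v in enumerate(env))
        im = sum(v * math.sin(w * i) for i, v in enumerate(env))
        spectrum[f] = 2.0 * math.hypot(re, im) / len(env)  # single-sided amplitude
    return spectrum


FS = 20_000  # sampling rate in Hz (assumed)
signal = simulate_fault_signal(FS, n=2_000, f_res=3_000, f_fault=100)
spectrum = envelope_spectrum(signal, FS, f_max=500, f_step=10)
peak_freq = max(spectrum, key=spectrum.get)
print("dominant envelope frequency (Hz):", peak_freq)
```

In this toy setting the raw spectrum is dominated by the 3 kHz resonance, while the envelope spectrum should peak at the assumed 100 Hz fault frequency, which is the sense in which envelope analysis enhances fault features.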

Abstract

Industrial fault diagnosis is a key component of intelligent manufacturing systems, and diagnostic accuracy directly influences the safety of production systems and the efficiency of equipment operation and maintenance. However, traditional bearing fault diagnosis methods face three critical constraints. First, they rely on single-modal sensor data, which cannot comprehensively characterize the complex operational states of equipment. Second, raw fault samples are scarce, so models adapt poorly in small-sample scenarios. Third, generalization performance degrades significantly during cross-operating-condition and cross-equipment transfer, and multimodal information is insufficiently integrated. Moreover, when large language models (LLMs) are applied to fault diagnosis, insufficient data volume, modal gaps, and high fine-tuning costs remain unresolved challenges. To address these challenges, this study proposes FM-LLM, a bearing fault diagnosis method that integrates simulation-based data augmentation with lightweight multimodal fine-tuning of LLMs. The method proceeds in four steps. First, a large number of bearing fault datasets are generated through simulation and fused with the original measured datasets, which mitigates the scarcity of fault samples in industrial scenarios. Second, the fused data are processed via envelope analysis to suppress noise interference and enhance fault feature signals. Third, time-frequency images are extracted from the processed data as the visual modality, while text-modal data are constructed from core bearing information such as model specifications, operating parameters, and fault type descriptions. Together these form a multimodal fine-tuning dataset composed of time-frequency images and text. Finally, parameter-efficient fine-tuning techniques, namely Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA), are adopted so that the LLM can accurately learn and classify bearing fault features. These techniques also significantly reduce computational costs and prevent the forgetting of pre-trained knowledge.
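
To make the cost argument for LoRA concrete, the following is a minimal sketch of the low-rank update on a single linear layer, written in plain Python. It does not reproduce FM-LLM's configuration: the layer size (64 x 64), rank, scaling factor, and the class name `LoRALinear` are assumptions for illustration, and the 4-bit quantization of the frozen base weights that QLoRA adds is not shown.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: W + (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                      # frozen pre-trained weight (d_out x d_in)
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameters(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


rng = random.Random(1)
D = 64  # hypothetical layer width
base_weight = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(D)]
layer = LoRALinear(base_weight, rank=4, alpha=8)

x = [rng.gauss(0.0, 1.0) for _ in range(D)]
base_out = [sum(w * xi for w, xi in zip(row, x)) for row in base_weight]
lora_out = layer.forward(x)

full_params = D * D
lora_params = layer.trainable_parameters()
print(full_params, lora_params)
```

Because `B` is initialized to zero, the adapted layer reproduces the frozen layer exactly before training, and only `r * (d_in + d_out)` values are trained instead of `d_in * d_out`. Keeping the pre-trained weights frozen is also the usual reason such adapters are associated with less forgetting.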
