Signal data are essential for condition monitoring, fault diagnosis, and decision-making across industrial domains, and research leveraging signal data has been actively pursued in areas such as healthcare and manufacturing. However, acquiring such data is costly and difficult because of the risk of equipment damage, the need for expert labeling, and the scarcity of fault data. Moreover, collected data often contain sensitive operational information, which makes sharing difficult, and security concerns restrict enterprises from using high-performance models hosted on external servers. To address these challenges, we propose BearGen, a novel framework that combines the strong generative capabilities of Large Language Models (LLMs) with the precise data distribution learning of diffusion models to synthesize high-quality signal data in on-premise environments. BearGen first employs an LLM to generate descriptions of existing signals and then conditions a description-guided diffusion model on these descriptions to generate high-quality synthetic signals. We evaluated BearGen on eight publicly available bearing fault diagnosis datasets, where it outperformed existing approaches. In addition, we experimentally validated the reliability and usefulness of the generated signal descriptions. Further experiments under conditions simulating real industrial environments, such as limited data availability and severe data imbalance, verified the practical applicability of the framework. By operating in on-premise environments, BearGen resolves data security concerns while alleviating data scarcity and imbalance. Furthermore, by providing natural language descriptions, it enhances interpretability and offers significant potential for decision support in real-world industrial applications.
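The two-stage idea in the abstract (describe a signal in text, then train a diffusion model conditioned on that text) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `describe_signal`, `linear_alpha_bar`, and `make_training_example` are hypothetical, the statistics-based description is only a stand-in for the LLM step, and the linear noise schedule and the 12 kHz sampling rate are assumptions chosen for illustration. The sketch shows only how one conditional training example could be assembled using the standard DDPM forward-noising formula; the denoising network itself is omitted.

```python
import numpy as np


def describe_signal(x, fs):
    """Hypothetical stand-in for the LLM description step.

    Summarizes a 1-D signal with a few statistics and returns plain text.
    """
    rms = float(np.sqrt(np.mean(x ** 2)))
    centered = x - x.mean()
    # Non-excess kurtosis: fourth central moment over squared variance.
    kurt = float(np.mean(centered ** 4) / (np.mean(centered ** 2) ** 2))
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    peak_hz = float(freqs[np.argmax(spectrum)])
    return f"RMS {rms:.3f}, kurtosis {kurt:.2f}, dominant frequency {peak_hz:.1f} Hz"


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def make_training_example(x0, description, t, alpha_bar, rng):
    """Build one (noisy input, condition, target) tuple for a conditional denoiser.

    Uses the DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return {"x_t": x_t, "t": t, "condition": description, "target_noise": eps}


# Toy vibration-like signal: a 160 Hz tone plus mild noise (illustrative only).
rng = np.random.default_rng(0)
fs = 12_000  # assumed sampling rate in Hz
time = np.arange(2048) / fs
signal = np.sin(2 * np.pi * 160.0 * time) + 0.1 * rng.standard_normal(time.size)

text = describe_signal(signal, fs)
alpha_bar = linear_alpha_bar(1000)
example = make_training_example(signal, text, t=500, alpha_bar=alpha_bar, rng=rng)
```

In a full system, a denoising network would take `x_t`, `t`, and an embedding of `condition` as input and be trained to predict `target_noise`. Keeping the description as plain text is what would let it double as a human-readable explanation of the synthetic signal.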
Lee et al. studied this question.