Abstract: In industrial Predictive Maintenance (PdM), effective data-driven models are often limited by data scarcity, dataset imbalance, and the high cost of collecting failure data. Synthetic data generation, which simulates realistic failure scenarios to enrich model training, has emerged as a promising strategy to overcome these challenges. This article presents a systematic literature review of 86 peer-reviewed articles published since 2020 that focus on synthetic data applications in medium-to-heavy machinery and industrial processes. Data generation techniques fall into four key categories: (1) data augmentation, (2) generative models, (3) physics-based simulations and hybrid approaches, and (4) feature-based transformations. The review analyzes the strengths, limitations, and adoption trends of each category. Findings reveal that hybrid and physics-informed models are particularly valuable in safety-critical domains, where model transparency and adherence to physical laws are essential, and in industrial contexts that demand higher reliability and contextual accuracy. To address these needs, the article proposes the Synthetic Data-Enhanced PdM (SD-PdM) framework, a five-phase methodology for integrating synthetic data into maintenance strategies. The framework supports scalable, explainable, and economically viable smart maintenance solutions.
Nieminen et al. studied this question.