June 3, 2026 · Open Access

Hybrid Modeling of Long-Memory Degradation Dynamics Using Fractional Difference Operators and Deep Reinforcement Learning

Key Points

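The study aims to develop a hybrid model that combines fractional dynamics and deep reinforcement learning for predictive maintenance of rotating machinery.
It introduces the Grünwald–Letnikov fractional difference operator to build a fractional-memory representation of degradation trajectories.
A bidirectional gated recurrent unit (BiGRU) network learns sequential degradation representations from the fractional-memory state space.
A deep Q-network (DQN) optimizes maintenance decisions under uncertain degradation evolution.
On the IEEE PHM 2012 bearing dataset, the proposed FM-BiGRU-DQN with safety-guided execution achieved a mean maintenance lead time of 33.9 ± 12.6 steps over 10 independent random seeds.
Compared with NM-BiGRU-DQN, the in-band rate increased from 0.55 ± 0.10 to 0.85 ± 0.06 (paired-test p = 0.013).
Cross-dataset validation on the XJTU-SY bearing dataset achieved an in-band rate of 0.80 and a failure rate of 0.00.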

Abstract

Long-memory degradation processes in rotating machinery often exhibit nonlinear evolution, nonlocal temporal dependence, and hereditary characteristics, which are difficult to fully capture using conventional integer-order models or standard Markovian decision frameworks. To address this issue, this study proposes a hybrid fractional-dynamics and deep reinforcement learning framework for predictive maintenance of memory-dependent degradation systems. First, the Grünwald–Letnikov fractional difference operator is introduced to construct a fractional-memory representation of degradation trajectories, enabling the model to explicitly encode long-range dependence and accumulated historical degradation effects. Then, a bidirectional gated recurrent unit network is employed to learn sequential degradation representations from the fractional-memory state space, while a deep Q-network is designed to optimize maintenance decisions under uncertain degradation evolution. Experimental results on the IEEE PHM 2012 bearing dataset show that the proposed FM-BiGRU-DQN with safety-guided execution achieved a mean maintenance lead time of 33.9 ± 12.6 steps, an in-band rate of 0.85 ± 0.06, a failure rate of 0.00 ± 0.00, and a deployment reliability of 1.00 ± 0.00 over 10 independent random seeds. Compared with NM-BiGRU-DQN, the in-band rate increased from 0.55 ± 0.10 to 0.85 ± 0.06, with a paired-test p-value of 0.013. Cross-dataset validation on the XJTU-SY bearing dataset further achieved an in-band rate of 0.80 and a failure rate of 0.00. These results indicate that embedding fractional-memory dynamics into deep reinforcement learning improves maintenance timing accuracy, policy robustness, and deployment reliability for complex memory-dependent degradation systems.
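To make the fractional-memory idea concrete, the following is a minimal sketch of a Grünwald–Letnikov fractional difference applied to a discrete series with unit step size. It is not the study's implementation: the function names, the optional truncation parameter memory, and the choice to start the sum at the first sample are assumptions made for illustration. The abstract does not state the fractional order, the memory length, or how the resulting features are passed to the BiGRU.

```python
def gl_weights(alpha, n_terms):
    """Return the first n_terms Grünwald–Letnikov weights (-1)^k * C(alpha, k).

    Uses the standard recurrence w_0 = 1, w_k = w_{k-1} * (k - 1 - alpha) / k.
    """
    if n_terms < 1:
        raise ValueError("n_terms must be at least 1")
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 - alpha) / k)
    return weights


def gl_fractional_difference(series, alpha, memory=None):
    """Fractional difference of order alpha for a list of floats (unit step).

    out[t] = sum_{k=0}^{K-1} w_k * series[t - k], where K is limited by the
    number of samples available at time t and by the optional `memory`
    window (a hypothetical truncation parameter, not taken from the study).
    """
    n = len(series)
    if n == 0:
        return []
    m = n if memory is None else max(1, min(memory, n))
    weights = gl_weights(alpha, m)
    out = []
    for t in range(n):
        k_max = min(t + 1, m)  # cannot look back before the first sample
        out.append(sum(weights[k] * series[t - k] for k in range(k_max)))
    return out


if __name__ == "__main__":
    # Illustrative values only; not data from the PHM 2012 or XJTU-SY sets.
    health_indicator = [1.0, 3.0, 6.0, 10.0]
    print(gl_fractional_difference(health_indicator, alpha=0.5))
```

With alpha = 1 the weights reduce to 1, -1, 0, 0, ... and the operator becomes the ordinary first difference. With alpha = 0 it returns the series unchanged. For orders between 0 and 1 the weights decay slowly rather than cutting off, so every earlier sample keeps a small contribution to the current value, which is the long-range dependence the abstract refers to.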
