March 3, 2026 | Open Access

Cloud disaster recovery model based on failure prediction

Key Points

CDR-TSP reduces the recovery time objective (RTO) by 65%–90% compared with manual recovery, shifting cloud disaster recovery from passive to active fault protection.
The model uses an encoder-decoder long short-term memory approach to improve prediction accuracy.
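Experiments on 5 open-source datasets and 2 real-world cloud production datasets show that the model outperforms traditional statistical models and deep learning methods, with the lowest mean squared error (MSE) and mean absolute error (MAE).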
Automated resource orchestration enables rapid recovery during various fault scenarios, ensuring service continuity.

Abstract

Traditional Cloud Disaster Recovery (CDR) adopts a “fail-first, then recover” paradigm, leading to prolonged Recovery Time Objective (RTO) and Recovery Point Objective (RPO) that no longer meet enterprises’ stringent service continuity requirements. To address this gap, this paper proposes a CDR framework based on time series prediction (CDR-TSP) with a core Encoder-Decoder Long Short-Term Memory (ED-LSTM) model. The ED-LSTM model captures nonlinear characteristics and long-term dependencies in time-series monitoring data (e.g., Central Processing Unit utilization, network latency) to enable proactive fault perception. CDR-TSP integrates a primary-standby architecture, a fault perception system, and a programmable workflow engine to realize automated pre-failure protection and rapid resource orchestration. Extensive experiments are conducted on 5 open-source datasets and 2 real-world cloud production datasets. Results show that ED-LSTM outperforms traditional statistical models and deep learning methods in prediction accuracy, achieving the lowest Mean Squared Error (MSE) and Mean Absolute Error (MAE) across all datasets. In disaster recovery tests with four typical fault scenarios (downtime, network disconnection, resource exhaustion, periodic downtime), CDR-TSP reduces RTO by 65%–90% compared with manual recovery, and achieves near-zero RTO for periodic faults. This work shifts CDR from passive to active protection, significantly enhancing the disaster recovery resilience of business systems.
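
As a reading aid, the two accuracy metrics named in the abstract and the idea of acting on a forecast before a fault occurs can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the 90% CPU threshold, and the sample values are all hypothetical, and the forecast is a hard-coded list standing in for the output of a model such as ED-LSTM.

```python
from typing import Optional, Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between two equal-length series."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between two equal-length series."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def first_breach(forecast: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first forecast step at or above the threshold, else None.

    A non-None result is the point at which a proactive system could start
    pre-failure protection instead of waiting for the fault itself.
    """
    for step, value in enumerate(forecast):
        if value >= threshold:
            return step
    return None


# Hypothetical CPU utilization (%) over the next five monitoring intervals.
actual = [62.0, 70.0, 81.0, 93.0, 97.0]
forecast = [60.0, 72.0, 80.0, 91.0, 99.0]  # stand-in for a model's output

print("MSE:", mse(actual, forecast))
print("MAE:", mae(actual, forecast))
print("first step at or above 90%:", first_breach(forecast, threshold=90.0))
```

In this toy setting the forecast crosses the assumed threshold at step 3, so a standby switchover could be scheduled three intervals ahead of the predicted resource exhaustion. Lower MSE and MAE matter for the same reason: the more accurate the forecast, the more reliable such an early trigger becomes.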
