Cloud computing benefits IT infrastructure by delivering customizable resources on demand. Operating cloud systems reliably remains difficult, however, because dynamic workload changes, unpredictable failures, and the complexity of distributed architectures interact. Monitoring based on static thresholds and rule-based alerts is reactive: it reports disruptions after they occur rather than preventing them. This research investigates how AI enables predictive maintenance for cloud systems by combining AWS CloudWatch with machine learning algorithms for failure prediction and anomaly detection. It introduces a framework in which supervised and unsupervised ML models process AWS CloudWatch metrics and logs through Amazon SageMaker and AI analytics to deliver real-time monitoring and proactive fault prevention. The research shows that AI-enabled predictive maintenance reduces both Mean Time to Detect (MTTD) and Mean Time to Repair (MTTR), improving resource utilization and decreasing service interruptions. Composite AI, improved IoT integration, and explainable AI are emerging as potential ways to address data quality, scalability, and security concerns in AI-based monitoring. Future work should prioritize improved computational precision and security to advance predictive maintenance for cloud services.
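The contrast the abstract draws between static thresholds and anomaly detection can be illustrated with a minimal sketch. This is not the paper's framework: it swaps in a simple rolling z-score detector (a basic unsupervised technique) and runs on synthetic CPU-utilization values rather than real CloudWatch data. The function names, window size, and thresholds are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def static_threshold_alerts(values, threshold):
    """Return indices where the metric exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def rolling_zscore_alerts(values, window=10, z_limit=3.0):
    """Return indices whose value deviates from the mean of the
    preceding `window` points by more than `z_limit` sample
    standard deviations."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against;
            # this sketch simply skips such points.
            continue
        if abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


def sample_cpu_series():
    """Synthetic CPU utilization (%): a stable baseline near 41%,
    then a sudden shift to about 65% starting at index 20."""
    return [40.0, 42.0] * 10 + [65.0, 66.0, 67.0]


if __name__ == "__main__":
    series = sample_cpu_series()
    print("static threshold (80%):", static_threshold_alerts(series, 80.0))
    print("rolling z-score:", rolling_zscore_alerts(series))
```

On this synthetic series the fixed 80% threshold never fires, because the shift to about 65% stays below it, while the rolling detector flags the jump at index 20 as a departure from recent behavior. A production system along the lines the abstract describes would presumably read metrics from CloudWatch and use trained models instead of this hand-rolled statistic.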
Naga Surya Teja Thallam
Salesforce (United States)
Naga Surya Teja Thallam (Wed,) studied this question.
synapsesocial.com/papers/6a030f804f17ebd4386522e8 — DOI: https://doi.org/10.63282/3050-9262.ijaidsml-v6i1p107