To address the challenge of correlating operation and maintenance (O&M) events as cloud platforms grow in scale and evolve dynamically, this study proposes an intelligent framework that integrates a knowledge graph (KG) with automated reasoning. A dynamic KG with time-series attributes and an incremental update mechanism provides semantic unification and real-time fusion of multi-source heterogeneous data such as logs, metrics, and topologies. A hybrid engine combines symbolic and numerical reasoning: the former provides interpretable causal tracing, while the latter mines hidden association patterns. A weighted hybrid decision mechanism dynamically balances the results of the two reasoning modes to meet the requirement of responding within seconds. The framework supports two core applications, root cause localization and impact scope analysis, and outputs interpretable decisions together with their reasoning paths. Experiments were conducted in a Kubernetes-based simulation environment. On a test set covering 5 typical fault scenarios and 23,000 data records, the average root cause localization accuracy reaches 95.0%, the F1-score is significantly better than that of the baseline method, and the response time remains flat as the event scale increases, which verifies the framework's effectiveness and efficiency in complex cloud environments.
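The weighted hybrid decision mechanism can be illustrated with a minimal sketch. The abstract does not specify how the weights are computed, so everything below is an assumption made for illustration: the names `Candidate`, `choose_alpha`, `fuse`, and `rank_root_causes`, the use of a rule-coverage ratio to set the weight, and the bounds 0.2 and 0.8 are all hypothetical and are not taken from the framework itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    """A hypothetical root-cause candidate scored by both reasoners."""
    name: str
    symbolic_score: float   # assumed confidence from rule/path reasoning, in [0, 1]
    numerical_score: float  # assumed confidence from a learned model, in [0, 1]
    path: List[str] = field(default_factory=list)  # reasoning path for explanation


def choose_alpha(rule_coverage: float, lo: float = 0.2, hi: float = 0.8) -> float:
    """Map rule coverage in [0, 1] to the weight given to symbolic reasoning.

    Assumption: the more of the observed events the rules explain, the more
    the symbolic result is trusted. The bounds lo and hi are illustrative.
    """
    clipped = min(max(rule_coverage, 0.0), 1.0)
    return lo + (hi - lo) * clipped


def fuse(candidate: Candidate, alpha: float) -> float:
    """Convex combination of the symbolic and numerical scores."""
    return alpha * candidate.symbolic_score + (1.0 - alpha) * candidate.numerical_score


def rank_root_causes(candidates: List[Candidate], rule_coverage: float) -> List[Candidate]:
    """Rank candidates by fused score, highest first."""
    alpha = choose_alpha(rule_coverage)
    return sorted(candidates, key=lambda c: fuse(c, alpha), reverse=True)


if __name__ == "__main__":
    candidates = [
        Candidate("node-disk-pressure", 0.9, 0.2, ["pod-a", "node-1"]),
        Candidate("service-latency-spike", 0.3, 0.8, ["pod-b", "svc-x"]),
    ]
    for coverage in (1.0, 0.0):
        top = rank_root_causes(candidates, coverage)[0]
        print(coverage, top.name, " -> ".join(top.path))
```

In this sketch a single weight shifts the ranking toward the symbolic result when rules explain most of the observed events and toward the numerical result otherwise, while the stored path keeps each decision explainable. Other weighting schemes, such as a learned weight, would fit the same description equally well.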
Yu et al. (Sun,) studied this question.