Abstract

Microservices Architecture (MSA) has emerged as a dominant paradigm for building scalable and flexible distributed systems. By decomposing applications into loosely coupled services, MSA improves modularity and deployment agility, but also introduces significant challenges for system monitoring, anomaly detection, and root-cause identification. These challenges stem from the high degree of dynamic behaviour, service heterogeneity, and inter-service dependencies. This survey provides a comprehensive and systematic review of state-of-the-art approaches for anomaly detection and root-cause analysis in microservice-based systems, covering the period from 2012 to 2025. It examines 117 selected studies, categorized according to data collection methods, detection algorithms, diagnostic models, and evaluation metrics. Emphasis is placed on the role of machine learning, statistical inference, and trace-based methods in detecting and localizing faults. Furthermore, the survey highlights the limitations of existing techniques in terms of scalability, explainability, and real-time applicability. By consolidating scattered research into a unified analytical framework, this work contributes a detailed taxonomy of methods and a critical discussion of their effectiveness in container-based virtualized environments. The survey provides a representative and timely overview of the field, helping to map current progress and highlight remaining gaps. Finally, it outlines promising directions such as hybrid detection architectures, automated trace reasoning, and context-aware root-cause localization, offering guidance for researchers and practitioners seeking to enhance reliability, observability, and operational resilience in microservice systems.
Barata et al. (Mon,) studied this question.