What question did this study set out to answer?

The study aims to improve fault diagnosis in large language model (LLM) applications running in production, where two challenges defeat existing methods: unstable response times and silent faults that degrade response quality.

April 3, 2026

LLMRCA: Multilevel Root Cause Analysis for LLM Applications Using Multimodal Observability Data

Key points

Targets fault diagnosis in production LLM applications, where unstable response times and silent response-quality faults defeat existing RCA methods.
Proposes LLMRCA, an unsupervised multilevel RCA framework.
Constructs a heterogeneous causal graph from metrics, logs, and traces.
Uses a Residual Graph Attention autoencoder to assess reconstruction errors.
Implements request classification to manage unstable response times and verification to minimize false positives.
LLMRCA outperforms eight baseline methods in fault diagnosis.
Achieves results up to 5.1 times more accurate than the baselines on performance problems.
Obtains 92.86% top-1 ranking accuracy for diagnosing response quality problems.
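
For readers unfamiliar with the metric in the last point, top-k ranking accuracy is commonly taken to mean the fraction of fault cases in which the true root cause appears among the first k ranked candidates. The sketch below illustrates that common definition only; the summary does not spell out the paper's exact evaluation protocol, and the component names and rankings are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    rankings: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of fault cases whose true root cause is within the first k candidates."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not rankings:
        raise ValueError("at least one fault case is required")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(rankings)


# Hypothetical ranked candidate lists, one per injected fault (most suspicious first).
cases = [
    ["vector-db", "llm-server", "gateway"],
    ["llm-server", "vector-db", "gateway"],
    ["gateway", "llm-server", "vector-db"],
]
# Hypothetical ground-truth root causes for the same three faults.
truths = ["vector-db", "llm-server", "llm-server"]

top1 = top_k_accuracy(cases, truths, k=1)  # two of three cases hit at rank 1
top2 = top_k_accuracy(cases, truths, k=2)  # all three cases hit within rank 2
```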

Abstract

Ensuring the reliability of large language model (LLM) applications in production environments is critical as LLMs are widely integrated into software systems. However, fault diagnosis in these applications is challenging due to distributed deployment and complex component interactions. Existing root cause analysis (RCA) methods fall short when applied to LLM applications because they ignore two key challenges: unstable response times and response quality-related silent faults. To address these challenges, we propose LLMRCA, the first multilevel and unsupervised RCA framework that leverages multimodal observability data to identify root causes in LLM applications. LLMRCA constructs a heterogeneous causal graph from metrics, logs, and traces, and employs a Residual Graph Attention autoencoder to calculate reconstruction errors for RCA. LLMRCA also introduces request classification to handle unstable response times and verification to reduce false positives. Experiments on a retrieval-augmented generation LLM application show that LLMRCA outperforms eight baseline methods, achieving up to 5.1× more accurate results than baselines for performance problems and 92.86% top-1 ranking accuracy for response-quality problems. Furthermore, the ablation studies confirm the contributions of each data modality and different framework modules to effectiveness. Generally, our framework provides a practical approach for diagnosing both performance and quality faults in LLM applications.
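
The idea of ranking root-cause candidates by reconstruction error can be illustrated with a much simpler model than the one the abstract describes. The sketch below is not LLMRCA: it swaps the Residual Graph Attention autoencoder for a rank-1 linear autoencoder (the first principal component, found by power iteration), ignores the causal graph entirely, and assumes the model is fitted on data from normal operation. The component names and the two-feature vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def fit_linear_autoencoder(samples: List[Vector], iters: int = 200) -> Tuple[Vector, Vector]:
    """Fit a rank-1 linear autoencoder on normal data; returns (mean, unit direction)."""
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in samples]
    w = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # Power iteration on the scatter matrix: w <- X^T (X w), then normalize.
        proj = [sum(c[i] * w[i] for i in range(dim)) for c in centered]
        new_w = [sum(p * c[i] for p, c in zip(proj, centered)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in new_w))
        if norm == 0.0:
            break
        w = [v / norm for v in new_w]
    return mean, w


def reconstruction_error(x: Vector, mean: Vector, w: Vector) -> float:
    """Squared distance between x and its encode-then-decode reconstruction."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    z = sum(c * wi for c, wi in zip(centered, w))  # encode to one latent value
    return sum((c - z * wi) ** 2 for c, wi in zip(centered, w))  # decode and compare


def rank_by_error(observations: Dict[str, Vector], mean: Vector, w: Vector) -> List[Tuple[str, float]]:
    """Order components from most to least anomalous."""
    scored = [(name, reconstruction_error(x, mean, w)) for name, x in observations.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical normal-period features, e.g. (scaled latency, scaled CPU), which move together.
normal = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]]
mean, direction = fit_linear_autoencoder(normal)

# Hypothetical per-component features observed during an incident.
incident = {
    "gateway": [2.5, 2.5],
    "llm-server": [3.0, 3.1],
    "vector-db": [2.0, 6.0],  # breaks the learned latency/CPU relationship
}
ranking = rank_by_error(incident, mean, direction)
```

Here `vector-db` ranks first because its features deviate most from the pattern learned on normal data. A graph-based autoencoder would additionally use the relationships between components when reconstructing each node, which this linear stand-in does not attempt.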
