Multivariate time-series anomaly detection is often evaluated with point-adjusted metrics, which can overstate practical performance when alarms are judged at the event level. Explanation results are also frequently reported as descriptive attributions without directly testing whether selected variables are useful for diagnosis. This study revisits these issues through unified event-level evaluation and repair-based explanation, using DCdetector as the main case study rather than proposing a new detector architecture. Experiments on SMAP, MSL, and HAI 21.03 use full-coverage score export and standard event-level control metrics. The results show that point-adjusted scores can be much higher than stricter event-level measurements. Event-aware refinement changes the detection trade-off by improving event recovery and reducing delay in several settings, but its effect is dataset- and calibration-dependent. For explanation, variables are ranked by exact marginal counterfactual repair effect and evaluated by whether repair reduces anomaly scores more than random or heuristic alternatives. The results provide quantitative evidence that the ranked variables are diagnostically informative, while exact marginal verification is computationally expensive and better suited to offline alarm review and post hoc diagnosis than latency-critical deployment. Auxiliary checks with TranAD, Anomaly-Transformer, and DADA support the plausibility of the main observations, but the evidence remains detector-conditioned rather than a fully backbone-agnostic benchmark. Overall, this work provides a stricter and more verifiable protocol for evaluating anomaly detection, event-aware refinement, and explanation quality in multivariate time-series monitoring.
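The gap between point-adjusted and event-level scoring can be illustrated with a toy example. The sketch below is not the evaluation code used in this study. It assumes the common point-adjustment convention (a labelled anomaly segment counts as fully detected if any point inside it is flagged) and uses event recall and first-hit delay as simple stand-ins for event-level measurements. All function names and the toy data are hypothetical.

```python
def segments(labels):
    """Return (start, end) pairs, end exclusive, for each run of 1s in labels."""
    events, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(labels)))
    return events


def point_adjust(labels, preds):
    """Assumed convention: one hit inside a true segment marks the whole segment."""
    adjusted = list(preds)
    for s, e in segments(labels):
        if any(preds[s:e]):
            adjusted[s:e] = [1] * (e - s)
    return adjusted


def point_f1(labels, preds):
    """Plain point-wise F1 over binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def event_recall(labels, preds):
    """Fraction of true events containing at least one flagged point."""
    events = segments(labels)
    hits = sum(1 for s, e in events if any(preds[s:e]))
    return hits / len(events) if events else 0.0


def detection_delays(labels, preds):
    """Steps from event onset to first flagged point, for detected events only."""
    delays = []
    for s, e in segments(labels):
        hit = next((i for i in range(s, e) if preds[i]), None)
        if hit is not None:
            delays.append(hit - s)
    return delays


# Toy data: two true events; a single alarm fires at the end of the first one.
labels = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
preds = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

raw_f1 = point_f1(labels, preds)
pa_f1 = point_f1(labels, point_adjust(labels, preds))
recall = event_recall(labels, preds)
delays = detection_delays(labels, preds)
```

On this toy input, one late alarm lifts the point-wise F1 substantially after adjustment, while the event-level view still records that half of the events were missed and that the detected event was caught only at its final step.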
Zhong et al. (Sat,) studied this question.