TSADmetrics is an open-source Python library that provides a unified framework for evaluating time series anomaly detection models. It integrates 34 metrics organized under a systematic taxonomy proposed in this work, capturing diverse evaluation dimensions including point-wise accuracy, temporal coverage, alignment, tolerance to shifts, and delay penalties. The library simplifies benchmarking through automated evaluation pipelines, configuration-driven workflows, and structured report generation. Its modular and extensible architecture allows researchers to incorporate new metrics and adapt the framework to specific experimental needs. By consolidating heterogeneous evaluation approaches under a common taxonomy and implementation, TSADmetrics promotes transparency, comparability, and scalability in anomaly detection research. As an additional contribution, this work includes a comprehensive empirical evaluation comprising four complementary studies: general behavioural analysis, correlation and redundancy, latency-oriented trade-offs, and computational complexity. These studies complement the formal taxonomy and implementation of TSADmetrics by demonstrating the usefulness of the framework and supporting practical metric selection and interpretation.
Velasco et al. (Mon,) studied this question.