February 2, 2026 · Open Access

An End-to-End Benchmark Suite for Time-Series Forecasting

Key Points

This work aims to provide a standardized benchmarking framework for time series forecasting models across different domains.
Developed the TSB framework for fair and reproducible evaluations.
Covered diverse model families, including transformers, deep learning, foundation, statistical, and machine learning models.
Included diverse datasets from multiple domains.
Relied primarily on core Python libraries to minimize external dependencies.
Established a comprehensive evaluation methodology for time series forecasting.
Demonstrated the feasibility of reproducible benchmarking through TSB.
Compared model performance across datasets using critical difference diagrams.
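The summary does not describe TSB's programming interface. The following is therefore only a minimal sketch of what "fair and reproducible" evaluation usually means in practice. Every model receives the same training prefix and is scored on the same holdout with the same metric, using only the Python standard library. All names (`benchmark`, `naive`, `mean_model`, `mae`) and the choice of mean absolute error are hypothetical illustrations, not TSB's actual API.

```python
"""Hypothetical sketch of a fair benchmarking loop (not TSB's actual API).

Every model sees the same training prefix and is scored on the same
holdout with the same metric, using only the Python standard library.
"""
from statistics import fmean
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: (history, horizon) -> list of forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def naive(history: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last observed value (hypothetical baseline)."""
    return [history[-1]] * horizon


def mean_model(history: Sequence[float], horizon: int) -> List[float]:
    """Forecast the mean of the history (hypothetical baseline)."""
    return [fmean(history)] * horizon


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


def benchmark(
    datasets: Dict[str, Sequence[float]],
    models: Dict[str, Forecaster],
    horizon: int,
) -> Dict[str, Dict[str, float]]:
    """Return scores[dataset][model] using one fixed train/test split per dataset."""
    scores: Dict[str, Dict[str, float]] = {}
    for ds_name, series in datasets.items():
        # The same split is reused for every model on this dataset.
        train, test = series[:-horizon], series[-horizon:]
        scores[ds_name] = {
            m_name: mae(test, model(train, horizon))
            for m_name, model in models.items()
        }
    return scores


if __name__ == "__main__":
    toy = {"toy_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "toy_b": [5.0] * 6}
    print(benchmark(toy, {"naive": naive, "mean": mean_model}, horizon=2))
    # {'toy_a': {'naive': 1.5, 'mean': 3.0}, 'toy_b': {'naive': 0.0, 'mean': 0.0}}
```

The split and metric are fixed inside `benchmark`, so adding a model cannot change how the others are scored.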

Abstract

Time series forecasting has become increasingly critical across domains such as traffic, healthcare, economics, and energy. Despite two decades of worldwide digital transformation, the emergence of new methodologies, and a growing volume of historical data that can support testing, the field suffers from inconsistent evaluation practices, biased or poor benchmarking processes, and a general lack of standardization across the forecasting workflow. This thesis aims to provide a comprehensive and standardized benchmarking framework for time series forecasting. The TSB framework is designed to be fair, transparent, unbiased, and easily reproducible, and it supports a wide range of models and datasets. The models come from different categories: transformers, deep learning models, foundation models, and older statistical and machine learning models that tend to be neglected. For datasets, we considered it crucial to include a variety of domains so that the framework is viable and complete. All of these logical components (the models, the datasets and their parsers, the evaluation methodology, and the overall benchmarking pipeline) are bundled in a single project, TSB. The TSB framework also takes sustainability seriously: it uses minimal external dependencies, relying primarily on established or core Python libraries. Finally, the benchmarking results are displayed as critical difference diagrams, so that the research community can compare model performance across datasets and reproduce the results.
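Critical difference diagrams are typically built from each model's average rank across datasets. They also use the Nemenyi critical difference, CD = q_α × sqrt(k(k+1) / (6N)), for k models compared on N datasets. Two models whose average ranks differ by less than CD are not considered significantly different. The abstract does not state which statistical test TSB applies. The sketch below therefore assumes this common Nemenyi setup at α = 0.05 and omits the preceding Friedman test. The function names and toy scores are illustrative only.

```python
"""Sketch: average ranks and Nemenyi critical difference (CD).

Assumes the common Nemenyi post-hoc procedure behind critical difference
diagrams; the source does not state which test TSB uses.
"""
import math
from typing import Dict

# Critical values q_alpha for alpha = 0.05, keyed by number of models k
# (studentized range statistic divided by sqrt(2)).
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(errors: Dict[str, float]) -> Dict[str, float]:
    """Rank models on one dataset (1 = lowest error); ties share the mean rank."""
    ordered = sorted(errors, key=errors.get)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over all models tied with ordered[i].
        while j + 1 < len(ordered) and errors[ordered[j + 1]] == errors[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # mean of 1-based positions i..j
        for name in ordered[i : j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def average_ranks(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each model's rank over all datasets (scores[dataset][model])."""
    per_dataset = [rank_row(row) for row in scores.values()]
    models = per_dataset[0].keys()
    return {m: sum(r[m] for r in per_dataset) / len(per_dataset) for m in models}


def critical_difference(k: int, n: int) -> float:
    """Nemenyi CD for k models compared on n datasets at alpha = 0.05."""
    return Q_005[k] * math.sqrt(k * (k + 1) / (6 * n))


if __name__ == "__main__":
    # Toy error scores[dataset][model]; lower is better.
    scores = {
        "d1": {"A": 0.10, "B": 0.20, "C": 0.30},
        "d2": {"A": 0.15, "B": 0.15, "C": 0.40},
        "d3": {"A": 0.20, "B": 0.10, "C": 0.30},
        "d4": {"A": 0.05, "B": 0.25, "C": 0.35},
    }
    print(average_ranks(scores))                # {'A': 1.375, 'B': 1.625, 'C': 3.0}
    print(round(critical_difference(3, 4), 3))  # 1.657
```

In this toy example, A and C differ by 1.625 in average rank, which is just below CD ≈ 1.657. With only four datasets, even the best and worst models would not be separated. This illustrates why a benchmark benefits from many datasets.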
