This paper provides a detailed breakdown of a minimalist, fundamental Transformer-based architecture for forecasting univariate time series. It describes each processing step in detail, from input embedding and positional encoding to self-attention mechanisms and output projection. All of these steps are specifically tailored to sequential temporal data. By isolating and analyzing the role of each component, this paper demonstrates how Transformers capture long-term dependencies in time series. A simplified, interpretable Transformer model named ’minimalist Transformer’ is implemented and showcased using a simple example. It is then validated using the M3 forecasting competition benchmark, which is based on real-world data, and a number of data series generated by IoT sensors. The aim of this work is to serve as a practical guide and foundation for future Transformer-based forecasting innovations, providing a solid baseline that is simple to achieve but exhibits a stable forecasting ability not far behind that of state-of-the-art specialized designs.
Garagnani et al. (Tue,) studied this question.