This paper presents a unified and systematic study of compact Transformer architectures for time series forecasting. We introduce a modular framework that standardizes three widely used Transformer families (Autoformer, Informer, and PatchTST) into three principled architectural variants, Minimal, Standard, and Full, enabling controlled analysis of model capacity, inductive bias, and computational complexity. For each family, we provide consistent mathematical formulations, layer-wise descriptions, and end-to-end complexity characterizations. We conduct over 1500 controlled experiments on ten synthetic time series under varying patch lengths, forecast horizons, and noise levels. The results reveal clear and reproducible performance regimes: PatchTST Standard achieves the best overall accuracy and noise robustness, Autoformer variants excel on smooth and trend-dominated signals, and Informer variants are sensitive to noise and long horizons despite their improved scalability. Complementing the empirical analysis, we derive new theoretical results that quantify noise attenuation, bias–variance trade-offs, and approximation–complexity guarantees specific to each architectural family. Finally, we demonstrate that these compact Transformer variants serve as effective and interpretable temporal encoders within an operator-theoretic forecasting framework. By embedding Autoformer, Informer, and PatchTST backbones into a Koopman-based latent dynamics model, we extend their applicability beyond synthetic benchmarks to real-world climate, cryptocurrency, and electricity generation time series. Together, these results position compact, modular Transformers as scalable and theoretically grounded building blocks for scientific time series forecasting.

• Unified framework for Autoformer, Informer, and PatchTST variants.
• Theoretical bounds on noise reduction and attention complexity.
• Over 1500 experiments reveal robustness trade-offs under noise.
• PatchTST Standard excels in both clean and noisy regimes.
• Lightweight modular Transformers for scientific forecasting.
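To make the link between patch length and attention cost concrete, here is a minimal NumPy sketch of PatchTST-style patching. It was written for illustration under stated assumptions and is not the paper's code. The helper names `make_patches` and `attention_scores` and the settings `L = 512`, `P = 16`, `S = 8` are hypothetical. Here, a length-`L` series cut into patches of length `P` with stride `S` yields `(L - P) // S + 1` tokens. The original PatchTST also pads the end of the series, which adds one more patch; that step is omitted. Full self-attention scales quadratically in the number of tokens, so patching shrinks the score matrix substantially.

```python
import numpy as np


def make_patches(x: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a 1-D series into (possibly overlapping) patches, PatchTST-style.

    Hypothetical helper for illustration; no end padding is applied.
    Returns an array of shape (num_patches, patch_len).
    """
    if x.ndim != 1:
        raise ValueError("expected a 1-D series")
    if patch_len > len(x):
        raise ValueError("patch_len exceeds series length")
    num_patches = (len(x) - patch_len) // stride + 1
    starts = np.arange(num_patches) * stride
    # Broadcasting builds a (num_patches, patch_len) grid of indices.
    idx = starts[:, None] + np.arange(patch_len)[None, :]
    return x[idx]


def attention_scores(num_tokens: int) -> int:
    """Number of entries in one full (tokens x tokens) attention score matrix."""
    return num_tokens * num_tokens


# Hypothetical settings: series length L, patch length P, stride S.
L, P, S = 512, 16, 8
x = np.sin(np.linspace(0, 8 * np.pi, L))
patches = make_patches(x, P, S)
print(patches.shape)  # (63, 16)
print(attention_scores(L), attention_scores(len(patches)))  # 262144 3969
```

With these settings, attention over 63 patch tokens needs 3,969 scores instead of 262,144 for point-wise tokens. The count covers only the score matrix of a single head. It ignores Informer's ProbSparse attention and Autoformer's auto-correlation mechanism, which reduce cost in different ways.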
Forootani et al. studied this question.