Abstract

Large pre-trained models have demonstrated remarkable capabilities across domains, but their effectiveness in time series forecasting relative to smaller, more efficient models remains underexplored. This work empirically examines whether pre-trained large-scale time series models (LSTSMs) trained on diverse datasets can outperform traditional non-pretrained small-scale transformers in forecasting tasks. To isolate the direct impact of transfer learning on forecasting performance, we also compare large models trained from scratch against their pre-trained counterparts. We analyze state-of-the-art (SOTA) pre-trained universal time series models (e.g., Moirai, GPT4TS, Timer, CALF, LLM4TS) alongside conventional small-scale transformers, evaluating accuracy and computational efficiency across multiple benchmarks. We further conduct an extensive ablation study over fine-tuning data sizes (10%, 25%, and 75%) to assess few-shot, moderate, and near-full-data adaptation. We examine the explainability of large time series models through comprehensiveness scores computed with feature ablation, occlusion, integrated gradients, and GradientSHAP attributions. We also examine the interpretability of pre-training and fine-tuning strategies using spectral metrics from WeightWatcher, which quantify layer-wise generalization and representation quality. Theoretical and quantitative computational complexity analyses, covering parameter counts, training time, model sizes, and inference latency, highlight the trade-offs between predictive performance and resource efficiency. Our findings reveal the strengths and limitations of pre-trained large-scale models and provide insights into their suitability for time series tasks compared with task-specific small-scale architectures. The results identify scenarios where pre-training offers advantages and scenarios where simpler models remain competitive.
Biswas et al. studied this question.
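
The abstract names comprehensiveness as the explainability measure but does not define it. As a minimal sketch, assuming the common definition (how much the model output changes once the highest-attributed inputs are removed), the idea can be illustrated with NumPy alone. The linear `predict` function, the zero baseline, and the helper names below are hypothetical stand-ins for illustration, not the authors' models or code, and only the occlusion attribution from the abstract's list is shown.

```python
import numpy as np

def occlusion_attribution(predict, x, baseline=0.0):
    """Score each input time step by the mean absolute forecast change
    when that single step is replaced by `baseline`."""
    ref = predict(x)
    scores = np.empty(len(x))
    for t in range(len(x)):
        x_occ = x.copy()
        x_occ[t] = baseline
        scores[t] = np.abs(predict(x_occ) - ref).mean()
    return scores

def comprehensiveness(predict, x, scores, k, baseline=0.0):
    """Mean absolute forecast change after masking the k highest-scored steps.
    (Assumed definition; the paper's exact formulation may differ.)"""
    top_k = np.argsort(scores)[::-1][:k]
    x_masked = x.copy()
    x_masked[top_k] = baseline
    return float(np.abs(predict(x) - predict(x_masked)).mean())

# Toy stand-in for a forecaster: a fixed linear map from 8 past steps
# to 2 future steps. A real study would call the trained model here.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))

def predict(x):
    return W @ x

x = rng.normal(size=8)
scores = occlusion_attribution(predict, x)
comp_top3 = comprehensiveness(predict, x, scores, k=3)
```

A larger score suggests that the attribution method has identified inputs the forecast really depends on. In practice one would typically compare it against the score obtained when masking `k` randomly chosen steps.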