Transformer-based methods have proven effective at capturing complex patterns in time series forecasting. However, their computational cost grows quadratically with input sequence length, which limits scalability and real-world deployment. The Mamba architecture has recently emerged as a compelling alternative that avoids much of this computational overhead and latency. Nevertheless, a vanilla Mamba backbone often struggles to characterize intricate temporal dynamics, particularly long-term trend shifts and non-stationary behavior. To bridge the gap between Mamba’s global scanning and local dependency modeling, we propose C-T-Mamba, a hybrid framework that integrates a Mamba block, channel attention, and a temporal convolution block. Specifically, the Mamba block captures long-range temporal dependencies with linear scaling, the channel attention mechanism filters redundant information, and the temporal convolution block extracts multi-scale local and global features. Extensive experiments on five public benchmarks show that C-T-Mamba consistently outperforms state-of-the-art (SOTA) baselines (e.g., PatchTST and iTransformer), achieving average reductions of 4.3–18.5% in MSE and 3.9–16.2% in MAE compared to representative Transformer-based and CNN-based models. Inference scaling analysis shows that C-T-Mamba also relieves the computational bottleneck: at a horizon of 1536, it achieves an 8.8× reduction in GPU memory and over 10× speedup compared to standard Transformers, and at 2048 steps its latency remains as low as 8.9 ms, consistent with linear scaling. These results indicate that C-T-Mamba achieves SOTA accuracy while maintaining a small computational footprint, making it well suited to long-term multivariate time series forecasting.
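To make the three components named above concrete, the following is a minimal pure-Python sketch, not the C-T-Mamba implementation. Everything in it is an illustrative assumption: the function names (`ssm_scan`, `channel_attention`, `temporal_conv`, `hybrid_block`), the fixed coefficients, the order in which the stages are composed, and the use of two dilation rates to stand in for multi-scale extraction. In particular, `ssm_scan` uses a fixed-coefficient linear recurrence as a stand-in for Mamba's input-dependent (selective) state-space update; it shares only the property that one pass over the sequence costs time linear in its length.

```python
import math


def ssm_scan(x, a=0.9, b=0.1):
    """Linear-time recurrence h_t = a * h_{t-1} + b * x_t, per channel.

    Assumption: fixed scalars a and b replace Mamba's learned,
    input-dependent parameters. x is a list of T rows with C floats each.
    """
    h = [0.0] * len(x[0])
    out = []
    for row in x:  # a single pass over time: O(T * C)
        h = [a * h_c + b * x_c for h_c, x_c in zip(h, row)]
        out.append(h)
    return out


def channel_attention(x):
    """Squeeze-and-excitation-style gate (assumed form of channel attention).

    Each channel is averaged over time, passed through a sigmoid,
    and the resulting weight rescales that channel.
    """
    n_steps, n_channels = len(x), len(x[0])
    means = [sum(row[c] for row in x) / n_steps for c in range(n_channels)]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [[row[c] * gates[c] for c in range(n_channels)] for row in x]


def temporal_conv(x, kernel=(0.5, 0.3, 0.2), dilation=1):
    """Causal dilated 1-D convolution per channel, zero-padded on the left."""
    n_steps, n_channels = len(x), len(x[0])
    out = []
    for t in range(n_steps):
        row = []
        for c in range(n_channels):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = t - k * dilation  # look back k * dilation steps
                if idx >= 0:
                    acc += w * x[idx][c]
            row.append(acc)
        out.append(row)
    return out


def hybrid_block(x):
    """Assumed composition: scan, then channel gate, then two conv scales."""
    y = channel_attention(ssm_scan(x))
    local = temporal_conv(y, dilation=1)  # short receptive field
    wide = temporal_conv(y, dilation=2)   # wider receptive field
    return [[l + w for l, w in zip(lr, wr)] for lr, wr in zip(local, wide)]
```

Calling `hybrid_block` on a sequence of `T` rows with `C` channels returns a sequence of the same shape. Each stage visits every time step a constant number of times, so the cost of the whole block grows linearly with `T`, in contrast to the quadratic growth of pairwise self-attention.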
Liu et al. studied this question.