Long-term time series forecasting requires modeling variation across multiple time scales together with persistent temporal dependence. Existing frequency decomposition methods partially address multi-scale modeling but rely on fixed frequency splits, simplistic feature fusion, and limited cross-scale interaction. Fixed splits can also shift the predictive distribution away from the input scale, which degrades consistency over long horizons. We describe AdaptiveFrequencyNet (AFNet), an architecture that combines learnable multi-scale filter banks with cross-scale attention fusion. Within AFNet, WAD applies depthwise convolutions whose branches are mixed by learnable softmax weights; it is a learnable time-domain filter bank, not a discrete wavelet transform (DWT) or a hand-crafted Fourier subband split. AMSFF couples self-attention, multi-scale convolutions, and gating. CSAF uses cross-attention to mix information progressively across scales. On six standard benchmarks under a fixed multivariate protocol with a shared lookback window, AFNet is compared with nine strong baselines, including the recent Transformer-style models iTransformer and Pathformer, all evaluated with the same procedure. AFNet attains competitive test errors with a moderate parameter budget (Table 4). Outcomes depend on the dataset and horizon. Among the tabulated dataset averages, iTransformer reports the lowest MSE on Electricity (0.170), with AFNet at 0.174; on ETTh1, AFNet's average MSE (0.443) is close to Pathformer's (0.439); and on Weather, both Pathformer and FreTS average lower MSE than AFNet in the same table. These numbers should be interpreted only under this protocol. AMSFF and CSAF use standard multi-head attention, so each attention block costs $\mathcal{O}(T^2)$ in the processed sequence length $T$ and is not linear in $T$. Ablation experiments under the same protocol support the contribution of each module.
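The WAD description above is compact, so the following Python/NumPy sketch shows one plausible reading of it, offered as an illustration under stated assumptions rather than as the authors' implementation. Each branch is a depthwise 1-D convolution with its own kernel length, so different branches respond to different time scales, and a softmax over per-channel logits turns the branch outputs into a convex mixture. The function names `softmax` and `learnable_filter_bank`, the (C, K) shape of the mixing logits, the kernel lengths 3, 7, and 25, and the "same" padding are all hypothetical choices.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def learnable_filter_bank(x, kernels, mix_logits):
    """Hypothetical WAD-style layer (assumption, not the paper's code).

    x:          (C, T) multivariate input, one row per channel.
    kernels:    list of K arrays; kernels[b] has shape (C, k_b), i.e. one
                1-D kernel per channel (depthwise: no cross-channel mixing).
    mix_logits: (C, K) learnable logits; a softmax over the K branches yields
                per-channel mixing weights that are non-negative and sum to 1.
    Returns (branches, mixed) with shapes (K, C, T) and (C, T).
    """
    C, T = x.shape
    K = len(kernels)
    if mix_logits.shape != (C, K):
        raise ValueError("mix_logits must have shape (C, K)")
    branches = np.empty((K, C, T))
    for b, w in enumerate(kernels):
        if w.shape[0] != C or w.shape[1] > T:
            raise ValueError("need one kernel per channel, no longer than T")
        for c in range(C):
            # mode="same" keeps the output length at T. np.convolve flips the
            # kernel (true convolution); with learned weights this only
            # reparameterizes the filter.
            branches[b, c] = np.convolve(x[c], w[c], mode="same")
    alpha = softmax(mix_logits, axis=-1)               # (C, K) mixing weights
    mixed = np.einsum("ck,kct->ct", alpha, branches)   # per-channel weighted sum
    return branches, mixed


# Usage: three branches with growing receptive fields (short to long scale).
rng = np.random.default_rng(0)
C, T = 3, 96
x = rng.standard_normal((C, T))
kernels = [rng.standard_normal((C, k)) / k for k in (3, 7, 25)]
mix_logits = np.zeros((C, len(kernels)))  # zero logits -> uniform mixing
branches, mixed = learnable_filter_bank(x, kernels, mix_logits)
print(branches.shape, mixed.shape)  # (3, 3, 96) (3, 96)
```

Because the softmax weights are non-negative and sum to one per channel, zero logits reduce the layer to a plain average of the branches, which is a convenient starting point for training. In a trainable model the kernels and logits would be framework parameters; for example, a `torch.nn.Conv1d` with `groups` equal to the number of channels implements a depthwise convolution.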
Cui et al. studied this question.