April 22, 2026 · Open Access

Aligning Textual Information with Time Series via Cross-Modal Attention for Time Series Forecasting

Key Points

The study aims to improve time series forecasting by integrating textual information through a novel cross-modal attention mechanism.
It introduces a multimodal framework called Text-Time Cross-Modal Attention (TTCA).
TTCA uses a cross-attention mechanism with time series features as queries and text features as keys and values.
TTCA is evaluated on the Time-MMD dataset across nine real-world domains.
TTCA outperforms state-of-the-art unimodal baselines by an average of 3.29% in mean squared error (MSE) and 9.66% in mean absolute error (MAE).
TTCA also shows moderate performance gains over recent multimodal approaches, especially in event-driven scenarios.
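
The MSE and MAE percentages above are most naturally read as relative error reductions against a baseline. The summary does not state the exact averaging procedure, so the following is only a minimal sketch under that assumption. The function names and the toy numbers are illustrative and are not taken from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy values for illustration only; these are NOT results from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.5, 2.5, 2.5, 4.5]
model_pred = [1.2, 2.1, 2.8, 4.3]

mse_gain = relative_improvement(mse(y_true, baseline_pred), mse(y_true, model_pred))
mae_gain = relative_improvement(mae(y_true, baseline_pred), mae(y_true, model_pred))
print(f"MSE improvement: {mse_gain:.2f}%  MAE improvement: {mae_gain:.2f}%")
```

Under this reading, a reported average would be the mean of such per-domain (or per-setting) percentages, which is again an assumption rather than something the summary confirms.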

Abstract

News and reports frequently drive future trends, yet traditional time series forecasting often fails to capture these external influences. To integrate textual insights, we introduce Text-Time Cross-Modal Attention (TTCA), a multimodal framework that fuses numerical embeddings with text embeddings extracted from a pre-trained language model. TTCA employs a cross-attention mechanism that treats time series features as queries and textual features as keys and values. This architecture ensures that semantic context enhances, rather than overshadows, underlying temporal dynamics. Extensive evaluations on the Time-MMD dataset across nine real-world domains demonstrate that TTCA consistently outperforms state-of-the-art unimodal baselines, achieving average improvements of 3.29% in MSE and 9.66% in MAE. Furthermore, TTCA shows moderate performance gains over recent multimodal approaches, particularly in event-driven scenarios.
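
The query/key/value arrangement described in the abstract can be illustrated with a minimal single-head cross-attention sketch in NumPy. This is not the authors' implementation: the dimensions, the random projection matrices, the function names, and the residual connection used to keep the temporal features dominant are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_attention(ts_feats, text_feats, w_q, w_k, w_v):
    """Single-head cross-attention: time series as queries, text as keys/values.

    ts_feats:   (n_steps, d_ts)    time series features
    text_feats: (n_tokens, d_text) text embeddings, e.g. from a language model
    Returns fused features (n_steps, d_model) and weights (n_steps, n_tokens).
    """
    q = ts_feats @ w_q        # queries come from the time series
    k = text_feats @ w_k      # keys come from the text
    v = text_feats @ w_v      # values come from the text
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each time step attends over tokens
    context = weights @ v                    # text context per time step
    # Assumption: a residual sum, so text context is added to the temporal
    # representation instead of replacing it.
    fused = q + context
    return fused, weights


# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
n_steps, n_tokens, d_ts, d_text, d_model = 8, 5, 16, 32, 16
ts_feats = rng.normal(size=(n_steps, d_ts))
text_feats = rng.normal(size=(n_tokens, d_text))
w_q = rng.normal(size=(d_ts, d_model)) / np.sqrt(d_ts)
w_k = rng.normal(size=(d_text, d_model)) / np.sqrt(d_text)
w_v = rng.normal(size=(d_text, d_model)) / np.sqrt(d_text)

fused, weights = cross_modal_attention(ts_feats, text_feats, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Because the queries come from the time series, the output keeps one row per time step, and the text only contributes a weighted context vector to each step. This is one plausible way to realize the stated goal that semantic context enhances rather than overshadows the temporal dynamics.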
