• Taxonomy for multimodal forecasting by modality, model, fusion type, and task.
• Review of 35 papers (2018–2025) analyzed through the proposed unified taxonomy.
• Challenges: misalignment, modality imbalance, noisy inputs, and generalization.
• Trends: adaptive fusion, missing modality learning, LLMs, temporal GNN models.

Multimodal learning has recently emerged as a powerful paradigm for financial forecasting, enabling the integration of heterogeneous data sources such as market time series, textual news, and relational graphs. This survey presents a unified taxonomy for multimodal financial forecasting models, structured along four key dimensions: input modalities, modelling architectures, fusion strategies, and predictive tasks. Using this taxonomy, we conduct a systematic review of 35 representative works published between 2018 and 2025, highlighting methodological trends, design choices, and performance patterns. Our analysis identifies persistent challenges, including temporal misalignment, modality imbalance, missing or noisy data, and limited cross-market generalization. We also discuss emerging trends and promising research directions, such as adaptive fusion, incomplete modality learning, and the integration of large language models and temporal graph neural networks. Finally, we analyse how architectural and fusion design choices affect practical considerations such as interpretability and deployability, aiming to bridge methodological innovation with domain-specific requirements.
D’Amico et al. (Sun,) studied this question.