Abstract

Recent advances in generative artificial intelligence have enabled significant progress in reconstructing visual content from functional magnetic resonance imaging (fMRI) data. Early approaches were based on standalone generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), which demonstrated the feasibility of neural decoding but struggled to capture structural fidelity and semantic richness at the same time. More recent developments have shifted toward hybrid, multi-stage reconstruction pipelines, in which neural signals are first mapped into structured latent representations (e.g., CLIP or VDVAE embeddings) that are then used to condition diffusion-based generative models for high-fidelity image synthesis. A structured narrative review of generative AI approaches for fMRI-based visual reconstruction is provided that traces the evolution from standalone generative models to representation-driven and diffusion-based architectures. A comparative analysis is conducted across major benchmarks, particularly the Natural Scenes Dataset (NSD) and the Generic Object Decoding (GOD) dataset, to highlight differences in model behavior, evaluation strategies, and reconstruction performance. In addition, a framework for scalable representation learning and dimensionality reduction is introduced to address the high dimensionality of neural data and the associated computational cost. Critical limitations in current evaluation practices are also identified, including the lack of standardized metrics and the inherent trade-off between low-level visual fidelity and high-level semantic accuracy. Finally, emerging research directions are discussed, including domain-informed diffusion models, cross-subject generalization, multimodal integration, and large-scale foundation models; together these position generative AI-based neural decoding within a broader big data and computational neuroscience context.
Ahmed et al. (Sun,) studied this question.