The externalization of one's mental state, known in science fiction as "mind reading", is now being realized through brain decoding research. This field aims to deepen our understanding of the human brain, one of the least understood biological structures, and to build better foundations for brain-computer interfaces. Following the success of state-of-the-art latent diffusion models in image synthesis, recent studies have increasingly mapped fMRI recordings to the image embedding space of these generative models. While this approach has significantly improved the semantics of image reconstructions, preserving perceptual features without losing semantic information remains challenging, especially for complex images of natural scenes. This research introduces Neuro-Vis, a novel fMRI-to-image pipeline based on Stable Diffusion that integrates multiple semantic controls, through predicted image embeddings and captions, with multiple lightweight perceptual controls, through predicted blurry initial images, depth maps, and color palettes. Neuro-Vis outperforms current state-of-the-art methods in consistency of low-level features while rivaling them in semantics. Furthermore, ablation experiments demonstrate the effectiveness of each component of Neuro-Vis for fMRI-to-image reconstruction.
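As a rough illustration of what mapping fMRI recordings to an embedding space can involve, the sketch below fits a linear map from voxel responses to embedding vectors. It is not the Neuro-Vis method: the abstract does not say which regression model is used, so ridge regression is an assumption here, as are the synthetic data, the array sizes, and the hypothetical names `fit_ridge`, `fmri`, and `embeddings`.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_trials, n_voxels) fMRI responses.
    Y: (n_trials, embed_dim) target embeddings.
    Returns W with shape (n_voxels, embed_dim).
    """
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


# Synthetic stand-ins (assumption): real inputs would be preprocessed voxel
# responses and embeddings produced by a pretrained image encoder.
rng = np.random.default_rng(0)
n_trials, n_voxels, embed_dim = 200, 50, 8
true_W = rng.normal(size=(n_voxels, embed_dim))
fmri = rng.normal(size=(n_trials, n_voxels))
embeddings = fmri @ true_W + 0.01 * rng.normal(size=(n_trials, embed_dim))

# Fit on the first 150 trials, then predict embeddings for the held-out 50.
W = fit_ridge(fmri[:150], embeddings[:150], alpha=1.0)
predicted = fmri[150:] @ W
test_mse = float(np.mean((predicted - embeddings[150:]) ** 2))
```

In a pipeline of the kind described above, predicted embeddings like these would serve as a semantic control for the generative model. The perceptual controls (blurry initial image, depth map, color palette) would need their own predictors, which this sketch does not cover.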
Balisacan et al. (Fri,) studied this question.