Despite their remarkable capabilities, state-of-the-art artificial intelligence models rely on deeply parameterized architectures that require extensive labeled datasets and multiple training epochs, which makes them markedly less efficient than biological intelligence in both data utilization and energy consumption. Although self-supervised pretraining has advanced the field, it still demands considerable amounts of data to achieve high classification accuracy. Biological neural systems, in contrast, achieve their efficiency through local learning rules and multimodal integration. Drawing on cognitivist principles and neuroscience, we present BrAMA (Brain-inspired Architecture for Multimodal Association), a novel framework that constructs meaningful data representations by associating symbolic representations of multimodal signals. By combining parameterized Hebbian connections between self-organizing maps with enhanced learning mechanisms and semi-supervision capabilities, BrAMA achieves higher accuracy than state-of-the-art approaches while requiring significantly fewer training examples and only a single epoch. We demonstrate the effectiveness of our approach on multiple benchmark datasets, including MNIST variants, and introduce TISC50, a new standardized multimodal audio-visual benchmark. Experimental results show that BrAMA maintains robust performance with as few as 4 examples per class, significantly outperforming conventional gradient-based approaches in data-constrained scenarios. This work underscores the value of integrating principles from neuroscience and cognitive science to overcome fundamental limitations in contemporary machine learning approaches.
Grienay et al. (Mon,) studied this question.
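To make the association idea in the abstract concrete, the following is a minimal sketch of two self-organizing maps linked by a Hebbian co-activation matrix and trained in a single pass. It is not the BrAMA implementation: the class `TinySOM`, the functions `train_association` and `recall`, the 1-D map layout, the learning rates, and the toy two-cluster data are all assumptions made for illustration, and the plain co-activation count used here stands in for the parameterized Hebbian connections, whose exact form the abstract does not specify.

```python
import math
import random


class TinySOM:
    """A 1-D self-organizing map with a Gaussian neighborhood (illustrative only)."""

    def __init__(self, n_units, dim, rng):
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]

    def bmu(self, x):
        """Index of the best-matching unit (smallest squared distance to x)."""
        dists = [sum((w - v) ** 2 for w, v in zip(unit, x)) for unit in self.weights]
        return dists.index(min(dists))

    def update(self, x, lr=0.5, sigma=1.0):
        """Local rule: pull the BMU and its neighbors toward x; return the BMU index."""
        winner = self.bmu(x)
        for i, unit in enumerate(self.weights):
            h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
            for d in range(len(unit)):
                unit[d] += lr * h * (x[d] - unit[d])
        return winner


def train_association(som_a, som_b, pairs, hebb_lr=1.0):
    """Single pass over paired samples; strengthen links between co-active BMUs."""
    n_a, n_b = len(som_a.weights), len(som_b.weights)
    hebb = [[0.0] * n_b for _ in range(n_a)]
    for x_a, x_b in pairs:
        i = som_a.update(x_a)
        j = som_b.update(x_b)
        hebb[i][j] += hebb_lr  # assumed rule: simple co-activation count
    return hebb


def recall(som_a, som_b, hebb, x_a):
    """Map a modality-A input to the prototype of the most strongly linked B unit."""
    row = hebb[som_a.bmu(x_a)]
    return som_b.weights[row.index(max(row))]


# Toy paired data: two classes, 4 examples each, in a 2-D and a 3-D "modality".
rng = random.Random(0)
pairs = []
for _ in range(4):
    for label in (0, 1):
        x_a = [label + rng.uniform(-0.1, 0.1) for _ in range(2)]
        x_b = [label + rng.uniform(-0.1, 0.1) for _ in range(3)]
        pairs.append((x_a, x_b))

som_a, som_b = TinySOM(4, 2, rng), TinySOM(4, 3, rng)
hebb = train_association(som_a, som_b, pairs)
out = recall(som_a, som_b, hebb, [1.0, 1.0])  # a 3-D prototype from map B
```

Both update rules are local: each map adjusts only its own prototypes, and the cross-modal link changes only where two units are active together, so no gradient is propagated between modalities. A practical system would also decay the learning rate and neighborhood width over time, which this sketch omits for brevity.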