March 3, 2026 | Open Access

SemBA-FAST: Semantic-based Bayesian attention applied to foveal active visual search tasks

Key Points

The use of semantic information significantly enhances target-present visual search reliability in static scenes.
Models leveraging Bayesian filters with semantic representation yield better results than traditional saliency-based methods.
Foveal vision enables efficient processing by reducing excess visual information, focusing on salient regions.
Deep object detectors can be adapted for foveal images with minimal computational costs, enhancing practical applications.

Abstract

Both robots and humans have visual sensors with limited fields of view that need to be controlled to explore the environment and search for objects. To make this process efficient, visual attention methods actively select the information that contributes the most to the success of the task. Two key factors are characteristic of human vision. First, sensors can have space-varying resolution to process only certain parts of the scene with high resolution. Second, the attentional focus is deployed at highly informative regions, e.g., highly conspicuous regions. In this paper, we propose SemBA-FAST, a method that uses semantic information, readily available in state-of-the-art deep object detectors, to guide visual target search tasks with foveal sensors. Because state-of-the-art object detectors are trained on conventional Cartesian images, we propose methods to calibrate detections in foveated images without retraining the deep models. The information collected across multiple saccades is fused using Bayesian filters that keep a semantic representation of the world with associated uncertainty, from which the next gaze direction is actively determined. The proposed model is compared with state-of-the-art saliency-based methods. Our results demonstrate that semantic information positively influences the performance of target-present visual search in static scenes, highlighting its importance in designing visual attention systems for robots.

• Semantic information available in current deep learning models can be exploited in active visual search for known objects and brings advantages over saliency-based models.
• Foveal vision effectively reduces the amount of visual information to be processed.
• Pre-trained deep-learning object detectors can be calibrated to foveal images with low computational effort.
• Biologically inspired computational models provide better insights into human visual cognition.
• A probabilistic framework for integrating information across multiple views and planning the next best view enhances interpretability and mathematical explainability.
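
The fusion-and-gaze loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid world, the noise-free detector scores, the exponential reliability falloff standing in for foveal calibration, and the greedy gaze rule are all assumptions made for illustration, as are the function names `temper_likelihood`, `bayes_update`, `next_fixation` and `run_search`.

```python
import numpy as np

def temper_likelihood(scores, eccentricity, falloff=0.5):
    """Flatten detector scores toward uniform as eccentricity grows (assumed calibration)."""
    reliability = np.exp(-falloff * eccentricity)[..., None]  # 1 at the fovea, near 0 far away
    uniform = np.full_like(scores, 1.0 / scores.shape[-1])
    return reliability * scores + (1.0 - reliability) * uniform

def bayes_update(belief, likelihood):
    """Per-cell categorical Bayes rule: posterior is proportional to prior times likelihood."""
    posterior = belief * likelihood
    return posterior / posterior.sum(axis=-1, keepdims=True)

def next_fixation(belief, target_class):
    """Greedy gaze rule (assumption): look at the cell most likely to hold the target."""
    flat_index = np.argmax(belief[..., target_class])
    return tuple(int(i) for i in np.unravel_index(flat_index, belief.shape[:-1]))

def run_search(true_labels, target_class, start, n_classes=3, n_saccades=3):
    rows, cols = true_labels.shape
    # Uniform prior over classes in every cell of the semantic map.
    belief = np.full((rows, cols, n_classes), 1.0 / n_classes)
    # Idealized, noise-free detector: 0.8 on the true class, the rest shared equally.
    scores = np.full((rows, cols, n_classes), 0.2 / (n_classes - 1))
    np.put_along_axis(scores, true_labels[..., None], 0.8, axis=-1)
    yy, xx = np.mgrid[0:rows, 0:cols]
    fixation = start
    for _ in range(n_saccades):
        # Distance of every cell from the current fixation, in grid units.
        eccentricity = np.hypot(yy - fixation[0], xx - fixation[1])
        belief = bayes_update(belief, temper_likelihood(scores, eccentricity))
        fixation = next_fixation(belief, target_class)
    return fixation, belief

# Toy scene: background everywhere (class 0) except one target (class 2).
labels = np.zeros((5, 5), dtype=int)
labels[1, 3] = 2
fixation, belief = run_search(labels, target_class=2, start=(4, 0))
```

In this toy setting even a weak peripheral observation raises the target's posterior above the prior in the correct cell, so the gaze moves there and later fixations sharpen the belief. A real detector would be noisy, and a less greedy gaze rule (for example, one based on expected uncertainty reduction) may be preferable.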

Authors

João Luzio

Instituto Superior Técnico

Alexandre Bernardino

INESC TEC

Plínio Moreno

Iscte – Instituto Universitário de Lisboa

Journals

Neurocomputing

Institutions

Instituto Superior Técnico

INESC TEC
