Both robots and humans have visual sensors with limited fields of view that must be controlled to explore the environment and search for objects. To make this process efficient, visual attention methods actively select the information that contributes most to the success of the task. Two key factors characterize human vision. First, sensors can have space-varying resolution, so that only certain parts of the scene are processed at high resolution. Second, the attentional focus is deployed to highly informative regions, e.g. highly conspicuous ones. In this paper, we propose using semantic information, readily available in state-of-the-art deep object detectors, as an effective way to guide visual target search with foveal sensors; we refer to this approach as SemBA-FAST. Because state-of-the-art object detectors are trained on conventional Cartesian images, we propose methods to calibrate detections in foveated images without retraining the deep models. The information collected across multiple saccades is fused using Bayesian filters that maintain a semantic representation of the world with associated uncertainty, from which the next gaze direction is actively determined. The proposed model is compared with state-of-the-art saliency-based methods. Our results demonstrate that semantic information positively influences the performance of target-present visual search in static scenes, highlighting its importance in designing visual attention systems for robots.
• Semantic information available in current deep learning models can be exploited in active visual search for known objects and brings advantages over saliency-based models.
• Foveal vision effectively reduces the amount of visual information to be processed.
• Pre-trained deep-learning object detectors can be calibrated to foveal images with low computational effort.
• Biologically inspired computational models provide better insights into human visual cognition.
• Probabilistic framework for integrating information across multiple views and planning the best next view, which enhances interpretability and mathematical explainability.
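The abstract describes fusing detections across saccades with Bayesian filters and choosing the next gaze direction from the resulting belief. The sketch below illustrates that loop for a single known target on a discrete grid. It is not the authors' SemBA-FAST implementation. The detector model is a hypothetical assumption: `p_detect` makes detection probability decay exponentially with eccentricity to mimic a foveal sensor, and `P_FALSE_ALARM` is a fixed false-alarm rate. The gaze rule `next_gaze` is also hypothetical: it greedily maximizes the expected detection probability. All names here are illustrative only.

```python
import math

# Hypothetical detector model (assumption, not from the paper): the chance of
# detecting the target decays with eccentricity from the gaze point, mimicking
# the lower peripheral resolution of a foveal sensor.
P_FALSE_ALARM = 0.05


def p_detect(eccentricity, p_fovea=0.95, scale=3.0):
    """Probability that the detector fires on the cell containing the target."""
    return max(P_FALSE_ALARM, p_fovea * math.exp(-eccentricity / scale))


def eccentricity(gaze, cell):
    """Euclidean distance (in grid cells) between the gaze point and a cell."""
    return math.dist(gaze, cell)


def bayes_update(belief, gaze, detections):
    """Update a single-target location belief after one fixation.

    belief: dict mapping cell -> P(target in cell), summing to 1.
    detections: set of cells where the detector fired during this fixation.
    With independent per-cell outcomes, P(obs | target at c) is proportional
    to P(z_c | target) / P(z_c | no target), so only this ratio is needed.
    """
    posterior = {}
    for cell, prior in belief.items():
        pd = p_detect(eccentricity(gaze, cell))
        if cell in detections:
            ratio = pd / P_FALSE_ALARM
        else:
            ratio = (1.0 - pd) / (1.0 - P_FALSE_ALARM)
        posterior[cell] = prior * ratio
    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}


def entropy(belief):
    """Shannon entropy (nats) of the belief: the remaining uncertainty."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)


def next_gaze(belief, candidates):
    """Greedy rule (assumption): fixate where the target is most likely detected."""
    def expected_detection(g):
        return sum(p * p_detect(eccentricity(g, c)) for c, p in belief.items())
    return max(candidates, key=expected_detection)


if __name__ == "__main__":
    import random

    random.seed(0)
    cells = [(x, y) for x in range(10) for y in range(10)]
    belief = {c: 1.0 / len(cells) for c in cells}  # uniform prior
    target = (7, 2)  # hypothetical ground-truth target location
    for step in range(5):
        gaze = next_gaze(belief, cells)
        # Simulated detector: fires on the target with eccentricity-dependent
        # probability and on every other cell with the false-alarm rate.
        fired = {
            c for c in cells
            if random.random() < (p_detect(eccentricity(gaze, c)) if c == target
                                  else P_FALSE_ALARM)
        }
        belief = bayes_update(belief, gaze, fired)
        best = max(belief, key=belief.get)
        print(step, gaze, best, round(belief[best], 3), round(entropy(belief), 3))
```

Because the posterior only needs the per-cell likelihood ratio, each fixation costs one pass over the grid. A full model would replace the hand-written `p_detect` with calibrated detector confidences for foveated images and would track semantic classes rather than a single binary target variable.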
João Luzio
Instituto Superior Técnico
Alexandre Bernardino
INESC TEC
Plínio Moreno
Iscte – Instituto Universitário de Lisboa
Neurocomputing
Luzio et al. studied this question.
synapsesocial.com/papers/69a75cd1c6e9836116a26031 — DOI: https://doi.org/10.1016/j.neucom.2026.132860