March 3, 2026

Compact Mamba Multi-View for Object Detection

Key Points

A new multi-view analysis technique improves the robustness and consistency of object detection on transparent materials.
The compact multi-view architecture combines shared-weight hierarchical encoders with selective state-space modeling to exploit cross-view and multi-scale correlations.
A unified multi-task objective integrates geometric regression, ordinal classification, cross-view consistency, feature alignment, and sequential smoothness.
On a multi-view glass container inspection dataset, the approach shows improved robustness, consistency, and scalability over strong baselines, suggesting potential benefits for industrial inspection.

Abstract

Multi-view image analysis is a key enabler for robust perception when single viewpoints provide incomplete or ambiguous observations. This challenge is particularly pronounced in industrial inspection of transparent materials, where view-dependent optical effects, subtle surface degradations, and annotation noise significantly hinder reliable detection and severity assessment. In this work, we introduce a compact and efficient multi-view fusion architecture tailored to such constraints. Our approach combines shared-weight hierarchical encoders with selective state-space modeling to explicitly exploit cross-view and multi-scale correlations. Multi-View Mamba Blocks (MVMB) perform adaptive fusion at each feature level by coupling Mamba-based selective state-space layers with FiLM-driven cross-view conditioning, while a Global State-Space Fusion Block enforces long-range coherence across all views and resolutions. Task-specific decoding heads query the resulting global representation via cross-attention to jointly predict object localization and ordinal wear severity. The model is trained using a unified multi-task objective that integrates geometric regression, ordinal classification, cross-view consistency, feature alignment, and sequential smoothness. Extensive experiments on a challenging multi-view glass container inspection dataset demonstrate improved robustness, consistency, and scalability compared to strong baselines. To promote reproducibility and future research, we publicly release the proposed dataset at: https://datasets.liris.cnrs.fr/mvep-version1.
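
The abstract does not spell out how the FiLM-driven cross-view conditioning is wired inside the Multi-View Mamba Blocks. As a rough illustration of the general idea only, the sketch below applies generic feature-wise linear modulation (scale by gamma, shift by beta) to each view, with gamma and beta predicted from the mean of the other views' features. The function names, the mean-pooled context, and the single linear projections are all assumptions made for this example, not the authors' implementation, and the Mamba state-space layers are omitted entirely.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(views: List[Vector]) -> Vector:
    """Average per-channel features over a list of views."""
    n = len(views)
    return [sum(v[c] for v in views) / n for c in range(len(views[0]))]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Plain affine map: weight @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film(features: Vector, gamma: Vector, beta: Vector) -> Vector:
    """Feature-wise linear modulation: gamma * features + beta, per channel."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


def cross_view_film(views: List[Vector],
                    w_gamma: Matrix, b_gamma: Vector,
                    w_beta: Matrix, b_beta: Vector) -> List[Vector]:
    """Condition each view on the mean of the other views (hypothetical scheme).

    Requires at least two views, since each view needs a non-empty context.
    """
    if len(views) < 2:
        raise ValueError("cross-view conditioning needs at least two views")
    out = []
    for i, view in enumerate(views):
        others = [u for j, u in enumerate(views) if j != i]
        context = mean_pool(others)                 # summary of the other views
        gamma = linear(context, w_gamma, b_gamma)   # per-channel scale
        beta = linear(context, w_beta, b_beta)      # per-channel shift
        out.append(film(view, gamma, beta))
    return out


# Usage: two views with two channels each. With zero weights, gamma = 1 and
# beta = 0, the modulation leaves every view unchanged.
views = [[1.0, 2.0], [3.0, 4.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
unchanged = cross_view_film(views, zeros, [1.0, 1.0], zeros, [0.0, 0.0])
```

In a trained model the projection weights would be learned, so each view's features are rescaled and shifted according to what the other viewpoints observe; this is one plausible way view-dependent optical effects could be compensated, under the assumptions stated above.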
