What question did this study set out to answer?

To develop an efficient semantic segmentation model that balances high-fidelity perception with real-time processing for autonomous driving.

April 7, 2026 · Open Access

AutoMamba: Efficient Autonomous Driving Segmentation Model with Mamba

Key points

Set out to develop an efficient semantic segmentation model that balances high-fidelity perception with real-time processing for autonomous driving.
Introduced a Hybrid-SSM architecture incorporating Depthwise Convolutions for local spatial information.
Implemented a Stage-Adaptive Mixed-Scanning strategy focusing on horizontal context and selective vertical scanning.
Applied Auxiliary Supervision and Online Hard Example Mining to combat long-tail forgetting in model training.
Achieved 67.79% mIoU on Cityscapes with 31.3% fewer FLOPs than SegFormer-B0.
Demonstrated efficient scaling at high resolutions without memory errors, unlike larger SegFormer models.
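
Online Hard Example Mining, mentioned in the key points above, can be illustrated with a minimal sketch. This summary does not give the paper's OHEM settings, so the threshold-plus-minimum-count rule below is a commonly used formulation rather than the authors' exact recipe, and the names `ohem_mean_loss`, `loss_threshold`, and `min_kept` are hypothetical.

```python
import math


def ohem_mean_loss(pixel_losses, loss_threshold, min_kept):
    """Average the loss over the hardest pixels only.

    Assumed rule (not taken from the paper): keep every pixel whose loss
    exceeds loss_threshold, but never fewer than min_kept pixels.
    """
    if not pixel_losses:
        raise ValueError("pixel_losses must not be empty")
    ranked = sorted(pixel_losses, reverse=True)  # hardest pixels first
    num_hard = sum(1 for loss in ranked if loss > loss_threshold)
    num_keep = max(num_hard, min(min_kept, len(ranked)))
    kept = ranked[:num_keep]
    return sum(kept) / len(kept)


def cross_entropy(prob_of_true_class):
    """Per-pixel cross-entropy given the predicted probability of the true class."""
    return -math.log(prob_of_true_class)


# Toy data: four easy pixels (e.g. road) and two hard ones (e.g. a rare class).
probs = [0.99, 0.95, 0.90, 0.60, 0.20, 0.05]
losses = [cross_entropy(p) for p in probs]

plain_mean = sum(losses) / len(losses)
# Treat a pixel as "hard" when its true-class probability is below 0.7.
hard_mean = ohem_mean_loss(losses, loss_threshold=cross_entropy(0.7), min_kept=2)

print(f"plain mean loss: {plain_mean:.3f}")
print(f"OHEM mean loss:  {hard_mean:.3f}")
```

Because the easy pixels are dropped from the average, the gradient signal is dominated by the poorly predicted pixels, which is the mechanism usually credited with helping rare classes.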

Abstract

Semantic segmentation for autonomous driving demands balancing high-fidelity perception with real-time latency. While Transformers achieve state-of-the-art results, their quadratic complexity bottlenecks high-resolution processing. State Space Models (SSMs) like Mamba offer linear complexity but often suffer from local detail loss and inefficient scanning strategies. We introduce AutoMamba, a tailored Hybrid-SSM architecture. We propose a Hybrid-SSM block incorporating Depthwise Convolutions to inject local spatial priors and a Stage-Adaptive Mixed-Scanning strategy. This strategy prioritizes horizontal context in early stages for road layouts while only activating vertical scanning in deep layers to preserve anisotropic structures like poles. Furthermore, we reveal that unlike Transformers, Mamba architectures require Auxiliary Supervision and Online Hard Example Mining (OHEM) to address “long-tail forgetting.” Experiments on Cityscapes and BDD100K under a training-from-scratch setting demonstrate AutoMamba’s superiority. Notably, AutoMamba-B0 achieves 67.79% mIoU on Cityscapes with 31.3% fewer FLOPs than SegFormer-B0. Moreover, while the larger SegFormer-B2 fails with Out-Of-Memory errors at 2048×2048 resolution, AutoMamba-B2 scales efficiently, validating its linear complexity advantage for next-generation perception systems.
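
To make the scanning idea concrete, here is a small sketch of how a 2D feature map can be flattened into a 1D sequence in horizontal or vertical order, and why the order matters for thin vertical structures. It is an illustration under stated assumptions, not the authors' implementation: the rule "vertical scanning only in the last stage" is a hypothetical stand-in for the paper's stage-adaptive schedule, and reversed scan directions are not modeled.

```python
def horizontal_scan(height, width):
    """Row-major order: walk each row left to right, top row first."""
    return [(r, c) for r in range(height) for c in range(width)]


def vertical_scan(height, width):
    """Column-major order: walk each column top to bottom, left column first."""
    return [(r, c) for c in range(width) for r in range(height)]


def scans_for_stage(stage, num_stages, height, width):
    """Pick scan orders for a stage.

    Assumption for illustration: horizontal scanning in every stage,
    vertical scanning added only in the final (deepest) stage.
    """
    orders = {"horizontal": horizontal_scan(height, width)}
    if stage == num_stages - 1:
        orders["vertical"] = vertical_scan(height, width)
    return orders


def max_vertical_neighbour_gap(order, height, width):
    """Largest sequence distance between two vertically adjacent cells."""
    position = {cell: i for i, cell in enumerate(order)}
    return max(
        abs(position[(r + 1, c)] - position[(r, c)])
        for r in range(height - 1)
        for c in range(width)
    )


H, W = 4, 6
gap_h = max_vertical_neighbour_gap(horizontal_scan(H, W), H, W)
gap_v = max_vertical_neighbour_gap(vertical_scan(H, W), H, W)
print("gap under horizontal scan:", gap_h)  # equals W
print("gap under vertical scan:", gap_v)    # equals 1

for stage in range(4):
    print(stage, sorted(scans_for_stage(stage, 4, H, W)))
```

Under a horizontal scan, two pixels stacked on top of each other sit a full row width apart in the sequence, so a recurrent state has to carry information across many steps to connect them. Under a vertical scan they are adjacent, which is one way to read the abstract's point about preserving structures such as poles.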
