Abstract

Drivable area segmentation (DAS) plays an important role in autonomous driving. The Segment Anything Model (SAM) has recently emerged as a powerful foundation model and has shown remarkable potential across diverse downstream segmentation tasks when adapted through domain-specific parameter-efficient fine-tuning (PEFT). This paper explores effective strategies for adapting SAM to DAS. Existing approaches, however, suffer from two limitations: 1) SAM uses a vanilla vision transformer (ViT) as its image encoder, and the ViT struggles to extract multi-scale features without incurring substantial computational overhead; 2) current fine-tuning approaches for SAM make inadequate use of traffic scene context. As a result, they are not fully optimized for DAS and leave considerable room for improvement. To address these issues, we propose DAS-SAM (Segment Anything Model for drivable area segmentation), a novel, efficient adaptation framework that fine-tunes SAM for DAS. Our approach incorporates a lightweight, learnable network to extract multi-scale features and introduces three auxiliary learning objectives that incorporate traffic scene context. Furthermore, DAS-SAM employs mosaic image augmentation to improve robustness and generalization. The framework is compatible with most existing PEFT methods and can be flexibly combined with them to boost performance. Extensive experiments on the BDD100k and Cityscapes datasets demonstrate that DAS-SAM outperforms both full fine-tuning and state-of-the-art PEFT methods.
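The abstract does not specify how DAS-SAM implements mosaic augmentation. As a rough illustration only, the sketch below shows one simplified mosaic variant for segmentation, assuming four equally sized image/mask pairs. A random split point divides the frame into four quadrants, and each quadrant is copied from the same region of a different training pair, so image pixels and mask labels stay aligned. The function name `mosaic` and the `min_frac`/`max_frac` parameters are hypothetical; common mosaic implementations (e.g., in YOLO-style detectors) also rescale and randomly crop each source image, which this sketch omits.

```python
import numpy as np


def mosaic(images, masks, rng, min_frac=0.25, max_frac=0.75):
    """Combine four aligned (image, mask) pairs into one mosaic sample.

    Hypothetical, simplified variant (not the authors' code): a random split
    point (cy, cx) divides the frame into four quadrants, and quadrant k is
    copied from the same region of pair k, keeping image and mask aligned.
    """
    if len(images) != 4 or len(masks) != 4:
        raise ValueError("mosaic needs exactly four image/mask pairs")
    h, w = masks[0].shape[:2]
    for img, m in zip(images, masks):
        if img.shape[:2] != (h, w) or m.shape[:2] != (h, w):
            raise ValueError("all images and masks must share the same H x W")

    # Random split point; rng.integers excludes the upper bound, hence +1.
    cy = int(rng.integers(int(h * min_frac), int(h * max_frac) + 1))
    cx = int(rng.integers(int(w * min_frac), int(w * max_frac) + 1))

    out_img = np.empty_like(images[0])
    out_mask = np.empty_like(masks[0])
    # Quadrants in order: top-left, top-right, bottom-left, bottom-right.
    regions = [
        (slice(0, cy), slice(0, cx)),
        (slice(0, cy), slice(cx, w)),
        (slice(cy, h), slice(0, cx)),
        (slice(cy, h), slice(cx, w)),
    ]
    for k, (rs, cs) in enumerate(regions):
        out_img[rs, cs] = images[k][rs, cs]
        out_mask[rs, cs] = masks[k][rs, cs]
    return out_img, out_mask, (cy, cx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy inputs: pair k is filled with the value k, so quadrants are easy to check.
    imgs = [np.full((8, 8, 3), k, dtype=np.uint8) for k in range(4)]
    msks = [np.full((8, 8), k, dtype=np.uint8) for k in range(4)]
    img, msk, (cy, cx) = mosaic(imgs, msks, rng)
    print(cy, cx)
    print(msk)
```

Copying each quadrant from the same location in its source keeps the sketch short and guarantees label alignment; the trade-off is that, unlike rescaling-based mosaics, it does not change object scale.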
Zhou et al. (Thu,) studied this question.