This study proposes an adaptive framework for understanding and modeling user preferences for self-driving behaviors through natural language interaction. A user survey conducted in North America revealed strong demand for customizable autonomous vehicle (AV) features, motivating dynamic preference modeling. To capture diverse, context-specific verbal expressions of user intent, we combine speech recognition with a fine-tuned T5-base language model that classifies preferences into predefined AV behavior categories. We adopt the lightweight T5-base model rather than a larger-scale LLM because its efficiency suits the computational constraints of embedded, in-vehicle deployment. To overcome data scarcity, we apply a data augmentation strategy based on a teacher model, which increases classification accuracy from 25% to 97%. The framework can integrate vision-language models (e.g., BLIP-2, CLIP) and multimodal sensor fusion (camera, LiDAR, radar) to represent traffic situations and support context-aware interpretation of user input. Representing situations in this way enables the system to generalize user preferences across similar traffic conditions through similarity-based propagation. By supporting condition-specific behavioral expressions, the system can interpret each preference in the situation where it applies and adapt its behavior accordingly. The proposed framework facilitates scalable, context-aware, and user-centered adaptation of AV behaviors, contributing to improved personalization and potentially to better system usability.
Lee et al. (Fri,) studied this question.
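As a rough illustration of the similarity-based propagation mentioned in the abstract, the following minimal sketch reuses a preference recorded for one traffic situation when a new situation is sufficiently similar. It is not the authors' implementation: the use of cosine similarity, the 0.9 threshold, the three-dimensional scene embeddings, the function names, and the category labels are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def propagate_preference(query_embedding, stored_preferences, threshold=0.9):
    """Return the label of the most similar stored scene, or None.

    stored_preferences is a list of (scene_embedding, preference_label) pairs.
    A label is reused only if its scene's similarity to the query reaches
    the threshold (an assumed value, not taken from the paper).
    """
    best_label, best_score = None, threshold
    for embedding, label in stored_preferences:
        score = cosine_similarity(query_embedding, embedding)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical scene embeddings and preference categories.
stored = [
    ([1.0, 0.0, 0.0], "larger_following_distance"),
    ([0.0, 1.0, 0.0], "earlier_lane_change"),
]

similar_scene = [0.9, 0.1, 0.0]    # close to the first stored scene
unrelated_scene = [0.5, 0.5, 0.7]  # not close to any stored scene

matched = propagate_preference(similar_scene, stored)
unmatched = propagate_preference(unrelated_scene, stored)
```

In this sketch, `matched` takes the label of the first stored scene, while `unmatched` is `None` because no stored scene clears the threshold. In a full system the embeddings would presumably come from the vision-language model and sensor fusion stage rather than being written by hand.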