ABSTRACT Urban street perception is central to human‐centered planning, yet conventional approaches often rely on resident surveys that are costly and difficult to scale. To address this, we propose an automated, “survey‐free” analytical paradigm that integrates a domain‐aligned Large Language Model framework with explainable machine learning. Using Baidu Street View imagery from Xihu (Hangzhou, China), our methodology employs a Vision Encoder and a trainable Connector to map visual features into a fine‐tuned Llama3 model, which generates multidimensional perception assessments. After validation against expert ground‐truth descriptions, these qualitative narratives are converted into quantifiable sentiment metrics by a Natural Language Processing pipeline. Subsequent explainable modeling uncovers nonlinear threshold dynamics and interactive dependencies between physical visual elements and perception scores. The analysis reveals that the psychological impact of an individual street component depends on its spatial configuration and on structural tipping points. This research demonstrates that combining multimodal AI with explainable algorithms provides a scalable and reproducible alternative to traditional surveys. By translating urban observations into data‐driven physical design guidelines, the framework helps urban planners design targeted street renewal strategies.
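The step in which generated narratives become sentiment metrics can be illustrated with a minimal sketch. The abstract does not specify the NLP pipeline, so everything below is an assumption made for illustration: the word lists, the one-token negation rule, and the function name `sentiment_score` are hypothetical and are not the authors' method.

```python
import re

# Hypothetical toy lexicon; the abstract does not name its sentiment resources.
POSITIVE = {"safe", "lively", "pleasant", "green", "clean", "beautiful"}
NEGATIVE = {"unsafe", "boring", "dirty", "depressing", "cluttered", "noisy"}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text: str) -> float:
    """Map a perception narrative to a score in [-1, 1].

    The score is (positive hits - negative hits) / total hits, where a
    lexicon word directly preceded by a negator counts with flipped polarity.
    Text with no lexicon words scores 0.0.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = neg = 0
    for i, tok in enumerate(tokens):
        if tok not in POSITIVE and tok not in NEGATIVE:
            continue
        is_positive = tok in POSITIVE
        # Flip polarity when the previous token is a negator ("not pleasant").
        if i > 0 and tokens[i - 1] in NEGATORS:
            is_positive = not is_positive
        if is_positive:
            pos += 1
        else:
            neg += 1
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


if __name__ == "__main__":
    narrative = "The street feels safe and green, but the facades are cluttered."
    print(sentiment_score(narrative))
```

A lexicon approach is used here only because it is self-contained; a real pipeline would more plausibly rely on a trained sentiment model, and the per-image scores would then serve as the target variable for the explainable modeling stage.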
Xin Han
Yu Zhu
Lei Wang
Transactions in GIS
Peking University
Southeast University
Kyungpook National University
Han et al. studied this question.
www.synapsesocial.com/papers/69fbefc0164b5133a91a3caa — DOI: https://doi.org/10.1111/tgis.70280