Well-designed urban environments help mitigate stress and enhance mental well-being through their restorative qualities. However, subjective surveys lack spatial scalability, while objective machine learning methods often fail to capture the complexity of human perceptual experience. To address this gap, this study proposes a hybrid framework that combines Vision Language Models (VLMs) with human experience to assess the restorative quality of urban spaces. Using Shenzhen, China as a case study, we gathered subjective and objective knowledge from PRS-11 surveys and ChatGPT-4 descriptions of 566 street view images, and incorporated it into VLM prompts via the Contrastive Language-Image Pretraining (CLIP) model. Through prompt engineering, the VLM evaluated 2,224 additional images, and semantic networks were used to analyze its decision-making process. Results demonstrate that: (1) our method significantly outperformed Random Forest, with an R² increase of 0.535 attributed to prior knowledge fusion; (2) restorative quality exhibits spatial heterogeneity, clustering in developed districts near parks and coastal zones; and (3) semantic network analysis revealed the decision rationales of VLMs across different restorative dimensions, providing design guidelines for spaces with low restorative quality. This research offers a novel methodology for assessing the restorative quality of urban spaces, providing practical tools for sustainable development and human mental well-being.
Ma et al. (Sun,) studied this question.