What question did this study set out to answer?

The research aims to improve urban land use predictions at fine spatial scales using a combination of vision-language models and spatial dynamic modeling.

February 28, 2026 | Open Access

Fine-grained urban land use simulation: Integrating spatial dynamic modeling with a pre-trained vision-language model

Key Points

Collected street view images from Shenzhen, China.
Applied UrbanCLIP for zero-shot inference of urban land use from images.
Developed a spatial dynamic model enhanced with polynomial regression for future simulations of urban dynamics.
Delineated eight distinct urban land use types using high-resolution classifications.
Simulated urban evolution trends towards 2035, integrating neighborhood influences and planning policies.
Achieved significant improvements in predictive accuracy and spatial detail compared to traditional methods.
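
The zero-shot step listed above can be illustrated with a minimal sketch. It assumes the general CLIP-style recipe (compare an image embedding against one text-prompt embedding per class and pick the most similar); the summary does not describe UrbanCLIP's actual interface, so the function names, the temperature value, the class names, and the toy three-dimensional embeddings below are all hypothetical placeholders rather than the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_classify(image_emb, prompt_embs, temperature=0.01):
    """Score an image embedding against one prompt embedding per class.

    Returns the best label and a softmax distribution over all labels.
    No labeled training data is used: the classes are defined by prompts only.
    """
    labels = list(prompt_embs)
    logits = [cosine(image_emb, prompt_embs[name]) / temperature for name in labels]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Hypothetical stand-ins: a real pipeline would obtain these vectors from the
# text and image encoders of a vision-language model.
prompt_embs = {
    "class_a": [1.0, 0.0, 0.0],
    "class_b": [0.0, 1.0, 0.0],
    "class_c": [0.0, 0.0, 1.0],
}
image_emb = [0.1, 0.9, 0.2]

label, probs = zero_shot_classify(image_emb, prompt_embs)
print(label, probs)
```

In this toy setup the image vector points mostly along the second axis, so the second placeholder class receives the highest score. The study itself distinguishes eight land use types, which the summary does not name.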

Abstract

Accurate prediction of urban land use changes at fine spatial scales is essential for developing healthy and sustainable cities, yet traditional simulation models struggle to capture local dynamics because fine-grained data are scarce and the models represent urban systems too simply. To address these limitations, we propose a novel approach that combines pre-trained vision-language foundation models with spatial dynamic modeling to forecast detailed urban land use patterns. Specifically, we gathered a spatially dense set of street view images (SVIs) throughout Shenzhen, China, and applied UrbanCLIP, a specialized vision-language prompting framework, to perform zero-shot inference of urban land use directly from images, without labeled datasets or model retraining. The resulting fine-grained classifications delineate eight distinct urban land use types, producing a detailed urban functional map. These high-resolution patterns were then integrated into a spatial dynamic model enhanced by polynomial regression to simulate urban evolution toward 2035. This approach effectively captures neighborhood influences, socioeconomic drivers, and urban planning policies. Our simulation provides actionable insights for sustainable development in Shenzhen by identifying areas for balanced growth, targeted infrastructure investments, and ecological preservation. Compared to conventional methods, our methodology significantly improves predictive accuracy and spatial granularity. By incorporating foundation models, our approach addresses traditional data constraints, offering scalable and robust tools for informed urban governance and decision-making.

• Proposed a VLM-enhanced framework to predict fine-grained urban land use changes.
• Achieved zero-shot land use inference based on street view images.
• Produced high-resolution simulations of Shenzhen's urban dynamics toward 2035.
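
The abstract does not say how polynomial regression enters the spatial dynamic model. One plausible role, assumed here purely for illustration, is fitting a low-degree polynomial to a historical series (for example, the share of land in one use type) and extrapolating it to the 2035 horizon. The sketch below implements ordinary least-squares polynomial fitting with the standard library only; the data values, the choice of degree 2, and the helper names `polyfit` and `predict` are made up for the example and are not taken from the paper.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Made-up observations: years since 2010 and a hypothetical land use share (%).
# Offsetting the years keeps the normal equations well conditioned.
t = [0, 5, 10, 15]
share = [10.0, 14.0, 17.0, 19.0]

coeffs = polyfit(t, share, degree=2)
share_2035 = predict(coeffs, 2035 - 2010)
print(coeffs, share_2035)
```

Extrapolating a polynomial far beyond the observed range can behave poorly, so a trend fitted this way would normally be constrained by other inputs; the abstract names neighborhood influences, socioeconomic drivers, and planning policies as such factors.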

Cite This Study

Cai et al. studied this question.

synapsesocial.com/papers/69a287460a974eb0d3c02cc4
https://doi.org/10.1016/j.compenvurbsys.2026.102416
