What question did this study set out to answer?

To introduce GVSABench for assessing the geo-visuospatial abilities of multimodal large language models.

February 12, 2026

Can AI Observe Geographical Space? GVSABench For Evaluating Geo‐Visuospatial Ability of Large Multimodal Models

Key Points

Introduced GVSABench, a benchmark for assessing the geo-visuospatial abilities of multimodal large language models (MLLMs).
Built the benchmark from 851 image-based tasks spanning intrinsic and extrinsic, static and dynamic, and geographic and non-geographic dimensions.
Tested seven state-of-the-art MLLMs.
Used zero-shot, zero-CoT, and one-shot prompting strategies.
Overall accuracies remained moderate to low.
Performance varied significantly across models, languages, and task types.
Prompting strategies yielded only limited improvements.
A scale-separation effect appeared: performance patterns differed between geographic and non-geographic tasks, and between small- and large-scale contexts.

Abstract

This study introduces GVSABench, a comprehensive Geo‐Visuospatial Ability (GVSA) benchmark for evaluating the spatial abilities of multimodal large language models (MLLMs). The benchmark systematically spans intrinsic and extrinsic, static and dynamic, and geographic and non‐geographic dimensions, comprising 851 image‐based tasks. The tasks cover spatial visualization, spatial relation reasoning, scene interpretation, spatial orientation and localization, and map‐based problem‐solving. Seven state‐of‐the‐art MLLMs were tested under zero‐shot, zero‐CoT, and one‐shot prompting strategies. Results indicate that overall accuracies remain moderate to low, with significant variability across models, languages, and task types. Prompting strategies yield only limited improvements, underscoring that prompt engineering alone cannot compensate for fundamental deficits in spatial cognition. Moreover, a scale‐separation effect was observed, with distinct performance patterns between geographic and non‐geographic tasks, as well as between small‐ and large‐scale contexts. These findings reveal the incomplete integration of visual, linguistic, and spatial reasoning in current MLLMs. GVSABench offers a reproducible and cognitively grounded framework for advancing future research on robust and human‐aligned spatial intelligence.
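The three prompting strategies named in the abstract differ only in how the text around a task question is assembled. The following is a minimal sketch of that difference, not the authors' implementation: the function `build_prompt`, the template wording, and the worked-example format are all hypothetical, and the image that accompanies each task is left out. The zero-CoT trigger phrase is the one commonly used for zero-shot chain-of-thought prompting; the abstract does not say which wording the study used.

```python
from typing import Optional, Tuple

# Assumption: a common zero-shot chain-of-thought trigger, not taken from the paper.
COT_TRIGGER = "Let's think step by step."


def build_prompt(
    question: str,
    strategy: str,
    example: Optional[Tuple[str, str]] = None,
) -> str:
    """Assemble the text part of a prompt for one image-based task (hypothetical templates)."""
    if strategy == "zero-shot":
        # The question alone, with no example and no reasoning cue.
        return f"Question: {question}\nAnswer:"
    if strategy == "zero-cot":
        # Same as zero-shot, plus a cue asking the model to reason before answering.
        return f"Question: {question}\nAnswer: {COT_TRIGGER}"
    if strategy == "one-shot":
        # One solved example precedes the real question.
        if example is None:
            raise ValueError("one-shot prompting needs a worked example")
        example_question, example_answer = example
        return (
            f"Question: {example_question}\nAnswer: {example_answer}\n\n"
            f"Question: {question}\nAnswer:"
        )
    raise ValueError(f"unknown strategy: {strategy}")
```

Calling `build_prompt("Which building is north of the river?", "zero-cot")` would return the question followed by the reasoning cue. In a real evaluation the task image would be sent alongside this text through whatever interface each model provides.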

Authors

Can Liu

Zhiwei Wei

Hua Liao

Journals

Transactions in GIS

Institutions

Beijing Normal University

Hunan Normal University
