What type of study is this?

This is a quantitative study (also classified as an experimental study).

October 19, 2025 | Open Access

Hierarchical Prompt Engineering for Remote Sensing Scene Understanding with Large Vision-Language Models

Key Points

Hierarchical prompting significantly improves recognition accuracy and robustness in remote-sensing scenes, especially under varied conditions.
Experiments with five AID dataset variants demonstrate that the hierarchical approach outperforms standard prompting methods consistently.
Parameter-efficient adaptation techniques, such as LoRA and QLoRA, contribute to lower computational requirements while maintaining performance.
The approach establishes a repeatable baseline for remote-sensing classification, supporting reproducibility through detailed prompt templates.
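
As a rough illustration of what a coarse-to-fine prompt with a structured output might look like, the sketch below builds a two-step prompt and validates a JSON reply. It is not the paper's template: the fine class names are AID scene labels, but the two coarse groups, the prompt wording, and the helper names `build_prompt` and `parse_answer` are assumptions made for this example.

```python
import json

# Hypothetical two-level taxonomy. The fine labels are AID class names,
# but this coarse grouping is an assumption, not the paper's hierarchy.
TAXONOMY = {
    "transportation": ["Airport", "Bridge", "Port", "RailwayStation"],
    "natural": ["Beach", "Desert", "Forest", "Mountain"],
}


def build_prompt(taxonomy):
    """Compose a coarse-to-fine instruction that requests a JSON answer."""
    lines = [
        "Classify the aerial image in two steps.",
        "Step 1: choose one coarse category.",
        "Step 2: choose one fine class within that category.",
        "Options:",
    ]
    for coarse, fines in taxonomy.items():
        lines.append(f"- {coarse}: {', '.join(fines)}")
    lines.append('Answer with JSON only: {"coarse": "...", "fine": "..."}')
    return "\n".join(lines)


def parse_answer(raw, taxonomy):
    """Return (coarse, fine) if the reply is valid and consistent, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    coarse, fine = data.get("coarse"), data.get("fine")
    if not isinstance(coarse, str) or not isinstance(fine, str):
        return None
    # Reject replies whose fine class does not belong to the chosen coarse group.
    if coarse in taxonomy and fine in taxonomy[coarse]:
        return coarse, fine
    return None
```

Checking that the fine label belongs to the chosen coarse group is one simple way to catch inconsistent replies before they are scored; how the authors handle such cases is not stated on this page.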

Abstract

Vision–language models (VLMs) show promise for remote-sensing scene classification but still struggle with fine-grained categories and distribution shifts. We present a hierarchical prompting framework that decomposes recognition into a coarse-to-fine decision process with structured outputs, paired with parameter-efficient adaptation (LoRA/QLoRA). To assess robustness without relying on multiple external datasets, we construct five protocol variants of the AID dataset (V0-V4) that systematically vary label granularity, class consolidation, and augmentation settings. The design goals and construction rules of these variants, as well as their alignment with prompt styles, are summarized in Section 3.1.1 and Table 1. We enforce a split-before-augment pipeline (augmenting the training split only) to preclude leakage [27]. We further conduct a leakage audit using rotation/flip-invariant perceptual hashing across splits [28] to guarantee reproducibility. Experiments across these AID variants show that hierarchical prompting consistently outperforms non-hierarchical prompts and matches or exceeds full fine-tuning while requiring substantially less compute. Ablations on prompt design, adaptation strategy, and model capacity, together with confusion matrices and class-wise metrics, demonstrate improved coarse- and fine-grained recognition as well as resilience to rotations and flips. The approach provides a strong, reproducible baseline for remote-sensing classification under constrained compute, with complete prompt templates and processing scripts supplied for replication.
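
The split-before-augment rule and the rotation/flip-invariant audit can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the authors' processing scripts: images are assumed to be small square grayscale grids (nested lists, as if already downsampled), the hash is a simple average hash canonicalized over the eight rotation/flip views, and matching is exact rather than by Hamming distance. All function names are hypothetical.

```python
import random


def rot90(img):
    """Rotate a grid of pixel values by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip(img):
    """Mirror the grid horizontally."""
    return [row[::-1] for row in img]


def dihedral_views(img):
    """Return the 8 rotation/flip variants of a grid."""
    views = []
    cur = [list(row) for row in img]
    for _ in range(4):
        views.append(cur)
        views.append(flip(cur))
        cur = rot90(cur)
    return views


def average_hash(img):
    """One bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return tuple(int(p > mean) for p in flat)


def invariant_hash(img):
    """Canonical hash: the smallest average hash over all 8 views."""
    return min(average_hash(v) for v in dihedral_views(img))


def split_then_augment(images, test_fraction, seed=0):
    """Split first, then augment the training split only."""
    idx = list(range(len(images)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(images) * test_fraction))
    test = [images[i] for i in idx[:n_test]]
    train = [images[i] for i in idx[n_test:]]
    train_aug = [v for img in train for v in dihedral_views(img)]
    return train_aug, test


def audit_leakage(train, test):
    """Return indices of test images whose canonical hash appears in train."""
    train_hashes = {invariant_hash(img) for img in train}
    return [i for i, img in enumerate(test) if invariant_hash(img) in train_hashes]
```

Because the eight views of a rotated or flipped image are the same set as the views of the original, taking the minimum hash over that set gives one value per orbit, so an augmented copy of a test image that slipped into the training split would be flagged. A practical audit would likely hash downsampled real images and tolerate small bit differences.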
