Remote sensing plays a central role in monitoring Earth’s surface for environmental changes such as deforestation, urban expansion, water scarcity, and climate-induced disasters. However, the rapid increase in satellite image acquisition makes manual interpretation impractical. This study proposes a hybrid deep learning framework that automatically generates descriptive captions for remote sensing images, enabling environmental scientists to interpret large-scale Earth observation data efficiently. The framework integrates visual features extracted with a fine-tuned VGG-16 network and semantic representations learned through Word2Vec embeddings, which are fused and decoded via an attention-enhanced Long Short-Term Memory (LSTM) network. Applied to the UC Merced Land Use (UCM) and RSICD datasets, which cover diverse environmental categories, the model produces captions that describe land use and ecological conditions with high accuracy. Evaluation with the BLEU, METEOR, ROUGE, and CIDEr metrics shows that the framework outperforms existing approaches. More importantly, the generated captions capture meaningful environmental attributes (such as vegetation loss, settlement growth, or the presence of water bodies) that are critical for applications in climate change monitoring, disaster management, and sustainable land-use planning. This approach provides a pathway for large-scale, automated environmental assessments, supporting decision-making in Earth system science and policy.
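To make the attention-enhanced decoding concrete, the following is a minimal sketch of a single attention step, not the authors' implementation. The abstract does not specify the scoring function, so dot-product scoring is an assumption here, and the names `attend`, `region_features`, and `hidden_state` are hypothetical. In the described framework the region features would come from the fine-tuned VGG-16 network and the hidden state from the LSTM decoder; the toy vectors below merely stand in for them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """One attention step (assumed dot-product scoring).

    region_features: list of equal-length feature vectors, one per image region
    hidden_state:    decoder state vector with the same length
    Returns (weights, context): weights sum to 1, and context is the
    weighted sum of the region features.
    """
    # Score each region by its dot product with the decoder state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy stand-ins for VGG-16 region features and an LSTM hidden state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
```

In a full decoder, the context vector would typically be combined with the current word embedding (Word2Vec in the described framework) before the next LSTM update; that fusion step is omitted here.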
Mehmood et al. studied this question.