What question did this study set out to answer?

The aim is to enhance object detection and segmentation in UAV-based hyperspectral imagery using a new framework.

March 26, 2026

Open Access

A novel deep learning-based object detection along semantic segmentation on aerial imagery

Key Points

Developed AeroResNet-Vision (ARV) for multimodal fusion of UAV imagery.
Combined ResNet-based DeepLabv3++ for semantic segmentation and YOLOv8 for object detection.
Utilized BRISK and MSER for feature extraction.
Applied image normalization and edge detection for preprocessing.
ARV achieved 97.20%, 98.50%, and 97.60% accuracy on the ISPRS Potsdam, VEDAI, and UAVid benchmark datasets, respectively.
Demonstrated superior performance in handling varying object scales and complex backgrounds.
Validated effectiveness through state-of-the-art accuracy in UAV image analysis.

Abstract

Deep learning has significantly advanced object detection and scene understanding, yet analyzing uncrewed aerial vehicle (UAV)-based hyperspectral imagery remains challenging due to its high spectral complexity and cluttered scenes. Most existing approaches address detection and segmentation separately or rely on limited data modalities, which constrains their effectiveness in complex aerial scenarios. To address this gap, we propose AeroResNet-Vision (ARV), a novel multimodal fusion framework for UAV imagery. ARV integrates state-of-the-art techniques into a unified pipeline: a ResNet-based DeepLabv3++ module for multi-scale semantic segmentation, the classical Binary Robust Invariant Scalable Keypoints (BRISK) and Maximally Stable Extremal Regions (MSER) detectors for keypoint and region feature extraction, You Only Look Once version 8 (YOLOv8) for efficient object detection, and a Vision Transformer (ViT) for context-aware classification. Additionally, image normalization and edge detection are applied as preprocessing steps to enhance image quality and emphasize structural features. By fusing handcrafted and deep features from these components, ARV effectively handles varying object scales and complex backgrounds in aerial data. Experimental results on three benchmark datasets (ISPRS Potsdam, VEDAI, UAVid) demonstrate state-of-the-art accuracy: ARV achieved 97.20%, 98.50%, and 97.60% accuracy on these datasets, respectively. These findings validate the framework’s superior performance in UAV image analysis. In conclusion, the proposed multimodal approach provides a robust solution for aerial object recognition, and we recommend exploring lightweight model variants and self-supervised learning to further enhance its deployment potential.
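
The abstract names image normalization and edge detection as preprocessing steps but does not say which methods were used. As a rough illustration only, the sketch below assumes min-max normalization and a Sobel gradient magnitude; both choices, and the function names `normalize` and `sobel_magnitude`, are hypothetical and are not taken from the paper.

```python
from math import hypot

# Assumed 3x3 Sobel kernels; the paper does not name its edge detector.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def normalize(image):
    """Min-max scale pixel values to [0, 1]; a constant image maps to zeros."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / span for p in row] for row in image]


def sobel_magnitude(image):
    """Gradient magnitude at interior pixels; border pixels stay at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = hypot(gx, gy)
    return out


# Toy single-band 4x4 image with a vertical step edge between columns 1 and 2.
toy = [[0, 0, 255, 255] for _ in range(4)]
norm = normalize(toy)
edges = sobel_magnitude(norm)
```

On the toy image, the response is nonzero only at the interior pixels next to the step, which is the kind of structural emphasis the abstract describes. A real pipeline would apply an optimized library routine per spectral band rather than these pure-Python loops.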
