March 7, 2026 · Open Access

Attention-Based Capsule Network with Vision Transformer for Underwater Hyperspectral Image Classification

Key Points

This research aims to improve underwater object classification from hyperspectral imagery using a hybrid deep learning framework.
Used a Channel Attention Module (CAM)-based U-Net to segment underwater images.
Employed a capsule network (CapsNet) to extract features across the hyperspectral bands.
Applied a Vision Transformer (ViT) for multi-class classification, capturing long-range feature relationships.
Classified objects from spectral-spatial characteristics extracted within the Region of Interest (ROI).
Achieved 95.3% classification accuracy on the underwater hyperspectral image dataset.
Reached a maximum Intersection over Union (IoU) of 0.88 in segmentation.
Obtained 95.2% segmentation accuracy.
Reached an Area Under the Curve (AUC) of 0.99 across 8 substrate classes.
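The key points report a segmentation IoU and a multi-class AUC, but the summary does not describe the evaluation protocol. The NumPy sketch below only illustrates how these metrics are commonly computed. It assumes binary masks for IoU and a macro-averaged one-vs-rest AUC, and the function names `iou`, `binary_auc`, and `macro_ovr_auc` are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a sketch of common metric definitions,
# not the paper's confirmed evaluation code.

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Intersection over Union of two binary masks with the same shape."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: count as perfect agreement
    return float(np.logical_and(pred, true).sum() / union)


def binary_auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    Requires at least one positive and one negative label.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


def macro_ovr_auc(probs: np.ndarray, y: np.ndarray) -> float:
    """Macro-averaged one-vs-rest AUC; probs has shape (n_samples, n_classes)."""
    per_class = [binary_auc(probs[:, k], (y == k).astype(int))
                 for k in range(probs.shape[1])]
    return float(np.mean(per_class))


pred = np.array([[1, 1, 0], [0, 1, 0]])
true = np.array([[1, 0, 0], [0, 1, 1]])
print(iou(pred, true))  # 2 overlapping pixels / 4 in the union = 0.5
print(binary_auc(np.array([0.9, 0.3, 0.8, 0.1]), np.array([1, 1, 0, 0])))  # 0.75
```

For the 8-class setting in the key points, `macro_ovr_auc` would receive an (n_samples, 8) matrix of predicted class probabilities; whether the paper averages per class in this way is an assumption.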

Abstract

Underwater investigation and research remain challenging because of distortion factors, scattering, and low-wavelength absorption. Hyperspectral imaging captures detailed information about each underwater object from its spectral reflectance across 100 to 300 bands. In underwater applications, hyperspectral imaging produces a 3D hyperspectral cube that requires extensive processing to achieve high-accuracy classification. This study therefore proposes a hybrid deep learning framework that covers segmentation, feature extraction, and classification. A Channel Attention Module (CAM)-based U-Net performs segmentation, obtaining spectral-spatial characteristics from the Region of Interest (ROI). CapsNet feature extraction then captures features across the spectral bands, supporting class-wise object classification through pose-based relationships. Finally, a Vision Transformer (ViT) classifier that takes capsule vectors as tokens performs multi-class classification by computing global attention among the feature vectors and modeling long-range relationships in the ROI feature data. With a maximum segmentation IoU of 0.88 and 95.2% segmentation accuracy, the proposed model attains 95.3% classification accuracy and an AUC of 0.99 across 8 substrate classes of the underwater hyperspectral image (HSI) dataset.
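The abstract does not detail the internals of its Channel Attention Module. As an illustration only, the NumPy sketch below assumes the channel-attention design popularized by CBAM: average- and max-pooled channel descriptors pass through a shared two-layer MLP, and a sigmoid turns their sum into per-channel weights that rescale the feature map. For a hyperspectral patch, the channels could be the spectral bands or the U-Net's intermediate feature channels. The function name, weight shapes, and reduction ratio are hypothetical.

```python
import numpy as np

# Hypothetical sketch of CBAM-style channel attention; the paper's exact CAM may differ.

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """Reweight the channels of a feature map x of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) form a shared MLP with reduction ratio r.
    Returns the reweighted feature map and the (C,) channel weights.
    """
    avg_desc = x.mean(axis=(1, 2))  # (C,) global average pooling
    max_desc = x.max(axis=(1, 2))   # (C,) global max pooling

    def shared_mlp(v: np.ndarray) -> np.ndarray:
        return w2 @ np.maximum(w1 @ v, 0.0)  # bottleneck with ReLU

    weights = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))  # values in (0, 1)
    return x * weights[:, None, None], weights


rng = np.random.default_rng(0)
C, r = 16, 4  # hypothetical: 16 channels, reduction ratio 4
x = rng.standard_normal((C, 8, 8))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
y, a = channel_attention(x, w1, w2)
print(y.shape, a.shape)  # (16, 8, 8) (16,)
```

Inside a U-Net, a module like this could be applied within encoder or decoder blocks so that informative bands are emphasized before the segmentation mask is produced; where the paper places it is not stated in the summary.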
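The abstract says CapsNet extracts band features and relates them through pose-based relationships, but it does not name a routing scheme. The sketch below assumes the dynamic routing-by-agreement of Sabour et al. (2017): each lower capsule's pose vector is multiplied by a learned matrix to produce a vote for every higher capsule, and coupling coefficients are iteratively shifted toward the votes that agree with the output. All function names and sizes are hypothetical.

```python
import numpy as np

# Hypothetical sketch of capsule prediction + dynamic routing (Sabour et al., 2017);
# not confirmed to be the paper's routing scheme.

def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Capsule nonlinearity: keeps the direction, maps the length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def predict_poses(u: np.ndarray, W: np.ndarray) -> np.ndarray:
    """u: (n_in, k) lower capsules; W: (n_in, n_out, d, k) learned pose transforms.

    Returns u_hat of shape (n_in, n_out, d): each lower capsule's vote for each higher one.
    """
    return np.einsum("iodk,ik->iod", W, u)


def dynamic_routing(u_hat: np.ndarray, n_iters: int = 3) -> np.ndarray:
    """Routing-by-agreement (n_iters >= 1); returns higher capsules of shape (n_out, d)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                        # routing logits
    for _ in range(n_iters):
        c = softmax(b, axis=1)                         # each lower capsule splits its output
        v = squash(np.einsum("io,iod->od", c, u_hat))  # weighted votes, then squash
        b = b + np.einsum("iod,od->io", u_hat, v)      # raise logits where votes agree
    return v


rng = np.random.default_rng(0)
u = squash(rng.standard_normal((32, 8)))       # hypothetical: 32 lower capsules of size 8
W = 0.1 * rng.standard_normal((32, 8, 16, 8))  # 8 higher capsules of size 16
v = dynamic_routing(predict_poses(u, W))
print(v.shape)                                 # (8, 16)
print(np.linalg.norm(v, axis=-1).round(3))     # capsule lengths, each below 1
```

In this formulation, the length of an output capsule is read as the probability that its entity is present, while its orientation encodes pose; according to the abstract, capsule vectors of this kind are what the ViT stage consumes as tokens.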
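The abstract describes a ViT classifier that takes capsule vectors as tokens and uses global attention to relate long-range ROI features. The NumPy sketch below shows one pre-norm transformer encoder block over capsule tokens, with a prepended class token read out as probabilities over 8 classes to match the 8 substrate classes. The parameter shapes, head count, single-layer depth, and the ReLU in the MLP (standard ViT uses GELU) are simplifying assumptions. The weights are random, so the output is not a trained prediction.

```python
import numpy as np

# Hypothetical sketch of ViT-style classification over capsule tokens; not the paper's code.

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)  # no learned scale/shift


def multi_head_self_attention(x, wq, wk, wv, wo, n_heads):
    """Global self-attention: every token attends to every other token. x: (n, d)."""
    n, d = x.shape
    dh = d // n_heads

    def split(t):  # (n, d) -> (n_heads, n, dh)
        return t.reshape(n, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)  # (n_heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ wo, attn


def init_params(rng, n_caps, d, n_classes, scale=0.1):
    shapes = {"cls": (1, d), "pos": (n_caps + 1, d), "wq": (d, d), "wk": (d, d),
              "wv": (d, d), "wo": (d, d), "w1": (d, 4 * d), "w2": (4 * d, d),
              "head": (d, n_classes)}
    return {name: scale * rng.standard_normal(s) for name, s in shapes.items()}


def classify_capsule_tokens(capsules, p, n_heads=4):
    """capsules: (n_caps, d) capsule vectors used as tokens. Returns (probs, attention)."""
    tokens = np.vstack([p["cls"], capsules]) + p["pos"]  # prepend class token, add positions
    h, attn = multi_head_self_attention(layer_norm(tokens), p["wq"], p["wk"],
                                        p["wv"], p["wo"], n_heads)
    tokens = tokens + h                                   # residual around attention
    tokens = tokens + np.maximum(layer_norm(tokens) @ p["w1"], 0.0) @ p["w2"]  # residual MLP
    logits = layer_norm(tokens[0]) @ p["head"]            # classify from the class token
    return softmax(logits), attn


rng = np.random.default_rng(0)
n_caps, d, n_classes = 10, 16, 8  # hypothetical sizes; 8 classes as in the abstract
params = init_params(rng, n_caps, d, n_classes)
probs, attn = classify_capsule_tokens(rng.standard_normal((n_caps, d)), params)
print(probs.shape, round(float(probs.sum()), 6))  # (8,) 1.0
print(attn.shape)                                 # (4, 11, 11)
```

Because the attention matrix covers every pair of tokens, each capsule can influence the class token directly regardless of where its ROI features came from, which is the long-range property the abstract attributes to the ViT stage.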
