May 16, 2026 · Open Access

AquaSelect: Learning when to abstain via score fusion for reliable underwater species classification

Key Points

The aim is to develop AquaSelect, a framework that allows classifiers to selectively abstain from making predictions to enhance accuracy in visually degraded underwater environments.
AquaSelect employs a lightweight binary selection head of 213K parameters trained on a frozen backbone.
It integrates temperature-calibrated confidence and image quality through logistic regression for prediction reliability.
The framework was evaluated across two underwater species datasets with different backbone models.
At 80% coverage, AquaSelect achieved an accuracy increase from 87.3% to 94.8% and a Macro F1 score increase from 81.5% to 88.6%.
It outperformed Softmax Response and Monte Carlo Dropout across all evaluations on the AQUA20 dataset.
The system runs at 149 FPS, 2.8 times faster than Deep Ensembles.
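
To make the "accuracy at 80% coverage" figures concrete, here is a minimal sketch of how selective accuracy is usually computed: rank predictions by a reliability score, keep the top fraction, and measure accuracy on what is kept. The function name `accuracy_at_coverage` and the toy scores are assumptions for illustration only; they are not taken from the paper.

```python
import math


def accuracy_at_coverage(scores, correct, coverage):
    """Accuracy over the `coverage` fraction of samples with the highest scores.

    scores:   reliability score per prediction (higher means more trusted)
    correct:  whether each prediction matched its label
    coverage: fraction of predictions the classifier is allowed to answer
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Small epsilon guards against float error pushing ceil up by one.
    n_keep = max(1, math.ceil(coverage * len(scores) - 1e-9))
    ranked = sorted(zip(scores, correct), key=lambda pair: pair[0], reverse=True)
    kept = ranked[:n_keep]  # everything below this cut is an abstention
    return sum(1 for _, ok in kept if ok) / n_keep


# Toy data, made up for illustration (not results from the paper).
scores = [0.95, 0.90, 0.85, 0.80, 0.40, 0.75, 0.30, 0.88, 0.92, 0.60]
correct = [True, True, True, True, False, True, False, True, True, True]

full = accuracy_at_coverage(scores, correct, 1.0)   # answer everything
at_80 = accuracy_at_coverage(scores, correct, 0.8)  # abstain on lowest 20%
```

In this toy case the two errors carry the lowest scores, so abstaining on 20% of the inputs removes both. Real gains depend on how well the score separates correct from incorrect predictions.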

Abstract

Deep learning classifiers for fine-grained visual recognition provide no per-prediction reliability estimate, yet selective prediction methods that allow classifiers to abstain remain evaluated only on standard benchmarks, untested in domains where visual degradation drives failure patterns. We present AquaSelect, a post-hoc selective prediction framework that learns when to abstain rather than risk a misclassification. AquaSelect trains a lightweight binary selection head of 213K parameters on a frozen backbone to predict classifier correctness, fusing this with temperature-calibrated confidence and image quality features via interpretable logistic regression. Because the backbone remains frozen, the selection head can be retrained for new environments without touching the base classifier. Evaluated on two underwater species datasets, AQUA20 with 8,171 images across 20 classes and Sea Animals with 13,711 images across 23 classes, using ConvNeXt-Tiny and DeiT-Small backbones across three seeds, AquaSelect outperforms Softmax Response and Monte Carlo Dropout on all six seed-backbone evaluations on AQUA20 and improves mean coverage metrics on Sea Animals. At 80% coverage, accuracy rises from 87.3% to 94.8% and Macro F1 from 81.5% to 88.6%, surpassing the benchmark full-data accuracy of 90.69% despite using 15% less training data. We also report that RAPS conformal prediction sets averaging 3.7 to 5.0 classes are impractical for single-label classification, and fusing set sizes with learned scores degrades selection quality. Ablation identifies the learned selection head as the dominant component. The framework runs at 149 FPS, 2.8 times faster than Deep Ensembles, and applies to any classification system where errors carry asymmetric costs.

• First selective prediction study for visually degraded classification.
• Frozen backbone design allows redeployment by retraining 213K parameters.
• Learned score fusion improves accuracy from 87.3% to 94.8% at 80% coverage.
• Fusing conformal set sizes with learned scores degrades selection quality.
• Cross-dataset validation on two marine benchmarks confirms generalizability.
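
The score fusion described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation. The three-feature layout (selection head probability, temperature-calibrated confidence, image quality score), the helper names, the toy data, and the plain gradient-descent fit standing in for a library logistic regression solver are all hypothetical.

```python
import math


def calibrated_confidence(logits, temperature):
    """Max softmax probability after dividing the logits by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    return max(exps) / sum(exps)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fused_score(x, w, b):
    """Logistic regression output: estimated probability the prediction is correct."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_fusion(features, labels, lr=1.0, epochs=3000):
    """Fit fusion weights by batch gradient descent on the log loss."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = fused_score(x, w, b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical rows: [selection_head_prob, calibrated_conf, image_quality].
# Labels are 1 when the frozen classifier was correct. All values are made up.
features = [
    [0.90, 0.95, 0.8], [0.80, 0.90, 0.7], [0.85, 0.70, 0.9],
    [0.30, 0.60, 0.2], [0.20, 0.50, 0.4], [0.40, 0.55, 0.3],
]
labels = [1, 1, 1, 0, 0, 0]

w, b = fit_fusion(features, labels)
reliable = fused_score([0.90, 0.95, 0.8], w, b)
unreliable = fused_score([0.20, 0.50, 0.4], w, b)

# A temperature above 1 softens the softmax and lowers the top confidence.
raw_conf = calibrated_confidence([2.0, 1.0, 0.0], 1.0)
soft_conf = calibrated_confidence([2.0, 1.0, 0.0], 2.0)
```

Because the fused score is a single logistic unit, each fitted weight in `w` shows how much its input contributes, which fits the abstract's description of the fusion as interpretable. Abstention then reduces to thresholding the fused score at a value chosen for the target coverage.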
