The dynamic nature of underwater acoustic environments, in particular channel fluctuation and time-varying target observation angles, poses significant challenges for active sonar target recognition, and the scarcity of labeled training samples aggravates the problem. To address these limitations, this paper proposes a recognition method that deeply fuses multi-domain temporal features extracted from target echoes. First, complementary features are extracted in the spatial, time–frequency, and Doppler domains to obtain a comprehensive and discriminative representation of targets. Next, we introduce a feature vector-level fusion mechanism designed for few-shot learning, which integrates a meta-knowledge-driven multi-stream feature extractor with an internal memory module within the feature tensor framework. Together, these components form the Multi-domain Temporal Feature Fusion Recognition Network (MTFF-RNet). Evaluated on a hybrid dataset combining simulated and experimental data, the proposed approach achieves a recognition accuracy of 96.2% for both targets and interferents. The experimental results indicate that MTFF-RNet improves robustness and adaptability under varying underwater acoustic conditions and dynamic viewing geometries.
Liu et al. studied this question.