What type of study is this?

This is an experimental study.

October 1, 2025. Open Access.

MAFF-Net: A Multi-level Acoustic Feature Fusion Network For Synthetic Audio Detection

Key Points

MAFF-Net improves synthetic speech detection by fusing multi-level acoustic features through cross-attention.
Multi-level acoustic feature extraction captures both low-level physical characteristics of the signal (multi-spectrogram features) and high-level speech representations (Wav2vec2 features).
In experiments on four datasets, MAFF-Net consistently outperformed existing single-model methods.
The new Chinese Advanced Synthetic Speech Dataset (CASSD), built from speech generated with 11 state-of-the-art synthesis techniques, provides a benchmark for evaluating how well detectors generalize.

Abstract

Voice spoofing attacks have become a significant challenge in today’s security domain. Although synthetic speech detection technology has progressed, existing detection methods still struggle to identify unknown attack strategies. To address this challenge, we propose a novel multi-level acoustic feature fusion framework, MAFF-Net, which comprises three main components: multi-level acoustic feature extraction, cross-attention feature fusion, and a graph-aggregated detection module. The multi-level acoustic feature extraction module involves two complementary processes: multi-spectrogram feature extraction, which captures low-level physical characteristics of the audio signal, and Wav2vec2 feature extraction, which focuses on high-level speech representations. These multi-level features are subsequently integrated through cross-attention, enhancing the discriminative power of the model. To better evaluate the generalization capability of the proposed model, we introduce the Chinese Advanced Synthetic Speech Dataset (CASSD), a new dataset that incorporates speech generated using 11 state-of-the-art synthesis techniques. Extensive experiments conducted across four different datasets demonstrate that our approach consistently outperforms existing single-model methods, highlighting the superior performance of MAFF-Net in synthetic speech detection.
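
To make the cross-attention fusion step more concrete, the following is a minimal NumPy sketch of single-head cross-attention in which spectrogram-derived frames act as queries and Wav2vec2-derived frames act as keys and values. It is not the authors' implementation: the abstract does not specify the attention direction, the number of heads, or any dimensions, so the function name `cross_attention`, the feature sizes (64, 768, 32), the frame counts, and the random projection weights are all assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative only).

    query_feats:   (T_q, d_q) frames from one feature stream
    context_feats: (T_c, d_c) frames from the other feature stream
    Returns the fused frames (T_q, d) and the attention weights (T_q, T_c).
    """
    q = query_feats @ w_q      # project queries to the shared dimension d
    k = context_feats @ w_k    # project keys to the shared dimension d
    v = context_feats @ w_v    # project values to the shared dimension d
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query frame attends over all context frames
    return weights @ v, weights


# Hypothetical shapes: 50 spectrogram frames of size 64 and
# 40 Wav2vec2 frames of size 768 (assumed, not taken from the paper).
rng = np.random.default_rng(0)
spec_feats = rng.standard_normal((50, 64))
w2v_feats = rng.standard_normal((40, 768))

d = 32  # assumed shared attention dimension
w_q = rng.standard_normal((64, d)) / np.sqrt(64)
w_k = rng.standard_normal((768, d)) / np.sqrt(768)
w_v = rng.standard_normal((768, d)) / np.sqrt(768)

fused, attn = cross_attention(spec_feats, w2v_feats, w_q, w_k, w_v)
```

In this sketch, each row of `attn` is a probability distribution over the Wav2vec2 frames, so every fused frame is a weighted mix of high-level representations selected by a low-level spectrogram frame. In a trained model the projection matrices would be learned rather than random.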

Cite This Study

Chen et al. studied this question.

synapsesocial.com/papers/68dd89e6fe798ba2fc49823d https://doi.org/10.1145/3767331
