What question did this study set out to answer?

The research aims to improve audio watermarking techniques to enhance imperceptibility and robustness against attacks.

April 17, 2026 | Open Access

A Robust Audio Watermarking Method Based on Dual-Encoder U-Net and Short-Time Fourier Transform

Key points

Developed a dual-encoder U-Net framework for watermark embedding.
Utilized Short-Time Fourier Transform for audio feature extraction.
Implemented multi-scale feature fusion to integrate audio and watermark data.
Designed an extraction network with parallel convolutional paths for improved retrieval.
Achieved good imperceptibility relative to other methods on three public datasets.
Demonstrated robustness, with watermark extraction accuracy approaching 100% under most attacks.
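
The two headline claims above, imperceptibility and extraction accuracy, are usually reported as numbers. The following is a minimal sketch of how such numbers might be computed. It assumes bit-level accuracy for extraction and signal-to-noise ratio (SNR) as a stand-in for imperceptibility; this summary does not say which metrics the paper actually uses, and the function names bit_accuracy and snr_db are hypothetical.

```python
import math


def bit_accuracy(embedded_bits, extracted_bits):
    """Fraction of watermark bits recovered correctly (1.0 means 100%)."""
    if not embedded_bits or len(embedded_bits) != len(extracted_bits):
        raise ValueError("bit sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(embedded_bits, extracted_bits))
    return matches / len(embedded_bits)


def snr_db(original, watermarked):
    """SNR in dB between the original and the watermarked samples.

    Assumption: a higher SNR is used here as a rough proxy for a less
    perceptible watermark. The paper may report other metrics.
    """
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, watermarked))
    if noise_power == 0:
        return math.inf  # the two signals are identical
    return 10 * math.log10(signal_power / noise_power)


# Toy example: one of eight bits is flipped by a simulated attack.
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 0, 1, 1, 0, 1, 1, 0]
print(bit_accuracy(embedded, extracted))  # 0.875
```

An accuracy "approaching 100%" corresponds to bit_accuracy returning a value close to 1.0 after the watermarked audio has passed through an attack such as noise addition or compression.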

Abstract

Audio watermarking technology plays a crucial role in digital copyright protection and content authentication. In recent years, audio watermarking methods based on deep neural networks have attracted significant attention. These methods typically consist of an encoder, a distortion simulation layer, and a decoder, enabling end-to-end training for watermark embedding and extraction. However, existing approaches still face limitations in encoder structure design, primarily insufficient fusion between watermark and audio features and a restricted ability to model spectral details and overall structure, both of which affect the imperceptibility and robustness of audio watermarks. To address these issues, this paper proposes a robust audio watermarking method based on a dual-encoder U-Net and the Short-Time Fourier Transform. The proposed framework comprises an embedding network and an extraction network. Specifically, the watermark embedding network consists of a dual-encoder U-Net and a multi-scale feature fusion module, which together extract and integrate features from the audio amplitude spectrogram and the watermark sequence, embedding the watermark into different spectral regions to enhance imperceptibility. Meanwhile, the watermark extraction network introduces a multi-scale fusion module that integrates local and global features through parallel convolutional paths with different receptive fields, significantly improving watermark extraction performance. Experimental results show that the proposed method not only exhibits good imperceptibility compared to other methods on three public datasets but also demonstrates excellent robustness against multiple attacks, with watermark extraction accuracy approaching 100% under most attacks.
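
The abstract mentions that the embedding network operates on the audio amplitude spectrogram obtained from the Short-Time Fourier Transform. The sketch below illustrates only that preprocessing step, using the Python standard library so it runs without extra packages; in practice a numerical library would be used instead. The frame length, hop size, Hann window, and the choice to keep the phase unchanged for later recombination are all assumptions made for illustration, not details taken from the paper, and every function name is hypothetical.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(signal, frame_len=64, hop=16):
    """Naive STFT: a list of frames, each holding frame_len // 2 + 1 complex bins."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = [
            sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def split_amplitude_phase(frames):
    """Separate the complex spectrogram into amplitude and phase."""
    amplitude = [[abs(c) for c in frame] for frame in frames]
    phase = [[cmath.phase(c) for c in frame] for frame in frames]
    return amplitude, phase


def recombine(amplitude, phase):
    """Rebuild complex bins from a (possibly modified) amplitude and the phase."""
    return [
        [cmath.rect(a, p) for a, p in zip(frame_a, frame_p)]
        for frame_a, frame_p in zip(amplitude, phase)
    ]


# Toy input: a pure tone that falls exactly on frequency bin 8.
signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = stft(signal)
amplitude, phase = split_amplitude_phase(frames)
rebuilt = recombine(amplitude, phase)
```

In a pipeline of the kind the abstract describes, a network would modify the amplitude array before recombination, and an inverse STFT would then turn the result back into a waveform. Neither of those steps is shown here.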
