Talking head generation is an essential task in various real-world applications such as filmmaking and virtual reality. Recent works therefore focus on NeRF-based methods, which capture the 3D structural information of faces and generate more natural and vivid talking videos. However, existing NeRF-based methods fail to generate accurately audio-synced videos. In this paper, we point out that previous methods do not explicitly consider audio-visual representations, which are crucial for precise lip synchronization. Moreover, existing methods struggle to generate high-frequency details, which makes the results look unnatural. To overcome these problems, we propose a novel audio-synced and high-fidelity NeRF-based talking head generation framework, named Wav2NeRF, which learns audio-visual cross-modality representations and employs the wavelet transform for better visual quality. Specifically, we attach a 2D CNN-based neural rendering decoder to a NeRF-based encoder so that the whole image is generated quickly, which in turn lets us employ a new multi-level SyncNet loss for accurate lip synchronization. We also propose a novel cross-attention module to effectively fuse the image and audio representations. In addition, we integrate the wavelet transform into our framework by proposing a wavelet loss function that enhances high-frequency details. We demonstrate that the proposed method renders realistic and audio-synced talking head videos and, on average, outperforms the current NeRF-based state-of-the-art methods on four representative metrics: PSNR (+4.7%), SSIM (+2.2%), LMD (+51.3%), and SyncNet Confidence (+154.7%).
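The abstract does not give the form of the wavelet loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a single-level 2D Haar transform, an L1 distance per subband, and a hypothetical `high_weight` parameter that weights the three detail subbands more heavily than the low-frequency one. The function names `haar_dwt2` and `wavelet_l1_loss` are illustrative and do not come from the paper.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of a 2D list with even height and width.

    Returns four half-size subbands: the low-frequency approximation followed
    by three detail subbands (column differences, row differences, diagonal).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, h, 2):
        r_a, r_c, r_r, r_d = [], [], [], []
        for j in range(0, w, 2):
            # The 2x2 block of pixels covered by this output coefficient.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2.0)  # local average (low frequency)
            r_c.append((a - b + c - d) / 2.0)  # change across columns
            r_r.append((a + b - c - d) / 2.0)  # change across rows
            r_d.append((a - b - c + d) / 2.0)  # diagonal change
        approx.append(r_a)
        col_diff.append(r_c)
        row_diff.append(r_r)
        diag.append(r_d)
    return approx, col_diff, row_diff, diag


def wavelet_l1_loss(pred, target, high_weight=2.0):
    """Weighted sum of mean absolute errors over the Haar subbands.

    high_weight is an assumed hyperparameter: values above 1 penalize errors
    in the detail (high-frequency) subbands more than in the approximation.
    """
    loss = 0.0
    bands = zip(haar_dwt2(pred), haar_dwt2(target))
    for k, (p_band, t_band) in enumerate(bands):
        weight = 1.0 if k == 0 else high_weight
        count = len(p_band) * len(p_band[0])
        err = sum(
            abs(p - t)
            for p_row, t_row in zip(p_band, t_band)
            for p, t in zip(p_row, t_row)
        )
        loss += weight * err / count
    return loss
```

In an actual training pipeline such a term would be computed on image tensors with a differentiable wavelet transform and added to the other losses; the pure-Python version here only illustrates how errors in the detail subbands can be penalized separately from errors in the low-frequency content.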
Ah-Hyung Shin
Kyung Hee University
Jae Ho Lee
Seoul National University
Jiwon Hwang
Theodore Roosevelt High School
Image and Vision Computing
Kyung Hee University
Electronics and Telecommunications Research Institute
Shin et al. studied this question.
synapsesocial.com/papers/68e67752b6db643587601476 — DOI: https://doi.org/10.1016/j.imavis.2024.105104