What question did this study set out to answer?

This research aims to improve 3D cartoon facial animation through an effective framework for modeling and driving animation from speech.

May 16, 2026 · Open Access

Let Toon Talk: Speech-Driven 3D Cartoon Animation via Parametric Modeling and Flow Matching

Key Points

Developed a two-stage cascaded framework for speech-driven animation.
Proposed a parametric adaptation mechanism for diverse facial topologies.
Created a hybrid dataset to enhance training and generalization.
Produced high-quality, temporally coherent animations on unseen cartoon characters.
Achieved identity-adaptive motion synthesis without per-subject pretraining.
Demonstrated effective generalization across various humanoid cartoon avatars.

Abstract

Speech-driven 3D cartoon facial animation remains underexplored due to the difficulty of handling heterogeneous geometries with exaggerated proportions, limited generalization to diverse unseen subjects, and the scarcity of datasets. To address these challenges, we propose Let Toon Talk, a two-stage cascaded framework that effectively mitigates these bottlenecks in both modeling and driving. It enables one-shot, speech-synchronized 3D animation from a single unseen humanoid cartoon image, driven by arbitrary audio. Specifically, for avatar modeling, we propose a parametric adaptation mechanism to capture diverse heterogeneous facial topologies, which subsequently guides a feed-forward reconstruction module to create high-quality 3D Gaussian Splatting (3DGS) avatars. Building upon this, for speech driving, we introduce an Identity-Adaptive Flow Matching network. This generative module effectively maps audio to precise facial dynamics, achieving identity-adaptive motion synthesis for diverse humanoid cartoon characters without per-subject pretraining. Furthermore, we construct a hybrid cartoon talking-face dataset with a systematic curation strategy to bridge the data gap. Extensive experiments demonstrate that our framework produces high-quality, temporally coherent animations, exhibiting effective generalization on unseen structurally humanoid cartoon characters.
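
As a rough illustration of the flow matching idea named in the abstract, the sketch below shows the generic recipe with a straight-line path: a velocity field is regressed toward the difference between a data sample and a noise sample, and sampling integrates that field from noise toward data. This is not the paper's Identity-Adaptive Flow Matching network. Every name here (`interpolate`, `target_velocity`, `fm_loss`, `euler_sample`, `oracle_velocity`, and the `cond["motion"]` field standing in for audio and identity conditioning) is a hypothetical placeholder, and the "network" is replaced by a closed-form oracle so the example runs without training.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target along the straight path: d/dt x_t = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def fm_loss(predicted, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the oracle never divides by zero
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(x, t, cond):
    """Assumed stand-in for a trained network: the exact field for one known target.

    In a real system, cond would carry audio and identity features rather than
    the target motion itself.
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, cond["motion"])]


noise = [0.0, 2.0]    # hypothetical noise sample
motion = [1.0, -1.0]  # hypothetical facial motion parameters for one frame
sample = euler_sample(oracle_velocity, noise, {"motion": motion}, steps=8)
```

With the oracle field, `sample` lands on `motion` up to floating-point error. In a learned setting, a network would be trained by minimizing `fm_loss` between its output at `interpolate(x0, x1, t)` and `target_velocity(x0, x1)`.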
