What question did this study set out to answer?

The aim is to evaluate and improve role-playing agents through a new benchmark for multi-character interactions in a text-speech context.

May 8, 2026

OmniCharacter++: Towards Comprehensive Benchmark for Realistic Role-Playing Agents

Key Points

The authors introduce the OmniCharacter++ benchmark, which includes a large-scale dataset of 10,287 characters and 118,017 multi-turn dialogues.
They develop UniCharacter-7B, a unified text-speech model that handles multi-character dynamics with a focus on role-specific vocal fidelity and cross-participant semantic alignment.
They assess state-of-the-art models with an evaluation suite covering dialogue understanding, generation quality, and perceptual naturalness.
UniCharacter-7B produces more realistic and consistent role-playing responses in terms of attractiveness and consistency.
OmniCharacter++ presents significant challenges that current models struggle to meet, indicating areas for future improvement.

Abstract

Existing Role-Playing Agents (RPAs), powered by large language models, are predominantly evaluated on static, text-only, dyadic conversations, which inadequately reflect the complexity of realistic human interactions involving multiple interlocutors and multi-modal communication. To bridge this gap, we propose OmniCharacter++, the first benchmark for evaluating multi-character interactions in a joint text-speech context. Specifically, OmniCharacter++ contributes: (1) a large-scale dataset comprising 10,287 characters, 118,017 multi-turn dialogues, and over one million audio responses across 8 open-world topics and 31 subfields, covering diverse multi-modal role-playing scenarios; (2) a comprehensive evaluation suite for dialogue understanding, generation quality, and perceptual naturalness; and (3) UniCharacter-7B, a unified text-speech model trained on this dataset to manage complex multi-character dynamics, ensuring both role-specific vocal fidelity and cross-participant semantic alignment. Experimental results demonstrate that UniCharacter-7B achieves more realistic and consistent role-playing responses in terms of both attractiveness and consistency, while also highlighting that OmniCharacter++ poses substantial challenges for state-of-the-art models, charting a clear path for future research. The code is publicly available at: https://github.com/zchoi/OmniCharacter-plus.
