What type of study is this?

This is a Literature Review study.

October 20, 2025 | Open Access

Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey

Key Points

This survey provides a structured taxonomy for assessing large audio-language models across four dimensions.
The four dimensions are general auditory awareness and processing, knowledge and reasoning, dialogue-oriented ability, and fairness, safety, and trustworthiness.
The survey finds that existing LALM benchmarks are fragmented and lack a structured taxonomy.
It highlights open challenges in the field and points to promising directions for future research on LALM evaluation.

Abstract

With advancements in large audio-language models (LALMs), which enhance large language models (LLMs) with auditory capabilities, these models are expected to demonstrate universal proficiency across various auditory tasks. While numerous benchmarks have emerged to assess LALMs' performance, they remain fragmented and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive survey and propose a systematic taxonomy for LALM evaluations, categorizing them into four dimensions based on their objectives: (1) General Auditory Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented Ability, and (4) Fairness, Safety, and Trustworthiness. We provide detailed overviews within each category and highlight challenges in this field, offering insights into promising future directions. To the best of our knowledge, this is the first survey specifically focused on the evaluations of LALMs, providing clear guidelines for the community. We will release the collection of the surveyed papers and actively maintain it to support ongoing advancements in the field.

Yang et al. studied this question.

synapsesocial.com/papers/68f5c338e2d8b12842645c63 https://doi.org/10.48550/arxiv.2505.15957
