What question did this study set out to answer?

The research aims to unify generative AI techniques across multiple modalities and provide a structured evaluation framework.

March 21, 2026 · Open Access

Generative AI for multimodal content: a survey with empirical and experimental evaluations

Key Points

Developed a hierarchical taxonomy for generative AI techniques.
Conducted empirical analysis by aggregating results from diverse peer-reviewed studies.
Performed experimental validation with fresh benchmarks on representative datasets.
Introduced a metric-aligned evaluation framework tailored to specific domains.
Established the first large-scale, consensus-driven benchmarks for generative AI techniques.
Provided new experimental results that validate prior findings with comparative baselines.
Identified strengths and limitations of various generative families, supporting informed decision-making.

Abstract

Generative Artificial Intelligence (AI) has emerged as a transformative force, capable of synthesizing high-quality, coherent, and semantically rich multimodal content across text, vision, audio, video, and 3D/XR environments. Despite the rapid growth of research, existing surveys remain fragmented: they often focus on a single modality, provide only qualitative or scattered evaluations, and lack a unified taxonomy that systematically organizes techniques. More critically, most reviews neither aggregate empirical results from the literature nor contribute experimental validation, leaving readers without objective benchmarks for comparison. This survey addresses these gaps by introducing a hierarchical taxonomy that unifies generative families and multimodal applications into a structured roadmap. We provide a dual-layered evaluation that combines (1) empirical analysis, in which we aggregate and average reported results from a wide range of peer-reviewed studies, and (2) experimental validation, in which we run fresh benchmarks on representative datasets to confirm and extend prior findings. This evidence-based methodology is designed to support fair, metric-aligned, and contextually meaningful comparisons across techniques. We also introduce a technique-specific, metric-aligned evaluation framework in which each surveyed article is assessed using metrics tailored to its domain, with results summarized in detailed, justification-rich tables. The quantitative aggregation of performance metrics from the literature establishes the first large-scale, consensus-driven benchmarks, and the new experimental results supply fresh comparative baselines. By combining empirical evidence, experimental validation, and observational insights, we deliver a balanced perspective that highlights strengths, limitations, and trade-offs across generative families. Finally, we synthesize actionable insights and recommendations, guiding practitioners toward the most suitable methods for specific contexts (e.g., GANs for sharp, low-latency tasks).
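
The empirical layer described in the abstract, aggregating and averaging results reported across studies, can be illustrated with a minimal sketch. This is not the authors' pipeline: the record layout, the function name `aggregate_reported`, and every number below are hypothetical placeholders chosen only to show the grouping-and-averaging step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder records: (study_id, generative_family, metric, reported_value).
# These values are invented for illustration and are NOT results from the survey.
reported = [
    ("study_a", "GAN", "FID", 10.0),
    ("study_b", "GAN", "FID", 14.0),
    ("study_c", "Diffusion", "FID", 6.0),
    ("study_d", "Diffusion", "FID", 8.0),
    ("study_e", "Diffusion", "CLIPScore", 0.30),
]


def aggregate_reported(records):
    """Average reported values per (family, metric) and keep the study count."""
    grouped = defaultdict(list)
    for _study, family, metric, value in records:
        grouped[(family, metric)].append(value)
    return {
        key: {"mean": mean(values), "n_studies": len(values)}
        for key, values in grouped.items()
    }


summary = aggregate_reported(reported)
for (family, metric), stats in sorted(summary.items()):
    print(f"{family:10s} {metric:10s} mean={stats['mean']:.2f} (n={stats['n_studies']})")
```

Keeping the study count next to each mean is a deliberate choice in this sketch: an average backed by a single study should be read differently from one backed by many, and values are only averaged within the same metric so that comparisons stay metric-aligned.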
