Generative Artificial Intelligence (AI) has emerged as a transformative force, capable of synthesizing high-quality, coherent, and semantically rich multimodal content across text, vision, audio, video, and 3D/XR environments. Despite the explosive growth of research, existing surveys remain fragmented: they often focus narrowly on a single modality, provide only qualitative or scattered evaluations, and lack a unified taxonomy that systematically organizes techniques. More critically, most reviews neither aggregate empirical results from the literature nor contribute experimental validations, leaving readers without objective benchmarks for comparison. This survey addresses these gaps by introducing a hierarchical taxonomy that unifies generative families and multimodal applications into a structured roadmap. We provide a dual-layered evaluation that combines (1) empirical analysis, in which we aggregate and average reported results from a wide range of peer-reviewed studies, and (2) experimental validation, in which we run new benchmarks on representative datasets to confirm and extend prior findings. This evidence-based methodology ensures fair, metric-aligned, and contextually meaningful comparisons across techniques. In addition, we introduce a technique-specific, metric-aligned evaluation framework in which each surveyed article is assessed using metrics tailored to its domain, with the results and their justifications summarized in detailed tables. The quantitative aggregation establishes the first large-scale, consensus-driven benchmarks, and the new experimental results supply fresh comparative baselines. By merging empirical evidence, experimental validation, and observational insights, we deliver a balanced perspective that highlights strengths, limitations, and trade-offs across generative families. Finally, we synthesize actionable insights and recommendations, guiding practitioners toward the most suitable methods for specific contexts (e.g., GANs for sharp, low-latency tasks).
Kamal Taha (Thu,) studied this question.