What question did this study set out to answer?

The aim is to develop an efficient framework for generating high-resolution Thangka artwork through improved spatial reasoning.

June 30, 2026 | Open Access

MMED: a Mamba-enhanced multi-scale diffusion model for efficient thangka image generation

Key Points

Introduced MMED, integrating Mamba Spatial Mixer for adaptive sparse grid-scanning.
Developed Dual-Stream Gated Mamba Cross-Attention for better semantic-spatial alignment.
Implemented Adaptive Parallel Mamba Residual block to enhance feature propagation.
Reduced training time by 35% and inference latency by 60%.
Achieved 16% improvement in FID and 31% increase in IS compared to baselines.
Validated performance on the CUB dataset, indicating strong generalization capabilities.

Abstract

Text-to-image generation for Thangka artwork requires high-resolution synthesis and precise semantic-spatial alignment. However, existing diffusion models suffer from high computational overhead and struggle with spatial reasoning at scale. This paper introduces MMED, a Mamba-enhanced multi-scale diffusion framework for efficient Thangka generation. First, the Mamba Spatial Mixer (MSM) replaces quadratic self-attention with adaptive sparse grid-scanning, achieving near-linear complexity while capturing long-range dependencies. Second, the Dual-Stream Gated Mamba Cross-Attention (DSG-MCA) module couples textual instructions with positional encodings for fine-grained semantic-spatial alignment. Third, the Adaptive Parallel Mamba Residual (APMR) block integrates convolution with state-space dynamics to improve feature propagation and training stability. Experiments on a curated Thangka dataset show that MMED reduces training time by 35% and inference latency by 60%, while improving FID by 16% and IS by 31% over strong baselines. Superior performance on the CUB dataset further supports the model's generalization capability, offering a new perspective on efficient cultural heritage generation.
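The complexity claim in the abstract can be illustrated with a toy example. The sketch below is not the MMED implementation and does not reproduce the adaptive sparse grid-scanning of the MSM; it only assumes a plain row-major scan and a scalar state-space recurrence to show why a scan needs one update per token while full self-attention scores every token pair. All function names and the values of `a` and `b` are hypothetical.

```python
# Illustrative sketch only, not the authors' method.
# Assumptions: row-major scan order, scalar state, fixed coefficients a and b.


def flatten_grid_row_major(grid):
    """Flatten a 2D feature grid into a 1D scan order, row by row."""
    return [value for row in grid for value in row]


def ssm_scan(tokens, a=0.9, b=0.1):
    """Apply the linear recurrence h_t = a * h_{t-1} + b * x_t.

    The loop visits each token once, so the cost is linear in len(tokens).
    """
    h = 0.0
    outputs = []
    for x in tokens:
        h = a * h + b * x
        outputs.append(h)
    return outputs


def attention_pair_count(n_tokens):
    """Token pairs scored by full self-attention: n^2."""
    return n_tokens * n_tokens


def scan_step_count(n_tokens):
    """Recurrence updates performed by a single scan: n."""
    return n_tokens


# A 2x2 grid with a single non-zero entry in the top-left corner.
grid = [[1.0, 0.0], [0.0, 0.0]]
tokens = flatten_grid_row_major(grid)
states = ssm_scan(tokens)

# The last state is still non-zero: the first token's signal has been
# carried along the whole scan, decaying by a factor of a at each step.
print(states)

# Operation counts for a 64x64 latent grid (4096 tokens).
n = 64 * 64
print(attention_pair_count(n), scan_step_count(n))
```

In this toy setting the carried state is what lets a distant token influence a later position without an explicit pairwise comparison. A real Mamba-style layer uses input-dependent, multi-dimensional state updates rather than the fixed scalars assumed here.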


Cite This Study

Hu et al. studied this question.

synapsesocial.com/papers/6a435c08759b888809a5297e https://doi.org/10.1038/s40494-026-02777-0
