What type of study is this?

This is an experimental study.

September 30, 2025 | Open Access

A text image dual conditional stable diffusion model for oracle bone inscription decipherment

Key Points

Decipherment accuracy improves by 11% using the proposed model integrating images and text.
The dual-conditional Stable Diffusion model conditions reverse diffusion on both OBI images and modern Chinese text, improving character generation accuracy.
Low-rank adaptation reduces training costs while maintaining high generative quality.
Replacing CLIP with Chinese-CLIP enhances cross-modal semantic consistency for better results.

Abstract

Oracle Bone Inscriptions (OBI), the earliest systematic writing in China, are crucial for understanding early Chinese civilization. However, many inscriptions remain undeciphered due to limited data, complex glyphs, and weak semantic consistency. Although Generative Adversarial Networks (GANs) and diffusion models have introduced new possibilities to the field, most existing methods focus primarily on visual features and lack semantic integration. To address this challenge, we propose DCSD-OBI, a Dual-Conditional Stable Diffusion model that integrates OBI images and modern Chinese text during reverse diffusion to jointly learn structural and semantic features, thereby improving character generation accuracy. To reduce training cost while preserving generative quality, we adopt Low-Rank Adaptation (LoRA) to fine-tune only the U-Net's cross-attention modules. Furthermore, we replace CLIP with Chinese-CLIP, a version tailored for the Chinese language, to improve cross-modal semantic consistency. Experimental results show that DCSD-OBI improves decipherment accuracy by 11%, highlighting its effectiveness and potential for advancing OBI research.
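
The abstract does not spell out how LoRA lowers training cost, so the following is a minimal sketch of the general technique, not the authors' implementation. It assumes a single frozen weight matrix standing in for one cross-attention projection. The dimensions, the names `down` and `up`, and the scaling value `alpha` are hypothetical choices for illustration. The frozen weight is left untouched, and only two small matrices of rank `r` are trained.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w, down, up, alpha):
    """Compute x @ (w + (alpha / r) * down @ up) without forming the full update.

    x: (n, d_in); w: (d_in, d_out), frozen;
    down: (d_in, r) and up: (r, d_out), the only trainable matrices.
    """
    r = len(up)
    scale = alpha / r
    base = matmul(x, w)                     # frozen pretrained path
    low_rank = matmul(matmul(x, down), up)  # trainable low-rank path
    return [[p + scale * q for p, q in zip(b_row, l_row)]
            for b_row, l_row in zip(base, low_rank)]

def trainable_params(d_in, d_out, r):
    """Number of parameters in the two low-rank factors."""
    return r * (d_in + d_out)

# Hypothetical toy sizes; real cross-attention projections are far larger.
d_in, d_out, r, alpha = 8, 6, 2, 4.0
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
down = [[rng.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]
up = [[0.0] * d_out for _ in range(r)]  # zero init: the adapter starts as a no-op
x = [[rng.gauss(0, 1) for _ in range(d_in)]]

y = lora_forward(x, w, down, up, alpha)
full = d_in * d_out
print(f"trainable: {trainable_params(d_in, d_out, r)} of {full} weights")
```

Because `up` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only moves the low-rank factors. The savings grow with layer size: the factor count scales with `r * (d_in + d_out)` rather than `d_in * d_out`.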
