What question did this study set out to answer?

The study aims to develop a robust method for realistic image fusion that adapts to varying contexts.

April 10, 2026

DreamFuse: Towards Realistic and Seamless Image Fusion Across Diverse Scenarios

Key Points

Proposed a pipeline for high-quality fusion data generation.
Curated a diverse cross-scene dataset for object integration and replacement.
Introduced DreamFuse, leveraging the Diffusion Transformer architecture.
Incorporated a Positional Affine mechanism for spatial adjustments.
Utilized Localized Direct Preference Optimization for refining the model.
Demonstrated superior image fusion quality compared to state-of-the-art methods.
Achieved coherent integration of foreground and background features.
Supported diverse text-driven fusion applications.

Abstract

Image fusion seeks to seamlessly integrate foreground objects with background scenes, producing realistic and harmonious fused images. While existing methods often insert objects directly, adaptive and interactive fusion, which requires contextual adaptation and foreground-background interplay, remains a challenging yet critical task. To address this, we first propose a pipeline for generating high-quality fusion data. By combining iterative in-context learning with existing tools, we curate a diverse cross-scene dataset supporting three core tasks: object integration, replacement, and attribute-referenced editing. Leveraging this, we introduce DreamFuse, a unified diffusion-based approach that jointly optimizes these capabilities. DreamFuse exploits the Diffusion Transformer (DiT) architecture, using its attention mechanism to extract and align foreground-background features for coherent fusion. For flexible control, we incorporate a Positional Affine mechanism, enabling precise spatial and scale adjustments while supporting diverse text-driven fusion. Furthermore, we employ Localized Direct Preference Optimization (L-DPO), refining the model via human feedback to enhance harmony and consistency. Extensive experimental results demonstrate DreamFuse's superiority over state-of-the-art approaches across multiple metrics.
