What question did this study set out to answer?

The study aims to improve how well models generalize to unseen domains by disentangling domain-invariant features from domain-specific ones.

February 8, 2026

Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

Key Points

The goal is better generalization across domains, achieved by disentangling domain-invariant features from domain-specific ones.
The authors propose a framework called Prompt Disentanglement via Language Guidance and Representation Alignment (PADG).
PADG uses a large language model (LLM) to separate textual prompts into domain-invariant and domain-specific parts.
The Worst Explicit Representation Alignment (WERA) module simulates bounded domain shifts and aligns the visual representations of original and perturbed samples.
The authors report consistent improvements over existing state-of-the-art domain generalization methods.
The evaluation covers the PACS, VLCS, OfficeHome, DomainNet, and TerraInc benchmarks.
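
To make the prompt-splitting idea above concrete, here is a minimal sketch of how a class description might be stored once it has been separated into a domain-invariant part and a domain-specific part. Everything in it is an assumption made for illustration: the `DisentangledPrompt` container, the two helper functions, and the hand-written example descriptions are hypothetical and are not taken from the PADG code, and no LLM or CLIP model is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisentangledPrompt:
    """Hypothetical container for one class description after splitting."""
    class_name: str
    invariant: str  # attributes expected to hold in every domain
    specific: str   # attributes tied to one domain (style, medium, ...)


def invariant_text(p: DisentangledPrompt) -> str:
    """Build a prompt from the domain-invariant part only."""
    return f"a {p.class_name}, {p.invariant}"


def full_text(p: DisentangledPrompt) -> str:
    """Build a prompt that also carries the domain-specific part."""
    return f"{invariant_text(p)}, {p.specific}"


# Hand-written splits for illustration; in the paper's setting an LLM produces them.
sketch_dog = DisentangledPrompt("dog", "four legs and a snout", "drawn as a pencil sketch")
photo_dog = DisentangledPrompt("dog", "four legs and a snout", "captured as a photo")

print(invariant_text(sketch_dog))
print(full_text(sketch_dog))
print(full_text(photo_dog))
```

In this toy setup the invariant prompt is identical for both domains while the full prompts differ, which is the property that makes the invariant text a plausible guide for domain-invariant visual features.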

Abstract

Domain Generalization (DG) seeks to develop models that perform well on unseen target domains by learning domain-invariant representations. Recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have shown strong potential for enhancing DG through prompt tuning. However, existing VFM-based prompt tuning methods often focus on task-specific adaptation rather than disentangling domain-invariant features, leaving cross-domain generalization insufficiently explored. In this paper, we address this challenge by fully leveraging the controllable and flexible language prompt in VFMs. Observing that the text modality is inherently rich in semantics and easier to disentangle, we propose a novel framework termed Prompt Disentanglement via Language Guidance and Representation Alignment (PADG). PADG first employs a large language model (LLM) to disentangle textual prompts into domain-invariant and domain-specific components, which then guide the learning of domain-invariant visual representations. To compensate for the limitations of text-only guidance, we further introduce the Worst Explicit Representation Alignment (WERA) module, which enhances visual invariance by simulating bounded domain shifts through learnable stylization prompts and aligning representations between original and perturbed samples. Extensive experiments on mainstream DG benchmarks, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that PADG consistently outperforms existing state-of-the-art methods, validating its effectiveness in robust domain-invariant representation learning. The code is available at: https://anonymous.4open.science/r/paper-5403/.
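
The abstract does not spell out how WERA is formulated, so the following is only a toy sketch of the general pattern it describes: search for a bounded perturbation that makes the representation of a sample move as far as possible, then measure the gap between the original and perturbed representations so that training could shrink it. The assumptions are substantial: `encode` is a made-up stand-in for a visual encoder, the perturbation is a plain additive vector rather than a learnable stylization prompt, the bound is an infinity-norm radius `eps`, and the search uses finite-difference sign-gradient ascent. None of these choices is claimed to match the paper.

```python
import math


def encode(x):
    """Toy stand-in for a visual encoder: fixed linear map followed by tanh."""
    weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def alignment_loss(x, delta):
    """Squared distance between features of the original and perturbed input."""
    clean = encode(x)
    shifted = encode([v + d for v, d in zip(x, delta)])
    return sum((a - b) ** 2 for a, b in zip(clean, shifted))


def worst_case_delta(x, eps=0.1, steps=20, lr=0.02, h=1e-4):
    """Projected sign-gradient ascent on the loss, keeping |delta_i| <= eps."""
    # Start slightly away from zero, because delta = 0 is a stationary point.
    delta = [eps / 10.0] * len(x)
    for _ in range(steps):
        grad = []
        for i in range(len(delta)):
            up = delta[:]
            up[i] += h
            down = delta[:]
            down[i] -= h
            # Central finite difference, so any encoder can be plugged in.
            grad.append((alignment_loss(x, up) - alignment_loss(x, down)) / (2 * h))
        # Step uphill, then clip back into the allowed box (the "bounded" shift).
        delta = [
            max(-eps, min(eps, d + lr * math.copysign(1.0, g)))
            for d, g in zip(delta, grad)
        ]
    return delta


x = [1.0, 0.5, -0.3]
delta = worst_case_delta(x)
print("worst-case perturbation:", delta)
print("alignment loss to minimize:", alignment_loss(x, delta))
```

In a real training loop the encoder parameters would then be updated to reduce this loss, giving a min-max scheme: the inner step looks for the hardest shift inside the bound, and the outer step makes the representation insensitive to it.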
