Domain Generalization (DG) seeks to develop models that perform well on unseen target domains by learning domain-invariant representations. Recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have shown strong potential for enhancing DG through prompt tuning. However, existing VFM-based prompt tuning methods often focus on task-specific adaptation rather than disentangling domain invariant features, leaving cross-domain generalization insufficiently explored. In this paper, we address this challenge by fully leveraging the controllable and flexible language prompt in VFMs. Observing that the text modality is inherently rich in semantics and easier to disentangle, we propose a novel frame work termed Prompt Disentanglement via Language Guidance and Representation Alignment (PADG). PADG first employs a large language model (LLM) to disentangle textual prompts into domain-invariant and domain-specific components, which then guide the learning of domain-invariant visual representations. To complement the limitations of text-only guidance, we further introduce the Worst Explicit Representation Alignment (WERA) module, which enhances visual invariance by simulating bounded domain shifts through learnable stylization prompts and aligning representations between original and perturbed samples. Extensive experiments on mainstream DG benchmarks, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that PADG consistently outperforms existing state of-the-art methods, validating its effectiveness in robust domain invariant representation learning. The code is available at: https://anonymous.4open.science/r/paper-5403/.
Cheng et al. (Thu,) studied this question.