Latent Domain Prompt Learning for Vision-Language Models
Paper in proceedings, 2026

The objective of domain generalization (DG) is to make models robust to domain shift. DG is crucial for deploying vision-language models (VLMs) in real-world applications, yet most existing methods rely on domain labels that may be unavailable and are often ambiguous. We instead study the DG setting in which models must generalize well without access to explicit domain labels. Our key idea is to represent an unseen target domain as a combination of latent domains automatically discovered from the training data, enabling the model to adaptively transfer knowledge across domains. To realize this, we perform latent domain clustering on image features and fuse domain-specific text features according to the similarity between the input image and each latent domain. Experiments on four benchmarks show that this strategy yields consistent gains over VLM-based baselines and provides new insights into improving robustness under domain shift.
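The two steps named in the abstract (clustering image features into latent domains, then fusing domain-specific text features by image-to-domain similarity) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the similarity measure, or the fusion rule, so everything concrete here is an assumption made for illustration: plain k-means for clustering, cosine similarity to cluster centroids, and softmax-weighted averaging with a hypothetical `temperature` parameter. The random arrays stand in for encoder outputs; the function names are hypothetical and not taken from the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in for latent domain clustering)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=k, replace=False)
    centroids = features[idx].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def fuse_text_features(image_feat, centroids, domain_text_feats, temperature=0.1):
    """Weight each latent domain's text features by image-to-domain similarity.

    image_feat: (d,), centroids: (k, d), domain_text_feats: (k, classes, d).
    The softmax weighting and temperature are assumptions, not the paper's rule.
    """
    sims = l2_normalize(centroids) @ l2_normalize(image_feat)  # (k,)
    logits = sims / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    fused = np.tensordot(weights, domain_text_feats, axes=1)  # (classes, d)
    return weights, l2_normalize(fused)


def classify(image_feat, fused_text_feats):
    """Pick the class whose fused text feature best matches the image."""
    return int((fused_text_feats @ l2_normalize(image_feat)).argmax())


# Synthetic stand-ins for image-encoder and text-encoder outputs.
rng = np.random.default_rng(0)
train_feats = l2_normalize(rng.normal(size=(200, 16)))
centroids = kmeans(train_feats, k=3)
domain_text_feats = l2_normalize(rng.normal(size=(3, 5, 16)))

weights, fused = fuse_text_features(train_feats[0], centroids, domain_text_feats)
pred = classify(train_feats[0], fused)
```

In this sketch an image from an unseen domain is never assigned to a single latent domain; it receives a soft weight for each one, which is one way to read "a combination of latent domains" in the abstract.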

representation learning

latent domain clustering

vision-language model

domain generalization

prompt learning

Author

Zhixing Li

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

Arsham Gholamzadeh Khoee

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

Yinan Yu

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

15206149 (ISSN)


979-8-3315-6701-9 (ISBN)

ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Barcelona, Spain

Subject Categories (SSIF 2025)

Computer graphics and computer vision

Computer Sciences

DOI

10.1109/ICASSP55912.2026.11464001

More information

Latest update

5/20/2026