Text Prompt Augmentation for Zero-shot Out-of-Distribution Detection
Paper in proceedings, 2024

Out-of-distribution (OOD) detection has been extensively studied for the reliable deployment of deep-learning models. Despite great progress in this research direction, most works focus on discriminative classifiers and perform OOD detection based on single-modal representations that consist of either visual or textual features. Moreover, they rely on training with in-distribution (ID) data. The emergence of vision-language models (e.g., CLIP) makes it possible to perform zero-shot OOD detection by leveraging multi-modal feature embeddings and therefore to rely only on the labels defining the ID data. Several such approaches have been devised, but they either require a given OOD label set, which might deviate from real OOD data, or fine-tune CLIP, which potentially has to be repeated for different ID datasets. In this paper, we first adapt various OOD scores developed for discriminative classifiers to CLIP. We then propose an enhanced method named TAG, based on Text prompt AuGmentation, to amplify the separation between ID and OOD data; it is simple but effective and can be applied to various score functions. Its performance is demonstrated on the CIFAR-100 and large-scale ImageNet-1k OOD detection benchmarks. It consistently improves AUROC and FPR95 on CIFAR-100 across five commonly used architectures over four baseline OOD scores, with average AUROC and FPR95 improvements of 6.35% and 10.67%, respectively. The results for ImageNet-1k follow a similar, but less pronounced, pattern.
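To illustrate the zero-shot setting the abstract describes, the following is a minimal sketch of a CLIP-style OOD score: an image embedding is compared against text embeddings of the ID class labels, and the maximum softmax probability (MSP, one of the baseline scores adapted in the paper) serves as the ID-ness score. The embeddings, temperature, and helper names here are illustrative assumptions, not the paper's implementation; in practice the vectors would come from CLIP's image and text encoders, and TAG's specific prompt augmentation is described in the paper itself.

```python
import numpy as np

def softmax(logits, temperature=0.01):
    # numerically stable softmax with a CLIP-style low temperature (assumed value)
    z = logits / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def msp_ood_score(image_emb, class_text_embs, temperature=0.01):
    """Maximum softmax probability over cosine similarities.

    image_emb:       (d,) embedding of the test image
    class_text_embs: (num_classes, d) embeddings of ID label prompts
    Returns a score that is high for ID images and lower for OOD images.
    """
    sims = class_text_embs @ image_emb / (
        np.linalg.norm(class_text_embs, axis=1) * np.linalg.norm(image_emb)
    )
    return softmax(sims, temperature).max()

# Toy example with orthonormal "embeddings" (illustrative, not CLIP features):
# three ID classes in a 5-d space; an ID image aligns with class 0,
# an OOD image is orthogonal to every class prompt.
text_embs = np.eye(5)[:3]
id_image = np.eye(5)[0]
ood_image = np.eye(5)[3]

id_score = msp_ood_score(id_image, text_embs)
ood_score = msp_ood_score(ood_image, text_embs)
```

Thresholding this score separates ID from OOD inputs; prompt augmentation in the spirit of TAG would modify the text prompts fed to the text encoder so that the resulting similarity gap between ID and OOD images widens.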

Zero-shot out-of-distribution detection

Vision-language models

Authors

Xixi Liu

Chalmers, Electrical Engineering, Signal Processing and Medical Technology

Christopher Zach

Chalmers, Electrical Engineering, Signal Processing and Medical Technology

2024 European Conference on Computer Vision

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 15059-15147
9783031729324 (ISBN)

The 18th European Conference on Computer Vision (ECCV 2024), Milan, Italy.

Subject categories

Electrical Engineering and Electronics

DOI

10.1007/978-3-031-73232-4

More information

Last updated

2024-10-31