Text Prompt Augmentation for Zero-shot Out-of-Distribution Detection
Paper in proceedings, 2024

Out-of-distribution (OOD) detection has been extensively studied for the reliable deployment of deep-learning models. Despite great progress in this research direction, most works focus on discriminative classifiers and perform OOD detection based on single-modal representations that consist of either visual or textual features. Moreover, they rely on training with in-distribution (ID) data. The emergence of vision-language models (e.g., CLIP) makes it possible to perform zero-shot OOD detection by leveraging multi-modal feature embeddings and therefore to rely only on the labels defining the ID data. Several such approaches have been devised, but they either require a given OOD label set, which might deviate from real OOD data, or fine-tune CLIP, which potentially has to be repeated for different ID datasets. In this paper, we first adapt various OOD scores developed for discriminative classifiers to CLIP. We then propose an enhanced method named TAG, based on Text prompt AuGmentation, to amplify the separation between ID and OOD data; it is simple but effective and can be applied to various score functions. Its performance is demonstrated on the CIFAR-100 and large-scale ImageNet-1k OOD detection benchmarks. It consistently improves AUROC and FPR95 on CIFAR-100 across five commonly used architectures over four baseline OOD scores, with average AUROC and FPR95 improvements of 6.35% and 10.67%, respectively. The results for ImageNet-1k follow a similar, but less pronounced, pattern.
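To illustrate the zero-shot setting the abstract describes, the following is a minimal sketch of a CLIP-style OOD score: an image embedding is compared against text embeddings of the ID class labels, and the maximum softmax probability (MSP, one of the baseline scores adapted in the paper) serves as the ID-ness score. The embeddings, temperature, and helper names here are illustrative assumptions, not the paper's implementation; in practice the vectors would come from CLIP's image and text encoders, and TAG's specific prompt augmentation is described in the paper itself.

```python
import numpy as np

def softmax(logits, temperature=0.01):
    # numerically stable softmax with a CLIP-style low temperature (assumed value)
    z = logits / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def msp_ood_score(image_emb, class_text_embs, temperature=0.01):
    """Maximum softmax probability over cosine similarities.

    image_emb:       (d,) embedding of the test image
    class_text_embs: (num_classes, d) embeddings of ID label prompts
    Returns a score that is high for ID images and lower for OOD images.
    """
    sims = class_text_embs @ image_emb / (
        np.linalg.norm(class_text_embs, axis=1) * np.linalg.norm(image_emb)
    )
    return softmax(sims, temperature).max()

# Toy example with orthonormal "embeddings" (illustrative, not CLIP features):
# three ID classes in a 5-d space; an ID image aligns with class 0,
# an OOD image is orthogonal to every class prompt.
text_embs = np.eye(5)[:3]
id_image = np.eye(5)[0]
ood_image = np.eye(5)[3]

id_score = msp_ood_score(id_image, text_embs)
ood_score = msp_ood_score(ood_image, text_embs)
```

Thresholding this score separates ID from OOD inputs; prompt augmentation in the spirit of TAG would modify the text prompts fed to the text encoder so that the resulting similarity gap between ID and OOD images widens.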

Zero-shot out-of-distribution detection

Vision-language models

Authors

Xixi Liu

Chalmers, Electrical Engineering, Signal Processing and Medical Technology

Christopher Zach

Chalmers, Electrical Engineering, Signal Processing and Medical Technology

2024 European Conference on Computer Vision

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 15059-15147
9783031729324 (ISBN)

The 18th European Conference on Computer Vision (ECCV 2024), Milan, Italy.

Subject categories

Electrical Engineering and Electronics

DOI

10.1007/978-3-031-73232-4

More information

Last updated

2024-10-31