arXiv:2406.00345 [cs.CV]

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Zhi Zhou, Ming Yang, Jiang-Xin Shi, Lan-Zhe Guo, Yu-Feng Li

Published 2024-06-01 (Version 1)

Vision-language models (VLMs), such as CLIP, have demonstrated impressive zero-shot capabilities on various downstream tasks. Their performance can be further enhanced through few-shot prompt tuning methods. However, current studies evaluate the performance of learned prompts separately on base and new classes. This evaluation lacks practicality for real-world applications, since downstream tasks cannot determine in advance whether the data belongs to base or new classes. In this paper, we explore a problem setting called Open-world Prompt Tuning (OPT), which involves tuning prompts on base classes and evaluating on a combination of base and new classes. By introducing the Decomposed Prompt Tuning framework (DePT), we theoretically demonstrate that OPT can be solved by incorporating out-of-distribution detection into prompt tuning, thereby enhancing base-to-new discriminability. Based on DePT, we present a novel prompt tuning approach, namely Decomposed Context Optimization (DeCoOp), which introduces new-class detectors and sub-classifiers to further enhance both base-class and new-class discriminability. Experimental results on 11 benchmark datasets validate the effectiveness of DePT and demonstrate that DeCoOp outperforms current state-of-the-art methods, providing a significant 2% average accuracy improvement.
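The core routing idea the abstract describes — detect whether a sample belongs to the base classes before choosing a classifier — can be sketched as follows. This is a minimal illustration, not the paper's actual method: it uses a simple maximum-softmax-probability detector with a fixed threshold as the "new-class detector", whereas DeCoOp learns its detectors and sub-classifiers; all function names and the threshold value are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def open_world_predict(base_logits, zero_shot_logits, threshold=0.5):
    """Route each sample between two classifiers.

    base_logits:      scores from the prompt-tuned classifier over base classes
    zero_shot_logits: scores from a zero-shot classifier over new classes
    If the base classifier is confident (max softmax prob >= threshold), keep
    its base-class prediction; otherwise fall back to the new-class classifier.
    New-class labels are offset by the number of base classes.
    """
    n_base = base_logits.shape[-1]
    base_probs = softmax(base_logits)
    msp = base_probs.max(axis=-1)            # detector score per sample
    base_pred = base_probs.argmax(axis=-1)   # base-class prediction
    new_pred = zero_shot_logits.argmax(axis=-1) + n_base
    is_base = msp >= threshold
    return np.where(is_base, base_pred, new_pred), is_base

# Sample 1: confidently a base class; sample 2: ambiguous, routed to new classes.
base_logits = np.array([[5.0, 0.0, 0.0],
                        [1.0, 1.1, 0.9]])
zero_shot_logits = np.array([[0.1, 0.2],
                             [0.3, 0.9]])
preds, is_base = open_world_predict(base_logits, zero_shot_logits)
```

In the paper's setting the detector and the sub-classifiers are themselves learned during prompt tuning, which is what improves base-to-new discriminability over a fixed-threshold heuristic like the one above.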

Comments: Accepted by ICML 2024. Code is available at: https://wnjxyk.github.io/DeCoOp
Categories: cs.CV, cs.LG