arXiv:2305.07304 [cs.CV]

CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

Ruixiang Jiang, Lingbo Liu, Changwen Chen

Published 2023-05-12 (Version 1)

Recent advances in visual-language models have shown remarkable zero-shot text-image matching ability that is transferable to downstream tasks such as object detection and segmentation. However, adapting these models for object counting, which involves estimating the number of objects in an image, remains a formidable challenge. In this study, we conduct the first exploration of transferring visual-language models for class-agnostic object counting. Specifically, we propose CLIP-Count, a novel pipeline that estimates density maps for open-vocabulary objects with text guidance in a zero-shot manner, without requiring any finetuning on specific object classes. To align the text embedding with dense image features, we introduce a patch-text contrastive loss that guides the model to learn informative patch-level image representations for dense prediction. Moreover, we design a hierarchical patch-text interaction module that propagates semantic information across different resolution levels of image features. By fully exploiting the rich image-text alignment knowledge of pretrained visual-language models, our method effectively generates high-quality density maps for objects of interest. Extensive experiments on the FSC-147, CARPK, and ShanghaiTech crowd counting datasets demonstrate that our proposed method achieves state-of-the-art accuracy and generalizability for zero-shot object counting. Project page: https://github.com/songrise/CLIP-Count
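The abstract does not spell out the patch-text contrastive loss, so the following PyTorch sketch is only an illustration of the general idea: pulling patch embeddings that contain the queried object toward the text embedding and pushing empty patches away. The tensor shapes, the density-based positive mask, and the function name are assumptions for exposition, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def patch_text_contrastive_loss(patch_feats, text_feat, density, temperature=0.07):
        # Hypothetical sketch, not the authors' exact loss.
        # patch_feats: (B, N, D) patch-level embeddings from the image encoder
        # text_feat:   (B, D)    CLIP text embedding of the class prompt
        # density:     (B, N)    ground-truth density pooled to patch resolution;
        #                        patches with density > 0 are treated as positives
        patch_feats = F.normalize(patch_feats, dim=-1)
        text_feat = F.normalize(text_feat, dim=-1)

        # Cosine similarity between every patch and the text embedding: (B, N)
        sim = torch.einsum("bnd,bd->bn", patch_feats, text_feat) / temperature

        # Patches containing the queried object are positives; empty ones are negatives.
        pos_mask = (density > 0).float()

        # InfoNCE-style objective: raise positive-patch similarity relative
        # to all patches in the image, averaged over positive patches.
        log_prob = F.log_softmax(sim, dim=-1)
        loss = -(log_prob * pos_mask).sum(dim=-1) / pos_mask.sum(dim=-1).clamp(min=1)
        return loss.mean()

Under this reading, the loss encourages exactly the patch-level image-text alignment the abstract describes, so that a density regression head can localize open-vocabulary objects from text alone.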
