arXiv Analytics

arXiv:2406.20081 [cs.CV]

Segment Anything without Supervision

XuDong Wang, Jingfeng Yang, Trevor Darrell

Published 2024-06-28 (Version 1)

The Segment Anything Model (SAM) requires labor-intensive data labeling. We present Unsupervised SAM (UnSAM) for promptable and automatic whole-image segmentation that does not require human annotations. UnSAM utilizes a divide-and-conquer strategy to "discover" the hierarchical structure of visual scenes. We first leverage top-down clustering methods to partition an unlabeled image into instance/semantic level segments. For all pixels within a segment, a bottom-up clustering method is employed to iteratively merge them into larger groups, thereby forming a hierarchical structure. These unsupervised multi-granular masks are then utilized to supervise model training. Evaluated across seven popular datasets, UnSAM achieves competitive results with its supervised counterpart SAM, and surpasses the previous state-of-the-art in unsupervised segmentation by 11% in terms of AR. Moreover, we show that supervised SAM can also benefit from our self-supervised labels. By integrating our unsupervised pseudo masks into SA-1B's ground-truth masks and training UnSAM with only 1% of SA-1B, a lightly semi-supervised UnSAM can often segment entities overlooked by supervised SAM, exceeding SAM's AR by over 6.7% and AP by 3.9% on SA-1B.
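
The abstract describes a divide-and-conquer pipeline: a top-down clustering step splits an unlabeled image into coarse segments, and a bottom-up step inside each segment iteratively merges pixel groups into a hierarchy whose levels become multi-granularity pseudo masks. The toy Python sketch below only illustrates that idea under simplifying assumptions that are not from the paper: raw pixel colors stand in for learned per-pixel features, plain k-means stands in for the top-down clustering, and mean-feature agglomerative merging stands in for the bottom-up step; all function names and parameters are hypothetical.

import numpy as np


def topdown_partition(features, n_segments=4, iters=10, seed=0):
    # "Divide" step: coarse top-down partition of pixels.
    # Plain k-means on per-pixel features; a stand-in for the paper's
    # top-down clustering method, which is not reproduced here.
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), n_segments, replace=False)].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for k in range(n_segments):
            if np.any(labels == k):
                centers[k] = features[labels == k].mean(axis=0)
    return labels


def bottomup_hierarchy(seg_features, n_init=8, seed=0):
    # "Conquer" step: inside one coarse segment, start from fine groups and
    # iteratively merge the two most similar groups, keeping every
    # intermediate grouping as a candidate mask level (finest -> coarsest).
    n_init = min(n_init, len(seg_features))
    fine = topdown_partition(seg_features, n_segments=n_init, seed=seed)
    groups = [np.flatnonzero(fine == k).tolist() for k in range(n_init)]
    groups = [g for g in groups if g]
    levels = [[list(g) for g in groups]]
    while len(groups) > 1:
        means = np.stack([seg_features[g].mean(axis=0) for g in groups])
        dist = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        i, j = np.unravel_index(dist.argmin(), dist.shape)
        merged = groups[i] + groups[j]
        groups = [g for t, g in enumerate(groups) if t not in (i, j)] + [merged]
        levels.append([list(g) for g in groups])
    return levels


if __name__ == "__main__":
    h, w = 16, 16
    image = np.random.rand(h, w, 3)          # toy unlabeled "image"
    feats = image.reshape(-1, 3)             # per-pixel features (raw colors as a stand-in)

    coarse = topdown_partition(feats, n_segments=3)
    pseudo_masks = []
    for k in range(3):
        idx = np.flatnonzero(coarse == k)    # pixels of one coarse segment
        if len(idx) == 0:
            continue
        for level in bottomup_hierarchy(feats[idx]):
            for group in level:
                mask = np.zeros(h * w, dtype=bool)
                mask[idx[group]] = True      # map local indices back to image pixels
                pseudo_masks.append(mask.reshape(h, w))
    print(f"generated {len(pseudo_masks)} multi-granularity pseudo masks")

In the paper these multi-granularity masks serve as pseudo labels for training the segmentation model; the sketch stops at mask generation and omits feature extraction and model training entirely.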
