{ "id": "2407.20664", "version": "v1", "published": "2024-07-30T08:59:05.000Z", "updated": "2024-07-30T08:59:05.000Z", "title": "3D-GRES: Generalized 3D Referring Expression Segmentation", "authors": [ "Changli Wu", "Yihang Liu", "Jiayi Ji", "Yiwei Ma", "Haowei Wang", "Gen Luo", "Henghui Ding", "Xiaoshuai Sun", "Rongrong Ji" ], "comment": "Accepted by ACM MM 2024 (Oral), Code: https://github.com/sosppxo/3D-GRES", "categories": [ "cs.CV" ], "abstract": "3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the versatility of the task. To overcome this limitation, we introduce Generalized 3D Referring Expression Segmentation (3D-GRES), which extends the capability to segment any number of instances based on natural language instructions. In addressing this broader task, we propose the Multi-Query Decoupled Interaction Network (MDIN), designed to break down multi-object segmentation tasks into simpler, individual segmentations. MDIN comprises two fundamental components: Text-driven Sparse Queries (TSQ) and Multi-object Decoupling Optimization (MDO). TSQ generates sparse point cloud features distributed over key targets as the initialization for queries. Meanwhile, MDO is tasked with assigning each target in multi-object scenarios to different queries while maintaining their semantic consistency. To adapt to this new task, we build a new dataset, namely Multi3DRes. Our comprehensive evaluations on this dataset demonstrate substantial enhancements over existing models, thus charting a new path for intricate multi-object 3D scene comprehension. The benchmark and code are available at https://github.com/sosppxo/3D-GRES.", "revisions": [ { "version": "v1", "updated": "2024-07-30T08:59:05.000Z" } ], "analyses": { "keywords": [ "generalized 3d referring expression segmentation", "sparse point cloud features", "multi-object 3d scene comprehension", "generates sparse point cloud" ], "tags": [ "github project" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }