Uni3DL: A unified model for 3D vision-language understanding
Li, X.
DOI: 10.1007/978-3-031-73337-6_5

Abstract/Summary: We present Uni3DL, a unified model for 3D Vision-Language understanding. Distinct from existing unified 3D vision-language models that mostly rely on projected multi-view images and support limited tasks, Uni3DL operates directly on point clouds and significantly broadens the spectrum of tasks in the 3D domain, encompassing both vision and vision-language tasks. At the core of Uni3DL, a query transformer is designed to learn task-agnostic semantic and mask outputs by attending to 3D visual features, and a task router is employed to selectively produce the task-specific outputs required for diverse tasks. With a unified architecture, our Uni3DL model enjoys seamless task decomposition and substantial parameter sharing across tasks. Uni3DL has been rigorously evaluated across diverse 3D vision-language understanding tasks, including semantic segmentation, object detection, instance segmentation, visual grounding, 3D captioning, and text-3D cross-modal retrieval. It demonstrates performance on par with or surpassing state-of-the-art (SOTA) task-specific models. We hope our benchmark and Uni3DL model will serve as a solid step toward easing future research on unified models in the realm of 3D vision-language understanding. Project page: https://uni3dl.github.io/.
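To make the "query transformer plus task router" idea in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch. It is not the Uni3DL implementation: all module names, head names, and dimensions are illustrative assumptions, showing only the general pattern of learnable queries cross-attending to 3D point features and a router dispatching the shared query embeddings to task-specific heads.

```python
# Hypothetical sketch of a query transformer + task router, inspired by the abstract.
# All names and shapes are assumptions for illustration, not the Uni3DL codebase.
import torch
import torch.nn as nn


class QueryTransformer(nn.Module):
    """Learnable queries cross-attend to per-point 3D features,
    producing task-agnostic query embeddings shared by all tasks."""

    def __init__(self, num_queries=100, dim=256, num_layers=6, num_heads=8):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        layer = nn.TransformerDecoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, point_features):  # point_features: (B, N_points, dim)
        batch_size = point_features.size(0)
        q = self.queries.weight.unsqueeze(0).expand(batch_size, -1, -1)  # (B, Q, dim)
        return self.decoder(q, point_features)  # (B, Q, dim)


class TaskRouter(nn.Module):
    """Routes the shared query embeddings to lightweight task-specific heads,
    so only the outputs needed for the requested task are produced."""

    def __init__(self, dim=256, num_classes=20, text_dim=512):
        super().__init__()
        self.heads = nn.ModuleDict({
            "semantic": nn.Linear(dim, num_classes),  # per-query class logits
            "mask": nn.Linear(dim, dim),              # mask embeddings (dot with point features)
            "grounding": nn.Linear(dim, text_dim),    # project queries into a text embedding space
        })

    def forward(self, query_embeds, task):
        return self.heads[task](query_embeds)


# Usage: one shared backbone output and one shared set of queries serve multiple tasks.
point_features = torch.randn(2, 4096, 256)          # placeholder 3D backbone features
queries = QueryTransformer()(point_features)          # (2, 100, 256), task-agnostic
router = TaskRouter()
sem_logits = router(queries, task="semantic")         # (2, 100, 20)
mask_embeds = router(queries, task="mask")            # (2, 100, 256)
```

Under this (assumed) design, parameter sharing comes from the backbone and query transformer being common to every task, while each head in the router stays small; a mask prediction would then be obtained by taking the dot product of `mask_embeds` with the per-point features.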