Score: 2

Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images

Published: November 10, 2025 | arXiv ID: 2511.07222v1

By: JiaKui Hu, Shanshan Zhao, Qing-Guo Chen and more

BigTech Affiliations: Alibaba

Potential Business Impact:

Builds 3D worlds from many pictures.

Business Areas:

Image Recognition, Data and Analytics, Software

This paper presents Omni-View, which extends unified multimodal understanding and generation to 3D scenes based on multiview images, exploring the principle that "generation facilitates understanding". Consisting of an understanding model, a texture module, and a geometry module, Omni-View jointly models scene understanding, novel view synthesis, and geometry estimation, enabling synergistic interaction between 3D scene understanding and generation tasks. By design, it leverages the spatiotemporal modeling capabilities of its texture module, which is responsible for appearance synthesis, alongside the explicit geometric constraints provided by its dedicated geometry module, thereby enriching the model's holistic understanding of 3D scenes. Trained with a two-stage strategy, Omni-View achieves a state-of-the-art score of 55.4 on VSI-Bench, outperforming existing specialized 3D understanding models, while also delivering strong performance in both novel view synthesis and 3D scene generation.

OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes

CV and Pattern Recognition

Creates realistic 3D worlds from 2D pictures.

30 Oct 2025 0

89%

OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving

CV and Pattern Recognition

Helps self-driving cars understand scenes like people.

24 Sep 2025 3

88%

OmniVGGT: Omni-Modality Driven Visual Geometry Grounded

CV and Pattern Recognition

Helps robots see and move better with more information.

13 Nov 2025 1


Country of Origin

🇨🇳 China

Page Count

17 pages
