MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds
By: Xiangzuo Wu, Chengwei Ren, Jun Zhou, and more
Multi-view inverse rendering aims to recover geometry, materials, and illumination consistently across multiple viewpoints. When applied to multi-view images, existing single-view approaches often ignore cross-view relationships, leading to inconsistent results. In contrast, multi-view optimization methods rely on slow differentiable rendering and per-scene refinement, making them computationally expensive and hard to scale. To address these limitations, we introduce a feed-forward multi-view inverse rendering framework that directly predicts spatially varying albedo, metallic, roughness, diffuse shading, and surface normals from sequences of RGB images. By alternating attention within and across views, our model captures both intra-view long-range lighting interactions and inter-view material consistency, enabling coherent scene-level reasoning within a single forward pass. Because real-world training data is scarce, models trained on existing synthetic datasets often struggle to generalize to real-world scenes. To overcome this limitation, we propose a consistency-based finetuning strategy that leverages unlabeled real-world videos to enhance both multi-view coherence and robustness under in-the-wild conditions. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance in terms of multi-view consistency, material and normal estimation quality, and generalization to real-world imagery.
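The alternating-attention idea lends itself to a short sketch. The PyTorch block below is a minimal illustration, not the authors' implementation: the module structure, token layout (batch, views, tokens per view, channels), and names are assumptions. Tokens first attend within their own view to capture long-range lighting interactions, then each spatial token attends to its counterparts in the other views to encourage material consistency.

```python
import torch
import torch.nn as nn

class AlternatingAttentionBlock(nn.Module):
    """Hypothetical sketch: intra-view attention followed by inter-view attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.intra_norm = nn.LayerNorm(dim)
        self.intra_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.inter_norm = nn.LayerNorm(dim)
        self.inter_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, N, C) = batch, views, tokens per view, channels
        B, V, N, C = x.shape

        # Intra-view attention: every token attends to all tokens of its own
        # view, modeling long-range lighting interactions within that image.
        h = x.reshape(B * V, N, C)
        n = self.intra_norm(h)
        h = h + self.intra_attn(n, n, n)[0]

        # Inter-view attention: each spatial token attends to the tokens at
        # the same position in the other views, tying materials together.
        h = h.reshape(B, V, N, C).transpose(1, 2).reshape(B * N, V, C)
        n = self.inter_norm(h)
        h = h + self.inter_attn(n, n, n)[0]

        return h.reshape(B, N, V, C).transpose(1, 2)  # back to (B, V, N, C)
```

A stack of such blocks over per-view image tokens could then feed separate decoding heads for albedo, metallic, roughness, diffuse shading, and normals, all produced in one forward pass.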
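The consistency-based finetuning on unlabeled video can likewise be sketched as a self-supervised objective: intrinsic predictions at corresponding pixels of two frames should agree. The abstract does not give the exact loss; the use of optical flow for correspondence and the L1 penalty below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def consistency_loss(pred_a, pred_b, flow_ab, valid_mask):
    """Hypothetical cross-frame consistency term.

    pred_a, pred_b: (B, C, H, W) intrinsic maps (e.g. albedo) for frames a, b.
    flow_ab:        (B, 2, H, W) optical flow from frame a to frame b.
    valid_mask:     (B, 1, H, W) 1 where the correspondence is reliable.
    """
    B, _, H, W = pred_a.shape
    # Pixel coordinates of frame a, shifted by the flow into frame b.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=pred_a.device, dtype=pred_a.dtype),
        torch.arange(W, device=pred_a.device, dtype=pred_a.dtype),
        indexing="ij",
    )
    x_tgt = xs + flow_ab[:, 0]
    y_tgt = ys + flow_ab[:, 1]
    # Normalize to [-1, 1] as grid_sample expects, then warp b's prediction
    # into frame a's pixel grid.
    grid = torch.stack(
        (2.0 * x_tgt / (W - 1) - 1.0, 2.0 * y_tgt / (H - 1) - 1.0), dim=-1
    )
    pred_b_warped = F.grid_sample(pred_b, grid, align_corners=True)
    # Penalize disagreement only where correspondences are valid.
    return (valid_mask * (pred_a - pred_b_warped).abs()).mean()
```

Summed over intrinsic channels and frame pairs of a real video, a term like this would supervise the network without any ground-truth materials, which is the point of the proposed finetuning.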
Similar Papers
Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
Computer Vision and Pattern Recognition
Fuses per-view intrinsic image estimates into consistent 3D material reconstructions across views.
Inverse Image-Based Rendering for Light Field Generation from Single Images
Computer Vision and Pattern Recognition
Generates a light field of novel viewpoints from a single image via inverse image-based rendering.
FROMAT: Multiview Material Appearance Transfer via Few-Shot Self-Attention Adaptation
Computer Vision and Pattern Recognition
Transfers material appearance across multi-view images by adapting self-attention from a few examples.