Sphinx: Efficiently Serving Novel View Synthesis using Regression-Guided Selective Refinement
By: Yuchen Xia, Souvik Kundu, Mosharaf Chowdhury, and more
Potential Business Impact:
Makes 3D scenes look real, super fast.
Novel View Synthesis (NVS) is the task of generating new images of a scene from viewpoints that were not part of the original input. Diffusion-based NVS can generate high-quality, temporally consistent images, but it remains computationally prohibitive. Conversely, regression-based NVS requires significantly less compute yet delivers suboptimal generation quality, leaving the design of a high-quality, inference-efficient NVS framework an open challenge. To close this gap, we present Sphinx, a training-free hybrid inference framework that achieves diffusion-level fidelity at significantly lower compute. Sphinx uses a fast regression-based initialization to guide and reduce the denoising workload of the diffusion model. It further integrates selective refinement with adaptive noise scheduling, allocating more compute to uncertain regions and frames. This lets Sphinx navigate the performance-quality trade-off flexibly, adapting to the latency and fidelity requirements of dynamically changing inference scenarios. Our evaluation shows that Sphinx achieves an average 1.8x speedup over diffusion model inference with negligible perceptual degradation of less than 5%, establishing a new Pareto frontier between quality and latency in NVS serving.
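To make the abstract's pipeline concrete, here is a minimal sketch of what regression-guided selective refinement could look like. It is not the authors' code: regression_render, estimate_uncertainty, sigma_for_step, and diffusion_denoise_step are hypothetical placeholders for a fast NVS regressor, an uncertainty heuristic, a noise schedule, and one reverse-diffusion step, and the thresholds are illustrative only.

```python
# Hypothetical sketch of regression-guided selective refinement (not the paper's implementation).
import numpy as np

def sphinx_style_inference(target_pose, num_steps=50, refine_fraction=0.3, tau=0.5):
    # 1) Fast regression-based initialization for the novel view.
    init = regression_render(target_pose)                # assumed H x W x 3 image in [0, 1]

    # 2) Per-pixel uncertainty decides where diffusion refinement is worth the compute.
    uncertainty = estimate_uncertainty(init)              # assumed H x W map in [0, 1]
    mask = (uncertainty > tau).astype(np.float32)[..., None]

    # 3) Adaptive noise schedule: start denoising from an intermediate timestep
    #    rather than pure noise, since the init already carries most of the signal.
    start_step = int(num_steps * refine_fraction)
    sigma = sigma_for_step(start_step)                     # noise level at that timestep
    x = init + sigma * np.random.randn(*init.shape) * mask

    # 4) Run only the remaining (shortened) reverse-diffusion steps, then blend the
    #    refined pixels back over the uncertain regions of the regression output.
    for t in range(start_step, 0, -1):
        x = diffusion_denoise_step(x, t, target_pose)
    return mask * x + (1.0 - mask) * init
```

In this reading, the speedup comes from two places: the reverse process starts partway through the schedule because the regression output is already close to the target, and only the high-uncertainty regions and frames are pushed through the denoiser at all.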
Similar Papers
Appreciate the View: A Task-Aware Evaluation Framework for Novel View Synthesis
CV and Pattern Recognition
Checks if computer-made pictures look real.
DT-NVS: Diffusion Transformers for Novel View Synthesis
CV and Pattern Recognition
Creates new pictures of a scene from one photo.
AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction
CV and Pattern Recognition
Creates realistic 3D objects from a single picture.