Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
By: NVIDIA: Hassan Abu Alhaija and more
Potential Business Impact:
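Generates realistic world simulations from spatial control inputs such as segmentation, depth, and edge maps, for uses such as robotics Sim2Real and autonomous vehicle data enrichment.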
We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge. In the design, the spatial conditional scheme is adaptive and customizable. It allows weighting different conditional inputs differently at different spatial locations. This enables highly controllable world generation and finds use in various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy to achieve real-time world generation with an NVIDIA GB200 NVL72 rack. To help accelerate research development in the field, we open-source our models and code at https://github.com/nvidia-cosmos/cosmos-transfer1.
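The idea of weighting different conditional inputs differently at different spatial locations can be illustrated with a small sketch. This is not the Cosmos-Transfer1 implementation: the function name `blend_controls`, the nested-list grid representation, and the rule of rescaling only where the weights sum to more than 1 are all assumptions made for illustration. In the real model the blended quantities would be learned control-branch features rather than raw scalars.

```python
from typing import Dict, List

Grid = List[List[float]]  # a tiny H x W map stored as nested lists


def blend_controls(features: Dict[str, Grid],
                   weights: Dict[str, Grid],
                   normalize: bool = True) -> Grid:
    """Combine per-modality feature maps using per-location weights.

    Hypothetical helper: each modality (e.g. "depth", "edge") has a
    feature map and a weight map of the same shape. Where the weights
    at a location sum to more than 1, they are rescaled to sum to 1
    (an assumed normalization rule, not taken from the paper).
    """
    names = list(features)
    height = len(features[names[0]])
    width = len(features[names[0]][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(weights[n][y][x] for n in names)
            scale = total if (normalize and total > 1.0) else 1.0
            mixed = sum(weights[n][y][x] * features[n][y][x] for n in names)
            out[y][x] = mixed / scale
    return out


# Toy 2 x 2 example: constant feature values per modality.
features = {
    "depth": [[2.0, 2.0], [2.0, 2.0]],
    "edge": [[4.0, 4.0], [4.0, 4.0]],
}
# Depth dominates top-left, edge dominates top-right,
# both share the bottom row.
weights = {
    "depth": [[1.0, 0.0], [0.5, 1.0]],
    "edge": [[0.0, 1.0], [0.5, 1.0]],
}
blended = blend_controls(features, weights)
print(blended)  # [[2.0, 4.0], [3.0, 3.0]]
```

In this toy setup the top-left location follows depth only, the top-right follows edge only, and the bottom row mixes both. The bottom-right location has weights summing to 2, so they are rescaled before mixing.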
Similar Papers
World Simulation with Video Foundation Models for Physical AI
CV and Pattern Recognition
Creates realistic worlds from text, images, or video.
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
CV and Pattern Recognition
Creates fake driving videos to train self-driving cars.
Dreamland: Controllable World Creation with Simulator and Generative Models
CV and Pattern Recognition
Makes computer worlds realistic and easy to change.