VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
By: Baolu Li, Yiming Zhang, Qinghe Wang and more
Potential Business Impact:
Makes any video effect with one example.
Visual effects (VFX) are crucial to the expressive power of digital media, yet their creation remains a major challenge for generative AI. Prevailing methods often rely on the one-LoRA-per-effect paradigm, which is resource-intensive and cannot generalize to unseen effects, limiting both scalability and creative use. To address this challenge, we introduce VFXMaster, the first unified, reference-based framework for VFX video generation. It recasts effect generation as an in-context learning task, enabling it to reproduce diverse dynamic effects from a reference video onto target content, and it generalizes well to unseen effect categories. Specifically, we design an in-context conditioning strategy that prompts the model with a reference example. An in-context attention mask decouples and injects the essential effect attributes, allowing a single unified model to imitate effects without information leakage. In addition, we propose an efficient one-shot effect adaptation mechanism that rapidly improves generalization to difficult unseen effects from a single user-provided video. Extensive experiments demonstrate that our method effectively imitates various categories of effects and generalizes to out-of-domain effects. To foster future research, we will release our code, models, and a comprehensive dataset to the community.
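The abstract does not spell out the layout of the in-context attention mask, so the following is only a minimal sketch of the general idea of a block-structured mask, not the authors' design. The group names (`ref_video`, `tgt_content`, `tgt_video`), the token counts, and the allowed attention pairs are all assumptions chosen for illustration: reference tokens see only themselves, while the target video being generated may read from both its own content and the reference effect.

```python
def build_in_context_mask(group_sizes, allowed):
    """Return mask[q][k]: True if query token q may attend to key token k.

    group_sizes: dict mapping a group name to its number of tokens
                 (insertion order defines the token order).
    allowed:     set of (query_group, key_group) pairs that may attend.
    """
    # Label every token position with the group it belongs to.
    token_group = [name for name, n in group_sizes.items() for _ in range(n)]
    return [[(gq, gk) in allowed for gk in token_group] for gq in token_group]


# Hypothetical token layout; real sequences would be far longer.
GROUPS = {"ref_video": 2, "tgt_content": 1, "tgt_video": 2}

# Hypothetical attention rules (an assumption, not taken from the paper).
ALLOWED = {
    ("ref_video", "ref_video"),      # reference attends only to itself
    ("tgt_content", "tgt_content"),  # target content stays isolated
    ("tgt_video", "tgt_video"),      # generated video attends to itself,
    ("tgt_video", "tgt_content"),    # to the target content,
    ("tgt_video", "ref_video"),      # and to the reference effect
}

mask = build_in_context_mask(GROUPS, ALLOWED)
for row in mask:
    print("".join("1" if v else "." for v in row))
```

In this toy layout, information flows one way, from the reference into the target video, which is one plausible reading of "without information leakage". A mask of this shape could be passed as a boolean attention mask to a standard attention implementation.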
Similar Papers
IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning
CV and Pattern Recognition
Adds cool effects to videos without changing the background.
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
CV and Pattern Recognition
Combines many movie effects in one place.