Dual-Process Image Generation
By: Grace Luo, Jonathan Granskog, Aleksander Holynski, and more
Potential Business Impact:
Teaches image generators new control tasks, like drawing styles, in minutes.
Prior methods for controlling image generation are limited in their ability to be taught new tasks. In contrast, vision-language models, or VLMs, can learn tasks in-context and produce the correct outputs for a given input. We propose a dual-process distillation scheme that allows feed-forward image generators to learn new tasks from deliberative VLMs. Our scheme uses a VLM to rate the generated images and backpropagates the resulting gradient to update the weights of the image generator. Our general framework enables a wide variety of new control tasks through the same text-and-image interface. We showcase a handful of applications of this technique for different types of control signals, such as commonsense inferences and visual prompts. With our method, users can implement multimodal controls for properties such as color palette, line weight, horizon position, and relative depth within a matter of minutes. Project page: https://dual-process.github.io.
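To make the training loop concrete, here is a minimal sketch of the idea described above: a frozen VLM-style critic scores each generated image, and that score's gradient flows back through the critic into the generator's weights. The `TinyGenerator` and `TinyVLMScorer` modules below are hypothetical stand-ins, not the paper's actual models, which use a real feed-forward image generator and a real VLM.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in for a feed-forward image generator (hypothetical, for illustration)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 3 * 8 * 8))

    def forward(self, z):
        return self.net(z).view(-1, 3, 8, 8)

class TinyVLMScorer(nn.Module):
    """Stand-in for a frozen VLM head that rates image/instruction agreement."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(3 * 8 * 8, 1)

    def forward(self, image):
        return self.head(image.flatten(1))  # higher score = better agreement

generator = TinyGenerator()
scorer = TinyVLMScorer()
for p in scorer.parameters():
    p.requires_grad_(False)  # the deliberative critic stays frozen

opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
for step in range(100):
    z = torch.randn(4, 16)
    image = generator(z)          # fast, feed-forward generation
    score = scorer(image).mean()  # critic rates the generated images
    loss = -score                 # maximize the critic's rating
    opt.zero_grad()
    loss.backward()               # gradient flows through the frozen critic...
    opt.step()                    # ...into the generator's weights
```

In practice the critic's rating would be conditioned on a text instruction (and optionally a visual prompt), which is what lets the same loop implement many different controls, such as color palette or horizon position.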
Similar Papers
Data Factory with Minimal Human Effort Using VLMs
CV and Pattern Recognition
Makes computers create realistic pictures from words.
Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning
Robotics
Robots learn to do tasks better by thinking.
HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs
CV and Pattern Recognition
Makes AI tell the truth, not make things up.