Dual-Process Image Generation
By: Grace Luo, Jonathan Granskog, Aleksander Holynski, and more
Potential Business Impact:
Teaches image generators new control tasks, like drawing styles, in minutes.
Prior methods for controlling image generation are limited in their ability to be taught new tasks. In contrast, vision-language models, or VLMs, can learn tasks in-context and produce the correct outputs for a given input. We propose a dual-process distillation scheme that allows feed-forward image generators to learn new tasks from deliberative VLMs. Our scheme uses a VLM to rate the generated images and backpropagates the resulting gradient to update the weights of the image generator. Our general framework enables a wide variety of new control tasks through the same text-and-image interface. We showcase a handful of applications of this technique for different types of control signals, such as commonsense inferences and visual prompts. With our method, users can implement multimodal controls for properties such as color palette, line weight, horizon position, and relative depth within a matter of minutes. Project page: https://dual-process.github.io.
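To make the training loop concrete, here is a minimal sketch of the idea described above: a frozen VLM-style critic scores each generated image, and that score's gradient flows back through the critic into the generator's weights. The `TinyGenerator` and `TinyVLMScorer` modules below are hypothetical stand-ins, not the paper's actual models, which use a real feed-forward image generator and a real VLM.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in for a feed-forward image generator (hypothetical, for illustration)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 3 * 8 * 8))

    def forward(self, z):
        return self.net(z).view(-1, 3, 8, 8)

class TinyVLMScorer(nn.Module):
    """Stand-in for a frozen VLM head that rates image/instruction agreement."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(3 * 8 * 8, 1)

    def forward(self, image):
        return self.head(image.flatten(1))  # higher score = better agreement

generator = TinyGenerator()
scorer = TinyVLMScorer()
for p in scorer.parameters():
    p.requires_grad_(False)  # the deliberative critic stays frozen

opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
for step in range(100):
    z = torch.randn(4, 16)
    image = generator(z)          # fast, feed-forward generation
    score = scorer(image).mean()  # critic rates the generated images
    loss = -score                 # maximize the critic's rating
    opt.zero_grad()
    loss.backward()               # gradient flows through the frozen critic...
    opt.step()                    # ...into the generator's weights
```

In practice the critic's rating would be conditioned on a text instruction (and optionally a visual prompt), which is what lets the same loop implement many different controls, such as color palette or horizon position.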
Similar Papers
Data Factory with Minimal Human Effort Using VLMs
CV and Pattern Recognition
Makes computers create realistic pictures from words.
Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning
Robotics
Robots learn to do tasks better by thinking.
HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs
CV and Pattern Recognition
Makes AI tell the truth, not make things up.