Beyond Single Prompts: Synergistic Fusion and Arrangement for VICL

Published: January 15, 2026 | arXiv ID: 2601.10117v1

By: Wenwen Liao, Jianbo Yu, Yuansong Wang and more

Vision In-Context Learning (VICL) enables inpainting models to quickly adapt to new visual tasks from only a few prompts. However, existing methods suffer from two key issues: (1) selecting only the most similar prompt discards complementary cues from other high-quality prompts; and (2) they fail to exploit the structured information implied by different prompt arrangements. We propose an end-to-end VICL framework to overcome these limitations. First, an adaptive Fusion Module aggregates critical patterns and annotations from multiple prompts to form more precise contextual prompts. Second, we introduce arrangement-specific lightweight MLPs that decouple layout priors from the core model while minimally affecting the overall model. In addition, a bidirectional fine-tuning mechanism swaps the roles of query and prompt, encouraging the model to reconstruct the original prompt from the fused context and thus enhancing collaboration between the fusion module and the inpainting model. Experiments on foreground segmentation, single-object detection, and image colorization demonstrate superior results and strong cross-task generalization.
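
The abstract does not say how the Fusion Module combines prompts, so the following is only a toy illustration of the general idea of fusing several prompts rather than keeping the single most similar one. It assumes each prompt and the query are already encoded as plain feature vectors, and it swaps in a softmax over cosine similarities as the weighting rule. The names `cosine`, `fusion_weights`, `fuse_prompts`, and the `temperature` parameter are hypothetical and do not come from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fusion_weights(query, prompts, temperature=0.1):
    """Softmax over query-prompt similarities (an assumed weighting rule)."""
    scores = [cosine(query, p) / temperature for p in prompts]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_prompts(query, prompts, temperature=0.1):
    """Weighted average of prompt features; every prompt contributes."""
    weights = fusion_weights(query, prompts, temperature)
    return [
        sum(w * p[i] for w, p in zip(weights, prompts))
        for i in range(len(query))
    ]


if __name__ == "__main__":
    query = [1.0, 0.0]
    prompts = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    print(fusion_weights(query, prompts))
    print(fuse_prompts(query, prompts))
```

A lower `temperature` pushes the weights toward the single most similar prompt, which recovers the top-1 selection the abstract criticizes; a higher value lets the other prompts contribute more. The actual module is described as adaptive and trained end-to-end, which this fixed rule does not capture.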

Exploring Task-Level Optimal Prompts for Visual In-Context Learning

Artificial Intelligence

Teaches computers to learn faster with fewer examples.

15 Jan 2025 0

91%

Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning

CV and Pattern Recognition

Helps computers learn tasks by looking at examples.

30 Apr 2025 1

91%

T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs

CV and Pattern Recognition

Helps AI understand different picture tasks together.

20 Nov 2025 0

