Beyond Single Prompts: Synergistic Fusion and Arrangement for VICL
By: Wenwen Liao, Jianbo Yu, Yuansong Wang, and more
Vision In-Context Learning (VICL) enables inpainting models to adapt quickly to new visual tasks from only a few prompts. However, existing methods suffer from two key issues: (1) selecting only the single most similar prompt discards complementary cues from other high-quality prompts, and (2) the structured information implied by different prompt arrangements goes unexploited. We propose an end-to-end VICL framework that overcomes these limitations. First, an adaptive fusion module aggregates critical patterns and annotations from multiple prompts to form a more precise contextual prompt. Second, we introduce arrangement-specific lightweight MLPs that decouple layout priors from the core model while affecting the overall model only minimally. In addition, a bidirectional fine-tuning mechanism swaps the roles of query and prompt, encouraging the model to reconstruct the original prompt from the fused context and thereby strengthening the collaboration between the fusion module and the inpainting model. Experiments on foreground segmentation, single-object detection, and image colorization demonstrate superior results and strong cross-task generalization.
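The abstract does not give implementation details, but the two architectural ideas it names can be illustrated with a minimal PyTorch sketch: similarity-weighted fusion of several retrieved prompt embeddings into one contextual prompt, and a small residual MLP that injects an arrangement-specific layout prior. All names here (PromptFusion, ArrangementMLP, the width d and prompt count k) are hypothetical, not the authors' code.

```python
# A minimal sketch, assuming embeddings are already extracted; not the paper's
# actual architecture. PromptFusion, ArrangementMLP, d, and k are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptFusion(nn.Module):
    """Aggregate k candidate prompt embeddings into one fused contextual prompt."""
    def __init__(self, d: int):
        super().__init__()
        self.q_proj = nn.Linear(d, d)  # projects the query image embedding
        self.k_proj = nn.Linear(d, d)  # projects each candidate prompt embedding

    def forward(self, query: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
        # query: (B, d); prompts: (B, k, d) -- k retrieved high-quality prompts
        q = self.q_proj(query).unsqueeze(1)                              # (B, 1, d)
        k = self.k_proj(prompts)                                         # (B, k, d)
        attn = F.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)   # (B, k)
        return (attn.unsqueeze(-1) * prompts).sum(dim=1)                 # (B, d)

class ArrangementMLP(nn.Module):
    """Lightweight per-arrangement MLP adding a layout prior as a residual."""
    def __init__(self, d: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, hidden), nn.GELU(), nn.Linear(hidden, d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the core model minimally affected.
        return x + self.net(x)

# Usage: fuse k=4 prompts for 2 query embeddings of width d=512, then apply
# the MLP tied to one chosen grid arrangement (one MLP per arrangement).
fusion, layout = PromptFusion(512), ArrangementMLP(512)
fused = layout(fusion(torch.randn(2, 512), torch.randn(2, 4, 512)))
```

One MLP instance would be kept per prompt arrangement, so layout priors live in a few thousand extra parameters rather than in the frozen inpainting backbone; the bidirectional fine-tuning described in the abstract would then train the same modules with query and prompt roles exchanged.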
Similar Papers
Exploring Task-Level Optimal Prompts for Visual In-Context Learning
Artificial Intelligence
Teaches computers to learn faster with fewer examples.
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning
CV and Pattern Recognition
Helps computers learn tasks by looking at examples.
T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs
CV and Pattern Recognition
Helps AI understand different picture tasks together.