A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task

Published: April 24, 2025 | arXiv ID: 2504.17547v1

By: Jiaqi Deng, Zonghan Wu, Huan Huo and more

Potential Business Impact:

Helps computers answer questions using pictures and facts.

Business Areas:

Computer Vision Hardware, Software

Knowledge-based Vision Question Answering (KB-VQA) extends general Vision Question Answering (VQA) by requiring not only an understanding of visual and textual inputs but also an extensive range of knowledge, enabling significant advancements across various real-world applications. KB-VQA introduces unique challenges, including the alignment of heterogeneous information from diverse modalities and sources, the retrieval of relevant knowledge from noisy or large-scale repositories, and the execution of complex reasoning to infer answers from the combined context. With the advancement of Large Language Models (LLMs), KB-VQA systems have also undergone a notable transformation, in which LLMs serve as powerful knowledge repositories, retrieval-augmented generators, and strong reasoners. Despite substantial progress, no comprehensive survey currently exists that systematically organizes and reviews the existing KB-VQA methods. This survey aims to fill this gap by establishing a structured taxonomy of KB-VQA approaches that categorizes the systems into three main stages: knowledge representation, knowledge retrieval, and knowledge reasoning. By exploring various knowledge integration techniques and identifying persistent challenges, this work also outlines promising future research directions, providing a foundation for advancing KB-VQA models and their applications.

Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering

CV and Pattern Recognition

Helps computers answer questions using pictures and facts.

5 Apr 2025 2

90%

iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question Answering

Computation and Language

Helps computers answer hard questions using facts.

2 Jun 2025 0

90%

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

CV and Pattern Recognition

Teaches computers to understand how the world works.

25 Nov 2025 2

Page Count

20 pages
