Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
By: Seunghyuk Cho, Zhenyue Qin, Yang Liu and more
Potential Business Impact:
Helps computers solve geometry problems from pictures.
Plane geometry problem solving (PGPS) has recently gained significant attention as a benchmark to assess the multi-modal reasoning capabilities of large vision-language models. Despite this growing interest, the research community still lacks a comprehensive overview that systematically synthesizes recent work on PGPS. To fill this gap, we present a survey of existing PGPS studies. We first organize PGPS methods under an encoder-decoder framework and summarize the output formats used by their encoders and decoders. We then classify and analyze these encoders and decoders according to their architectural designs. Finally, we outline major challenges and promising directions for future research. In particular, we discuss the hallucination issues that arise during the encoding phase of encoder-decoder architectures, as well as the problem of data leakage in current PGPS benchmarks.