
A Theorem-Proving-Based Evaluation of Neural Semantic Parsing

Published: October 13, 2025 | arXiv ID: 2510.11225v1

By: Hayate Funakura, Hyunsoo Kim, Koji Mineshima

Potential Business Impact:

Checks whether the logical formulas a semantic parser produces mean the same thing as the reference formulas, rather than merely resembling them.

Business Areas:

Semantic Search, Internet Services

Graph-matching metrics such as Smatch are the de facto standard for evaluating neural semantic parsers, yet they capture surface overlap rather than logical equivalence. We reassess evaluation by pairing graph-matching with automated theorem proving. We compare two approaches to building parsers: supervised fine-tuning (T5-Small/Base) and few-shot in-context learning (GPT-4o/4.1/5), under normalized and unnormalized targets. We evaluate outputs using graph-matching, bidirectional entailment between source and target formulas with a first-order logic theorem prover, and well-formedness. Across settings, we find that models performing well on graph-matching often fail to produce logically equivalent formulas. Normalization reduces incidental target variability, improves well-formedness, and strengthens logical adequacy. Error analysis shows performance degrades with increasing formula complexity and with coordination, prepositional phrases, and passive voice; the dominant failures involve variable binding and indexing, and predicate naming. These findings highlight limits of graph-based metrics for reasoning-oriented applications and motivate logic-sensitive evaluation and training objectives together with simplified, normalized target representations. All code and data for our experiments are publicly available.
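The bidirectional entailment check described in the abstract can be illustrated with a minimal sketch. The abstract says only that a first-order logic theorem prover is used and does not name it, so this example swaps in a different and much weaker technique: exhaustive search over small finite models. Such a search can refute an entailment by finding a counterexample, but a result of `True` means only that no counterexample exists up to the size bound; it is not a proof. The tuple encoding of formulas, the function names, and the example formulas are all assumptions made for illustration and do not come from the paper.

```python
from itertools import product

# Hypothetical formula encoding (not the paper's format), as nested tuples:
#   ("pred", name, var), ("not", f), ("and", f, g), ("or", f, g),
#   ("imp", f, g), ("exists", var, f), ("forall", var, f)
# Only unary predicates are supported, to keep the sketch short.


def holds(formula, domain, preds, env):
    """Evaluate a formula in one finite model under a variable assignment."""
    op = formula[0]
    if op == "pred":
        return env[formula[2]] in preds[formula[1]]
    if op == "not":
        return not holds(formula[1], domain, preds, env)
    if op == "and":
        return (holds(formula[1], domain, preds, env)
                and holds(formula[2], domain, preds, env))
    if op == "or":
        return (holds(formula[1], domain, preds, env)
                or holds(formula[2], domain, preds, env))
    if op == "imp":
        return (not holds(formula[1], domain, preds, env)
                or holds(formula[2], domain, preds, env))
    if op in ("exists", "forall"):
        var, body = formula[1], formula[2]
        values = (holds(body, domain, preds, {**env, var: d}) for d in domain)
        return any(values) if op == "exists" else all(values)
    raise ValueError(f"unknown operator: {op!r}")


def models(pred_names, size):
    """Yield every interpretation of the unary predicates over a domain of `size` elements."""
    domain = range(size)
    subsets = [frozenset(d for d in domain if mask[d])
               for mask in product([False, True], repeat=size)]
    for extensions in product(subsets, repeat=len(pred_names)):
        yield domain, dict(zip(pred_names, extensions))


def entails_bounded(premise, conclusion, pred_names, max_size=3):
    """Return False if some model with at most `max_size` elements satisfies
    the premise but not the conclusion; True means no counterexample was found."""
    for size in range(1, max_size + 1):
        for domain, preds in models(pred_names, size):
            if holds(premise, domain, preds, {}) and not holds(conclusion, domain, preds, {}):
                return False
    return True


def equivalent_bounded(f, g, pred_names, max_size=3):
    """Bidirectional entailment: f entails g and g entails f, up to the size bound."""
    return (entails_bounded(f, g, pred_names, max_size)
            and entails_bounded(g, f, pred_names, max_size))


PREDS = ["dog", "bark"]
# Reference formula: exists x. dog(x) & bark(x)
GOLD = ("exists", "x", ("and", ("pred", "dog", "x"), ("pred", "bark", "x")))
# Same meaning with a renamed variable and swapped conjuncts.
REORDERED = ("exists", "y", ("and", ("pred", "bark", "y"), ("pred", "dog", "y")))
# Variable-binding error: the two predicates no longer share a variable.
SPLIT = ("and", ("exists", "x", ("pred", "dog", "x")),
                ("exists", "y", ("pred", "bark", "y")))
```

Under this sketch, `equivalent_bounded(GOLD, REORDERED, PREDS)` finds no counterexample, while `equivalent_bounded(GOLD, SPLIT, PREDS)` fails: `GOLD` entails `SPLIT`, but a two-element model with one dog and one distinct barker satisfies `SPLIT` without satisfying `GOLD`. The `SPLIT` case is meant to resemble the variable-binding failures the abstract identifies, where a prediction uses the right predicates but does not preserve the reference meaning.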

Tree Matching Networks for Natural Language Inference: Parameter-Efficient Semantic Understanding via Dependency Parse Trees

Computation and Language

Teaches computers to understand sentences faster.

28 Nov 2025

87%

Spectral Neuro-Symbolic Reasoning II: Semantic Node Merging, Entailment Filtering, and Knowledge Graph Alignment

Computation and Language

Helps computers understand and reason like people.

2 Nov 2025

87%

GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace

Machine Learning (CS)

Finds best job matches faster by understanding words and connections.

2 Dec 2025


Country of Origin

🇯🇵 Japan

Repos / Data Links

github.com

Page Count

12 pages
