When TableQA Meets Noise: A Dual Denoising Framework for Complex Questions and Large-scale Tables
By: Shenghao Ye, Yu Guo, Dong Jin and more
Potential Business Impact:
Cleans messy tables for smarter answers.
Table question answering (TableQA) is a fundamental task in natural language processing (NLP). The strong reasoning capabilities of large language models (LLMs) have brought significant advances in this field. However, as real-world applications involve increasingly complex questions and larger tables, substantial noisy data is introduced, which severely degrades reasoning performance. To address this challenge, we focus on improving two core capabilities: Relevance Filtering, which identifies and retains information truly relevant to reasoning, and Table Pruning, which reduces table size while preserving essential content. Based on these principles, we propose EnoTab, a dual denoising framework for complex questions and large-scale tables. Specifically, we first perform Evidence-based Question Denoising by decomposing the question into minimal semantic units and filtering out those irrelevant to answer reasoning based on consistency and usability criteria. Then, we propose Evidence Tree-guided Table Denoising, which constructs an explicit and transparent table pruning path to remove irrelevant data step by step. At each pruning step, we observe the intermediate state of the table and apply a post-order node rollback mechanism to handle abnormal table states, ultimately producing a highly reliable sub-table for final answer reasoning. Finally, extensive experiments show that EnoTab achieves outstanding performance on TableQA tasks with complex questions and large-scale tables, confirming its effectiveness.
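The table denoising step described in the abstract (walking an evidence tree, checking the intermediate table after each pruning step, and rolling back when the table reaches an abnormal state) can be illustrated with a minimal sketch. This is not the authors' implementation: the `EvidenceNode` class, the `prune` and `is_abnormal` functions, the "empty table" definition of an abnormal state, and the sample data are all assumptions made for illustration only. The abstract does not specify how the evidence tree is built or what counts as abnormal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]


@dataclass
class EvidenceNode:
    """Hypothetical tree node: one piece of evidence mapped to one pruning op."""
    name: str
    op: Callable[[Table], Table]
    children: List["EvidenceNode"] = field(default_factory=list)


def is_abnormal(table: Table) -> bool:
    # Assumption: a table with no rows or no columns is treated as abnormal.
    return len(table) == 0 or all(len(row) == 0 for row in table)


def prune(node: EvidenceNode, table: Table, log: List[str]) -> Table:
    """Post-order traversal: prune with the children first, then this node."""
    for child in node.children:
        table = prune(child, table, log)
    candidate = node.op(table)
    if is_abnormal(candidate):
        # Roll back: discard this step and keep the previous table state.
        log.append(f"rollback:{node.name}")
        return table
    log.append(f"apply:{node.name}")
    return candidate


# Toy table and evidence tree (illustrative data, not from the paper).
table = [
    {"city": "Paris", "year": 2020, "sales": 10, "note": "a"},
    {"city": "Paris", "year": 2021, "sales": 12, "note": "b"},
    {"city": "Rome", "year": 2021, "sales": 7, "note": "c"},
]

keep = ("city", "year", "sales")
tree = EvidenceNode(
    name="keep_columns",
    op=lambda t: [{k: r[k] for k in keep if k in r} for r in t],
    children=[
        EvidenceNode("year_2021", lambda t: [r for r in t if r["year"] == 2021]),
        # This filter matches nothing, so it triggers a rollback.
        EvidenceNode("city_berlin", lambda t: [r for r in t if r["city"] == "Berlin"]),
    ],
)

log: List[str] = []
sub_table = prune(tree, table, log)
print(log)
print(sub_table)
```

In this toy run the `city_berlin` filter would empty the table, so that step is skipped and the remaining steps still produce a two-row, three-column sub-table. A real system would presumably derive the nodes from the filtered question units rather than hard-coding them.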
Similar Papers
Towards Question Answering over Large Semi-structured Tables
Computation and Language
Finds answers in huge computer tables faster.
TableReasoner: Advancing Table Reasoning Framework with Large Language Models
Artificial Intelligence
Answers questions from messy computer tables.
LLM-Symbolic Integration for Robust Temporal Tabular Reasoning
Computation and Language
Helps computers answer questions from tables better.