TabReX: Tabular Referenceless eXplainable Evaluation
By: Tejas Anvekar, Juhna Park, Aparna Garimella, and more
Potential Business Impact:
Checks whether tables generated by AI are accurate and faithful to their source.
Evaluating the quality of tables generated by large language models (LLMs) remains an open challenge: existing metrics either flatten tables into text, ignoring structure, or rely on fixed references that limit generalization. We present TabReX, a reference-less, property-driven framework for evaluating tabular generation via graph-based reasoning. TabReX converts both the source text and the generated table into canonical knowledge graphs, aligns them through an LLM-guided matching process, and computes interpretable, rubric-aware scores that quantify structural and factual fidelity. The resulting metric provides controllable trade-offs between sensitivity and specificity, yielding human-aligned judgments and cell-level error traces. To systematically assess metric robustness, we introduce TabReX-Bench, a large-scale benchmark spanning six domains and twelve planner-driven perturbation types across three difficulty tiers. Empirical results show that TabReX achieves the highest correlation with expert rankings, remains stable under harder perturbations, and enables fine-grained model-vs-prompt analysis, establishing a new paradigm for trustworthy, explainable evaluation of structured generation systems.
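The pipeline sketched in the abstract can be illustrated with a small, hypothetical Python example: canonicalize a table as a set of knowledge-graph triples and score factual fidelity against triples drawn from the source. The function names (table_to_triples, fidelity_scores), the exact-string alignment, and the precision/recall-style rubric below are assumptions for illustration only; TabReX itself uses LLM-guided graph matching and rubric-aware scoring rather than exact matching.

```python
# Illustrative sketch of graph-based, reference-less table evaluation.
# All names and the scoring rubric here are assumptions, not the authors' code.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

def table_to_triples(header, rows, key_col=0):
    """Canonicalize a table as a knowledge graph: one triple per non-key cell,
    of the form (row key, column name, cell value)."""
    triples = set()
    for row in rows:
        key = str(row[key_col]).strip().lower()
        for col, value in enumerate(row):
            if col == key_col:
                continue
            triples.add(Triple(key, header[col].strip().lower(),
                               str(value).strip().lower()))
    return triples

def fidelity_scores(source_triples, table_triples):
    """Rubric-style scores: factual fidelity ~ precision of the table's triples
    against the source graph; coverage ~ recall. A real system would replace
    exact equality with LLM-guided matching."""
    matched = source_triples & table_triples
    precision = len(matched) / len(table_triples) if table_triples else 0.0
    recall = len(matched) / len(source_triples) if source_triples else 0.0
    errors = table_triples - source_triples  # cell-level error trace
    return {"factual_fidelity": precision, "coverage": recall, "errors": errors}

# Example: source facts vs. a generated table containing one wrong cell.
source = {Triple("berlin", "country", "germany"),
          Triple("berlin", "population", "3.7m"),
          Triple("paris", "country", "france"),
          Triple("paris", "population", "2.1m")}
table = table_to_triples(["City", "Country", "Population"],
                         [["Berlin", "Germany", "3.7M"],
                          ["Paris", "France", "2.2M"]])
print(fidelity_scores(source, table))
```

In this toy run, the mismatched Paris population cell surfaces in the errors set, mirroring the cell-level error traces the abstract describes.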
Similar Papers
T-REX: Table – Refute or Entail eXplainer
Computation and Language
Checks if facts in tables are true.
FreshTab: Sourcing Fresh Data for Table-to-Text Generation Evaluation
Computation and Language
Gathers brand-new tables to fairly test computers that turn tables into text.
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
Computation and Language
Helps computers turn messy tables into clear reports.