JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models
By: Ce Chi, Xing Wang, Zhendong Wang and more
Potential Business Impact:
Helps computers understand and answer questions from tables.
In this work, we present JT-DA-8B (JiuTian Data Analyst 8B), a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. To address the lack of high-quality supervision in tabular reasoning, we construct a comprehensive and diverse training corpus covering 34 well-defined table reasoning tasks by aggregating 29 public table QA datasets and 3 million tables. We propose an automatic pipeline that generates realistic multi-step analytical tasks involving reasoning patterns. The model is built on the open-source JT-Coder-8B model, an 8B-parameter decoder-only foundation model trained from scratch. During training, we use LLM-based scoring and workflow-aligned filtering to distill high-quality, table-centric data, and we optimize the model with both supervised fine-tuning (SFT) and reinforcement learning (RL). We further propose a four-stage table reasoning workflow, comprising table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering, to improve model interpretability and execution accuracy. Experimental results show that JT-DA-8B achieves strong performance on various table reasoning tasks, demonstrating the effectiveness of data-centric generation and workflow-driven optimization.
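The four workflow stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no interfaces, so every function name here (preprocess, sense, build_prompt, stub_model, run_tool) is hypothetical, the model call is replaced by a hard-coded stub, and the "tool" is simply Python's exec on the generated snippet.

```python
import csv
import io


def preprocess(raw_csv):
    """Stage 1 (table preprocessing): parse CSV text and trim stray whitespace."""
    reader = csv.DictReader(io.StringIO(raw_csv.strip()), skipinitialspace=True)
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def sense(rows):
    """Stage 2 (table sensing): record the row count and a crude type per column."""
    columns = list(rows[0].keys()) if rows else []
    kinds = {
        c: "number" if all(_is_number(r[c]) for r in rows) else "text"
        for c in columns
    }
    return {"n_rows": len(rows), "columns": kinds}


def build_prompt(question, profile):
    """Prompt engineering: describe the table to the model and state the task."""
    cols = ", ".join(f"{name} ({kind})" for name, kind in profile["columns"].items())
    return (
        f"Table with {profile['n_rows']} rows; columns: {cols}.\n"
        f"Question: {question}\n"
        "Write Python that assigns the result to `answer`, using the list `rows`."
    )


def stub_model(prompt):
    """Hypothetical stand-in for the LLM; a real system would call the model here."""
    return "answer = sum(float(r['sales']) for r in rows)"


def run_tool(code, rows):
    """Tool-integrated reasoning: execute the generated code against the table.

    Assumption: plain exec is used for brevity. It is NOT sandboxed and should
    never be run on untrusted model output.
    """
    scope = {"rows": rows}
    exec(code, scope)
    return scope["answer"]


raw = "region, sales\nnorth, 10\nsouth , 5.5\n"
rows = preprocess(raw)
profile = sense(rows)
prompt = build_prompt("What are the total sales?", profile)
answer = run_tool(stub_model(prompt), rows)
print(answer)
```

Having the model emit code that a tool executes, instead of asking it to compute the total in free text, keeps the arithmetic exact and leaves an inspectable trace, which is in line with the interpretability and execution-accuracy goals the abstract states.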
Similar Papers
Agentic LLMs for Question Answering over Tabular Data
Computation and Language
Answers questions from complex tables using smart computer language.
Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding
Machine Learning (CS)
Helps computers understand data tables better.
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Computation and Language
Teaches computers to understand and use tables better.