Basis Transformers for Multi-Task Tabular Regression
By: Wei Min Loh, Jiaqi Shang, Pascal Poupart
Potential Business Impact:
Helps computers make accurate predictions from messy, incomplete tables of data.
Dealing with tabular data is challenging due to partial information, noise, and heterogeneous structure. Existing techniques often struggle to simultaneously handle key aspects of tabular data such as textual information, a variable number of columns, and unseen tables with no metadata beyond column names. We propose a novel architecture, basis transformers, specifically designed to tackle these challenges while respecting inherent invariances in tabular data, including hierarchical structure and the representation of numeric values. On a multi-task tabular regression benchmark of 34 tasks from OpenML-CTR23, our design improves the median $R^2$ score by 0.338 and achieves the lowest standard deviation across tasks. Furthermore, our model has five times fewer parameters than the best-performing baseline and surpasses pretrained large language model baselines, even when initialized from random weights.
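The headline numbers aggregate per-task $R^2$ scores into a median and a standard deviation over the benchmark's tasks. Below is a minimal sketch of that aggregation protocol, not the authors' code: the task names and values are illustrative placeholders, and it assumes scikit-learn's r2_score for the per-task metric.

# Minimal sketch (placeholder data, not results from the paper):
# aggregate per-task R^2 scores into the median and standard deviation
# used to summarize performance across a multi-task benchmark.
from statistics import median, stdev
from sklearn.metrics import r2_score

# Hypothetical per-task predictions: {task_name: (y_true, y_pred)}
per_task_predictions = {
    "task_a": ([3.0, 1.5, 2.2], [2.8, 1.4, 2.5]),
    "task_b": ([0.4, 0.9, 1.1], [0.5, 1.0, 0.9]),
    "task_c": ([10.0, 12.0, 9.5], [9.8, 11.5, 10.1]),
}
scores = [r2_score(y_true, y_pred)
          for y_true, y_pred in per_task_predictions.values()]
print(f"median R^2: {median(scores):.3f}  std dev: {stdev(scores):.3f}")

Reporting the median rather than the mean limits the influence of a few tasks where a model fails badly, which matters when comparing models across heterogeneous tables.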
Similar Papers
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights
Machine Learning (CS)
Makes AI learn better from messy information.
MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data
Machine Learning (CS)
Helps computers learn many tasks from tables at once.
Datum-wise Transformer for Synthetic Tabular Data Detection in the Wild
Machine Learning (CS)
Finds fake computer-made tables of information.