Basis Transformers for Multi-Task Tabular Regression
By: Wei Min Loh, Jiaqi Shang, Pascal Poupart
Potential Business Impact:
Helps computers make accurate predictions from messy, incomplete tables of data.
Dealing with tabular data is challenging due to partial information, noise, and heterogeneous structure. Existing techniques often struggle to simultaneously handle key aspects of tabular data such as textual information, a variable number of columns, and unseen tables with no metadata beyond column names. We propose a novel architecture, basis transformers, specifically designed to tackle these challenges while respecting inherent invariances in tabular data, including hierarchical structure and the representation of numeric values. On a multi-task tabular regression benchmark of 34 tasks from OpenML-CTR23, our design improves the median $R^2$ score by 0.338 and achieves the lowest standard deviation across tasks. Furthermore, our model has five times fewer parameters than the best-performing baseline and surpasses pretrained large language model baselines, even when initialized from random weights.
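The headline numbers aggregate per-task $R^2$ scores into a median and a standard deviation over the benchmark's tasks. Below is a minimal sketch of that aggregation protocol, not the authors' code: the task names and values are illustrative placeholders, and it assumes scikit-learn's r2_score for the per-task metric.

# Minimal sketch (placeholder data, not results from the paper):
# aggregate per-task R^2 scores into the median and standard deviation
# used to summarize performance across a multi-task benchmark.
from statistics import median, stdev
from sklearn.metrics import r2_score

# Hypothetical per-task predictions: {task_name: (y_true, y_pred)}
per_task_predictions = {
    "task_a": ([3.0, 1.5, 2.2], [2.8, 1.4, 2.5]),
    "task_b": ([0.4, 0.9, 1.1], [0.5, 1.0, 0.9]),
    "task_c": ([10.0, 12.0, 9.5], [9.8, 11.5, 10.1]),
}
scores = [r2_score(y_true, y_pred)
          for y_true, y_pred in per_task_predictions.values()]
print(f"median R^2: {median(scores):.3f}  std dev: {stdev(scores):.3f}")

Reporting the median rather than the mean limits the influence of a few tasks where a model fails badly, which matters when comparing models across heterogeneous tables.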
Similar Papers
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights
Machine Learning (CS)
Makes AI learn better from messy information.
MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data
Machine Learning (CS)
Helps computers learn many tasks from tables at once.
Datum-wise Transformer for Synthetic Tabular Data Detection in the Wild
Machine Learning (CS)
Finds fake computer-made tables of information.