Comparing Task-Agnostic Embedding Models for Tabular Data
By: Frederik Hoppe, Lars Kleinemeier, Astrid Franz, and more
Potential Business Impact:
Shows that simple, fast feature engineering can match expensive foundation models on tabular data, cutting compute time and cost.
Recent foundation models for tabular data achieve strong task-specific performance via in-context learning. Nevertheless, they focus on direct prediction by encapsulating both representation learning and task-specific inference inside a single, resource-intensive network. This work instead focuses on representation learning, i.e., on transferable, task-agnostic embeddings. We systematically evaluate task-agnostic representations from tabular foundation models (TabPFN and TabICL) alongside classical feature engineering (TableVectorizer) across a variety of application tasks, such as outlier detection (ADBench) and supervised learning (TabArena Lite). We find that simple TableVectorizer features achieve comparable or superior performance while being up to three orders of magnitude faster than tabular foundation models. The code is available at https://github.com/ContactSoftwareAI/TabEmbedBench.
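As a rough illustration of the fast baseline the paper benchmarks, the sketch below derives task-agnostic embeddings from a small table with skrub's TableVectorizer and probes them with a plain scikit-learn classifier. The toy DataFrame, labels, and logistic-regression probe are illustrative assumptions, not the paper's actual benchmark pipeline (see the linked repository for that).

import pandas as pd
from skrub import TableVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy table mixing numeric and categorical columns (illustrative only).
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "city": ["Berlin", "Hamburg", "Berlin", "Munich", "Hamburg", "Munich"],
    "income": [42_000, 55_000, 61_000, 72_000, 50_000, 48_000],
})
y = [0, 1, 0, 1, 1, 0]  # hypothetical labels for the downstream probe

# TableVectorizer turns the heterogeneous table into a numeric matrix,
# encoding categorical columns and passing numeric columns through.
embeddings = TableVectorizer().fit_transform(df)

# Probe the task-agnostic embeddings with a simple supervised model.
scores = cross_val_score(LogisticRegression(max_iter=1000), embeddings, y, cv=2)
print(f"Mean accuracy: {scores.mean():.2f}")

The same embeddings could be reused for other tasks, e.g., handed to an outlier detector instead of a classifier, which is the sense in which they are task-agnostic.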
Similar Papers
Generalization Can Emerge in Tabular Foundation Models From a Single Table
Machine Learning (CS)
Teaches computers to generalize from just one table.
Robust Tabular Foundation Models
Machine Learning (CS)
Makes tabular foundation models more robust.
TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
Machine Learning (CS)
Makes computers learn from bigger, more complex data.