Large Language Models as Universal Predictors? An Empirical Study on Small Tabular Datasets
By: Nikolaos Pavlidis, Vasilis Perifanis, Symeon Symeonidis, and more
Potential Business Impact:
Lets computers make predictions from small data sets without extra training.
Large Language Models (LLMs), originally developed for natural language processing (NLP), have demonstrated the potential to generalize across modalities and domains. With their in-context learning (ICL) capabilities, LLMs can perform predictive tasks over structured inputs without explicit fine-tuning on downstream tasks. In this work, we investigate the empirical function approximation capability of LLMs on small-scale structured datasets for classification, regression, and clustering tasks. We evaluate the performance of state-of-the-art LLMs (GPT-5, GPT-4o, GPT-o3, Gemini-2.5-Flash, DeepSeek-R1) under few-shot prompting and compare them against established machine learning (ML) baselines, including linear models, ensemble methods, and tabular foundation models (TFMs). Our results show that LLMs achieve strong performance in classification tasks under limited data availability, establishing practical zero-training baselines. In contrast, regression performance on continuous-valued outputs is poor compared to ML models, likely because regression demands outputs from a large (often infinite) space. Clustering results are similarly limited, which we attribute to the absence of genuine ICL in this setting. Nonetheless, this approach enables rapid, low-overhead data exploration and offers a viable alternative to traditional ML pipelines in business intelligence and exploratory analytics contexts. We further analyze the influence of context size and prompt structure on approximation quality, identifying trade-offs that affect predictive performance. Our findings suggest that LLMs can serve as general-purpose predictive engines for structured data, with clear strengths in classification and significant limitations in regression and clustering.
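The core recipe the abstract describes, serializing labeled rows into a few-shot prompt and asking the model to complete the label of a new row, can be sketched as below. This is a minimal illustration, not the paper's actual setup: the "name: value" serialization, the task hint, and the `query_llm` text-in/text-out helper are assumptions standing in for whatever prompt structure and model interface the authors used.

```python
"""
Minimal sketch of few-shot (in-context) tabular classification with an LLM.
The prompt format and the query_llm() helper are illustrative assumptions;
the paper's exact prompt structure and model interface are not given here.
"""
from typing import Callable, Sequence


def serialize_row(feature_names: Sequence[str], values: Sequence) -> str:
    # Render one tabular row as "name: value" pairs, one per line.
    return "\n".join(f"{name}: {value}" for name, value in zip(feature_names, values))


def build_few_shot_prompt(feature_names, shots, query_row, task_hint):
    # shots: iterable of (row_values, label) pairs used as in-context examples.
    parts = [task_hint]
    for values, label in shots:
        parts.append(serialize_row(feature_names, values))
        parts.append(f"Label: {label}\n")
    parts.append(serialize_row(feature_names, query_row))
    parts.append("Label:")  # ask the model to complete only the label
    return "\n".join(parts)


def classify_with_llm(query_llm: Callable[[str], str],
                      feature_names, shots, query_row,
                      task_hint="Predict the label for the final record.") -> str:
    prompt = build_few_shot_prompt(feature_names, shots, query_row, task_hint)
    return query_llm(prompt).strip()


if __name__ == "__main__":
    # Toy example with a stand-in backend; a real run would call a chat API.
    features = ["age", "income", "tenure_months"]
    shots = [([34, 52000, 12], "churn"), ([51, 87000, 60], "stay")]
    fake_llm = lambda prompt: "stay"  # hypothetical LLM call
    print(classify_with_llm(fake_llm, features, shots, [29, 61000, 8]))
```

Under this framing, classification only requires the model to emit one of the labels already seen in context, whereas regression would require completing an unconstrained numeric value, which is consistent with the gap the abstract reports.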
Similar Papers
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML
Computation and Language
Teaches computers to learn from many examples.
Just Because You Can, Doesn't Mean You Should: LLMs for Data Fitting
Machine Learning (CS)
Computers change answers if you rename data.
Large Language Models for Fault Localization: An Empirical Study
Software Engineering
Finds bugs in computer code faster.