OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas
By: James Y. Huang, Wenxuan Zhou, Nan Xu, and more
Potential Business Impact:
Teaches smaller AI models to give organized, structured answers.
The ability of Large Language Models (LLMs) to generate structured outputs that follow arbitrary schemas is crucial to a wide range of downstream tasks that require diverse structured representations of results, such as information extraction, table generation, and function calling. While modern LLMs excel at generating unstructured responses in natural language, whether this strength translates to strong performance on text-to-structure tasks remains unclear. To bridge this gap, we first introduce OmniStruct, a comprehensive benchmark for assessing LLMs' capabilities on diverse text-to-structure tasks. We build OmniStruct by identifying existing datasets across a wide range of tasks that are suitable for a structured answer format, and adapting them under a unified text-to-structure problem setting. To facilitate the development of efficient text-to-structure models, we collect high-quality training data via synthetic task generation. Without using any supervised data for OmniStruct tasks, our experiments show that much smaller models can be fine-tuned on synthetic data into universal structured generation models that rival the performance of GPT-4o.
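To make the unified problem setting concrete, here is a minimal sketch of how a text-to-structure task can be posed to an LLM: the model receives a task instruction, an input text, and a target schema, and its output is parsed and validated against that schema. The `call_llm` helper, the prompt template, and the example schema are illustrative assumptions, not OmniStruct's actual interface or the paper's evaluation protocol.

```python
# A minimal sketch of the text-to-structure setting, assuming a JSON
# Schema is used to specify the required answer format. `call_llm` is a
# hypothetical stand-in for any chat-model API.
import json
from jsonschema import validate  # pip install jsonschema


def call_llm(prompt: str) -> str:
    """Hypothetical model call; swap in your provider's client here."""
    raise NotImplementedError


def text_to_structure(instruction: str, text: str, schema: dict) -> dict:
    """Prompt the model with task + input + schema, then check compliance."""
    prompt = (
        f"{instruction}\n\n"
        f"Input:\n{text}\n\n"
        "Return only a JSON object that conforms to this schema:\n"
        f"{json.dumps(schema, indent=2)}"
    )
    answer = json.loads(call_llm(prompt))     # parse the raw model output
    validate(instance=answer, schema=schema)  # raises if schema is violated
    return answer


# Example: information extraction framed as text-to-structure.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "employer": {"type": "string"},
    },
    "required": ["name"],
}
# text_to_structure("Extract the person mentioned.",
#                   "Ada Lovelace worked with Charles Babbage.",
#                   person_schema)
```

Framing information extraction, table generation, and function calling this way reduces them all to one task: produce an answer that validates against a given schema, which is what makes a single benchmark and a single universal model possible.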
Similar Papers
Automata-Based Steering of Large Language Models for Diverse Structured Generation
Computation and Language
Creates more varied computer-generated text.
The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats
Artificial Intelligence
Turns messy text into organized lists.
LLM driven Text-to-Table Generation through Sub-Tasks Guidance and Iterative Refinement
Computation and Language
Helps computers turn messy notes into organized tables.