The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats

Published: March 4, 2025 | arXiv ID: 2503.02650v2

By: William Brach, Kristián Košťál, Michal Ries

Potential Business Impact:

Turns messy text into organized lists.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

The rapid growth of unstructured text data poses a fundamental challenge for data management and information retrieval. Large Language Models (LLMs) have shown strong capabilities in natural language processing, but their ability to transform unstructured text into standardized, structured formats remains largely unexplored, even though this capability could substantially change data processing workflows across industries. This study systematically evaluates LLMs' ability to convert unstructured recipe text into the structured Cooklang format. We test four models (GPT-4o, GPT-4o-mini, Llama3.1:70b, and Llama3.1:8b) and introduce an evaluation approach that combines traditional metrics (WER, ROUGE-L, TER) with specialized metrics for semantic element identification. Our experiments show that GPT-4o with few-shot prompting achieves strong performance (ROUGE-L: 0.9722, WER: 0.0730), demonstrating for the first time that LLMs can reliably transform domain-specific unstructured text into structured formats without extensive training. Although model performance generally scales with size, smaller models such as Llama3.1:8b show potential for improvement through targeted fine-tuning. These findings suggest that automated structured data generation could extend to other domains, from medical records to technical documentation, and could change how organizations process and use unstructured information.
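To make two of the traditional metrics named above concrete, the following is a minimal sketch of word error rate (WER) and ROUGE-L computed over whitespace-separated tokens. It is an illustration under stated assumptions, not the authors' evaluation code: the tokenization, the balanced F-measure form of ROUGE-L, the function names, and the sample recipe line are all assumptions introduced here.

```python
def edit_distance(ref, hyp):
    """Token-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, hypothesis):
    """ROUGE-L as the balanced F-measure of LCS-based precision and recall."""
    ref, hyp = reference.split(), hypothesis.split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the model output misses the cookware markup on "pot".
reference = "Boil @water{1%l} in a #pot{} for ~{10%minutes}."
hypothesis = "Boil @water{1%l} in a pot for ~{10%minutes}."

print(f"WER:     {wer(reference, hypothesis):.4f}")         # 0.1429 (1 error / 7 tokens)
print(f"ROUGE-L: {rouge_l_f1(reference, hypothesis):.4f}")  # 0.8571 (LCS of 6 / 7 tokens)
```

Lower WER and higher ROUGE-L both indicate output closer to the reference. A single missed markup token changes both scores, which is why the study pairs these surface metrics with element-level metrics for ingredients, cookware, and timers.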

Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability

Computation and Language

Computers aren't getting more creative, even the best ones.

10 Apr 2025 1

90%

Good News for Script Kiddies? Evaluating Large Language Models for Automated Exploit Generation

Cryptography and Security

AI can write code to break computer programs.

2 May 2025 0

90%

Sustainability via LLM Right-sizing

Computation and Language

Finds AI that works well without costing too much.

17 Apr 2025 0

Country of Origin

🇸🇰 Slovakia

Repos / Data Links

github.com github.com

Page Count

24 pages
