XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs
By: Linzhang Li, Yixin Dong, Guanjie Wang and more
Potential Business Impact:
Lets AI agents generate complex structured output, such as tool calls, much faster.
Modern LLM agents must handle increasingly complex structured generation tasks, such as tool calling and conditional structured generation. These tasks are significantly more dynamic than predefined structures, posing new challenges for current structured generation engines. In this paper, we propose XGrammar 2, a highly optimized structured generation engine for agentic LLMs. XGrammar 2 accelerates mask generation for these dynamic structured generation tasks through a new dynamic dispatching semantics, TagDispatch. We further introduce a just-in-time (JIT) compilation method to reduce compilation time and a cross-grammar caching mechanism to exploit the common sub-structures shared across different grammars. Additionally, we extend the previous PDA-based (pushdown automaton) mask generation algorithm to an Earley-parser-based one and design a repetition compression algorithm to handle repetition structures in grammars. Evaluation results show that XGrammar 2 can achieve a speedup of more than 6x over existing structured generation engines. Integrated with an LLM inference engine, XGrammar 2 can handle dynamic structured generation tasks with near-zero overhead.
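The abstract does not spell out how TagDispatch works, so the following is only a toy illustration of the general idea of tag-triggered dispatch: output is unconstrained until a trigger tag appears, after which the next characters are restricted to a structure associated with that tag. The tag names, the `DISPATCH` table, and the `allowed_next_chars` function are all hypothetical, and the sketch works on characters and fixed strings rather than tokens and grammars. It is not XGrammar 2's algorithm or API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical trigger tags and the literal completions permitted after each.
# A real engine would dispatch to full grammars; fixed strings keep this short.
DISPATCH: Dict[str, List[str]] = {
    "<tool=": ["search>", "calc>"],
    "<unit=": ["kg>", "km>"],
}


def allowed_next_chars(text: str) -> Optional[Set[str]]:
    """Return the allowed next characters, or None when output is unconstrained."""
    # Locate the most recent trigger tag in the text generated so far.
    best_pos, best_tag = -1, None
    for tag in DISPATCH:
        pos = text.rfind(tag)
        if pos > best_pos:
            best_pos, best_tag = pos, tag
    if best_tag is None:
        return None  # no tag seen yet: free-text mode

    body = text[best_pos + len(best_tag):]
    options = DISPATCH[best_tag]

    # The dispatched structure is already complete: back to free text.
    if any(body.startswith(opt) for opt in options):
        return None

    # Otherwise allow only characters that extend at least one option.
    # This set plays the role of the token mask in a real engine.
    return {
        opt[len(body)]
        for opt in options
        if opt.startswith(body) and len(opt) > len(body)
    }
```

In this sketch, `allowed_next_chars("use <tool=")` restricts the next character to the first letters of the two tool names, while text without an open tag is left unconstrained. A production engine would apply the same kind of switch at the token level and against compiled grammars.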
Similar Papers
Grammar Search for Multi-Agent Systems
Artificial Intelligence
Builds smarter AI agents with simpler, cheaper code.
X-GridAgent: An LLM-Powered Agentic AI System for Assisting Power Grid Analysis
Systems and Control
AI helps power grids run better with simple questions.
Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis
Computation and Language
AI helps understand language rules in many languages.