A Survey on Retrieval And Structuring Augmented Generation with Large Language Models
By: Pengcheng Jiang, Siru Ouyang, Yizhu Jiao, and others
Potential Business Impact:
Helps AI tell true facts, not made-up ones.
Large Language Models (LLMs) have revolutionized natural language processing with their remarkable capabilities in text generation and reasoning. However, these models face critical challenges when deployed in real-world applications, including hallucination, outdated knowledge, and limited domain expertise. Retrieval And Structuring (RAS) Augmented Generation addresses these limitations by integrating dynamic information retrieval with structured knowledge representations. This survey (1) examines retrieval mechanisms, including sparse, dense, and hybrid approaches, for accessing external knowledge; (2) explores text structuring techniques, such as taxonomy construction, hierarchical classification, and information extraction, that transform unstructured text into organized representations; and (3) investigates how these structured representations integrate with LLMs through prompt-based methods, reasoning frameworks, and knowledge embedding techniques. It also identifies technical challenges in retrieval efficiency, structure quality, and knowledge integration, and highlights research opportunities in multimodal retrieval, cross-lingual structures, and interactive systems. This comprehensive overview provides researchers and practitioners with insights into RAS methods, applications, and future directions.
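To make the sparse/dense/hybrid distinction concrete, the following is a minimal sketch of hybrid retrieval: a term-overlap score stands in for a sparse method such as BM25, a toy hash-based embedding stands in for a learned dense encoder, and the two scores are fused with a mixing weight. All function names (`sparse_score`, `dense_score`, `hybrid_rank`) and the weighting scheme are illustrative assumptions, not an API from the surveyed systems.

```python
import math
from collections import Counter

def sparse_score(query: str, doc: str) -> float:
    # Term-overlap count: a stand-in for sparse scoring like BM25/TF-IDF.
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum(min(q[t], d[t]) for t in q))

def embed(text: str, dim: int = 16) -> list:
    # Toy deterministic bag-of-words embedding: a stand-in for a dense encoder.
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dense_score(query: str, doc: str) -> float:
    # Cosine similarity between the toy embeddings (vectors are unit-normalized).
    return sum(a * b for a, b in zip(embed(query), embed(doc)))

def hybrid_rank(query: str, docs: list, alpha: float = 0.5) -> list:
    # Convex combination of sparse and dense scores, highest score first.
    scored = [(alpha * sparse_score(query, d) + (1 - alpha) * dense_score(query, d), d)
              for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]

docs = ["dense retrieval with embeddings",
        "sparse keyword matching with BM25",
        "knowledge graph construction"]
print(hybrid_rank("keyword retrieval", docs))
```

Real RAS pipelines replace both scorers with production components (an inverted index for the sparse side, a neural encoder with an ANN index for the dense side), but the fusion step is often exactly this kind of weighted combination of the two rankings.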
Similar Papers
LLM Program Optimization via Retrieval Augmented Search
Machine Learning (CS)
Helps computers write faster, better code.
Hybrid Retrieval for Hallucination Mitigation in Large Language Models: A Comparative Analysis
Information Retrieval
Makes AI tell the truth, not make things up.
Structure-R1: Dynamically Leveraging Structural Knowledge in LLM Reasoning through Reinforcement Learning
Computation and Language
Helps computers reason better with organized facts.