The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent
By: Muhammad Imam Luthfi Balaka, Raul Castro Fernandez
Potential Business Impact:
Helps find and organize information by talking to a computer.
Data discovery and preparation remain persistent bottlenecks in the data management lifecycle, especially when user intent is vague, evolving, or difficult to operationalize. The Pneuma Project introduces Pneuma-Seeker, a system that helps users articulate and fulfill information needs through iterative interaction with a language-model-powered platform. The system reifies the user's evolving information need as a relational data model and incrementally converges toward a usable document aligned with that intent. To achieve this, the system combines three architectural ideas: context specialization to reduce LLM burden across subtasks, a conductor-style planner to assemble dynamic execution plans, and a convergence mechanism based on shared state. The system integrates recent advances in retrieval-augmented generation (RAG), agentic frameworks, and structured data preparation to support semi-automatic, language-guided workflows. We evaluate the system through LLM-based user simulations and show that it helps surface latent intent, guide discovery, and produce fit-for-purpose documents. It also acts as an emergent documentation layer, capturing institutional knowledge and supporting organizational memory.
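The three architectural ideas in the abstract can be illustrated with a toy sketch. This is not Pneuma-Seeker's implementation: every name here (SharedState, refine_schema, discover, conduct) is hypothetical, the LLM-backed components are replaced by plain functions, and the "catalog" is just a set of column names. It only shows the general shape of separate specialized components reading and writing one shared state, which holds the information need as a relational schema, while a conductor-style loop chooses the next step until nothing is left unresolved.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Hypothetical shared state: the information need reified as a relational schema."""
    # table name -> ordered list of column names
    schema: dict[str, list[str]] = field(default_factory=dict)
    # (table, column) pairs not yet backed by a discovered data source
    unresolved: set[tuple[str, str]] = field(default_factory=set)
    # trace of the steps the conductor chose, in order
    log: list[str] = field(default_factory=list)


def refine_schema(state, request):
    """Stand-in for an LLM-backed component that maps a user request to schema columns."""
    table, columns = request
    existing = state.schema.setdefault(table, [])
    for col in columns:
        if col not in existing:
            existing.append(col)
            state.unresolved.add((table, col))
    state.log.append(f"refine:{table}")


def discover(state, catalog):
    """Stand-in for a retrieval component: resolve columns present in a toy catalog."""
    found = {tc for tc in state.unresolved if tc[1] in catalog}
    state.unresolved -= found
    state.log.append(f"discover:{len(found)}")


def conduct(state, requests, catalog, max_steps=10):
    """Conductor-style loop: choose the next step from the shared state until it converges."""
    pending = list(requests)
    for _ in range(max_steps):
        if pending:
            refine_schema(state, pending.pop(0))
        elif state.unresolved:
            before = len(state.unresolved)
            discover(state, catalog)
            if len(state.unresolved) == before:
                break  # no progress; a real system would return to the user here
        else:
            break  # converged: every schema column is backed by a source
    return state
```

Calling conduct(SharedState(), [("sales", ["region", "revenue"]), ("sales", ["quarter"])], catalog={"region", "revenue", "quarter"}) refines the schema twice, runs one discovery step, and stops once the unresolved set is empty. The design point of the sketch is that components never call each other directly; they coordinate only through the shared state.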