Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement
By: Aaditya Shukla, Sidney Knowles, Meenakshi Madugula and more
Potential Business Impact:
AI learns from mistakes to get smarter.
Enterprise AI agents must continuously adapt to maintain accuracy, reduce latency, and remain aligned with user needs. We present a practical implementation of a data flywheel in NVInfo AI, NVIDIA's Mixture-of-Experts (MoE) Knowledge Assistant serving over 30,000 employees. By operationalizing a MAPE-driven data flywheel, we built a closed-loop system that systematically addresses failures in retrieval-augmented generation (RAG) pipelines and enables continuous learning. Over a 3-month post-deployment period, we monitored feedback and collected 495 negative samples. Analysis revealed two major failure modes: routing errors (5.25%) and query rephrasal errors (3.2%). Using NVIDIA NeMo microservices, we implemented targeted improvements through fine-tuning. For routing, we replaced a Llama 3.1 70B model with a fine-tuned 8B variant, achieving 96% accuracy, a 10x reduction in model size, and 70% latency improvement. For query rephrasal, fine-tuning yielded a 3.7% gain in accuracy and a 40% latency reduction. Our approach demonstrates how human-in-the-loop (HITL) feedback, when structured within a data flywheel, transforms enterprise AI agents into self-improving systems. Key learnings include approaches to ensure agent robustness despite limited user feedback, navigating privacy constraints, and executing staged rollouts in production. This work offers a repeatable blueprint for building robust, adaptive enterprise AI agents capable of learning from real-world usage at scale.
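The closed loop described in the abstract can be pictured with a minimal sketch. It assumes MAPE is used in its usual autonomic-computing sense (Monitor, Analyze, Plan, Execute), which the abstract does not spell out. Every name below (`Feedback`, `monitor`, `analyze`, `plan`, `execute`, `mape_cycle`, the `threshold` parameter) is hypothetical and is not taken from the paper or from NVIDIA NeMo microservices. The execute step is a placeholder string, not a real fine-tuning call.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """One logged interaction (hypothetical schema, not from the paper)."""
    query: str
    failure_mode: Optional[str]  # None means the response was not flagged


def monitor(log):
    """Monitor: keep only the negatively flagged samples."""
    return [f for f in log if f.failure_mode is not None]


def analyze(negatives, total):
    """Analyze: fraction of all interactions attributed to each failure mode."""
    counts = Counter(f.failure_mode for f in negatives)
    return {mode: n / total for mode, n in counts.items()}


def plan(rates, threshold):
    """Plan: select failure modes frequent enough to justify a targeted fix."""
    return sorted(mode for mode, rate in rates.items() if rate >= threshold)


def execute(targets):
    """Execute: stand-in for fine-tuning and staged rollout (assumption)."""
    return [f"fine-tune:{mode}" for mode in targets]


def mape_cycle(log, threshold=0.03):
    """Run one pass of the loop; the threshold value is illustrative only."""
    negatives = monitor(log)
    rates = analyze(negatives, total=len(log))
    return execute(plan(rates, threshold))
```

Calling `mape_cycle` on a list of `Feedback` records returns the placeholder actions for every failure mode whose rate meets the threshold. In a production loop the execute step would launch the actual fine-tuning job and rollout, and the cycle would repeat on fresh feedback.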
Similar Papers
Agent-in-the-Loop: A Data Flywheel for Continuous Improvement in LLM-based Customer Support
Artificial Intelligence
Improves customer service bots with real-time feedback.
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Machine Learning (CS)
Lets AI learn new things without retraining.
Bridging the Gap Between Simulated and Real Network Data Using Transfer Learning
Networking and Internet Architecture
Makes computer networks predict problems better.