DMA: Online RAG Alignment with Human Feedback
By: Yu Bai, Yukai Miao, Dawei Wang and more
Potential Business Impact:
Helps AI learn from user feedback instantly.
Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online learning framework that systematically incorporates multi-granularity human feedback to align ranking in interactive settings. DMA organizes document-, list-, and response-level signals into a coherent learning pipeline: supervised training for pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Throughout this paper, memory refers to the model's working memory, which is the entire context visible to the LLM for In-Context Learning. We adopt a dual-track evaluation protocol mirroring deployment: (i) large-scale online A/B ablations to isolate the utility of each feedback source, and (ii) few-shot offline tests on knowledge-intensive benchmarks. Online, a multi-month industrial deployment further shows substantial improvements in human engagement. Offline, DMA preserves competitive foundational retrieval while yielding notable gains on conversational QA (TriviaQA, HotpotQA). Taken together, these results position DMA as a principled approach to feedback-driven, real-time adaptation in RAG without sacrificing baseline capability.
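As a rough illustration of how the three ranking-related stages named in the abstract could be expressed as training objectives, the sketch below pairs each one with a common loss. The abstract does not specify the losses DMA uses, so every choice here is an assumption made for illustration: binary cross-entropy for document-level labels, a ListNet-style softmax cross-entropy for list-level labels, and a temperature-softened KL divergence for distilling a teacher ranker into a lightweight scorer. All function names are hypothetical, and the response-level policy optimization step is omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pointwise_loss(score, label):
    """Binary cross-entropy for one document-level label (1 = helpful, 0 = not).

    Assumption: document-level feedback is a binary signal on a single
    retrieved document, and `score` is the ranker's raw logit.
    """
    p = 1.0 / (1.0 + math.exp(-score))
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def listwise_loss(scores, relevances):
    """ListNet-style cross-entropy between softmax(relevances) and softmax(scores).

    Assumption: list-level feedback gives a graded relevance per document
    in the retrieved list.
    """
    target = softmax(relevances)
    pred = softmax(scores)
    return -sum(t * math.log(p + 1e-12) for t, p in zip(target, pred))

def distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over temperature-softened list distributions.

    Assumption: the lightweight serving scorer is trained to match the
    teacher's score distribution over the same candidate list.
    """
    t = softmax([s / temperature for s in teacher_scores])
    q = softmax([s / temperature for s in student_scores])
    return sum(ti * math.log((ti + 1e-12) / (qi + 1e-12)) for ti, qi in zip(t, q))

if __name__ == "__main__":
    # A ranking that agrees with the feedback should incur a lower listwise loss.
    relevances = [2.0, 1.0, 0.0]
    print(listwise_loss([3.0, 1.0, 0.0], relevances))
    print(listwise_loss([0.0, 1.0, 3.0], relevances))
    # A student that reproduces the teacher's scores has zero distillation loss.
    print(distillation_loss([1.5, 0.2, -0.7], [1.5, 0.2, -0.7]))
```

In a real system these objectives would be minimized with a gradient-based framework over batches of logged feedback; the pure-Python version only makes the shape of each signal concrete.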