EgoMem: Lifelong Memory Agent for Full-duplex Omnimodal Models
By: Yiqun Yao, Naitong Yu, Xiang Li and more
Potential Business Impact:
Lets computers remember and talk to you personally.
We introduce EgoMem, the first lifelong memory agent tailored for full-duplex models that process real-time omnimodal streams. EgoMem enables real-time models to recognize multiple users directly from raw audiovisual streams, to provide personalized responses, and to maintain long-term knowledge of users' facts, preferences, and social relationships extracted from audiovisual history. EgoMem operates with three asynchronous processes: (i) a retrieval process that dynamically identifies the user via face and voice and gathers relevant context from long-term memory; (ii) an omnimodal dialog process that generates personalized audio responses based on the retrieved context; and (iii) a memory management process that automatically detects dialog boundaries in the omnimodal streams and extracts the information needed to update the long-term memory. Unlike existing memory agents for LLMs, EgoMem relies entirely on raw audiovisual streams, making it especially suitable for lifelong, real-time, and embodied scenarios. Experimental results demonstrate that EgoMem's retrieval and memory management modules achieve over 95% accuracy on the test set. When integrated with a fine-tuned RoboEgo omnimodal chatbot, the system achieves fact-consistency scores above 87% in real-time personalized dialogs, establishing a strong baseline for future research.
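The three-process layout described above can be pictured with a small asyncio sketch. This is an illustration under stated assumptions, not EgoMem's implementation: the names `LongTermMemory`, `retrieval`, `dialog`, `memory_management`, and `run` are hypothetical, frames are plain dicts, user identification is a dict lookup standing in for face and voice recognition, and a dialog boundary is an explicit `end_of_dialog` flag rather than something detected from the stream.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class LongTermMemory:
    """Hypothetical store mapping a user id to a list of remembered facts."""
    facts: dict = field(default_factory=dict)

    def lookup(self, user_id):
        return list(self.facts.get(user_id, []))

    def update(self, user_id, new_facts):
        self.facts.setdefault(user_id, []).extend(new_facts)


async def retrieval(frames, context_q, memory):
    """(i) Identify the user for each frame and fetch their stored context."""
    for frame in frames:
        user_id = frame["user"]  # assumption: stand-in for face/voice identification
        await context_q.put((frame, memory.lookup(user_id)))
    await context_q.put(None)  # sentinel: end of stream


async def dialog(context_q, transcript_q, replies):
    """(ii) Produce a reply conditioned on the retrieved context."""
    while (item := await context_q.get()) is not None:
        frame, context = item
        replies.append(f"reply to {frame['user']} using {len(context)} fact(s)")
        await transcript_q.put(frame)
    await transcript_q.put(None)  # sentinel: end of stream


async def memory_management(transcript_q, memory):
    """(iii) Buffer frames until a dialog boundary, then write facts to memory."""
    buffer = []
    while (frame := await transcript_q.get()) is not None:
        buffer.append(frame)
        if frame.get("end_of_dialog"):  # assumption: stand-in for boundary detection
            for buffered in buffer:
                if buffered.get("fact"):
                    memory.update(buffered["user"], [buffered["fact"]])
            buffer.clear()


async def run(frames, memory):
    """Run the three processes concurrently over one stream of frames."""
    context_q, transcript_q, replies = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(
        retrieval(frames, context_q, memory),
        dialog(context_q, transcript_q, replies),
        memory_management(transcript_q, memory),
    )
    return replies
```

In this sketch, memory is written only when a dialog ends, so a fact mentioned in one session becomes retrievable in the next session rather than in the same one. Calling `asyncio.run(run(frames, memory))` twice with the same `LongTermMemory` instance shows that hand-off.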
Similar Papers
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
Computation and Language
AI remembers you better for smarter chats.
RoboEgo System Card: An Omnimodal Model with Native Full Duplexity
Artificial Intelligence
Lets robots understand and talk back instantly.