KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions
By: Tingyu Wu, Zhisheng Chen, Ziyan Weng, and more
Potential Business Impact:
Helps computers understand people's life stories.
Existing long-horizon memory benchmarks mostly rely on multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, in which actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
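To make the setup concrete, here is a minimal Python sketch of what a time-anchored event stream and an evidence-linked question might look like. All field names and the example values are illustrative assumptions, not the benchmark's actual schema; see the repository above for the real data format.

# Hypothetical record layout for a KnowMe-Bench-style dataset (assumed, not official).
from dataclasses import dataclass, field
from typing import List, Literal

@dataclass
class Event:
    """One time-anchored unit of the reconstructed narrative stream."""
    event_id: str
    timestamp: str            # normalized time anchor, e.g. "1997-06" (assumed format)
    is_flashback: bool        # True if narrated out of chronological order
    action: str               # what the person did
    context: str              # surrounding circumstances
    inner_thought: str        # stated feelings or reasoning, if any

@dataclass
class Question:
    """An evidence-linked question over the stream."""
    question_id: str
    qtype: Literal["factual_recall",
                   "subjective_state",
                   "principle_reasoning"]  # the three levels named in the abstract
    question: str
    answer: str
    evidence_ids: List[str] = field(default_factory=list)  # links to Event.event_id

# Example: a principle-level question must cite the events that support it.
q = Question(
    question_id="q001",
    qtype="principle_reasoning",
    question="Why does the narrator consistently turn down promotions?",
    answer="They prioritize autonomy over status.",
    evidence_ids=["e012", "e047", "e103"],
)

The key design point this sketch highlights is the evidence linkage: every question, even a principle-level one, points back to specific time-anchored events, so models are scored on grounded inference rather than retrieval alone.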
Similar Papers
StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Computation and Language
Tests how well AI remembers stories and makes choices.
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI
Computation and Language
AI assistants can accidentally reveal your secrets.
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
CV and Pattern Recognition
Helps robots remember and act over time.