Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates
By: Wenjun Yu, Sitian Chen, Cheng Chen, and more
Deep Learning Recommendation Models (DLRMs) underpin personalized services but face a critical freshness-accuracy tradeoff due to massive parameter-synchronization overheads. Production DLRMs deploy decoupled training and inference clusters, where synchronizing petabyte-scale embedding tables (EMTs) causes multi-minute staleness that degrades recommendation quality and revenue. We observe that (1) inference nodes exhibit sustained CPU underutilization (peak <= 20%), and (2) EMT gradients possess intrinsic low-rank structure, enabling a compact update representation. We present LiveUpdate, a system that eliminates inter-cluster synchronization by colocating Low-Rank Adaptation (LoRA) trainers within inference nodes. LiveUpdate addresses two core challenges: (1) dynamic rank adaptation via singular-value monitoring to constrain memory overhead (< 2% of EMT size), and (2) NUMA-aware resource scheduling with hardware-enforced QoS to eliminate contention between updates and inference (P99 latency impact < 20 ms). Evaluations show that LiveUpdate reduces update costs by 2x versus delta-update baselines while achieving higher accuracy within 1-hour windows. By transforming idle inference resources into freshness engines, LiveUpdate delivers online model updates that outperform state-of-the-art delta-update methods by 0.04% to 0.24% in accuracy.
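To make the dynamic-rank idea concrete, here is a minimal NumPy sketch of how singular-value monitoring could pick a rank for a LoRA-style EMT update: the accumulated gradient is factored by SVD, the smallest rank that captures a target fraction of the spectral energy is kept, and the update is stored as two thin factors instead of a full dense delta. The function name low_rank_update, the energy_threshold parameter, and all shapes are illustrative assumptions, not LiveUpdate's actual interface.

```python
# Sketch only: assumes singular-value monitoring reduces to an
# energy-based rank cutoff; LiveUpdate's real criterion may differ.
import numpy as np

def low_rank_update(grad: np.ndarray, energy_threshold: float = 0.99):
    """Compress an EMT gradient into LoRA-style factors (B, A).

    Chooses the smallest rank r whose leading singular values capture
    `energy_threshold` of the gradient's spectral energy, so the update
    is stored as B (n x r) and A (r x d) instead of a dense n x d delta.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(energy, energy_threshold)) + 1
    B = u[:, :r] * s[:r]   # fold singular values into the left factor
    A = vt[:r, :]
    return B, A

# Usage on a toy embedding-table shard with a near-low-rank gradient.
rng = np.random.default_rng(0)
emt = rng.standard_normal((1024, 64))                 # rows x embedding dim
grad = rng.standard_normal((1024, 2)) @ rng.standard_normal((2, 64))
B, A = low_rank_update(grad)                          # picks r == 2 here
emt -= 0.01 * (B @ A)                                 # apply the fresh update
```

Since a rank-r factorization stores r(n + d) values instead of n x d, a gradient whose energy concentrates in a few singular values compresses by orders of magnitude, which is what lets the paper bound the memory overhead to a small fraction of EMT size.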
Similar Papers
Reuse, Don't Recompute: Efficient Large Reasoning Model Inference via Memory Orchestration
Multiagent Systems
Lets computers remember answers to save time.
Deep Learning Model Acceleration and Optimization Strategies for Real-Time Recommendation Systems
Information Retrieval
Makes online recommendations faster and better.
An Efficient LLM-based Evolutional Recommendation with Locate-Forget-Update Paradigm
Information Retrieval
Helps online stores recommend better by remembering what you like.