Multimodal-enhanced Federated Recommendation: A Group-wise Fusion Approach

Published: September 24, 2025 | arXiv ID: 2509.19955v1

By: Chunxu Zhang, Weipeng Zhang, Guodong Long and more

Potential Business Impact:

Learns user tastes for better recommendations (e.g., movie picks) without exposing private user data.

Business Areas:
Media and Entertainment

Federated Recommendation (FR) is an emerging learning paradigm that tackles the learn-to-rank problem in a privacy-preserving manner. Integrating multi-modality features into federated recommendation remains an open challenge in terms of efficiency, distribution heterogeneity, and fine-grained alignment. To address these challenges, we propose GFMFR, a novel multimodal fusion mechanism for federated recommendation settings. Specifically, it offloads multimodal representation learning to the server, which stores item content and employs a high-capacity encoder to generate expressive representations, alleviating client-side overhead. Moreover, a group-aware item representation fusion approach enables fine-grained knowledge sharing among similar users while retaining individual preferences. The proposed fusion loss can simply be plugged into any existing federated recommender system, empowering it with multi-modality features. Extensive experiments on five public benchmark datasets demonstrate that GFMFR consistently outperforms state-of-the-art multimodal FR baselines.
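The paper itself is not reproduced here, so the following is only a rough sketch of how the described group-wise fusion might look in code. Every name (`GroupwiseFusion`, `num_groups`, the per-user gate) and the alignment-style fusion loss are assumptions inferred from the abstract, not the authors' actual implementation.

```python
# Hypothetical sketch of group-wise multimodal fusion for federated
# recommendation, loosely following the abstract: the server ships
# multimodal item features from a high-capacity encoder; each client
# fuses them with its local ID embeddings using group-shared weights
# plus a small per-user gate. All names and details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupwiseFusion(nn.Module):
    """Fuse server-provided multimodal item features with local ID embeddings.

    Fusion projections are shared within a user group (coarse knowledge
    sharing among similar users); a per-user gate retains individual taste.
    """

    def __init__(self, num_groups: int, mm_dim: int, id_dim: int):
        super().__init__()
        # One fusion projection per user group.
        self.group_proj = nn.ModuleList(
            [nn.Linear(mm_dim, id_dim) for _ in range(num_groups)]
        )
        # Per-user scalar gate, kept on the client.
        self.user_gate = nn.Parameter(torch.zeros(1))

    def forward(self, mm_feat: torch.Tensor, id_emb: torch.Tensor,
                group_id: int) -> torch.Tensor:
        projected = self.group_proj[group_id](mm_feat)  # (num_items, id_dim)
        gate = torch.sigmoid(self.user_gate)
        # Convex combination of content-based and ID-based representations.
        return gate * projected + (1.0 - gate) * id_emb

def fusion_loss(fused: torch.Tensor, id_emb: torch.Tensor) -> torch.Tensor:
    """Alignment-style auxiliary loss, added to the base FR objective."""
    return 1.0 - F.cosine_similarity(fused, id_emb, dim=-1).mean()

# Toy usage: `mm_feat` would come from the server-side encoder;
# `id_emb` and the group assignment stay on the client.
mm_feat = torch.randn(100, 512)   # server-side multimodal item features
id_emb = torch.randn(100, 64)     # client-side item ID embeddings
fuser = GroupwiseFusion(num_groups=8, mm_dim=512, id_dim=64)
fused = fuser(mm_feat, id_emb, group_id=3)
loss = fusion_loss(fused, id_emb)  # plug into any existing FR training loss
```

The split in this sketch mirrors the abstract's design choice: the expensive multimodal encoding happens once on the server, while clients only learn lightweight fusion parameters, keeping client-side overhead low.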

Country of Origin
🇦🇺 Australia, 🇨🇳 China

Page Count
9 pages

Category
Computer Science: Information Retrieval