Disentangling and Generating Modalities for Recommendation in Missing Modality Scenarios
By: Jiwan Kim, Hongseok Kang, Sein Kim and more
Potential Business Impact:
Recommends better even with missing info.
Multi-modal recommender systems (MRSs) have achieved notable success in improving personalization by leveraging diverse modalities such as images, text, and audio. However, two key challenges remain insufficiently addressed: (1) insufficient consideration of missing modality scenarios and (2) neglect of the unique characteristics of modality features. These challenges cause significant performance degradation in realistic situations where modalities are missing. To address these issues, we propose Disentangling and Generating Modality Recommender (DGMRec), a novel framework tailored for missing modality scenarios. DGMRec disentangles modality features into general and specific modality features from an information-based perspective, enabling richer representations for recommendation. Building on this, it generates missing modality features by integrating aligned features from other modalities and leveraging user modality preferences. Extensive experiments show that DGMRec consistently outperforms state-of-the-art MRSs in challenging scenarios, including missing modalities and new item settings, as well as across diverse missing ratios and varying levels of missing modalities. Moreover, DGMRec's generation-based approach enables cross-modal retrieval, a task to which existing MRSs cannot be applied, highlighting its adaptability and potential for real-world applications. Our code is available at https://github.com/ptkjw1997/DGMRec.
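To make the two ideas in the abstract concrete, here is a toy sketch of splitting modality features into general and specific parts and then filling in a missing modality. It is not DGMRec's architecture: the fixed-slice split, the averaging rules, and the names `GENERAL_DIMS`, `split`, `mean_vector`, and `generate_missing` are all assumptions made for illustration. The actual model learns these components, and its code is in the repository linked above.

```python
from statistics import fmean

# Assumption: the first 2 dimensions stand in for "general" (shared) features
# and the remaining ones for "specific" (modality-unique) features.
GENERAL_DIMS = 2


def split(feature):
    """Toy disentangling: slice a vector into (general, specific) parts."""
    return feature[:GENERAL_DIMS], feature[GENERAL_DIMS:]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [fmean(column) for column in zip(*vectors)]


def generate_missing(item, missing, liked_items):
    """Fill in item[missing] from other modalities and a user's liked items.

    General part: mean of the general parts of the item's available modalities.
    Specific part: mean of the specific parts of `missing` over the liked items,
    a crude stand-in for a user's modality preference.
    """
    available = [f for m, f in item.items() if m != missing and f is not None]
    general = mean_vector([split(f)[0] for f in available])
    observed = [it[missing] for it in liked_items if it.get(missing) is not None]
    specific = mean_vector([split(f)[1] for f in observed])
    return general + specific


# Hypothetical data: two items the user liked, and a new item with no image.
liked = [
    {"image": [1.0, 0.0, 0.2, 0.4], "text": [1.0, 0.0, 0.9, 0.1]},
    {"image": [0.0, 1.0, 0.4, 0.6], "text": [0.0, 1.0, 0.7, 0.3]},
]
new_item = {"image": None, "text": [0.5, 0.5, 0.8, 0.2]}
generated_image = generate_missing(new_item, "image", liked)
```

In this toy data, the generated image vector is approximately `[0.5, 0.5, 0.3, 0.5]`: the general half is copied from the item's text feature, and the specific half is the average image-specific part of the liked items. Because the result lives in the image feature space, it could in principle be compared against real image features, which is the intuition behind the cross-modal retrieval mentioned in the abstract.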
Similar Papers
Gated Multimodal Graph Learning for Personalized Recommendation
Information Retrieval
Helps online stores show you better stuff.
Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation
CV and Pattern Recognition
Helps computers understand Earth pictures even when data is missing.
How Far Are We from Generating Missing Modalities with Foundation Models?
Multimedia
Helps computers fill in missing picture or text parts.