From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation
By: Jeeho Shin, Kyungho Kim, Kijung Shin
Potential Business Impact:
Suggests recipes using food pictures and tastes.
Recipe recommendation has become an essential task on web-based food platforms. A central challenge is effectively leveraging rich multimodal features beyond user-recipe interactions. Our analysis shows that even simple uses of multimodal signals yield competitive performance, suggesting that systematic enhancement of these signals is highly promising. We propose TESMR, a three-stage framework for recipe recommendation that progressively refines raw multimodal features into effective embeddings through: (1) content-based enhancement using foundation models with multimodal comprehension, (2) relation-based enhancement via message propagation over user-recipe interactions, and (3) learning-based enhancement through contrastive learning with learnable embeddings. Experiments on two real-world datasets show that TESMR outperforms existing methods, achieving 7-15% higher Recall@10.
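The three stages described in the abstract can be sketched end to end with toy data. This is a minimal illustration, not the authors' implementation: the foundation-model embeddings are stand-in random vectors, the message propagation is a single mean-pooling round over the user-recipe bipartite graph, and the contrastive step computes an InfoNCE-style loss without training. All function and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_recipes, dim = 4, 6, 8

# Stage 1: content-based enhancement. In the paper a multimodal foundation
# model would embed each recipe's image and text; random vectors stand in here.
recipe_content = rng.normal(size=(n_recipes, dim))

# Observed user-recipe interactions (binary matrix, users x recipes).
interactions = (rng.random((n_users, n_recipes)) < 0.4).astype(float)

# Stage 2: relation-based enhancement. One round of message propagation over
# the bipartite interaction graph: mean-pool each node's neighbours.
deg_u = interactions.sum(axis=1, keepdims=True) + 1e-9
user_emb = interactions @ recipe_content / deg_u            # users <- recipes
deg_r = interactions.sum(axis=0, keepdims=True).T + 1e-9
recipe_emb = interactions.T @ user_emb / deg_r              # recipes <- users

# Stage 3: learning-based enhancement. Score learnable recipe embeddings
# against the propagated ones with an InfoNCE-style contrastive loss
# (here evaluated once, not optimised).
learnable = rng.normal(size=(n_recipes, dim))

def info_nce(anchors, positives, temperature=0.2):
    """Contrastive loss where each anchor's positive is the same-index row."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

loss = info_nce(learnable, recipe_emb)
print(user_emb.shape, recipe_emb.shape, float(loss))
```

In a trained system, the Stage 3 loss would be minimised jointly with a recommendation objective so the learnable embeddings absorb both content and interaction signals.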
Similar Papers
Are Multimodal Embeddings Truly Beneficial for Recommendation? A Deep Dive into Whole vs. Individual Modalities
Information Retrieval
Makes online suggestions better using words and pictures.
Causal Inspired Multi Modal Recommendation
Information Retrieval
Fixes online shopping picks by ignoring fake trends.