Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation

Published: September 11, 2025 | arXiv ID: 2509.09114v1

By: Kelin Ren, Chan-Yang Ju, Dong-Ho Lee

Potential Business Impact:

Helps online stores show you better stuff.

Business Areas:

Semantic Search, Internet Services

Multimodal recommendation systems are increasingly becoming foundational technologies for e-commerce and content platforms, enabling personalized services by jointly modeling users' historical behaviors and the multimodal features of items (e.g., visual and textual). However, most existing methods rely on either static fusion strategies or graph-based local interaction modeling and face two critical limitations: (1) insufficient ability to model fine-grained cross-modal associations, leading to suboptimal fusion quality; and (2) a lack of global distribution-level consistency, causing representational bias. To address these limitations, we propose MambaRec, a novel framework that integrates local feature alignment and global distribution regularization via attention-guided learning. At its core, we introduce the Dilated Refinement Attention Module (DREAM), which uses multi-scale dilated convolutions with channel-wise and spatial attention to align fine-grained semantic patterns between visual and textual modalities. This module captures hierarchical relationships and context-aware associations, improving cross-modal semantic modeling. Additionally, we apply Maximum Mean Discrepancy (MMD) and contrastive loss functions to constrain global modality alignment, enhancing semantic consistency. This dual regularization reduces modality-specific deviations and boosts robustness. To improve scalability, MambaRec employs a dimensionality reduction strategy to lower the computational cost of high-dimensional multimodal features. Extensive experiments on real-world e-commerce datasets show that MambaRec outperforms existing methods in fusion quality, generalization, and efficiency. Our code has been made publicly available at https://github.com/rkl71/MambaRec.
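The abstract names two global alignment losses, MMD and a contrastive loss, without giving their exact form. The sketch below illustrates what each term measures, assuming a Gaussian (RBF) kernel with a single bandwidth `sigma` for MMD and a symmetric InfoNCE-style loss with cosine similarity and temperature `tau` for the contrastive term. The function names, kernel choice, and hyperparameters are hypothetical and are not taken from the MambaRec code; how the paper weights these terms against the recommendation loss is not stated here.

```python
import numpy as np


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * sigma^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x (n, d) and y (m, d).

    Small values mean the two modality distributions are close as a whole
    (a distribution-level, i.e. global, alignment signal).
    """
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


def info_nce(a, b, tau=0.2):
    """Symmetric InfoNCE: row i of a and row i of b form a positive pair;
    every other row in the batch acts as a negative."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau  # (n, n) cosine similarities scaled by temperature

    def cross_entropy_diag(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))  # target for row i is column i

    # average the a->b and b->a directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


# Toy check with hypothetical 32-dimensional item embeddings.
rng = np.random.default_rng(0)
visual = rng.normal(size=(64, 32))
textual_aligned = visual + 0.05 * rng.normal(size=(64, 32))  # each text close to its paired image
textual_shifted = rng.normal(loc=2.0, size=(64, 32))         # unrelated, shifted distribution

print(mmd2(visual, textual_aligned, sigma=4.0) < mmd2(visual, textual_shifted, sigma=4.0))  # True
print(info_nce(visual, textual_aligned) < info_nce(visual, textual_shifted))                # True
```

The two terms are complementary, which is one plausible reading of the "dual regularization" above: MMD compares the modality distributions as wholes and ignores which item is which, while InfoNCE pulls each item's visual and textual embeddings together and pushes apart embeddings from different items in the batch.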

MMMamba: A Versatile Cross-Modal In Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement

CV and Pattern Recognition

Makes blurry satellite pictures sharp and clear.

17 Dec 2025

Self-supervised Multiplex Consensus Mamba for General Image Fusion

CV and Pattern Recognition

Combines pictures to see more detail.

24 Dec 2025

GRAM-MAMBA: Holistic Feature Alignment for Wireless Perception with Adaptive Low-Rank Compensation

CV and Pattern Recognition

Helps smart devices understand everything even with missing info.

18 Jul 2025

Country of Origin

🇰🇷 Korea, Republic of

Repos / Data Links

github.com

Page Count

10 pages
