Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework
By: Siming Fu, Sijun Dong, Xiaoliang Meng
Potential Business Impact:
Teaches computers to learn what's important, not just how things look.
Despite the remarkable success of Self-Supervised Learning (SSL), its generalization is fundamentally hindered by Shortcut Learning, where models exploit superficial features like texture instead of intrinsic structure. We experimentally verify this flaw within the generative paradigm (e.g., MAE) and argue it is a systemic issue also affecting discriminative methods, identifying it as the root cause of their failure on unseen domains. While existing methods often tackle this at a surface level by aligning or separating domain-specific features, they fail to alter the underlying learning mechanism that fosters shortcut dependency. To address this at its core, we propose HyGDL (Hybrid Generative-Discriminative Learning Framework), a hybrid framework that achieves explicit content-style disentanglement. Our approach is guided by the Invariance Pre-training Principle: forcing a model to learn an invariant essence by systematically varying a bias (e.g., style) at the input while keeping the supervision signal constant. HyGDL operates on a single encoder and analytically defines style as the component of a representation that is orthogonal to its style-invariant content, derived via vector projection.
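The abstract's definition of style, the component of a representation orthogonal to its style-invariant content, obtained via vector projection, can be sketched numerically. This is a minimal illustration of that decomposition, not the paper's implementation; the function name, shapes, and the assumption of a known content direction are all hypothetical.

```python
import numpy as np

def disentangle(z, c):
    """Split a representation z into a content component (its projection
    onto the style-invariant content direction c) and a style component
    (the orthogonal residual), per the vector-projection definition.
    Note: treating c as a given unit direction is an illustrative
    simplification; HyGDL derives content from its training objective."""
    c_hat = c / np.linalg.norm(c)          # normalize content direction
    content = np.dot(z, c_hat) * c_hat     # projection onto content axis
    style = z - content                    # orthogonal residual = style
    return content, style

# Toy example: with z = [3, 4] and content axis along x,
# content is [3, 0] and style is the orthogonal remainder [0, 4].
content, style = disentangle(np.array([3.0, 4.0]), np.array([1.0, 0.0]))
```

By construction `content + style == z` and `np.dot(content, style) == 0`, so the two components are an exact orthogonal split of the representation.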
Similar Papers
Causal Inference via Style Bias Deconfounding for Domain Generalization
CV and Pattern Recognition
Teaches computers to ignore style, see true patterns.
SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
CV and Pattern Recognition
Lets computers change picture styles without losing meaning.