SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
By: Pingchuan Ma, Xiaopei Yang, Yusong Li and more
Potential Business Impact:
Lets computers change picture styles without losing meaning.
Explicitly disentangling style and content in vision models remains challenging due to their semantic overlap and the subjectivity of human perception. Existing methods propose separation through generative or discriminative objectives, but they still face the inherent ambiguity of disentangling intertwined concepts. Instead, we ask: Can we bypass explicit disentanglement by learning to merge style and content invertibly, allowing separation to emerge naturally? We propose SCFlow, a flow-matching framework that learns bidirectional mappings between entangled and disentangled representations. Our approach is built upon three key insights: 1) training solely to merge style and content, a well-defined task, enables invertible disentanglement without explicit supervision; 2) flow matching bridges arbitrary distributions, avoiding the restrictive Gaussian priors of diffusion models and normalizing flows; and 3) a synthetic dataset of 510,000 samples (51 styles $\times$ 10,000 content samples), curated to simulate disentanglement through systematic style-content pairing. Beyond controllable generation tasks, we demonstrate that SCFlow generalizes to ImageNet-1k and WikiArt in zero-shot settings and achieves competitive performance, highlighting that disentanglement naturally emerges from the invertible merging process.
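To make the "merge forward, separate backward" idea concrete, here is a minimal NumPy sketch of flow matching between two arbitrary paired distributions. It assumes the common straight-line path $x_t = (1-t)\,x_0 + t\,x_1$ with regression target $x_1 - x_0$; the abstract does not state which path or network SCFlow uses. The embeddings, array shapes, and function names are hypothetical, and an oracle velocity stands in for a trained model.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Regression target for a velocity network along the straight path."""
    return x1 - x0

def integrate(x, velocity_fn, t_start, t_end, steps=100):
    """Euler integration of dx/dt = velocity_fn(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x = x + dt * velocity_fn(x, t)
        t += dt
    return x

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two endpoint representations.
# Neither endpoint has to be Gaussian.
disentangled = rng.uniform(-1.0, 1.0, size=(4, 8))  # separate style/content side
entangled = rng.laplace(size=(4, 8))                # merged side

# Oracle velocity for these pairs (assumption: a trained network would
# approximate this by minimizing ||v_theta(x_t, t) - (x1 - x0)||^2).
v = target_velocity(disentangled, entangled)
oracle = lambda x, t: v

# Forward in time merges; the same field run backward separates.
merged = integrate(disentangled, oracle, 0.0, 1.0)
recovered = integrate(merged, oracle, 1.0, 0.0)

# Dataset size from the abstract: every style paired with every content sample.
n_styles, n_contents = 51, 10_000
n_pairs = n_styles * n_contents
```

Because the same velocity field is integrated in both directions, no separate disentanglement objective appears anywhere in the sketch; inversion comes from reversing time.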
Similar Papers
SplitFlux: Learning to Decouple Content and Style from a Single Image
CV and Pattern Recognition
Changes picture style without messing up the main subject.
Inversion-Free Style Transfer with Dual Rectified Flows
CV and Pattern Recognition
Makes pictures look like art, super fast.
Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework
CV and Pattern Recognition
Teaches computers to learn what's important, not just looks.