Independent Density Estimation
By: Jiahao Liu
Potential Business Impact:
Teaches computers to understand pictures and words better.
Large-scale Vision-Language models have achieved remarkable results in various domains, such as image captioning and conditioned image generation. Nevertheless, these models still encounter difficulties in achieving human-like compositional generalization. In this study, we propose a new method called Independent Density Estimation (IDE) to tackle this challenge. IDE aims to learn the connection between individual words in a sentence and the corresponding features in an image, enabling compositional generalization. We build two models based on the philosophy of IDE. The first one utilizes fully disentangled visual representations as input, and the second leverages a Variational Auto-Encoder to obtain partially disentangled features from raw images. Additionally, we propose an entropy-based compositional inference method to combine predictions of each word in the sentence. Our models exhibit superior generalization to unseen compositions compared to current models when evaluated on various datasets.
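The abstract does not spell out how the entropy-based compositional inference combines per-word predictions, so the following is only an illustrative sketch of one plausible reading, not the paper's actual rule. It assumes each word yields a probability distribution over a set of candidate images, and that words with low-entropy (confident) predictions should count more than words with near-uniform predictions. The function names `entropy` and `combine_word_predictions`, the weighting formula, and the example numbers are all hypothetical.

```python
import numpy as np


def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of each row of p."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)


def combine_word_predictions(word_probs):
    """Combine per-word distributions over candidates into one distribution.

    word_probs: array of shape (num_words, num_candidates); each row is
    assumed to be one word's probability distribution over the candidates.
    Assumed weighting (not taken from the paper): each word's weight is
    1 - H / log(num_candidates), so a uniform prediction gets weight 0 and
    a one-hot prediction gets weight 1.
    """
    word_probs = np.asarray(word_probs, dtype=float)
    num_candidates = word_probs.shape[1]
    if num_candidates < 2:
        raise ValueError("need at least two candidates")
    weights = 1.0 - entropy(word_probs) / np.log(num_candidates)
    # Weighted sum of log-probabilities, i.e. a weighted product of experts.
    log_scores = weights @ np.log(np.clip(word_probs, 1e-12, 1.0))
    log_scores -= log_scores.max()  # shift for numerical stability
    combined = np.exp(log_scores)
    return combined / combined.sum()


# Hypothetical example: the first word is confident about candidate 0,
# the second word is uninformative (uniform) and is effectively ignored.
example = combine_word_predictions([[0.9, 0.05, 0.05],
                                    [1 / 3, 1 / 3, 1 / 3]])
```

Under this assumed scheme, an uninformative word leaves the combined prediction unchanged, which is one way independent per-word estimates could be merged without a single uncertain word dominating the result.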
Similar Papers
Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification
CV and Pattern Recognition
Helps cameras find people in different light.
EDITS: Enhancing Dataset Distillation with Implicit Textual Semantics
CV and Pattern Recognition
Makes small data learn like big data.
Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation
CV and Pattern Recognition
Helps computers see images better, not just understand them.