Score: 0

Independent Density Estimation

Published: December 10, 2025 | arXiv ID: 2512.10067v1

By: Jiahao Liu

Potential Business Impact:

Teaches computers to understand pictures and words better.

Business Areas:
Image Recognition Data and Analytics, Software

Large-scale Vision-Language models have achieved remarkable results in various domains, such as image captioning and conditioned image generation. Neverthe- less, these models still encounter difficulties in achieving human-like composi- tional generalization. In this study, we propose a new method called Independent Density Estimation (IDE) to tackle this challenge. IDE aims to learn the connec- tion between individual words in a sentence and the corresponding features in an image, enabling compositional generalization. We build two models based on the philosophy of IDE. The first one utilizes fully disentangled visual representations as input, and the second leverages a Variational Auto-Encoder to obtain partially disentangled features from raw images. Additionally, we propose an entropy- based compositional inference method to combine predictions of each word in the sentence. Our models exhibit superior generalization to unseen compositions compared to current models when evaluated on various datasets.

Country of Origin
🇺🇸 United States

Page Count
10 pages

Category
Computer Science:
CV and Pattern Recognition