Score: 2

F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model

Published: December 30, 2025 | arXiv ID: 2512.24473v1

By: Devendra K. Jangid, Ripon K. Saha, Dilshan Godaliyadda and more

BigTech Affiliations: Samsung

Potential Business Impact:

Makes phone pictures clearer without fake details.

Business Areas:

Visual Search Internet Services

With the advent of generative AI, Single Image Super-Resolution (SISR) quality has improved substantially, as the strong priors learned by Text-2-Image Diffusion (T2IDiff) Foundation Models (FMs) can bridge the gap between High-Resolution (HR) and Low-Resolution (LR) images. However, flagship smartphone cameras have been slow to adopt generative models because strong generation can lead to undesirable hallucinations. For substantially degraded LR images, as seen in academia, strong generation is required and hallucinations are more tolerable because of the wide gap between LR and HR images. In contrast, in consumer photography the LR image has substantially higher fidelity and requires only minimal, hallucination-free generation. We hypothesize that generation in SISR is controlled by the stringency and richness of the FM's conditioning feature. First, text features are high-level features, which often cannot describe subtle textures in an image. Additionally, smartphone LR images are at least 12 MP, whereas SISR networks built on a T2IDiff FM are designed to perform inference on much smaller images (<1 MP). As a result, SISR inference has to be performed on small patches, which often cannot be accurately described by text features. To address these shortcomings, we introduce an SISR network built on an FM with lower-level feature conditioning, specifically DINOv2 features, which we call a Feature-to-Image Diffusion (F2IDiff) Foundation Model (FM). Lower-level features provide stricter conditioning while being rich descriptors of even small patches.
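The patch-based inference mentioned in the abstract can be illustrated with a small tiling sketch. This is not the authors' pipeline: the function name `tile_grid`, the 4000x3000 frame size, the 512-pixel tile, and the 32-pixel overlap are all assumptions chosen only to show how a 12 MP image breaks into patches well under 1 MP each.

```python
import math


def tile_grid(width, height, tile=512, overlap=32):
    """Return top-left (x, y) corners of overlapping square tiles covering an image.

    Hypothetical helper: tile size and overlap are illustrative assumptions,
    not values taken from the paper.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(size):
        # An axis no longer than one tile needs a single tile at offset 0.
        if size <= tile:
            return [0]
        count = math.ceil((size - tile) / stride) + 1
        # Clamp the final offset so the last tile ends exactly at the border.
        return [min(i * stride, size - tile) for i in range(count)]

    return [(x, y) for y in starts(height) for x in starts(width)]


# Assumed 12 MP frame (4000 x 3000 pixels).
corners = tile_grid(4000, 3000)
pixels_per_tile = 512 * 512  # about 0.26 MP, below the <1 MP range cited above
print(len(corners), pixels_per_tile)
```

Under these assumptions the frame is covered by 63 tiles (9 columns by 7 rows). Each tile sees only a small region of the scene, which is the situation in which the abstract argues a text description becomes a poor conditioning signal.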

FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution

CV and Pattern Recognition

Makes blurry pictures sharp and clear.

1 Dec 2025 0

89%

TinySR: Pruning Diffusion for Real-World Image Super-Resolution

CV and Pattern Recognition

Makes blurry pictures sharp, super fast.

24 Aug 2025 0

89%

DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution

CV and Pattern Recognition

Makes blurry infrared pictures sharp for robots.

3 Mar 2025 1


Country of Origin

🇰🇷 South Korea

Page Count

12 pages
