Score: 2

Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation

Published: May 11, 2025 | arXiv ID: 2505.06995v1

By: Md. Naimur Asif Borno, Md Sakib Hossain Shovon, Asmaa Soliman Al-Moisheer, and more

Potential Business Impact:

Makes AI art generators faster and smaller.

Business Areas:
Text Analytics, Data and Analytics, Software

Recent text-to-image diffusion models are hindered by high computational demands, limiting accessibility and scalability. This paper introduces KDC-Diff, a novel stable diffusion framework that enhances efficiency while maintaining image quality. KDC-Diff features a streamlined U-Net architecture with nearly half the parameters of the original U-Net (482M), significantly reducing model complexity. We propose a dual-layered distillation strategy to ensure high-fidelity generation, transferring semantic and structural insights from a teacher to a compact student model while minimizing quality degradation. Additionally, replay-based continual learning is integrated to mitigate catastrophic forgetting, allowing the model to retain prior knowledge while adapting to new data. Despite operating under extremely limited computational resources, KDC-Diff achieves state-of-the-art performance on the Oxford Flowers and Butterflies & Moths 100 Species datasets, with competitive results on metrics such as FID, CLIP score, and LPIPS, while significantly reducing inference time compared to existing models. These results establish KDC-Diff as a highly efficient and adaptable solution for text-to-image generation, particularly in computationally constrained environments.
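The abstract names two mechanisms, dual-layered (output- and feature-level) knowledge distillation and replay-based continual learning, without giving their exact formulation. The sketch below shows one common way such a combination is wired up in PyTorch; the function names, the MSE feature-matching choice, the weighting `alpha`, and the reservoir-sampled replay buffer are illustrative assumptions, not details taken from the paper.

```python
import random
import torch
import torch.nn.functional as F

def dual_layer_distillation_loss(student_out, teacher_out,
                                 student_feats, teacher_feats,
                                 alpha=0.5):
    """Combine an output-level (semantic) term with a feature-level
    (structural) term. `alpha` is a placeholder weight, not a value
    reported in the paper."""
    # Output-level term: match the student's prediction to the teacher's.
    out_loss = F.mse_loss(student_out, teacher_out.detach())
    # Feature-level term: match intermediate U-Net activations.
    feat_loss = sum(F.mse_loss(s, t.detach())
                    for s, t in zip(student_feats, teacher_feats))
    return alpha * out_loss + (1 - alpha) * feat_loss

class ReplayBuffer:
    """Reservoir-sampled buffer: old-task samples are mixed into each
    new-task batch to mitigate catastrophic forgetting."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            # Reservoir sampling keeps a uniform subset of all samples seen.
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.data[idx] = sample

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

# Illustrative usage with dummy tensors standing in for U-Net outputs:
s_out, t_out = torch.randn(2, 4, 64, 64), torch.randn(2, 4, 64, 64)
s_feats, t_feats = [torch.randn(2, 8, 32, 32)], [torch.randn(2, 8, 32, 32)]
loss = dual_layer_distillation_loss(s_out, t_out, s_feats, t_feats)
```

Mixing replayed old-dataset samples into each fine-tuning batch is the standard way replay-based methods preserve prior knowledge; in practice the buffer size and replay ratio would be tuned per dataset.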

Country of Origin
🇦🇺 🇰🇷 🇧🇩 🇸🇦 Australia, Republic of Korea, Bangladesh, Saudi Arabia

Page Count
13 pages

Category
Computer Science:
Computer Vision and Pattern Recognition