Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
By: Riccardo Passoni, Francesca Ronchini, Luca Comanducci and more
Potential Business Impact:
Makes computer-made sounds use less power.
Text-to-audio models have recently emerged as a powerful technology for generating sound from textual descriptions. However, their high computational demands raise concerns about energy consumption and environmental impact. In this paper, we analyze the energy usage of seven state-of-the-art diffusion-based text-to-audio generative models, evaluating how far variations in generation parameters affect energy consumption at inference time. We also seek the best balance between audio quality and energy consumption by considering Pareto-optimal solutions across all selected models. Our findings provide insights into the trade-offs between performance and environmental impact, contributing to the development of more efficient generative audio models.
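As an illustration of what "Pareto-optimal" means in this setting, the sketch below filters a handful of configurations down to those that no other configuration beats on both energy and quality at once. It is not the authors' code: the `Config` type, the field names, and every number are made-up assumptions for illustration, and the quality score is assumed to be a distance-style metric where lower is better.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str           # hypothetical label for a model and parameter setting
    energy_wh: float    # assumed energy per generation, lower is better
    quality_err: float  # assumed distance-style quality score, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.energy_wh <= b.energy_wh and a.quality_err <= b.quality_err
    strictly_better = a.energy_wh < b.energy_wh or a.quality_err < b.quality_err
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Illustrative values only; these are NOT measurements from the paper.
configs = [
    Config("A, 10 steps", 0.5, 6.0),
    Config("A, 50 steps", 2.0, 3.0),
    Config("B, 25 steps", 1.5, 3.5),
    Config("B, 100 steps", 4.0, 3.2),
    Config("C, 50 steps", 2.5, 2.5),
]

front = pareto_front(configs)
for c in front:
    print(f"{c.name}: {c.energy_wh} Wh, quality error {c.quality_err}")
```

In this toy data, "B, 100 steps" drops out because "A, 50 steps" uses less energy and also scores better on quality; the remaining four settings each represent a different trade-off between the two objectives.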
Similar Papers
A Review on Score-based Generative Models for Audio Applications
Sound
Makes computers create realistic sounds and voices.
How Green are Neural Language Models? Analyzing Energy Consumption in Text Summarization Fine-tuning
Computation and Language
Makes AI smarter with less energy used.
Video Killed the Energy Budget: Characterizing the Latency and Power Regimes of Open Text-to-Video Models
Machine Learning (CS)
Makes videos from words using less power.