Fine-Grained Energy Prediction For Parallelized LLM Inference With PIE-P
By: Anurag Dutt, Young Won Choi, Avirup Sil, and more
With the widespread adoption of Large Language Models (LLMs), the energy costs of running LLMs are quickly becoming a critical concern. However, precisely measuring the energy consumption of LLMs is often infeasible because hardware-based power monitors are not always accessible and software-based energy measurement tools are not accurate. While various prediction techniques have been developed to estimate LLM energy consumption, these approaches are limited to single-GPU environments and thus are not applicable to modern LLM inference, which is typically parallelized across multiple GPUs. In this work, we remedy this gap and introduce PIE-P, a fine-grained energy prediction framework for multi-GPU inference, covering tensor, pipeline, and data parallelism. Predicting energy under parallelized inference is complicated by non-determinism in inter-GPU communication, additional communication overheads, and the difficulty of isolating energy during the communication/synchronization phase. We develop a scalable prediction framework that addresses these issues via precise sampling, fine-grained modeling of inter-GPU communication, and careful accounting of parallelization overhead. Our evaluation results show that PIE-P yields accurate and fine-grained energy predictions across parallelism strategies, significantly outperforming baselines.
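The abstract does not detail PIE-P's implementation, but the per-GPU power sampling it alludes to can be illustrated with a minimal sketch. The Python snippet below polls each GPU's instantaneous power draw via NVML (through the pynvml bindings) in a background thread and integrates it into energy per labeled phase, so that compute and communication windows can be accounted for separately. The PhaseEnergyMeter class, the phase names, and the 10 ms sampling interval are illustrative assumptions, not PIE-P's actual design.

```python
# Minimal sketch (not PIE-P itself): sample per-GPU power with NVML and
# integrate it into joules per inference phase. Phase labels and the
# sampling interval are assumptions made for illustration.
import time
import threading
import pynvml


class PhaseEnergyMeter:
    def __init__(self, gpu_indices, interval_s=0.01):
        pynvml.nvmlInit()
        self.handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in gpu_indices]
        self.interval_s = interval_s
        self.energy_j = {}  # phase name -> accumulated energy in joules
        self._phase = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._sample, daemon=True)
        self._thread.start()

    def _sample(self):
        last = time.monotonic()
        while not self._stop.is_set():
            time.sleep(self.interval_s)
            now = time.monotonic()
            if self._phase is not None:
                # nvmlDeviceGetPowerUsage reports milliwatts; sum over all
                # GPUs and integrate over the elapsed interval.
                watts = sum(
                    pynvml.nvmlDeviceGetPowerUsage(h) for h in self.handles
                ) / 1000.0
                self.energy_j[self._phase] = (
                    self.energy_j.get(self._phase, 0.0) + watts * (now - last)
                )
            last = now

    def phase(self, name):
        """Attribute subsequent samples to the given phase."""
        self._phase = name

    def stop(self):
        self._stop.set()
        self._thread.join()
        pynvml.nvmlShutdown()
        return self.energy_j


# Usage sketch: run_model_shard and all_reduce_activations are hypothetical
# stand-ins for a framework's compute and communication steps.
# meter = PhaseEnergyMeter(gpu_indices=[0, 1, 2, 3])
# meter.phase("compute");        run_model_shard()
# meter.phase("communication");  all_reduce_activations()
# print(meter.stop())  # e.g. {'compute': ..., 'communication': ...}
```

Note that this is exactly the kind of coarse software-based measurement the abstract identifies as insufficient on its own; a prediction framework would use such samples as training or calibration data rather than as the final estimate.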
Similar Papers
Learning Process Energy Profiles from Node-Level Power Data
Distributed, Parallel, and Cluster Computing
Tracks computer energy use by each program.
Compression-Induced Communication-Efficient Large Model Training and Inferencing
Machine Learning (CS)
Saves energy training smart computer programs.
Leveraging LLMs to Automate Energy-Aware Refactoring of Parallel Scientific Codes
Artificial Intelligence
Makes computer code use less power.