Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI
By: Aladin Djuhera, Fernando Koch, Alecio Binotto
Potential Business Impact:
Lets AI work better on phones and other devices.
Inference over large-scale foundation models in heterogeneous edge environments requires a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers presumes temporal stability of compute and network resources, an assumption misaligned with the volatility of real-world deployments. We introduce a framework in which both the spatial placement and the internal segmentation of foundation models are elevated to runtime-resolved constructs. The orchestration problem is formalized as a constrained optimization over layer-wise assignments, subject to evolving latency, utilization, and privacy constraints. By integrating model-aware capacity profiling with dynamic graph re-partitioning and reallocation, the framework realizes reactive inference composition that responds to infrastructural fluctuations. We present architectural and algorithmic components, along with a representative use case in 6G multi-access edge computing.
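To make the optimization concrete, the following is a minimal sketch of layer-wise partitioning across edge devices. The cost model (per-layer FLOPs, activation sizes, device speeds, a single link bandwidth) and all function names are illustrative assumptions, not the paper's actual formulation; re-invoking the search whenever monitored conditions change corresponds loosely to the runtime re-partitioning the abstract describes.

```python
from itertools import combinations

def partition_cost(layer_flops, layer_out_bytes, device_speeds, link_bw, splits):
    """End-to-end latency of a pipeline; splits[i] is the first layer of stage i+1."""
    bounds = [0] + list(splits) + [len(layer_flops)]
    total = 0.0
    for d, (a, b) in enumerate(zip(bounds, bounds[1:])):
        total += sum(layer_flops[a:b]) / device_speeds[d]  # compute time on device d
        if b < len(layer_flops):
            # transfer the boundary activation to the next device
            total += layer_out_bytes[b - 1] / link_bw
    return total

def best_split(layer_flops, layer_out_bytes, device_speeds, link_bw):
    """Exhaustively search contiguous layer-to-device assignments; returns (cost, splits)."""
    n, k = len(layer_flops), len(device_speeds)
    best = (float("inf"), None)
    for splits in combinations(range(1, n), k - 1):
        c = partition_cost(layer_flops, layer_out_bytes, device_speeds, link_bw, splits)
        if c < best[0]:
            best = (c, splits)
    return best
```

In a dynamic setting, `best_split` would be re-evaluated whenever the capacity profiler reports new device speeds or bandwidths, and layers would be migrated if the optimal split moves; a real system would replace the brute-force search with dynamic programming or a heuristic, and would add per-device memory and privacy constraints to the feasibility check.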
Similar Papers
Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge
Distributed, Parallel, and Cluster Computing
Makes smart devices run big AI faster.
Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions
Distributed, Parallel, and Cluster Computing
Makes smart apps run faster and safer.
Adaptive AI Model Partitioning over 5G Networks
Networking and Internet Architecture
Lets phones run smart apps without draining battery.