Resilient by Design -- Active Inference for Distributed Continuum Intelligence
By: Praveen Kumar Donta, Alfreds Lapkovskis, Enzo Mingozzi and more
Potential Business Impact:
Fixes computer problems before they happen.
Failures are the norm in highly complex and heterogeneous devices spanning the distributed computing continuum (DCC), from resource-constrained IoT and edge nodes to high-performance computing systems. Ensuring reliability and global consistency across these layers remains a major challenge, especially for AI-driven workloads requiring real-time, adaptive coordination. This work-in-progress paper introduces a Probabilistic Active Inference Resilience Agent (PAIR-Agent) to achieve resilience in DCC systems. PAIR-Agent performs three core operations: (i) constructing a causal fault graph from device logs, (ii) identifying faults while managing certainties and uncertainties using Markov blankets and the free energy principle, and (iii) autonomously healing issues through active inference. Through continuous monitoring and adaptive reconfiguration, the agent maintains service continuity and stability under diverse failure conditions. Theoretical validations confirm the reliability and effectiveness of the proposed framework.
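As a rough illustration of operation (ii), the sketch below computes the Markov blanket of a node in a small causal fault graph and evaluates variational free energy for a two-state fault belief. It is a minimal sketch under stated assumptions, not the PAIR-Agent implementation: the graph, the node names, the probabilities, and the helper functions (`markov_blanket`, `free_energy`, `exact_posterior`) are all hypothetical and do not come from the paper.

```python
import math

# Hypothetical causal fault graph (not from the paper).
# Each key is a cause; its list holds the direct effects.
FAULT_GRAPH = {
    "power_sag": ["sensor_dropout", "node_reboot"],
    "sensor_dropout": ["stale_data"],
    "network_congestion": ["stale_data"],
    "node_reboot": [],
    "stale_data": [],
}


def markov_blanket(graph, node):
    """Return the parents, children, and co-parents of `node` in a DAG."""
    parents = {p for p, kids in graph.items() if node in kids}
    children = set(graph.get(node, []))
    # Co-parents: other causes that share at least one child with `node`.
    co_parents = {p for p, kids in graph.items() if children & set(kids)}
    return (parents | children | co_parents) - {node}


def free_energy(q, prior, likelihood):
    """Variational free energy F = sum_s q(s) * (ln q(s) - ln p(o, s)).

    `likelihood[s]` is p(o | s) for the single observation o already made.
    """
    return sum(
        q[s] * (math.log(q[s]) - math.log(prior[s] * likelihood[s]))
        for s in q
        if q[s] > 0
    )


def exact_posterior(prior, likelihood):
    """Bayes' rule for a discrete state; also returns the evidence p(o)."""
    evidence = sum(prior[s] * likelihood[s] for s in prior)
    posterior = {s: prior[s] * likelihood[s] / evidence for s in prior}
    return posterior, evidence


# Assumed numbers: a device is usually healthy, and a timeout is far
# more likely when it is faulty.
prior = {"healthy": 0.9, "faulty": 0.1}
likelihood = {"healthy": 0.05, "faulty": 0.8}  # p(timeout | state)

blanket = markov_blanket(FAULT_GRAPH, "sensor_dropout")
posterior, evidence = exact_posterior(prior, likelihood)
f_prior = free_energy(prior, prior, likelihood)
f_posterior = free_energy(posterior, prior, likelihood)
```

In this toy setting the blanket of `sensor_dropout` is its cause, its effect, and the other cause of that effect, so those are the only nodes an agent would need to condition on when reasoning about it. Free energy is lowest when the belief `q` equals the exact posterior, where it reduces to the negative log evidence. That is the quantity an active inference agent tries to reduce, either by updating beliefs or by acting.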
Similar Papers
Resilient by Design - Active Inference for Distributed Continuum Intelligence
Distributed, Parallel, and Cluster Computing
Fixes computer problems before they happen.
Resilient Radio Access Networks: AI and the Unknown Unknowns
Information Theory
AI helps 5G networks stay working when things go wrong.
Reliability and Resilience of AI-Driven Critical Network Infrastructure under Cyber-Physical Threats
Cryptography and Security
Keeps internet working even when attacked.