AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs
By: Pedro Antunes, Ana Rita Ortigoso, Gabriel Vieira, and more
Potential Business Impact:
Lets old graphics cards run modern AI models.
The rise of Large Language Models (LLMs) has increased the need for scalable, high-performance inference systems, yet most existing frameworks assume homogeneous, resource-rich hardware, an assumption that is often unrealistic in academic or otherwise resource-constrained settings. We introduce AIvailable, a low-cost, highly available LLM-as-a-Service (LLMaaS) platform that uses a software-defined approach to run LLMs across heterogeneous and legacy GPU nodes, including NVIDIA and AMD devices, with a focus on fully utilizing each node's VRAM. AIvailable operates as a fully GPU-accelerated inference service with no CPU fallbacks, featuring a unified client interface that allows seamless interaction with all deployed LLMs through a single logical unit. The architecture comprises four main components: the Client Interface for user access, the Service Frontend for secure request routing and load balancing, the SDAI Controller for orchestration, deployment, and monitoring, and the Service Backend of heterogeneous GPU nodes executing workloads. By abstracting GPU-specific details and providing dynamic, VRAM-aware allocation and reallocation of models, AIvailable ensures efficient use of resources and resilience against failures and workload fluctuations. Targeting academic labs, private companies, and other constrained organizations, it supports diverse open LLMs, helping to democratize generative AI through the repurposing of legacy GPUs.
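The abstract does not spell out how the SDAI Controller decides where a model lands, so the following is only a minimal sketch of the kind of VRAM-aware placement it describes: pick the node whose free VRAM most tightly fits the model (best-fit), regardless of GPU vendor. All names here (GPUNode, place_model, the model tag, and the VRAM figures) are hypothetical illustrations, not the paper's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class GPUNode:
    """A backend GPU node; vendor is informational, since placement
    here only reasons about free VRAM, not the device architecture."""
    name: str
    vendor: str          # e.g. "NVIDIA" or "AMD"
    vram_total_mb: int
    vram_used_mb: int = 0

    @property
    def vram_free_mb(self) -> int:
        return self.vram_total_mb - self.vram_used_mb


def place_model(nodes: list[GPUNode], model: str, vram_needed_mb: int) -> GPUNode:
    """Best-fit placement: choose the node whose free VRAM most tightly
    fits the model, leaving larger contiguous VRAM gaps available for
    bigger models that may arrive later."""
    candidates = [n for n in nodes if n.vram_free_mb >= vram_needed_mb]
    if not candidates:
        raise RuntimeError(f"no node can host {model} ({vram_needed_mb} MB needed)")
    best = min(candidates, key=lambda n: n.vram_free_mb - vram_needed_mb)
    best.vram_used_mb += vram_needed_mb
    return best


if __name__ == "__main__":
    # Hypothetical mixed-vendor fleet of legacy cards.
    fleet = [
        GPUNode("node-a", "NVIDIA", vram_total_mb=11264),
        GPUNode("node-b", "AMD", vram_total_mb=8192),
    ]
    host = place_model(fleet, "llama-3-8b-q4", vram_needed_mb=6000)
    print(f"llama-3-8b-q4 -> {host.name} ({host.vram_free_mb} MB free remaining)")
```

Best-fit is just one plausible heuristic; the paper's dynamic reallocation on failures or workload shifts would additionally need to evict and re-place models, which this sketch omits.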
Similar Papers
Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM
Distributed, Parallel, and Cluster Computing
Makes supercomputers run AI faster for many people.
Gaia: Hybrid Hardware Acceleration for Serverless AI in the 3D Compute Continuum
Distributed, Parallel, and Cluster Computing
Makes AI run faster and cheaper everywhere.
AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators
Distributed, Parallel, and Cluster Computing
AI helps design better computer systems faster.