RoCE BALBOA: Service-enhanced Data Center RDMA for SmartNICs
By: Maximilian Jakob Heer, Benjamin Ramhorst, Yu Zhu, and more
Potential Business Impact:
Makes computer networks faster for AI.
Data-intensive applications in data centers, especially machine learning (ML), have made the network a bottleneck, which in turn has motivated the development of more efficient network protocols and infrastructure. For instance, remote direct memory access (RDMA) has become the standard protocol for data transport in the cloud because it minimizes data copies and reduces CPU utilization via host bypassing. Similarly, a growing share of network functions and infrastructure has moved to accelerators, SmartNICs, and in-network computing to bypass the CPU. In this paper we explore the implementation and deployment of RoCE BALBOA, an open-source, RoCE v2-compatible, 100G-capable RDMA stack that scales to hundreds of queue pairs and can serve as the basis for building accelerators and SmartNICs. RoCE BALBOA is customizable, opening up a design space and offering a degree of adaptability not available in commercial products. We have deployed BALBOA in an FPGA cluster and show that its latency and performance characteristics are comparable to those of commercial NICs. We demonstrate its potential through two classes of use cases. The first enhances the protocol for infrastructure purposes (encryption, deep packet inspection using ML). The second showcases line-rate compute offloads with deep pipelines, implementing commercial data preprocessing pipelines for recommender systems that process data as it arrives from the network before transferring it directly to the GPU. These examples demonstrate how BALBOA enables the exploration and development of SmartNICs and accelerators operating on network data streams.
Similar Papers
Network-accelerated Active Messages
Networking and Internet Architecture
Moves computer work to network cards.
An RDMA-First Object Storage System with SmartNIC Offload
Hardware Architecture
Makes AI learn much faster by speeding up data access.
Reimagining RDMA Through the Lens of ML
Distributed, Parallel, and Cluster Computing
Makes AI training much faster and more reliable.