Handling of Memory Page Faults during Virtual-Address RDMA
By: Antonis Psistakis
Potential Business Impact:
Lets computers share data without copying.
In modern high-speed interconnection networks, avoiding system calls during cluster communication (e.g., in data centers and high-performance computing) has become a necessity, because the multiple data copies between kernel and user space they entail carry a high overhead. User-level, zero-copy Remote Direct Memory Access (RDMA) addresses this problem, improving performance and reducing system energy consumption. However, traditional RDMA engines cannot tolerate page faults and therefore employ various techniques to avoid them. State-of-the-art RDMA approaches typically rely on pinning entire address spaces, or multiple pages per application. Pinning has long-term disadvantages: it increases programming complexity (buffers must be pinned and unpinned explicitly), it is constrained by limits on how much memory can be pinned, and it uses memory inefficiently. Moreover, pinning does not fully prevent page faults, because modern operating systems apply internal optimization mechanisms, such as Transparent Huge Pages (THP), which are enabled by default in Linux. This thesis implements a page-fault handling mechanism integrated with the DMA engine of the ExaNeSt project. Faults are detected by the ARM System Memory Management Unit (SMMU) and resolved through a hardware-software solution that can request retransmission when needed. The mechanism required modifications to the Linux SMMU driver, the development of a new software library, changes to the DMA engine hardware, and adjustments to the DMA scheduling logic. Experiments were conducted on the Quad-FPGA Daughter Board (QFDB) of ExaNeSt, which uses Xilinx Zynq UltraScale+ MPSoCs. Finally, we evaluate our mechanism, compare it against alternatives such as pinning and pre-faulting, and discuss the advantages of our approach.
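To make the pinning burden concrete, the minimal C sketch below (not taken from the thesis) shows the conventional pin/transfer/unpin pattern that zero-copy RDMA stacks impose on applications, and that the page-fault handling mechanism described above aims to make unnecessary. The function rdma_write_to_remote() is a hypothetical stand-in for an RDMA engine's transfer call; the pinning itself uses the standard POSIX mlock()/munlock() interface.

```c
/*
 * Illustrative sketch only: conventional buffer pinning around an RDMA
 * transfer. rdma_write_to_remote() is a hypothetical placeholder, not
 * part of the ExaNeSt software stack.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define BUF_SIZE (1 << 20)  /* 1 MiB transfer buffer */

/* Hypothetical stand-in for the RDMA engine's transfer call. */
static int rdma_write_to_remote(const void *buf, size_t len)
{
    (void)buf;
    printf("would RDMA-write %zu bytes\n", len);
    return 0;
}

int main(void)
{
    void *buf = malloc(BUF_SIZE);
    if (!buf)
        return 1;
    memset(buf, 0xAB, BUF_SIZE);  /* touch pages so they are resident */

    /* Pin the buffer: the kernel keeps these pages in RAM so the DMA
     * engine never observes a page fault during the transfer. */
    if (mlock(buf, BUF_SIZE) != 0) {
        perror("mlock");          /* fails if the pinned-memory limit is exceeded */
        free(buf);
        return 1;
    }

    rdma_write_to_remote(buf, BUF_SIZE);

    /* Unpin once the transfer completes; otherwise pinned memory
     * accumulates and limits how many buffers can be in flight. */
    munlock(buf, BUF_SIZE);
    free(buf);
    return 0;
}
```

Every communication buffer must go through this pin/unpin lifecycle, which is the programming-complexity and memory-utilization cost the abstract refers to.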
Similar Papers
IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment
Distributed, Parallel, and Cluster Computing
Lets many computers share information faster.
Dynamic reconfiguration for malleable applications using RMA
Distributed, Parallel, and Cluster Computing
Lets big computer programs change size without stopping.