Compiler-supported reduced precision and AoS-SoA transformations for heterogeneous hardware

Published: December 5, 2025 | arXiv ID: 2512.05516v1

By: Pawel K. Radtke, Tobias Weinzierl

This study evaluates AoS-to-SoA transformations over reduced-precision data layouts for a particle simulation code on several GPU platforms. We hypothesize that SoA fits particularly well to SIMT, while AoS is the preferred storage format for many Lagrangian codes. Reduced precision (below IEEE accuracy) is an established tool to address bandwidth constraints, although it remains unclear whether AoS and precision conversions should execute on a CPU or be deployed to a GPU if the compute kernel itself must run on an accelerator. On modern superchips where CPUs and GPUs share (logically) one data space, it is also unclear whether it is advantageous to stream data to the accelerator prior to the calculation, or whether we should let the accelerator transform data on demand, i.e. work in-place logically. We therefore introduce compiler annotations to facilitate such conversions and to give the programmer the option to orchestrate the conversions in combination with GPU offloading. For some of our compute kernels of interest, Nvidia's G200 platforms yield a speedup of around 2.6, while AMD's MI300A exhibits more robust performance yet profits less. We assume that our compiler-based techniques are applicable to a wide variety of Lagrangian codes and beyond.
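
To make the two conversions named in the abstract concrete, the following is a minimal hand-written C++ sketch of an AoS-to-SoA gather that narrows double to float, plus the scatter back. It is not the paper's compiler annotation mechanism, which the abstract does not spell out. The names `Particle`, `ParticlesSoA`, `toSoA`, and `fromSoA`, the choice of fields, and the use of `float` as the reduced-precision format are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle record, stored as an array of structures (AoS).
struct Particle {
    double x, y, z;  // position
    double mass;
};

// Hypothetical structure-of-arrays (SoA) copy in reduced precision.
// float stands in for whatever reduced format is actually chosen.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Gather: AoS -> SoA, narrowing double to float on the way.
ParticlesSoA toSoA(const std::vector<Particle>& aos) {
    const std::size_t n = aos.size();
    ParticlesSoA soa;
    soa.x.resize(n);
    soa.y.resize(n);
    soa.z.resize(n);
    soa.mass.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        soa.x[i] = static_cast<float>(aos[i].x);
        soa.y[i] = static_cast<float>(aos[i].y);
        soa.z[i] = static_cast<float>(aos[i].z);
        soa.mass[i] = static_cast<float>(aos[i].mass);
    }
    return soa;
}

// Scatter: SoA -> AoS, widening float back to double.
void fromSoA(const ParticlesSoA& soa, std::vector<Particle>& aos) {
    assert(soa.x.size() == aos.size());
    for (std::size_t i = 0; i < aos.size(); ++i) {
        aos[i].x = soa.x[i];
        aos[i].y = soa.y[i];
        aos[i].z = soa.z[i];
        aos[i].mass = soa.mass[i];
    }
}
```

In this sketch a compute kernel would run between `toSoA` and `fromSoA` on the contiguous `float` streams. The question the abstract raises is where these two loops should execute: on the CPU before offloading, or on the accelerator itself.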

Mapping code on Coarse Grained Reconfigurable Arrays using a SAT solver

Hardware Architecture

Finds best way to speed up computer tasks.

2 Dec 2025 1

86%

AxOSyn: An Open-source Framework for Synthesizing Novel Approximate Arithmetic Operators

Hardware Architecture

Makes smart devices use less power.

26 Jul 2025 0

85%

T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization

Hardware Architecture

Makes smart AI run faster on small devices.

17 Nov 2025 0
