H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
By: Kosmas Alexandridis, Giorgos Dimitrakopoulos
Potential Business Impact:
Makes AI run faster and use less power.
Transformers have significantly advanced AI and machine learning through their powerful attention mechanism. However, computing attention on long sequences can become a computational bottleneck. FlashAttention mitigates this by fusing the softmax and matrix operations into a tiled computation pattern that decouples performance from sequence length. Though designed for GPUs, its simplicity also makes it well suited to direct hardware acceleration. To improve the hardware implementation, we compute FlashAttention using a mixture of floating-point and fixed-point logarithm-domain representations. Floating-point arithmetic is used to compute attention scores from the query and key matrices, while logarithmic computation simplifies the fused computation of softmax normalization and the multiplication with the value matrix. This transformation, called H-FA, replaces vector-wide floating-point multiplication and division operations with additions and subtractions implemented efficiently with fixed-point arithmetic in the logarithm domain. Exponential function evaluations are effectively omitted, being fused with the remaining operations, and the final result is converted back to floating-point arithmetic without any additional hardware overhead. Hardware implementation results at 28nm demonstrate that H-FA achieves a 26.5% reduction in area and a 23.4% reduction in power, on average, compared to FlashAttention parallel hardware architectures built solely with floating-point datapaths, without hindering performance.
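To make the transformation concrete, here is a minimal NumPy sketch of the idea, not the paper's hardware datapath: the function name hfa_attention_row is hypothetical, np.log2/np.exp2 stand in for the fixed-point logarithm-domain arithmetic the abstract describes, and the sketch handles a single query row without FlashAttention's tiling or online max/normalizer updates.

```python
import numpy as np

def hfa_attention_row(q, K, V):
    """Illustrative sketch of the H-FA idea for one query row.

    Attention scores are formed in floating-point; the softmax weighting
    and the multiplication with V are then carried out as additions and
    subtractions of base-2 logarithms. The real design uses fixed-point
    log-domain hardware and fuses the exponentials, which np.log2/np.exp2
    only emulate here.
    """
    d = q.shape[0]
    s = (K @ q) / np.sqrt(d)               # floating-point scores q.K^T / sqrt(d)
    m = s.max()                            # max-subtraction for numerical stability
    log_w = (s - m) * np.log2(np.e)        # log2 of unnormalized weights exp(s - m)
    log_l = np.log2(np.exp2(log_w).sum())  # log2 of normalizer l = sum_j exp(s_j - m)
    # Multiplying by V becomes an addition of logs; dividing by l becomes
    # a subtraction. Signs of V are tracked separately (sign-magnitude).
    sign = np.sign(V)
    log_v = np.log2(np.abs(V) + 1e-300)    # guard against log(0)
    return (sign * np.exp2(log_w[:, None] + log_v - log_l)).sum(axis=0)
```

A quick check that the log-domain route reproduces ordinary softmax attention:

```python
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
s = (K @ q) / 2.0                                   # sqrt(d) = 2 for d = 4
ref = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V
assert np.allclose(hfa_attention_row(q, K, V), ref)
```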
Similar Papers
Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers
Machine Learning (CS)
Finds mistakes in AI chips fast.
FaRAccel: FPGA-Accelerated Defense Architecture for Efficient Bit-Flip Attack Resilience in Transformer Models
Cryptography and Security
Makes AI safer from hackers, faster, and cheaper.
Low Power Vision Transformer Accelerator with Hardware-Aware Pruning and Optimized Dataflow
Hardware Architecture
Makes computer vision run faster and use less power.