ASIC-based Compression Accelerators for Storage Systems: Design, Placement, and Profiling Insights
By: Tao Lu, Jiapin Wang, Yelin Shan, and more
Potential Business Impact:
Makes data storage faster and more efficient.
Lossless compression imposes significant computational overhead on datacenters when performed on CPUs. Hardware compression and decompression processing units (CDPUs) can alleviate this overhead, but optimal algorithm selection, microarchitectural design, and system-level placement of CDPUs are still not well understood. We present the design of an ASIC-based in-storage CDPU and provide a comprehensive end-to-end evaluation against two leading ASIC accelerators, Intel QAT 8970 and QAT 4xxx. The evaluation spans three dominant CDPU placement regimes: peripheral, on-chip, and in-storage. Our results reveal: (i) acute sensitivity of throughput and latency to CDPU placement and interconnection, (ii) strong correlation between compression efficiency and data patterns/layouts, (iii) placement-driven divergences between microbenchmark gains and real-application speedups, (iv) discrepancies between module-level and system-level power efficiency, and (v) scalability and multi-tenant interference issues of various CDPUs. These findings motivate a placement-aware, cross-layer rethinking of hardware (de)compression for hyperscale storage infrastructures.
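The CPU overhead and data-pattern sensitivity the abstract describes (finding ii) can be observed even in a minimal software microbenchmark. The sketch below is an illustration, not the paper's methodology: it uses Python's standard `zlib` (DEFLATE) on synthetic buffers to show how compression ratio and throughput vary with data layout; buffer sizes and patterns are arbitrary choices for demonstration.

```python
import os
import time
import zlib

def bench(name, data, level=6):
    # Compress one buffer on the CPU; report ratio and throughput.
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    mbps = len(data) / elapsed / 1e6
    print(f"{name:>7}: ratio {ratio:8.2f}x, {mbps:8.1f} MB/s")
    return ratio, mbps

size = 4 * 1024 * 1024  # 4 MiB buffers (arbitrary)
patterns = {
    "zeros":  b"\x00" * size,                          # highly compressible
    "random": os.urandom(size),                        # effectively incompressible
    "text":   (b"the quick brown fox ") * (size // 20),  # repetitive text-like data
}
results = {name: bench(name, data) for name, data in patterns.items()}
```

Even this toy run shows order-of-magnitude spreads in both ratio and MB/s across patterns, which is why the paper's hardware evaluation controls for data layout when comparing CDPU placements.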
Similar Papers
A High-Throughput GPU Framework for Adaptive Lossless Compression of Floating-Point Data
Databases
Shrinks big computer data without losing any details.
Algorithm-Driven On-Chip Integration for High Density and Low Cost
Hardware Architecture
Lets many chip designs share one factory run.
GPU-Based Floating-point Adaptive Lossless Compression
Databases
Makes computer data smaller, faster, and perfect.