
M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization

Published: January 27, 2026 | arXiv ID: 2601.19213v2

By: Weiming Hu, Zihan Zhang, Haoyan Zhang and more

BigTech Affiliations: Huawei

Potential Business Impact:

Makes AI smarter with less computer power.

Business Areas:

DSP Hardware

Existing low-bit Microscaling (MX) formats, such as MXFP4, often suffer substantial accuracy degradation because each block shares a single scaling factor restricted to a power of two. In this work, we explore strategies that introduce minimal metadata to recover the accuracy lost during quantization while maintaining high bit efficiency across a wide range of large language models. We propose a complete algorithm-hardware co-design based on flexible metadata, featuring online quantization with a simple encoding. To support the proposed method efficiently, we implement a lightweight hardware unit and integrate it into the accelerator. Evaluation results show that our method substantially narrows the accuracy gap, achieving on average a 70.63% reduction in accuracy loss compared to MXFP4 and a 37.30% reduction relative to the latest NVFP4 on LLM benchmarks. Furthermore, our design delivers up to 1.91$\times$ speedup and 1.75$\times$ energy savings over state-of-the-art accelerators. Our code is available at https://github.com/SJTU-ReArch-Group/M2XFP_ASPLOS26.
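
For context, the power-of-two limitation named in the abstract can be illustrated with a minimal sketch of baseline MXFP4-style block quantization. This is not the M2XFP method and is not taken from the authors' repository. It assumes the FP4 E2M1 value grid and a shared scale derived from the block maximum, following the conversion suggested in the OCP Microscaling specification. The names `quantize_block_pow2`, `FP4_E2M1_GRID`, and `FP4_EMAX` are hypothetical, and rounding ties go to the smaller magnitude purely for simplicity.

```python
import math

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
# Exponent of the largest normal E2M1 value (6 = 1.5 * 2**2).
FP4_EMAX = 2


def quantize_block_pow2(block):
    """Quantize one block with a shared power-of-two scale.

    Returns (scale, dequantized_values). Illustrative sketch only:
    a real MX block holds 32 elements and stores the scale as E8M0.
    """
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)

    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1,
    # so floor(log2(max_abs)) == e - 1.
    _, e = math.frexp(max_abs)
    shared_exp = (e - 1) - FP4_EMAX
    scale = 2.0 ** shared_exp  # the scale can only be a power of two

    dequantized = []
    for v in block:
        # Saturate to the largest representable magnitude, then snap
        # to the nearest grid point (ties pick the smaller magnitude).
        mag = min(abs(v) / scale, FP4_E2M1_GRID[-1])
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        dequantized.append(math.copysign(q, v) * scale)
    return scale, dequantized
```

Under these assumptions, the block `[0.1, -0.7, 3.0, 5.5]` gets a scale of `1.0` and dequantizes to `[0.0, -0.5, 3.0, 6.0]`: the largest value dictates the scale, and the small value `0.1` is flushed to zero. Because the scale moves only in factors of two, it cannot be fitted more tightly to the block, which is the kind of loss the abstract says added metadata is meant to recover.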



Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

17 pages
