AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning
By: Yurun Song, Zhuoyi Yang, Ian G. Harris and more
Potential Business Impact:
Makes AI training faster while sending less data.
Large Language Models (LLMs) are scaling rapidly, creating significant challenges for collaborative server-client distributed training, particularly in terms of communication efficiency and computational overhead. To address these challenges, we implement Parameter-efficient Split Learning, which effectively balances efficiency and performance for collaborative training on low-resource devices. To reduce communication overhead in collaborative training, we introduce Adaptive Mixed-bit Activation Quantization (AMAQ), a strategy that progressively compresses activations and gradients from high precision (6 to 8 bits) to low precision (3 to 4 bits). AMAQ achieves this by allocating bit budgets across channels according to feature-wise and layer-wise importance using bit regularization. Under the same bit budgets, AMAQ outperforms fixed-precision approaches, delivering about 2.5% higher generation accuracy and about 1.3% better classification accuracy for models such as LLaMA3 8B and Qwen2.5 7B. In addition, it significantly enhances training stability and reduces ultra-low-bit representation collapse during training. Experiments demonstrate that AMAQ integrates effectively into practical multi-machine collaborative training setups, offering superior inference accuracy with only a modest communication overhead for bit adaptation during training. This trade-off makes AMAQ a practical and effective solution for collaborative training with minimal communication cost.
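The two ingredients the abstract describes, per-channel quantization at mixed bit widths and a regularizer that pushes the average bit width toward a shrinking budget, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the uniform min-max quantizer, and the simple mean-bits penalty are all illustrative assumptions standing in for AMAQ's actual importance-based allocation.

```python
import numpy as np

def quantize_per_channel(x, bits):
    """Fake-quantize activations channel-wise: each column of `x` gets its
    own bit width (illustrative uniform min-max quantization, not the
    paper's exact scheme)."""
    xq = np.empty_like(x, dtype=float)
    for c, b in enumerate(bits):
        lo, hi = x[:, c].min(), x[:, c].max()
        levels = 2 ** b - 1                       # number of quantization steps
        scale = (hi - lo) / levels if hi > lo else 1.0
        xq[:, c] = np.round((x[:, c] - lo) / scale) * scale + lo
    return xq

def bit_regularizer(bits, budget):
    """Hypothetical penalty on the mean per-channel bit width: nonzero
    whenever the average exceeds `budget`, so lowering the budget over
    training drives precision from the 6-8 bit range toward 3-4 bits."""
    return max(0.0, float(np.mean(bits)) - budget)
```

Annealing `budget` downward during training, while letting important channels retain more bits than unimportant ones under that shared budget, is the intuition behind AMAQ's progressive high-to-low-precision compression.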
Similar Papers
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
Machine Learning (CS)
Makes smart computer programs use less memory.
AxLLM: accelerator architecture for large language models with computation reuse capability
Hardware Architecture
Makes AI models run faster and use less power.
Distilling Large Language Models for Network Active Queue Management
Networking and Internet Architecture
Makes the internet faster through smarter traffic control.