Privacy-Preserving Inference for Quantized BERT Models

Published: August 3, 2025 | arXiv ID: 2508.01636v1

By: Tianpei Lu, Bingsheng Zhang, Lekun Peng, and others

Potential Business Impact:

Keeps your private data safe during AI use.

With the increasing deployment of generative machine learning models in privacy-sensitive domains such as healthcare and personalized services, ensuring secure inference has become a critical challenge. Secure multi-party computation (MPC) enables privacy-preserving model inference but suffers from high communication and computation overhead. The main bottleneck is the expensive secure evaluation of floating-point operations. Quantization offers a promising solution by converting floating-point operations into lower-precision integer computations, significantly reducing this overhead. However, existing MPC-based quantized inference methods either rely on public quantization parameters, which poses privacy risks, or are inefficient, particularly when handling nonlinear functions such as activations and softmax. In this work, we propose a fine-grained, layer-wise quantization scheme and support fully connected layers with 1-bit weights in a secure setting. We design a multi-input lookup table protocol to evaluate softmax efficiently and securely. Furthermore, we use dual secret sharing schemes and perform precision conversions via lookup tables, eliminating truncation overhead entirely. Experimental evaluation on BERT-base models shows that our approach achieves up to 8× speedup compared to Lu et al. (NDSS 25), 9× speedup compared to Gupta et al. (PETS 24), and 22× speedup compared to Knott et al. (NeurIPS 21).
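The abstract does not spell out the quantization arithmetic, so the following is only a plaintext sketch of the general idea, not the authors' scheme. It maps a layer's activations to signed 8-bit integers with a single scale per tensor, and it models a fully connected layer with 1-bit weights by replacing each weight with its sign times a per-row scale `alpha`. The helper names, the 8-bit width, and the mean-absolute-value choice for `alpha` are assumptions made for illustration. In the secure setting these integer operations would be carried out on secret shares rather than on cleartext arrays.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Hypothetical helper: symmetric per-tensor quantization, x ≈ scale * q."""
    qmax = 2 ** (num_bits - 1) - 1                  # 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale for the whole tensor (layer)
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def binarize_weights(w):
    """Hypothetical helper: 1-bit weights, w[i, j] ≈ alpha[i] * b[i, j] with b in {-1, +1}."""
    alpha = np.mean(np.abs(w), axis=1)              # one real-valued scale per output neuron
    b = np.where(w >= 0, 1, -1).astype(np.int32)    # the 1-bit weight matrix
    return b, alpha

def binary_fc(q_x, x_scale, b, alpha):
    """Fully connected layer with 1-bit weights applied to quantized inputs."""
    acc = b @ q_x                                   # integer accumulation: adds/subtracts of q_x only
    return acc * alpha * x_scale                    # rescale once per output neuron

# Usage: compare the quantized layer with its floating-point counterpart.
rng = np.random.default_rng(0)
x = rng.normal(size=16)
w = rng.normal(size=(4, 16))
q_x, s = quantize_symmetric(x)
b, alpha = binarize_weights(w)
print(binary_fc(q_x, s, b, alpha))                  # integer-path result
print((alpha[:, None] * b) @ x)                     # same binarized weights, float inputs
```

Because every entry of `b` is ±1, the plaintext accumulation `b @ q_x` reduces to additions and subtractions of the 8-bit inputs, and the real-valued scales are applied only once per output.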
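The abstract does not detail its lookup-table protocols. As a hedged illustration of the general building block only, the sketch below evaluates a public table `T` at an additively secret-shared index using the well-known dealer-assisted "rotated table" technique. A dealer samples a random shift `r`, hands out shares of `r` and of the table rotated by `r`, and the parties open only `x + r`. Because `r` is uniform, that value reveals nothing about `x`. This is a single-input, two-party sketch with a trusted dealer, not the paper's multi-input protocol. All function names, the table size, and the fixed-point `exp` table (a stand-in for the exponentials inside softmax) are assumptions.

```python
import math
import secrets

def share(value, modulus):
    """Split value into two additive shares modulo `modulus`."""
    r = secrets.randbelow(modulus)
    return r, (value - r) % modulus

def reconstruct(s0, s1, modulus):
    return (s0 + s1) % modulus

def dealer_setup(table, out_modulus):
    """Offline phase (hypothetical trusted dealer): produce one fresh key pair per lookup."""
    n = len(table)
    r = secrets.randbelow(n)                            # random rotation, never revealed
    r0, r1 = share(r, n)
    rotated = [table[(j - r) % n] for j in range(n)]    # rotated[j] = table[j - r]
    pairs = [share(v % out_modulus, out_modulus) for v in rotated]
    key0 = (r0, [p[0] for p in pairs])                  # party 0: share of r and of each entry
    key1 = (r1, [p[1] for p in pairs])                  # party 1: the complementary shares
    return key0, key1

def lut_eval(x0, x1, key0, key1, n):
    """Online phase, simulated in one process: each party reveals x_i + r_i mod n."""
    (r0, t0), (r1, t1) = key0, key1
    c = ((x0 + r0) % n + (x1 + r1) % n) % n             # c = x + r, uniformly random
    return t0[c], t1[c]                                 # shares of rotated[c] = table[x]

def exp_table(n, frac_bits=8, step=1 / 16):
    """Example table: entry i ≈ exp(-i * step) in fixed point with frac_bits fractional bits."""
    return [round((1 << frac_bits) * math.exp(-i * step)) for i in range(n)]

# Usage: look up exp(-x / 16) for a secret-shared 8-bit index x.
N, M = 256, 2 ** 32                                     # index ring Z_N, output ring Z_M
T = exp_table(N)
key0, key1 = dealer_setup(T, M)
x0, x1 = share(37, N)
y0, y1 = lut_eval(x0, x1, key0, key1, N)
assert reconstruct(y0, y1, M) == T[37]
```

Reusing a key across lookups would leak the difference between the two indices, so `dealer_setup` is meant to be called once per lookup. The same routine can also express a precision conversion of the kind the abstract mentions: with `N = 256` and `M = 2**32`, the table `[i if i < 128 else i - 256 for i in range(256)]` turns 8-bit shares of a signed value into 32-bit shares of its sign-extended encoding without a local truncation step. Whether this resembles the paper's dual-secret-sharing conversion is an assumption.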

Similar Papers

Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference

Machine Learning (CS)

Makes AI models more private by using less detail.

17 Dec 2025

Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges

Machine Learning (CS)

Makes AI safer for important jobs.

27 Nov 2025

Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs

Machine Learning (CS)

Makes AI models run faster and smaller.

19 May 2025

Country of Origin

🇨🇳 China

Page Count

8 pages
