Practical and Private Hybrid ML Inference with Fully Homomorphic Encryption
By: Sayan Biswas, Philippe Chartier, Akash Dhasade and more
Potential Business Impact:
Keeps secrets safe while computers do math.
In cloud-based inference services, it is critical both to protect users' sensitive data and to keep the server's model confidential. Fully homomorphic encryption (FHE) enables inference directly on encrypted inputs, but its practicality is limited by expensive bootstrapping and by inefficient polynomial approximations of non-linear activations. We introduce Safhire, a hybrid inference framework in which the server evaluates linear layers under encryption while the client evaluates the non-linearities in plaintext. This design eliminates bootstrapping, supports exact activations, and significantly reduces computation. Because the client sees intermediate outputs, Safhire protects model confidentiality with randomized shuffling, which obfuscates the intermediate values and makes reconstructing the model practically infeasible. To further reduce latency, Safhire incorporates optimizations such as fast ciphertext packing and partial extraction. Evaluations on multiple standard models and datasets show that Safhire achieves 1.5x to 10.5x lower inference latency than Orion, a state-of-the-art baseline, with manageable communication overhead and comparable accuracy, establishing the practicality of hybrid FHE inference.
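The abstract does not spell out how the shuffling works, so the following is only a minimal sketch of why a hidden-unit permutation can coexist with client-side activations. Assumptions, none of which come from the paper: the FHE layers are replaced by plaintext arithmetic, the "randomized shuffling" is modeled as a secret permutation of one hidden layer's units, and the server folds that permutation into its own weights. The helper names `matvec`, `relu`, `shuffle_layers`, and `hybrid_forward` are hypothetical. The property being illustrated is that an element-wise activation commutes with any permutation, so permuting the rows of the first layer and the matching columns of the second leaves the network output unchanged while the client only ever sees shuffled intermediate values.

```python
import random


def matvec(W, x):
    """Plain matrix-vector product; a stand-in for the server's encrypted linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    """Exact element-wise non-linearity, evaluated by the client in plaintext."""
    return [max(0.0, a) for a in v]


def shuffle_layers(W1, b1, W2, perm):
    """Hypothetical shuffling: permute hidden units with a secret permutation.

    Rows of W1 and entries of b1 are reordered by perm, and the columns of W2 are
    reordered the same way. Since relu(P z) == P relu(z) for any permutation P,
    the reordered W2 undoes the shuffle and the final output is unchanged.
    """
    W1p = [W1[i] for i in perm]
    b1p = [b1[i] for i in perm]
    W2p = [[row[i] for i in perm] for row in W2]
    return W1p, b1p, W2p


def hybrid_forward(x, W1, b1, W2):
    """Server linear layer -> client activation -> server linear layer."""
    z = [a + b for a, b in zip(matvec(W1, x), b1)]  # server side (would run under FHE)
    h = relu(z)                                      # client side: sees only z
    return matvec(W2, h)                             # server side (would run under FHE)


if __name__ == "__main__":
    rng = random.Random(0)
    W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
    b1 = [rng.uniform(-1, 1) for _ in range(4)]
    W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
    x = [0.5, -1.0, 2.0]

    perm = list(range(4))
    rng.shuffle(perm)  # secret permutation, known only to the server
    W1p, b1p, W2p = shuffle_layers(W1, b1, W2, perm)

    y_plain = hybrid_forward(x, W1, b1, W2)
    y_shuffled = hybrid_forward(x, W1p, b1p, W2p)
    print(y_plain)
    print(y_shuffled)  # equal to y_plain up to floating-point rounding
```

Folding the permutation into the weights means the server never has to "unshuffle" a ciphertext explicitly; in a real FHE deployment the order of ciphertext slots, the re-encryption step, and any additional masking would all matter, and this sketch models none of them.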
Similar Papers
Network and Compiler Optimizations for Efficient Linear Algebra Kernels in Private Transformer Inference
Cryptography and Security
Keeps your private AI chats secret from others.
A Scalable Multi-GPU Framework for Encrypted Large-Model Inference
Cryptography and Security
Lets AI learn secrets without seeing them.
Design and Optimization of Cloud Native Homomorphic Encryption Workflows for Privacy-Preserving ML Inference
Cryptography and Security
Keeps your private data safe during computer learning.