
TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone

Published: November 17, 2025 | arXiv ID: 2511.13717v1

By: Xunjie Wang, Jiacheng Shi, Zihan Zhao, and others

Potential Business Impact:

Keeps smartphone AI secrets safe from hackers.

Business Areas:

Tizen Platforms

Technical Abstract:

Large Language Models (LLMs) deployed on mobile devices offer benefits such as user privacy and reduced network latency. However, they introduce a significant security risk: proprietary models can leak to end users. To mitigate this risk, we propose a system design that protects on-device LLMs using Arm TrustZone, Arm's Trusted Execution Environment (TEE). Our system addresses two primary challenges: (1) the trade-off between memory efficiency and fast inference that arises when caching model parameters in TEE memory, and (2) the lack of efficient and secure Neural Processing Unit (NPU) time-sharing between the Rich Execution Environment (REE) and the TEE. Our approach incorporates two key innovations. First, we employ pipelined restoration. It exploits the deterministic memory access patterns of LLM inference to prefetch parameters as they are needed, hiding memory allocation, I/O, and decryption latency behind computation. Second, we introduce a co-driver design: a minimal data-plane NPU driver in the TEE that collaborates with the full-fledged REE driver. This reduces the size of the TEE's trusted computing base (TCB) and eliminates control-plane reinitialization overhead when the NPU switches between worlds. We implemented our system on the emerging OpenHarmony OS and the llama.cpp inference framework. We evaluated it with various LLMs on a Rockchip Arm device. Compared to a strawman TEE baseline lacking our optimizations, our system reduces time-to-first-token (TTFT) by up to 90.9% and increases decoding speed by up to 23.2%.
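The pipelined-restoration idea can be illustrated as a bounded producer/consumer pipeline. This is a minimal sketch under stated assumptions, not TZ-LLM's implementation. A loader thread restores (here: "decrypts") layer parameters in the fixed order inference will consume them, and the inference loop computes each layer as soon as its parameters arrive. The bounded queue caps how many decrypted layers sit in memory ahead of the computation.

Everything here is a hypothetical stand-in:

- The toy XOR "cipher" replaces real decryption and is not secure.
- `restorer`, `pipelined_inference`, `sequential_inference`, and `window` are illustrative names, not APIs from TZ-LLM, OpenHarmony, or llama.cpp.
- A Python thread replaces the TEE-side loader.

```python
import queue
import threading
from typing import Callable, List

# ASSUMPTION: a toy XOR "cipher" stands in for real parameter decryption.
# It is NOT secure; it only keeps this sketch self-contained.
KEY = 0x5A


def encrypt(data: bytes) -> bytes:
    """Placeholder cipher: XOR every byte with KEY."""
    return bytes(b ^ KEY for b in data)


decrypt = encrypt  # XOR with the same key is its own inverse


def restorer(encrypted_layers: List[bytes], out: queue.Queue) -> None:
    """Hypothetical producer: restore layers in the order inference uses them.

    Because the layer order is deterministic, the producer simply walks it.
    out.put() blocks while the queue is full, which bounds how many decrypted
    layers are buffered ahead of the consumer.
    """
    for idx, blob in enumerate(encrypted_layers):
        plaintext = decrypt(blob)  # stands in for allocation + I/O + decryption
        out.put((idx, plaintext))


def pipelined_inference(
    encrypted_layers: List[bytes],
    compute: Callable[[int, bytes], int],
    window: int = 2,
) -> int:
    """Consumer: run each layer as soon as its parameters have been restored."""
    if window < 1:
        raise ValueError("window must be >= 1 (maxsize=0 would be unbounded)")
    q: queue.Queue = queue.Queue(maxsize=window)
    loader = threading.Thread(target=restorer, args=(encrypted_layers, q), daemon=True)
    loader.start()
    state = 0
    for expected in range(len(encrypted_layers)):
        idx, params = q.get()  # waits only if restoration falls behind compute
        assert idx == expected, "layers must arrive in inference order"
        state = compute(state, params)
    loader.join()
    return state


def sequential_inference(
    encrypted_layers: List[bytes], compute: Callable[[int, bytes], int]
) -> int:
    """Baseline: restore and compute strictly one after the other."""
    state = 0
    for blob in encrypted_layers:
        state = compute(state, decrypt(blob))
    return state


if __name__ == "__main__":
    layers = [encrypt(bytes([i] * 16)) for i in range(8)]

    def mix(state: int, params: bytes) -> int:
        # Order-sensitive toy "layer": a reordering bug would change the result.
        return (state * 31 + sum(params)) % 1_000_003

    print(pipelined_inference(layers, mix) == sequential_inference(layers, mix))  # True
```

The sketch demonstrates ordering and bounded buffering, not real speedup. CPU-bound Python threads are limited by the GIL, and real restoration would run inside the TEE. As a rough model, let r be the per-layer restoration time, c the per-layer compute time, and L the number of layers:

- The sequential baseline takes L(r + c).
- An ideal two-stage pipeline takes about r + (L - 1)·max(r, c) + c.

So when r ≤ c, restoration hides almost entirely behind computation.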

Related Papers:

Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs

Performance | 23 Sep 2025

Keeps private AI information safe during use.

SecureInfer: Heterogeneous TEE-GPU Architecture for Privacy-Critical Tensors for Large Language Model Deployment

Cryptography and Security | 22 Oct 2025

Keeps AI private on phones, still fast.

An Early Experience with Confidential Computing Architecture for On-Device Model Protection

Cryptography and Security | 11 Apr 2025

Keeps phone AI private and fast.


Page Count: 18 pages
