Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices
By: Xiao Yan, Yi Ding
Potential Business Impact:
Lets phones understand you without internet.
Recent advancements in large language models (LLMs) have prompted interest in deploying these models on mobile devices to enable new applications without relying on cloud connectivity. However, the efficiency constraints of deploying LLMs on resource-limited devices present significant challenges. In this paper, we conduct a comprehensive measurement study to evaluate the efficiency tradeoffs between mobile-based, edge-based, and cloud-based deployments for LLM applications. We implement AutoLife-Lite, a simplified LLM-based application that analyzes smartphone sensor data to infer user location and activity contexts. Our experiments reveal that: (1) Only small LLMs (<4B parameters) can run successfully on powerful mobile devices, though they exhibit quality limitations compared to larger models; (2) Model compression is effective in lowering hardware requirements, but may lead to significant performance degradation; (3) The latency to run LLMs on mobile devices with meaningful output is significant (>30 seconds), while cloud services demonstrate better time efficiency (<10 seconds); (4) Edge deployments offer intermediate tradeoffs between latency and model capabilities, with differing results for CPU-based and GPU-based settings. These findings provide valuable insights for system designers on the current limitations and future directions for on-device LLM applications.
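To make the measurement setup concrete, below is a minimal sketch (not the paper's implementation) of an AutoLife-Lite-style pipeline: it formats smartphone sensor summaries into a prompt for an LLM that infers location and activity context, and times end-to-end inference so mobile, edge, and cloud backends can be compared on the same input. All names here (SensorWindow, build_context_prompt, measure_latency, stub_backend) are illustrative assumptions, not identifiers from the paper.

```python
# Sketch of an AutoLife-Lite-style context-inference pipeline with a
# pluggable LLM backend and wall-clock latency measurement.
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SensorWindow:
    """One window of preprocessed smartphone sensor features (assumed schema)."""
    gps_speed_mps: float    # average GPS speed over the window
    accel_magnitude: float  # mean accelerometer magnitude (m/s^2)
    wifi_ssid_count: int    # number of visible Wi-Fi networks
    hour_of_day: int        # local time, 0-23


def build_context_prompt(windows: List[SensorWindow]) -> str:
    """Serialize sensor windows into a natural-language prompt for the LLM."""
    lines = [
        f"- speed={w.gps_speed_mps:.1f} m/s, accel={w.accel_magnitude:.1f} m/s^2, "
        f"wifi_networks={w.wifi_ssid_count}, hour={w.hour_of_day}"
        for w in windows
    ]
    return (
        "Given the following smartphone sensor summaries, infer the user's "
        "most likely location type and activity in one sentence.\n"
        + "\n".join(lines)
    )


def measure_latency(generate: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Run one inference with any backend (on-device, edge, or cloud) and
    return the model output plus wall-clock latency in seconds."""
    start = time.perf_counter()
    output = generate(prompt)
    return output, time.perf_counter() - start


if __name__ == "__main__":
    windows = [
        SensorWindow(gps_speed_mps=0.2, accel_magnitude=9.9, wifi_ssid_count=12, hour_of_day=9),
        SensorWindow(gps_speed_mps=0.1, accel_magnitude=9.8, wifi_ssid_count=14, hour_of_day=10),
    ]
    prompt = build_context_prompt(windows)

    # Stub standing in for a real LLM call (e.g., an on-device runtime or an
    # HTTP request to a cloud API); swap in the deployment under test.
    def stub_backend(p: str) -> str:
        return "Likely at the office, sitting and working."

    answer, latency = measure_latency(stub_backend, prompt)
    print(f"context: {answer}\nlatency: {latency * 1000:.1f} ms")
```

Because the backend is just a callable, the same prompt and timing harness can be pointed at an on-device model, an edge server, or a cloud API, which mirrors the mobile/edge/cloud comparison the abstract describes.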
Similar Papers
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Machine Learning (CS)
Measures how fast AI runs on your phone.
MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices
Machine Learning (CS)
Makes big AI models run fast on phones.
Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge
Machine Learning (CS)
Makes smart computer programs run on phones.