Hardware optimization on Android for inference of AI models
By: Iulius Gherasim, Carlos García Sánchez
Potential Business Impact:
Makes phone AI apps run much faster.
The pervasive integration of Artificial Intelligence models into contemporary mobile computing is notable across numerous use cases, from virtual assistants to advanced image processing. Optimizing the mobile user experience requires minimal latency and high responsiveness from deployed AI models, which raises challenges ranging from execution strategies that satisfy real-time constraints to the exploitation of heterogeneous hardware architectures. In this paper, we investigate and propose optimal execution configurations for AI models on an Android system, focusing on two critical tasks: object detection (YOLO family) and image classification (ResNet). These configurations evaluate various model quantization schemes and the utilization of on-device accelerators, specifically the GPU and NPU. Our core objective is to empirically determine the combination that achieves the best trade-off between minimal accuracy degradation and maximal inference speed-up.
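To make the kind of configuration the abstract describes concrete, the sketch below shows how a quantized model can be dispatched to the GPU or to an NPU-backed path on Android. It assumes TensorFlow Lite as the inference runtime, a float-output classification model, and a hypothetical model path; the paper does not state which framework or APIs were used.

```kotlin
// Minimal sketch (assumptions: TensorFlow Lite runtime, float-output ResNet-style
// model, 224x224 RGB input). Not the authors' actual setup.
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import org.tensorflow.lite.nnapi.NnApiDelegate
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

fun runInference(modelPath: String, useGpu: Boolean): FloatArray {
    val options = Interpreter.Options().apply {
        if (useGpu) {
            addDelegate(GpuDelegate())    // offload supported ops to the GPU
        } else {
            addDelegate(NnApiDelegate())  // route through NNAPI (NPU/DSP when available)
        }
    }
    val interpreter = Interpreter(File(modelPath), options)

    // Dummy 1x224x224x3 float input; real code would fill preprocessed camera data.
    val input = ByteBuffer.allocateDirect(1 * 224 * 224 * 3 * 4)
        .order(ByteOrder.nativeOrder())
    val output = Array(1) { FloatArray(1000) }  // 1000-class logits, ImageNet-style

    interpreter.run(input, output)              // single synchronous inference
    interpreter.close()
    return output[0]
}
```

Switching the `useGpu` flag (or swapping in a differently quantized model file) is the kind of knob such an accuracy-versus-latency evaluation would sweep over.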
Similar Papers
Breaking SafetyCore: Exploring the Risks of On-Device AI Deployment
Machine Learning (CS)
Hackers steal and break a phone's private AI.
Accelerating Mobile Inference through Fine-Grained CPU-GPU Co-Execution
Machine Learning (CS)
Lets phones run smart programs much faster.
Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s
CV and Pattern Recognition
Makes AI faster on your laptop.