TinyLLM: Evaluation and Optimization of Small Language Models for Agentic Tasks on Edge Devices
By: Mohd Ariful Haque, Fahad Rahman, Kishor Datta Gupta, and more
Potential Business Impact:
Makes smart AI work on phones without internet.
This paper investigates the effectiveness of small language models (SLMs) for agentic tasks (function/tool/API calling), with a focus on running agents on edge devices without reliance on cloud infrastructure. We evaluate SLMs using the Berkeley Function Calling Leaderboard (BFCL) framework and describe parameter-driven optimization strategies, including supervised fine-tuning (SFT), parameter-efficient fine-tuning (PEFT), reinforcement learning (RL)-based optimization, preference alignment via Direct Preference Optimization (DPO), and hybrid methods. We report results for models including TinyAgent, TinyLlama, Qwen, and xLAM across the BFCL categories (simple, multiple, parallel, parallel-multiple, and relevance detection), in both live and non-live settings, and in multi-turn evaluations. We additionally detail a DPO training pipeline constructed from AgentBank data (e.g., ALFRED), including our conversion of SFT data into chosen-rejected pairs, using TinyLlama responses as rejected outputs, followed by manual validation. Our results demonstrate clear accuracy differences across model scales: medium-sized models (1-3B parameters) significantly outperform ultra-compact models (<1B parameters), achieving up to 65.74% overall accuracy and 55.62% multi-turn accuracy with hybrid optimization. This study highlights the importance of hybrid optimization strategies, which enable small language models to deliver accurate, efficient, and stable agentic AI on edge devices, making privacy-preserving, low-latency autonomous agents practical beyond the cloud.
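The abstract's conversion of SFT data into chosen-rejected preference pairs can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes JSONL records with `prompt` and `response` fields, and the file names (`alfred_sft.jsonl`, `dpo_pairs.jsonl`) and helper names are hypothetical. The idea, per the abstract, is that the ground-truth SFT response becomes the "chosen" output and a baseline TinyLlama generation becomes the "rejected" output, with manual validation afterwards.

```python
# Hypothetical sketch of SFT -> DPO pair conversion (names and file layout assumed).
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # baseline used for rejected outputs

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def tinyllama_generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Sample a baseline response to serve as the 'rejected' side of the pair."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

def to_preference_pairs(sft_records):
    """Map SFT (prompt, gold response) records to DPO prompt/chosen/rejected triples."""
    pairs = []
    for rec in sft_records:
        rejected = tinyllama_generate(rec["prompt"])
        if rejected.strip() == rec["response"].strip():
            continue  # no preference signal; the paper also applies manual validation
        pairs.append({
            "prompt": rec["prompt"],
            "chosen": rec["response"],  # ground-truth SFT output
            "rejected": rejected,       # baseline TinyLlama output
        })
    return pairs

if __name__ == "__main__":
    with open("alfred_sft.jsonl") as f:  # hypothetical AgentBank/ALFRED export
        records = [json.loads(line) for line in f]
    with open("dpo_pairs.jsonl", "w") as f:
        for pair in to_preference_pairs(records):
            f.write(json.dumps(pair) + "\n")
```

Pairs in this prompt/chosen/rejected layout match what common preference-alignment trainers, such as TRL's `DPOTrainer`, expect as input.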
Similar Papers
Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling
Artificial Intelligence
Lets small computers understand and use commands.
Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges
Distributed, Parallel, and Cluster Computing
Makes smart gadgets run AI without internet.
Small Language Models are the Future of Agentic AI
Artificial Intelligence
Smaller AI models can do many jobs cheaper.