Score: 0

Evaluating Open-Source Large Language Models for Technical Telecom Question Answering

Published: September 26, 2025 | arXiv ID: 2509.21949v1

By: Arina Caraus , Alessio Buscemi , Sumit Kumar and more

Potential Business Impact:

Tests AI for phone network questions.

Business Areas:
Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Large Language Models (LLMs) have shown remarkable capabilities across various fields. However, their performance in technical domains such as telecommunications remains underexplored. This paper evaluates two open-source LLMs, Gemma 3 27B and DeepSeek R1 32B, on factual and reasoning-based questions derived from advanced wireless communications material. We construct a benchmark of 105 question-answer pairs and assess performance using lexical metrics, semantic similarity, and LLM-as-a-judge scoring. We also analyze consistency, judgment reliability, and hallucination through source attribution and score variance. Results show that Gemma excels in semantic fidelity and LLM-rated correctness, while DeepSeek demonstrates slightly higher lexical consistency. Additional findings highlight current limitations in telecom applications and the need for domain-adapted models to support trustworthy Artificial Intelligence (AI) assistants in engineering.

Page Count
6 pages

Category
Computer Science:
Networking and Internet Architecture