Evaluating Large Language Models for Code Translation: Effects of Prompt Language and Prompt Design

Published: September 16, 2025 | arXiv ID: 2509.12973v1

By: Aamer Aljagthami, Mohammed Banabila, Musab Alshehri, and more

Potential Business Impact:

Helps software teams automatically translate source code from one programming language to another.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) have shown promise for automated source-code translation, a capability critical to software migration, maintenance, and interoperability. Yet comparative evidence on how model choice, prompt design, and prompt language shape translation quality across multiple programming languages remains limited. This study conducts a systematic empirical assessment of state-of-the-art LLMs for code translation among C++, Java, Python, and C#, alongside a traditional baseline (TransCoder). Using BLEU and CodeBLEU, we quantify syntactic fidelity and structural correctness under two prompt styles (concise instruction and detailed specification) and two prompt languages (English and Arabic), with direction-aware evaluation across language pairs. Experiments show that detailed prompts deliver consistent gains across models and translation directions, and that English prompts outperform Arabic prompts by 13-15%. The top-performing model attains the highest CodeBLEU on challenging pairs such as Java to C# and Python to C++, and every evaluated LLM outperforms the TransCoder baseline across the benchmark. These results demonstrate the value of careful prompt engineering and prompt-language choice, and they offer practical guidance for software modernization and cross-language interoperability.
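To make the setup concrete, the Python sketch below illustrates the two prompt styles and the BLEU scoring step using the sacrebleu library. The prompt wordings, the helper names, and the Java-to-C# example are illustrative assumptions, not the authors' exact templates; CodeBLEU additionally weights AST and data-flow matches, which plain BLEU does not capture.

import sacrebleu

def concise_prompt(src_lang, tgt_lang, code):
    # Concise instruction style: a single-sentence request (assumed wording).
    return f"Translate the following {src_lang} code to {tgt_lang}:\n{code}"

def detailed_prompt(src_lang, tgt_lang, code):
    # Detailed specification style: spells out the constraints the model
    # must respect before showing the source program (assumed wording).
    return (
        f"You are an expert software engineer. Translate the {src_lang} "
        f"program below into idiomatic {tgt_lang}. Preserve function names, "
        f"behavior, and edge-case handling, and output only code.\n{code}"
    )

def bleu(candidate, reference):
    # Corpus-level BLEU for a single pair; CodeBLEU would add AST and
    # data-flow matching on top of this n-gram overlap.
    return sacrebleu.corpus_bleu([candidate], [[reference]]).score

# Hypothetical Java -> C# pair; an exact-match translation scores 100.0.
java_src   = "int add(int a, int b) { return a + b; }"
csharp_ref = "int Add(int a, int b) { return a + b; }"
print(concise_prompt("Java", "C#", java_src))
print(bleu(csharp_ref, csharp_ref))

Real model outputs score below the exact-match ceiling, and effects such as the reported 13-15% English-versus-Arabic prompt gap are read off these metric scores aggregated over translation directions.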

Page Count
5 pages

Category
Computer Science:
Software Engineering