REMODEL-LLM: Transforming C code to Java using LLMs
By: Aryan Gupta, Y. Raghu Reddy
The automated translation of C code to Java is a notoriously difficult task, fraught with challenges stemming from fundamental differences in paradigm (procedural vs. object-oriented), memory model (manual pointer management vs. garbage collection), and incompatible data types. This paper investigates the efficacy of 19 small, quantized LLMs (under 20 billion parameters) for C-to-Java translation. We use a novel hybrid pipeline that leverages Abstract Syntax Trees (ASTs) for semantic decomposition and employs a highly constrained, rule-based prompting strategy. The results are stark: a clear multi-tiered performance divide emerged. The vast majority of models (Tier 3, e.g., llama3.1, gemma3, starcoder2) failed 100% of the tests and proved incapable of generating even basic, runnable Java boilerplate. A small middle tier (Tier 2, e.g., mistral-nemo and mistral) produced runnable code but was plagued by dangerous semantic failures and incorrect translations. Only three models (Tier 1: phi4, deepseek-coder-v2, codeqwen) proved viable, passing over 50% of the test suite. Even these top models failed on the most complex C concepts, such as function pointers, sizeof, and enum logic, revealing a hard ceiling on the reasoning capabilities of current quantized models.
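The abstract singles out function pointers, sizeof, and enum logic as the constructs even the Tier 1 models got wrong. As a hedged illustration, the sketch below shows one plausible hand-written Java rendering of a small hypothetical C snippet (given in the comments). The snippet, the class name `CConstructsSketch`, and the `Op`/`Level` enums are assumptions made for illustration only; they are not taken from the paper's test suite and are not the output of any evaluated model.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical C source being translated:
//   enum Op { ADD, SUB, MUL };                  /* implicit values 0, 1, 2 */
//   enum Level { LOW = 10, HIGH = 20 };         /* explicit values */
//   int add(int a, int b) { return a + b; }     /* likewise sub, mul */
//   int (*table[])(int, int) = { add, sub, mul };
//   int apply(enum Op op, int a, int b) { return table[op](a, b); }
//   size_t n = sizeof(table) / sizeof(table[0]); /* 3, only where table is a true array */
public class CConstructsSketch {

    // C enums are plain ints, so C code can index an array with them directly.
    // A Java enum is an object; its index must be made explicit via ordinal(),
    // which is only correct because Op's C constants are implicit and start at 0.
    enum Op { ADD, SUB, MUL }

    // When C enum constants carry explicit values, ordinal() would yield 0 and 1,
    // not 10 and 20, so the original C value has to be stored as a field.
    enum Level {
        LOW(10), HIGH(20);

        final int value;

        Level(int value) { this.value = value; }
    }

    // A C array of function pointers becomes an array of functional-interface
    // instances; IntBinaryOperator matches the int(int, int) signature without boxing.
    static final IntBinaryOperator[] TABLE = {
        (a, b) -> a + b,
        (a, b) -> a - b,
        (a, b) -> a * b
    };

    // Equivalent of C's table[op](a, b): look up the operator, then invoke it.
    static int apply(Op op, int a, int b) {
        return TABLE[op.ordinal()].applyAsInt(a, b);
    }

    // sizeof(table) / sizeof(table[0]) has no Java counterpart; arrays carry their length.
    static int tableSize() {
        return TABLE.length;
    }

    public static void main(String[] args) {
        System.out.println(apply(Op.SUB, 7, 3)); // prints 4
        System.out.println(tableSize());         // prints 3
        System.out.println(Level.HIGH.value);    // prints 20
    }
}
```

The `Level` enum shows why a literal mapping of C enums onto `ordinal()` is a semantic trap: it compiles and runs, but silently changes values whenever the C constants are not implicit and zero-based, which is exactly the kind of runnable-but-wrong output the abstract attributes to Tier 2 models.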