Cost-Efficient Long Code Translation using LLMs while Leveraging Identifier Replacements
By: Manojit Chakraborty, Madhusudan Ghosh, Rishabh Gupta
Potential Business Impact:
Translates long computer code accurately and more quickly.
In software development, LLMs have been used to automate tasks such as code translation, in which source code in one programming language is translated into another while preserving its functionality. However, LLMs often struggle with long source files that do not fit into the context window, and the resulting translations are inaccurate. To address this, we propose a novel zero-shot code translation method that incorporates identifier replacement. By substituting long user-defined identifiers with generalized placeholders during translation, our method reduces token count and memory usage and lets the LLM focus on the logical structure of the code, which improves the efficiency and cost-effectiveness of long code translation. Our empirical results demonstrate that our approach preserves syntactic and hierarchical information and produces translations with fewer tokens.
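The abstract describes identifier replacement only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a simple regex-based scan: identifiers at or above a length threshold are swapped for short placeholders before the code is sent to an LLM, and the mapping is applied in reverse to the translated output. The function names `shorten_identifiers` and `restore_identifiers`, the `min_len` threshold, and the `v0`, `v1`, ... placeholder scheme are all hypothetical choices made for illustration.

```python
import keyword
import re

# Matches identifier-like words; an assumption that suits Python/C-style code.
IDENT = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def shorten_identifiers(source, min_len=12):
    """Replace identifiers of at least min_len characters with short placeholders.

    Returns the shortened source and a dict mapping original name -> placeholder.
    Limitation of this sketch: the regex also rewrites matching words inside
    string literals and comments; a real tool would use a proper tokenizer.
    """
    existing = set(IDENT.findall(source))  # used to avoid placeholder collisions
    mapping = {}
    counter = 0

    def swap(match):
        nonlocal counter
        name = match.group(0)
        if len(name) < min_len or keyword.iskeyword(name):
            return name
        if name not in mapping:
            placeholder = f"v{counter}"
            while placeholder in existing:  # skip names already used in the source
                counter += 1
                placeholder = f"v{counter}"
            mapping[name] = placeholder
            counter += 1
        return mapping[name]

    return IDENT.sub(swap, source), mapping


def restore_identifiers(translated, mapping):
    """Put the original identifiers back into the translated code."""
    if not mapping:
        return translated
    reverse = {placeholder: name for name, placeholder in mapping.items()}
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, reverse)) + r")\b")
    return pattern.sub(lambda m: reverse[m.group(0)], translated)


source = (
    "def compute_total_invoice_amount(customer_invoice_items):\n"
    "    return sum(customer_invoice_items)\n"
)
short, mapping = shorten_identifiers(source)
print(short)

# Stand-in for an LLM's translation of the shortened code (hand-written here).
translated = "function v0(v1) { return v1.reduce((a, b) => a + b, 0); }"
print(restore_identifiers(translated, mapping))
```

The saving comes from the shortened source being what the model reads and writes; the original names never enter the prompt and are reattached locally afterwards. How much this reduces the token count depends on the tokenizer and on how long the original identifiers are.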