Bootstrapping Code Translation with Weighted Multilanguage Exploration
By: Yuhan Wu, Huan Zhang, Wei Cheng, and more
Potential Business Impact:
Translates computer code between languages automatically.
Code translation across multiple programming languages is essential yet challenging because of two key obstacles: the scarcity of parallel data paired with executable test oracles, and optimization imbalance when training on diverse language pairs. We propose BootTrans, a bootstrapping method that addresses both. Its key idea is to exploit the functional invariance and cross-lingual portability of test suites: abundant unit tests written for a pivot language are adapted to serve as universal verification oracles for multilingual reinforcement learning (RL) training. The method introduces a dual-pool architecture, with a seed pool and an exploration pool, that progressively expands the training data through execution-guided experience collection. In addition, a language-aware weighting mechanism dynamically prioritizes harder translation directions based on their performance relative to sibling languages, mitigating optimization imbalance. Extensive experiments on the HumanEval-X and TransCoder-Test benchmarks show substantial improvements over baseline LLMs across all translation directions, and ablations confirm the effectiveness of both the bootstrapping and the weighting components.
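The dual-pool idea can be sketched as a small data structure. The assumptions are ours, not the paper's: the seed pool holds translation tasks built from pivot-language programs, a candidate translation counts as verified when it passes the pivot-language tests adapted to the target language (the `passed` flag stands in for a sandboxed test run), and each verified program joins the exploration pool as a new source for translating into the remaining languages. `Task`, `DualPool`, `collect`, `sample`, and `explore_ratio` are hypothetical names.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """Hypothetical translation task: translate `code` from `src` into `tgt`.

    `problem_id` links the task to the pivot-language unit tests (assumed)."""
    problem_id: str
    src: str
    code: str
    tgt: str


@dataclass
class DualPool:
    """Hypothetical dual-pool sketch: a fixed seed pool plus a growing exploration pool."""
    seed: list                                        # tasks built from pivot-language programs
    exploration: list = field(default_factory=list)   # tasks bootstrapped from verified outputs
    languages: tuple = ("python", "java", "cpp", "go")
    verified: set = field(default_factory=set)        # (problem_id, lang) pairs already accepted

    def sample(self, k, explore_ratio=0.5, rng=None):
        """Draw k tasks, taking up to k * explore_ratio from the exploration pool."""
        if rng is None:
            rng = random.Random()
        n_explore = min(len(self.exploration), int(k * explore_ratio))
        batch = rng.sample(self.exploration, n_explore)      # without replacement
        batch += rng.choices(self.seed, k=k - n_explore)     # fill the rest from the seed pool
        return batch

    def collect(self, task, candidate, passed):
        """Execution-guided collection: keep `candidate` only if it passed the
        adapted tests (`passed` would come from a sandboxed test run).

        Returns the number of new tasks added to the exploration pool."""
        key = (task.problem_id, task.tgt)
        if not passed or key in self.verified:
            return 0
        self.verified.add(key)
        # The verified program becomes a new source for every other language.
        new_tasks = [Task(task.problem_id, task.tgt, candidate, lang)
                     for lang in self.languages if lang != task.tgt]
        self.exploration.extend(new_tasks)
        return len(new_tasks)
```

Because only execution-verified programs enter the exploration pool, the set of source languages can grow beyond the pivot language without human-written parallel data; in this sketch, `explore_ratio` controls how much each batch relies on bootstrapped tasks versus the original seed tasks.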
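The abstract does not spell out the weighting formula. As a minimal sketch, assuming that "sibling languages" means the other target languages translated from the same source, and that a direction's difficulty is measured by its pass rate on the adapted unit tests, a softmax over pass-rate gaps could look like the following. The function name `language_weights`, the `temperature` parameter, and the softmax form are illustrative assumptions, not the paper's method.

```python
import math
from collections import defaultdict


def language_weights(pass_rates, temperature=0.1):
    """Hypothetical sketch of language-aware weighting.

    pass_rates: dict mapping (source, target) -> pass rate in [0, 1].
    Each direction is weighted by how far its pass rate falls below the mean
    of its sibling directions (same source language). Within each source
    group the weights average to 1.0, so the overall loss scale is unchanged.
    """
    groups = defaultdict(list)
    for src, tgt in pass_rates:
        groups[src].append(tgt)

    weights = {}
    for src, tgts in groups.items():
        rates = [pass_rates[(src, t)] for t in tgts]
        mean_rate = sum(rates) / len(rates)
        # A positive gap means this direction is harder than its siblings on average.
        scores = [math.exp((mean_rate - r) / temperature) for r in rates]
        total = sum(scores)
        for t, s in zip(tgts, scores):
            # Softmax over siblings, rescaled so the group's mean weight is 1.
            weights[(src, t)] = len(tgts) * s / total
    return weights


rates = {("python", "cpp"): 0.80, ("python", "java"): 0.70, ("python", "go"): 0.50}
weights = language_weights(rates)
```

In an RL loop, each sampled translation's reward or advantage could be multiplied by the weight of its direction, so that directions lagging behind their siblings contribute more to the update; in the example, Python-to-Go receives the largest weight. A lower `temperature` sharpens the emphasis on the hardest direction, while a higher one moves all weights toward 1.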
Similar Papers
Cross-Lingual Stability and Bias in Instruction-Tuned Language Models for Humanitarian NLP
Computation and Language
Helps find human rights abuses in any language.
I Can't Share Code, but I need Translation -- An Empirical Study on Code Translation through Federated LLM
Software Engineering
Helps computers translate code without sharing secrets.
XDoGE: Multilingual Data Reweighting to Enhance Language Inclusivity in LLMs
Computation and Language
Helps computers understand many languages better.