Score: 2

CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation

Published: July 26, 2025 | arXiv ID: 2507.19904v1

By: Zhanhang Xiong, Dongxia Wang, Yuekang Li, and more

Potential Business Impact:

Helps LLMs generate code that lets components written in different programming languages work together.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

As large language models (LLMs) become increasingly embedded in software engineering workflows, a critical capability remains underexplored: generating correct code that enables cross-programming-language (CPL) interoperability. This skill is essential for building complex systems that integrate components written in multiple languages via mechanisms like inter-process communication (IPC). To bridge this gap, we present CrossPL, the first benchmark designed to systematically evaluate LLMs' ability to generate CPL-interoperating code. CrossPL comprises 1,982 tasks centered on IPC, covering six widely-used programming languages and seven representative CPL techniques. We construct this benchmark by (i) analyzing 19,169 multi-language GitHub repositories using 156 hand-crafted finite state machines (FSMs), and (ii) developing an LLM-based pipeline that automatically extracts CPL code snippets, generates task instructions, and validates functional correctness. We evaluate 14 state-of-the-art general-purpose LLMs and 6 code-oriented LLMs released in the past three years on CrossPL via FSM-based validation. Results reveal that even the best-performing models struggle with CPL scenarios, underscoring the need for more targeted research in this space. Our benchmark and code are available at: https://anonymous.4open.science/r/crosspl-2814.
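To make the CPL/IPC setting concrete, here is a minimal sketch of the kind of task the benchmark targets: a Python parent process exchanging JSON with a Node.js child over stdin/stdout. This assumes a local Node.js install; the JSON-over-stdio protocol, the helper name `double_via_node`, and the payload are illustrative, not drawn from CrossPL itself.

```python
import json
import subprocess

# JavaScript child: reads one JSON array from stdin, doubles each number,
# and writes the JSON result to stdout. (Illustrative, not from the paper.)
JS_CHILD = r"""
process.stdin.setEncoding("utf8");
let input = "";
process.stdin.on("data", chunk => input += chunk);
process.stdin.on("end", () => {
    const nums = JSON.parse(input);
    process.stdout.write(JSON.stringify(nums.map(n => n * 2)));
});
"""

def double_via_node(numbers):
    """Send a list of numbers to a Node.js child process and read the reply."""
    proc = subprocess.run(
        ["node", "-e", JS_CHILD],   # requires Node.js on PATH (an assumption)
        input=json.dumps(numbers),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

if __name__ == "__main__":
    print(double_via_node([1, 2, 3]))  # -> [2, 4, 6]
```

A CrossPL-style task would ask an LLM to produce both sides of such an exchange and would score whether the pieces actually interoperate, not just whether each compiles in isolation.

The FSM-based validation can likewise be sketched: a small state machine that accepts a candidate solution only if the events extracted from it (spawn child, write stdin, read stdout) occur in the required order. The paper's 156 FSMs are hand-crafted per CPL technique; the states and event names below are invented purely for illustration.

```python
# Hypothetical validation FSM: a solution passes only if it spawns the child,
# writes to it, and reads the reply, in that order. (States/events invented.)
TRANSITIONS = {
    ("start", "spawn_child"): "spawned",
    ("spawned", "write_stdin"): "sent",
    ("sent", "read_stdout"): "accepted",
}

def fsm_accepts(events, start="start", accept="accepted"):
    """Return True if the event sequence drives the FSM to its accepting state."""
    state = start
    for event in events:
        # Unmatched events leave the state unchanged (a design choice here).
        state = TRANSITIONS.get((state, event), state)
    return state == accept

if __name__ == "__main__":
    print(fsm_accepts(["spawn_child", "write_stdin", "read_stdout"]))  # True
    print(fsm_accepts(["write_stdin", "spawn_child"]))                 # False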
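```

Checking ordered event sequences like this is cheaper than executing every candidate across multiple toolchains, which is presumably why the authors pair FSM matching with functional validation rather than relying on execution alone.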

Country of Origin
🇨🇳 🇦🇺 China, Australia

Page Count
12 pages

Category
Computer Science: Software Engineering