UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models

Published: December 19, 2025 | arXiv ID: 2512.17385v1

By: Jiajun Wu, Jian Yang, Wei Zhang, and more

Potential Business Impact:

Teaches computers to write code without examples.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, their effectiveness relies heavily on supervised training with extensive labeled datasets (e.g., question-answer pairs) or unlabeled datasets (e.g., code snippets), which are often expensive and difficult to obtain at scale. To address this limitation, this paper introduces IPC, an unsupervised framework that leverages Internal Probing of LLMs for Code generation without any external corpus, not even unlabeled code snippets. We introduce problem space probing, test understanding probing, solution space probing, and knowledge consolidation and reinforcement to surface the internal knowledge and confidence patterns already present in LLMs. IPC then identifies reliable code candidates through self-consistency mechanisms and representation-based quality estimation, and uses them to train UCoder (a coder trained with unsupervised learning). We validate the proposed approach across multiple code benchmarks, demonstrating that unsupervised methods can achieve performance competitive with supervised approaches while significantly reducing the dependency on labeled data and computational resources. Analytic experiments reveal that internal model states contain rich signals about code quality and correctness, and that properly harnessing these signals enables effective unsupervised learning for code generation, opening new directions for training code LLMs in resource-constrained scenarios.
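The abstract names self-consistency as one mechanism for picking reliable code candidates but gives no implementation details. The sketch below is a minimal, hypothetical illustration of execution-based self-consistency: sample several candidate solutions, run each on a few probe inputs, and keep the candidates whose behavior agrees most often. The function names, the `add` example, and the probe inputs are assumptions for illustration, not the paper's actual pipeline.

```python
from collections import defaultdict

def run_candidate(code_str, fn_name, test_inputs):
    """Execute a candidate solution on probe inputs and return a hashable
    signature of its outputs, or None if it fails to run (assumed setup)."""
    namespace = {}
    try:
        exec(code_str, namespace)           # candidate is assumed to define fn_name
        fn = namespace[fn_name]
        return tuple(repr(fn(*args)) for args in test_inputs)
    except Exception:
        return None

def select_by_self_consistency(candidates, fn_name, test_inputs):
    """Cluster sampled candidates by output signature and keep the largest
    cluster, a rough proxy for self-consistency filtering of pseudo-labels."""
    clusters = defaultdict(list)
    for code in candidates:
        sig = run_candidate(code, fn_name, test_inputs)
        if sig is not None:
            clusters[sig].append(code)
    if not clusters:
        return []
    # Majority vote: candidates whose behavior agrees most often are trusted.
    best_sig = max(clusters, key=lambda s: len(clusters[s]))
    return clusters[best_sig]

# Example: two agreeing candidates outvote one inconsistent candidate.
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return b + a",
    "def add(a, b):\n    return a - b",    # behaviorally inconsistent
]
reliable = select_by_self_consistency(candidates, "add", [(1, 2), (3, 4)])
print(len(reliable))  # -> 2
```

In the paper's setting, agreement signals like this would presumably be combined with representation-based quality estimates before the selected candidates are used as training data for UCoder.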

Country of Origin
🇨🇳 China

Page Count
12 pages

Category
Computer Science:
Computation and Language