Imitation Game: Reproducing Deep Learning Bugs Leveraging an Intelligent Agent

Published: December 17, 2025 | arXiv ID: 2512.14990v1

By: Mehil B Shah, Mohammad Masudur Rahman, Foutse Khomh

Despite their wide adoption in domains such as healthcare, finance, and software engineering, Deep Learning (DL)-based applications suffer from many bugs, failures, and vulnerabilities. Reproducing these bugs is essential for resolving them, but it is extremely challenging because of the inherent nondeterminism of DL models and their tight coupling with hardware and software environments. According to recent studies, only about 3% of DL bugs can be reliably reproduced using manual approaches. To address these challenges, we present RepGen, a novel, automated, and intelligent approach for reproducing deep learning bugs. RepGen constructs a learning-enhanced context from a project, develops a comprehensive plan for bug reproduction, and employs an iterative generate-validate-refine mechanism in which an LLM generates code that reproduces the bug at hand. We evaluate RepGen on 106 real-world deep learning bugs and achieve a reproduction rate of 80.19%, a 19.81% improvement over the state-of-the-art measure. A developer study involving 27 participants shows that RepGen improves the success rate of DL bug reproduction by 23.35%, reduces the time to reproduce by 56.8%, and lowers participants' cognitive load.
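The abstract describes the generate-validate-refine mechanism only at a high level. As a rough illustration of what such a loop could look like, here is a minimal Python sketch. It is not RepGen's implementation: the function name `reproduce_bug`, the `generate` and `validate` callables, and the `max_iterations` limit are all hypothetical names introduced here, and the abstract does not say how RepGen builds its context, validates candidates, or decides when to stop.

```python
from typing import Callable, Optional, Tuple

# Hypothetical sketch of a generic generate-validate-refine loop.
# None of these names come from the paper; they are assumptions for illustration.


def reproduce_bug(
    generate: Callable[[str, str], str],
    validate: Callable[[str], Tuple[bool, str]],
    context: str,
    max_iterations: int = 5,
) -> Optional[str]:
    """Return the first candidate script that passes validation, or None.

    generate(context, feedback) -> candidate reproduction code (e.g. from an LLM).
    validate(candidate) -> (reproduced, feedback), where feedback describes
    why the candidate failed and is fed into the next generation step.
    """
    feedback = ""  # no feedback is available before the first attempt
    for _ in range(max_iterations):
        candidate = generate(context, feedback)
        reproduced, feedback = validate(candidate)
        if reproduced:
            return candidate
    return None  # iteration budget exhausted without reproducing the bug
```

In this sketch the validator's feedback is the only signal carried between iterations, which keeps the loop independent of any particular model or execution environment. A real system would plug an LLM call into `generate` and an execution or bug-matching check into `validate`.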

Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis

Machine Learning (CS)

Makes computer learning results work again.

6 May 2025 1

87%

BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis

Software Engineering

Finds computer chip mistakes much faster.

12 Jun 2025 2

86%

Reflective Paper-to-Code Reproduction Enabled by Fine-Grained Verification

Software Engineering

Helps computers copy science papers into working code.

21 Aug 2025 0
