Time Travel: LLM-Assisted Semantic Behavior Localization with Git Bisect
By: Yujing Wang, Weize Hong
Potential Business Impact:
Finds software bugs faster, even when tests are tricky.
We present a novel framework that integrates Large Language Models (LLMs) into the Git bisect process for semantic fault localization. Traditional bisect assumes deterministic predicates and binary failure states. These assumptions are often violated in modern software development because of flaky tests, non-monotonic regressions, and semantic divergence from upstream repositories. Our system augments bisect traversal with structured chain-of-thought reasoning, enabling commit-by-commit analysis under noisy conditions. We evaluate multiple open-source and proprietary LLMs for their suitability and fine-tune DeepSeek-Coder-V2 using QLoRA on a curated dataset of semantically labeled diffs. We adopt a weak-supervision workflow to reduce annotation overhead, incorporating human-in-the-loop corrections and self-consistency filtering. Experiments across multiple open-source projects show a 6.4-point absolute gain in success rate (from 74.2% to 80.6%). This leads to significantly fewer failed traversals and, in our experiments, up to a 2x reduction in average bisect time. We conclude with a discussion of temporal reasoning, prompt design, and fine-tuning strategies tailored to commit-level behavior analysis.
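The abstract does not spell out the traversal algorithm. The following is a minimal sketch of one way a bisect loop can tolerate noisy verdicts. It assumes the LLM is wrapped as a hypothetical `judge(commit, seed)` callable that returns one sampled verdict per call. The sketch takes a majority vote over several samples, which is a simple form of self-consistency, and treats low-agreement commits like `git bisect skip`. The names `judge`, `majority_verdict`, and `noisy_bisect`, the sample count `k=5`, the 0.6 agreement threshold, and the toy commit history are all illustrative assumptions, not details from the paper.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

# Verdicts one sampled (hypothetical) LLM judgment can return for a commit.
GOOD, BAD, SKIP = "good", "bad", "skip"


def majority_verdict(samples: Sequence[str], min_agreement: float = 0.6) -> str:
    """Self-consistency filter: keep the most common verdict only if enough
    samples agree; otherwise treat the commit as inconclusive (SKIP)."""
    label, count = Counter(samples).most_common(1)[0]
    if label == SKIP or count / len(samples) < min_agreement:
        return SKIP
    return label


def noisy_bisect(commits: List[str],
                 judge: Callable[[str, int], str],
                 k: int = 5) -> Optional[str]:
    """Return the first BAD commit, or None if only inconclusive commits remain.

    commits is ordered oldest to newest. commits[0] is assumed good and
    commits[-1] is assumed bad, as with `git bisect start <bad> <good>`.
    judge(commit, seed) stands in for one sampled LLM verdict.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    skipped = set()
    while hi - lo > 1:
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            return None  # first bad commit lies among skipped ones: ambiguous
        mid = (lo + hi) // 2
        # Test the not-yet-skipped commit nearest the midpoint.
        i = min(candidates, key=lambda j: abs(j - mid))
        verdict = majority_verdict([judge(commits[i], s) for s in range(k)])
        if verdict == GOOD:
            lo = i
        elif verdict == BAD:
            hi = i
        else:
            skipped.add(i)  # inconclusive: never move a bound on a weak verdict
    return commits[hi]


if __name__ == "__main__":
    commits = [f"c{i}" for i in range(10)]  # hypothetical linear history

    def flaky_judge(commit: str, seed: int) -> str:
        """Toy judge: the regression enters at c6, c4 cannot be evaluated,
        and the sample with seed 0 always returns the wrong verdict."""
        idx = int(commit[1:])
        if idx == 4:
            return SKIP
        truth = BAD if idx >= 6 else GOOD
        if seed == 0:
            return GOOD if truth == BAD else BAD
        return truth

    print(noisy_bisect(commits, flaky_judge))  # prints c6
```

In this sketch, an inconclusive commit is never used to move a bound. A single wrong verdict cannot send the search into the wrong half of the history. The cost is that the search may end ambiguously when every remaining commit has been skipped. Whether the paper uses this voting rule, threshold, or skip policy is not stated in the abstract.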