CodeWiki: Automated Repository-Level Documentation at Scale

Published: October 28, 2025 | arXiv ID: 2510.24428v1

By: Nguyen Hoang Anh, Minh Le-Anh, Bach Le and more

Potential Business Impact:

Helps programmers understand big code projects easily.

Business Areas:

Open Source Software

Developers spend nearly 58% of their time understanding codebases, yet maintaining comprehensive documentation remains challenging because of the complexity and manual effort involved. While recent Large Language Models (LLMs) show promise for function-level documentation, they fail at the repository level, where capturing architectural patterns and cross-module interactions is essential. We introduce CodeWiki, the first open-source framework for holistic repository-level documentation across seven programming languages. CodeWiki employs three innovations: (i) hierarchical decomposition that preserves architectural context, (ii) recursive agentic processing with dynamic delegation, and (iii) synthesis of textual and visual artifacts, including architecture diagrams and data flows. We also present CodeWikiBench, the first repository-level documentation benchmark with multi-level rubrics and agentic assessment. CodeWiki achieves a quality score of 68.79% with proprietary models and 64.80% with open-source alternatives, outperforming existing closed-source systems and demonstrating scalable, accurate documentation for real-world repositories.
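To make the idea of hierarchical decomposition with recursive delegation more concrete, here is a minimal Python sketch. It is not CodeWiki's implementation: the `Module` class, the `size` field, the `budget` threshold, and the `document` function are all hypothetical names chosen for illustration, and the rule "delegate when a subtree exceeds a size budget" is an assumption about how such a scheme might work. No LLM is called; the function only records which modules would be documented directly and which would be synthesized from their children.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical node in a repository's module tree."""
    name: str
    size: int = 0  # stand-in for the module's own source size (e.g. tokens)
    children: list["Module"] = field(default_factory=list)


def total_size(module: Module) -> int:
    """Size of a module plus everything beneath it."""
    return module.size + sum(total_size(c) for c in module.children)


def document(module: Module, budget: int, depth: int = 0) -> list[str]:
    """Return outline lines for `module`, delegating when it exceeds `budget`."""
    indent = "  " * depth
    if total_size(module) <= budget or not module.children:
        # Small enough (or a leaf): one agent would document it directly.
        return [f"{indent}{module.name}: documented directly"]
    # Too large: hand each child to a sub-agent, then summarize the parent
    # from the children's results so the architectural context is kept.
    lines = [
        f"{indent}{module.name}: synthesized from "
        f"{len(module.children)} sub-modules"
    ]
    for child in module.children:
        lines.extend(document(child, budget, depth + 1))
    return lines


# Made-up repository layout, used only to exercise the sketch.
repo = Module("repo", children=[
    Module("core", size=50, children=[
        Module("parser", size=120),
        Module("ast", size=40),
    ]),
    Module("cli", size=30),
])

outline = document(repo, budget=100)
print("\n".join(outline))
```

In this toy layout, `repo` and `core` exceed the budget and are split, while `parser`, `ast`, and `cli` are handled directly. A leaf is always documented directly even when it is over budget, since there is nothing left to delegate to.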