
The Chonkers Algorithm: Content-Defined Chunking with Strict Guarantees on Size and Locality

Published: September 14, 2025 | arXiv ID: 2509.11121v1

By: Benjamin Berger

Potential Business Impact:

Makes computer files smaller and easier to update.

Business Areas:
Content Delivery Network, Content and Publishing

This paper presents the Chonkers algorithm, a novel content-defined chunking method that provides simultaneous strict guarantees on chunk size and edit locality. Unlike existing algorithms such as Rabin fingerprinting and anchor-based methods, Chonkers achieves bounded propagation of edits and precise control over chunk sizes. I describe the algorithm's layered structure, theoretical guarantees, and implementation considerations, and I introduce the Yarn datatype, a deduplicated, merge-tree-based string representation that benefits from Chonkers' strict guarantees.
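
For readers unfamiliar with content-defined chunking, the sketch below shows a conventional rolling-hash chunker of the kind the abstract contrasts Chonkers with. It is not the Chonkers algorithm, whose layered construction is given only in the paper. It uses a simple gear-style hash in place of Rabin fingerprinting, and every name and parameter in it (`chunk`, `shared_chunks`, `min_size`, `avg_bits`, `max_size`) is a hypothetical choice made for illustration.

```python
import hashlib

# Illustrative baseline only: a gear-style rolling-hash chunker, NOT Chonkers.
# GEAR maps each byte value to a fixed pseudo-random 64-bit integer.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data: bytes, min_size: int = 16, avg_bits: int = 6,
          max_size: int = 256) -> list[bytes]:
    """Split data where the low avg_bits bits of the rolling hash are zero.

    min_size and max_size clamp chunk lengths. A cut forced by max_size
    depends on position rather than content, so it can shift later
    boundaries after an edit.
    """
    boundary_mask = (1 << avg_bits) - 1
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Older bytes shift out of the low bits, so the boundary test
        # only sees the most recent avg_bits bytes.
        h = ((h << 1) + GEAR[b]) & MASK64
        length = i - start + 1
        if length < min_size:
            continue
        if (h & boundary_mask) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # final chunk may be shorter than min_size
    return chunks


def shared_chunks(a: bytes, b: bytes) -> int:
    """Count distinct chunks that appear in the chunkings of both inputs."""
    return len(set(chunk(a)) & set(chunk(b)))
```

Calling `shared_chunks(original, edited)` on two versions of a file gives a rough measure of how far an edit propagated. In this baseline the size clamps and the content-defined boundaries interact, so nothing bounds how many chunks a single edit can change; that is the gap the abstract says Chonkers closes.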

Page Count
28 pages

Category
Computer Science:
Data Structures and Algorithms