Score: 3

OpenZL: A Graph-Based Model for Compression

Published: October 3, 2025 | arXiv ID: 2510.03203v2

By: Yann Collet, Nick Terrell, W. Felix Handte, and more

BigTech Affiliations: Meta

Potential Business Impact:

Compresses data to smaller sizes at higher speeds, and shortens the development of tailored compressors from months to days.

Business Areas:
Semantic Web Internet Services

Research techniques in the last decade have improved lossless compression ratios by significantly increasing processing time. These techniques have remained obscure because production systems require high throughput and low resource utilization. In practice, application-specific compression algorithms that leverage knowledge of the data structure and semantics are more popular. Application-specific compressor systems outperform even the best generic compressors, but these techniques have some drawbacks. Application-specific compressors are inherently limited in applicability, have high development costs, and are difficult to maintain and deploy. In this work, we show that these challenges can be overcome with a new compression strategy. We propose the "graph model" of compression, a new theoretical framework for representing compression as a directed acyclic graph of modular codecs. OpenZL compresses data into a self-describing wire format, any configuration of which can be decompressed by a universal decoder. OpenZL's design enables rapid development of tailored compressors with minimal code; its universal decoder eliminates deployment lag; and its investment in a well-vetted standard component library minimizes security risks. Experimental results demonstrate that OpenZL achieves superior compression ratios and speeds compared to state-of-the-art general-purpose compressors on a variety of real-world datasets. Internal deployments at Meta have also shown consistent improvements in size and/or speed, with development timelines reduced from months to days. OpenZL thus represents a significant advance in practical, scalable, and maintainable data compression for modern data-intensive applications.
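To make the abstract's central idea concrete, the following is a minimal Python sketch of a self-describing frame with a universal decoder. It is not OpenZL's API or wire format: the codec IDs, the `CODECS` registry, the one-byte-per-codec header, and the `compress`/`decompress` helpers are all hypothetical, and the graph is reduced to its simplest case, a linear chain of codecs. The point it illustrates is that the frame records which codecs were applied, so a single decoder can undo any configuration without being told the plan in advance.

```python
import zlib

def delta_encode(data: bytes) -> bytes:
    # Replace each byte with its difference from the previous byte (mod 256).
    prev, out = 0, bytearray()
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)

def delta_decode(data: bytes) -> bytes:
    # Inverse of delta_encode: running sum mod 256.
    prev, out = 0, bytearray()
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)

# Hypothetical codec registry: id -> (encoder, decoder).
CODECS = {
    1: (delta_encode, delta_decode),
    2: (zlib.compress, zlib.decompress),
}

def compress(data: bytes, plan: list[int]) -> bytes:
    # Apply the codecs in order, then prepend a header describing the plan.
    # Toy header: one length byte, so a plan holds at most 255 codec ids.
    for codec_id in plan:
        data = CODECS[codec_id][0](data)
    return bytes([len(plan)]) + bytes(plan) + data

def decompress(frame: bytes) -> bytes:
    # Universal decoder: read the plan from the frame, undo it in reverse.
    n = frame[0]
    plan, data = frame[1:1 + n], frame[1 + n:]
    for codec_id in reversed(plan):
        data = CODECS[codec_id][1](data)
    return data

if __name__ == "__main__":
    sample = bytes(range(256)) * 4  # a byte ramp: delta turns it into a near-constant stream
    for plan in ([2], [1, 2]):
        frame = compress(sample, plan)
        assert decompress(frame) == sample
        print(plan, len(frame))
```

In this sketch, changing the plan (for example, adding the delta stage for ramp-like data) changes only the encoder side; `decompress` stays the same, which mirrors the abstract's claim that a universal decoder removes deployment lag for new compressor configurations.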

Country of Origin
🇺🇸 United States

Page Count
25 pages

Category
Computer Science:
Information Retrieval