vs xz / LZMA

xz (using LZMA2) optimizes for ratio above all else. It’s slow — both to compress and, notably, to decompress — but on text-like data it historically achieved among the best ratios of any widely deployed byte compressor. It’s the closest competitor for “best ratio on a general-purpose corpus.”

  • xz: LZMA2. Large sliding window (up to 4 GB), range coding, heavy context modeling.
  • Bindu: symbolic pipeline; coordinates and deltas rather than byte references.

From the industry benchmark:

Aggregate                            xz       Bindu
All files compressed (% reduction)   76.15%   77.95%
Per-file wins                        7/30     19/30
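
The aggregate row is stated as a percentage reduction, while the per-workload figures further down are stated as ratios (×). As a rough aid for comparing the two scales, the sketch below converts between them. It assumes "% reduction" means 1 − compressed/original, which the benchmark does not spell out here; the function names are illustrative only.

```python
def reduction_to_ratio(reduction_pct: float) -> float:
    """Convert a percentage size reduction into a compression ratio (x)."""
    remaining = 1.0 - reduction_pct / 100.0  # fraction of the original size left
    return 1.0 / remaining


def ratio_to_reduction(ratio: float) -> float:
    """Convert a compression ratio (x) into a percentage size reduction."""
    return (1.0 - 1.0 / ratio) * 100.0


# Aggregate figures from the table above, under the assumed definition.
print(f"xz:    {reduction_to_ratio(76.15):.2f}x")   # 4.19x
print(f"Bindu: {reduction_to_ratio(77.95):.2f}x")   # 4.54x
```

On that reading, the 1.8-point gap in percentage reduction corresponds to roughly an 8% smaller aggregate output.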

xz is the strongest competitor on this corpus, winning more files than any other codec we measured against Bindu. Bindu still wins more files overall (19 of 30 versus 7), but xz’s wins are decisive on a handful of file types: x86 binaries, certain hyperspectral cubes, and AIS records.

Selected workloads:

Workload                             xz -9e    Bindu
Hutter enwik9 (1 GB Wikipedia)       4.69×     5.43×
Silesia webster                      4.95×     5.75×
Silesia nci                          23.15×    24.79×
GOES-16 weather telemetry            17.79×    21.98×
OMNI solar wind timestamps           3.70×     2,349×
Silesia mozilla (x86 binary)         3.83×     2.88×
Silesia samba (source)               5.78×     5.03×
Silesia sao (sparse astronomical)    1.64×     1.44×

The headline: on telemetry, sequential, and structured data Bindu pulls ahead, often substantially. On x86 binaries and a few specific encodings xz holds the edge.
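
To give some intuition for why sequential data can favor a delta-style representation over byte-level matching, here is a minimal sketch. It is not Bindu's pipeline, which is only described above as using "coordinates and deltas"; it simply delta-encodes a synthetic, perfectly regular timestamp series (an assumption made for illustration) and compares the xz output size with and without that step.

```python
import lzma
import struct
from itertools import accumulate


def pack_i64(values):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


def delta_encode(values):
    """Keep the first value, then store each successive difference."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


# Synthetic series: one sample per minute (hypothetical data, not OMNI).
timestamps = [1_600_000_000 + 60 * i for i in range(100_000)]

deltas = delta_encode(timestamps)
assert delta_decode(deltas) == timestamps  # the transform is lossless

raw_xz = lzma.compress(pack_i64(timestamps))
delta_xz = lzma.compress(pack_i64(deltas))
print(f"xz on raw timestamps:   {len(raw_xz)} bytes")
print(f"xz on delta timestamps: {len(delta_xz)} bytes")
```

After the delta step the series is almost entirely one repeated value, so very little information is left to encode. Real telemetry is noisier than this, so the gap will vary.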

xz’s speed profile is its main drawback. On a single core, xz -9e compresses at ~1–2 MB/s and decompresses at ~75 MB/s. Bindu’s encode speed is in the same general range as xz, and its decode is faster. Crucially, many Bindu workloads never decompress at all: they operate directly on the compressed form, which sidesteps xz’s slow-decode penalty entirely.
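
The xz throughput figures can be sanity-checked on your own hardware with Python's standard `lzma` module, which exposes the same presets as the command-line tool. The sketch below is a rough single-run measurement on a small synthetic sample (an assumption for illustration); numbers will differ by machine, input, and input size, and it does not measure Bindu.

```python
import lzma
import random
import time


def xz_throughput(data: bytes, preset: int = 9 | lzma.PRESET_EXTREME) -> dict:
    """Time one xz compress/decompress round trip and report MB/s."""
    t0 = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    t1 = time.perf_counter()
    unpacked = lzma.decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == data  # round trip must be lossless

    megabytes = len(data) / 1e6
    return {
        "ratio": len(data) / len(packed),
        "compress_mb_s": megabytes / (t1 - t0),
        "decompress_mb_s": megabytes / (t2 - t1),
    }


# Small synthetic text-like sample; substitute a real file for real numbers.
rng = random.Random(0)
words = ["solar", "wind", "telemetry", "binary", "archive", "delta", "codec"]
sample = " ".join(rng.choice(words) for _ in range(40_000)).encode()

stats = xz_throughput(sample)
print({name: round(value, 2) for name, value in stats.items()})
```

For figures comparable to the ones quoted above, run it on a file of at least tens of megabytes and take the median of several runs.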

When xz is the better fit:

  • One-off cold archives where neither compression time nor read time matters.
  • Source tarballs and OS packages, where xz has entrenched ecosystem support.
  • Workloads dominated by x86 binaries, where xz’s executable filters are tuned specifically for that case.
  • General-purpose corpora where you want a single fixed codec rather than a tunable system.

When Bindu is the better fit:

  • Sequential telemetry / satellite data (see the flagship use case).
  • Workloads where you read or query the compressed form repeatedly.
  • Anywhere the symbolic representation pays off downstream of compression, such as search, edit, or cross-file comparison.
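
For reference, the x86 executable filter mentioned in the list above is applied as an explicit filter chain ahead of LZMA2. The sketch below shows one way to build that chain with Python's standard `lzma` module. The helper name is illustrative, and whether the benchmark runs above used such a filter is not stated here.

```python
import lzma


def xz_compress_x86(data: bytes) -> bytes:
    """Compress with the x86 BCJ filter followed by LZMA2 at preset 9e."""
    filters = [
        # BCJ filter: rewrites relative x86 call/jump targets as absolute
        # addresses so that repeated targets become repeated byte patterns.
        {"id": lzma.FILTER_X86},
        # The final filter in the chain must be the actual compressor.
        {"id": lzma.FILTER_LZMA2, "preset": 9 | lzma.PRESET_EXTREME},
    ]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


# The .xz container records the filter chain, so decompression needs no
# extra arguments. The payload here is arbitrary bytes, not a real binary.
payload = bytes(range(256)) * 64 + b"\xe8\x00\x10\x00\x00" * 100
packed = xz_compress_x86(payload)
assert lzma.decompress(packed) == payload
```

The filter is reversible on any input, but it only improves the ratio when the data really contains x86 machine code.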