
vs gzip

gzip is the universal default: available everywhere, well understood, and good enough for most text. Here’s how Bindu differs.

  • gzip: DEFLATE (LZ77 + Huffman) over a byte stream. 32 KB sliding window. No awareness of file structure.
  • Bindu: symbolic pipeline producing coordinates and deltas. The compressed form is the working representation.
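The 32 KB window on the gzip side can be observed directly. The sketch below uses only Python's standard `gzip` module and says nothing about Bindu's pipeline. The helper names `gzip_ratio` and `repeated_block` are made up for illustration. It builds two buffers, each a random block followed by an exact copy of itself, and varies only how far back the copy sits.

```python
import gzip
import os

WINDOW = 32 * 1024  # DEFLATE's maximum back-reference distance, in bytes


def gzip_ratio(data: bytes) -> float:
    """Return original size divided by gzip -9 compressed size."""
    return len(data) / len(gzip.compress(data, compresslevel=9))


def repeated_block(block_size: int) -> bytes:
    """Build a buffer made of one random block followed by an exact copy of it."""
    block = os.urandom(block_size)
    return block + block


# The copy starts 16 KB after the original, so it falls inside the window
# and DEFLATE can encode it as back-references.
near = gzip_ratio(repeated_block(WINDOW // 2))

# The copy starts 40 KB after the original, so it falls outside the window
# and DEFLATE has to treat it as fresh random data.
far = gzip_ratio(repeated_block(40 * 1024))

print(f"copy 16 KB back: {near:.2f}x")
print(f"copy 40 KB back: {far:.2f}x")
```

The first buffer should come out close to 2×, the second close to 1×, even though both are exactly 50% redundant. Redundancy that repeats at a distance beyond 32 KB is invisible to gzip.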

From the industry benchmark — full corpus, 30 files, SHA-256 round-trip verified:

| Aggregate | gzip | Bindu |
| --- | --- | --- |
| All files compressed (% reduction) | 64.51% | 77.95% |
| Per-file wins | 0 | 19/30 |
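The aggregate figures are percent reductions, while the per-workload figures are ratios. The minimal sketch below converts between the two and shows what a SHA-256 round-trip check might look like on the gzip side. It assumes percent reduction means `1 - compressed/original`, which the benchmark does not spell out. The function names and the sample payload are hypothetical, not part of any Bindu or benchmark tooling.

```python
import gzip
import hashlib


def reduction_percent(original_size: int, compressed_size: int) -> float:
    """Percent of bytes saved, assuming reduction = 1 - compressed/original."""
    return 100.0 * (1.0 - compressed_size / original_size)


def ratio_from_reduction(percent: float) -> float:
    """Convert a percent reduction into an 'N times smaller' ratio."""
    return 1.0 / (1.0 - percent / 100.0)


def round_trip_verified(data: bytes) -> bool:
    """Compress, decompress, and compare SHA-256 digests of input and output."""
    restored = gzip.decompress(gzip.compress(data, compresslevel=9))
    return hashlib.sha256(restored).digest() == hashlib.sha256(data).digest()


sample = b"status=OK;" * 1000  # hypothetical stand-in payload
packed = gzip.compress(sample, compresslevel=9)

print("round trip verified:", round_trip_verified(sample))
print(f"sample reduction: {reduction_percent(len(sample), len(packed)):.2f}%")
print(f"64.51% reduction is about {ratio_from_reduction(64.51):.2f}x")
print(f"77.95% reduction is about {ratio_from_reduction(77.95):.2f}x")
```

Under that assumption, the aggregate reductions correspond to roughly 2.8× for gzip and 4.5× for Bindu, which puts them on the same scale as the per-workload ratios.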

On individual workloads:

| Workload | gzip -9 ratio | Bindu ratio |
| --- | --- | --- |
| Silesia webster (text) | 3.44× | 5.75× |
| Silesia xml | 8.07× | 12.63× |
| Silesia nci (structured) | 11.23× | 24.79× |
| GOES-16 weather telemetry | 12.30× | 21.98× |
| MMS mission status flags | 1,025× | 263,314× |
| Hutter Prize enwik9 (1 GB Wikipedia) | 3.10× | 5.43× |

Bindu wins decisively on structured and sequential data. On general-purpose mixed corpora, the gap is smaller; on already-compressed media or random bytes, both produce ~1×.
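The two extremes on the gzip side are easy to reproduce. The sketch below uses only Python's standard library and synthetic buffers, not the benchmark files. It compresses random bytes and a constant run of zero bytes at `gzip -9`. The `gzip_ratio` helper and the 4 MiB sample size are choices made here for illustration.

```python
import gzip
import os

SIZE = 4 * 1024 * 1024  # 4 MiB per synthetic sample


def gzip_ratio(data: bytes) -> float:
    """Return original size divided by gzip -9 compressed size."""
    return len(data) / len(gzip.compress(data, compresslevel=9))


# High-entropy input: nothing for LZ77 or Huffman coding to exploit.
random_ratio = gzip_ratio(os.urandom(SIZE))

# Maximally repetitive input: every byte is identical.
constant_ratio = gzip_ratio(bytes(SIZE))

print(f"random bytes:  {random_ratio:.3f}x")
print(f"constant zero: {constant_ratio:.0f}x")
```

Random input lands at about 1×. The constant buffer approaches, but cannot exceed, DEFLATE's format ceiling of roughly 1032:1, so a gzip ratio near 1,000× on highly repetitive data is close to the most the format can express.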

| Capability | gzip | Bindu |
| --- | --- | --- |
| Read compressed form without decode | No | Yes |
| Search the compressed file directly | No | Yes |
| Edit a region in place | No | Yes |
| Cross-file comparison without decompress | No | Yes |
| Tunable per workload | No | Yes |
| Format-agnostic (works everywhere) | Yes | Partial |

Choose gzip when:

  • You need a format every tool understands today, with no install dependency.
  • You’re compressing transient transport payloads (HTTP responses) where decode latency dominates.
  • The data is already-compressed media, encrypted, or otherwise high-entropy: both will produce ~1×, and gzip’s tooling reach is the deciding factor.

Choose Bindu when:

  • The data is sequential, structured, or telemetry-style (see the satellite & telemetry use case).
  • The read path matters: you need to search, query, or operate on the compressed form rather than just store and retrieve it.
  • You keep long-retention archives, where the storage and downstream-compute savings amortize.