vs gzip
gzip is the universal default: available everywhere, well understood, and good enough for most text. Here’s how Bindu differs.
- gzip: DEFLATE (LZ77 + Huffman) over a byte stream. 32 KB sliding window. No awareness of file structure.
- Bindu: symbolic pipeline producing coordinates and deltas. The compressed form is the working representation.
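The 32 KB window on the gzip side can be observed directly. The sketch below is illustrative only: it uses Python's standard `zlib` module (the same DEFLATE algorithm, in a zlib wrapper rather than a gzip one) on synthetic random blocks, and the helper name `repeated_block_ratio` is hypothetical. A block repeated every 16 KB falls inside the window and compresses well; the same amount of data repeating every 64 KB falls outside it and barely compresses at all.

```python
import random
import zlib


def repeated_block_ratio(block_size: int, total_size: int, seed: int = 0) -> float:
    """Repeat one random block up to total_size bytes and return original/compressed."""
    rng = random.Random(seed)
    # A random block is incompressible on its own; only the repeats can be exploited.
    block = rng.getrandbits(8 * block_size).to_bytes(block_size, "little")
    data = block * (total_size // block_size)
    return len(data) / len(zlib.compress(data, 9))


total = 256 * 1024
near = repeated_block_ratio(16 * 1024, total)  # repeat distance inside the 32 KB window
far = repeated_block_ratio(64 * 1024, total)   # repeat distance beyond the 32 KB window

print(f"16 KB period: {near:.2f}x")
print(f"64 KB period: {far:.2f}x")
```

Both inputs contain the same kind of redundancy; the only difference is how far back the previous copy sits.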
Ratio (measured)
From the industry benchmark (full corpus, 30 files, SHA-256 round-trip verified):
| Aggregate | gzip | Bindu |
|---|---|---|
| All files compressed (% reduction) | 64.51% | 77.95% |
| Per-file wins | 0/30 | 19/30 |
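The aggregate row reports percent reduction, while the per-workload figures are given as ratios. The two can be converted, assuming (the benchmark description does not spell this out) that reduction means `1 - compressed / original` over total bytes. The function names below are illustrative.

```python
def reduction_to_ratio(reduction_pct: float) -> float:
    """Convert a percent size reduction into an original/compressed ratio."""
    return 100.0 / (100.0 - reduction_pct)


def ratio_to_reduction(ratio: float) -> float:
    """Convert an original/compressed ratio into a percent size reduction."""
    return 100.0 * (1.0 - 1.0 / ratio)


gzip_ratio = reduction_to_ratio(64.51)
bindu_ratio = reduction_to_ratio(77.95)
print(f"gzip: {gzip_ratio:.2f}x, Bindu: {bindu_ratio:.2f}x")
# prints "gzip: 2.82x, Bindu: 4.54x"
```

Under that assumption, the aggregate figures correspond to roughly 2.82× for gzip and 4.54× for Bindu.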
On individual workloads:
| Workload | gzip -9 ratio | Bindu ratio |
|---|---|---|
| Silesia webster (text) | 3.44× | 5.75× |
| Silesia xml | 8.07× | 12.63× |
| Silesia nci (structured) | 11.23× | 24.79× |
| GOES-16 weather telemetry | 12.30× | 21.98× |
| MMS mission status flags | 1,025× | 263,314× |
| Hutter Prize enwik9 (1 GB Wikipedia) | 3.10× | 5.43× |
Bindu wins decisively on structured and sequential data. On general-purpose mixed corpora, the gap is smaller; on already-compressed media or random bytes, both produce ~1×.
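To reproduce the gzip side of a comparison like this on your own data, a minimal sketch using only the Python standard library follows. It measures the ratio at level 9 and verifies the round trip with SHA-256. The sample inputs are synthetic stand-ins, not files from the benchmark, and `gzip_ratio_verified` is a hypothetical helper name.

```python
import gzip
import hashlib
import os


def gzip_ratio_verified(data: bytes, level: int = 9) -> float:
    """Return original/compressed size after checking a lossless round trip."""
    compressed = gzip.compress(data, compresslevel=level)
    restored = gzip.decompress(compressed)
    if hashlib.sha256(restored).digest() != hashlib.sha256(data).digest():
        raise ValueError("round-trip mismatch")
    return len(data) / len(compressed)


# Synthetic telemetry-style records (illustrative only).
structured = b"".join(b"ts=%08d,status=OK,temp=21.5\n" % i for i in range(20000))
# Random bytes stand in for encrypted or already-compressed input.
high_entropy = os.urandom(len(structured))

structured_ratio = gzip_ratio_verified(structured)
entropy_ratio = gzip_ratio_verified(high_entropy)
print(f"structured:   {structured_ratio:.2f}x")
print(f"high entropy: {entropy_ratio:.2f}x")
```

The high-entropy input should land at about 1×, which is the case described above where neither tool has anything to remove.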
Capability differences
| Capability | gzip | Bindu |
|---|---|---|
| Read compressed form without decode | No | Yes |
| Search the compressed file directly | No | Yes |
| Edit a region in place | No | Yes |
| Cross-file comparison without decompress | No | Yes |
| Tunable per workload | No | Yes |
| Format-agnostic (works everywhere) | Yes | Partial |
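The "No" entries in the gzip column come down to one constraint: the compressed bytes cannot be interpreted without inflating everything before them. The sketch below illustrates only that gzip side, using Python's standard `gzip` module on a synthetic log; Bindu's interface is not shown because it is not described here, and `grep_gzip` is a hypothetical helper.

```python
import gzip
import io


def grep_gzip(stream, needle: bytes) -> list[int]:
    """Return 1-based line numbers containing needle.

    Every line up to the last one has to be decompressed to find the hits.
    """
    hits = []
    with gzip.open(stream, "rb") as fh:
        for lineno, line in enumerate(fh, start=1):
            if needle in line:
                hits.append(lineno)
    return hits


# Synthetic log: every 1000th event fails (illustrative data only).
log = b"".join(
    b"event %d status=%s\n" % (i, b"FAIL" if i % 1000 == 999 else b"OK")
    for i in range(5000)
)
blob = gzip.compress(log)

matches = grep_gzip(io.BytesIO(blob), b"FAIL")
print(matches)
# prints [1000, 2000, 3000, 4000, 5000]
```

Streaming keeps memory use low, but the decode cost is still paid for the whole file on every search.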
When gzip is still the right choice
- You need a format every tool understands today, with no install dependency.
- You’re compressing transient transport payloads (HTTP responses) where decode latency dominates.
- The data is already-compressed media, encrypted, or otherwise high-entropy — both will produce ~1× and gzip’s tooling reach is the deciding factor.
When to prefer Bindu
- Sequential, structured, telemetry-style data (see the satellite & telemetry use case).
- Workloads where the read path matters: you need to search, query, or operate on the compressed form rather than just store and retrieve it.
- Long-retention archives where the storage and downstream-compute savings amortize.