When not to use Bindu
Bindu is a tunable approach to compression, not a universal hammer. Using it where it doesn’t apply means dragging around tuning complexity for no benefit. Use this checklist before adopting it.
Don’t use Bindu if…
Your data is already-compressed media
Video (H.264/265), audio (MP3/AAC/Opus), images (JPEG/WebP), and archive formats (.zip, .tar.gz) cannot be meaningfully compressed further. Bindu produces output the same size as the input plus a small header. Use the source format as-is.
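A quick way to confirm this on your own data: recompressing already-compressed bytes buys nothing. The stdlib-only sketch below uses gzip as a stand-in for any codec; the sample payload is illustrative.

```python
# Recompressing already-compressed data finds little or no remaining
# redundancy; the output is typically slightly larger than the input
# because of the extra container header.
import gzip

raw = b"timestamp=2024-01-01T00:00:00Z level=INFO msg=ok\n" * 10_000
once = gzip.compress(raw)     # big win: the raw log lines are highly redundant
twice = gzip.compress(once)   # no win: the compressed bytes look random

print(len(raw), len(once), len(twice))
```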
Your data is random or encrypted
By definition there’s nothing to compress, so the outcome is the same as for already-compressed media. If you need both compression and encryption, encrypt after compression, not before (a sketch of that ordering follows).
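The ordering matters because ciphertext looks random to any compressor that runs after it. A minimal sketch of the compress-then-encrypt order, using zlib and the third-party cryptography package as stand-ins rather than any Bindu API:

```python
# Compress while the data still has structure, then encrypt; the reverse
# order leaves nothing for the compressor to find.
import os
import zlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def pack(plaintext: bytes, key: bytes) -> bytes:
    compressed = zlib.compress(plaintext, 9)
    nonce = os.urandom(12)                      # 96-bit nonce for AES-GCM
    ciphertext = AESGCM(key).encrypt(nonce, compressed, None)
    return nonce + ciphertext                   # ship the nonce with the payload

key = AESGCM.generate_key(bit_length=256)
blob = pack(b"field_a,field_b\n1,2\n" * 1000, key)
```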
Your data is unstructured narrative prose and you don’t need legibility
Books, essays, long-form web text. Bindu is competitive with the best classical codecs on prose (it’s the strongest classical result on the Hutter Prize corpora — see the industry benchmark) but doesn’t outrun them by a margin worth the tuning. If you’re not going to query or operate on the compressed form, zstd or xz with a trained dictionary is simpler (sketched below).
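For reference, the trained-dictionary route looks roughly like this with the python-zstandard bindings; the sample file names are placeholders.

```python
# Train a shared dictionary on representative prose samples, then compress
# each document against it. Simpler than tuning Bindu when you will never
# query or operate on the compressed form.
import zstandard

paths = ["essay1.txt", "essay2.txt", "essay3.txt"]      # placeholder samples
samples = [open(p, "rb").read() for p in paths]
dictionary = zstandard.train_dictionary(112 * 1024, samples)

cctx = zstandard.ZstdCompressor(level=19, dict_data=dictionary)
dctx = zstandard.ZstdDecompressor(dict_data=dictionary)

compressed = cctx.compress(samples[0])
assert dctx.decompress(compressed) == samples[0]
```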
You need universal tooling support
Anywhere the consumer isn’t in your control — HTTP responses to third parties, emailed attachments, files posted to FTP servers used by unknown clients — pick a format every system can decompress. gzip, brotli, and zstd all qualify. Bindu does not yet.
Your files are very small and one-off
Below a few KB of input, the per-file header and symbol-table bootstrap are a meaningful fraction of the output. Bindu shines on artifacts with enough data to amortize the symbol-table cost.
Your workload is compress-once, read-once, sequentially
Bindu’s structural advantages include searching the compressed form, random access, and operating without decompressing. If you’ll read the file exactly once, sequentially, from start to end, those advantages don’t pay off. A byte compressor is simpler and sufficient.
You need sub-millisecond decode latency
Aggregate decode throughput is 4.6 MB/s today — zstd -19 decodes the same files at 1,075 MB/s. At 4.6 MB/s, even a 1 MB payload takes roughly 220 ms to decode, orders of magnitude over a sub-millisecond budget. CDN hot paths, real-time streaming, and other decode-heavy workloads are not the right fit. (Search and edit on most data classes don’t require decompression — that’s the point of the system — but if your workload genuinely needs raw decoded bytes fast, pick zstd.)
You need length-changing in-place edits
The fast in-place edit paths (Tier 1 grammar rule patch, Tier 2 dictionary rewrite) require len(old) == len(new). Variable-length edits force a full decompress / recompress (Tier 3), with no advantage over the conventional pipeline. See Honest limits → In-place edit constraint.
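To see why the same-length requirement exists, compare a same-length splice with a length-changing one on plain bytes: the former leaves every downstream offset where it was, so a local patch suffices; the latter shifts the whole tail. This is only an illustration, not Bindu’s actual patch machinery.

```python
# A same-length edit keeps every byte after the edit at its original offset,
# which is what lets the Tier 1/2 paths patch the compressed form locally.
# A length-changing edit shifts the tail and invalidates downstream
# references, hence the Tier 3 decompress / recompress fallback.
def splice(buf: bytes, offset: int, old: bytes, new: bytes) -> bytes:
    assert buf[offset:offset + len(old)] == old
    return buf[:offset] + new + buf[offset + len(old):]

record = b"temperature=23.4;pressure=101.2;status=OK"

same_len = splice(record, 12, b"23.4", b"24.1")    # in-place friendly
diff_len = splice(record, 12, b"23.4", b"24.125")  # tail shifts by 2 bytes

print(len(record), len(same_len), len(diff_len))   # 41 41 43
```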
You’re memory-constrained
Peak RSS during enwik9 compression is 19.5 GB for a 953 MB input — xz -9 peaks at 675 MB. CubeSat or otherwise memory-constrained deployments need to either chunk the input or use a tuned-down pipeline budget.
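The chunking workaround is sketched below, with zstd standing in for the codec: the point is the framing, which caps peak memory at roughly one chunk of working state regardless of file size. The 64 MB chunk size and the length-prefix framing are illustrative choices, not a Bindu format.

```python
# Compress fixed-size chunks independently so peak memory tracks the chunk
# size, not the whole input. Length-prefixing each frame keeps the chunks
# independently decodable.
import zstandard

CHUNK_BYTES = 64 * 1024 * 1024   # tune to the available memory budget

def compress_chunked(src_path: str, dst_path: str) -> None:
    cctx = zstandard.ZstdCompressor(level=19)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK_BYTES):
            frame = cctx.compress(chunk)
            dst.write(len(frame).to_bytes(8, "little"))
            dst.write(frame)
```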
Your data has no exploitable structure
Run bindu compress --dry-run first; if the predicted ratio isn’t meaningfully better than zstd -19, there’s no reason to pay the tuning cost. Bindu’s wins come from structure: no structure, no wins.
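If you want to script the comparison, the zstd -19 side is easy to compute with the python-zstandard bindings; compare the result against the dry-run prediction before committing to a tuning pass.

```python
# Baseline ratio to judge the dry-run prediction against: if Bindu's
# predicted ratio isn't clearly better, the data has no structure worth
# paying the tuning cost for.
import zstandard

def zstd19_ratio(path: str) -> float:
    data = open(path, "rb").read()
    compressed = zstandard.ZstdCompressor(level=19).compress(data)
    return len(data) / len(compressed)
```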
Soft cases — depends on the workload
General-purpose mixed corpora out of the box
Out of the box, on a Silesia-style mix (text + binaries + scientific images + structured records), Bindu is competitive with the best: it wins more files than any other codec but loses some by 0.2–0.3× to xz or bzip2. If your data really is “everything mixed together” and you can’t tune for it, the choice between Bindu, xz, and zstd is a small one. Tune Bindu for your specific shape and the gap opens up.
x86 binaries
xz with executable filters is specifically tuned for x86 instruction encoding. On mozilla in the Silesia corpus, xz wins by ~25% over Bindu. If your archive is mostly x86 binaries, xz is the better default.
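For completeness, the executable-filter route is available straight from Python’s stdlib: chain the BCJ x86 filter in front of LZMA2 so relative call/jump targets are normalized before entropy coding. The input path is a placeholder.

```python
# xz-style executable filtering: the BCJ x86 pre-filter rewrites relative
# call/jump targets so LZMA2 sees more repetition in machine code.
import lzma

filters = [
    {"id": lzma.FILTER_X86},                  # BCJ pre-filter for x86 code
    {"id": lzma.FILTER_LZMA2, "preset": 9},   # then the usual LZMA2 stage
]

with open("app.bin", "rb") as f:              # placeholder binary
    binary = f.read()

packed = lzma.compress(binary, format=lzma.FORMAT_XZ, filters=filters)
```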
Do use Bindu if…
The mirror image of the above:
- Data is sequential and slowly-changing (telemetry, satellite, sensor streams). Flagship use case.
- Data is structured (JSON, JSONL, CSV, Parquet, source code, logs, scientific arrays).
- Volumes are large or repeated.
- You want capabilities beyond storage: search, query, edit, cross-file compare on the compressed form.
- You control both writer and reader and can amortize a tuning step.
TL;DR decision tree
```
Is the data sequential telemetry / satellite / sensor stream?
├─ Yes → Use Bindu (flagship case).
└─ No → Is it structured (JSON, CSV, code, logs, tabular)?
   ├─ No → Use zstd (or gzip for compatibility).
   └─ Yes → Will you query, search, or operate on the compressed form?
      ├─ Yes → Use Bindu.
      └─ No → Use zstd -19 or xz -9e.
```