When not to use Bindu

Bindu is a tunable approach to compression, not a universal hammer. Using it where it doesn’t apply means dragging around tuning complexity for no benefit. Use this checklist before adopting it.

Don’t use Bindu if…

Your data is already-compressed media

Video (H.264/265), audio (MP3/AAC/Opus), images (JPEG/WebP), and archive formats (.zip, .tar.gz) cannot be meaningfully compressed further. Bindu produces output the same size as the input plus a small header. Use the source format as-is.

Your data is random or encrypted

Random and encrypted bytes carry no redundancy, so there’s nothing to compress. Same outcome as already-compressed media. If you need both, compress first and encrypt afterwards.
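Both this case and the already-compressed case above can often be spotted before running any codec. The following is a minimal sketch, not part of Bindu: it estimates the zero-order byte entropy of a sample, and the function names and the 7.9 bits-per-byte threshold are illustrative assumptions.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_incompressible(sample: bytes, threshold: float = 7.9) -> bool:
    """Heuristic: entropy close to 8 bits/byte suggests compressed,
    encrypted, or random input. The threshold is an assumed value."""
    return byte_entropy(sample) >= threshold
```

This is only a first-pass filter. A byte histogram ignores ordering, so data with a flat histogram but strong sequential structure would be misclassified; run the check on a sample of at least tens of kilobytes and treat a positive result as a hint, not a verdict.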

Your data is unstructured narrative prose and you don’t need legibility

Books, essays, long-form web text. Bindu is competitive with the best classical codecs on prose (it’s the strongest classical result on the Hutter Prize corpora; see the industry benchmark) but doesn’t beat them by a margin worth the tuning. If you’re not going to query or operate on the compressed form, zstd (optionally with a trained dictionary) or xz is simpler.

You need universal tooling support

Anywhere the consumer isn’t in your control — HTTP responses to third parties, emailed attachments, files posted to FTP servers used by unknown clients — pick a format every system can decompress. gzip, brotli, and zstd all qualify. Bindu does not yet.

Your files are very small and one-off

Below a few KB of input, the per-file header and symbol-table bootstrap are a meaningful fraction of the output. Bindu shines on artifacts large enough to amortize the symbol table.

Your workload is compress-once, read-once, sequentially

Bindu’s structural advantages include searching the compressed form, random access, and operating without decompressing. If you’ll read the file exactly once, sequentially, from start to end, those advantages don’t pay off. A byte compressor is simpler and sufficient.

You need sub-millisecond decode latency

Aggregate decode throughput is 4.6 MB/s today — zstd -19 decodes the same files at 1,075 MB/s. CDN hot paths, real-time streaming, and other decode-heavy workloads are not the right fit. (Search and edit on most data classes don’t require decompression — that’s the point of the system — but if your workload genuinely needs raw decoded bytes fast, pick zstd.)
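To see what these throughput figures mean for a latency budget, the arithmetic can be sketched as below. The two constants are the numbers quoted above; the helper names are hypothetical, MB is taken as 10^6 bytes, and the model assumes decode time scales linearly with size with no fixed startup cost, which is a simplification.

```python
BINDU_DECODE_MB_S = 4.6        # aggregate decode throughput quoted above
ZSTD_19_DECODE_MB_S = 1075.0   # zstd -19 decode throughput quoted above


def decode_time_ms(size_bytes: int, throughput_mb_s: float) -> float:
    """Estimated decode time in milliseconds (linear model, MB = 10**6 bytes)."""
    return size_bytes / (throughput_mb_s * 1_000_000) * 1000.0


def max_payload_bytes(budget_ms: float, throughput_mb_s: float) -> int:
    """Largest payload expected to decode within the latency budget."""
    return int(throughput_mb_s * 1_000_000 * budget_ms / 1000.0)
```

Under this model a 1 MB payload takes about 217 ms at 4.6 MB/s and just under 1 ms at 1,075 MB/s, and a 1 ms budget covers only around 4.6 KB of Bindu output.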

You need length-changing in-place edits

The fast in-place edit paths (Tier 1 grammar rule patch, Tier 2 dictionary rewrite) require len(old) == len(new). Variable-length edits force a full decompress / recompress (Tier 3), with no advantage over the conventional pipeline. See Honest limits → In-place edit constraint.
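The length precondition can be checked before attempting an edit. The sketch below is illustrative only: `edit_path` is a hypothetical helper, not a Bindu API, and it assumes the constraint applies to encoded byte length. It does not decide between Tier 1 and Tier 2, since that choice depends on details not covered here.

```python
def edit_path(old: bytes, new: bytes) -> str:
    """Classify an edit by the len(old) == len(new) precondition.

    Returns "in-place" when the fast paths (Tier 1 / Tier 2) are eligible,
    and "recompress" when the edit falls back to Tier 3.
    """
    if len(old) == len(new):
        return "in-place"
    return "recompress"
```

Compare encoded bytes, not characters: replacing "e" with "é" keeps the character count but changes the UTF-8 byte length from 1 to 2, so under this assumption it would take the slow path.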

You’re memory-constrained

Peak RSS during enwik9 compression is 19.5 GB for a 953 MB input, roughly 20 times the input size; xz -9 peaks at 675 MB. CubeSat or otherwise memory-constrained deployments need to either chunk the input or use a tuned-down pipeline budget.
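For a rough first guess at a chunk size, the enwik9 figures can be turned into a memory-per-input ratio. This is a back-of-the-envelope sketch under a strong assumption: that peak RSS scales linearly with input size, which a single data point does not establish. The names, the decimal reading of "19.5 GB", and the 20% headroom are all illustrative choices.

```python
PEAK_RSS_MB = 19.5 * 1000   # 19.5 GB peak RSS quoted above (decimal units assumed)
INPUT_MB = 953.0            # enwik9 input size quoted above

# Memory needed per MB of input, ASSUMING linear scaling (unverified).
RSS_PER_INPUT_MB = PEAK_RSS_MB / INPUT_MB


def max_chunk_mb(memory_budget_mb: float, headroom: float = 0.8) -> float:
    """Largest chunk size (MB) expected to fit in the budget under the
    linear-scaling assumption, keeping (1 - headroom) of memory in reserve."""
    return memory_budget_mb * headroom / RSS_PER_INPUT_MB
```

With a 2,048 MB budget this suggests chunks of about 80 MB. Measure actual peak RSS on your own data before relying on the estimate, since smaller chunks may also cost compression ratio.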

Your data has no exploitable structure

Run bindu compress --dry-run. If the predicted ratio isn’t meaningfully better than zstd -19, there’s no reason to pay the tuning cost. Bindu’s wins come from structure; no structure → no wins.
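The comparison against a zstd -19 baseline can be made explicit with a small helper. This is a sketch: the function names and the 10% minimum gain are assumptions chosen for illustration, and reading the predicted ratio out of the dry-run output is left out because its format is not described here.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Ratio of original to compressed size (higher is better)."""
    return original_size / compressed_size


def worth_tuning(predicted_ratio: float, baseline_ratio: float,
                 min_gain: float = 1.10) -> bool:
    """True if the predicted ratio beats the baseline by at least min_gain.

    min_gain=1.10 means "at least 10% better"; the threshold is an
    assumed value, so pick one that reflects your own tuning cost.
    """
    return predicted_ratio >= baseline_ratio * min_gain
```

For example, a predicted ratio of 5.0 against a baseline of 4.0 clears the default threshold, while 4.1 against 4.0 does not.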

Soft cases — depends on the workload

General-purpose mixed corpora out of the box

On a Silesia-style mix (text + binaries + scientific images + structured records), untuned Bindu is competitive with the best: it wins more files than any other codec, but loses some to xz or bzip2 by 0.2–0.3×. If your data really is “everything mixed together” and you can’t tune for it, choosing between Bindu, xz, and zstd makes little difference. Tune Bindu for your specific shape and the gap opens up.

x86 binaries

xz with executable filters is specifically tuned for x86 instruction encoding. On mozilla in the Silesia corpus, xz wins by ~25% over Bindu. If your archive is mostly x86 binaries, xz is the better default.

Do use Bindu if…

The mirror image of the above:

Data is sequential and slowly changing (telemetry, satellite, sensor streams). Flagship use case.
Data is structured (JSON, JSONL, CSV, Parquet, source code, logs, scientific arrays).
Volumes are large or repeated.
You want capabilities beyond storage: search, query, edit, cross-file compare on the compressed form.
You control both writer and reader and can amortize a tuning step.

TL;DR decision tree

Is the data sequential telemetry / satellite / sensor stream?
├─ Yes → Use Bindu (flagship case).
└─ No  → Is it structured (JSON, CSV, code, logs, tabular)?
         ├─ No  → Use zstd (or gzip for compatibility).
         └─ Yes → Will you query, search, or operate on the compressed form?
                  ├─ Yes → Use Bindu.
                  └─ No  → Use zstd -19 or xz -9e.
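
The same tree can be written as a small function, for example to drive a format choice in an ingestion script. This is a direct transcription of the three questions above; the function and parameter names are hypothetical.

```python
def recommend_codec(is_sequential_stream: bool,
                    is_structured: bool,
                    operates_on_compressed: bool) -> str:
    """Walk the decision tree above and return its recommendation."""
    # Sequential telemetry / satellite / sensor stream: flagship case.
    if is_sequential_stream:
        return "Bindu"
    # Unstructured data: a byte compressor is enough.
    if not is_structured:
        return "zstd (or gzip for compatibility)"
    # Structured data: Bindu only pays off if you query, search,
    # or operate on the compressed form.
    if operates_on_compressed:
        return "Bindu"
    return "zstd -19 or xz -9e"
```

The tree covers only the headline questions; the exclusions listed earlier (universal tooling, decode latency, memory limits, length-changing edits) still apply before adopting Bindu.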