
Honest limits

Compression always trades along three axes — ratio, speed, and memory — and Bindu is engineered for ratio at the cost of the other two. This page is the honest accounting, with measured numbers from the unified benchmark.

Aggregate encode throughput across the 30-file corpus is 0.6 MB/s at the max-ratio setting. With --shape and --dtype hints on structured satellite data this rises to 17–714 MB/s (e.g., 553 MB/s on MMS flags, 714 MB/s on Sentinel-2 B04). On unstructured generic text, encoding remains consistently slow.

For comparison, zstd -1 runs at 322 MB/s on the same hardware.

If your workload is compress-once / read-many, this is a fair trade. If it is compress-frequently in a hot path, choose zstd.

Aggregate decode throughput is 4.6 MB/s — the weakest axis. zstd -19 on the same files decodes at 1,075 MB/s. We are working on this; for now, decode-heavy workloads (CDN, hot read paths) are not the right fit.

Search and edit, however, do not require decompression on most data classes — that is the whole point of the system. If you are reaching for Bindu specifically because of the computable properties, decode throughput is rarely the binding constraint anyway.

Peak RSS during enwik9 compression is 19.5 GB for a 953 MB input (roughly 20× the input size). xz -9 peaks at 675 MB on the same file.

CubeSat and other memory-constrained deployments need to either chunk the input or use a smaller pipeline budget. Tuning the binary down (see Tuning Bindu) is what makes embedded deployments tractable: a satellite-tuned compressor strips most pipelines and runs in tens of bytes of working state in software, plus the binary.
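A minimal sketch of the chunking approach, assuming nothing about Bindu's actual API: compressing fixed-size chunks independently bounds peak memory by the chunk size rather than the total input size. zlib stands in here for the real codec.

```python
import zlib

CHUNK = 4 * 1024 * 1024  # 4 MiB chunks bound the encoder's working set

def compress_chunked(data: bytes, chunk_size: int = CHUNK) -> list[bytes]:
    """Compress each chunk independently, so peak memory scales with
    chunk_size rather than len(data). zlib is a stand-in codec."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def decompress_chunked(chunks: list[bytes]) -> bytes:
    """Restore the original stream by concatenating decoded chunks."""
    return b"".join(zlib.decompress(c) for c in chunks)
```

The trade-off is that each chunk is compressed in isolation, so cross-chunk redundancy is lost and the ratio drops as chunks shrink.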

The fast edit paths require len(old) == len(new):

  • Tier 1 — Grammar rule patch, equal-length, in-place via memory-mapped write. Measured at 3 ms on 4 MB ADS-B, against 1.33 s for the conventional pipeline.
  • Tier 2 — Dictionary entry rewrite, equal-length pattern that matches a DICT alphabet entry; rewrites the entry inline.
  • Tier 3 — Decompress + scan + recompress. The general case for non-equal-length edits or unsupported wire models. Correctness-preserving; same cost as the conventional pipeline.
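The tier selection above can be sketched as a simple dispatch. The function name and the `in_grammar`/`in_dict` predicates are illustrative, not Bindu's actual API; the point is the ordering and the equal-length gate.

```python
def select_edit_tier(old: bytes, new: bytes,
                     in_grammar: bool, in_dict: bool) -> int:
    """Pick the cheapest applicable edit tier (illustrative sketch)."""
    if len(old) != len(new):
        return 3  # length change cascades offsets: full recompress
    if in_grammar:
        return 1  # equal-length grammar-rule patch, in place
    if in_dict:
        return 2  # equal-length DICT alphabet entry rewrite
    return 3      # general fallback: decompress + scan + recompress
```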

Length-preserving edits are the common case for fielded data corrections (numeric value updates, coordinate fixes, ID rewrites). Non-length-preserving edits force tier 3 because changing an underlying rule body’s length cascades through every offset that follows it.
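The mechanism Tier 1 relies on can be shown with Python's `mmap` module: an equal-length splice into a memory-mapped file touches only the affected bytes and shifts nothing after them. This is a generic sketch, not Bindu's patch code.

```python
import mmap

def patch_in_place(path: str, offset: int, old: bytes, new: bytes) -> None:
    """Equal-length in-place patch via a memory-mapped write."""
    if len(old) != len(new):
        raise ValueError("tier-1 patches must be length-preserving")
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            # Verify we are overwriting what we expect, then splice.
            assert mm[offset:offset + len(old)] == old
            mm[offset:offset + len(new)] = new  # later bytes never move
            mm.flush()
```

A non-equal-length replacement cannot use this path: `mmap` slice assignment requires the replacement to match the slice length, which is exactly the offset-cascade constraint described above.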

Bindu is the wrong tool for:

  • Already-compressed media (JPEG, MP4, MP3) — nothing to compress further.
  • Workloads that need sub-millisecond decode latency — CDN hot paths, real-time streaming.
  • Memory-constrained embedded targets that can’t afford the full encoder’s working set, unless tuned down.

The full decision checklist is at Use Cases → When not to use Bindu.

Every benchmark number quoted in this document is reproducible from the public corpus and harnesses; CI re-verifies the per-file ratios on every commit, with SHA-256 round-trip checks that fail the build if results drift.