
vs zstd

Zstandard (zstd) is the strongest general-purpose byte-level compressor available. It is fast, tunable across a wide ratio/speed range, and supports trained dictionaries. If you’re choosing between “smart byte compression” and Bindu, this is the comparison that matters most.

  • zstd: LZ77-family match finder with FSE/Huffman entropy coding, a configurable window up to 2 GB, and support for pre-trained dictionaries (110 KB is the default training size).
  • Bindu: symbolic pipeline computing a coordinate-keyed symbol table on the spot. The symbol table has no fixed cap and represents semantic structure rather than byte sequences.

From the industry benchmark:

| Aggregate | zstd | Bindu |
|---|---|---|
| All files compressed (% reduction) | 73.51% | 77.95% |
| Per-file wins | 1/30 | 19/30 |
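The aggregate row is quoted as a percentage reduction, while the per-workload figures are quoted as ratios (×). The following is a minimal sketch of the conversion between the two, assuming "% reduction" means `1 - compressed/original` and that both codecs were measured over the same total input. The function names are illustrative, not part of either tool.

```python
def reduction_to_ratio(percent_reduction: float) -> float:
    """Convert a size reduction in percent to a ratio (original / compressed)."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("percent reduction must be in [0, 100)")
    return 100.0 / (100.0 - percent_reduction)


def ratio_to_reduction(ratio: float) -> float:
    """Convert a ratio (original / compressed) back to a percent reduction."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 100.0 * (1.0 - 1.0 / ratio)


# Aggregate figures copied from the table above.
aggregate = {"zstd": 73.51, "Bindu": 77.95}

for codec, pct in aggregate.items():
    print(f"{codec}: {pct:.2f}% reduction = {reduction_to_ratio(pct):.2f}x")

# Size of Bindu's total output relative to zstd's, under the same-input assumption.
relative_output = (100.0 - aggregate["Bindu"]) / (100.0 - aggregate["zstd"])
print(f"Bindu output is {relative_output:.3f} of zstd output")
```

Under those assumptions the aggregate figures correspond to roughly 3.8× for zstd and 4.5× for Bindu, which means Bindu's total output is about 17% smaller than zstd's.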

Selected workloads:

| Workload | zstd best | Bindu |
|---|---|---|
| Silesia webster | 4.90× | 5.75× |
| Silesia nci | 20.84× | 24.79× |
| GOES-16 weather telemetry | 15.93× | 21.98× |
| OMNI solar wind timestamps | 3.24× | 2,349× |
| MMS mission status flags | 29,103× | 263,314× |
| Silesia mozilla (binaries) | 3.42× | 2.88× |
| MMS Epoch timestamps | 1.98× | 1.34× |
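Ratios that span several orders of magnitude are easier to compare as relative output sizes. The sketch below takes the figures from the selected-workloads table and reports, for each row, which codec wins and how large Bindu's output is relative to zstd's. It assumes both ratios in a row refer to the same input; the helper names are illustrative.

```python
# (workload, zstd best ratio, Bindu ratio), copied from the table above.
WORKLOADS = [
    ("Silesia webster", 4.90, 5.75),
    ("Silesia nci", 20.84, 24.79),
    ("GOES-16 weather telemetry", 15.93, 21.98),
    ("OMNI solar wind timestamps", 3.24, 2349.0),
    ("MMS mission status flags", 29103.0, 263314.0),
    ("Silesia mozilla (binaries)", 3.42, 2.88),
    ("MMS Epoch timestamps", 1.98, 1.34),
]


def relative_size(zstd_ratio: float, bindu_ratio: float) -> float:
    """Bindu output size as a fraction of zstd output size for the same input."""
    return zstd_ratio / bindu_ratio


def summarize(rows):
    """Return (workload, winner, relative Bindu output size) for each row."""
    summary = []
    for name, zstd_ratio, bindu_ratio in rows:
        winner = "Bindu" if bindu_ratio > zstd_ratio else "zstd"
        summary.append((name, winner, relative_size(zstd_ratio, bindu_ratio)))
    return summary


for name, winner, fraction in summarize(WORKLOADS):
    print(f"{name:28s} winner={winner:5s} Bindu/zstd output size={fraction:.4f}")
```

A value below 1 means Bindu's output is smaller. On these figures Silesia webster comes out near 0.85, while Silesia mozilla comes out near 1.19, meaning Bindu's output is about 19% larger than zstd's on that file.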

Bindu wins most of the corpus. zstd leads on a few file types, notably some binary formats and certain int64 timestamp encodings: 3.42× vs 2.88× on Silesia mozilla and 1.98× vs 1.34× on MMS Epoch timestamps. Both margins are small next to Bindu’s largest wins in the table.

zstd holds a real lead on raw decode throughput: it is the streaming-decode champion at ~2 GB/s on the test rig. Bindu prioritizes operating on the compressed form over getting back to bytes as fast as possible. If your workload is “decompress and stream as fast as possible,” zstd is the right pick.
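If decode throughput is the deciding factor, it is worth measuring on your own data rather than relying on a quoted figure. The sketch below shows one way to time decompression. It uses Python's built-in `zlib` purely as a stand-in codec because it ships with the standard library; the numbers it prints say nothing about zstd or Bindu, and the synthetic payload is an assumption. Swap in the codec under test to get meaningful results.

```python
import time
import zlib


def measure_decode_throughput(payload: bytes, repeats: int = 5) -> float:
    """Return the best observed decode throughput in output bytes per second."""
    compressed = zlib.compress(payload, 6)
    best = float("inf")
    restored = b""
    for _ in range(repeats):
        start = time.perf_counter()
        restored = zlib.decompress(compressed)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    if restored != payload:
        raise RuntimeError("round trip failed")
    # Guard against a zero reading on very coarse timers.
    return len(payload) / max(best, 1e-9)


# Synthetic, telemetry-like CSV rows (hypothetical data, for illustration only).
payload = b"".join(f"{i},{i % 7},OK\n".encode() for i in range(200_000))

throughput = measure_decode_throughput(payload)
print(f"decoded {len(payload) / 1e6:.1f} MB at {throughput / 1e6:.0f} MB/s")
```

Taking the best of several runs reduces noise from caches and scheduling. For scale, at the ~2 GB/s quoted above a 100 GB archive would take about 50 seconds to decode.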

zstd supports trained dictionaries, which is the closest analogue to Bindu’s symbol table.

| Aspect | zstd dict | Bindu symbol table |
|---|---|---|
| Size limit | 110 KB (default training size) | None |
| Built once and frozen? | Yes | No (can grow) |
| Carries semantic structure? | No (bytes) | Yes (coordinates) |
| Search compressed form? | No | Yes |

A trained zstd dictionary is the right tool when you have a pile of similar small files and want a flat ratio improvement everywhere. A Bindu symbol table is the right tool when you want the meaning of the data to be reusable — searchable, editable, cross-file comparable.
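The "pile of similar small files" case can be illustrated with a shared byte dictionary. The sketch below uses the preset-dictionary support in Python's built-in `zlib` (`zdict`) as a stand-in for a trained zstd dictionary; the mechanism is analogous, but it is not zstd's trainer and it does not model Bindu's symbol table. The record layout and field names are hypothetical.

```python
import json
import zlib
from typing import Optional


def make_record(i: int) -> bytes:
    """Build one small JSON record (hypothetical telemetry fields)."""
    record = {
        "sensor_id": f"goes-{i:04d}",
        "status": "nominal",
        "temperature_k": 250 + i % 40,
        "channel": i % 16,
    }
    return json.dumps(record, sort_keys=True).encode()


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress with zlib, optionally priming it with a preset dictionary."""
    if zdict is None:
        codec = zlib.compressobj(level=9)
    else:
        codec = zlib.compressobj(level=9, zdict=zdict)
    return codec.compress(data) + codec.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    codec = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return codec.decompress(blob) + codec.flush()


# A crude "dictionary": a handful of representative records concatenated.
shared_dict = b"".join(make_record(i) for i in range(8))

# Records the dictionary has not seen.
records = [make_record(i) for i in range(100, 120)]

plain_total = sum(len(compress(r)) for r in records)
dict_total = sum(len(compress(r, shared_dict)) for r in records)

print(f"raw bytes:          {sum(len(r) for r in records)}")
print(f"without dictionary: {plain_total}")
print(f"with dictionary:    {dict_total}")
```

Each record is too short to contain much internal repetition, so almost all of the saving comes from matches against the shared dictionary. The dictionary is frozen bytes: it improves ratio, but the compressed output is still opaque and has to be decompressed before it can be searched.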

Choose zstd for:

  • Streaming workloads where decode throughput is the bottleneck.
  • Mixed-content workloads where you want one well-known codec across everything.
  • Environments that need ubiquitous tooling support (the Linux kernel, btrfs, and bindings for most modern languages).

Choose Bindu for:

  • Sequential telemetry and satellite data: see the flagship use case.
  • Workloads where you’ll search, edit, or query the compressed form (the computable property).
  • Long-retention archives where the symbol table amortizes across many files.