Bindu vs. other compressors

Existing lossless compressors fall into four well-defined categories. Bindu occupies a fifth: computable compression. Here is how the categories compare.

For per-codec head-to-heads with measured numbers, see the compression benchmarks.

1. Classical byte-level compressors

Examples: gzip (DEFLATE), bzip2 (BWT), zstd (LZ77+FSE), xz (LZMA2), brotli.

Approach: Most find repeated byte sequences in a sliding window and substitute them with shorter codes. bzip2 instead sorts each block (BWT) so that repeats cluster together before entropy coding.

Strengths: These tools are ubiquitous, format-agnostic, fast, and battle-hardened over decades. zstd is the industry standard for speed, and xz for compression ratio.

Limits: No awareness of structure. They re-discover the same redundancy on every block, which is why a satellite repeatedly re-compresses the same dark-sky frame instead of storing only what changed.
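The window and per-block behaviour described above can be seen with Python's standard library alone. This is a minimal sketch, not a benchmark: the "frame" is synthetic random data standing in for a repeated sensor frame, and the sizes are chosen only to sit inside or outside DEFLATE's 32 KiB window.

```python
import lzma
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable


def noise(n):
    """Return n incompressible bytes, standing in for one sensor frame."""
    return bytes(rng.getrandbits(8) for _ in range(n))


# A 16 KiB frame fits inside DEFLATE's 32 KiB sliding window.
frame = noise(16 * 1024)

# Four copies compressed as independent blocks: each block pays full price,
# because nothing learned from one block carries over to the next.
independent = sum(len(zlib.compress(frame, 9)) for _ in range(4))

# Four copies in one stream: copies 2-4 become back-references to copy 1.
together = len(zlib.compress(frame * 4, 9))

# A 64 KiB frame repeats at a distance beyond DEFLATE's window, so zlib
# cannot see the repeat; LZMA's much larger dictionary still can.
big = noise(64 * 1024) * 2
big_zlib = len(zlib.compress(big, 9))
big_xz = len(lzma.compress(big))

print(f"4 x 16 KiB, independent blocks: {independent} bytes")
print(f"4 x 16 KiB, single stream:      {together} bytes")
print(f"2 x 64 KiB via zlib:            {big_zlib} bytes")
print(f"2 x 64 KiB via xz:              {big_xz} bytes")
```

The redundancy is only exploited when the repeat happens to fall inside the same stream and the same window; the codec has no notion that "this is the same frame again".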

2. Dictionary-trained byte codecs

Examples: zstd with a trained dictionary, brotli’s static dictionary.

Approach: Ship a small (typically ≤256 KB) pre-computed table of common patterns alongside the codec; files compressed with it substitute references back into the table.

Strengths: These codecs deliver consistent ratio improvements on small, repetitive files with no runtime training overhead, especially on data that resembles the training corpus.

Limits: The dictionary is fixed at build time and brittle outside its training corpus. brotli’s static dictionary, for instance, is frozen into the format specification (RFC 7932, 2016); it was built from a snapshot of web text and markup, and gradually loses relevance as the web evolves.
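The same idea can be tried with zlib's preset-dictionary support (the `zdict` argument), used here as a stand-in for a zstd trained dictionary. This is a sketch under stated assumptions: the dictionary and both messages are hypothetical and written by hand, whereas a real trainer selects the dictionary contents automatically from sample files.

```python
import zlib

# Hypothetical preset dictionary, hand-written for illustration only.
dictionary = (
    b'{"sensor_id": "", "timestamp": "", "status": "nominal", "reading": }'
)


def compress(data, zdict=None):
    """Compress with zlib, optionally priming the window with a dictionary."""
    if zdict:
        c = zlib.compressobj(level=9, zdict=zdict)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()


def decompress(blob, zdict=None):
    """Decompress; the same dictionary must be supplied out of band."""
    if zdict:
        d = zlib.decompressobj(zdict=zdict)
    else:
        d = zlib.decompressobj()
    return d.decompress(blob) + d.flush()


# A small message that resembles the "training corpus".
message = (
    b'{"sensor_id": "A17", "timestamp": "2024-01-01T00:00:00Z", '
    b'"status": "nominal", "reading": 3.14}'
)
plain = compress(message)
with_dict = compress(message, dictionary)

# A message from a different domain: the dictionary has nothing to offer.
other = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
other_plain = compress(other)
other_with_dict = compress(other, dictionary)

print(f"in-corpus:  {len(plain)} bytes plain, {len(with_dict)} with dictionary")
print(f"off-corpus: {len(other_plain)} bytes plain, {len(other_with_dict)} with dictionary")
```

Note that the decompressor needs the identical dictionary, which is why it has to ship with the codec or travel out of band.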

3. Specialized domain codecs

Examples: aec / CCSDS 121 (satellite telemetry), FLAC (lossless audio), ZFP (LLNL float arrays), x265 (video), JPEG-LS, fpack (FITS).

Approach: Hand-tune the entire codec for one data shape (e.g. spacecraft telemetry, audio waveforms, or scientific float arrays), assuming a great deal about the input to keep code size and runtime small.

Strengths: These codecs are unmatched on the workload they target, and small enough to run on tightly constrained embedded hardware. aec encodes at 470 MB/s with just 32 MB of RAM; FLAC is the audio reference.

Limits: They only work on their target shape. Move outside it and they don’t apply at all. You end up with one codec per workload, and operational complexity grows accordingly.
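To make "assuming a great deal about the input" concrete, here is a minimal sketch in the spirit of the fixed linear predictors used by lossless audio codecs. It is not FLAC's or aec's actual bitstream: the signal is synthetic, and zlib is used as a generic back end purely so that both paths can be compared with the same tool.

```python
import math
import random
import struct
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic 16-bit "audio": a smooth sine wave plus a little sensor noise.
samples = [
    int(12000 * math.sin(2 * math.pi * i / 500)) + rng.randint(-2, 2)
    for i in range(20000)
]


def pack(values):
    """Serialize integers as little-endian signed 16-bit samples."""
    return struct.pack(f"<{len(values)}h", *values)


def encode(x):
    """Order-2 fixed predictor: guess x[n] = 2*x[n-1] - x[n-2], keep the error."""
    return x[:2] + [x[n] - 2 * x[n - 1] + x[n - 2] for n in range(2, len(x))]


def decode(e):
    """Invert encode() exactly by re-running the same prediction."""
    x = list(e[:2])
    for n in range(2, len(e)):
        x.append(e[n] + 2 * x[n - 1] - x[n - 2])
    return x


# Generic path: hand the raw samples to a byte-level compressor.
generic = zlib.compress(pack(samples), 9)

# Domain-aware path: assume the waveform is smooth, store only the residuals.
specialized = zlib.compress(pack(encode(samples)), 9)

print(f"raw samples:        {len(pack(samples))} bytes")
print(f"zlib only:          {len(generic)} bytes")
print(f"predictor + zlib:   {len(specialized)} bytes")
```

The gain comes entirely from the smoothness assumption baked into `encode`; on input that is not a smooth waveform the same predictor has nothing to contribute, which is the one-codec-per-workload trade-off.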

4. Neural compressors

Examples: cmix, nncp, paq8 family.

Approach: Train a deep model on the data as it streams, predicting each next symbol from prior context and encoding it in as few bits as the prediction allows.

Strengths: These compressors hit the highest published ratios on text. cmix reaches 7.10× on the Hutter Prize corpus, near the Shannon limit, where the best classical codec lands at ~4.7×.

Limits: The model itself ends up larger than the data being compressed: gigabytes of weights for a gigabyte of compressed output. Compute cost is enormous (GPUs, hours of wall time). That makes these compressors unfit for embedded, satellite, or any transmission-constrained setting.
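The predict-then-encode loop can be sketched without any neural network. In the example below a simple count-based adaptive model stands in for the deep model (an assumption made to keep the sketch small; it is not how cmix or nncp work internally). It reports the ideal code length, the sum of -log2 p over all symbols, which an arithmetic coder can approach closely.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(data, order=1):
    """Sum of -log2 p(symbol | previous `order` bytes) under an adaptive model.

    The model starts knowing nothing (every byte equally likely) and updates
    its counts after each symbol, so a decoder could rebuild it in lockstep.
    """
    counts = defaultdict(lambda: [1] * 256)  # per-context counts, start at 1
    totals = defaultdict(lambda: 256)        # per-context sum of counts
    bits = 0.0
    for i, symbol in enumerate(data):
        ctx = data[max(0, i - order):i]
        p = counts[ctx][symbol] / totals[ctx]
        bits += -math.log2(p)                # cost of coding this symbol
        counts[ctx][symbol] += 1             # learn from what was just seen
        totals[ctx] += 1
    return bits


text = b"the quick brown fox jumps over the lazy dog. " * 200

order0 = ideal_code_length_bits(text, order=0)
order1 = ideal_code_length_bits(text, order=1)

print(f"input:           {len(text)} bytes")
print(f"order-0 model:   {order0 / 8:.0f} bytes (ideal)")
print(f"order-1 model:   {order1 / 8:.0f} bytes (ideal)")
```

Better predictions mean fewer bits per symbol, so the ratio is bounded by the quality of the model; the neural compressors push that quality up at the cost of the model size and compute described above.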

Where Bindu sits

Bindu belongs to a fifth category of compression tool, known as computable compression. It differs from the other four categories as follows:

Symbolic, not byte-level. The unit of compression is a symbol with coordinates and a root + delta decomposition, not a byte reference into a sliding window.
Computed on the spot, not pre-trained. No fixed dictionary, no out-of-band training step required. The symbol vocabulary is derived from the data in front of it.
Optionally growing. A single-shot session is fully self-contained (the file’s header carries everything required to decompress). A long-running pipeline can let the symbol vocabulary grow and amortize across artifacts.
Tunable per workload. The default pipeline includes seven sub-pipelines for different classes of structure. For a tightly scoped deployment, you can strip the unused sub-pipelines. For example, a satellite-tuned compressor carries on the order of tens of bytes of in-software state.
Operates on the compressed form. You can search, edit, and cross-compare Bindu artifacts without decompressing them.

Summary

Property	Classical	Dict-trained	Specialized	Neural	Bindu
Awareness of structure	None	Limited	Hard-coded	Learned	Symbolic
Vocabulary source	None	Training corpus	Codec author	Model weights	Computed on the spot
Vocabulary size	None	≤256 KB	None	GB+	Tunable: bytes → unbounded
Same codec across domains	Yes	Yes	No	Yes	Yes
Read without decompressing	No	No	No	No	Yes
Search compressed form	No	No	No	No	Yes
Edit compressed form	No	No	No	No	Yes
Suitable for embedded	Some	Yes	Yes	No	Yes (tuned)