Bindu vs. other compressors
Modern lossless compression splits into four well-defined categories. Bindu occupies a fifth category: computable compression. Let’s look at how all the categories compare.
For per-codec head-to-heads with measured numbers, see the benchmarks.
1. Classical byte-level compressors
Examples: gzip (DEFLATE), bzip2 (BWT), zstd (LZ77+FSE), xz (LZMA2), brotli.
Approach: Find repeated byte sequences, typically within a sliding window (bzip2 sorts a block instead), and substitute them with shorter codes.
Strengths: These tools are ubiquitous, format-agnostic, fast, and battle-hardened over decades. zstd is the industry standard for speed and xz for compression ratio.
Limits: No awareness of structure. They re-discover the same redundancy on every block, which is why a satellite repeatedly re-compresses the same dark-sky frame instead of storing only what changed.
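The per-block limit can be seen with Python's standard `zlib` module (DEFLATE, the algorithm behind gzip). This is a minimal sketch, assuming a hypothetical 16 KB "frame" of pseudo-random bytes that is sent twice. Compressed as two independent blocks, the second copy costs as much as the first; compressed as one stream, the second copy falls inside DEFLATE's 32 KB window and shrinks to back-references.

```python
import random
import zlib

# Hypothetical 16 KB frame of pseudo-random bytes (stands in for one sensor frame).
rng = random.Random(0)
frame = bytes(rng.randrange(256) for _ in range(16 * 1024))

# Two independent blocks: the compressor starts from scratch each time,
# so the repeated frame is paid for twice.
independent = len(zlib.compress(frame)) + len(zlib.compress(frame))

# One stream: the second copy lies within the 32 KB window of the first,
# so it can be encoded as references back into the first copy.
joint = len(zlib.compress(frame + frame))

print(f"independent blocks: {independent} bytes")
print(f"single stream:      {joint} bytes")
```

The redundancy between the two frames is only visible when both sit in the same window; nothing carries over from one independent block to the next.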
2. Dictionary-trained byte codecs
Examples: zstd with a trained dictionary, brotli’s static dictionary.
Approach: Ship a small (typically ≤256 KB) pre-computed table of common patterns alongside the codec; files compressed with it substitute references back into the table.
Strengths: These codecs deliver flat ratio improvements on repetitive small files with no runtime training overhead, especially on data that resembles the training corpus.
Limits: The dictionary is fixed at build time and brittle outside its training corpus. brotli’s dictionary, for instance, was built from a snapshot of web content taken when the format was designed in the 2010s, and it gradually loses relevance as the web evolves.
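The shared-table idea can be tried with the preset-dictionary support in Python's standard `zlib` module. Note the substitution: this uses a DEFLATE preset dictionary rather than a zstd trained dictionary or brotli's static one, and the dictionary and message below are hypothetical, written by hand for illustration.

```python
import zlib
from typing import Optional

# Hypothetical dictionary: byte patterns expected to recur in every message.
dictionary = b'{"sensor_id": "", "timestamp": "", "status": "nominal", "value": }'

# Hypothetical small message that resembles the dictionary's "corpus".
message = (b'{"sensor_id": "A7", "timestamp": "2024-01-01T00:00:00Z", '
           b'"status": "nominal", "value": 42}')

def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress with DEFLATE, optionally priming the window with a dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()

def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()

plain = compress(message)
primed = compress(message, dictionary)
print(f"without dictionary: {len(plain)} bytes")
print(f"with dictionary:    {len(primed)} bytes")
```

The gain depends entirely on how closely the message resembles the dictionary, and the decoder has to hold the identical dictionary, which is the brittleness described above.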
3. Specialized domain codecs
Examples: aec / CCSDS 121 (satellite telemetry), FLAC (lossless audio), ZFP (LLNL float arrays), x265 (video), JPEG-LS, fpack (FITS).
Approach: Hand-tune the entire codec for one data shape (e.g. spacecraft telemetry, audio waveforms, or scientific float arrays), assuming a great deal about the input to keep code size and runtime small.
Strengths: These codecs are unmatched on the workload they target, and small enough to run on tightly-constrained embedded hardware. aec encodes at 470 MB/s with just 32 MB of RAM; FLAC is the audio reference.
Limits: They only work on their target shape. Move outside it and they don’t apply at all. You end up with one codec per workload, and operational complexity grows accordingly.
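To make "assuming a great deal about the input" concrete, here is a toy predict-then-code sketch in the spirit of Rice-coded telemetry. It is not the CCSDS 121 or aec algorithm: the previous-sample predictor, the fixed parameter `k`, and both sample sets are assumptions chosen for illustration. It only counts the bits a Golomb-Rice code would need.

```python
import random

def zigzag(n: int) -> int:
    """Map signed residuals to non-negative ints: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * n if n >= 0 else -2 * n - 1

def rice_code_bits(value: int, k: int) -> int:
    """Length of a Golomb-Rice code: unary quotient, one stop bit, k remainder bits."""
    return (value >> k) + 1 + k

def encoded_bits(samples, k=1, raw_bits=16):
    """First sample is stored raw; each later sample is coded as a delta
    from its predecessor (the built-in assumption: neighbours are close)."""
    total = raw_bits
    for prev, cur in zip(samples, samples[1:]):
        total += rice_code_bits(zigzag(cur - prev), k)
    return total

# Hypothetical smooth signal: a slow ramp, the shape the codec is tuned for.
smooth = [1000 + i // 4 for i in range(256)]

# Hypothetical off-target input: uniformly random 16-bit samples.
rng = random.Random(0)
noisy = [rng.randrange(1 << 16) for _ in range(256)]

raw = 16 * 256
print(f"raw 16-bit samples: {raw} bits")
print(f"smooth, coded:      {encoded_bits(smooth)} bits")
print(f"noisy, coded:       {encoded_bits(noisy)} bits")
```

On the smooth ramp the coded size is a small fraction of the raw size; on random samples the same code is far larger than storing the samples raw, which is the "only works on its target shape" limit in miniature.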
4. Neural compressors
Examples: cmix, nncp, paq8 family.
Approach: Train a deep model on the data as it streams, predicting each next symbol from prior context and encoding it in as few bits as the prediction allows.
Strengths: These compressors hit the highest published ratios on text. cmix reaches 7.10× on the Hutter Prize corpus, near the Shannon limit, where the best classical codec lands at ~4.7×.
Limits: The model itself ends up larger than the data being compressed: gigabytes of weights for a gigabyte of compressed output. Compute cost is enormous (GPUs, hours of wall time), which makes these compressors unfit for embedded, satellite, or any transmission-constrained setting.
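The link between prediction and bits can be sketched without a neural network. In the example below a simple adaptive byte-count model stands in for the deep model (an assumption for illustration, not how cmix or nncp work internally). It sums the ideal code length, -log2(p), that an arithmetic coder would approach if each symbol were coded with the model's predicted probability p.

```python
import math
from collections import defaultdict

def ideal_code_length_bits(data: bytes, order: int = 1) -> float:
    """Total -log2(p) over the data, where p is predicted from the previous
    `order` bytes by adaptive counts with add-one smoothing over 256 values."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        # Predict before updating, exactly as a decoder would have to.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + 256)
        bits += -math.log2(p)
        # Learn from the symbol just seen ("training as the data streams").
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits

text = b"abracadabra " * 1000  # hypothetical, highly repetitive input
raw_bits = 8 * len(text)
order0 = ideal_code_length_bits(text, order=0)
order1 = ideal_code_length_bits(text, order=1)
print(f"raw:     {raw_bits} bits")
print(f"order-0: {order0:.0f} bits")
print(f"order-1: {order1:.0f} bits")
```

Giving the model one byte of context sharpens its predictions and lowers the bit count. Neural compressors push the same idea much further, and the price is the model size and compute described above.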
Where Bindu sits
Bindu is a fifth category of compression tool known as computable compression. It is distinct from all four categories as follows:
- Symbolic, not byte-level. The unit of compression is a symbol with coordinates and a `root + delta` decomposition, not a byte reference into a sliding window.
- Computed on the spot, not pre-trained. No fixed dictionary, no out-of-band training step required. The symbol vocabulary is derived from the data in front of it.
- Optionally growing. A single-shot session is fully self-contained (the file’s header carries everything required to decompress). A long-running pipeline can let the symbol vocabulary grow and amortize across artifacts.
- Tunable per workload. The default pipeline includes seven sub-pipelines for different classes of structure. For a tightly scoped deployment, you can strip the unused pipelines. For example, a satellite-tuned compressor is on the order of tens of bytes of in-software state.
- Operates on the compressed form. You can search, edit, and cross-compare Bindu artifacts without decompressing them.
Summary
| Property | Classical | Dict-trained | Specialized | Neural | Bindu |
|---|---|---|---|---|---|
| Awareness of structure | None | Limited | Hard-coded | Learned | Symbolic |
| Vocabulary source | None | Training corpus | Codec author | Model weights | Computed on the spot |
| Vocabulary size | None | ≤256 KB | None | GB+ | Tunable: bytes → unbounded |
| Same codec across domains | Yes | Yes | No | Yes | Yes |
| Read without decompressing | No | No | No | No | Yes |
| Search compressed form | No | No | No | No | Yes |
| Edit compressed form | No | No | No | No | Yes |
| Suitable for embedded | Some | Yes | Yes | No | Yes (tuned) |