Use cases

Bindu is general-purpose and domain-agnostic by design. The system works by deriving a set of root formulas and symbols that describe a dataset, then expressing the data itself as deltas from those roots. Each additional pass re-derives formulas over the symbols produced by the prior pass and moves the compressed form closer to the theoretical entropy limit.
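The "roots plus deltas, repeated over several passes" idea can be pictured with a toy analogue. The sketch below is not Bindu's formula derivation, which this page does not specify. It uses plain finite differencing: each pass keeps one root value and replaces the sequence with its deltas, and the next pass works on those deltas. The function names are hypothetical.

```python
from typing import List, Tuple


def delta_pass(values: List[int]) -> Tuple[int, List[int]]:
    """One pass: keep the first value as the root, express the rest as deltas."""
    root = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    return root, deltas


def multi_pass(values: List[int], passes: int) -> Tuple[List[int], List[int]]:
    """Run delta_pass repeatedly, each pass over the previous pass's output."""
    roots: List[int] = []
    current = values
    for _ in range(passes):
        if len(current) < 2:  # nothing left to difference
            break
        root, current = delta_pass(current)
        roots.append(root)
    return roots, current


def invert(roots: List[int], residual: List[int]) -> List[int]:
    """Undo the passes in reverse order with running sums seeded by each root."""
    current = residual
    for root in reversed(roots):
        out = [root]
        for d in current:
            out.append(out[-1] + d)
        current = out
    return current


values = [n * n + 3 for n in range(10)]  # 3, 4, 7, 12, ..., 84
roots, residual = multi_pass(values, passes=2)
print(roots, residual)  # [3, 1] [2, 2, 2, 2, 2, 2, 2, 2]
```

After two passes the quadratic sequence reduces to two roots and a constant residual, and `invert` recovers it exactly. A third pass would turn the residual into zeros.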

A natural fit follows: without any tuning, Bindu is strongest on datasets that express efficiently as deltas. These include sequential telemetry, structured records with templated fields, repetitive code or log scaffolding, and any data that is mostly variations on a stable theme.
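One rough way to see why "expresses efficiently as deltas" matters is to compare the zeroth-order empirical entropy of a series with that of its deltas. This is a generic information-theory check, not a Bindu API. The drifting series below is made up for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def empirical_entropy_bits(symbols: Sequence[int]) -> float:
    """Zeroth-order empirical entropy: a rough lower bound on bits per symbol
    for a coder that models each symbol independently."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A slowly drifting reading: 500 distinct values, but consecutive
# samples differ only by 0 or 1.
readings = [1000 + i // 2 for i in range(1000)]
deltas = [b - a for a, b in zip(readings, readings[1:])]

raw_h = empirical_entropy_bits(readings)
delta_h = empirical_entropy_bits(deltas)
print(f"{raw_h:.2f} {delta_h:.2f}")  # 8.97 1.00
```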

On other kinds of data, Bindu out of the box comes close to or exceeds the compression ratios of every classical codec we've measured, and is often substantially faster. When Bindu is tuned to a particular workload, our expectation is that, given enough data and compute, it will outperform every available compression tool, including domain-specific codecs.

This page surveys the workloads where Bindu adds especially strong value relative to the alternatives. Each section below summarizes the case and links to a deeper page.

Applicable domains

Satellite & telemetry

Telemetry data such as satellite downlinks, sensor streams, IoT feeds, vehicle GPS, and space-weather measurements is sequential, slowly changing, and dominated by transmission cost rather than storage. These are exactly the conditions where Bindu's symbolic delta encoding pays off the most. Real telemetry routinely sees size reductions in the high 90s of percent; on signals where the underlying state barely moves, ratios climb into the thousands or even millions to one. Tuned for a single satellite, the compressor strips down to tens of bytes of in-software state, small enough to deploy on the spacecraft itself.
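Why can a barely moving signal reach such extreme ratios? A minimal sketch: run-length encoding the deltas of a near-constant channel. This stands in for the effect only and is not Bindu's encoder. The channel values and helper names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def rle(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode as (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def delta_rle(samples: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Keep the first sample, then run-length encode the sample-to-sample deltas."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    return samples[0], rle(deltas)


def decode(first: int, runs: List[Tuple[int, int]]) -> List[int]:
    """Rebuild the samples by replaying each run of deltas."""
    out = [first]
    for delta, length in runs:
        for _ in range(length):
            out.append(out[-1] + delta)
    return out


# A housekeeping channel that holds one value, steps once, then holds again.
samples = [512] * 500_000 + [513] * 500_000
first, runs = delta_rle(samples)
print(first, runs)  # 512 [(0, 499999), (1, 1), (0, 499999)]
```

A million samples collapse to one root plus three (delta, length) pairs.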

Learn more →

Log archives

Application logs are sequential, repetitive, and typically queried long after they're written: three properties that line up neatly with Bindu's strengths. The grammar pipeline picks up the recurring scaffolding (timestamps, service names, error codes, request IDs) and ships only the residual values on top. Once an archive is compressed, months or years of log data can be searched and aggregated directly against the compressed wire, which collapses the conventional "decompress everything to find one line" pattern into a coordinate-space lookup.
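Splitting logs into scaffolding and residual values can be illustrated with a common log-parsing heuristic. Every token that looks like a number or hex literal becomes a hole, each distinct template is stored once, and each line keeps only its template id and values. This is an assumption-laden stand-in, not Bindu's grammar pipeline. The regex, `HOLE`, and the function names are hypothetical, and the sketch assumes lines never contain the literal `<*>`.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule for "variable" tokens: decimal integers or hex literals.
# A real grammar pipeline would learn the scaffolding from data instead.
VAR = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")
HOLE = "<*>"


def split_line(line: str) -> Tuple[str, List[str]]:
    """Return (template, values) with each variable token replaced by HOLE."""
    return VAR.sub(HOLE, line), VAR.findall(line)


def encode(lines: List[str]) -> Tuple[List[str], List[Tuple[int, List[str]]]]:
    """Store each distinct template once; each line becomes (template_id, values)."""
    ids: Dict[str, int] = {}
    records = []
    for line in lines:
        template, values = split_line(line)
        records.append((ids.setdefault(template, len(ids)), values))
    return list(ids), records  # dict keys keep first-seen order, matching ids


def decode(templates: List[str], record: Tuple[int, List[str]]) -> str:
    """Refill the template's holes in order."""
    tid, values = record
    parts = templates[tid].split(HOLE)
    out = parts[0]
    for value, tail in zip(values, parts[1:]):
        out += value + tail
    return out


lines = [
    "12:00:01 INFO auth user=1042 login ok",
    "12:00:02 INFO auth user=77 login ok",
    "12:00:02 WARN db query took 350 ms",
]
templates, records = encode(lines)
print(templates)
# ['<*>:<*>:<*> INFO auth user=<*> login ok', '<*>:<*>:<*> WARN db query took <*> ms']
```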

Learn more →

Source code repositories

Source code is structured (parseable grammars), highly repetitive (keywords, common idioms, library calls), and often stored at scale in monorepos, code-search indexes, or code-training corpora for LLMs. Bindu’s grammar pipeline factors the recurring scaffolding into rules and stores only what’s distinctive about each file. The same archive then supports pattern lookup directly against the compressed form, which makes large-scale code-search and code-training workloads dramatically cheaper to run.
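"Factoring recurring scaffolding into rules" is the core move of grammar-based compression. The sketch below uses Re-Pair, a classic grammar-based algorithm, as a plainly named stand-in. Bindu's own rule derivation is not described on this page. Re-Pair repeatedly replaces the most frequent adjacent pair of symbols with a new rule until no pair repeats. The token list is a made-up example.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Symbol = Union[str, int]  # str = original token, int = id of a derived rule


def re_pair(tokens: List[str]) -> Tuple[List[Symbol], Dict[int, Tuple[Symbol, Symbol]]]:
    """Repeatedly replace the most frequent adjacent pair with a new rule id."""
    seq: List[Symbol] = list(tokens)
    rules: Dict[int, Tuple[Symbol, Symbol]] = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:  # no pair repeats, so a new rule would not save space
            break
        rule_id = len(rules)
        rules[rule_id] = pair
        out: List[Symbol] = []
        i = 0
        while i < len(seq):  # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(rule_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(sym: Symbol, rules: Dict[int, Tuple[Symbol, Symbol]]) -> List[str]:
    """Recursively expand a symbol back into original tokens."""
    if isinstance(sym, int):
        left, right = rules[sym]
        return expand(left, rules) + expand(right, rules)
    return [sym]


tokens = "foo ( a ) ; foo ( b ) ; foo ( c )".split()
seq, rules = re_pair(tokens)
print(len(tokens), len(seq), rules[0])  # 14 7 ('foo', '(')
```

The repeated call scaffolding ends up in `rules`, while `seq` keeps only what differs between calls (`a`, `b`, `c`) plus rule references.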

Learn more →

Scientific datasets

Scientific data tends to come in three shapes: tabular (CSV, Parquet), tensor (HDF5, NetCDF, Zarr), and time series. These shapes map onto Bindu's pipelines: DICT for columnar categorical data, LINDELTA for stride-encoded numeric arrays, and BWT for textual fields and identifiers. The combination shrinks both the storage footprint and the downstream analysis time, since queries and reductions can run against the compressed data without materializing intermediate copies.
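As a rough picture of what "DICT for categorical data" and "LINDELTA for stride-encoded numeric arrays" might do, here are two simplified stand-ins. They are hypothetical sketches under stated assumptions, not the actual pipelines. The dictionary step stores each distinct value once. The stride step models a column as `start + i * stride` (stride naively taken from the first gap) and keeps only the residuals.

```python
from typing import Dict, List, Tuple


def dict_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """DICT-style sketch: store each distinct value once, rows become small ints."""
    index: Dict[str, int] = {}
    codes = [index.setdefault(v, len(index)) for v in column]
    return list(index), codes  # dict keys keep first-seen order, matching codes


def stride_encode(values: List[int]) -> Tuple[int, int, List[int]]:
    """LINDELTA-style sketch: model start + i * stride, keep residuals."""
    start = values[0]
    stride = values[1] - values[0] if len(values) > 1 else 0  # naive: first gap
    residuals = [v - (start + i * stride) for i, v in enumerate(values)]
    return start, stride, residuals


def stride_decode(start: int, stride: int, residuals: List[int]) -> List[int]:
    """Invert stride_encode exactly."""
    return [start + i * stride + r for i, r in enumerate(residuals)]


stations = ["KSEA", "KSEA", "KPDX", "KSEA", "KPDX"]
timestamps = [1000, 1060, 1120, 1181, 1240]  # one reading roughly every 60 s

print(dict_encode(stations))      # (['KSEA', 'KPDX'], [0, 0, 1, 0, 1])
print(stride_encode(timestamps))  # (1000, 60, [0, 0, 0, 1, 0])
```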

Learn more →

LLM training corpora

Training corpora consist of body text plus a layer of structured metadata around each document, such as URLs, licenses, language codes, quality scores, and source identifiers. The grammar pipeline compresses the metadata layer heavily; the body itself compresses at roughly classical-codec parity. The more interesting effect is downstream: the same compressed corpus can be searched and filtered during training without a decompression round-trip, so curating subsets becomes a coordinate-space operation rather than a full re-pass over raw bytes.
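Filtering without a decompression round-trip can be illustrated in a simplified form. Assume the metadata columns are dictionary-encoded. A filter then needs one dictionary lookup per predicate and an integer scan, and no document body is ever decoded. The columns, values, and `select` helper are hypothetical; this is not Bindu's query interface.

```python
from typing import List

# Hypothetical dictionary-encoded metadata for four documents (rows 0..3).
LANG_DICT = ["en", "de", "ja"]
LANG_CODES = [0, 2, 0, 1]
LICENSE_DICT = ["cc-by", "mit", "unknown"]
LICENSE_CODES = [0, 0, 1, 0]


def select(dictionary: List[str], codes: List[int], wanted: str) -> List[int]:
    """Rows whose value equals `wanted`: one dictionary lookup, then an int scan."""
    if wanted not in dictionary:
        return []
    target = dictionary.index(wanted)
    return [row for row, code in enumerate(codes) if code == target]


# "English documents under cc-by", computed from metadata codes alone.
subset = sorted(set(select(LANG_DICT, LANG_CODES, "en"))
                & set(select(LICENSE_DICT, LICENSE_CODES, "cc-by")))
print(subset)  # [0]
```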

Learn more →

Document archives

Support tickets, invoices, contracts, and emails are all examples of long-retention archives where each document carries structured fields plus a free-text body. The structured fields compress dramatically through the grammar pipeline; bodies compress like ordinary prose. Where Bindu pays off most is the read pattern: archives queried sporadically over years (audit trails, regulatory holds, customer-history lookups) can be searched directly against the compressed form, so the conventional "rehydrate the whole archive to find one record" workflow disappears entirely.
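One well-known way to search text without rehydrating it is an FM-index over its Burrows-Wheeler transform. BWT is one of the pipelines this page names, but whether Bindu searches this way is not stated here, so treat the sketch as a generic illustration. Backward search counts pattern occurrences using only the BWT string and rank tables. The tables are stored in full here for clarity; practical FM-indexes sample them. Function names and the sample text are hypothetical.

```python
from typing import Dict, List, Tuple

SENTINEL = "\0"  # sorts before every printable character


def bwt(text: str) -> str:
    """Naive BWT via sorted rotations; fine for a demo, far too slow at scale."""
    assert SENTINEL not in text
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def build_fm(last: str) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """C[c] = count of symbols smaller than c; occ[c][i] = count of c in last[:i]."""
    alphabet = sorted(set(last))
    c_table: Dict[str, int] = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += last.count(ch)
    occ = {ch: [0] * (len(last) + 1) for ch in alphabet}
    for i, sym in enumerate(last):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (sym == ch)
    return c_table, occ


def count(pattern: str, last: str, c_table: Dict[str, int],
          occ: Dict[str, List[int]]) -> int:
    """Backward search: narrow the row range one pattern character at a time."""
    lo, hi = 0, len(last)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


text = "invoice 17 paid; invoice 18 paid; refund 17"
last = bwt(text)
c_table, occ = build_fm(last)
for p in ["invoice", "paid", "17", "refund", "overdue"]:
    print(p, count(p, last, c_table, occ))  # 2, 2, 2, 1, 0
```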

Learn more →

When not to use Bindu

There are some circumstances where Bindu is unlikely to be a strong candidate for achieving improvements in storage, data transfer or compute.

Already-compressed media — video (H.264/265), audio (MP3/AAC/Opus), images (JPEG/WebP), or archive formats (.zip, .tar.gz). Nothing meaningful is left to compress, and Bindu’s per-file overhead becomes a small tax rather than a saving.
Random or encrypted data — by definition there is no exploitable structure, so no compressor will help.
Workloads needing universal decoder support — HTTP responses to third parties or files headed to unknown clients. Use gzip, brotli, or zstd, all of which are decodable everywhere.
Sub-millisecond decode latency — CDN hot paths and real-time streams need raw decoded bytes as fast as possible. zstd decodes roughly 200× faster than Bindu today, so this is the wrong fit.
Memory-constrained embedded targets — Bindu’s encoder can peak at ~20 GB RSS on the largest test corpora, so CubeSat-scale deployments need a tuned-down build with most pipelines stripped.
Compress-once, read-once, sequential reads — Bindu’s structural advantages (search, random access, in-place edit) don’t pay off if you’ll never use them. A byte compressor is simpler and sufficient.
Length-changing in-place edits — the fast edit paths require len(old) == len(new). Variable-length edits fall back to a full decompress / recompress, which has no advantage over the conventional pipeline.
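The "random or encrypted data" point holds for every lossless compressor, not just Bindu: no scheme can shrink inputs that have no structure to exploit. The check below uses zlib from Python's standard library purely as a stand-in codec. The structured input is a made-up repeated status line.

```python
import os
import zlib

structured = b"sensor=ok status=nominal\n" * 4_000  # 100,000 bytes of repetition
random_data = os.urandom(100_000)                   # no exploitable structure

for label, data in [("structured", structured), ("random", random_data)]:
    packed = zlib.compress(data, 9)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")

# The random input comes out slightly larger than it went in: with nothing
# to exploit, block headers and the zlib wrapper are pure overhead.
```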

The full decision checklist (including the soft cases, such as general-purpose mixed corpora and x86 binaries) and a TL;DR decision tree are both on the When not to use Bindu page.

Tuning to a new domain

Bindu can be tuned to a specific workload to achieve higher levels of performance than the out-of-the-box defaults. See Tuning Bindu for the available levers and how to apply them.