Use cases
Bindu handles a wide range of data well. It has strong out-of-the-box performance with repetitive or sequential workloads such as telemetry, logs, source code, or structured records. Specialized formats such as scientific arrays or domain-specific binary layouts typically reach their best ratios after some tuning to the corpus.
Let’s take a look at where Bindu can be put to immediate use today.
Applicable domains
Satellite & telemetry
Telemetry data such as satellite downlinks, sensor streams, IoT feeds, vehicle GPS, and space-weather measurements is sequential, slowly changing, and dominated by transmission cost rather than storage cost. These are exactly the conditions where Bindu’s symbolic delta encoding pays off the most. Real telemetry routinely shrinks by a percentage in the high 90s; on signals where the underlying state barely moves, ratios climb into the thousands or even millions to one. Tuned for a single satellite, the compressor strips down to tens of bytes of in-software state, small enough to deploy on the spacecraft itself.
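To see why slowly changing signals compress so well, consider plain delta encoding. The sketch below is a generic Python illustration, not Bindu’s symbolic delta encoder; the function names and the sample readings are assumptions chosen for the example.

```python
from itertools import accumulate


def delta_encode(samples):
    """Return the first sample followed by successive differences."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


# Hypothetical sensor readings that barely move between samples.
readings = [20150, 20150, 20151, 20151, 20151, 20152, 20152, 20152]
deltas = delta_encode(readings)

assert delta_decode(deltas) == readings
print(deltas)  # prints [20150, 0, 1, 0, 0, 1, 0, 0]
```

After the first value, the stream consists almost entirely of zeros and ones, which any downstream entropy coder can represent in very few bits. The flatter the signal, the longer the runs of zeros.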
Log archives
Application logs are sequential, repetitive, and typically queried long after they’re written: three properties that line up neatly with Bindu’s strengths. The grammar pipeline picks up the recurring scaffolding (timestamps, service names, error codes, request IDs) and ships only the residual values on top. Once an archive is compressed, months or years of log data can be searched and aggregated directly against the compressed wire, which collapses the conventional “decompress everything to find one line” pattern into a coordinate-space lookup.
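The idea of separating recurring scaffolding from residual values can be illustrated with a toy template extractor. This is a simplified sketch and not Bindu’s grammar pipeline: the regular expression, the `<*>` placeholder, the function names, and the sample log lines are all assumptions made for the example.

```python
import re

# Tokens treated as variable: ISO-like timestamps, 8-character hex ids, integers.
VARIABLE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z|\b[0-9a-f]{8}\b|\b\d+\b"
)


def split_line(line):
    """Split a log line into (template, residual values)."""
    values = VARIABLE.findall(line)
    template = VARIABLE.sub("<*>", line)
    return template, values


def encode(lines):
    """Store each distinct template once; a line becomes (template id, values)."""
    templates, records = {}, []
    for line in lines:
        template, values = split_line(line)
        tid = templates.setdefault(template, len(templates))
        records.append((tid, values))
    return list(templates), records


def decode(templates, records):
    """Rebuild the original lines by refilling each template's placeholders."""
    out = []
    for tid, values in records:
        it = iter(values)
        out.append(re.sub(r"<\*>", lambda _m: next(it), templates[tid]))
    return out


# Hypothetical log lines sharing one scaffold.
lines = [
    "2024-05-01T12:00:00Z auth request_id=3fa9c1d2 status=200",
    "2024-05-01T12:00:01Z auth request_id=7b20e4aa status=200",
    "2024-05-01T12:00:03Z auth request_id=c41d88f0 status=503",
]
templates, records = encode(lines)
assert len(templates) == 1
assert decode(templates, records) == lines
```

All three lines collapse to a single stored template, and each record carries only its timestamp, request ID, and status. The sketch assumes the raw lines never contain the literal text `<*>`.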
Source code repositories
Source code is structured (parseable grammars), highly repetitive (keywords, common idioms, library calls), and often stored at scale in monorepos, code-search indexes, or code-training corpora for LLMs. Bindu’s grammar pipeline factors the recurring scaffolding into rules and stores only what’s distinctive about each file. The same archive then supports pattern lookup directly against the compressed form, which makes large-scale code-search and code-training workloads dramatically cheaper to run.
Scientific datasets
Scientific data tends to come in three shapes: tabular (CSV, Parquet), tensor (HDF5, NetCDF, Zarr), and time series. Each shape maps cleanly onto one of Bindu’s pipelines: DICT for columnar categorical data, LINDELTA for stride-encoded numeric arrays, and BWT for textual fields and identifiers. The combination reduces both the storage footprint and the downstream analysis time, since queries and reductions can run against the compressed data without materializing intermediate copies.
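As a rough intuition for two of these pipelines, the sketch below shows textbook dictionary encoding for a categorical column and a linear-delta transform for an evenly strided numeric axis. The pipeline names DICT and LINDELTA come from the text above; the implementations, function names, and sample data are assumptions about what such transforms typically do and are not taken from Bindu.

```python
def dict_encode(column):
    """Map each distinct value to a small integer code."""
    codes, index = {}, []
    for value in column:
        index.append(codes.setdefault(value, len(codes)))
    return list(codes), index


def dict_decode(dictionary, index):
    """Look each code back up in the dictionary."""
    return [dictionary[i] for i in index]


def linear_delta_encode(values):
    """Store start, stride, and residuals against start + i * stride."""
    if not values:
        return 0, 0, []
    start = values[0]
    stride = values[1] - values[0] if len(values) > 1 else 0
    residuals = [v - (start + i * stride) for i, v in enumerate(values)]
    return start, stride, residuals


def linear_delta_decode(start, stride, residuals):
    """Add each residual back onto the predicted line."""
    return [start + i * stride + r for i, r in enumerate(residuals)]


# Hypothetical categorical column and a nearly regular time axis.
species = ["oak", "pine", "oak", "oak", "birch", "pine"]
time_axis = [1000, 1010, 1020, 1030, 1041, 1050]

dictionary, index = dict_encode(species)
start, stride, residuals = linear_delta_encode(time_axis)

assert dict_decode(dictionary, index) == species
assert linear_delta_decode(start, stride, residuals) == time_axis
print(index)      # prints [0, 1, 0, 0, 2, 1]
print(residuals)  # prints [0, 0, 0, 0, 1, 0]
```

In both cases the transformed data is dominated by small, repeated integers, which is what makes the later coding stages effective.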
LLM training corpora
Training corpora consist of body text plus a layer of structured metadata around each document, such as URLs, licenses, language codes, quality scores, and source identifiers. The grammar pipeline compresses the metadata layer heavily; the body itself compresses at roughly classical-codec parity. The more interesting effect is downstream: the same compressed corpus can be searched and filtered during training without a decompression round-trip, so curating subsets becomes a coordinate-space operation rather than a full re-pass over raw bytes.
Document archives
Support tickets, invoices, contracts, and emails are all examples of long-retention archives in which each document carries structured fields plus a free-text body. The structured fields compress dramatically through the grammar pipeline; bodies compress like ordinary prose. Where Bindu pays off most is the read pattern: archives queried sporadically over years (audit trails, regulatory holds, customer-history lookups) can be searched directly against the compressed form, so the conventional “rehydrate the whole archive to find one record” workflow disappears entirely.
What computable compression unlocks
Across the domains above, Bindu’s ability to operate on compressed data makes several operations economical that are too expensive at conventional compute cost:
- Searching long-retention archives. Audit trails, regulatory holds, customer histories. Instead of rehydrating an entire archive to find one record, you query the compressed form directly.
- Editing data in place. Redactions, schema corrections, GDPR deletions. Modify the compressed artifact without a full unpack/repack cycle.
- Cheaper LLM training and retrieval. Training corpora can be filtered and curated in compressed form. Retrieval pipelines can search compressed indexes directly.
- Smaller embedded deployments. Tuned versions of Bindu strip down to tens of bytes of in-software state — small enough to run on a CubeSat.
Ideal use cases for Bindu
Bindu adds the most value when one or more of these is true:
- You store or transmit data at scale (terabytes and up).
- You query, search, or edit archived data and want those operations to cost less.
- You have structured or specialized data that conventional codecs don’t handle well.
- You retain data for long periods and want it to keep getting cheaper to use over time.
When not to use Bindu
In some circumstances Bindu is unlikely to deliver meaningful improvements in storage, data transfer, or compute:
- Already-compressed media: video (H.264/265), audio (MP3/AAC/Opus), images (JPEG/WebP), or archive formats (.zip, .tar.gz). Nothing meaningful is left to compress, and Bindu’s per-file overhead becomes a small tax rather than a saving.
- Random or encrypted data: by definition there is no exploitable structure, so no compressor will help.
- Workloads needing universal decoder support: HTTP responses to third parties or files headed to unknown clients. Use gzip, brotli, or zstd, all of which are decodable everywhere.
- Sub-millisecond decode latency: CDN hot paths and real-time streams need raw decoded bytes as fast as possible. zstd decodes roughly 200× faster than Bindu today, so this is the wrong fit.
- Memory-constrained embedded targets: Bindu’s encoder can peak at ~20 GB RSS on the largest test corpora, so CubeSat-scale deployments need a tuned-down build with most pipelines stripped.
- Compress-once, read-once, sequential reads: Bindu’s structural advantages (search, random access, in-place edit) don’t pay off if you’ll never use them. A byte compressor is simpler and sufficient.
- Length-changing in-place edits: the fast edit paths require len(old) == len(new). Variable-length edits fall back to a full decompress / recompress, which has no advantage over the conventional pipeline.
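For redactions, the equal-length constraint can often be met by padding the replacement to the original length. The helper below is a hypothetical sketch of that idea on plain Python bytes. It does not call any Bindu API, and the name `same_length_replacement` and the fill byte are assumptions made for the example.

```python
def same_length_replacement(old: bytes, new: bytes, fill: bytes = b"*") -> bytes:
    """Pad `new` with `fill` so it is exactly as long as `old`.

    Raises ValueError when `new` is longer than `old`, because that edit
    cannot be made length-preserving by padding.
    """
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    if len(new) > len(old):
        raise ValueError("replacement is longer than the original")
    return new + fill * (len(old) - len(new))


# Hypothetical redaction of an email address.
old = b"alice@example.com"
new = same_length_replacement(old, b"[redacted]")
assert len(new) == len(old)
```

A replacement that is longer than the original raises an error here, which corresponds to the case where the edit would have to take the full decompress / recompress path.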
The full decision checklist (including the soft cases — general-purpose mixed corpora, x86 binaries, and so on) and a TL;DR decision tree both live at When not to use Bindu.
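A quick way to check whether data falls into the "nothing left to compress" category is to measure how a conventional codec fares on a sample. The sketch below uses zlib from the Python standard library purely as a stand-in; it says nothing about Bindu’s own ratios, and the sample data is made up for the example.

```python
import os
import zlib


def zlib_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


# Hypothetical samples: a highly repetitive log and the same amount of random bytes.
repetitive = b"2024-05-01 INFO auth ok user=42\n" * 2000
random_bytes = os.urandom(len(repetitive))

print(f"repetitive: {zlib_ratio(repetitive):.1f}x")
print(f"random:     {zlib_ratio(random_bytes):.2f}x")
```

The repetitive sample shrinks by a large factor, while the random sample comes out at roughly its original size or slightly larger. Encrypted and already-compressed inputs behave like the random sample.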
Tuning to a new domain
Bindu can be tuned to a specific workload to achieve higher levels of performance than the out-of-the-box defaults. See Tuning Bindu for the available levers and how to apply them.