Edit and search in depth
This page continues from Operating on compressed data. The walkthrough showed the core behavior with alice.txt.bindu; this page covers the command details and the cases where Bindu can operate directly on the compressed form.
```sh
bindu replace file.bindu "old" "new"
```

`bindu replace` rewrites every occurrence of `old` with `new`. The Alice-to-Mabel walkthrough hits the fast path because the replacement has the same byte length and the rule body for `Alice` can be patched in place. In general, `bindu replace` picks one of three tiers in priority order:
- Grammar rule patch — equal-length, in-place via memory-mapped write. This is the fast path shown by the Alice walkthrough and measured by the ADS-B benchmark on the previous page.
- Dictionary entry rewrite — equal-length pattern that matches a DICT alphabet entry; rewrites the entry inline.
- Decompress + scan + recompress — the general case for non-equal-length edits or unsupported wire models. Correctness-preserving; same cost as the conventional pipeline.
Length-preserving edits on a supported wire model are where editing in compressed space pays off. Any edit that changes the byte length falls through to tier 3, which stays correct but costs the same as decompressing, editing, and recompressing.
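The priority order above can be sketched as a small dispatcher plus an equal-length patch through Python's standard `mmap` module. This is a hedged illustration, not Bindu's implementation. `choose_tier`, `patch_in_place`, the `rule_bodies` and `dict_alphabet` inputs, and the premise that a rule body sits verbatim at a known byte offset are all assumptions made for the example.

```python
import mmap
import os
import tempfile


def choose_tier(wire_model: str, old: bytes, new: bytes,
                rule_bodies: set[bytes], dict_alphabet: set[bytes]) -> int:
    """Return the replace tier (1, 2, or 3), checked in priority order.

    `rule_bodies` and `dict_alphabet` are hypothetical inputs standing in for
    whatever the real file header exposes.
    """
    same_length = len(old) == len(new)
    if same_length and wire_model == "Grammar" and old in rule_bodies:
        return 1  # grammar rule patch, in place
    if same_length and wire_model == "DICT" and old in dict_alphabet:
        return 2  # dictionary entry rewrite, inline
    return 3      # decompress + scan + recompress


def patch_in_place(path: str, offset: int, old: bytes, new: bytes) -> None:
    """Overwrite `old` at `offset` with `new` through a writable memory map.

    Assumes (hypothetically) that the rule body is stored verbatim at `offset`.
    """
    if len(old) != len(new):
        raise ValueError("an in-place patch needs equal byte lengths")
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        if mm[offset:offset + len(old)] != old:
            raise ValueError("bytes at offset do not match the expected rule body")
        mm[offset:offset + len(new)] = new  # equal length, so nothing shifts
        mm.flush()


# Usage: patch "Alice" -> "Mabel" in a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"rule:Alice;")
os.close(fd)
patch_in_place(demo_path, 5, b"Alice", b"Mabel")
with open(demo_path, "rb") as fh:
    patched = fh.read()
os.remove(demo_path)
print(patched)  # b'rule:Mabel;'
```

Because the new bytes occupy exactly the old span, nothing after the patch has to move, which is why only equal-length edits can skip the rebuild.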
Search
```sh
bindu search "pattern" file.bindu
bindu count "pattern" file.bindu
bindu find "pattern" file.bindu -n 10
```

When you run `bindu search` (or its `count` / `find` siblings), Bindu inspects the compressed file’s header to see which pipeline produced it, then picks a search strategy tailored to that pipeline. For most pipelines the search runs directly against the compressed bytes; for the residual-encoded ones it has to decompress first.
| Wire model | Body content | Search strategy | Compressed search? |
|---|---|---|---|
| RAW | body is the raw bytes | Boyer-Moore-Horspool over body | Yes |
| CONSTANT | single repeated byte | O(1) in body size: is the pattern made only of that byte, and no longer than the body? | Yes |
| DICT | alphabet + index stream | alphabet-mapped index-stream scan | Yes |
| BWT + index | Burrows-Wheeler L-column + sidecar | FM-index backward search (Ferragina-Manzini) | Yes (with FM-index sidecar) |
| Grammar | rule table inside legacy wire | rule-table memcmp | Yes |
| RLE | run-length residuals against prior bytes | decompress + scan | No |
| LINDELTA | stride-period residuals against fixed offset | decompress + scan | No |
| PAETH2D | 2D Paeth residuals against neighbor pixels | decompress + scan | No |
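The three simplest compressed-search rows in the table (RAW, CONSTANT, DICT) can be sketched in a few lines of Python. This is a hedged illustration, not Bindu's code: the function names, describing a CONSTANT body by a byte value plus a length, counting overlapping matches, and the DICT layout (an alphabet of distinct bytes plus one index byte per original byte) are all assumptions.

```python
def bmh_find_all(text: bytes, pattern: bytes) -> list[int]:
    """Boyer-Moore-Horspool: every start offset of `pattern` in `text` (RAW body)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: how far the window may slide when byte b sits under
    # the pattern's last position. Bytes absent from pattern[:-1] allow a full m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift[text[pos + m - 1]]
    return hits


def constant_count(byte_value: int, body_len: int, pattern: bytes) -> int:
    """CONSTANT body (`body_len` copies of one byte): answer without scanning the body."""
    m = len(pattern)
    if m == 0 or m > body_len or any(b != byte_value for b in pattern):
        return 0
    return body_len - m + 1  # every alignment matches; overlaps are counted


def dict_find_all(alphabet: bytes, indices: bytes, pattern: bytes) -> list[int]:
    """DICT body: map the pattern into index space once, then scan the index stream."""
    lookup = {sym: idx for idx, sym in enumerate(alphabet)}
    if any(b not in lookup for b in pattern):
        return []  # a byte outside the alphabet cannot occur anywhere in the body
    mapped = bytes(lookup[b] for b in pattern)
    return bmh_find_all(indices, mapped)


print(bmh_find_all(b"abracadabra", b"abra"))                   # [0, 7]
print(constant_count(ord("A"), 10, b"AAA"))                    # 8
print(dict_find_all(b"ab", bytes([0, 1, 1, 0, 0, 1]), b"ab"))  # [0, 4]
```

The DICT path never rebuilds the original bytes: it translates the pattern once and reuses the RAW scanner on the index stream.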
Five model classes admit zero-decompression search today (RAW, CONSTANT, DICT, BWT-with-index, grammar-rule probe). Three (RLE, LINDELTA, PAETH2D) fall back to decompression-first search because their wire formats encode residuals against a neighbor context that has to be reconstructed before any candidate match can be verified. For measured search results across larger corpora, see Operating benchmarks.
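Why residual models need reconstruction first can be seen with a generic stride-delta coder. This is not Bindu's LINDELTA wire format; `delta_encode`, `delta_decode`, `residual_search`, and the stride of 4 are assumptions chosen for illustration. Each residual depends on the byte one stride earlier, which can lie outside the match, so the same pattern produces different residual bytes at different positions.

```python
def delta_encode(data: bytes, stride: int) -> bytes:
    """Residual r[i] = data[i] - data[i - stride] (mod 256); first `stride` bytes kept verbatim."""
    out = bytearray(data[:stride])
    for i in range(stride, len(data)):
        out.append((data[i] - data[i - stride]) % 256)
    return bytes(out)


def delta_decode(residuals: bytes, stride: int) -> bytes:
    """Invert delta_encode: each byte needs the already-rebuilt byte `stride` back."""
    out = bytearray(residuals[:stride])
    for i in range(stride, len(residuals)):
        out.append((residuals[i] + out[i - stride]) % 256)
    return bytes(out)


def residual_search(residuals: bytes, stride: int, pattern: bytes) -> list[int]:
    """Decompress-first search: rebuild the bytes, then scan them."""
    if not pattern:
        return []
    data = delta_decode(residuals, stride)
    hits, start = [], data.find(pattern)
    while start != -1:
        hits.append(start)
        start = data.find(pattern, start + 1)
    return hits


data = b"0123cat4wxyzcat5"               # "cat" at offsets 4 and 12
enc = delta_encode(data, stride=4)
print(enc[4:7] == enc[12:15])            # False: same bytes, different residuals
print(residual_search(enc, 4, b"cat"))   # [4, 12]
```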
Count vs locate
`bindu count` returns just the occurrence count and is the fastest path. `bindu find -n K` returns the first K positions and, on the FM-index path, pays an additional O(K · log n) for position recovery, where n is the length of the indexed text. If you only need the count (e.g., for filtering or aggregation), prefer `count`.
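The count/locate split on the FM-index path can be illustrated with a toy FM-index. This is a textbook sketch (Ferragina-Manzini backward search over a BWT built from a naive suffix array, plus a sampled suffix array for locate), not Bindu's sidecar format; the class name, the `$` sentinel (assumed to sort before every text character), and the sampling rate are assumptions. `count` needs only the C table and occurrence counts, while `locate` must also walk LF-mapping steps back to a sampled row for every hit. Rows come back in suffix-array order, not text order.

```python
class ToyFMIndex:
    def __init__(self, text: str, sample_rate: int = 4):
        t = text + "$"  # sentinel: assumed absent from text and smallest character
        sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive suffix array
        self.bwt = "".join(t[i - 1] for i in sa)         # t[-1] is "$" when i == 0
        self.n = len(t)
        # C[c] = number of characters in t that sort before c
        self.C, total = {}, 0
        for c in sorted(set(t)):
            self.C[c] = total
            total += t.count(c)
        # occ[c][i] = occurrences of c in bwt[:i]
        self.occ = {c: [0] * (self.n + 1) for c in self.C}
        for i, ch in enumerate(self.bwt):
            for c in self.C:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)
        # Keep suffix-array entries only for text positions divisible by sample_rate
        self.samples = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}

    def _range(self, pattern: str) -> tuple[int, int]:
        """Backward search: half-open suffix-array interval of rows matching pattern."""
        lo, hi = 0, self.n
        for c in reversed(pattern):
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern: str) -> int:
        lo, hi = self._range(pattern)
        return hi - lo

    def _lf(self, row: int) -> int:
        """LF mapping: row of the suffix starting one position earlier in the text."""
        c = self.bwt[row]
        return self.C[c] + self.occ[c][row]

    def locate(self, pattern: str, k: int) -> list[int]:
        lo, hi = self._range(pattern)
        positions = []
        for row in range(lo, min(hi, lo + k)):
            steps = 0
            while row not in self.samples:  # extra per-hit cost paid only by locate
                row = self._lf(row)
                steps += 1
            positions.append(self.samples[row] + steps)
        return positions


fm = ToyFMIndex("abracadabra", sample_rate=4)
print(fm.count("abra"))                  # 2
print(sorted(fm.locate("abra", k=10)))   # [0, 7]
```

A sparser `sample_rate` shrinks the sidecar but lengthens each LF walk, which is the trade-off behind preferring `count` when positions are not needed.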
Cross-file operations
Because search can run directly against the compressed form for the wire models that support it, cross-file analytics become tractable on large compressed corpora: you can ask questions across many files without unpacking any of them. A canonical example from the satellite domain is the query
“Find every time a left turn was taken at a 30-degree angle.”
Run against a fleet of compressed telemetry files, it executes as a coordinate-space scan, not a decompression pipeline. You pay decompression cost only when you want to materialize the matching records.
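The count-first, materialize-later control flow can be sketched over a set of in-memory DICT-style encodings. Everything here is hypothetical: `dict_encode`, `dict_decode`, `count_encoded`, `query_corpus`, the file names, and the one-index-byte-per-symbol layout are assumptions, and a plain substring (`b"L30"`) stands in for the coordinate-space predicate. Every file is counted in index space; only files with hits are decoded.

```python
def dict_encode(data: bytes) -> tuple[bytes, bytes]:
    """Toy DICT layout: sorted alphabet of distinct bytes + one index byte per input byte."""
    alphabet = bytes(sorted(set(data)))
    lookup = {sym: idx for idx, sym in enumerate(alphabet)}
    return alphabet, bytes(lookup[b] for b in data)


def dict_decode(alphabet: bytes, indices: bytes) -> bytes:
    """Materialize the original bytes (the step we want to defer)."""
    return bytes(alphabet[i] for i in indices)


def count_encoded(alphabet: bytes, indices: bytes, pattern: bytes) -> int:
    """Count (overlapping) matches in index space, never rebuilding the bytes."""
    lookup = {sym: idx for idx, sym in enumerate(alphabet)}
    if not pattern or any(b not in lookup for b in pattern):
        return 0
    mapped = bytes(lookup[b] for b in pattern)
    m = len(mapped)
    return sum(1 for i in range(len(indices) - m + 1) if indices[i:i + m] == mapped)


def query_corpus(corpus: dict[str, tuple[bytes, bytes]], pattern: bytes) -> dict[str, bytes]:
    """Count every file in encoded form; decode only the files that have hits."""
    matches = {}
    for name, (alphabet, indices) in corpus.items():
        if count_encoded(alphabet, indices, pattern) > 0:
            matches[name] = dict_decode(alphabet, indices)  # decode cost paid here only
    return matches


raw_files = {"a.bin": b"turn L30 turn R10", "b.bin": b"turn R45", "c.bin": b"turn L30"}
corpus = {name: dict_encode(raw) for name, raw in raw_files.items()}
print(sorted(query_corpus(corpus, b"L30")))  # ['a.bin', 'c.bin']
```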
Why this matters
Three independent line items get cheaper at once:
- Storage — the bytes you keep are smaller.
- Network — the bytes you transmit are smaller.
- Compute — the bytes you read at query time are smaller, because the query runs against the compressed form.
For workloads that stay on these compressed-form paths, compression isn’t a one-shot win at write time: it pays out again every time the data is read.