Edit and search in depth

This page continues from Operating on compressed data. The walkthrough showed the core behavior with alice.txt.bindu; this page covers the command details and the cases where Bindu can operate directly on the compressed form.

Edit

bindu replace file.bindu "old" "new"

bindu replace rewrites every occurrence of old with new. The Alice-to-Mabel walkthrough hits the fast path because the replacement has the same byte length and the rule body for Alice can be patched in place. In general, bindu replace picks one of three tiers in priority order:

Grammar rule patch — equal-length, in-place via memory-mapped write. This is the fast path shown by the Alice walkthrough and measured by the ADS-B benchmark on the previous page.
Dictionary entry rewrite — equal-length pattern that matches a DICT alphabet entry; rewrites the entry inline.
Decompress + scan + recompress — the general case for non-equal-length edits or unsupported wire models. Correctness-preserving; same cost as the conventional pipeline.

Length-preserving edits are the compressed-space win. Non-length-preserving edits force tier 3: correct, but the same cost class as the conventional pipeline.

Search

bindu search "pattern" file.bindu
bindu count  "pattern" file.bindu
bindu find   "pattern" file.bindu -n 10

When you run bindu search (or its count / find siblings), Bindu inspects the compressed file’s header to see which pipeline produced it, then picks a search strategy tailored to that pipeline. For most pipelines the search runs directly against the compressed bytes; for the residual-encoded ones it has to decompress first.

Wire model	Body content	Search strategy	Compressed search?
RAW	body is the raw bytes	Boyer-Moore-Horspool over body	Yes
CONSTANT	single repeated byte	O(1) — does the pattern repeat?	Yes
DICT	alphabet + index stream	alphabet-mapped index-stream scan	Yes
BWT + index	Burrows-Wheeler L-column + sidecar	FM-index backward search (Ferragina-Manzini)	Yes (with FM-index sidecar)
Grammar	rule table inside legacy wire	rule-table memcmp	Yes
RLE	run-length residuals against prior bytes	decompress + scan	No
LINDELTA	stride-period residuals against fixed offset	decompress + scan	No
PAETH2D	2D Paeth residuals against neighbor pixels	decompress + scan	No

Five model classes admit zero-decompression search today (RAW, CONSTANT, DICT, BWT-with-index, grammar-rule probe). Three (RLE, LINDELTA, PAETH2D) fall back to decompression-first search because their wire formats encode residuals against a neighbor context that has to be reconstructed before any candidate match can be verified. For measured search results across larger corpora, see Operating benchmarks.

Count vs locate

bindu count returns just the occurrence count and is the fastest path. bindu find -n K returns the first K positions and pays the additional O(K · log n) for position recovery on the FM-index path. If you only need the count (e.g., for filtering or aggregation), prefer count.

Cross-file operations

The same property makes cross-file analytics tractable on large compressed corpora. You can ask questions across many files without unpacking any of them. A canonical example from the satellite domain:

“Find every time a left turn was taken at a 30-degree angle.”

against a fleet of compressed telemetry files runs as a coordinate-space scan, not a decompression pipeline. You only pay decompression cost when you want to materialize the matching records.

Why this matters

Three independent line items get cheaper at once:

Storage — the bytes you keep are smaller.
Network — the bytes you transmit are smaller.
Compute — the bytes you read at query time are smaller, because the query runs against the compressed form.

For workloads where Bindu shines, compression isn’t a one-shot win at write time — it pays out continuously every time the data is read.