
Compressing data

This page walks through the standard Bindu workflow on a recognizable file: Alice’s Adventures in Wonderland by Lewis Carroll, taken from the Project Gutenberg public-domain corpus. The same flow applies to any input — telemetry, logs, source code, scientific data — but Alice is small enough to follow along on a laptop and well-known enough to make the search/edit demo on the next page legible.

First, install Bindu:
# macOS
brew install bindu-labs/tap/bindu
# Linux
curl -fsSL https://get.bindu.dev | sh
# Verify
bindu --version

For other platforms see the release page.

Next, download the sample text:
curl -O https://www.gutenberg.org/files/11/11-0.txt
mv 11-0.txt alice.txt
ls -lh alice.txt
# -rw-r--r--  ...  174K  ...  alice.txt

Now compress it:
bindu compress alice.txt

This produces alice.txt.bindu alongside the original. Bindu auto-detected English narrative prose and routed the input through the corresponding sub-pipeline (see Overview). Output:

alice.txt          174 KB
alice.txt.bindu    ~46 KB   (~3.8× ratio)

The exact ratio depends on the configuration; Bindu lands at roughly the same range as the strongest classical codecs on prose-style English text.
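The ratio is just the original size divided by the compressed size. A small helper (a sketch, not part of the Bindu CLI) reproduces it for any original/compressed pair, whichever codec produced the second file:

```shell
# Sketch: print the compression ratio of an original/compressed file pair
# to one decimal place, e.g. "3.8x". Works with any codec's output.
ratio() {
  awk -v o="$(wc -c < "$1")" -v c="$(wc -c < "$2")" \
    'BEGIN { printf "%.1fx\n", o / c }'
}
```

For example, `ratio alice.txt alice.txt.bindu` prints the figure quoted above for your local build.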

For comparison, run the strongest classical codecs on the same file:
gzip --keep alice.txt
zstd --keep alice.txt -19 -o alice.txt.zst
xz --keep -9e alice.txt
ls -lh alice.txt*

Approximate sizes you’ll see (varies slightly by version):

Codec                   Size     Ratio
alice.txt (original)    174 KB   1.0×
gzip -6                 ~62 KB   2.8×
zstd -19                ~52 KB   3.3×
xz -9e                  ~48 KB   3.6×
Bindu                   ~46 KB   3.8×
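To reproduce a size listing like this straight from the shell, loop over the output files of the commands above (the filenames assume the default output names those commands produce):

```shell
# Print each variant's size in bytes, skipping any file that wasn't produced.
for f in alice.txt alice.txt.gz alice.txt.zst alice.txt.xz alice.txt.bindu; do
  if [ -f "$f" ]; then
    printf '%-16s %8d bytes\n' "$f" "$(wc -c < "$f")"
  fi
done
```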

This is the “out of the box on prose” story: Bindu is competitive with — and on this file slightly ahead of — the strongest classical codecs. The bigger wins are on sequential, structured data, not prose. See the satellite & telemetry use case for the workloads where Bindu pulls decisively ahead, and the industry benchmark for the full measured picture.

Finally, decompress and verify the round trip:
bindu decompress alice.txt.bindu --output alice-roundtrip.txt
diff alice.txt alice-roundtrip.txt
# (no output — files are identical)
shasum alice.txt alice-roundtrip.txt
# matching SHA-1 hashes

Bindu is lossless. The decompressed file is byte-identical to the original.
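In a script, the same check can fail loudly instead of relying on eyeballed diff output. This sketch wraps the standard cmp utility; pass it the original and the round-tripped file from the example above:

```shell
# Return non-zero if the round trip changed even one byte.
verify_roundtrip() {
  if cmp -s "$1" "$2"; then
    echo "round trip OK: byte-identical"
  else
    echo "round trip FAILED: files differ" >&2
    return 1
  fi
}
```

For example: `verify_roundtrip alice.txt alice-roundtrip.txt`.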

You didn’t supply a schema. You didn’t pick a sub-pipeline. You didn’t specify a level. The selector at the front of the pipeline routed the input through the right path automatically. For tightly tuned deployments — a satellite that only ever produces one shape of data — you can strip out unused sub-pipelines to get a much smaller compressor; for general use, the defaults are the right starting point.
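Bindu’s selector is internal, but the idea is easy to illustrate: sniff the input’s first bytes and route it to a matching sub-pipeline. The sketch below is purely illustrative — the routing rule and pipeline names are invented for this example, not Bindu’s actual logic:

```shell
# Illustrative only — NOT Bindu's real selector. Route an input to a
# sub-pipeline name by checking whether its first bytes look like text.
select_pipeline() {
  if head -c 512 "$1" | LC_ALL=C grep -q '[^[:print:][:space:]]'; then
    echo "binary sub-pipeline"   # found non-printable bytes
  else
    echo "text sub-pipeline"     # printable ASCII and whitespace only
  fi
}
```

A real selector would look at far more than printability (entropy, field structure, magic numbers), but the shape — classify, then dispatch — is the same.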

The compression part is the conventional half of Bindu. The unconventional half is what you can do with the compressed file without ever decompressing it. That’s covered in Operating on compressed data.