
Compressing data

This page walks through the standard Bindu workflow on a recognizable file: Alice’s Adventures in Wonderland by Lewis Carroll, taken from the Project Gutenberg public-domain corpus. The same flow applies to any input — telemetry, logs, source code, scientific data — but Alice is small enough to follow along on a laptop and well-known enough to make the search/edit demo on the next page legible.

First, install Bindu:
# macOS
brew install bindu-labs/tap/bindu
# Linux
curl -fsSL https://get.bindu.dev | sh
# Verify
bindu --version

For other platforms see the release page.

Next, download the sample text:
curl -O https://www.gutenberg.org/files/11/11-0.txt
mv 11-0.txt alice.txt
ls -lh alice.txt
# -rw-r--r--  ...  174K  ...  alice.txt

Now compress it:
bindu compress alice.txt

This produces alice.txt.bindu alongside the original. Bindu auto-detected English narrative prose and routed the input through the corresponding sub-pipeline (see Overview). Output:

alice.txt          174 KB
alice.txt.bindu    ~46 KB   (~3.8× ratio)

The exact ratio depends on the configuration; Bindu lands at roughly the same range as the strongest classical codecs on prose-style English text.
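The ratio is just the original size divided by the compressed size. A small helper (a sketch, not part of the Bindu CLI) reproduces it for any original/compressed pair, whichever codec produced the second file:

```shell
# Sketch: print the compression ratio of an original/compressed file pair
# to one decimal place, e.g. "3.8x". Works with any codec's output.
ratio() {
  awk -v o="$(wc -c < "$1")" -v c="$(wc -c < "$2")" \
    'BEGIN { printf "%.1fx\n", o / c }'
}
```

For example, `ratio alice.txt alice.txt.bindu` prints the figure quoted above for your local build.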

For comparison, run the strongest classical codecs on the same file:
gzip --keep alice.txt
zstd --keep alice.txt -19 -o alice.txt.zst
xz --keep -9e alice.txt
ls -lh alice.txt*

Approximate sizes you’ll see (varies slightly by version):

Codec                   Size     Ratio
alice.txt (original)    174 KB   1.0×
gzip -6                 ~62 KB   2.8×
zstd -19                ~52 KB   3.3×
xz -9e                  ~48 KB   3.6×
Bindu                   ~46 KB   3.8×
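To reproduce a size listing like this straight from the shell, loop over the output files of the commands above (the filenames assume the default output names those commands produce):

```shell
# Print each variant's size in bytes, skipping any file that wasn't produced.
for f in alice.txt alice.txt.gz alice.txt.zst alice.txt.xz alice.txt.bindu; do
  if [ -f "$f" ]; then
    printf '%-16s %8d bytes\n' "$f" "$(wc -c < "$f")"
  fi
done
```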

This is the “out of the box on prose” story: Bindu is competitive with — and on this file slightly ahead of — the strongest classical codecs. The bigger wins are on sequential, structured data, not prose. See the satellite & telemetry use case for the workloads where Bindu pulls decisively ahead, and the industry benchmark for the full measured picture.

Finally, decompress and verify the round trip:
bindu decompress alice.txt.bindu --output alice-roundtrip.txt
diff alice.txt alice-roundtrip.txt
# (no output — files are identical)
shasum alice.txt alice-roundtrip.txt
# matching SHA-1 hashes

Bindu is lossless. The decompressed file is byte-identical to the original.
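In a script, the same check can fail loudly instead of relying on eyeballed diff output. This sketch wraps the standard cmp utility; pass it the original and the round-tripped file from the example above:

```shell
# Return non-zero if the round trip changed even one byte.
verify_roundtrip() {
  if cmp -s "$1" "$2"; then
    echo "round trip OK: byte-identical"
  else
    echo "round trip FAILED: files differ" >&2
    return 1
  fi
}
```

For example: `verify_roundtrip alice.txt alice-roundtrip.txt`.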

You didn’t supply a schema. You didn’t pick a sub-pipeline. You didn’t specify a level. The selector at the front of the pipeline routed the input through the right path automatically. For tightly tuned deployments — a satellite that only ever produces one shape of data — you can strip out unused sub-pipelines to get a much smaller compressor; for general use, the defaults are the right starting point.
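Bindu’s selector is internal, but the idea is easy to illustrate: sniff the input’s first bytes and route it to a matching sub-pipeline. The sketch below is purely illustrative — the routing rule and pipeline names are invented for this example, not Bindu’s actual logic:

```shell
# Illustrative only — NOT Bindu's real selector. Route an input to a
# sub-pipeline name by checking whether its first bytes look like text.
select_pipeline() {
  if head -c 512 "$1" | LC_ALL=C grep -q '[^[:print:][:space:]]'; then
    echo "binary sub-pipeline"   # found non-printable bytes
  else
    echo "text sub-pipeline"     # printable ASCII and whitespace only
  fi
}
```

A real selector would look at far more than printability (entropy, field structure, magic numbers), but the shape — classify, then dispatch — is the same.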

The compression part is the conventional half of Bindu. The unconventional half is what you can do with the compressed file without ever decompressing it. That’s covered in Operating on compressed data.