Reproducing benchmarks

This page walks through reproducing every number in the Benchmarks page. All measurements are deterministic, all corpora are publicly downloadable, and round-trip verification is built into the harness.

If your numbers diverge from the published ones by more than ~5% on ratio (or ~20% on speed, given hardware variance), open a GitHub issue — that’s a bug we want to know about.

The harness records the following for every (codec × file) combination:

  • Compressed size in bytes → ratio + reduction percentage
  • Wall, user, and system CPU time for compress and decompress separately
  • Peak resident set size (max RSS) during compress and decompress
  • Major / minor page faults and voluntary / involuntary context switches
  • CPU utilization percentage
  • Energy proxy in joules: (user + sys) × per-core-package-power
  • Round-trip integrity via SHA-256 (decoded ≡ original)

All metrics come straight from getrusage(2) via GNU /usr/bin/time with a custom -f format string. No instrumentation hooks, no internal counters.
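For orientation, the same rusage counters that /usr/bin/time reports are visible from Python’s standard resource module (this snippet is an illustration, not part of the harness):

```python
import resource

# Snapshot of this process's own rusage counters — the same getrusage(2)
# fields that /usr/bin/time reports for the child process it runs.
ru = resource.getrusage(resource.RUSAGE_SELF)
fields = {
    "user_s":    ru.ru_utime,   # user CPU seconds
    "sys_s":     ru.ru_stime,   # system CPU seconds
    "maxrss_kb": ru.ru_maxrss,  # peak RSS (kilobytes on Linux, bytes on macOS)
    "majflt":    ru.ru_majflt,  # major page faults
    "minflt":    ru.ru_minflt,  # minor page faults
    "ctx_vol":   ru.ru_nvcsw,   # voluntary context switches
    "ctx_invol": ru.ru_nivcsw,  # involuntary context switches
}
```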

You will need a Linux x86_64 box with:

  • 24 GB RAM if you plan to run enwik9 (the 1 GB Hutter Prize file). Bindu’s BWT suffix-array allocation needs ~16 GB peak just for the suffix array; total peak RSS during enwik9 encode is ~20 GB. Smaller corpora (Silesia, satellite) fit comfortably in 8 GB.
  • 80 GB free disk (corpora ~2 GB; per-codec working artifacts and decompressed copies during round-trip verification add another ~10× working footprint at peak).
  • A reasonably modern x86 CPU. The published numbers are from an AMD Ryzen 7 8745HS (8 cores, 35 W TDP).

Install GNU time (the /usr/bin/time binary, distinct from the shell builtin) plus the commodity codecs we benchmark:

sudo apt install \
time gzip bzip2 xz-utils zstd brotli zip unzip \
build-essential

Verify each:

/usr/bin/time --version
gzip --version | head -1
bzip2 --version 2>&1 | head -1
xz --version | head -1
zstd --version
brotli --version
zip -v | head -1

Any of pigz, pbzip2, pixz, lz4, lzop, lrzip, zpaq, 7z will be auto-included by the harness if present on PATH. None are required.
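Auto-inclusion amounts to a PATH lookup. A minimal sketch with shutil.which (the harness’s actual detection logic may differ):

```python
import shutil

OPTIONAL_CODECS = ["pigz", "pbzip2", "pixz", "lz4", "lzop", "lrzip", "zpaq", "7z"]

def detect_optional(candidates):
    """Return the subset of candidate codec binaries present on PATH."""
    return [c for c in candidates if shutil.which(c) is not None]
```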

Specialized codecs (optional, for the satellite/scientific corpus)

# libaec — CCSDS 121.0-B-3 Adaptive Entropy Coding (the satellite standard)
sudo apt install libaec-tools
# zfp — LLNL reversible float compression
git clone https://github.com/LLNL/zfp && cd zfp && mkdir build && cd build
cmake .. && make -j && sudo make install
# flac — surrogate codec for int16 streams
sudo apt install flac
# fpack — FITS Rice (astronomy)
sudo apt install libcfitsio-bin

If you skip these, the satellite runs will simply mark those codecs n/a for files where they don’t apply.

The benchmark uses Bindu built from source with release flags so behavior matches the published binary exactly:

git clone https://github.com/bindu-labs/bindu
cd bindu
gcc -O3 -mavx2 -march=native -DSOVEREIGN_CLI \
-o bindu sovereign2.c -lm -lpthread
./bindu --version

Building with -march=native instead of the published -march=znver4 keeps full SIMD coverage on whatever CPU you’re on. For strict reproducibility against the published numbers, use -march=znver4 if you have a Zen 4 part; otherwise accept that throughput will differ on other microarchitectures (ratios will not).

The benchmark uses three industry-standard corpora, all publicly downloadable and all distributed by their original maintainers.

Silesia: the industry-standard generic compression benchmark of 12 files spanning English text, x86 binaries, scientific imaging, and structured records.

mkdir -p corpus/silesia && cd corpus/silesia
curl -L https://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip -o silesia.zip
unzip silesia.zip && rm silesia.zip
cd ../..

Expected files: dickens, mozilla, mr, nci, ooffice, osdb, reymont, samba, sao, webster, x-ray, xml. Total ~202 MB.
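A quick sanity check that extraction produced the twelve expected files (a helper sketch, not part of the harness):

```python
from pathlib import Path

SILESIA_FILES = {
    "dickens", "mozilla", "mr", "nci", "ooffice", "osdb",
    "reymont", "samba", "sao", "webster", "x-ray", "xml",
}

def missing_silesia(corpus_dir="corpus/silesia"):
    """Return the sorted list of expected Silesia files not present on disk."""
    d = Path(corpus_dir)
    present = {p.name for p in d.iterdir()} if d.is_dir() else set()
    return sorted(SILESIA_FILES - present)
```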

Hutter Prize (enwik8/enwik9): the canonical Wikipedia compression benchmark.

mkdir -p corpus/hutter && cd corpus/hutter
curl -L http://mattmahoney.net/dc/enwik8.zip -o enwik8.zip
curl -L http://mattmahoney.net/dc/enwik9.zip -o enwik9.zip
unzip enwik8.zip && unzip enwik9.zip
rm enwik8.zip enwik9.zip
cd ../..

Verify:

md5sum corpus/hutter/enwik8 corpus/hutter/enwik9
# Expected:
# a1fa5ffddb56f4953e226637dabbb36a corpus/hutter/enwik8
# e206c3450ac99950df65bf70ef61a12d corpus/hutter/enwik9
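The same check can be done from Python with a streaming MD5, so enwik9’s 1 GB never sits in memory (the digests are the expected values listed above):

```python
import hashlib

def md5_file(path, chunk=1 << 20):
    """Stream a file through MD5 in 1 MiB chunks and return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

EXPECTED = {
    "corpus/hutter/enwik8": "a1fa5ffddb56f4953e226637dabbb36a",
    "corpus/hutter/enwik9": "e206c3450ac99950df65bf70ef61a12d",
}
```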

The satellite/scientific corpus is assembled from public NASA/ESA mission archives. It takes longer to set up: the extraction script downloads source files from CDAWeb, the Copernicus Open Access Hub, and HEASARC, then writes raw arrays to corpus/satellite/. Run:

bench/satellite/extract.py --out corpus/satellite/

Expected layout under corpus/satellite/:

  • ccsds/raw/ — MMS, OMNI, THEMIS space-weather telemetry
  • commercial/raw/ — Sentinel-2 MSI bands, AVIRIS hyperspectral, Umbra SAR
  • raw/ — GOES-16 weather imagery, Chandra X-ray events
  • sectors/ — AIS ship tracking CSV, ADS-B flight tracking JSON

This corpus is optional. Skip it if you only need the generic + Hutter Prize numbers.

The benchmark harness is a Python driver that wraps each codec invocation in GNU /usr/bin/time with a machine-parsable -f format, parses the rusage output, verifies the SHA-256 round trip, and writes per-run JSON + CSV.

The full harness is in bench/corporate/harness.py. The core measurement primitive is small enough to read inline:

import os, subprocess, hashlib, re
from pathlib import Path

TIME_FMT = ("wall=%e user=%U sys=%S maxrss=%M "
            "pagefault_major=%F pagefault_minor=%R "
            "ctx_vol=%w ctx_invol=%c cpu_pct=%P exit=%x")

_re = re.compile(
    r"wall=(?P<wall>[\d.]+)\s+user=(?P<user>[\d.]+)\s+sys=(?P<sys>[\d.]+)\s+"
    r"maxrss=(?P<rss>\d+)\s+pagefault_major=(?P<fmaj>\d+)\s+"
    r"pagefault_minor=(?P<fmin>\d+)\s+ctx_vol=(?P<cvol>\d+)\s+"
    r"ctx_invol=(?P<cinv>\d+)\s+cpu_pct=(?P<cpu>\d+)%?\s+exit=(?P<exit>\d+)")

def measure(argv, stdin_path=None, stdout_path=None, err_path=None):
    """Run argv under /usr/bin/time -f TIME_FMT and return parsed rusage."""
    fin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
    fout = open(stdout_path, "wb") if stdout_path else subprocess.DEVNULL
    cmd = ["/usr/bin/time", "-f", TIME_FMT] + argv
    with open(err_path, "w") as ferr:
        rc = subprocess.run(cmd, stdin=fin, stdout=fout, stderr=ferr)
    if stdin_path: fin.close()
    if stdout_path: fout.close()
    m = _re.search(Path(err_path).read_text())
    return {**m.groupdict(), "rc": rc.returncode}

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
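To see what the parser extracts, you can run the same pattern over a synthetic stderr line (the numeric values here are invented for illustration):

```python
import re

# Same field layout as the harness's TIME_FMT; the numbers are made up.
pattern = re.compile(
    r"wall=(?P<wall>[\d.]+)\s+user=(?P<user>[\d.]+)\s+sys=(?P<sys>[\d.]+)\s+"
    r"maxrss=(?P<rss>\d+)\s+pagefault_major=(?P<fmaj>\d+)\s+"
    r"pagefault_minor=(?P<fmin>\d+)\s+ctx_vol=(?P<cvol>\d+)\s+"
    r"ctx_invol=(?P<cinv>\d+)\s+cpu_pct=(?P<cpu>\d+)%?\s+exit=(?P<exit>\d+)")

sample = ("wall=1.23 user=1.10 sys=0.05 maxrss=20480 pagefault_major=0 "
          "pagefault_minor=1521 ctx_vol=3 ctx_invol=12 cpu_pct=93% exit=0")
parsed = pattern.search(sample).groupdict()
# parsed["wall"] == "1.23", parsed["rss"] == "20480", parsed["cpu"] == "93"
```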

A single (codec, file) measurement looks like this:

def bench_one(codec_argv_enc, codec_argv_dec, src, comp, dec):
    c = measure(codec_argv_enc, stdin_path=src, stdout_path=comp,
                err_path=comp + ".err")
    d = measure(codec_argv_dec, stdin_path=comp, stdout_path=dec,
                err_path=dec + ".err")
    return {
        "ratio": os.path.getsize(src) / os.path.getsize(comp),
        "comp_pct": (1 - os.path.getsize(comp) / os.path.getsize(src)) * 100,
        "c_wall": float(c["wall"]),
        "d_wall": float(d["wall"]),
        "c_rss_kb": int(c["rss"]),
        "d_rss_kb": int(d["rss"]),
        "roundtrip": sha256(dec) == sha256(src),
    }

Single-threaded invocations for fair per-core comparison. Each codec is run at multiple levels and the harness records all of them; the Industry Benchmark tables show the best-ratio level per codec per file.

| Codec | Compress | Decompress |
| --- | --- | --- |
| Bindu | `bindu c <in> <out>` | `bindu d <in> <out>` |
| Bindu (with shape hints) | `bindu c --shape WxH --dtype TYPE <in> <out>` | `bindu d <in> <out>` |
| gzip | `gzip -<L> -c` (stdin → stdout) | `gzip -d -c` |
| bzip2 | `bzip2 -<L> -c` | `bzip2 -d -c` |
| xz | `xz -<L> -c -T1` | `xz -d -c -T1` |
| xz (extreme) | `xz -9e -c -T1` | `xz -d -c -T1` |
| zstd | `zstd -<L> -c -q --single-thread` | `zstd -d -c -q` |
| zstd (long-range) | `zstd -22 --ultra --long=27 -c -q --single-thread` | `zstd -d --long=27 -c -q` |
| brotli | `brotli -q <L> -c` | `brotli -d -c` |
| zip | `zip -q -j -<L> <out> <in>` | `unzip -q -o -p <in>` |
| aec (CCSDS 121) | `aec [-s] -n <bits> -j 16 -r 128 <in> <out>` | `aec -d [-s] -n <bits> -j 16 -r 128 <in> <out>` |
| zfp (reversible) | `zfp -<f\|d> -<dim args> -i <in> -z <out> -R` | `zfp -<f\|d> -<dim args> -z <in> -o <out> -R` |
| flac (raw int16) | `flac -8 --force-raw-format --endian=little --sign=signed --channels=1 --bps=16 --sample-rate=48000 -o <out> <in>` | `flac -d --force-raw-format --endian=little --sign=signed -o <out> <in>` |

The single-thread flags (-T1, --single-thread) ensure the measurement reflects per-core performance. Parallel variants (pigz, pbzip2, pixz, zstd -T0) would scale encode throughput near-linearly with cores without changing ratio.

Bindu’s --shape/--dtype hints activate its multi-threaded SBPN mode on structured satellite data — this matches its real deployment. The hints are a no-op for unstructured generic data.

Bindu’s content-addressed cache stores prior compression results keyed by SHA of the input. Without clearing it between runs, repeated measurements on the same input return cached output instantly and the wall time will not reflect actual compression work.

Always clear the cache before a benchmark run:

rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
export SBP_DETERMINISTIC=1

The SBP_DETERMINISTIC=1 environment variable disables Bindu’s online learning loops so back-to-back runs produce bit-identical output.

git clone https://github.com/bindu-labs/bindu
cd bindu
make release # builds ./bindu
rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
export SBP_DETERMINISTIC=1
python3 bench/corporate/harness.py

The harness writes:

bench/corporate/results/
├── results.json # Silesia: 228 runs (12 files × 19 configs)
├── results.csv
├── satellite_results.json # 95 runs (16 files × multiple codecs)
├── satellite_results.csv
├── enwik_results.json # 26 runs (enwik8 full level matrix + enwik9 best level per codec)
├── enwik_results.csv
└── run.log # human-readable progress log
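The Industry Benchmark tables keep only the best-ratio level per codec per file. Given run records shaped like the harness rows (the field names here are assumed for illustration), that selection reduces to:

```python
def best_ratio_rows(runs):
    """Keep, for each (codec, file) pair, the run with the highest ratio."""
    best = {}
    for r in runs:
        key = (r["codec"], r["file"])
        if key not in best or r["ratio"] > best[key]["ratio"]:
            best[key] = r
    return list(best.values())
```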

A complete run on the reference rig (Ryzen 7 8745HS, single-threaded) takes:

| Corpus | Wall time |
| --- | --- |
| Silesia (228 runs) | ~13 min |
| Satellite (95 runs) | ~6 min |
| Hutter Prize enwik8 (19 configs) | ~30 min |
| Hutter Prize enwik9 (7 configs) | ~75 min |
| Total | ~2 hours |

The dominant cost is xz -9e and brotli -q 11 on enwik9. If you skip enwik9, the total drops to ~50 min.

Every successful run produces a SHA-256 hash of the decompressed output and compares it to the SHA-256 of the original input. No run counts as successful unless the hashes match. Codec-specific gotchas:

  • aec (CCSDS 121) pads its output to the block boundary — the decoded stream is byte-identical to the input for the first orig_bytes, with up to 24 bytes of trailing pad. The harness verifies against the prefix; this is spec-compliant and not data loss.
  • zfp reversible requires the -R flag on both encode AND decode. Without it on decode, the codec rejects the stream.
  • brotli requires a space between the -q flag and the level (-q 11, not -q11).
  • zip uses a different invocation pattern (file mode rather than stdin/stdout). The harness handles this.
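The aec padding rule in the first bullet can be checked mechanically: the decoded stream must match the original byte-for-byte over the original length, with at most 24 bytes of trailing pad. A sketch of such a check (not the harness’s exact code):

```python
def aec_roundtrip_ok(original: bytes, decoded: bytes, max_pad: int = 24) -> bool:
    """Prefix-verify an aec round trip: exact match plus bounded trailing pad."""
    extra = len(decoded) - len(original)
    return 0 <= extra <= max_pad and decoded[:len(original)] == original
```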

Given the per-run JSON output, the published metrics are:

| Metric | Formula |
| --- | --- |
| Compression ratio | `orig_bytes / comp_bytes` |
| Compression % | `(1 - comp_bytes / orig_bytes) × 100` |
| Encode throughput (MB/s) | `orig_bytes / 1e6 / c_wall` |
| Decode throughput (MB/s) | `orig_bytes / 1e6 / d_wall` |
| CPU utilization | `(c_user + c_sys) / c_wall × 100%` |
| Energy proxy (J) | `(c_user + c_sys) × (TDP / num_cores)` |
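As a worked instance of the energy-proxy formula, using the reference rig’s 35 W TDP and 8 cores:

```python
def energy_proxy_joules(user_s: float, sys_s: float,
                        tdp_watts: float = 35.0, cores: int = 8) -> float:
    """CPU-time energy proxy: (user + sys) seconds times per-core package power."""
    return (user_s + sys_s) * (tdp_watts / cores)

# 10 s user + 2 s system CPU -> 12 x 4.375 W = 52.5 J (a proxy, not measured power)
```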

For the published runs the energy proxy uses 35 W / 8 cores = 4.375 W per core, derived from the Ryzen 7 8745HS package TDP. This is a CPU-time proxy, not a hardware power measurement — the kernel’s intel-rapl energy_uj nodes are root-only on most distros, and perf_event_paranoid=4 (default on Ubuntu 24.04+) blocks user-mode power/energy-pkg/. The proxy preserves the relative ranking between codecs but should not be quoted as an absolute joule figure. To get true package energy, run as root with:

sudo cat /sys/class/powercap/intel-rapl:0/energy_uj

before and after each measurement, and subtract.
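When doing that subtraction yourself, account for the energy_uj counter wrapping back to zero; the wrap point is in the sibling max_energy_range_uj file in the same powercap directory. A minimal helper sketch:

```python
def rapl_delta_joules(before_uj: int, after_uj: int, max_range_uj: int) -> float:
    """Difference of two energy_uj readings in joules, correcting for wraparound."""
    delta = after_uj - before_uj
    if delta < 0:
        delta += max_range_uj  # the counter wrapped between the two reads
    return delta / 1_000_000.0
```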

The harness can re-run a single codec without redoing the full matrix:

# Edit sovereign2.c, then:
gcc -O3 -mavx2 -march=native -DSOVEREIGN_CLI -o bindu sovereign2.c -lm -lpthread
rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
python3 bench/corporate/rerun_bindu_only.py

This re-measures Bindu against all corpora (~80 min for the full set with enwik9, ~5 min if you skip enwik9), merges the new rows into the existing JSONs/CSVs while preserving the reference codec numbers, and regenerates the report. Useful for development loops where you’re tuning Bindu and don’t want to wait 2 hours for every commit.

| Symptom | Cause | Fix |
| --- | --- | --- |
| Bindu ratios “improve” suspiciously after a re-run | Cache hit returning prior output | `rm -rf ~/.sbp_cache ~/.sbp_ananta.bin` |
| enwik9 produces 0-byte output | int overflow in pre-1.0 builds (input > 536 MB) | Update to current main; `(size_t)n` casts have been added to the malloc paths |
| Decode throughput varies ±50% run-to-run | Thermal throttling, host load | Run 3× back-to-back with cold caches; report median |
| zfp decode fails | Missing `-R` on decompress | See Round-trip verification above |
| brotli rejects level | `-q11` instead of `-q 11` | Add the space |
| aec “round-trip mismatch” on tail bytes | Block-boundary padding | Verify prefix only; this is spec-compliant |
| Massive Bindu RSS at 1 GB+ | BWT suffix-array allocation (~16 GB on enwik9) | Need ≥ 24 GB RAM; chunked BWT is on the roadmap |
| /tmp/satdata disappears between runs | /tmp is wiped on reboot | Move the satellite corpus to a permanent location and update the harness’s corpus path |

After the harness completes, the report generator combines the three JSONs and emits the consolidated table:

python3 bench/corporate/unified_report.py > UNIFIED_REPORT.md

The output is the source of truth for Industry Benchmark. If you reproduce the matrix and your UNIFIED_REPORT.md differs from the published one by more than the documented variance bands, that’s a regression — please file it.

These are the run-to-run deltas we observe on the reference rig with cleared cache and SBP_DETERMINISTIC=1:

  • Compression ratio: bit-identical across runs for every codec we measure (deterministic).
  • Encode wall time: ±5% typical, up to ±15% on small inputs (< 1 MB) where startup cost dominates.
  • Decode wall time: ±10% typical. More noise than encode because decode is short and more sensitive to thermal/load.
  • Peak RSS: ±2% typical (allocator behavior).
  • Energy proxy: tracks (user + sys) CPU time, so it falls in the same bands as the timing metrics.

If your numbers fall outside these bands, check that the cache was cleared, SBP_DETERMINISTIC=1 was exported, no other heavy processes were running (the published runs were taken under a load average below 1.0), and the CPU had thermal headroom (use sensors to confirm it is not throttling).

  • Cross-platform reproducibility. The published numbers are Linux x86_64. Bindu builds and runs on macOS / aarch64, but those throughput numbers will differ. Ratios should be identical.
  • Cross-version comparability. Bindu is under active development. Pin the binary version (bindu --version) in any cited comparison. The published numbers are from the version tagged in bench/corporate/results/run.log.
  • Power-measurement accuracy. The energy figures are a CPU-time proxy. Use a kill-a-watt or a root-mode RAPL reader for absolute joules.