Reproducing benchmarks

This page walks through reproducing every number in the Benchmarks page. All measurements are deterministic, all corpora are publicly downloadable, and round-trip verification is built into the harness.

If your numbers diverge from the published ones by more than ~5% on ratio (or ~20% on speed, given hardware variance), open a GitHub issue — that’s a bug we want to know about.

The harness records the following for every (codec × file) combination:

  • Compressed size in bytes → ratio + reduction percentage
  • Wall, user, and system CPU time for compress and decompress separately
  • Peak resident set size (max RSS) during compress and decompress
  • Major / minor page faults and voluntary / involuntary context switches
  • CPU utilization percentage
  • Energy proxy in joules: (user + sys) × per-core-package-power
  • Round-trip integrity via SHA-256 (decoded ≡ original)

All metrics come straight from getrusage(2) via GNU /usr/bin/time with a custom -f format string. No instrumentation hooks, no internal counters.
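For orientation, the same rusage counters that /usr/bin/time reports are visible from Python’s standard resource module (this snippet is an illustration, not part of the harness):

```python
import resource

# Snapshot of this process's own rusage counters — the same getrusage(2)
# fields that /usr/bin/time reports for the child process it runs.
ru = resource.getrusage(resource.RUSAGE_SELF)
fields = {
    "user_s":    ru.ru_utime,   # user CPU seconds
    "sys_s":     ru.ru_stime,   # system CPU seconds
    "maxrss_kb": ru.ru_maxrss,  # peak RSS (kilobytes on Linux, bytes on macOS)
    "majflt":    ru.ru_majflt,  # major page faults
    "minflt":    ru.ru_minflt,  # minor page faults
    "ctx_vol":   ru.ru_nvcsw,   # voluntary context switches
    "ctx_invol": ru.ru_nivcsw,  # involuntary context switches
}
```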

You will need a Linux x86_64 box with:

  • 24 GB RAM if you plan to run enwik9 (the 1 GB Hutter Prize file). Bindu’s BWT suffix-array allocation needs ~16 GB peak just for the suffix array; total peak RSS during enwik9 encode is ~20 GB. Smaller corpora (Silesia, satellite) fit comfortably in 8 GB.
  • 80 GB free disk (corpora ~2 GB; per-codec working artifacts and decompressed copies during round-trip verification add another ~10× working footprint at peak).
  • A reasonably modern x86 CPU. The published numbers are from an AMD Ryzen 7 8745HS (8 cores, 35 W TDP).

Install GNU time (the /usr/bin/time binary, distinct from the shell builtin) plus the commodity codecs we benchmark:

sudo apt install \
time gzip bzip2 xz-utils zstd brotli zip unzip \
build-essential

Verify each:

/usr/bin/time --version
gzip --version | head -1
bzip2 --version 2>&1 | head -1
xz --version | head -1
zstd --version
brotli --version
zip -v | head -1

Any of pigz, pbzip2, pixz, lz4, lzop, lrzip, zpaq, 7z will be auto-included by the harness if present on PATH. None are required.
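Auto-inclusion amounts to a PATH lookup. A minimal sketch with shutil.which (the harness’s actual detection logic may differ):

```python
import shutil

OPTIONAL_CODECS = ["pigz", "pbzip2", "pixz", "lz4", "lzop", "lrzip", "zpaq", "7z"]

def detect_optional(candidates):
    """Return the subset of candidate codec binaries present on PATH."""
    return [c for c in candidates if shutil.which(c) is not None]
```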

Specialized codecs (optional, for the satellite/scientific corpus)

# libaec — CCSDS 121.0-B-3 Adaptive Entropy Coding (the satellite standard)
sudo apt install libaec-tools
# zfp — LLNL reversible float compression
git clone https://github.com/LLNL/zfp && cd zfp && mkdir build && cd build
cmake .. && make -j && sudo make install
# flac — surrogate codec for int16 streams
sudo apt install flac
# fpack — FITS Rice (astronomy)
sudo apt install libcfitsio-bin

If you skip these, the satellite runs will simply mark those codecs n/a for files where they don’t apply.

The benchmark uses Bindu built from source with release flags so behavior matches the published binary exactly:

git clone https://github.com/bindu-labs/bindu
cd bindu
gcc -O3 -mavx2 -march=native -DSOVEREIGN_CLI \
-o bindu sovereign2.c -lm -lpthread
./bindu --version

Building with -march=native instead of the published -march=znver4 keeps full SIMD coverage on whatever CPU you’re on. For strict reproducibility against the published numbers, use -march=znver4 if you have a Zen 4 part; otherwise accept that throughput will differ on other microarchitectures (ratios will not).

The benchmark uses three industry-standard corpora, all publicly downloadable and all distributed by their original maintainers.

Silesia: the industry-standard generic compression benchmark of 12 files spanning English text, x86 binaries, scientific imaging, and structured records.

mkdir -p corpus/silesia && cd corpus/silesia
curl -L https://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip -o silesia.zip
unzip silesia.zip && rm silesia.zip
cd ../..

Expected files: dickens, mozilla, mr, nci, ooffice, osdb, reymont, samba, sao, webster, x-ray, xml. Total ~202 MB.
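A quick sanity check that extraction produced the twelve expected files (a helper sketch, not part of the harness):

```python
from pathlib import Path

SILESIA_FILES = {
    "dickens", "mozilla", "mr", "nci", "ooffice", "osdb",
    "reymont", "samba", "sao", "webster", "x-ray", "xml",
}

def missing_silesia(corpus_dir="corpus/silesia"):
    """Return the sorted list of expected Silesia files not present on disk."""
    d = Path(corpus_dir)
    present = {p.name for p in d.iterdir()} if d.is_dir() else set()
    return sorted(SILESIA_FILES - present)
```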

Hutter Prize (enwik8/enwik9): the canonical Wikipedia compression benchmark.

mkdir -p corpus/hutter && cd corpus/hutter
curl -L http://mattmahoney.net/dc/enwik8.zip -o enwik8.zip
curl -L http://mattmahoney.net/dc/enwik9.zip -o enwik9.zip
unzip enwik8.zip && unzip enwik9.zip
rm enwik8.zip enwik9.zip
cd ../..

Verify:

md5sum corpus/hutter/enwik8 corpus/hutter/enwik9
# Expected:
# a1fa5ffddb56f4953e226637dabbb36a corpus/hutter/enwik8
# e206c3450ac99950df65bf70ef61a12d corpus/hutter/enwik9
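The same check can be done from Python with a streaming MD5, so enwik9’s 1 GB never sits in memory (the digests are the expected values listed above):

```python
import hashlib

def md5_file(path, chunk=1 << 20):
    """Stream a file through MD5 in 1 MiB chunks and return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

EXPECTED = {
    "corpus/hutter/enwik8": "a1fa5ffddb56f4953e226637dabbb36a",
    "corpus/hutter/enwik9": "e206c3450ac99950df65bf70ef61a12d",
}
```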

The satellite/scientific corpus is assembled from public NASA/ESA mission archives. It takes longer to set up: the extraction script downloads source files from CDAWeb, the Copernicus Open Access Hub, and HEASARC, then writes raw arrays to corpus/satellite/. Run:

bench/satellite/extract.py --out corpus/satellite/

Expected layout under corpus/satellite/:

  • ccsds/raw/ — MMS, OMNI, THEMIS space-weather telemetry
  • commercial/raw/ — Sentinel-2 MSI bands, AVIRIS hyperspectral, Umbra SAR
  • raw/ — GOES-16 weather imagery, Chandra X-ray events
  • sectors/ — AIS ship tracking CSV, ADS-B flight tracking JSON

This corpus is optional. Skip it if you only need the generic + Hutter Prize numbers.

The benchmark harness is a Python driver that wraps each codec invocation in GNU /usr/bin/time with a machine-parsable -f format, parses the rusage output, verifies the SHA-256 round trip, and writes per-run JSON + CSV.

The full harness is in bench/corporate/harness.py. The core measurement primitive is small enough to read inline:

import os, subprocess, hashlib, re
from pathlib import Path

TIME_FMT = ("wall=%e user=%U sys=%S maxrss=%M "
            "pagefault_major=%F pagefault_minor=%R "
            "ctx_vol=%w ctx_invol=%c cpu_pct=%P exit=%x")

_re = re.compile(
    r"wall=(?P<wall>[\d.]+)\s+user=(?P<user>[\d.]+)\s+sys=(?P<sys>[\d.]+)\s+"
    r"maxrss=(?P<rss>\d+)\s+pagefault_major=(?P<fmaj>\d+)\s+"
    r"pagefault_minor=(?P<fmin>\d+)\s+ctx_vol=(?P<cvol>\d+)\s+"
    r"ctx_invol=(?P<cinv>\d+)\s+cpu_pct=(?P<cpu>\d+)%?\s+exit=(?P<exit>\d+)")

def measure(argv, stdin_path=None, stdout_path=None, err_path=None):
    """Run argv under /usr/bin/time -f TIME_FMT and return parsed rusage."""
    fin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
    fout = open(stdout_path, "wb") if stdout_path else subprocess.DEVNULL
    cmd = ["/usr/bin/time", "-f", TIME_FMT] + argv
    with open(err_path, "w") as ferr:
        rc = subprocess.run(cmd, stdin=fin, stdout=fout, stderr=ferr)
    if stdin_path: fin.close()
    if stdout_path: fout.close()
    m = _re.search(Path(err_path).read_text())
    return {**m.groupdict(), "rc": rc.returncode}

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
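To see what the parser extracts, you can run the same pattern over a synthetic stderr line (the numeric values here are invented for illustration):

```python
import re

# Same field layout as the harness's TIME_FMT; the numbers are made up.
pattern = re.compile(
    r"wall=(?P<wall>[\d.]+)\s+user=(?P<user>[\d.]+)\s+sys=(?P<sys>[\d.]+)\s+"
    r"maxrss=(?P<rss>\d+)\s+pagefault_major=(?P<fmaj>\d+)\s+"
    r"pagefault_minor=(?P<fmin>\d+)\s+ctx_vol=(?P<cvol>\d+)\s+"
    r"ctx_invol=(?P<cinv>\d+)\s+cpu_pct=(?P<cpu>\d+)%?\s+exit=(?P<exit>\d+)")

sample = ("wall=1.23 user=1.10 sys=0.05 maxrss=20480 pagefault_major=0 "
          "pagefault_minor=1521 ctx_vol=3 ctx_invol=12 cpu_pct=93% exit=0")
parsed = pattern.search(sample).groupdict()
# parsed["wall"] == "1.23", parsed["rss"] == "20480", parsed["cpu"] == "93"
```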

A single (codec, file) measurement looks like this:

def bench_one(codec_argv_enc, codec_argv_dec, src, comp, dec):
    c = measure(codec_argv_enc, stdin_path=src, stdout_path=comp,
                err_path=comp + ".err")
    d = measure(codec_argv_dec, stdin_path=comp, stdout_path=dec,
                err_path=dec + ".err")
    return {
        "ratio": os.path.getsize(src) / os.path.getsize(comp),
        "comp_pct": (1 - os.path.getsize(comp) / os.path.getsize(src)) * 100,
        "c_wall": float(c["wall"]),
        "d_wall": float(d["wall"]),
        "c_rss_kb": int(c["rss"]),
        "d_rss_kb": int(d["rss"]),
        "roundtrip": sha256(dec) == sha256(src),
    }

Single-threaded invocations for fair per-core comparison. Each codec is run at multiple levels and the harness records all of them; the Industry Benchmark tables show the best-ratio level per codec per file.

| Codec | Compress | Decompress |
| --- | --- | --- |
| Bindu | `bindu c <in> <out>` | `bindu d <in> <out>` |
| Bindu (with shape hints) | `bindu c --shape WxH --dtype TYPE <in> <out>` | `bindu d <in> <out>` |
| gzip | `gzip -<L> -c` (stdin → stdout) | `gzip -d -c` |
| bzip2 | `bzip2 -<L> -c` | `bzip2 -d -c` |
| xz | `xz -<L> -c -T1` | `xz -d -c -T1` |
| xz (extreme) | `xz -9e -c -T1` | `xz -d -c -T1` |
| zstd | `zstd -<L> -c -q --single-thread` | `zstd -d -c -q` |
| zstd (long-range) | `zstd -22 --ultra --long=27 -c -q --single-thread` | `zstd -d --long=27 -c -q` |
| brotli | `brotli -q <L> -c` | `brotli -d -c` |
| zip | `zip -q -j -<L> <out> <in>` | `unzip -q -o -p <in>` |
| aec (CCSDS 121) | `aec [-s] -n <bits> -j 16 -r 128 <in> <out>` | `aec -d [-s] -n <bits> -j 16 -r 128 <in> <out>` |
| zfp (reversible) | `zfp -<f\|d> -<dim args> -i <in> -z <out> -R` | `zfp -<f\|d> -<dim args> -z <in> -o <out> -R` |
| flac (raw int16) | `flac -8 --force-raw-format --endian=little --sign=signed --channels=1 --bps=16 --sample-rate=48000 -o <out> <in>` | `flac -d --force-raw-format --endian=little --sign=signed -o <out> <in>` |

The single-thread flags (-T1, --single-thread) ensure the measurement reflects per-core performance. Parallel variants (pigz, pbzip2, pixz, zstd -T0) would scale encode throughput near-linearly with cores without changing ratio.

Bindu’s --shape/--dtype hints activate its multi-threaded SBPN mode on structured satellite data — this matches its real deployment. The hints are a no-op for unstructured generic data.

Bindu’s content-addressed cache stores prior compression results keyed by SHA of the input. Without clearing it between runs, repeated measurements on the same input return cached output instantly and the wall time will not reflect actual compression work.

Always clear the cache before a benchmark run:

rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
export SBP_DETERMINISTIC=1

The SBP_DETERMINISTIC=1 environment variable disables Bindu’s online learning loops so back-to-back runs produce bit-identical output.

git clone https://github.com/bindu-labs/bindu
cd bindu
make release # builds ./bindu
rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
export SBP_DETERMINISTIC=1
python3 bench/corporate/harness.py

The harness writes:

bench/corporate/results/
├── results.json # Silesia: 228 runs (12 files × 19 configs)
├── results.csv
├── satellite_results.json # 95 runs (16 files × multiple codecs)
├── satellite_results.csv
├── enwik_results.json # 26 runs (enwik8 full level matrix + enwik9 best level per codec)
├── enwik_results.csv
└── run.log # human-readable progress log
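The Industry Benchmark tables keep only the best-ratio level per codec per file. Given run records shaped like the harness rows (the field names here are assumed for illustration), that selection reduces to:

```python
def best_ratio_rows(runs):
    """Keep, for each (codec, file) pair, the run with the highest ratio."""
    best = {}
    for r in runs:
        key = (r["codec"], r["file"])
        if key not in best or r["ratio"] > best[key]["ratio"]:
            best[key] = r
    return list(best.values())
```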

A complete run on the reference rig (Ryzen 7 8745HS, single-threaded) takes:

| Corpus | Wall time |
| --- | --- |
| Silesia (228 runs) | ~13 min |
| Satellite (95 runs) | ~6 min |
| Hutter Prize enwik8 (19 configs) | ~30 min |
| Hutter Prize enwik9 (7 configs) | ~75 min |
| Total | ~2 hours |

The dominant cost is xz -9e and brotli -q 11 on enwik9. If you skip enwik9, the total drops to ~50 min.

Every successful run produces a SHA-256 hash of the decompressed output and compares it to the SHA-256 of the original input. No run counts as successful unless the hashes match. Codec-specific gotchas:

  • aec (CCSDS 121) pads its output to the block boundary — the decoded stream is byte-identical to the input for the first orig_bytes, with up to 24 bytes of trailing pad. The harness verifies against the prefix; this is spec-compliant and not data loss.
  • zfp reversible requires the -R flag on both encode AND decode. Without it on decode, the codec rejects the stream.
  • brotli requires a space between the -q flag and the level (-q 11, not -q11).
  • zip uses a different invocation pattern (file mode rather than stdin/stdout). The harness handles this.
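The aec padding rule in the first bullet can be checked mechanically: the decoded stream must match the original byte-for-byte over the original length, with at most 24 bytes of trailing pad. A sketch of such a check (not the harness’s exact code):

```python
def aec_roundtrip_ok(original: bytes, decoded: bytes, max_pad: int = 24) -> bool:
    """Prefix-verify an aec round trip: exact match plus bounded trailing pad."""
    extra = len(decoded) - len(original)
    return 0 <= extra <= max_pad and decoded[:len(original)] == original
```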

Given the per-run JSON output, the published metrics are:

| Metric | Formula |
| --- | --- |
| Compression ratio | `orig_bytes / comp_bytes` |
| Compression % | `(1 - comp_bytes / orig_bytes) × 100` |
| Encode throughput (MB/s) | `orig_bytes / 1e6 / c_wall` |
| Decode throughput (MB/s) | `orig_bytes / 1e6 / d_wall` |
| CPU utilization | `(c_user + c_sys) / c_wall × 100%` |
| Energy proxy (J) | `(c_user + c_sys) × (TDP / num_cores)` |
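As a worked instance of the energy-proxy formula, using the reference rig’s 35 W TDP and 8 cores:

```python
def energy_proxy_joules(user_s: float, sys_s: float,
                        tdp_watts: float = 35.0, cores: int = 8) -> float:
    """CPU-time energy proxy: (user + sys) seconds times per-core package power."""
    return (user_s + sys_s) * (tdp_watts / cores)

# 10 s user + 2 s system CPU -> 12 x 4.375 W = 52.5 J (a proxy, not measured power)
```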

For the published runs the energy proxy uses 35 W / 8 cores = 4.375 W per core, derived from the Ryzen 7 8745HS package TDP. This is a CPU-time proxy, not a hardware power measurement — the kernel’s intel-rapl energy_uj nodes are root-only on most distros, and perf_event_paranoid=4 (default on Ubuntu 24.04+) blocks user-mode power/energy-pkg/. The proxy preserves the relative ranking between codecs but should not be quoted as an absolute joule figure. To get true package energy, run as root with:

sudo cat /sys/class/powercap/intel-rapl:0/energy_uj

before and after each measurement, and subtract.
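When doing that subtraction yourself, account for the energy_uj counter wrapping back to zero; the wrap point is in the sibling max_energy_range_uj file in the same powercap directory. A minimal helper sketch:

```python
def rapl_delta_joules(before_uj: int, after_uj: int, max_range_uj: int) -> float:
    """Difference of two energy_uj readings in joules, correcting for wraparound."""
    delta = after_uj - before_uj
    if delta < 0:
        delta += max_range_uj  # the counter wrapped between the two reads
    return delta / 1_000_000.0
```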

The harness can re-run a single codec without redoing the full matrix:

# Edit sovereign2.c, then:
gcc -O3 -mavx2 -march=native -DSOVEREIGN_CLI -o bindu sovereign2.c -lm -lpthread
rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
python3 bench/corporate/rerun_bindu_only.py

This re-measures Bindu against all corpora (~80 min for the full set with enwik9, ~5 min if you skip enwik9), merges the new rows into the existing JSONs/CSVs while preserving the reference codec numbers, and regenerates the report. Useful for development loops where you’re tuning Bindu and don’t want to wait 2 hours for every commit.

| Symptom | Cause | Fix |
| --- | --- | --- |
| Bindu ratios “improve” suspiciously after a re-run | Cache hit returning prior output | `rm -rf ~/.sbp_cache ~/.sbp_ananta.bin` |
| enwik9 produces 0-byte output | int overflow in pre-1.0 builds (input > 536 MB) | Update to current main; `(size_t)n` casts have been added to the malloc paths |
| Decode throughput varies ±50% run-to-run | Thermal throttling, host load | Run 3× back-to-back with cold caches; report median |
| zfp decode fails | Missing `-R` on decompress | See Round-trip verification above |
| brotli rejects level | `-q11` instead of `-q 11` | Add the space |
| aec “round-trip mismatch” on tail bytes | Block-boundary padding | Verify prefix only; this is spec-compliant |
| Massive Bindu RSS at 1 GB+ | BWT suffix-array allocation (~16 GB on enwik9) | Need ≥ 24 GB RAM; chunked BWT is on the roadmap |
| /tmp/satdata disappears between runs | /tmp is wiped on reboot | Move the satellite corpus to a permanent location and update the harness’s corpus path |

After the harness completes, the report generator combines the three JSONs and emits the consolidated table:

python3 bench/corporate/unified_report.py > UNIFIED_REPORT.md

The output is the source of truth for Industry Benchmark. If you reproduce the matrix and your UNIFIED_REPORT.md differs from the published one by more than the documented variance bands, that’s a regression — please file it.

These are the run-to-run deltas we observe on the reference rig with cleared cache and SBP_DETERMINISTIC=1:

  • Compression ratio: bit-identical across runs for every codec we measure (deterministic).
  • Encode wall time: ±5% typical, up to ±15% on small inputs (< 1 MB) where startup cost dominates.
  • Decode wall time: ±10% typical. More noise than encode because decode is short and more sensitive to thermal/load.
  • Peak RSS: ±2% typical (allocator behavior).
  • Energy proxy: tracks (user + sys) CPU time, so it falls in the same bands as the timing metrics.

If your numbers fall outside these bands, check that the cache was cleared, SBP_DETERMINISTIC=1 was exported, no other heavy processes were running (the published runs were taken under a load average below 1.0), and the CPU had thermal headroom (use sensors to confirm it is not throttling).

  • Cross-platform reproducibility. The published numbers are Linux x86_64. Bindu builds and runs on macOS / aarch64, but those throughput numbers will differ. Ratios should be identical.
  • Cross-version comparability. Bindu is under active development. Pin the binary version (bindu --version) in any cited comparison. The published numbers are from the version tagged in bench/corporate/results/run.log.
  • Power-measurement accuracy. The energy figures are a CPU-time proxy. Use a kill-a-watt or a root-mode RAPL reader for absolute joules.