# Reproducing benchmarks
This page walks through reproducing every number in the Benchmarks page. All measurements are deterministic, all corpora are publicly downloadable, and round-trip verification is built into the harness.
If your numbers diverge from the published ones by more than ~5% on ratio (or ~20% on speed, given hardware variance), open a GitHub issue — that’s a bug we want to know about.
## What you’ll measure

Per (codec × file) combination:
- Compressed size in bytes → ratio + reduction percentage
- Wall, user, and system CPU time for compress and decompress separately
- Peak resident set size (max RSS) during compress and decompress
- Major / minor page faults and voluntary / involuntary context switches
- CPU utilization percentage
- Energy proxy in joules: `(user + sys) × per-core-package-power`
- Round-trip integrity via SHA-256 (decoded ≡ original)
All metrics come straight from `getrusage(2)` via GNU `/usr/bin/time` with a custom format string. No instrumentation hooks, no internal counters.
## Prerequisites

### Hardware

A 64-bit Linux x86_64 box with:
- ≥ 24 GB RAM if you plan to run `enwik9` (the 1 GB Hutter Prize file). Bindu’s BWT stage needs ~16 GB peak for the suffix array alone; total peak RSS during the enwik9 encode is ~20 GB. Smaller corpora (Silesia, satellite) fit comfortably in 8 GB.
- ≥ 80 GB free disk (corpora ~2 GB; per-codec working artifacts and decompressed copies during round-trip verification add another ~10× working footprint at peak).
- A reasonably modern x86 CPU. The published numbers are from an AMD Ryzen 7 8745HS (8 cores, 35 W TDP).
### OS packages

Install GNU time (the `/usr/bin/time` binary, distinct from the shell builtin) plus the commodity codecs we benchmark:

```shell
sudo apt install \
  time gzip bzip2 xz-utils zstd brotli zip unzip \
  build-essential
```

Verify each:

```shell
/usr/bin/time --version
gzip --version | head -1
bzip2 --version 2>&1 | head -1
xz --version | head -1
zstd --version
brotli --version
zip -v | head -1
```

Any of `pigz`, `pbzip2`, `pixz`, `lz4`, `lzop`, `lrzip`, `zpaq`, `7z` will be auto-included by the harness if present on PATH. None are required.
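The auto-include check amounts to a PATH lookup. A minimal sketch of the idea using `shutil.which` (this is an illustration, not the harness’s actual detection code):

```python
import shutil

# Optional parallel/extra codecs the harness picks up when present on PATH.
OPTIONAL_CODECS = ["pigz", "pbzip2", "pixz", "lz4", "lzop", "lrzip", "zpaq", "7z"]

def detect_optional(codecs=OPTIONAL_CODECS):
    """Return the subset of optional codec binaries found on PATH."""
    return [c for c in codecs if shutil.which(c) is not None]
```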
### Specialized codecs (optional, for the satellite/scientific corpus)

```shell
# libaec — CCSDS 121.0-B-3 Adaptive Entropy Coding (the satellite standard)
sudo apt install libaec-tools

# zfp — LLNL reversible float compression
git clone https://github.com/LLNL/zfp && cd zfp && mkdir build && cd build
cmake .. && make -j && sudo make install

# flac — surrogate codec for int16 streams
sudo apt install flac

# fpack — FITS Rice (astronomy)
sudo apt install libcfitsio-bin
```

If you skip these, the satellite-section runs will simply mark those codecs n/a for files where they don’t apply.
## Build Bindu

The benchmark uses Bindu built from source with release flags so behavior matches the published binary exactly:

```shell
git clone https://github.com/bindu-labs/bindu
cd bindu
gcc -O3 -mavx2 -march=native -DSOVEREIGN_CLI \
  -o bindu sovereign2.c -lm -lpthread
./bindu --version
```

Using `-march=native` instead of `-march=znver4` keeps the same SIMD coverage on whatever CPU you’re on. For strict reproducibility against the published numbers, use `-march=znver4` if you have a Zen 4 part, or accept that throughput will differ on other microarchitectures (ratios will not).
## Corpora

Three industry-standard corpora. All publicly downloadable, all distributed by their original maintainers.

### Silesia (generic, 202 MB)

The industry-standard generic compression benchmark: 12 files spanning English text, x86 binaries, scientific imaging, and structured records.

```shell
mkdir -p corpus/silesia && cd corpus/silesia
curl -L https://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip -o silesia.zip
unzip silesia.zip && rm silesia.zip
cd ../..
```

Expected files: dickens, mozilla, mr, nci, ooffice, osdb, reymont, samba, sao, webster, x-ray, xml. Total ~202 MB.
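A quick sanity check that extraction produced all 12 expected files. This is a convenience sketch (the function name is ours, not part of the harness):

```python
from pathlib import Path

# The 12 Silesia files listed above.
SILESIA_FILES = ["dickens", "mozilla", "mr", "nci", "ooffice", "osdb",
                 "reymont", "samba", "sao", "webster", "x-ray", "xml"]

def missing_silesia(root="corpus/silesia"):
    """Return the names of any expected Silesia files not present under root."""
    return [name for name in SILESIA_FILES if not (Path(root) / name).is_file()]
```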
### Hutter Prize / LTCB (large text, 1.05 GB)

The canonical Wikipedia compression benchmark.

```shell
mkdir -p corpus/hutter && cd corpus/hutter
curl -L http://mattmahoney.net/dc/enwik8.zip -o enwik8.zip
curl -L http://mattmahoney.net/dc/enwik9.zip -o enwik9.zip
unzip enwik8.zip && unzip enwik9.zip
rm enwik8.zip enwik9.zip
cd ../..
```

Verify:

```shell
md5sum corpus/hutter/enwik8 corpus/hutter/enwik9
# Expected:
# a1fa5ffddb56f4953e226637dabbb36a  corpus/hutter/enwik8
# e206c3450ac99950df65bf70ef61a12d  corpus/hutter/enwik9
```

### Satellite / telemetry (306 MB)
This corpus is assembled from public NASA/ESA mission archives. It’s larger to set up; the satellite extraction script downloads source files from CDAWeb, the Copernicus Open Access Hub, and HEASARC, then writes raw arrays to corpus/satellite/. Run:

```shell
bench/satellite/extract.py --out corpus/satellite/
```

Expected layout under corpus/satellite/:

- `ccsds/raw/` — MMS, OMNI, THEMIS space-weather telemetry
- `commercial/raw/` — Sentinel-2 MSI bands, AVIRIS hyperspectral, Umbra SAR
- `raw/` — GOES-16 weather imagery, Chandra X-ray events
- `sectors/` — AIS ship tracking CSV, ADS-B flight tracking JSON

This corpus is optional. Skip it if you only need the generic + Hutter Prize numbers.
## The harness

The benchmark harness is a Python driver that wraps each codec invocation in `/usr/bin/time` (with a custom `-f` format string), parses the rusage output, verifies the SHA-256 round-trip, and writes per-run JSON + CSV.

The full harness is in bench/corporate/harness.py. The core measurement primitive is small enough to read inline:
```python
import os, subprocess, hashlib, re
from pathlib import Path

TIME_FMT = ("wall=%e user=%U sys=%S maxrss=%M "
            "pagefault_major=%F pagefault_minor=%R "
            "ctx_vol=%w ctx_invol=%c cpu_pct=%P exit=%x")

_re = re.compile(
    r"wall=(?P<wall>[\d.]+)\s+user=(?P<user>[\d.]+)\s+sys=(?P<sys>[\d.]+)\s+"
    r"maxrss=(?P<rss>\d+)\s+pagefault_major=(?P<fmaj>\d+)\s+"
    r"pagefault_minor=(?P<fmin>\d+)\s+ctx_vol=(?P<cvol>\d+)\s+"
    r"ctx_invol=(?P<cinv>\d+)\s+cpu_pct=(?P<cpu>\d+)%?\s+exit=(?P<exit>\d+)")

def measure(argv, stdin_path=None, stdout_path=None, err_path=None):
    """Run argv under /usr/bin/time -f and return parsed rusage."""
    fin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
    fout = open(stdout_path, "wb") if stdout_path else subprocess.DEVNULL
    cmd = ["/usr/bin/time", "-f", TIME_FMT] + argv
    with open(err_path, "w") as ferr:
        proc = subprocess.run(cmd, stdin=fin, stdout=fout, stderr=ferr)
    if stdin_path:
        fin.close()
    if stdout_path:
        fout.close()
    m = _re.search(Path(err_path).read_text())
    return {**m.groupdict(), "rc": proc.returncode}

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

A single (codec, file) measurement looks like this:
```python
def bench_one(codec_argv_enc, codec_argv_dec, src, comp, dec):
    c = measure(codec_argv_enc, stdin_path=src, stdout_path=comp,
                err_path=comp + ".err")
    d = measure(codec_argv_dec, stdin_path=comp, stdout_path=dec,
                err_path=dec + ".err")
    return {
        "ratio": os.path.getsize(src) / os.path.getsize(comp),
        "comp_pct": (1 - os.path.getsize(comp) / os.path.getsize(src)) * 100,
        "c_wall": float(c["wall"]),
        "d_wall": float(d["wall"]),
        "c_rss_kb": int(c["rss"]),
        "d_rss_kb": int(d["rss"]),
        "roundtrip": sha256(dec) == sha256(src),
    }
```

## Codec invocation

Single-threaded invocations for fair per-core comparison. Each codec is run at multiple levels and the harness records all of them; the Industry Benchmark tables show the best-ratio level per codec per file.
| Codec | Compress | Decompress |
|---|---|---|
| Bindu | `bindu c <in> <out>` | `bindu d <in> <out>` |
| Bindu (with shape hints) | `bindu c --shape WxH --dtype TYPE <in> <out>` | `bindu d <in> <out>` |
| gzip | `gzip -<L> -c` (stdin → stdout) | `gzip -d -c` |
| bzip2 | `bzip2 -<L> -c` | `bzip2 -d -c` |
| xz | `xz -<L> -c -T1` | `xz -d -c -T1` |
| xz (extreme) | `xz -9e -c -T1` | `xz -d -c -T1` |
| zstd | `zstd -<L> -c -q --single-thread` | `zstd -d -c -q` |
| zstd (long-range) | `zstd -22 --ultra --long=27 -c -q --single-thread` | `zstd -d --long=27 -c -q` |
| brotli | `brotli -q <L> -c` | `brotli -d -c` |
| zip | `zip -q -j -<L> <out> <in>` | `unzip -q -o -p <in>` |
| aec (CCSDS 121) | `aec [-s] -n <bits> -j 16 -r 128 <in> <out>` | `aec -d [-s] -n <bits> -j 16 -r 128 <in> <out>` |
| zfp (reversible) | `zfp -<f\|d> -<dim args> -i <in> -z <out> -R` | `zfp -<f\|d> -<dim args> -z <in> -o <out> -R` |
| flac (raw int16) | `flac -8 --force-raw-format --endian=little --sign=signed --channels=1 --bps=16 --sample-rate=48000 -o <out> <in>` | `flac -d --force-raw-format --endian=little --sign=signed -o <out> <in>` |
The single-thread flags (-T1, --single-thread) ensure the measurement reflects per-core performance. Parallel variants (pigz, pbzip2, pixz, zstd -T0) would scale encode throughput near-linearly with cores without changing ratio.
Bindu’s --shape/--dtype hints activate its multi-threaded SBPN mode on structured satellite data — this matches its real deployment. The hints are a no-op for unstructured generic data.
## Cache hygiene (critical)

Bindu’s content-addressed cache stores prior compression results keyed by the SHA of the input. Without clearing it between runs, repeated measurements on the same input return cached output instantly, and the wall time will not reflect actual compression work.

Always clear the cache before a benchmark run:

```shell
rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
export SBP_DETERMINISTIC=1
```

The SBP_DETERMINISTIC=1 environment variable disables Bindu’s online learning loops so back-to-back runs produce bit-identical output.
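For repeated timing runs, the clear-then-measure cycle can be wrapped in a small helper that follows the run-three-times, report-the-median advice used elsewhere on this page. This is a hypothetical convenience (the function name and approach are ours, not part of the harness):

```python
import shutil, statistics, subprocess, time
from pathlib import Path

def median_cold_wall(argv, runs=3):
    """Run argv `runs` times with a cold Bindu cache; return median wall time."""
    walls = []
    for _ in range(runs):
        # Cold-cache reset before every iteration, mirroring the shell commands above.
        shutil.rmtree(Path.home() / ".sbp_cache", ignore_errors=True)
        (Path.home() / ".sbp_ananta.bin").unlink(missing_ok=True)
        t0 = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        walls.append(time.perf_counter() - t0)
    return statistics.median(walls)
```

Note this measures wall time with `time.perf_counter` rather than rusage; for the full metric set, use the harness's `measure()` primitive instead.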
## Running the full matrix

```shell
git clone https://github.com/bindu-labs/bindu
cd bindu
make release          # builds ./bindu
rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
export SBP_DETERMINISTIC=1
python3 bench/corporate/harness.py
```

The harness writes:

```
bench/corporate/results/
├── results.json            # Silesia: 228 runs (12 files × 19 configs)
├── results.csv
├── satellite_results.json  # 95 runs (16 files × multiple codecs)
├── satellite_results.csv
├── enwik_results.json      # 26 runs (enwik8 full level matrix + enwik9 best level per codec)
├── enwik_results.csv
└── run.log                 # human-readable progress log
```

A complete run on the reference rig (Ryzen 7 8745HS, single-threaded) takes:
| Corpus | Wall time |
|---|---|
| Silesia (228 runs) | ~13 min |
| Satellite (95 runs) | ~6 min |
| Hutter Prize enwik8 (19 configs) | ~30 min |
| Hutter Prize enwik9 (7 configs) | ~75 min |
| Total | ~2 hours |
The dominant cost is xz-9e and brotli-11 on enwik9. If you skip enwik9, total drops to ~50 min.
## Round-trip verification

Every successful run produces a SHA-256 hash of the decompressed output and compares it to the SHA-256 of the original input. No run counts as successful unless the hashes match. Codec-specific gotchas:

- `aec` (CCSDS 121) pads its output to the block boundary: the decoded stream is byte-identical to the input for the first `orig_bytes`, with up to 24 bytes of trailing pad. The harness verifies against the prefix; this is spec-compliant and not data loss.
- `zfp` reversible requires the `-R` flag on both encode AND decode. Without it on decode, the codec rejects the stream.
- `brotli` requires a space between the `-q` flag and the level (`-q 11`, not `-q11`).
- `zip` uses a different invocation pattern (file mode rather than stdin/stdout). The harness handles this.
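The `aec` prefix check can be expressed in a few lines. A sketch, assuming the decoded file is the original plus at most 24 trailing pad bytes (the function name is ours):

```python
def roundtrip_ok_with_pad(orig_path, dec_path, max_pad=24):
    """Prefix round-trip check for block-padded codecs such as aec:
    the decoded file must begin with the original bytes exactly and may
    carry at most max_pad trailing pad bytes."""
    with open(orig_path, "rb") as f:
        orig = f.read()
    with open(dec_path, "rb") as f:
        dec = f.read()
    if not (len(orig) <= len(dec) <= len(orig) + max_pad):
        return False
    return dec[:len(orig)] == orig
```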
## Computing the metrics

Given the per-run JSON output, the published metrics are:
| Metric | Formula |
|---|---|
| Compression ratio | orig_bytes / comp_bytes |
| Compression % | (1 - comp_bytes / orig_bytes) × 100 |
| Encode throughput (MB/s) | orig_bytes / 1e6 / c_wall |
| Decode throughput (MB/s) | orig_bytes / 1e6 / d_wall |
| CPU utilization | (c_user + c_sys) / c_wall × 100% |
| Energy proxy (J) | (c_user + c_sys) × (TDP / num_cores) |
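Applied to one run record, the formulas look like this. The field names (`orig_bytes`, `comp_bytes`, `c_user`, `c_sys`, and so on) are assumptions about the JSON layout, chosen to match the names used in the table:

```python
def derived_metrics(run, tdp_w=35.0, num_cores=8):
    """Compute the published metrics from one per-run record.
    Field names in `run` are assumptions about the harness JSON layout."""
    per_core_w = tdp_w / num_cores          # 35 W / 8 cores = 4.375 W per core
    cpu_s = run["c_user"] + run["c_sys"]    # compress-side CPU time
    return {
        "ratio": run["orig_bytes"] / run["comp_bytes"],
        "comp_pct": (1 - run["comp_bytes"] / run["orig_bytes"]) * 100,
        "enc_mb_s": run["orig_bytes"] / 1e6 / run["c_wall"],
        "dec_mb_s": run["orig_bytes"] / 1e6 / run["d_wall"],
        "cpu_pct": cpu_s / run["c_wall"] * 100,
        "energy_j": cpu_s * per_core_w,
    }
```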
For the published runs the energy proxy uses 35 W / 8 cores = 4.375 W per core, derived from the Ryzen 7 8745HS package TDP. This is a CPU-time proxy, not a hardware power measurement: the system’s intel-rapl `energy_uj` nodes are root-only on most distros, and `perf_event_paranoid=4` (the default on Ubuntu 24.04+) blocks user-mode `power/energy-pkg/`. The proxy preserves the relative ranking between codecs but should not be quoted as an absolute joule figure. To get true package energy, run as root:

```shell
sudo cat /sys/class/powercap/intel-rapl:0/energy_uj
```

before and after each measurement, and subtract.
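The subtraction has one wrinkle: `energy_uj` is a wrapping counter (its ceiling is published in the sibling powercap file `max_energy_range_uj`). A sketch of a wrap-tolerant delta:

```python
def rapl_delta_joules(before_uj, after_uj, max_range_uj):
    """Energy in joules between two energy_uj readings, tolerating a single
    counter wraparound. max_range_uj is the counter ceiling reported by
    the powercap max_energy_range_uj node."""
    delta = after_uj - before_uj
    if delta < 0:                  # counter wrapped between the two reads
        delta += max_range_uj
    return delta / 1e6             # microjoules -> joules
```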
## Re-running just Bindu after a code change

The harness can re-run a single codec without redoing the full matrix:

```shell
# Edit sovereign2.c, then:
gcc -O3 -mavx2 -march=native -DSOVEREIGN_CLI -o bindu sovereign2.c -lm -lpthread
rm -rf ~/.sbp_cache ~/.sbp_ananta.bin
python3 bench/corporate/rerun_bindu_only.py
```

This re-measures Bindu against all corpora (~80 min for the full set with enwik9, ~5 min if you skip enwik9), merges the new rows into the existing JSONs/CSVs while preserving the reference codec numbers, and regenerates the report. Useful for development loops where you’re tuning Bindu and don’t want to wait 2 hours for every commit.
## Common pitfalls

| Symptom | Cause | Fix |
|---|---|---|
| Bindu ratios “improve” suspiciously after a re-run | Cache hit returning prior output | `rm -rf ~/.sbp_cache ~/.sbp_ananta.bin` |
| enwik9 produces 0-byte output | int overflow in pre-1.0 builds (input > 536 MB) | Update to current main; `(size_t)n` casts have been added to the malloc paths |
| Decode throughput varies ±50% run-to-run | Thermal throttling, host load | Run 3× back-to-back with cold caches; report median |
| zfp decode fails | Missing `-R` on decompress | See Round-trip verification above |
| brotli rejects level | `-q11` instead of `-q 11` | Add the space |
| aec “round-trip mismatch” on tail bytes | Block-boundary padding | Verify prefix only; this is spec-compliant |
| Massive Bindu RSS at 1 GB+ | BWT suffix-array allocation (~16 GB on enwik9) | Need ≥ 24 GB RAM; chunked BWT is on the roadmap |
| /tmp/satdata disappears between runs | /tmp is wiped on reboot | Move the satellite corpus to a permanent location and update the harness’s corpus path |
## Producing the published report

After the harness completes, the report generator combines the three JSONs and emits the consolidated table:

```shell
python3 bench/corporate/unified_report.py > UNIFIED_REPORT.md
```

The output is the source of truth for Industry Benchmark. If you reproduce the matrix and your UNIFIED_REPORT.md differs from the published one by more than the documented variance bands, that’s a regression; please file it.
## Variance bands

These are the run-to-run deltas we observe on the reference rig with a cleared cache and SBP_DETERMINISTIC=1:

- Compression ratio: bit-identical across runs for every codec we measure (deterministic).
- Encode wall time: ±5% typical, up to ±15% on small inputs (< 1 MB) where startup cost dominates.
- Decode wall time: ±10% typical. Noisier than encode because decode is short and more sensitive to thermal state and host load.
- Peak RSS: ±2% typical (allocator behavior).
- Energy proxy: tracks `(user + sys)` CPU time, so it shares the same band as the timing figures.

If your numbers are outside these bands, check: cache cleared, SBP_DETERMINISTIC=1 set, no other heavy processes running (the published runs were under load average < 1.0), and thermal headroom (`sensors` to confirm the CPU is not throttling).
## What we don’t claim

- Cross-platform reproducibility. The published numbers are Linux x86_64. Bindu builds and runs on macOS / aarch64, but those throughput numbers will differ. Ratios should be identical.
- Cross-version comparability. Bindu is under active development. Pin the binary version (`bindu --version`) in any cited comparison. The published numbers are from the version tagged in `bench/corporate/results/run.log`.
- Power-measurement accuracy. The energy figures are a CPU-time proxy. Use a Kill A Watt meter or a root-mode RAPL reader for absolute joules.
## See also

- Industry Benchmark — the published results this page reproduces.
- vs gzip, vs brotli, vs zstd, vs xz / LZMA — focused head-to-head pages with narrative.
- When not to use Bindu — workloads where Bindu is the wrong choice.