# Benchmarks
This page is the consolidated head-to-head: Bindu vs every lossless compressor we could install on a single test rig, across three industry-standard corpora. 349 measured runs, 100% round-trip verified by SHA-256.
To reproduce these numbers yourself, see the Reproducing benchmarks methodology page.
## Test rig

| Component | Spec |
|---|---|
| CPU | AMD Ryzen 7 8745HS — 8 cores, TDP 35 W |
| RAM | 27 GB DDR5 |
| OS | Linux x86_64 |
| Verification | SHA-256 of decompressed output ≡ SHA-256 of original input |
All commodity compressors were invoked with single-thread flags (`xz -T1`, `zstd --single-thread`; gzip/bzip2/brotli/zip are natively single-threaded) so per-core throughput is comparable across codecs. Bindu uses its `--shape`/`--dtype` hints on structured satellite data — that matches its real deployment path.
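The verification row above is a straight digest comparison: compress, decompress, and require the SHA-256 of the output to equal that of the input. A minimal sketch in Python, using stdlib `lzma` as a stand-in codec (the harness's actual codec invocations are not shown here):

```python
import hashlib
import lzma


def sha256(data: bytes) -> str:
    """Hex digest used for round-trip verification."""
    return hashlib.sha256(data).hexdigest()


def verify_round_trip(original: bytes) -> bool:
    """Compress, decompress, and require digest equality with the input."""
    compressed = lzma.compress(original, preset=9)
    restored = lzma.decompress(compressed)
    return sha256(restored) == sha256(original)


payload = b"telemetry frame " * 4096
assert verify_round_trip(payload)
```

Any codec that fails this check on any file is disqualified from the matrix, regardless of ratio.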
## Codecs measured

| Codec | Class | Files | Notes |
|---|---|---|---|
| Bindu | symbolic / formulaic | 30 | this benchmark |
| xz | LZMA2 | 30 | levels 1, 6, 9, 9e |
| brotli | LZ77 + static dictionary | 14 | levels 6, 9, 11 |
| bzip2 | BWT + Huffman | 30 | levels 1, 9 |
| zstd | FSE + LZ77 | 30 | levels 1, 3, 9, 19, 22 --long |
| gzip | DEFLATE | 30 | levels 1, 6, 9 |
| zip | DEFLATE archiver | 14 | level 6 |
| aec | CCSDS 121.0-B-3 Rice | 10 | satellite-applicable only |
| zfp | LLNL reversible | 2 | float-only |
| flac | Lossless audio | 3 | int16-only |
A note on aec (CCSDS 121 Adaptive Entropy Coding): this is the lossless reference codec used by NOAA, ESA, JAXA, and NASA missions, and embedded in HDF5 via SZIP. It is sometimes mis-typed as "AES"; AES is encryption (Rijndael), not a compressor, and plays no part in this benchmark.
## Corpus

Three industry-standard corpora, 30 files total, ~1.5 GB raw:
| Corpus | Files | Total raw | Description |
|---|---|---|---|
| Silesia | 12 | 202 MB | Industry-standard generic benchmark — text, binaries, scientific images, structured records |
| Satellite / telemetry | 16 | 306 MB | SAR (Umbra), MSI (Sentinel-2), HSI (AVIRIS), weather (GOES-16), astronomy (Chandra), space-weather telemetry (MMS, OMNI, THEMIS), SSA text (AIS, ADS-B) |
| Hutter Prize / LTCB | 2 | 1.05 GB | enwik8 (100 MB) and enwik9 (1 GB) — first 100 MB and first 1 GB of the English Wikipedia XML dump (Mar 2006), Matt Mahoney’s canonical benchmark |
## Per-file ratio table

The headline result: ratio (uncompressed / compressed) for each codec on each file. Bold = winner per file.
| Corpus | File | Size | Bindu | xz | brotli | bzip2 | zstd | gzip | zip | aec | zfp | flac |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Silesia | dickens | 9.7 MB | 4.02× | 3.60× | 3.60× | 3.64× | 3.58× | 2.65× | 2.63× | — | — | — |
| Silesia | mozilla | 48.8 MB | 2.88× | 3.83× | 3.69× | 2.86× | 3.42× | 2.70× | 2.69× | — | — | — |
| Silesia | webster | 39.5 MB | 5.75× | 4.95× | 4.92× | 4.80× | 4.90× | 3.44× | 3.40× | — | — | — |
| Silesia | x-ray | 8.1 MB | 2.13× | 1.89× | 1.81× | 2.09× | 1.64× | 1.41× | 1.40× | — | — | — |
| Silesia | xml | 5.1 MB | 12.63× | 12.29× | 12.41× | 12.12× | 11.80× | 8.07× | 7.72× | — | — | — |
| Silesia | samba | 20.6 MB | 5.03× | 5.78× | 5.74× | 4.75× | 5.57× | 4.00× | 3.96× | — | — | — |
| Silesia | osdb | 9.6 MB | 3.98× | 3.55× | 3.58× | 3.60× | 3.26× | 2.71× | 2.70× | — | — | — |
| Silesia | reymont | 6.3 MB | 5.90× | 5.04× | 4.97× | 5.32× | 4.92× | 3.64× | 3.57× | — | — | — |
| Silesia | sao | 6.9 MB | 1.44× | 1.64× | 1.58× | 1.47× | 1.45× | 1.36× | 1.36× | — | — | — |
| Silesia | nci | 32.0 MB | 24.79× | 23.15× | 22.08× | 18.51× | 20.84× | 11.23× | 10.49× | — | — | — |
| Silesia | ooffice | 5.9 MB | 2.14× | 2.54× | 2.48× | 2.15× | 2.37× | 1.99× | 1.99× | — | — | — |
| Silesia | mr | 9.5 MB | 4.22× | 3.63× | 3.53× | 4.08× | 3.21× | 2.71× | 2.70× | — | — | — |
| Satellite | SAR_Umbra | 15.3 MB | 1.46× | 1.43× | — | 1.40× | 1.36× | 1.33× | — | 1.33× | — | — |
| Satellite | MSI_S2_B04 | 47.7 MB | 2.81× | 2.65× | — | 2.82× | 2.41× | 2.20× | — | 2.81× | — | — |
| Satellite | MSI_S2_B11 | 57.5 MB | 3.10× | 2.82× | — | 3.03× | 2.56× | 2.29× | — | 3.22× | — | — |
| Satellite | HSI_Salinas | 47.5 MB | 3.01× | 3.23× | — | 2.38× | 2.38× | 1.46× | — | 1.83× | — | 1.83× |
| Satellite | HSI_IndianPines | 8.0 MB | 1.99× | 1.74× | — | 1.69× | 1.39× | 1.24× | — | 1.73× | — | — |
| Satellite | WX_GOES_Ch01 | 28.6 MB | 21.98× | 17.79× | — | 21.57× | 15.93× | 12.30× | — | 20.25× | — | 9.98× |
| Satellite | WX_GOES_Ch13 | 7.2 MB | 6.74× | 5.34× | — | 5.91× | 4.95× | 4.07× | — | 5.93× | — | 5.73× |
| Satellite | SCI_Chandra | 2.4 MB | 7.41× | 6.94× | — | 7.16× | 5.89× | 4.78× | — | 1.25× | — | — |
| Satellite | TEL_MMS_Epoch | 10.5 MB | 1.34× | 1.97× | — | 1.46× | 1.98× | 1.68× | — | — | — | — |
| Satellite | TEL_OMNI_Epoch | 348.8 KB | 2,349× | 3.70× | — | 3.02× | 3.24× | 2.85× | — | — | 2.17× | — |
| Satellite | TEL_OMNI_IMF | 174.4 KB | 220× | 128× | — | 201× | 157× | 160× | — | 36× | — | — |
| Satellite | TEL_MMS_flag | 5.3 MB | 263,314× | 5,933× | — | 112,849× | 29,103× | 1,025× | — | 1,213× | — | — |
| Satellite | TEL_MMS_B_gse | 21.1 MB | 1.62× | 1.44× | — | 1.10× | 1.13× | 1.13× | — | — | 1.00× | — |
| Satellite | TEL_THEMIS_B | 368.0 KB | 1.11× | 1.26× | — | 1.03× | 1.09× | 1.09× | — | — | — | — |
| Satellite | SSA_AIS | 50.0 MB | 8.75× | 9.30× | — | 4.16× | 8.68× | 2.81× | — | — | — | — |
| Satellite | SSA_ADSB | 3.8 MB | 10.33× | 8.94× | — | 9.76× | 8.79× | 6.55× | — | — | — | — |
| Hutter | enwik8 | 95.4 MB | 4.29× | 4.03× | 3.88× | 3.45× | 3.95× | 2.74× | 2.74× | — | — | — |
| Hutter | enwik9 | 953.7 MB | 5.43× | 4.69× | 3.87× | 3.94× | 4.25× | 3.10× | 3.09× | — | — | — |
## Per-file winners

| Codec | Wins | Domains |
|---|---|---|
| Bindu | 19 | Generic text, structured records, scientific imaging, weather, astronomy, space-weather telemetry, SSA records, Hutter Prize |
| xz | 7 | x86 binaries, source code, certain hyperspectral cubes, AIS records |
| bzip2 | 2 | medical imaging, multispectral red band |
| aec (CCSDS 121) | 1 | Sentinel-2 SWIR band |
| zstd | 1 | int64 timestamps |

Bindu wins 19 of 30 files outright (63%) — more than every other codec combined.
## Per-codec aggregate

Across files where each codec applies (specialized codecs only run on dtypes they support):

| Codec | Files | Total in | Total out | Comp % | Avg enc MB/s | Avg dec MB/s | Peak RSS enc |
|---|---|---|---|---|---|---|---|
| Bindu | 30 | 1.5 GB | 343 MB | 77.95% | 0.6 | 4.6 | 19.5 GB |
| xz | 30 | 1.5 GB | 371 MB | 76.15% | 1.9 | 105 | 675 MB |
| brotli | 14 | 1.2 GB | 318 MB | 74.54% | 2.0 | 493 | 237 MB |
| zstd | 30 | 1.5 GB | 412 MB | 73.51% | 2.3 | 949 | 747 MB |
| bzip2 | 30 | 1.5 GB | 435 MB | 72.03% | 22 | 39 | 9 MB |
| zip | 14 | 1.2 GB | 409 MB | 67.34% | 32 | 216 | 3 MB |
| gzip | 30 | 1.5 GB | 553 MB | 64.51% | 22 | 234 | 2.1 MB |
| flac | 3 | 83 MB | 30 MB | 63.84% | 153 | 265 | 4 MB |
| aec (CCSDS 121) | 10 | 220 MB | 81 MB | 62.94% | 470 | 470 | 32 MB |
| zfp | 2 | 21 MB | 21 MB | 0.44% | 250 | 173 | 44 MB |
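The Comp % column is space saved, 100 × (1 − out/in). A quick sketch of the arithmetic, noting that the table's in/out totals are rounded for display so recomputing from them lands near, not exactly on, the published percentages:

```python
def comp_percent(total_in: float, total_out: float) -> float:
    """Space saved as a percentage: 100 * (1 - out/in)."""
    return 100.0 * (1.0 - total_out / total_in)


# The aec row: 220 MB in, 81 MB out gives roughly 63% saved.
# Exact table figures come from unrounded byte counts in the CSVs.
assert round(comp_percent(220, 81)) == 63
```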
## Hutter Prize / Large Text Compression Benchmark

Bindu’s results on enwik8 and enwik9 placed against published LTCB rankings and historical Hutter Prize entries:
### enwik8 — 100 MB English Wikipedia

| Compressor | Ratio | bits/byte | Class | Source |
|---|---|---|---|---|
| cmix v18 | 7.10× | 1.13 | neural context-mix | Hutter Prize 2024 (published) |
| nncp v2 | 6.50–6.80× | 1.18–1.23 | neural arithmetic | published |
| paq8hp12 | 6.32× | 1.27 | tuned PAQ for enwik8 | published 2008 |
| zpaq -m6 | 6.11× | 1.31 | classical extreme | published |
| paq8l | 5.95× | 1.34 | PAQ ensemble | Hutter Prize 2009 |
| brotli -q11 | 4.68× | 1.71 | LZ77 + dict | published (Google) |
| xz -9 / lzma -9e | 4.65× | 1.72 | LZMA2 | published |
| zstd -22 | 4.30× | 1.86 | FSE + LZ77 | published (Meta) |
| zpaq -m5 | 4.34× | 1.84 | classical context-mix | published |
| Bindu | 4.29× | 1.87 | symbolic / formulaic | measured |
| bzip2 -9 | 3.71× | 2.16 | BWT | published |
| gzip -9 | 3.13× | 2.55 | DEFLATE | published |
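The Ratio and bits/byte columns carry the same information: an uncompressed byte is 8 bits, so bits/byte = 8 / ratio. A sanity check against the measured Bindu row:

```python
def bits_per_byte(ratio: float) -> float:
    """Convert a compression ratio (uncompressed/compressed) to bits per byte."""
    return 8.0 / ratio


# Bindu's measured enwik8 ratio of 4.29x corresponds to ~1.87 bits/byte.
assert abs(bits_per_byte(4.29) - 1.87) < 0.01
```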
### enwik9 — 1 GB English Wikipedia

| Compressor | Ratio | bits/byte | Class | Source |
|---|---|---|---|---|
| cmix v18+ | 7.0–7.1× | 1.13–1.15 | neural context-mix | published |
| nncp v2 | ~6.5× | 1.23 | neural arithmetic | published |
| paq8 family | ~6.0× | 1.33 | PAQ ensemble | published |
| Bindu | 5.43× | 1.47 | symbolic / formulaic | measured |
| zpaq -m5 | ~4.7–5.0× | 1.60–1.70 | classical | published |
| xz -9 | 4.69× | 1.71 | LZMA2 | measured |
| zstd -19 | 4.25× | 1.88 | FSE + LZ77 | measured |
| bzip2 -9 | 3.94× | 2.03 | BWT | measured |
| brotli -9 | 3.87× | 2.07 | LZ77 + dict | measured |
| gzip -9 | 3.10× | 2.58 | DEFLATE | measured |
### Where Bindu sits in the LTCB landscape

| Tier | Compressors | enwik8 ratio range | Notes |
|---|---|---|---|
| Neural (Hutter Prize-class) | cmix, nncp, paq8 | 5.95–7.10× | GPU/long compute; file-specific learned models |
| Bindu | Bindu | 4.29× | symbolic, single CPU, deterministic |
| Classical max-ratio | xz, brotli-11, zstd-22, zpaq-m5 | 3.95–4.68× | mainstream production codecs |
| Classical mainstream | bzip2, gzip | 2.74–3.71× | ubiquitous baselines |
On enwik9 (1 GB), Bindu’s 5.43× / 1.47 bits/byte is better than any non-neural codec measured at this scale, beating `xz -9` by 16% and within striking distance of the lower neural tier. The Hutter Prize award threshold is 7.10× / 1.13 bits/byte — historically reachable only with neural context mixing.
## ALP corpus — float64 time-series

Separately from the unified Silesia/satellite/Hutter benchmark, Bindu was measured against the ALP reference implementation on the 30-dataset ALP float-time-series corpus.
| Metric | Result |
|---|---|
| Geometric mean bits/value vs ALP | 1.88× fewer |
| Per-dataset wins | 27 of 30 vs ALP, Chimp, and Patas |
The result is meaningful because ALP is the published state-of-the-art for lossless float compression; the 27/30 outcome is a clear lead on the workload it was designed for. Bindu uses the DICT and LINDELTA pipelines for this domain — see Architecture for the routing logic.
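The headline figure is a geometric mean over per-dataset bits/value ratios, which keeps one extreme dataset from dominating the average the way an arithmetic mean would. A minimal sketch, with hypothetical per-dataset ratios standing in for the real 30:

```python
import math


def geometric_mean(values: list[float]) -> float:
    """Geometric mean: the n-th root of the product, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-dataset ratios of ALP bits/value over Bindu bits/value.
ratios = [1.5, 2.0, 2.2, 1.7, 2.1]
gm = geometric_mean(ratios)
assert 1.5 < gm < 2.2  # always between the min and max of the inputs
```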
## Headline trade-offs

| Axis | Winner | Value |
|---|---|---|
| Best ratio overall | Bindu | 77.95% aggregate; 19/30 per-file wins |
| Highest encode throughput | aec (CCSDS 121) | 470 MB/s — onboard-spacecraft champion |
| Highest decode throughput | zstd -1 | 2,355 MB/s peak |
| Lowest RAM (compress) | gzip | 2.1 MB peak |
| Lowest energy (J/MB encoded) | aec | 0.009 J/MB |
| Highest single-file ratio | Bindu on TEL_MMS_flag | 263,314× (5.5 MB → 21 bytes) |
## Domain recommendations

| Workload | Recommended codec | Why |
|---|---|---|
| Cold archival, mixed generic data | xz -9e | well-established, 77% Silesia aggregate |
| Cold archival, telemetry / structured | Bindu | wins 19/30 files; extreme ratios (200–263k×) on sparse telemetry |
| Hutter Prize / large text | Bindu | leads enwik8 (4.29×) and enwik9 (5.43×) among classical codecs |
| Onboard spacecraft (CPU/RAM constrained) | aec | 62.94% at 470 MB/s, 32 MB RAM, 0.009 J/MB |
| Streaming / low-latency ingest | zstd -1 | 451 MB/s enc, 2,355 MB/s dec |
| Read-heavy / CDN | zstd -1 | decode leader by ~2× |
| x86 binaries | xz -9e | LZMA2 with executable filters tuned for this |
| Int16 sensor / time-series | flac | 63.8% at 153 MB/s, 4 MB RAM |
## Honest call-outs

Not every file in the corpus is a Bindu win. The places it loses are worth flagging explicitly:
- Silesia `mozilla`, `samba`, `ooffice` — large source-tree archives containing x86 binaries. `xz -9e`’s LZMA2 with executable filters wins by a small margin. If your archive is dominated by x86 binaries, xz is the better default.
- Sentinel-2 multispectral B11 — `aec` (CCSDS 121) is marginally smaller. The specialized telemetry codec wins on the specific multispectral band it was tuned for.
- MMS Epoch / TEL_THEMIS_B / certain int64 timestamp encodings — `zstd` and `xz` come out ahead by small margins on a handful of timestamp formats.
The pattern: where a domain-specific codec is hand-tuned for a particular bit pattern (executable code, multispectral imagery, fixed-width timestamps), it can edge ahead of Bindu’s general-purpose pipeline. Where the data has structure that Bindu’s grammar / BWT / dictionary / stride pipelines can capture, Bindu’s per-file numbers lead.
## Caveats

- Single host, single run. Production procurement should replicate on n ≥ 3 runs across multiple host SKUs.
- Energy is a CPU-time proxy (`(user + sys) × 4.375 W/core`); RAPL is root-only on this rig and `perf_event_paranoid=4` blocks user-mode `power/energy-pkg/`. Ranking between codecs is preserved; absolute joule numbers require an out-of-band reader.
- No commercial codecs measured (DAPCOM FAPEC, OptimFROG). The benchmark harness picks them up if they appear on `PATH`.
- No neural context-mix codecs measured (cmix, nncp, paq8). Their published numbers are cited above for reference.
- Bindu uses multi-threaded SBPN mode on structured satellite data (via `--shape`/`--dtype` hints). Commodity codecs were single-threaded for per-core comparability. Parallel variants (`pigz`, `pbzip2`, `pixz`, `zstd -T0`) would scale encode near-linearly with cores without changing ratio.
- `fpack` (FITS Rice) is installed but excluded from the timed matrix because it requires a synthetic FITS wrapper around raw arrays.
- Bindu memory cost on enwik9: ~20 GB peak RSS to encode 1 GB. The BWT suffix-array allocation dominates; it won’t fit on a 16 GB host. Smaller inputs are unaffected.
## Reproducibility

The full per-run CSVs (`results.csv`, `satellite_results.csv`, `enwik_results.csv`) ship with this benchmark and contain every measurement: ratio, compressed bytes, wall/user/sys time, peak RSS, page faults, context switches, energy proxy, and SHA-256 round-trip status. To reproduce on your own hardware, see Reproducing benchmarks.