# Benchmarks

This page is the consolidated head-to-head: Bindu vs every lossless compressor we could install on a single test rig, across three industry-standard corpora. 349 measured runs, 100% round-trip verified by SHA-256.

If you want to reproduce these numbers yourself, see the Reproducing benchmarks methodology page.

| Component | Spec |
| --- | --- |
| CPU | AMD Ryzen 7 8745HS — 8 cores, TDP 35 W |
| RAM | 27 GB DDR5 |
| OS | Linux x86_64 |
| Verification | SHA-256 of decompressed output ≡ SHA-256 of original input |

All commodity compressors were invoked with single-thread flags (`xz -T1`, `zstd --single-thread`; gzip/bzip2/brotli/zip are natively single-threaded) so that per-core throughput is comparable across codecs. Bindu uses its `--shape`/`--dtype` hints on structured satellite data, which matches its real deployment path.
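The full harness ships with the benchmark; as a minimal sketch of the invocation and verification pattern (synthetic input file and a subset of codecs for illustration, not the actual harness):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative 1 MB input; the real corpus files stand in here.
f=$(mktemp)
head -c 1048576 /dev/zero > "$f"
orig=$(sha256sum "$f" | cut -d' ' -f1)

# Single-threaded invocations (flags as in the codec table below);
# xz/zstd are skipped if not installed on this host.
if command -v xz   >/dev/null; then xz   -9e -T1                -c "$f" > "$f.xz";  fi
if command -v zstd >/dev/null; then zstd -19 --single-thread -q -c "$f" > "$f.zst"; fi
gzip -9 -c "$f" > "$f.gz"

# Round-trip verification: SHA-256 of the decompressed output must
# equal the SHA-256 of the original input, or the run is discarded.
for c in "$f".*; do
  case "$c" in
    *.xz)  dec="xz -dc"   ;;
    *.zst) dec="zstd -dc" ;;
    *.gz)  dec="gzip -dc" ;;
  esac
  [ "$($dec "$c" | sha256sum | cut -d' ' -f1)" = "$orig" ]  # exits nonzero on mismatch
  echo "round-trip OK: ${c##*.}"
done
rm -f "$f" "$f".*
```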

| Codec | Class | Files | Notes |
| --- | --- | --- | --- |
| Bindu | symbolic / formulaic | 30 | this benchmark |
| xz | LZMA2 | 30 | levels 1, 6, 9, 9e |
| brotli | LZ77 + static dictionary | 14 | levels 6, 9, 11 |
| bzip2 | BWT + Huffman | 30 | levels 1, 9 |
| zstd | FSE + LZ77 | 30 | levels 1, 3, 9, 19, 22 `--long` |
| gzip | DEFLATE | 30 | levels 1, 6, 9 |
| zip | DEFLATE archiver | 14 | level 6 |
| aec | CCSDS 121.0-B-3 Rice | 10 | satellite-applicable only |
| zfp | LLNL reversible | 2 | float-only |
| flac | Lossless audio | 3 | int16-only |

A note on aec (CCSDS 121 Adaptive Entropy Coding): this is the lossless reference codec used by NOAA, ESA, JAXA, and NASA missions, and embedded in HDF5 via SZIP. It is sometimes mis-typed as “AES”, but AES is encryption (Rijndael), not a compressor, and plays no part in this benchmark.

Three industry-standard corpora, 30 files total, ~1.5 GB raw:

| Corpus | Files | Total raw | Description |
| --- | --- | --- | --- |
| Silesia | 12 | 202 MB | Industry-standard generic benchmark — text, binaries, scientific images, structured records |
| Satellite / telemetry | 16 | 306 MB | SAR (Umbra), MSI (Sentinel-2), HSI (AVIRIS), weather (GOES-16), astronomy (Chandra), space-weather telemetry (MMS, OMNI, THEMIS), SSA text (AIS, ADS-B) |
| Hutter Prize / LTCB | 2 | 1.05 GB | enwik8 (100 MB) and enwik9 (1 GB) — first 100 MB and first 1 GB of the English Wikipedia XML dump (Mar 2006), Matt Mahoney’s canonical benchmark |

The headline result: compression ratio (uncompressed / compressed) for each codec on each file. **Bold** marks the winner per file.

| Corpus | File | Size | Bindu | xz | brotli | bzip2 | zstd | gzip | zip | aec | zfp | flac |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Silesia | dickens | 9.7 MB | **4.02×** | 3.60× | 3.60× | 3.64× | 3.58× | 2.65× | 2.63× | | | |
| Silesia | mozilla | 48.8 MB | 2.88× | **3.83×** | 3.69× | 2.86× | 3.42× | 2.70× | 2.69× | | | |
| Silesia | webster | 39.5 MB | **5.75×** | 4.95× | 4.92× | 4.80× | 4.90× | 3.44× | 3.40× | | | |
| Silesia | x-ray | 8.1 MB | **2.13×** | 1.89× | 1.81× | 2.09× | 1.64× | 1.41× | 1.40× | | | |
| Silesia | xml | 5.1 MB | **12.63×** | 12.29× | 12.41× | 12.12× | 11.80× | 8.07× | 7.72× | | | |
| Silesia | samba | 20.6 MB | 5.03× | **5.78×** | 5.74× | 4.75× | 5.57× | 4.00× | 3.96× | | | |
| Silesia | osdb | 9.6 MB | **3.98×** | 3.55× | 3.58× | 3.60× | 3.26× | 2.71× | 2.70× | | | |
| Silesia | reymont | 6.3 MB | **5.90×** | 5.04× | 4.97× | 5.32× | 4.92× | 3.64× | 3.57× | | | |
| Silesia | sao | 6.9 MB | 1.44× | **1.64×** | 1.58× | 1.47× | 1.45× | 1.36× | 1.36× | | | |
| Silesia | nci | 32.0 MB | **24.79×** | 23.15× | 22.08× | 18.51× | 20.84× | 11.23× | 10.49× | | | |
| Silesia | ooffice | 5.9 MB | 2.14× | **2.54×** | 2.48× | 2.15× | 2.37× | 1.99× | 1.99× | | | |
| Silesia | mr | 9.5 MB | **4.22×** | 3.63× | 3.53× | 4.08× | 3.21× | 2.71× | 2.70× | | | |
| Satellite | SAR_Umbra | 15.3 MB | **1.46×** | 1.43× | | 1.40× | 1.36× | 1.33× | | 1.33× | | |
| Satellite | MSI_S2_B04 | 47.7 MB | 2.81× | 2.65× | | **2.82×** | 2.41× | 2.20× | | 2.81× | | |
| Satellite | MSI_S2_B11 | 57.5 MB | 3.10× | 2.82× | | 3.03× | 2.56× | 2.29× | | **3.22×** | | |
| Satellite | HSI_Salinas | 47.5 MB | 3.01× | **3.23×** | | 2.38× | 2.38× | 1.46× | | 1.83× | | 1.83× |
| Satellite | HSI_IndianPines | 8.0 MB | **1.99×** | 1.74× | | 1.69× | 1.39× | 1.24× | | 1.73× | | |
| Satellite | WX_GOES_Ch01 | 28.6 MB | **21.98×** | 17.79× | | 21.57× | 15.93× | 12.30× | | 20.25× | | 9.98× |
| Satellite | WX_GOES_Ch13 | 7.2 MB | **6.74×** | 5.34× | | 5.91× | 4.95× | 4.07× | | 5.93× | | 5.73× |
| Satellite | SCI_Chandra | 2.4 MB | **7.41×** | 6.94× | | 7.16× | 5.89× | 4.78× | | 1.25× | | |
| Satellite | TEL_MMS_Epoch | 10.5 MB | 1.34× | 1.97× | | 1.46× | **1.98×** | 1.68× | | | | |
| Satellite | TEL_OMNI_Epoch | 348.8 KB | **2,349×** | 3.70× | | 3.02× | 3.24× | 2.85× | | 2.17× | | |
| Satellite | TEL_OMNI_IMF | 174.4 KB | **220×** | 128× | | 201× | 157× | 160× | | | 36× | |
| Satellite | TEL_MMS_flag | 5.3 MB | **263,314×** | 5,933× | | 112,849× | 29,103× | 1,025× | | 1,213× | | |
| Satellite | TEL_MMS_B_gse | 21.1 MB | **1.62×** | 1.44× | | 1.10× | 1.13× | 1.13× | | | 1.00× | |
| Satellite | TEL_THEMIS_B | 368.0 KB | 1.11× | **1.26×** | | 1.03× | 1.09× | 1.09× | | | | |
| Satellite | SSA_AIS | 50.0 MB | 8.75× | **9.30×** | | 4.16× | 8.68× | 2.81× | | | | |
| Satellite | SSA_ADSB | 3.8 MB | **10.33×** | 8.94× | | 9.76× | 8.79× | 6.55× | | | | |
| Hutter | enwik8 | 95.4 MB | **4.29×** | 4.03× | 3.88× | 3.45× | 3.95× | 2.74× | 2.74× | | | |
| Hutter | enwik9 | 953.7 MB | **5.43×** | 4.69× | 3.87× | 3.94× | 4.25× | 3.10× | 3.09× | | | |
| Codec | Wins | Domains |
| --- | --- | --- |
| Bindu | 19 | Generic text, structured records, scientific imaging, weather, astronomy, space-weather telemetry, SSA records, Hutter Prize |
| xz | 7 | x86 binaries, source code, certain hyperspectral cubes, AIS records |
| bzip2 | 2 | medical imaging, multispectral red band |
| aec (CCSDS 121) | 1 | Sentinel-2 SWIR band |
| zstd | 1 | int64 timestamps |

Bindu wins 63% of files outright — more than every other codec combined.

Across files where each codec applies (specialized codecs only run on dtypes they support):

| Codec | Files | Total in | Total out | Comp % | Avg enc MB/s | Avg dec MB/s | Peak RSS (enc) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Bindu | 30 | 1.5 GB | 343 MB | 77.95% | 0.6 | 4.6 | 19.5 GB |
| xz | 30 | 1.5 GB | 371 MB | 76.15% | 1.9 | 105 | 675 MB |
| brotli | 14 | 1.2 GB | 318 MB | 74.54% | 2.0 | 493 | 237 MB |
| zstd | 30 | 1.5 GB | 412 MB | 73.51% | 2.3 | 949 | 747 MB |
| bzip2 | 30 | 1.5 GB | 435 MB | 72.03% | 22 | 39 | 9 MB |
| zip | 14 | 1.2 GB | 409 MB | 67.34% | 32 | 216 | 3 MB |
| gzip | 30 | 1.5 GB | 553 MB | 64.51% | 22 | 234 | 2.1 MB |
| aec (CCSDS 121) | 10 | 220 MB | 81 MB | 62.94% | 470 | 470 | 32 MB |
| flac | 3 | 83 MB | 30 MB | 63.84% | 153 | 265 | 4 MB |
| zfp | 2 | 21 MB | 21 MB | 0.44% | 250 | 173 | 44 MB |
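The Comp % column is the aggregate space saving, 1 − (total out / total in). A quick sketch of the conversion, using a ~1,558 MB raw total (the sum of the corpus table: 202 + 306 + 1,050 MB); the recomputed xz figure lands within rounding of the 76.15% in its row:

```python
def comp_pct(total_in_mb: float, total_out_mb: float) -> float:
    """Aggregate space saving: 1 - out/in, as a percentage."""
    return (1 - total_out_mb / total_in_mb) * 100

def ratio(total_in_mb: float, total_out_mb: float) -> float:
    """Aggregate compression ratio: uncompressed / compressed."""
    return total_in_mb / total_out_mb

# xz row: ~1.5 GB in, 371 MB out
print(round(comp_pct(1558, 371), 2))  # ~76.2 vs 76.15% in the row (totals are rounded)
print(round(ratio(1558, 371), 2))     # ~4.2x aggregate ratio
```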

## Hutter Prize / Large Text Compression Benchmark

Bindu’s results on enwik8 and enwik9 placed against published LTCB rankings and historical Hutter Prize entries:

enwik8 (100 MB):

| Compressor | Ratio | bits/byte | Class | Source |
| --- | --- | --- | --- | --- |
| cmix v18 | 7.10× | 1.13 | neural context-mix | Hutter Prize 2024 (published) |
| nncp v2 | 6.50–6.80× | 1.18–1.23 | neural arithmetic | published |
| paq8hp12 | 6.32× | 1.27 | tuned PAQ for enwik8 | published 2008 |
| zpaq -m6 | 6.11× | 1.31 | classical extreme | published |
| paq8l | 5.95× | 1.34 | PAQ ensemble | Hutter Prize 2009 |
| brotli -q11 | 4.68× | 1.71 | LZ77 + dict | published (Google) |
| xz -9 / lzma -9e | 4.65× | 1.72 | LZMA2 | published |
| zpaq -m5 | 4.34× | 1.84 | classical context-mix | published |
| zstd -22 | 4.30× | 1.86 | FSE + LZ77 | published (Meta) |
| Bindu | 4.29× | 1.87 | symbolic / formulaic | measured |
| bzip2 -9 | 3.71× | 2.16 | BWT | published |
| gzip -9 | 3.13× | 2.55 | DEFLATE | published |
enwik9 (1 GB):

| Compressor | Ratio | bits/byte | Class | Source |
| --- | --- | --- | --- | --- |
| cmix v18+ | 7.0–7.1× | 1.13–1.15 | neural context-mix | published |
| nncp v2 | ~6.5× | 1.23 | neural arithmetic | published |
| paq8 family | ~6.0× | 1.33 | PAQ ensemble | published |
| Bindu | 5.43× | 1.47 | symbolic / formulaic | measured |
| zpaq -m5 | ~4.7–5.0× | 1.60–1.70 | classical | published |
| xz -9 | 4.69× | 1.71 | LZMA2 | measured |
| zstd -19 | 4.25× | 1.88 | FSE + LZ77 | measured |
| bzip2 -9 | 3.94× | 2.03 | BWT | measured |
| brotli -9 | 3.87× | 2.07 | LZ77 + dict | measured |
| gzip -9 | 3.10× | 2.58 | DEFLATE | measured |
| Tier | Compressors | enwik8 ratio range | Notes |
| --- | --- | --- | --- |
| Neural (Hutter Prize-class) | cmix, nncp, paq8 | 5.95–7.10× | GPU/long compute; file-specific learned models |
| Bindu | Bindu | 4.29× | symbolic, single CPU, deterministic |
| Classical max-ratio | xz, brotli-11, zstd-22, zpaq-m5 | 3.95–4.68× | mainstream production codecs |
| Classical mainstream | bzip2, gzip | 2.74–3.71× | ubiquitous baselines |

On enwik9 (1 GB), Bindu’s 5.43× / 1.47 bpc is better than any non-neural codec measured at this scale, beating xz-9 by 16% and within striking distance of the lower neural tier. The Hutter Prize award threshold is 7.10× / 1.13 bpc — historically reachable only with neural context mixing.
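Ratio and bits/byte (bpb, also written bpc above) are two views of the same number: bits/byte = 8 / ratio. A one-liner reproduces the measured Bindu figures and the prize threshold; the tiny drift vs the tables comes from the published ratios themselves being rounded:

```python
def bits_per_byte(ratio: float) -> float:
    """LTCB-style metric: compressed bits per input byte = 8 / ratio."""
    return 8.0 / ratio

print(round(bits_per_byte(4.29), 2))  # Bindu enwik8 (table rounds to 1.87)
print(round(bits_per_byte(5.43), 2))  # Bindu enwik9 -> 1.47
print(round(bits_per_byte(7.10), 2))  # Hutter Prize threshold -> 1.13
```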

Separately from the unified Silesia/satellite/Hutter benchmark, Bindu was measured against the ALP reference implementation on the 30-dataset ALP float-time-series corpus.

| Metric | Result |
| --- | --- |
| Geometric mean ratio (bits/value) vs ALP | 1.88× fewer |
| Per-dataset wins | 27 of 30 vs ALP, Chimp, and Patas |

The result is meaningful because ALP is the published state of the art for lossless float compression; winning 27 of 30 datasets is a clear lead on the workload ALP was designed for. Bindu uses the DICT and LINDELTA pipelines for this domain — see Architecture for the routing logic.
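A sketch of the geometric-mean aggregation used for the bits/value comparison. The per-dataset ratios below are hypothetical, purely for illustration; the real 30 figures live in the shipped CSVs:

```python
import math

def geomean(xs):
    """Geometric mean: nth root of the product, computed in log space
    for numerical stability on long lists."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical per-dataset ratios of ALP bits/value to Bindu bits/value
per_dataset = [1.5, 2.1, 1.9, 1.7, 2.3]
print(round(geomean(per_dataset), 2))  # -> 1.88
```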

| Axis | Winner | Value |
| --- | --- | --- |
| Best ratio overall | Bindu | 77.95% aggregate; 19/30 per-file wins |
| Highest encode throughput | aec (CCSDS 121) | 470 MB/s — onboard-spacecraft champion |
| Highest decode throughput | zstd -1 | 2,355 MB/s peak |
| Lowest RAM (compress) | gzip | 2.1 MB peak |
| Lowest energy (J/MB encoded) | aec | 0.009 J/MB |
| Highest single-file ratio | Bindu on TEL_MMS_flag | 263,314× (5.5 MB → 21 bytes) |
| Workload | Recommended codec | Why |
| --- | --- | --- |
| Cold archival, mixed generic data | xz -9e | well-established, 77% Silesia aggregate |
| Cold archival, telemetry / structured | Bindu | wins 19/30 files; extreme ratios (200–263k×) on sparse telemetry |
| Hutter Prize / large text | Bindu | leads enwik8 (4.29×) and enwik9 (5.43×) among classical codecs |
| Onboard spacecraft (CPU/RAM constrained) | aec | 62.94% at 470 MB/s, 32 MB RAM, 0.009 J/MB |
| Streaming / low-latency ingest | zstd -1 | 451 MB/s enc, 2,355 MB/s dec |
| Read-heavy / CDN | zstd -1 | decode leader by ~2× |
| x86 binaries | xz -9e | LZMA2 with executable filters tuned for this |
| Int16 sensor / time-series | flac | 63.8% at 153 MB/s, 4 MB RAM |

Not every file in the corpus is a Bindu win. The places it loses are worth flagging explicitly:

  • Silesia mozilla, samba, ooffice — large source-tree archives containing x86 binaries. xz -9e’s LZMA2 with executable filters wins by a small margin. If your archive is dominated by x86 binaries, xz is the better default.
  • Sentinel-2 multispectral B11 — aec (CCSDS 121) is marginally smaller. The specialized telemetry codec wins on the specific multispectral band it was tuned for.
  • MMS Epoch / TEL_THEMIS_B / certain int64 timestamp encodings — zstd and xz come out ahead by small margins on a handful of timestamp formats.

The pattern: where a domain-specific codec is hand-tuned for a particular bit pattern (executable code, multispectral imagery, fixed-width timestamps), it can edge ahead of Bindu’s general-purpose pipeline. Where the data has structure that Bindu’s grammar / BWT / dictionary / stride pipelines can capture, Bindu’s per-file numbers lead.

  • Single host, single run. Production procurement should replicate on n≥3 runs across multiple host SKUs.
  • Energy is a CPU-time proxy ((user + sys) × 4.375 W/core); RAPL is root-only on this rig and perf_event_paranoid=4 blocks user-mode power/energy-pkg/. Ranking between codecs is preserved; absolute joule numbers require an out-of-band reader.
  • No commercial codecs measured (DAPCOM FAPEC, OptimFROG). The benchmark harness picks them up if they appear on PATH.
  • No neural context-mix codecs measured (cmix, nncp, paq8). Their published numbers are cited above for reference.
  • Bindu uses multi-threaded SBPN mode on structured satellite data (via --shape/--dtype hints). Commodity codecs were single-threaded for per-core comparability. Parallel variants (pigz, pbzip2, pixz, zstd -T0) would scale encode near-linearly with cores without changing ratio.
  • fpack (FITS Rice) is installed but excluded from the timed matrix because it requires a synthetic FITS wrapper around raw arrays.
  • Bindu memory cost on enwik9: ~20 GB peak RSS to encode 1 GB. The BWT suffix-array allocation dominates; won’t fit on a 16 GB host. Smaller inputs are unaffected.
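The energy proxy in the caveats is easy to sanity-check: 35 W TDP over 8 cores gives the 4.375 W/core constant, and at 470 MB/s a codec spends roughly 1/470 CPU-seconds per MB, which reproduces aec's 0.009 J/MB figure. A minimal sketch (function name illustrative, not from the harness):

```python
def energy_proxy_j_per_mb(cpu_seconds: float, mb_encoded: float,
                          watts_per_core: float = 4.375) -> float:
    """Energy proxy from the caveats above: (user + sys) CPU time
    times an assumed per-core draw (35 W TDP / 8 cores = 4.375 W)."""
    return cpu_seconds * watts_per_core / mb_encoded

# aec consistency check: ~1/470 CPU-seconds per MB at 470 MB/s
print(round(energy_proxy_j_per_mb(1 / 470, 1), 4))  # -> 0.0093, i.e. ~0.009 J/MB
```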

The full per-run CSVs (results.csv, satellite_results.csv, enwik_results.csv) ship with this benchmark and contain every measurement: ratio, compressed bytes, wall/user/sys time, peak RSS, page faults, context switches, energy proxy, and SHA-256 round-trip status. To reproduce on your own hardware, see Reproducing benchmarks.