Compression benchmarks

This page compares Bindu to every lossless compressor we could install on a single test rig, across three industry-standard corpora. The results cover 349 measured runs, each round-trip verified by SHA-256.

To reproduce these numbers yourself, see the methodology page, Reproducing benchmarks.

| Component | Spec |
|---|---|
| CPU | AMD Ryzen 7 8745HS, 8 cores, TDP 35 W |
| RAM | 27 GB DDR5 |
| OS | Linux x86_64 |
| Verification | SHA-256 of decompressed output ≡ SHA-256 of original input |

All commodity compressors were invoked with single-thread flags (xz -T1, zstd --single-thread; gzip/bzip2/brotli/zip are natively single-threaded) so that per-core throughput is comparable across codecs. Bindu uses its --shape/--dtype hints on structured satellite data, which matches its real deployment path.
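The verification rule in the rig table (SHA-256 of the decompressed output must equal SHA-256 of the original input) can be sketched as follows. This is a minimal illustration only, not the benchmark harness: it uses Python's built-in zlib and lzma modules as stand-ins for the command-line codecs, and the helper names `sha256_hex` and `verified_ratio` are hypothetical.

```python
import hashlib
import lzma
import zlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verified_ratio(original: bytes, compress, decompress) -> float:
    """Compress, decompress, check the round trip by SHA-256, return the ratio.

    The ratio is uncompressed size / compressed size, as on this page.
    Raises ValueError if the restored bytes do not hash to the same digest.
    """
    compressed = compress(original)
    restored = decompress(compressed)
    if sha256_hex(restored) != sha256_hex(original):
        raise ValueError("round-trip mismatch: SHA-256 digests differ")
    return len(original) / len(compressed)


# Hypothetical sample input; the real corpora are files on disk.
sample = b"telemetry frame 0001\n" * 4096

results = {
    "zlib -9": verified_ratio(sample, lambda b: zlib.compress(b, 9), zlib.decompress),
    "lzma": verified_ratio(sample, lzma.compress, lzma.decompress),
}

for name, ratio in results.items():
    print(f"{name}: {ratio:.1f}x (round trip verified)")
```

A run only counts as measured if the digest check passes, so a codec that silently corrupts data cannot post a ratio.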

| Codec | Class | Files | Notes |
|---|---|---|---|
| Bindu | symbolic / formulaic | 30 | this benchmark |
| xz | LZMA2 | 30 | levels 1, 6, 9, 9e |
| brotli | LZ77 + static dictionary | 14 | levels 6, 9, 11 |
| bzip2 | BWT + Huffman | 30 | levels 1, 9 |
| zstd | FSE + LZ77 | 30 | levels 1, 3, 9, 19, 22 --long |
| gzip | DEFLATE | 30 | levels 1, 6, 9 |
| zip | DEFLATE archiver | 14 | level 6 |
| aec | CCSDS 121.0-B-3 Rice | 10 | satellite-applicable only |
| zfp | LLNL reversible | 2 | float-only |
| flac | Lossless audio | 3 | int16-only |

A note on aec (CCSDS 121 Adaptive Entropy Coding): this is the lossless reference codec used by NOAA, ESA, JAXA, and NASA missions, and embedded in HDF5 via SZIP. It is sometimes mistyped as “AES”, which is an encryption algorithm (Rijndael), not a compressor. AES is therefore not part of this benchmark.

Three industry-standard corpora, 30 files total, ~1.5 GB raw:

| Corpus | Files | Total raw | Description |
|---|---|---|---|
| Silesia | 12 | 202 MB | Industry-standard generic benchmark: text, binaries, scientific images, structured records |
| Satellite / telemetry | 16 | 306 MB | SAR (Umbra), MSI (Sentinel-2), HSI (AVIRIS), weather (GOES-16), astronomy (Chandra), space-weather telemetry (MMS, OMNI, THEMIS), SSA text (AIS, ADS-B) |
| Hutter Prize / LTCB | 2 | 1.05 GB | enwik8 (100 MB) and enwik9 (1 GB): the first 100 MB and first 1 GB of the English Wikipedia XML dump (Mar 2006), Matt Mahoney’s canonical benchmark |

The headline result: ratio (uncompressed / compressed) for each codec on each file. Bold = winner per file.

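| Corpus | File | Size | Bindu | xz | brotli | bzip2 | zstd | gzip | zip | aec | zfp | flac |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Silesia | dickens | 9.7 MB | 4.02× | 3.60× | 3.60× | 3.64× | 3.58× | 2.65× | 2.63× | | | |
| Silesia | mozilla | 48.8 MB | 2.88× | 3.83× | 3.69× | 2.86× | 3.42× | 2.70× | 2.69× | | | |
| Silesia | webster | 39.5 MB | 5.75× | 4.95× | 4.92× | 4.80× | 4.90× | 3.44× | 3.40× | | | |
| Silesia | x-ray | 8.1 MB | 2.13× | 1.89× | 1.81× | 2.09× | 1.64× | 1.41× | 1.40× | | | |
| Silesia | xml | 5.1 MB | 12.63× | 12.29× | 12.41× | 12.12× | 11.80× | 8.07× | 7.72× | | | |
| Silesia | samba | 20.6 MB | 5.03× | 5.78× | 5.74× | 4.75× | 5.57× | 4.00× | 3.96× | | | |
| Silesia | osdb | 9.6 MB | 3.98× | 3.55× | 3.58× | 3.60× | 3.26× | 2.71× | 2.70× | | | |
| Silesia | reymont | 6.3 MB | 5.90× | 5.04× | 4.97× | 5.32× | 4.92× | 3.64× | 3.57× | | | |
| Silesia | sao | 6.9 MB | 1.44× | 1.64× | 1.58× | 1.47× | 1.45× | 1.36× | 1.36× | | | |
| Silesia | nci | 32.0 MB | 24.79× | 23.15× | 22.08× | 18.51× | 20.84× | 11.23× | 10.49× | | | |
| Silesia | ooffice | 5.9 MB | 2.14× | 2.54× | 2.48× | 2.15× | 2.37× | 1.99× | 1.99× | | | |
| Silesia | mr | 9.5 MB | 4.22× | 3.63× | 3.53× | 4.08× | 3.21× | 2.71× | 2.70× | | | |
| Satellite | SAR_Umbra | 15.3 MB | 1.46× | 1.43× | | 1.40× | 1.36× | 1.33× | | 1.33× | | |
| Satellite | MSI_S2_B04 | 47.7 MB | 2.81× | 2.65× | | 2.82× | 2.41× | 2.20× | | 2.81× | | |
| Satellite | MSI_S2_B11 | 57.5 MB | 3.10× | 2.82× | | 3.03× | 2.56× | 2.29× | | 3.22× | | |
| Satellite | HSI_Salinas | 47.5 MB | 3.01× | 3.23× | | 2.38× | 2.38× | 1.46× | | 1.83× | | 1.83× |
| Satellite | HSI_IndianPines | 8.0 MB | 1.99× | 1.74× | | 1.69× | 1.39× | 1.24× | | 1.73× | | |
| Satellite | WX_GOES_Ch01 | 28.6 MB | 21.98× | 17.79× | | 21.57× | 15.93× | 12.30× | | 20.25× | | 9.98× |
| Satellite | WX_GOES_Ch13 | 7.2 MB | 6.74× | 5.34× | | 5.91× | 4.95× | 4.07× | | 5.93× | | 5.73× |
| Satellite | SCI_Chandra | 2.4 MB | 7.41× | 6.94× | | 7.16× | 5.89× | 4.78× | | 1.25× | | |
| Satellite | TEL_MMS_Epoch | 10.5 MB | 1.34× | 1.97× | | 1.46× | 1.98× | 1.68× | | | | |
| Satellite | TEL_OMNI_Epoch | 348.8 KB | 2,349× | 3.70× | | 3.02× | 3.24× | 2.85× | | 2.17× (aec or zfp) | | |
| Satellite | TEL_OMNI_IMF | 174.4 KB | 220× | 128× | | 201× | 157× | 160× | | 36× (aec or zfp) | | |
| Satellite | TEL_MMS_flag | 5.3 MB | 263,314× | 5,933× | | 112,849× | 29,103× | 1,025× | | 1,213× | | |
| Satellite | TEL_MMS_B_gse | 21.1 MB | 1.62× | 1.44× | | 1.10× | 1.13× | 1.13× | | | 1.00× | |
| Satellite | TEL_THEMIS_B | 368.0 KB | 1.11× | 1.26× | | 1.03× | 1.09× | 1.09× | | | | |
| Satellite | SSA_AIS | 50.0 MB | 8.75× | 9.30× | | 4.16× | 8.68× | 2.81× | | | | |
| Satellite | SSA_ADSB | 3.8 MB | 10.33× | 8.94× | | 9.76× | 8.79× | 6.55× | | | | |
| Hutter | enwik8 | 95.4 MB | 4.29× | 4.03× | 3.88× | 3.45× | 3.95× | 2.74× | 2.74× | | | |
| Hutter | enwik9 | 953.7 MB | 5.43× | 4.69× | 3.87× | 3.94× | 4.25× | 3.10× | 3.09× | | | |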

| Codec | Wins | Domains |
|---|---|---|
| Bindu | 19 | Generic text, structured records, scientific imaging, weather, astronomy, space-weather telemetry, SSA records, Hutter Prize |
| xz | 7 | x86 binaries, source code, certain hyperspectral cubes, AIS records |
| bzip2 | 2 | medical imaging, multispectral red band |
| aec (CCSDS 121) | 1 | Sentinel-2 SWIR band |
| zstd | 1 | int64 timestamps |

Bindu wins 19 of 30 files (63%) outright, more than every other codec combined.
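The per-file winner rule (highest ratio on that file wins) and the resulting win share can be sketched as below. This is only an illustration: it copies four Silesia rows from the ratio table rather than reading the full results, and the names `ratios`, `winners`, and `wins` are hypothetical.

```python
from collections import Counter

# Four rows copied from the per-file ratio table (Silesia corpus only).
ratios = {
    "dickens": {"Bindu": 4.02, "xz": 3.60, "brotli": 3.60, "bzip2": 3.64,
                "zstd": 3.58, "gzip": 2.65, "zip": 2.63},
    "mozilla": {"Bindu": 2.88, "xz": 3.83, "brotli": 3.69, "bzip2": 2.86,
                "zstd": 3.42, "gzip": 2.70, "zip": 2.69},
    "xml": {"Bindu": 12.63, "xz": 12.29, "brotli": 12.41, "bzip2": 12.12,
            "zstd": 11.80, "gzip": 8.07, "zip": 7.72},
    "samba": {"Bindu": 5.03, "xz": 5.78, "brotli": 5.74, "bzip2": 4.75,
              "zstd": 5.57, "gzip": 4.00, "zip": 3.96},
}

# The winner of a file is the codec with the highest ratio on that file.
winners = {name: max(row, key=row.get) for name, row in ratios.items()}
wins = Counter(winners.values())

for codec, count in wins.most_common():
    share = 100.0 * count / len(ratios)
    print(f"{codec}: {count} of {len(ratios)} files ({share:.0f}%)")
```

On the full 30-file table the same counting yields the win totals reported above; the four-row sample here is too small to reproduce them.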

Across files where each codec applies (specialized codecs only run on dtypes they support):

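| Codec | Files | Total in | Total out | Comp % | Avg enc MB/s | Avg dec MB/s | Peak RSS enc |
|---|---|---|---|---|---|---|---|
| Bindu | 30 | 1.5 GB | 343 MB | 77.95% | 0.6 | 4.6 | 19.5 GB |
| xz | 30 | 1.5 GB | 371 MB | 76.15% | 1.9 | 105 | 675 MB |
| brotli | 14 | 1.2 GB | 318 MB | 74.54% | 2.0 | 493 | 237 MB |
| zstd | 30 | 1.5 GB | 412 MB | 73.51% | 2.3 | 949 | 747 MB |
| bzip2 | 30 | 1.5 GB | 435 MB | 72.03% | 22 | 39 | 9 MB |
| zip | 14 | 1.2 GB | 409 MB | 67.34% | 32 | 216 | 3 MB |
| gzip | 30 | 1.5 GB | 553 MB | 64.51% | 22 | 234 | 2.1 MB |
| aec (CCSDS 121) | 10 | 220 MB | 81 MB | 62.94% | 470 | 470 | 32 MB |
| flac | 3 | 83 MB | 30 MB | 63.84% | 153 | 265 | 4 MB |
| zfp | 2 | 21 MB | 21 MB | 0.44% | 250 | 173 | 44 MB |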
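The Comp % column appears to be the space saving, 1 − (total out / total in). The sketch below checks that reading against three rows of the table. It is an assumption-laden illustration: the corpus total is rebuilt from the rounded corpus sizes (202 MB + 306 MB + 1.05 GB, with 1.05 GB taken as 1050 MB), so results agree with the table only to within rounding, and the function names are hypothetical.

```python
def space_savings(total_in: float, total_out: float) -> float:
    """Fraction of bytes removed by compression: 1 - out/in."""
    if total_in <= 0:
        raise ValueError("total_in must be positive")
    return 1.0 - total_out / total_in


def ratio(total_in: float, total_out: float) -> float:
    """Compression ratio as used on this page: uncompressed / compressed."""
    return total_in / total_out


# Assumption: corpus total rebuilt from the rounded sizes in the corpus table.
CORPUS_TOTAL_MB = 202 + 306 + 1050

# "Total out" values copied from the aggregate table, in MB.
total_out_mb = {"Bindu": 343, "xz": 371, "gzip": 553}

for codec, out_mb in total_out_mb.items():
    pct = 100.0 * space_savings(CORPUS_TOTAL_MB, out_mb)
    print(f"{codec}: {pct:.2f}% saved, {ratio(CORPUS_TOTAL_MB, out_mb):.2f}x overall")
```

Because the inputs are rounded to the nearest MB, the computed percentages land within about 0.05 points of the published column rather than matching it digit for digit.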

Hutter Prize / Large Text Compression Benchmark

Bindu’s results on enwik8 and enwik9, placed against published LTCB rankings and historical Hutter Prize entries:

enwik8 (100 MB):

| Compressor | Ratio | bits/byte | Class | Source |
|---|---|---|---|---|
| cmix v18 | 7.10× | 1.13 | neural context-mix | Hutter Prize 2024 (published) |
| nncp v2 | 6.50–6.80× | 1.18–1.23 | neural arithmetic | published |
| paq8hp12 | 6.32× | 1.27 | tuned PAQ for enwik8 | published 2008 |
| zpaq -m6 | 6.11× | 1.31 | classical extreme | published |
| paq8l | 5.95× | 1.34 | PAQ ensemble | Hutter Prize 2009 |
| brotli -q11 | 4.68× | 1.71 | LZ77 + dict | published (Google) |
| xz -9 / lzma -9e | 4.65× | 1.72 | LZMA2 | published |
| zpaq -m5 | 4.34× | 1.84 | classical context-mix | published |
| zstd -22 | 4.30× | 1.86 | FSE + LZ77 | published (Meta) |
| Bindu | 4.29× | 1.87 | symbolic / formulaic | measured |
| bzip2 -9 | 3.71× | 2.16 | BWT | published |
| gzip -9 | 3.13× | 2.55 | DEFLATE | published |
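
enwik9 (1 GB):

| Compressor | Ratio | bits/byte | Class | Source |
|---|---|---|---|---|
| cmix v18+ | 7.0–7.1× | 1.13–1.15 | neural context-mix | published |
| nncp v2 | ~6.5× | 1.23 | neural arithmetic | published |
| paq8 family | ~6.0× | 1.33 | PAQ ensemble | published |
| Bindu | 5.43× | 1.47 | symbolic / formulaic | measured |
| zpaq -m5 | ~4.7–5.0× | 1.60–1.70 | classical | published |
| xz -9 | 4.69× | 1.71 | LZMA2 | measured |
| zstd -19 | 4.25× | 1.88 | FSE + LZ77 | measured |
| bzip2 -9 | 3.94× | 2.03 | BWT | measured |
| brotli -9 | 3.87× | 2.07 | LZ77 + dict | measured |
| gzip -9 | 3.10× | 2.58 | DEFLATE | measured |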

| Tier | Compressors | enwik8 ratio range | Notes |
|---|---|---|---|
| Neural (Hutter Prize-class) | cmix, nncp, paq8 | 5.95–7.10× | GPU/long compute; file-specific learned models |
| Bindu | Bindu | 4.29× | symbolic, single CPU, deterministic |
| Classical max-ratio | xz, brotli-11, zstd-22, zpaq-m5 | 3.95–4.68× | mainstream production codecs |
| Classical mainstream | bzip2, gzip | 2.74–3.71× | ubiquitous baselines |

On enwik9 (1 GB), Bindu’s 5.43× / 1.47 bpc is better than any non-neural codec measured at this scale. It beats xz -9 (4.69×) by 16% and sits roughly 10% below the ~6.0× of the paq8 family at the bottom of the neural tier. The Hutter Prize award threshold is 7.10× / 1.13 bpc, historically reachable only with neural context mixing.
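The two units used in the Hutter tables are interchangeable: bits per byte is 8 divided by the compression ratio, and a relative lead is one ratio divided by another. A small worked sketch of that arithmetic follows; the function names are hypothetical, and published bits/byte figures may differ in the last digit because they are derived from unrounded sizes.

```python
def bits_per_byte(ratio: float) -> float:
    """Output bits spent per input byte: 8 / (uncompressed / compressed)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 8.0 / ratio


def relative_gain(ratio_a: float, ratio_b: float) -> float:
    """How much higher ratio_a is than ratio_b, as a fraction (0.16 = 16%)."""
    return ratio_a / ratio_b - 1.0


# enwik9 ratios taken from the table above.
bindu, xz9, paq8_family = 5.43, 4.69, 6.0

print(f"Bindu on enwik9: {bits_per_byte(bindu):.2f} bits/byte")
print(f"xz -9 on enwik9: {bits_per_byte(xz9):.2f} bits/byte")
print(f"Bindu vs xz -9:  {100 * relative_gain(bindu, xz9):.0f}% higher ratio")
print(f"Bindu vs paq8:   {100 * (1 - bindu / paq8_family):.0f}% lower ratio")
```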

Separately from the unified Silesia/satellite/Hutter benchmark, Bindu was measured against the ALP reference implementation on the 30-dataset ALP float-time-series corpus.

| Metric | Result |
|---|---|
| Geometric mean ratio (bits/value) vs ALP | 1.88× fewer |
| Per-dataset wins | 27 of 30 vs ALP, Chimp, and Patas |

The result is meaningful because ALP is the published state-of-the-art for lossless float compression; the 27/30 outcome is a clear lead on the workload it was designed for. Bindu uses the DICT and LINDELTA pipelines for this domain — see Architecture for the routing logic.
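For readers unfamiliar with the metric, a geometric mean of per-dataset ratios can be sketched as below. The bits/value numbers are made up purely for illustration (they are not ALP or Bindu measurements), and `geomean_ratio` is a hypothetical helper, not part of either tool.

```python
from statistics import geometric_mean


def geomean_ratio(baseline_bits, candidate_bits):
    """Geometric mean of per-dataset baseline/candidate bits-per-value ratios.

    A result of 2.0 means the candidate needs, on geometric average, half as
    many bits per value as the baseline.
    """
    if len(baseline_bits) != len(candidate_bits):
        raise ValueError("both lists must cover the same datasets")
    per_dataset = [b / c for b, c in zip(baseline_bits, candidate_bits)]
    return geometric_mean(per_dataset)


# Hypothetical bits/value for three datasets (illustrative only).
baseline = [20.0, 32.0, 12.0]
candidate = [10.0, 16.0, 8.0]

print(f"geometric mean ratio: {geomean_ratio(baseline, candidate):.2f}x fewer bits/value")
```

The geometric mean is the usual choice for averaging ratios because a single dataset with an extreme ratio moves it far less than it would move an arithmetic mean.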

| Axis | Winner | Value |
|---|---|---|
| Best ratio overall | Bindu | 77.95% aggregate; 19/30 per-file wins |
| Highest encode throughput | aec (CCSDS 121) | 470 MB/s (onboard-spacecraft champion) |
| Highest decode throughput | zstd -1 | 2,355 MB/s peak |
| Lowest RAM (compress) | gzip | 2.1 MB peak |
| Lowest energy (J/MB encoded) | aec | 0.009 J/MB |
| Highest single-file ratio | Bindu on TEL_MMS_flag | 263,314× (5.5 MB → 21 bytes) |

| Workload | Recommended codec | Why |
|---|---|---|
| Cold archival, mixed generic data | xz -9e | well-established, 77% Silesia aggregate |
| Cold archival, telemetry / structured | Bindu | wins 19/30 files; extreme ratios (200–263k×) on sparse telemetry |
| Hutter Prize / large text | Bindu | leads the codecs measured here on enwik8 (4.29×) and enwik9 (5.43×) |
| Onboard spacecraft (CPU/RAM constrained) | aec | 62.94% at 470 MB/s, 32 MB RAM, 0.009 J/MB |
| Streaming / low-latency ingest | zstd -1 | 451 MB/s enc, 2,355 MB/s dec |
| Read-heavy / CDN | zstd -1 | decode leader by ~2× |
| x86 binaries | xz -9e | LZMA2 with executable filters tuned for this |
| Int16 sensor / time-series | flac | 63.8% at 153 MB/s, 4 MB RAM |

Not every file in the corpus is a Bindu win. The places it loses are worth flagging explicitly:

  • Silesia mozilla, samba, ooffice: large source-tree archives containing x86 binaries. xz -9e’s LZMA2 with executable filters wins with a 15–33% higher ratio on these files. If your archive is dominated by x86 binaries, xz is the better default.
  • Sentinel-2 multispectral B11: aec (CCSDS 121) is marginally smaller (3.22× against 3.10×). The specialized codec wins on the specific multispectral band it was tuned for.
  • MMS Epoch / TEL_THEMIS_B / certain int64 timestamp encodings: zstd and xz come out ahead on a handful of timestamp formats. The gap is largest on MMS Epoch, where the best commodity result is 1.98× against Bindu’s 1.34×.

The pattern is straightforward. A specialized codec can still win when the data exactly matches the shape it was built for, such as executable code, a specific satellite image band, or a narrow timestamp format. Bindu does best when the data has repeated structure it can describe with its grammar, BWT, dictionary, or stride pipelines. That is why Bindu leads on most files in the corpus, while a few highly specialized cases still favor older tools.

  • Single host, single run. Production procurement should replicate on n≥3 runs across multiple host SKUs.
  • Energy is a CPU-time proxy ((user + sys) × 4.375 W/core); RAPL is root-only on this rig and perf_event_paranoid=4 blocks user-mode power/energy-pkg/. Ranking between codecs is preserved; absolute joule numbers require an out-of-band reader.
  • No commercial codecs measured (DAPCOM FAPEC, OptimFROG). The benchmark harness picks them up if they appear on PATH.
  • No neural context-mix codecs measured (cmix, nncp, paq8). Their published numbers are cited above for reference.
  • Bindu uses multi-threaded SBPN mode on structured satellite data (via --shape/--dtype hints). Commodity codecs were single-threaded for per-core comparability. Parallel variants (pigz, pbzip2, pixz, zstd -T0) would scale encode near-linearly with cores without changing ratio.
  • fpack (FITS Rice) is installed but excluded from the timed matrix because it requires a synthetic FITS wrapper around raw arrays.
  • Bindu memory cost on enwik9: ~20 GB peak RSS to encode 1 GB. The BWT suffix-array allocation dominates, so this job will not fit on a 16 GB host. Smaller inputs are unaffected.
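The energy proxy described in the list above, (user + sys) × 4.375 W/core, can be sketched as follows. The 4.375 W figure equals the rig's 35 W TDP divided by its 8 cores, which is presumably how it was derived. Everything else here is an assumption for illustration: the helper names are hypothetical, zlib stands in for a real codec, 1 MB is taken as 10^6 bytes, and CPU time is read with `os.times()` rather than whatever the harness actually uses.

```python
import os
import zlib

TDP_WATTS = 35.0                      # rig TDP from the spec table
CORES = 8                             # rig core count from the spec table
WATTS_PER_CORE = TDP_WATTS / CORES    # 4.375 W, the constant quoted above


def energy_proxy_joules(user_s: float, sys_s: float,
                        watts_per_core: float = WATTS_PER_CORE) -> float:
    """CPU-time energy proxy: (user + sys) seconds times watts per core."""
    return (user_s + sys_s) * watts_per_core


def joules_per_mb(joules: float, input_bytes: int) -> float:
    """Energy per megabyte of input, taking 1 MB = 1e6 bytes (assumption)."""
    return joules / (input_bytes / 1e6)


# Illustrative measurement: CPU time spent compressing 8 MiB in this process.
payload = os.urandom(1 << 20) * 8
before = os.times()
zlib.compress(payload, 6)
after = os.times()

joules = energy_proxy_joules(after.user - before.user,
                             after.system - before.system)
print(f"proxy energy: {joules:.3f} J, {joules_per_mb(joules, len(payload)):.4f} J/MB")
```

As the caveat notes, a proxy like this preserves the ranking between codecs but says nothing reliable about absolute joules, since it ignores memory, idle, and frequency-scaling effects.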

The full per-run CSVs (results.csv, satellite_results.csv, enwik_results.csv) ship with this benchmark and contain every measurement: ratio, compressed bytes, wall/user/sys time, peak RSS, page faults, context switches, energy proxy, and SHA-256 round-trip status. To reproduce on your own hardware, see Reproducing benchmarks.