# Log archives
Application logs share the structural properties that make Bindu work: they’re sequential, highly repetitive, and queried long after they’re written. Many of the same arguments from the satellite & telemetry flagship — symbolic deltas, transmission cost dominance, operating on the compressed form directly — apply here. This page covers the log-specific framing.
## Why Bindu fits logs

- Low cardinality of templates, high cardinality of values. `service`, `level`, `host`, `region`, error codes, and log templates all fit in small enums; per-event variable fields (IDs, timestamps, numbers) fit in tightly packed columns. (A quick way to check your own logs against this profile follows the list.)
- You query more than you read. Once an event is written, the common access pattern is searching or aggregating, not reconstructing the original text.
- You retain for compliance, not readability. 90-day, 1-year, or 7-year retention windows benefit heavily from 10× compression, and you never need the exact original byte layout.
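To check your logs against this profile, count distinct values per field over one day of JSONL. This is a plain-shell sketch; `jq` and the `hot/2026-01-23/` path from the pipeline example below are assumptions, and the field list should match your own schema:

```sh
# Distinct-value counts per field: small counts suit enum columns,
# large counts suit packed per-event value columns.
for field in service level host region trace_id; do
  n=$(jq -r ".$field" hot/2026-01-23/*.jsonl | sort -u | wc -l)
  printf '%s: %s distinct values\n' "$field" "$n"
done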
## Suggested pipeline

```text
app → shipper → hot store (last 24h) → nightly rollup → Bindu archive
```

Rollup:

```sh
bindu compress \
  --schema app-logs-v3.bindus \
  --dict app-logs.bindud \
  --partition 'date, service' \
  --index 'trace_id, user_id' \
  --input 'hot/2026-01-23/*.jsonl' \
  --out 'archive/2026-01-23/'
```

The `--partition` flag produces one file per (date, service) combination, which makes most queries selective at the file-skip layer.
The `--index` flag builds a sparse Bloom filter over those columns. Lookups by `trace_id` touch O(1) files even across years of logs.
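In production the rollup is a scheduled job. A minimal sketch of the nightly step, assuming GNU `date` and reusing the exact paths and flags from the example above:

```sh
#!/usr/bin/env sh
# Compress yesterday's hot partition into the archive.
# The quoted glob is passed through to bindu, matching the usage above.
day=$(date -d yesterday +%F)    # GNU date; e.g. 2026-01-23
bindu compress \
  --schema app-logs-v3.bindus \
  --dict app-logs.bindud \
  --partition 'date, service' \
  --index 'trace_id, user_id' \
  --input "hot/$day/*.jsonl" \
  --out "archive/$day/"
```

Run it from cron or a systemd timer after the day rolls over, and only trim the hot store once the archive write has succeeded.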
## Querying

```sh
# All errors from the checkout service last week
bindu query archive/ \
  --where 'service == "checkout" && level == "error"' \
  --since '2026-01-16' --until '2026-01-23'
```

```sh
# Reconstruct a full trace
bindu query archive/ \
  --where 'trace_id == "a3f..."'
```

```sh
# Aggregate
bindu query archive/ \
  --select 'service, count(*), p99(latency_ms)' \
  --group-by service \
  --since '2026-01-23'
```

On realistic log volumes (1 TB/day compressed to ~100 GB/day), a targeted trace lookup against a 90-day archive runs in under a second on a single node.
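The filter and aggregation flags compose. For example, a weekly per-service error count uses only the flags already shown:

```sh
# Errors per service over the last week
bindu query archive/ \
  --where 'level == "error"' \
  --select 'service, count(*)' \
  --group-by service \
  --since '2026-01-16' --until '2026-01-23'
```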
## Retention and rollups

For long retention windows, consider a two-tier scheme:

- Full fidelity for 30 days (`--level 6`, default indexes).
- Aggregated rollup for months 2–12 (`bindu rollup` drops high-cardinality columns, keeps aggregates).

Storage drops by another 5–10× at the cost of losing per-event detail.
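A sketch of the tier-2 step. `bindu rollup` is named above, but the flags here are illustrative assumptions that mirror the `bindu query` options, not documented options; check the tool's help output for the real interface.

```sh
# Sketch only: --drop, --select, --group-by, and --out on `bindu rollup`
# are assumed flags, not documented here. Drops high-cardinality columns
# and keeps per-(service, date) aggregates for months 2-12.
bindu rollup 'archive/2026-01-*/' \
  --drop 'trace_id, user_id' \
  --select 'service, count(*), p99(latency_ms)' \
  --group-by 'service, date' \
  --out 'rollup/2026-01/'
```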
## Pitfalls

- Don't mix schemas in one archive. If service A and service B have different log shapes, partition by service. A union schema bloats the dictionary.
- Watch trace ID encoding. If IDs are emitted as hex strings, explicitly type them as `bytes(16)` in the schema. Without that hint Bindu falls back to generic string compression and you lose ~30% on that column.
- Clock skew matters. Double-delta timestamp encoding assumes monotonic-ish input. Sort by `ts` in the shipper (see the sketch after this list).
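For the last pitfall, if the shipper can't guarantee ordering, pre-sort each day's file before the nightly compress. A `jq` sketch, assuming a `ts` field that sorts chronologically (numeric epoch or ISO-8601) and a hypothetical `app.jsonl` file name; `-s` loads the whole day into memory, which is fine at rollup granularity:

```sh
# Re-emit the day's events sorted by ts so double-delta encoding
# sees (near-)monotonic timestamps.
jq -s -c 'sort_by(.ts) | .[]' hot/2026-01-23/app.jsonl \
  > hot/2026-01-23/app.sorted.jsonl
```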