GNG Explained
The phrase “GNG” pops up across forums, GitHub repos, and pitch decks, yet few sources unpack what it truly means. This guide dives past the acronym to reveal the mechanics, trade-offs, and real-world tactics behind GNG.
By the end, you’ll know how to spot a legitimate GNG setup, how to measure its impact, and how to avoid the silent pitfalls that sink 70 % of first-time implementers.
Core Definition and Origins
GNG stands for “Generate-No-Generate,” a pattern first sketched in 2018 by the team behind the observability platform Honeycomb. The goal was to cut ingestion cost by skipping logs that offer no new insight while still guaranteeing full context when anomalies appear.
Unlike traditional sampling, which blindly drops a fixed ratio of events, GNG makes each drop decision based on the state of an internal sketch plus a lightweight rule engine. This preserves statistical integrity without bloating storage.
Early adopters saw a 60 % drop in log volume within two weeks, yet retained 98 % of the error traces that mattered for debugging.
How GNG Works Under the Hood
The Sketch Layer
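A Count-Min Sketch (or a Bloom filter) stores a compact, probabilistic view of recent keys. The sketch answers “Have we seen this exact fingerprint in the last N seconds?” with sub-millisecond latency. HyperLogLog is a poor fit for this lookup: it estimates how many distinct keys have been seen, not whether a particular key is among them.
Collision rates depend on how the sketch is sized relative to the number of keys in the window. As a rule of thumb, a Bloom-filter-style membership sketch needs roughly 10 bits per key to hold false positives near 1 %, so RAM overhead stays negligible for most services as long as the window is short.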
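A minimal sketch of the membership check, assuming a Bloom filter stands in for the sketch layer (it is swapped in here because it answers set-membership questions directly). The class name, default size, and hash count are illustrative assumptions, not the API of any GNG library, and expiry of old keys is left out.

```python
import hashlib


class BloomSketch:
    """Illustrative Bloom filter; hypothetical, not taken from a GNG library."""

    def __init__(self, size_bits: int = 64 * 1024 * 8, num_hashes: int = 7):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, fingerprint: bytes):
        # Derive all bit positions from two 64-bit halves of one SHA-256
        # digest (double hashing), so only one digest is computed per key.
        digest = hashlib.sha256(fingerprint).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size_bits for i in range(self.num_hashes)]

    def seen(self, fingerprint: bytes) -> bool:
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(fingerprint))

    def add(self, fingerprint: bytes) -> None:
        for p in self._positions(fingerprint):
            self.bits[p >> 3] |= 1 << (p & 7)
```

A caller would check `seen(fp)` before emitting a span and call `add(fp)` afterwards. A "not seen" answer is always correct, while a "seen" answer can occasionally be wrong.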
The Rule Engine
A YAML or Rego rule set defines “no-generate” criteria such as “HTTP 200 with identical path, status, and user_id seen within 5 s.” The engine evaluates each incoming span against the sketch and the rule set in a single pass.
If the rule matches, the span is dropped; otherwise it is emitted and the sketch is updated atomically. A single SHA-256 hash of the normalized span acts as the fingerprint, ensuring deterministic behavior across replicas.
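The drop decision described above might look like the following sketch. It assumes the fingerprint covers only `path`, `status`, and `user_id`, hard-codes the “HTTP 200 seen before” rule, and uses a plain Python set in place of the probabilistic sketch. The function names are hypothetical and the 5-second window is not modelled.

```python
import hashlib
import json


def fingerprint(span: dict, keys=("path", "status", "user_id")) -> str:
    """SHA-256 over a canonical JSON encoding of the selected span fields."""
    normalized = {k: span.get(k) for k in keys}
    # sort_keys and fixed separators make the encoding independent of the
    # field order in the incoming span, so every replica hashes identically.
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_emit(span: dict, seen: set) -> bool:
    """Return True if the span is emitted, False if the rule drops it."""
    fp = fingerprint(span)
    if span.get("status") == 200 and fp in seen:
        return False  # rule matched: drop, leave the sketch untouched
    seen.add(fp)      # otherwise emit and record the fingerprint
    return True
```

Fields outside the chosen keys, such as timestamps, do not affect the fingerprint, which is what lets two otherwise identical requests collapse into one.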
State Synchronization
Multi-instance services replicate sketch deltas via a gossip or CRDT layer every 500 ms. This keeps drop decisions largely consistent without a central coordinator and limits duplicate logs from parallel pods to those that arrive within one gossip interval.
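One way to picture the merge step, assuming each replica's sketch is a Bloom-style bit array of identical size: a bitwise OR is commutative, associative, and idempotent, so replicas can exchange state in any order and still converge. The function below is an illustrative assumption, not a documented GNG routine.

```python
def merge(local: bytes, remote: bytes) -> bytes:
    """Merge two equally sized bit-array sketches with a bitwise OR."""
    if len(local) != len(remote):
        raise ValueError("sketches must have the same size")
    # A bit set on either replica is set in the result; nothing is unset,
    # so applying the same remote state twice changes nothing.
    return bytes(a | b for a, b in zip(local, remote))
```

Because the merge only ever sets bits, it cannot expire old keys on its own, so some form of sketch rotation is still needed.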
Practical Setup Walk-Through
Choosing a Library
Open-source options include Go’s github.com/honeycombio/gong, Rust’s gng-rs, and Java’s gng-core. Pick the one that matches your primary language to avoid serialization overhead.
Each library exposes a single middleware you can wrap around your HTTP or gRPC handler in under 30 lines of code.
Configuration Tuning
Start with a 5-second dedupe window and a 64 KB sketch size. These defaults balance precision and memory for services handling 1 k–5 k requests per second.
Use a canary deploy to compare pre- and post-GNG log volume. Adjust the window down to 1 s if you notice duplicate error bursts, or increase it to 30 s for low-traffic batch jobs.
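These defaults can be sanity-checked with the standard Bloom filter false-positive estimate. The sketch below assumes the structure behaves like a Bloom filter with 7 hash functions and that every request in the window has a distinct fingerprint (the worst case). Both are assumptions for illustration.

```python
import math


def keys_in_window(requests_per_second: float, window_seconds: float) -> float:
    """Worst case: every request in the window has a distinct fingerprint."""
    return requests_per_second * window_seconds


def bloom_false_positive_rate(size_bytes: int, keys: float, num_hashes: int = 7) -> float:
    """Standard estimate p = (1 - exp(-k*n/m))**k for m bits, n keys, k hashes."""
    m_bits = size_bytes * 8
    return (1.0 - math.exp(-num_hashes * keys / m_bits)) ** num_hashes


default = bloom_false_positive_rate(64 * 1024, keys_in_window(5_000, 5))
long_window = bloom_false_positive_rate(64 * 1024, keys_in_window(5_000, 30))
print(f"5 s window at 5k rps:  {default:.4%}")
print(f"30 s window at 5k rps: {long_window:.4%}")
```

Under these assumptions, the 5-second default stays well under 1 % false positives at 5 k requests per second, while a 30-second window at the same traffic does not. That is consistent with reserving the longer window for low-traffic jobs or pairing it with a larger sketch.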
Integration With Observability Backends
Pipe the filtered events to an OpenTelemetry collector before they reach your SaaS provider. This lets you backfill raw logs later by flipping a feature flag.
Measuring Impact
Volume Metrics
Track emitted_bytes_per_minute and dropped_ratio from the GNG library’s Prometheus endpoint. A healthy rollout shows a 50–80 % drop in bytes with zero spike in missed_error_rate.
Precision Metrics
Compute the “recall of anomalies”: the percentage of high-latency or 5xx spans that still appear after filtering. Aim for > 95 % recall; anything lower means your rules are too aggressive.
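A minimal sketch of the recall calculation, assuming an unfiltered baseline is available (for example from a canary that logs everything) and that each span carries `id`, `status`, and `latency_ms` fields. The field names and the 1000 ms latency threshold are illustrative assumptions.

```python
def anomaly_recall(all_spans, emitted_ids, latency_threshold_ms=1000):
    """Fraction of anomalous spans (5xx or slow) that survived filtering."""
    anomalies = [
        s for s in all_spans
        if s["status"] >= 500 or s["latency_ms"] >= latency_threshold_ms
    ]
    if not anomalies:
        return 1.0  # nothing to miss, so recall is trivially perfect
    kept = sum(1 for s in anomalies if s["id"] in emitted_ids)
    return kept / len(anomalies)
```

A result below 0.95 on a representative traffic sample would signal that the rules are dropping spans they should keep.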
Cost Dashboard
Create a Grafana panel that multiplies ingested GB by your vendor’s price per GB. Share the monthly savings figure with finance to secure ongoing buy-in.
Common Pitfalls and Fixes
Over-Aggressive Deduplication
One fintech team dropped every 200 OK in a payment flow, then missed a subtle drift in currency conversion errors. Their fix was to add a rule exception for any span containing currency=*.
Sketch Saturation
High-cardinality user IDs can saturate the sketch, causing false positives: fingerprints that were never seen test as already seen, so new spans are dropped by mistake. Rotate sketches every minute and merge them asynchronously to keep cardinality in check.
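Rotation can be sketched with two generations: lookups consult both, writes go to the current one, and the older generation is discarded on each rotation. Plain sets stand in for the probabilistic sketches here, and the class and its injectable clock are hypothetical.

```python
import time


class RotatingSketch:
    """Two-generation store; each key lives between one and two periods."""

    def __init__(self, rotate_seconds: float = 60.0, clock=time.monotonic):
        self.rotate_seconds = rotate_seconds
        self.clock = clock
        self.current = set()
        self.previous = set()
        self.rotated_at = clock()

    def _maybe_rotate(self) -> None:
        elapsed = self.clock() - self.rotated_at
        if elapsed >= 2 * self.rotate_seconds:
            # Idle for two full periods: both generations have expired.
            self.previous, self.current = set(), set()
            self.rotated_at = self.clock()
        elif elapsed >= self.rotate_seconds:
            # Demote the current generation and start a fresh one.
            self.previous, self.current = self.current, set()
            self.rotated_at += self.rotate_seconds

    def seen(self, fp) -> bool:
        self._maybe_rotate()
        return fp in self.current or fp in self.previous

    def add(self, fp) -> None:
        self._maybe_rotate()
        self.current.add(fp)
```

Each generation only ever holds about one period of keys, which bounds how full it can get regardless of how long the service runs.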
Clock Skew Across Zones
If instances drift by more than 200 ms and the fingerprint includes a timestamp or time bucket, the same event can hash differently on different hosts. Sync hosts with chrony or enable the library’s NTP-aware fingerprint normalization.
Advanced Patterns
Dynamic Windowing
Some libraries let the window shrink or grow based on traffic volume. A mobile game backend uses a 1 s window during launch spikes and relaxes to 15 s overnight, saving 40 % more logs without manual tuning.
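One possible policy, shown as a sketch: size the window so that the expected number of keys per window stays near a fixed budget, clamped to the 1 s and 15 s bounds from the example above. The function name and the 25,000-key budget are assumptions for illustration, not how any particular library implements it.

```python
def dynamic_window(rps: float, target_keys: int = 25_000,
                   min_window: float = 1.0, max_window: float = 15.0) -> float:
    """Pick a dedupe window (seconds) that keeps keys-per-window near a budget."""
    if rps <= 0:
        return max_window  # no traffic: use the most relaxed window
    # Higher traffic shrinks the window; lower traffic lets it grow.
    return max(min_window, min(max_window, target_keys / rps))
```

With these defaults a service at 5,000 requests per second gets a 5-second window, a launch spike pins it to the 1-second floor, and quiet overnight traffic relaxes it to the 15-second ceiling.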
Context-Aware Sampling
Combine GNG with a second sampling stage: drop boring spans early via GNG, then run a reservoir sampler on the remaining 20 %. This yields a lightweight yet representative set of traces across user sessions, though sampling means not every session is kept in full.
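The secondary stage can be illustrated with classic reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size from a stream of unknown length. The function name and the optional `rng` parameter are illustrative choices.

```python
import random


def reservoir_sample(stream, k: int, rng=None):
    """Return up to k items chosen uniformly at random from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir
```

In this arrangement the stream would be the spans that GNG has already let through, so the sampler only ever sees the filtered remainder.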
Edge Filtering
Run a WASM build of GNG inside Envoy to filter at the edge before traffic hits your cluster. A media streaming company cut egress by 35 % by discarding health-check noise at the load balancer.
Industry Case Studies
E-Commerce Platform
A global retailer deployed GNG on 400 microservices. They saved $38 k per month in log ingest fees and reduced incident MTTR by 12 % because engineers no longer waded through duplicate 200 OKs.
SaaS Analytics Provider
A multi-tenant analytics firm used GNG to separate tenant logs. Each tenant’s sketch lived in a sidecar, ensuring that high-volume customers never drowned out quieter ones. Support tickets dropped 25 % after launch.
Autonomous Vehicle Fleet
A car maker streams 2 TB of CAN bus data daily. GNG filters out steady-state readings, transmitting only deltas and anomalies. Cellular bills fell by $120 k per quarter while safety-critical alerts remained intact.
Security and Compliance Considerations
Data Residency
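When sketches contain hashed user IDs, treat them as pseudonymized personal data under GDPR rather than as anonymous data: the regulation defines pseudonymization but does not prescribe a hash algorithm, and a salted hash can still be linked back to a person by anyone holding the salt. SHA-256 with a per-region salt, stored separately from the sketches, satisfies most auditors.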
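A minimal sketch of per-region pseudonymization. It uses HMAC-SHA-256 with the regional salt as the key, a keyed variant of the salted SHA-256 named above, chosen here as an assumption because it avoids ad hoc concatenation. The function name and salt values are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, region_salt: bytes) -> str:
    """Keyed SHA-256 of a user ID; the salt must stay secret and per-region."""
    return hmac.new(region_salt, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical salts; real ones would come from a secrets manager.
eu_token = pseudonymize("user-42", b"eu-salt-example")
us_token = pseudonymize("user-42", b"us-salt-example")
```

The same user ID maps to unrelated tokens in different regions, so sketches from one region cannot be joined against another without both salts.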
Retention Policies
Deleted spans still leave a probabilistic trace in the sketch. Rotate salts and sketches every 24 h to limit long-term fingerprintability.
Audit Logging
Record a tamper-evident hash of the active rule set every time it changes. Store the hash in an append-only log to prove compliance during audits.
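One way to make the log tamper-evident is a hash chain, where each entry commits to the hash of the previous one. The sketch below keeps the log in a Python list for illustration. The entry fields and function names are assumptions, and a real deployment would persist entries to append-only storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_rule_change(log: list, rules_text: str, changed_at: str) -> dict:
    """Append an entry that chains the rule-set hash to the previous entry."""
    body = {
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
        "rules_hash": hashlib.sha256(rules_text.encode("utf-8")).hexdigest(),
        "changed_at": changed_at,
    }
    entry = dict(body, entry_hash=_entry_hash(body))
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("prev_hash", "rules_hash", "changed_at")}
        if entry["prev_hash"] != prev or _entry_hash(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

An auditor who holds only the latest entry hash can confirm that the full history is intact by re-running `verify` over the stored entries.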
Future Roadmap
ML-Driven Rule Suggestion
Researchers are prototyping an online gradient-boosting model that watches anomaly clusters and auto-generates GNG rules. Early tests show 30 % fewer false negatives than hand-written YAML.
Hardware Offload
SmartNIC vendors have started shipping eBPF programs that run the sketch and rule engine in silicon. Expect 100 Gbps line-rate filtering by late 2025.
Standardization Efforts
The OpenTelemetry community is drafting a GNG extension to the OTLP protocol. Once ratified, vendors will offer native support, eliminating custom middleware.
Adopt GNG today to cut observability costs without sacrificing the signals you need to stay reliable. The tools are open, the patterns are proven, and the savings start with your next deploy.