Ops Slang Explained: Definitions and Examples
In server rooms and Slack channels, ops slang is the fastest way to signal intent and urgency. Knowing these terms helps prevent outages, protect budgets, and keep engineers on the same page.
Below is a field-tested glossary that moves beyond memes and into real production value.
Core Incident Terms
Sev-0 to Sev-5
Severity levels decide paging rules and escalation chains. Sev-0 means total revenue loss and wakes the CEO; Sev-5 is a cosmetic bug that stays in the backlog.
Each level has a pre-approved runbook, and Sev-0 and Sev-1 pages carry a strict 5-minute acknowledgment window.
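As a rough sketch of how the acknowledgment window could be encoded, the snippet below computes an ack deadline per severity. Only the 5-minute window for Sev-0 and Sev-1 comes from the text; the names `ACK_WINDOW`, `ack_deadline`, and `is_ack_overdue` are hypothetical, and treating Sev-2 through Sev-5 as having no deadline is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: only Sev-0 and Sev-1 have an acknowledgment deadline.
ACK_WINDOW = {
    0: timedelta(minutes=5),
    1: timedelta(minutes=5),
}


def ack_deadline(severity: int, paged_at: datetime) -> Optional[datetime]:
    """Return when the page must be acknowledged, or None if no window applies."""
    if not 0 <= severity <= 5:
        raise ValueError("severity must be between 0 and 5")
    window = ACK_WINDOW.get(severity)
    return paged_at + window if window is not None else None


def is_ack_overdue(severity: int, paged_at: datetime, now: datetime) -> bool:
    """True when a deadline exists and has already passed."""
    deadline = ack_deadline(severity, paged_at)
    return deadline is not None and now > deadline
```

An escalation job could call `is_ack_overdue` on every open page and notify the next person in the chain when it returns True.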
Page Fatigue
Page fatigue happens when alert frequency outruns team capacity and leads to ignored alarms. Mitigate it with alert throttling and on-call shadowing programs.
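One simple form of alert throttling is a per-alert cooldown: the same alert key cannot page again until a quiet period has elapsed. The sketch below assumes timestamps in seconds; the class name `AlertThrottle` and the 300-second cooldown in the usage note are illustrative, not taken from any particular paging tool.

```python
class AlertThrottle:
    """Suppress repeat pages for the same alert key within a cooldown period."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_paged = {}  # alert key -> timestamp of the last page sent

    def should_page(self, alert_key: str, now: float) -> bool:
        last = self._last_paged.get(alert_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown: drop the duplicate
        self._last_paged[alert_key] = now
        return True
```

With `AlertThrottle(300)`, a flapping disk alert pages once and then stays quiet for five minutes, while an unrelated alert key still pages immediately.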
Blast Radius
The blast radius is the set of downstream systems affected by a single failure. Mapping it before a deploy lets you pre-stage circuit breakers.
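Mapping a blast radius amounts to walking a dependency graph outward from the failed component. The sketch below assumes a hypothetical `dependents` mapping from each service to the services that call it directly; the service names in the usage note are made up.

```python
from collections import deque


def blast_radius(dependents: dict, failed: str) -> set:
    """Return every service that directly or transitively depends on `failed`."""
    affected = set()
    queue = deque([failed])
    while queue:
        service = queue.popleft()
        for downstream in dependents.get(service, ()):
            # Skip the failed service itself and anything already visited,
            # which also keeps the walk finite when the graph has cycles.
            if downstream != failed and downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return affected
```

For a graph where `api` and `billing` call `db` and `web` calls `api`, a `db` failure yields `api`, `billing`, and `web`, which are the candidates for pre-staged circuit breakers.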
Deployment and Release Jargon
Canary
A canary release routes 1–5 % of traffic to a new build. If error budgets stay green for 30 minutes, the rollout continues.
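A common way to split traffic is to hash a stable request or user identifier into a bucket and send the lowest buckets to the canary. The sketch below is one such approach, not the behavior of any specific load balancer; `routes_to_canary` and the identifier format are hypothetical.

```python
import hashlib


def routes_to_canary(request_id: str, canary_percent: float) -> bool:
    """Deterministically send roughly `canary_percent` % of IDs to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the ID onto 10,000 buckets so fractional percentages work too.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < canary_percent * 100
```

Hashing keeps a given ID on the same build for the whole soak period, which makes error comparisons between canary and baseline cleaner than random per-request routing.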
Blue/Green
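Blue/green runs two full environments and swaps traffic between them at the load balancer. It doubles infra cost but gives near-instant rollback: point the load balancer back at the old environment instead of waiting on DNS TTLs.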
Feature Flag
Feature flags let you hide code behind runtime toggles. Kill a bad feature in milliseconds without redeploying.
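A minimal in-process sketch of the idea follows. Real setups usually read flags from a config service so a change propagates without a deploy; the `FeatureFlags` class, the `new_checkout` flag, and the `checkout` function are all hypothetical.

```python
class FeatureFlags:
    """In-memory flag store; unknown flags default to off."""

    def __init__(self, initial=None) -> None:
        self._flags = dict(initial or {})

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def checkout(flags: FeatureFlags) -> str:
    # The new code path ships dark and only runs while the flag is on.
    if flags.is_enabled("new_checkout"):
        return "new flow"
    return "legacy flow"
```

Calling `flags.set("new_checkout", False)` acts as the kill switch: the next request falls back to the legacy path with no redeploy.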
Hotfix
A hotfix skips the normal sprint cycle and lands straight on prod. Ship it from a hotfix branch whose name is prefixed with the incident ticket ID.
Observability Vocabulary
Red Line
The red line is the latency threshold above which users abandon the site. Track it via synthetic probes every 10 seconds.
Golden Signal
Google’s four golden signals are latency, traffic, errors, and saturation. Dashboard them on every service overview page.
Cardinality Explosion
High-cardinality labels multiply the number of time series and can exhaust your Prometheus instance's memory. Drop unused labels at scrape time with `metric_relabel_configs`.
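To see why one label can do so much damage, note that the worst-case number of time series for a metric is the product of the distinct values of each label. The sketch below computes that upper bound; the label names and counts are invented for illustration.

```python
from math import prod


def max_series(label_cardinalities: dict) -> int:
    """Worst-case series count: the product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical metric with a per-user label attached.
with_user_id = {"method": 5, "status": 10, "user_id": 100_000}
without_user_id = {"method": 5, "status": 10}

print(max_series(with_user_id))     # 5000000
print(max_series(without_user_id))  # 50
```

Dropping the single `user_id` label takes the upper bound from five million series to fifty, which is why unbounded identifiers are the usual first target.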
Trace Sampling
Trace sampling keeps only a fraction of requests, for example 1 in 1000 as a baseline. Increase the rate for critical user journeys using adaptive sampling rules.
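The sketch below shows a deterministic sampler with static per-route overrides. It is simpler than truly adaptive sampling, which adjusts rates from live traffic, and it is not the API of any tracing library; `ROUTE_ONE_IN`, the `/checkout` route, and its 1-in-2 rate are assumptions.

```python
DEFAULT_ONE_IN = 1000  # baseline: keep 1 in 1000 traces

# Hypothetical overrides for critical user journeys.
ROUTE_ONE_IN = {"/checkout": 2}


def should_sample(trace_id_hex: str, route: str) -> bool:
    """Keep a trace when its ID falls on the sampling interval for the route."""
    one_in = ROUTE_ONE_IN.get(route, DEFAULT_ONE_IN)
    # Deciding from the trace ID means every service that sees the same ID
    # and route makes the same keep-or-drop choice.
    return int(trace_id_hex, 16) % one_in == 0
```

Because the decision depends only on the trace ID and route, raising a route's rate is a one-line config change rather than a code change.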
On-Call Culture Slang
War Room
The war room is a dedicated Zoom bridge opened during Sev-0 incidents. Lock the room to essential engineers only and appoint a scribe.
Hero Culture
Hero culture rewards engineers who stay up all night fixing things. Replace it with blameless postmortems and shared runbooks.
Shadow Rotation
Shadow rotation pairs a new engineer with the primary on-call for a week. The new engineer carries the pager but escalates every alert to the mentor.
Handoff Notes
Handoff notes capture open alerts, flaky tests, and infra debt. Store them in a single Confluence page updated at 09:00 sharp.
Automation & Infrastructure Terms
Cattle vs. Pets
Cattle are identical VMs replaced on failure; pets are snowflake servers with names. Aim for 90 % cattle to reduce MTTR.
Immutable Infrastructure
Immutable infra never changes in place; you always redeploy a new image. Use Packer to bake AMIs nightly and Terraform to roll them out.
Chaos Monkey
Chaos Monkey kills random instances during business hours. Start with a 10 % blast radius in staging before touching prod.
GitOps
GitOps makes a Git repo the single source of truth for cluster state. Argo CD watches the repo and converges Kubernetes to match.
Cost and Performance Lingo
Spiky Workload
Spiky workloads burst at unpredictable times. Pre-warm Lambda with provisioned concurrency and use predictive scaling driven by historical CloudWatch metrics.
Committed Use
Committed use discounts lock you into one-year or three-year spend. Track them in a FinOps dashboard to avoid unused reservations.
Spot Fleet
A spot fleet mixes on-demand and spot instances to hit price targets. Set a 20 % on-demand base layer to handle eviction storms.
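The base-layer arithmetic can be sketched as follows, rounding the on-demand share up so the floor is never undershot. The function name `split_capacity` is hypothetical, and the 20 % default simply mirrors the figure in the text.

```python
def split_capacity(total_instances: int, on_demand_percent: int = 20) -> tuple:
    """Return (on_demand, spot) counts with the on-demand share rounded up."""
    if total_instances < 0 or not 0 <= on_demand_percent <= 100:
        raise ValueError("invalid capacity or percentage")
    # Integer ceiling division avoids floating-point rounding surprises.
    on_demand = -(-total_instances * on_demand_percent // 100)
    return on_demand, total_instances - on_demand
```

A 7-instance fleet therefore gets 2 on-demand and 5 spot instances: 20 % of 7 is 1.4, and rounding up keeps the base layer at or above the target.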
CPU Steal
High steal time means noisy neighbors on your hypervisor. Alert when steal exceeds 10 %, and migrate the VM to a dedicated host if it persists.
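On Linux, steal time can be derived from two samples of the aggregate `cpu` line in `/proc/stat`, where steal is the eighth counter. The sketch below works on sample strings rather than reading the file, so it runs anywhere; the function names are hypothetical and the sample values are invented.

```python
def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line: user nice system idle iowait irq softirq steal."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Guest time is already counted inside user/nice, so stop at steal.
    return [int(value) for value in parts[1:9]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total
```

A monitoring agent could sample the line every few seconds and raise an alert when `steal_percent` stays above the 10 % threshold.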
Security and Compliance Slang
Zero Trust
Zero trust verifies every request regardless of network origin. Enforce it with mutual TLS and short-lived service tokens.
Red Team vs. Blue Team
The red team attacks; the blue team defends. Schedule quarterly purple-team exercises to close detection gaps.
SBOM
A software bill of materials lists every library in your container. Generate it with Syft and store it next to the image in ECR.
Policy as Code
Policy as code codifies compliance rules in OPA or Sentinel. Block non-compliant Terraform plans before they reach the cloud.
Reliability Engineering Metrics
SLA vs. SLO vs. SLI
SLA is the contractual promise to customers. SLO is the internal goal, and SLI is the measured reality.
If your SLI stays below the SLO across a 30-day window, freeze new features until reliability recovers.
Error Budget
The error budget is the amount of unreliability your SLO allows: 100 % minus the SLO target. Spend it on launches, but track burn rate weekly.
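A small worked example may help, assuming the budget is computed as 100 % minus the SLO target over a 30-day window and burn rate is the observed error ratio divided by the allowed one. The function names and the sample figures are illustrative.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of full downtime the SLO tolerates over the window."""
    if not 0 < slo_percent < 100:
        raise ValueError("slo_percent must be strictly between 0 and 100")
    allowed_fraction = (100.0 - slo_percent) / 100.0
    return allowed_fraction * window_days * 24 * 60


def burn_rate(bad_events: int, total_events: int, slo_percent: float) -> float:
    """Observed error ratio divided by the ratio the SLO allows.

    A value of 1.0 spends the budget exactly over the window; 2.0 spends it
    in half the time.
    """
    if not 0 < slo_percent < 100:
        raise ValueError("slo_percent must be strictly between 0 and 100")
    if total_events == 0:
        return 0.0
    allowed_fraction = (100.0 - slo_percent) / 100.0
    return (bad_events / total_events) / allowed_fraction
```

A 99.9 % SLO over 30 days leaves roughly 43.2 minutes of budget, and 20 failures in 10,000 requests against that SLO is a burn rate of about 2.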
MTTR vs. MTBF
MTTR measures repair speed; MTBF measures stability between failures. A 5-minute MTTR with a 30-day MTBF beats perfect code that ships yearly.
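The two metrics combine into the standard steady-state availability estimate, MTBF / (MTBF + MTTR). The sketch below applies it to the figures in the text; the function name is hypothetical.

```python
def availability(mtbf_minutes: float, mttr_minutes: float) -> float:
    """Steady-state availability as the fraction of time the service is up."""
    if mtbf_minutes <= 0 or mttr_minutes < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_minutes / (mtbf_minutes + mttr_minutes)


# 30-day MTBF and 5-minute MTTR, both expressed in minutes.
thirty_days = 30 * 24 * 60
print(f"{availability(thirty_days, 5):.5%}")
```

The formula makes the trade-off explicit: halving MTTR improves availability about as much as doubling MTBF, and it is usually the cheaper of the two.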
Toil
Toil is repetitive manual work with no enduring value. Eliminate it by automating ticket triage and certificate renewals.
Advanced Incident Tactics
Rollback vs. Roll-forward
Rollback reverts the bad deploy; roll-forward patches it in prod. Choose based on blast radius and fix complexity.
Dark Launches
Dark launches ship code to prod without exposing it to users. Turn on traffic with a config flag after soak testing.
Incident Commander
The incident commander owns communication and priority, not root cause. They stop engineers from overlapping fixes and losing context.
Status Page Silence
Status page silence is the gap between outage start and public acknowledgment. Keep it under 5 minutes using automated incident bots.
Tooling Ecosystem Nicknames
K8s
K8s is a numeronym for “Kubernetes”: the 8 stands for the eight letters between the K and the s. Say “kates” out loud and you’ll sound like a native.
ELK vs. EFK
ELK is Elasticsearch, Logstash, and Kibana. EFK swaps Logstash for Fluentd or its lighter sibling Fluent Bit, which reduces memory use in sidecars.
Terraform Planfile
A planfile is a binary artifact created by `terraform plan -out=tfplan`. Store it in CI artifacts for reproducible applies.
Helm Umbrella Chart
An umbrella chart bundles micro-service charts into one release. Pin sub-chart versions to avoid surprise upgrades.
Everyday Chat Shortcuts
LGTM
“Looks good to me” approves a pull request. Submit it as an approving GitHub review to unblock merges, since a bare comment does not satisfy required-review rules.
TIL
“Today I learned” shares quick hacks in Slack. Tag the ops channel so knowledge spreads beyond the author.
AFK
“Away from keyboard” signals a quick break during incident bridges. Use it to avoid ghosting the war room.
TL;DR
“Too long; didn’t read” precedes a 50-word summary of a 500-word incident report. Post it above the fold in Confluence.