Ops Slang Explained: Definitions and Examples

In server rooms and Slack channels, ops slang is the fastest way to signal intent and urgency. Knowing these terms helps prevent outages, save budget, and keep engineers on the same page.

Below is a field-tested glossary that moves beyond memes and into real production value.

🤖 This content was generated with the help of AI.

Core Incident Terms

Sev-0 to Sev-5

Severity levels decide paging rules and escalation chains. Sev-0 means total revenue loss and wakes the CEO; Sev-5 is a cosmetic bug that stays in the backlog.

Each level has a pre-approved runbook; Sev-0 and Sev-1 also carry a strict 5-minute acknowledgment window.
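
As a rough illustration, the acknowledgment rule can be written as a small lookup. This is a minimal sketch: the `ACK_WINDOW` table and the `ack_deadline` helper are hypothetical names, and only the 5-minute window for Sev-0 and Sev-1 comes from the text above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy table. Only the 5-minute window for Sev-0 and Sev-1
# is taken from the glossary; other levels have no deadline in this sketch.
ACK_WINDOW = {
    0: timedelta(minutes=5),
    1: timedelta(minutes=5),
}


def ack_deadline(severity: int, opened_at: datetime) -> Optional[datetime]:
    """Return the time by which a page must be acknowledged, or None."""
    if not 0 <= severity <= 5:
        raise ValueError(f"unknown severity: {severity}")
    window = ACK_WINDOW.get(severity)
    if window is None:
        return None  # lower severities are not paged in this sketch
    return opened_at + window


opened = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
deadline = ack_deadline(0, opened)
```

An escalation job could compare `deadline` with the current time and page the next person in the chain once it has passed.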

Page Fatigue

Page fatigue happens when alert frequency outruns team capacity and leads to ignored alarms. Mitigate it with alert throttling and on-call shadowing programs.
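
One simple form of alert throttling is to suppress repeats of the same alert inside a quiet window. The sketch below assumes alerts carry a stable key; `AlertThrottle` and its method names are made up for illustration and do not mirror any particular paging tool.

```python
class AlertThrottle:
    """Suppress repeat pages for the same alert key inside a quiet window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._last_sent = {}  # alert key -> timestamp of the last page sent

    def should_page(self, alert_key, now):
        """Return True if this alert should page; `now` is in seconds."""
        last = self._last_sent.get(alert_key)
        if last is not None and now - last < self.window:
            return False  # still inside the quiet window, drop the repeat
        self._last_sent[alert_key] = now
        return True


throttle = AlertThrottle(window_seconds=300)
first = throttle.should_page("disk-full:db-1", now=0)
repeat = throttle.should_page("disk-full:db-1", now=60)
```

Passing `now` in explicitly keeps the class easy to test; production code would more likely read a monotonic clock.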

Blast Radius

The blast radius is the set of downstream systems affected by a single failure. Mapping it before a deploy lets you pre-stage circuit breakers.
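
If the dependency graph is known, the blast radius can be approximated with a breadth-first walk. This sketch assumes a hand-maintained mapping from each service to the services that call it directly; the service names are invented.

```python
from collections import deque


def blast_radius(dependents, failed):
    """Return every service transitively downstream of `failed`.

    `dependents` maps a service to the services that call it directly.
    """
    affected = set()
    queue = deque([failed])
    while queue:
        service = queue.popleft()
        for caller in dependents.get(service, ()):
            if caller != failed and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Hypothetical topology: web -> api -> db, billing -> db.
deps = {"db": ["api", "billing"], "api": ["web"], "web": []}
print(sorted(blast_radius(deps, "db")))  # ['api', 'billing', 'web']
```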

Deployment and Release Jargon

Canary

A canary release routes 1–5 % of traffic to a new build. If error budgets stay green for 30 minutes, the rollout continues.
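
One common way to carve out a stable slice of traffic is to hash a request or user ID into buckets. The following is a sketch of that idea, not how any specific load balancer does it; `routes_to_canary` is a hypothetical helper.

```python
import hashlib


def routes_to_canary(request_id, canary_percent):
    """Deterministically send roughly `canary_percent` % of IDs to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return bucket < canary_percent * 100


# The same ID always lands on the same side, so a user does not flip
# between builds from one request to the next.
goes_to_canary = routes_to_canary("user-42", canary_percent=5)
```

Hashing rather than random choice keeps the split sticky, which makes canary error rates easier to attribute.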

Blue/Green

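Blue/green swaps entire environments at the load balancer. It doubles infra cost but gives near-instant rollback by pointing the load balancer back at the old environment; a DNS-based switch is slower because clients cache records until the TTL expires.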

Feature Flag

Feature flags let you hide code behind runtime toggles. Kill a bad feature in milliseconds without redeploying.
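
In its simplest form, a flag is a lookup performed at request time. The sketch below keeps flags in memory; a real setup would read them from a config service or flag provider, and the `FeatureFlags` class and the `new_checkout` flag are purely illustrative.

```python
class FeatureFlags:
    """In-memory runtime toggles; unknown flags default to off."""

    def __init__(self, initial=None):
        self._flags = dict(initial or {})

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)


def checkout(flags):
    # The new code path ships dark and only runs while the flag is on.
    if flags.is_enabled("new_checkout"):
        return "new flow"
    return "old flow"


flags = FeatureFlags({"new_checkout": True})
before = checkout(flags)          # "new flow"
flags.set("new_checkout", False)  # the kill switch, no redeploy needed
after = checkout(flags)           # "old flow"
```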

Hotfix

A hotfix skips the normal sprint cycle and lands straight on prod. Tag it with a hotfix branch prefixed by the incident ticket ID.

Observability Vocabulary

Red Line

The red line is the latency threshold above which users abandon the site. Track it via synthetic probes every 10 seconds.

Golden Signal

Google’s four golden signals are latency, traffic, errors, and saturation. Dashboard them on every service overview page.

Cardinality Explosion

High-cardinality labels can crash your Prometheus instance. Drop unused labels at scrape time with metric relabel configs.
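
To see why one label can dominate, it helps to count distinct values per label and the number of series left once a label is dropped. The actual drop would be configured in Prometheus itself (for example a `labeldrop` action under `metric_relabel_configs`); the Python below only models the effect on hypothetical data.

```python
def label_cardinality(series):
    """Return {label name: number of distinct values} across all series."""
    values = {}
    for labels in series:
        for name, value in labels.items():
            values.setdefault(name, set()).add(value)
    return {name: len(vals) for name, vals in values.items()}


def series_after_drop(series, unwanted):
    """Return the distinct label sets that remain after dropping labels."""
    return {
        frozenset((k, v) for k, v in labels.items() if k not in unwanted)
        for labels in series
    }


# Hypothetical scrape: one series per user because of a user_id label.
scrape = [{"job": "api", "user_id": str(i)} for i in range(1000)]
counts = label_cardinality(scrape)                    # user_id dominates
remaining = len(series_after_drop(scrape, {"user_id"}))
```

Here a thousand series collapse into one once `user_id` is gone, which is the saving the relabel rule buys.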

Trace Sampling

Trace sampling records only a fraction of requests, often something like 1 in 1,000; the default depends on the tracer. Increase the rate for critical user journeys using adaptive sampling rules.
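
A simpler stand-in for adaptive sampling is a fixed rate per route, with a higher rate for journeys that matter. The sketch below assumes that approach; the route names and rates are hypothetical, and true adaptive sampling would adjust rates from observed traffic.

```python
import hashlib

DEFAULT_RATE = 1 / 1000            # baseline rate for ordinary routes
ROUTE_RATES = {"/checkout": 0.5}   # hypothetical override for a critical journey


def should_sample(trace_id, route, route_rates=ROUTE_RATES):
    """Decide from the trace ID alone whether to keep this trace."""
    rate = route_rates.get(route, DEFAULT_RATE)
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return fraction < rate


keep = should_sample("trace-abc123", "/checkout")
```

Deriving the decision from the trace ID means every service that sees the same trace reaches the same verdict, provided they share the rate table.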

On-Call Culture Slang

War Room

The war room is a dedicated Zoom bridge opened during Sev-0 incidents. Lock the room to essential engineers only and appoint a scribe.

Hero Culture

Hero culture rewards engineers who stay up all night fixing things. Replace it with blameless postmortems and shared runbooks.

Shadow Rotation

Shadow rotation pairs a new engineer with the primary on-call for a week. They carry the pager but escalate to the mentor for every alert.

Handoff Notes

Handoff notes capture open alerts, flaky tests, and infra debt. Store them in a single Confluence page updated at 09:00 sharp.

Automation & Infrastructure Terms

Cattle vs. Pets

Cattle are identical VMs replaced on failure; pets are snowflake servers with names. Aim for 90 % cattle to reduce MTTR.

Immutable Infrastructure

Immutable infra never changes in place; you always redeploy a new image. Use Packer to bake AMIs nightly and Terraform to roll them out.

Chaos Monkey

Chaos Monkey kills random instances during business hours. Start with a 10 % blast radius in staging before touching prod.

GitOps

GitOps makes a Git repo the single source of truth for cluster state. Argo CD watches the repo and converges Kubernetes to match.
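
The convergence idea boils down to diffing desired state against actual state and acting on the difference. The sketch below shows only that diff on plain dictionaries; it is not how Argo CD is implemented, and the resource names are invented.

```python
def plan_convergence(desired, actual):
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its spec (any comparable value).
    """
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


in_git = {"web": {"replicas": 3}, "api": {"replicas": 2}}
in_cluster = {"web": {"replicas": 1}, "old-job": {"replicas": 1}}
print(plan_convergence(in_git, in_cluster))
# [('create', 'api'), ('delete', 'old-job'), ('update', 'web')]
```

Running this on a loop, and applying the actions each time, is the reconcile pattern in miniature.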

Cost and Performance Lingo

Spiky Workload

Spiky workloads burst at unpredictable times. Pre-warm Lambda with provisioned concurrency and use predictive scaling driven by historical CloudWatch metrics.

Committed Use

Committed use discounts lock you into one-year or three-year spend. Track them in a FinOps dashboard to avoid unused reservations.

Spot Fleet

A spot fleet mixes on-demand and spot instances to hit price targets. Set a 20 % on-demand base layer to handle eviction storms.
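
The base-layer arithmetic is small but easy to get wrong at low instance counts, where rounding matters. This sketch rounds the on-demand share up so the base never falls below the target; `split_capacity` is a hypothetical helper, not an AWS API.

```python
def split_capacity(total_instances, on_demand_percent=20):
    """Split a fleet into (on_demand, spot), rounding on-demand up."""
    if total_instances < 0:
        raise ValueError("total_instances must be non-negative")
    if not 0 <= on_demand_percent <= 100:
        raise ValueError("on_demand_percent must be between 0 and 100")
    # Integer ceiling division avoids floating-point surprises.
    on_demand = -(-total_instances * on_demand_percent // 100)
    return on_demand, total_instances - on_demand


print(split_capacity(10))  # (2, 8)
print(split_capacity(7))   # (2, 5)
```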

CPU Steal

High steal time means noisy neighbors on your hypervisor. Alert when it stays above 10 %, and migrate the VM to a dedicated host if it persists.
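
On Linux, steal time is the eighth counter on the aggregate `cpu` line of `/proc/stat`, so a percentage can be derived from two samples. The sketch below works on the line text so it runs anywhere; the sample values are invented and the 10 % threshold is the one from the text.

```python
def parse_cpu_line(line):
    """Return the first eight counters (user .. steal) from a 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in fields[1:9]]


def steal_percent(before, after):
    """Percentage of CPU time stolen between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)  # guest time is already counted inside user time
    return 100.0 * deltas[7] / total if total else 0.0


# Invented samples taken a few seconds apart.
sample_1 = "cpu  100 0 50 800 10 0 0 40 0 0"
sample_2 = "cpu  150 0 70 1500 20 0 0 160 0 0"
should_alert = steal_percent(sample_1, sample_2) > 10
```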

Security and Compliance Slang

Zero Trust

Zero trust verifies every request regardless of network origin. Enforce it with mutual TLS and short-lived service tokens.

Red Team vs. Blue Team

The red team attacks; the blue team defends. Schedule quarterly purple-team exercises to close detection gaps.

SBOM

A software bill of materials lists every library in your container. Generate it with Syft and store it next to the image in ECR.

Policy as Code

Policy as code codifies compliance rules in OPA or Sentinel. Block non-compliant Terraform plans before they reach the cloud.
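
Real policies would be written in Rego for OPA or in Sentinel; the Python below is only a stand-in to show the shape of such a check. It assumes the JSON form of a plan (as produced by `terraform show -json`), where each entry under `resource_changes` carries an `address` and a `change.after` object, and it applies a hypothetical rule that every resource needs an `owner` tag.

```python
def untagged_resources(plan, required_tag="owner"):
    """Return addresses of planned resources missing `required_tag`.

    Simplification: every resource is assumed to support tags.
    """
    problems = []
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        if after is None:
            continue  # resource is being deleted, nothing to check
        tags = after.get("tags") or {}
        if required_tag not in tags:
            problems.append(rc.get("address", "<unknown>"))
    return problems


# Hand-written fragment shaped like plan JSON, for illustration only.
plan = {
    "resource_changes": [
        {"address": "aws_instance.web",
         "change": {"actions": ["create"], "after": {"tags": {"owner": "ops"}}}},
        {"address": "aws_instance.db",
         "change": {"actions": ["create"], "after": {"tags": {}}}},
    ]
}
print(untagged_resources(plan))  # ['aws_instance.db']
```

A CI step could fail the pipeline whenever the returned list is non-empty, which is the "block before it reaches the cloud" behavior described above.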

Reliability Engineering Metrics

SLA vs. SLO vs. SLI

SLA is the contractual promise to customers. SLO is the internal goal, and SLI is the measured reality.

If your SLI stays below the SLO over a rolling 30-day window, freeze new features until reliability recovers.

Error Budget

The error budget is the amount of unreliability the SLO allows: a 99.9 % SLO leaves a 0.1 % budget. Spend it on launches, but track burn rate weekly.
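
The budget and its burn rate are plain arithmetic. This sketch assumes an availability target measured in minutes over a 30-day window; the function names are illustrative.

```python
def error_budget_minutes(target, window_days=30):
    """Minutes of allowed unavailability for an availability target."""
    return (1 - target) * window_days * 24 * 60


def burn_rate(bad_minutes, elapsed_days, target, window_days=30):
    """Budget consumption speed; 1.0 means on pace to spend it exactly."""
    budget_so_far = error_budget_minutes(target, window_days) * elapsed_days / window_days
    return bad_minutes / budget_so_far


budget = error_budget_minutes(0.999)                    # about 43.2 minutes
rate = burn_rate(21.6, elapsed_days=7, target=0.999)    # a bit over 2x
```

A 99.9 % target over 30 days allows roughly 43.2 minutes of downtime, so using 21.6 of them in the first week burns budget at a little more than twice the sustainable pace.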

MTTR vs. MTBF

MTTR measures repair speed; MTBF measures stability between failures. A 5-minute MTTR with a 30-day MTBF beats perfect code that ships yearly.
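
The two metrics combine into steady-state availability as MTBF / (MTBF + MTTR). A quick check of the numbers in the example above, with both values in minutes:

```python
def availability(mtbf_minutes, mttr_minutes):
    """Steady-state availability from mean time between failures and repair."""
    if mtbf_minutes <= 0 or mttr_minutes < 0:
        raise ValueError("mtbf must be positive and mttr non-negative")
    return mtbf_minutes / (mtbf_minutes + mttr_minutes)


thirty_days = 30 * 24 * 60  # 43,200 minutes
a = availability(thirty_days, 5)
print(f"{a:.5%}")  # 99.98843%
```

One five-minute repair a month still leaves the service at about 99.988 %.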

Toil

Toil is repetitive manual work with no enduring value. Eliminate it by automating ticket triage and certificate renewals.

Advanced Incident Tactics

Rollback vs. Roll-forward

Rollback reverts the bad deploy; roll-forward patches it in prod. Choose based on blast radius and fix complexity.

Dark Launches

Dark launches ship code to prod without exposing it to users. Turn on traffic with a config flag after soak testing.

Incident Commander

The incident commander owns communication and priority, not root cause. They stop engineers from overlapping fixes and losing context.

Status Page Silence

Status page silence is the gap between outage start and public acknowledgment. Keep it under 5 minutes using automated incident bots.

Tooling Ecosystem Nicknames

K8s

K8s is “Kubernetes” with the eight letters between the K and the s replaced by an 8. Say “kates” out loud and you’ll sound like a native.

ELK vs. EFK

ELK is Elasticsearch, Logstash, Kibana. EFK swaps Logstash for Fluentd, or for the lighter Fluent Bit when sidecar memory is tight.

Terraform Planfile

A planfile is a binary artifact created by `terraform plan -out=tfplan`. Store it in CI artifacts for reproducible applies.

Helm Umbrella Chart

An umbrella chart bundles micro-service charts into one release. Pin sub-chart versions to avoid surprise upgrades.

Everyday Chat Shortcuts

LGTM

“Looks good to me” approves a pull request. Submit it as an approving GitHub review, since a plain comment does not satisfy required-review rules.

TIL

“Today I learned” shares quick hacks in Slack. Tag the ops channel so knowledge spreads beyond the author.

AFK

“Away from keyboard” signals a quick break during incident bridges. Use it to avoid ghosting the war room.

TL;DR

“Too long; didn’t read” precedes a 50-word summary of a 500-word incident report. Post it above the fold in Confluence.
