Understanding Kubernetes Probes: Liveness, Readiness & Startup
Deep dive into Kubernetes health probes. When to use each type, configuration examples, and common mistakes to avoid.
Your pod is Running but traffic never reaches it. Or worse, Kubernetes keeps restarting a perfectly healthy container. Both problems usually come down to the same thing: misconfigured probes. Probes are how Kubernetes knows whether your application is alive, ready to serve, and has finished starting up. Get them right and your deployments are resilient. Get them wrong and you'll spend hours chasing phantom crashes.
Quick Answer
Kubernetes has three types of probes:
| Probe | Question it answers | What happens on failure |
|---|---|---|
| Startup | Has the app finished booting? | Container is killed and restarted |
| Liveness | Is the app still working? | Container is killed and restarted |
| Readiness | Can the app handle traffic right now? | Pod removed from Service endpoints (no restart) |
The key insight: liveness and readiness have different consequences. Liveness kills the container. Readiness just stops sending it traffic. Mixing them up is the most common probe mistake.
How Probes Work
The kubelet on each node runs the probes at regular intervals. Each probe performs a check and gets one of three results:
- Success — the check passed
- Failure — the check failed
- Unknown — the check itself didn't complete; the kubelet takes no action and simply retries on the next period
Every probe type supports four check mechanisms:
HTTP GET
The kubelet sends an HTTP GET request. Any status code between 200 and 399 is a success.
```yaml
httpGet:
  path: /health
  port: 8080
```

This is the most common approach. Your application exposes a health endpoint, and Kubernetes pings it.
TCP Socket
The kubelet tries to open a TCP connection. If the port is open, it's a success.
```yaml
tcpSocket:
  port: 3306
```

Useful for databases and services that don't speak HTTP.
Exec Command
The kubelet runs a command inside the container. Exit code 0 is a success.
```yaml
exec:
  command:
    - cat
    - /tmp/healthy
```

Useful when health depends on something a simple HTTP check can't capture.
gRPC
The kubelet sends a gRPC health check request following the gRPC Health Checking Protocol. A SERVING status is a success.
```yaml
grpc:
  port: 50051
  service: myapp-liveness  # Optional: target a specific service
```

gRPC probes were introduced as alpha in Kubernetes 1.23, graduated to beta in 1.24, and became stable (GA) in 1.27. Your application must implement the standard gRPC Health Checking Protocol (`grpc.health.v1.Health`). The `port` field is required and must be a number — unlike HTTP and TCP probes, you can't reference a port by name.
The optional service field lets you differentiate probe types on the same gRPC endpoint. For example, you can have your health server respond differently to myapp-liveness vs myapp-readiness requests, instead of running two separate gRPC servers. The Kubernetes project recommends concatenating your service name with the probe type (e.g. myservice-liveness) as a naming convention.
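In a pod spec, that pattern looks roughly like this — a sketch using the illustrative service names from above, with both probes sharing one gRPC port:

```yaml
livenessProbe:
  grpc:
    port: 50051
    service: myapp-liveness   # health server keeps this SERVING unless the process is broken
readinessProbe:
  grpc:
    port: 50051
    service: myapp-readiness  # health server may report NOT_SERVING while warming up
```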
This is the natural choice for gRPC-based microservices — no need to bolt on an HTTP endpoint just for health checks.
Important caveats:
- gRPC probes don't support TLS or authentication parameters
- Configuration errors (wrong port, unimplemented protocol) count as probe failures
- The probe runs against the pod IP, so make sure your gRPC endpoint listens on `0.0.0.0`, not just `localhost`
Startup Probe
The startup probe runs first, before liveness and readiness kick in. It answers one question: has the application finished its initialization?
```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```

This gives the application up to 300 seconds (30 × 10s) to start. During this time, liveness and readiness probes are disabled. Once the startup probe succeeds, it never runs again — liveness and readiness take over.
When to Use It
Use a startup probe when your application has a slow or unpredictable startup time. Java apps loading Spring contexts, apps running database migrations on boot, or services that need to warm up caches are all good candidates.
Without a startup probe, you'd have to inflate initialDelaySeconds on your liveness probe — which means Kubernetes can't detect a truly dead container during that window.
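For contrast, this is what that workaround looks like — a sketch with illustrative path and port, showing why the inflated delay is a problem:

```yaml
# Anti-pattern: liveness is blind for the first two minutes
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 120  # a crash during this window goes undetected
  periodSeconds: 10
```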
What Happens on Failure
If the startup probe exhausts all its attempts (failureThreshold reached), Kubernetes kills and restarts the container. This is the same behavior as a liveness probe failure.
Liveness Probe
The liveness probe runs continuously after the startup probe succeeds (or immediately if there's no startup probe). It answers: is this container still functioning?
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5
```

When to Use It
Use a liveness probe when your application can enter a broken state that it can't recover from on its own. Deadlocked threads, corrupted internal state, or infinite loops are classic examples.
What Happens on Failure
After failureThreshold consecutive failures, Kubernetes kills the container and restarts it according to the pod's restartPolicy. This is a hard reset — the process is terminated and a new one starts.
When NOT to Use It
Don't point your liveness probe at a dependency. If your app's /healthz checks the database and the database goes down, Kubernetes will restart all your pods — making things worse, not better. Liveness should check whether this container is healthy, not whether the entire system is.
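One way to keep that separation, sketched with illustrative endpoints: point liveness at a self-contained check and let only readiness consult dependencies.

```yaml
livenessProbe:
  httpGet:
    path: /healthz  # checks only this process — no database or downstream calls
    port: 8080
readinessProbe:
  httpGet:
    path: /ready    # may return 503 while the database is unreachable
    port: 8080
```

A database outage then drains traffic from the pods instead of restart-looping them.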
Readiness Probe
The readiness probe also runs continuously. It answers: can this container handle incoming requests right now?
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
  timeoutSeconds: 3
```

When to Use It
Use a readiness probe when your application might temporarily be unable to serve traffic. Loading a large dataset, waiting for a cache to warm up, or experiencing backpressure from a downstream service are all good reasons.
What Happens on Failure
The pod is removed from Service endpoints. No traffic is routed to it. Crucially, the container is not restarted — it keeps running. Once the readiness probe succeeds again, the pod is added back to the endpoints.
This is the fundamental difference from liveness: readiness is gentle. It gives your app a chance to recover on its own.
Readiness vs Liveness: The Critical Distinction
| Scenario | Use Readiness | Use Liveness |
|---|---|---|
| App temporarily overloaded | ✅ Remove from traffic | ❌ Restart would make it worse |
| Dependency (DB) is down | ✅ Stop sending requests | ❌ Restarting won't fix the DB |
| App is deadlocked | Not enough alone | ✅ Only a restart can fix it |
| App has a memory leak | Not enough alone | ✅ Restart before OOM |
Probe Configuration Parameters
All three probe types share the same configuration options:
| Parameter | Default | Description |
|---|---|---|
| `initialDelaySeconds` | 0 | Seconds to wait after container start before running the first probe |
| `periodSeconds` | 10 | How often to run the probe |
| `timeoutSeconds` | 1 | How long to wait for a response before considering it failed |
| `successThreshold` | 1 | Consecutive successes needed to be considered healthy (must be 1 for liveness/startup) |
| `failureThreshold` | 3 | Consecutive failures needed to trigger the failure action |
The time before Kubernetes takes action on failure is: periodSeconds × failureThreshold. With defaults, that's 30 seconds (10s × 3).
Complete Example
Here's a production-ready configuration for a typical web application:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: web-app:v1
          ports:
            - containerPort: 8080
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
            # Allows up to 5 min to start
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 15
            failureThreshold: 3
            timeoutSeconds: 5
            # Restarts after 45s of failures
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
            timeoutSeconds: 3
            # Removes from traffic after 15s of failures
```

Notice the three different endpoints. This is intentional:

- `/health` — basic "am I booted?" check (startup)
- `/healthz` — "am I alive and not deadlocked?" check (liveness)
- `/ready` — "can I handle requests?" which might check downstream dependencies (readiness)
Common Mistakes
Mistake 1: Using the Same Endpoint for All Probes
If your /health endpoint checks the database, and you use it for liveness, a database outage will restart all your pods. Use separate endpoints with different logic.
Mistake 2: No Startup Probe on Slow Apps
Without a startup probe, you'll set initialDelaySeconds: 120 on your liveness probe. During those 120 seconds, if your app crashes, Kubernetes won't know. A startup probe is more precise.
Mistake 3: Liveness Probe Too Aggressive
```yaml
# Don't do this
livenessProbe:
  periodSeconds: 1
  failureThreshold: 1
  timeoutSeconds: 1
```

One slow response and your container gets killed. Use reasonable thresholds — a liveness failure should mean the container is truly broken, not just momentarily slow.
Mistake 4: Missing Readiness Probe
Without a readiness probe, Kubernetes sends traffic to your pod as soon as the container starts. If your app takes 10 seconds to initialize, users will see errors during that window. Always add a readiness probe.
Mistake 5: timeoutSeconds Too Low
The default timeoutSeconds is 1 second. If your health endpoint queries a database or external service, 1 second might not be enough under load. A timeout failure counts as a probe failure.
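A more forgiving sketch — the values are illustrative starting points, not universal recommendations:

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  timeoutSeconds: 3    # headroom for a dependency check that slows down under load
  periodSeconds: 5
  failureThreshold: 3
```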
Troubleshooting Probe Issues
Pod Keeps Restarting (CrashLoopBackOff)
If Kubernetes keeps killing your container during startup, the liveness probe is probably firing before the app is ready.
Diagnose:
```shell
kubectl describe pod <pod-name>
```

Look for events like:
```
Warning  Unhealthy  kubelet  Liveness probe failed: connection refused
Normal   Killing    kubelet  Container failed liveness probe, will be restarted
```

Fix: Add a startup probe or increase `initialDelaySeconds` on the liveness probe.
Pod Running but Not Receiving Traffic
If your pod is Running but shows 0/1 READY, the readiness probe is failing.
Diagnose:
```shell
# Check readiness status
kubectl get pods

# Check probe failures
kubectl describe pod <pod-name>

# Test the endpoint from inside the pod
kubectl exec <pod-name> -- curl -s localhost:8080/ready
```

Fix: Check what the readiness endpoint returns. Common issues: the app is waiting for a dependency, the endpoint path is wrong, or the port doesn't match.
Intermittent Restarts Under Load
If pods restart during traffic spikes, the liveness probe might be timing out when the app is busy.
Fix: Increase timeoutSeconds and failureThreshold on the liveness probe. Consider separating the liveness endpoint from any heavy logic — it should be as lightweight as possible.
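For instance, a more load-tolerant liveness configuration might look like this (illustrative values):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  timeoutSeconds: 5    # tolerate slow responses during traffic spikes
  periodSeconds: 15
  failureThreshold: 4  # restart only after ~60s (15s × 4) of sustained failure
```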
Practice This Scenario
Theory is useful, but nothing beats hands-on troubleshooting. This Kubeasy challenge drops you into a broken cluster and asks you to fix it:
A notification service keeps getting killed mid-startup, even though the app itself is fine. You'll investigate why Kubernetes is restarting a healthy container and fix the probe configuration. (~15 min, medium difficulty)
Prevention Tips
- Always implement a readiness probe — This is the bare minimum. Without it, users will hit your app before it's ready
- Use startup probes for slow apps — They're more precise than inflating `initialDelaySeconds`
- Keep liveness checks lightweight — A simple "am I alive" check, not a full dependency audit
- Never check external dependencies in liveness — A database outage shouldn't cascade-restart all your pods
- Test probe behavior locally — Use `kubectl port-forward` and `curl` to verify your health endpoints return what you expect
- Monitor probe failures — Prometheus can scrape `kube_pod_container_status_restarts_total` to catch flapping probes early

