How to Fix Kubernetes OOMKilled Pods
Complete troubleshooting guide for OOMKilled errors. Covers diagnosis with kubectl describe, understanding memory limits vs requests, and step-by-step fixes.
Your Kubernetes pod just got killed with an OOMKilled status. The container keeps restarting, entering a CrashLoopBackOff state. Sound familiar? This is one of the most common issues in Kubernetes, and fortunately, it's straightforward to diagnose and fix.
Quick Answer
Your pod is using more memory than its configured limit. To fix it:
```bash
kubectl describe pod <pod-name> | grep -A 5 "Last State"
```

If you see `Reason: OOMKilled`, increase the memory limit in your deployment:

```yaml
resources:
  limits:
    memory: "512Mi"   # Increase this value
  requests:
    memory: "256Mi"
```

What Causes OOMKilled?
Kubernetes reports OOMKilled when a container tries to use more memory than its configured limit. The limit is enforced as a cgroup memory limit on the node, so it is the Linux kernel's Out-Of-Memory (OOM) killer that actually terminates the process, protecting the rest of the node from running out of memory.
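If you want to see the enforced value from inside the container, the limit shows up as a cgroup file. A quick check, assuming the image has a shell and `cat` available (the path differs between cgroup v1 and v2 nodes):

```bash
# cgroup v2 nodes
kubectl exec <pod-name> -- cat /sys/fs/cgroup/memory.max

# cgroup v1 nodes
kubectl exec <pod-name> -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes
```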
Common Causes
| Cause | Description |
|---|---|
| Memory limit too low | The limit doesn't match actual application needs |
| Memory leak | Application continuously allocates memory without releasing it |
| Traffic spike | Sudden increase in load causes higher memory usage |
| JVM heap misconfiguration | Java apps with heap size exceeding container limit |
| Caching without bounds | In-memory caches growing unbounded |
Step-by-Step Troubleshooting
Step 1: Confirm the OOMKilled Status
First, check if your pod was actually OOMKilled:
```bash
kubectl describe pod <pod-name>
```

Look for this in the output:

```
Last State:  Terminated
  Reason:    OOMKilled
  Exit Code: 137
```

Exit code 137 = 128 + 9 (SIGKILL). This confirms the kernel killed your container.
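If you prefer to pull these fields directly instead of grepping the describe output, the same information lives in the pod status (the first container is assumed here; adjust the index for multi-container pods):

```bash
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
```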
Step 2: Check Current Resource Configuration
See what limits are currently set:
```bash
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}' | jq
```

Or for a deployment:

```bash
kubectl get deployment <deployment-name> -o jsonpath='{.spec.template.spec.containers[0].resources}' | jq
```

Step 3: Analyze Actual Memory Usage
If you have metrics-server installed, check real usage:
```bash
kubectl top pod <pod-name>
```

Compare the MEMORY(bytes) column to your configured limit. If usage is close to or at the limit, you've found the problem.
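For pods with more than one container, per-container figures are more useful than the pod total:

```bash
kubectl top pod <pod-name> --containers
```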
Step 4: Check Events for Patterns
Look at cluster events to see if this is a recurring issue:
```bash
kubectl get events --field-selector reason=OOMKilling --sort-by='.lastTimestamp'
```

Common Solutions
Solution A: Increase Memory Limit
The most common fix is simply giving your container more memory:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"   # Increased from 256Mi
              cpu: "500m"
```

Apply the change:
```bash
kubectl apply -f deployment.yaml
```

Pro tip: Start with 2x your average memory usage as the limit, then adjust based on monitoring.
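If you'd rather patch the running Deployment than edit the manifest, kubectl can update the values in place (the deployment and container are assumed to be named my-app, matching the snippet above):

```bash
kubectl set resources deployment my-app -c my-app \
  --requests=memory=256Mi --limits=memory=512Mi
```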
Solution B: Fix Memory Leaks
If memory usage grows continuously until the container is OOMKilled, you likely have a memory leak (a quick in-cluster check follows the list below). Steps to debug:
- Profile the application locally with tools like `pprof` (Go), `VisualVM` (Java), or `memory_profiler` (Python)
- Check for common patterns:
  - Unbounded caches
  - Event listeners not being removed
  - Large objects held in memory
  - Circular references preventing garbage collection
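Before reaching for a profiler, you can confirm the growth pattern in the cluster by sampling usage over time. A rough sketch, assuming metrics-server is installed:

```bash
# Append a usage sample every 60 seconds; steadily growing numbers under
# flat traffic point to a leak rather than an undersized limit.
while true; do
  date >> memory-usage.log
  kubectl top pod <pod-name> --no-headers >> memory-usage.log
  sleep 60
done
```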
Solution C: Configure JVM Heap (Java Apps)
For Java applications, ensure the JVM heap fits within the container limit:
```yaml
env:
  - name: JAVA_OPTS
    value: "-Xmx384m -Xms256m"
resources:
  limits:
    memory: "512Mi"   # Must be > Xmx + ~100Mi for non-heap memory
```

Rule of thumb: Container limit = Xmx + 128Mi (for metaspace, threads, native memory)
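On Java 10+ (or 8u191+), an alternative is to let the JVM derive the heap size from the container limit so the two values cannot drift apart. A sketch, assuming your startup script passes JAVA_OPTS to the JVM:

```yaml
env:
  - name: JAVA_OPTS
    value: "-XX:MaxRAMPercentage=75.0"   # heap sized to 75% of the container memory limit
resources:
  limits:
    memory: "512Mi"                      # the JVM reads this limit via the cgroup
```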
Solution D: Add Resource Requests
If you only have limits, add requests too:
```yaml
resources:
  requests:
    memory: "256Mi"   # Guaranteed minimum
  limits:
    memory: "512Mi"   # Maximum allowed
```

Why this matters:
- Requests = guaranteed resources for scheduling
- Limits = maximum the container can use before being killed
Understanding Requests vs Limits
| Aspect | Requests | Limits |
|---|---|---|
| Purpose | Scheduling guarantee | Maximum cap |
| What happens if exceeded | Nothing immediately (can use spare node memory), but the pod becomes a more likely eviction candidate under node memory pressure | Container killed (OOMKilled) |
| Best practice | Set to average usage | Set to peak usage + buffer |
QoS Classes
How you set requests and limits determines your pod's Quality of Service class:
| QoS Class | Configuration | Eviction Priority |
|---|---|---|
| Guaranteed | requests = limits (for all containers) | Last to be evicted |
| Burstable | requests < limits | Medium priority |
| BestEffort | No requests or limits | First to be evicted |
For production workloads, aim for Guaranteed or Burstable with reasonable limits.
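For example, a minimal resources block that yields the Guaranteed class sets requests equal to limits for both CPU and memory on every container:

```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

You can verify which class Kubernetes assigned in the pod status:

```bash
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}{"\n"}'
```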
Practice This Scenario
In this hands-on challenge, you'll:
- Investigate why a pod keeps crashing
- Understand the difference between requests and limits
- Fix the configuration to achieve stable operation
Prevention Tips
- Always set resource limits - Never run production workloads without limits
- Monitor memory trends - Use Prometheus + Grafana to catch issues before OOM
- Set up alerts - Alert on containers approaching 80% of their memory limit (an example rule follows this list)
- Load test - Profile memory usage under realistic load before deploying
- Use Vertical Pod Autoscaler - Let VPA recommend appropriate resource values
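As a sketch of the alerting tip above, here is what such a rule could look like using the Prometheus Operator's PrometheusRule resource. It assumes cAdvisor and kube-state-metrics are being scraped (the metrics container_memory_working_set_bytes and kube_pod_container_resource_limits come from those exporters); the rule name and threshold are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-limit-alerts
spec:
  groups:
    - name: memory
      rules:
        - alert: ContainerNearMemoryLimit
          expr: |
            max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
              /
            max by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
              > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is above 80% of its memory limit"
```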
Example VPA Configuration
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```
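Note that VPA is not built into Kubernetes: the VerticalPodAutoscaler CRD and its controllers come from the Kubernetes autoscaler project and must be installed in the cluster. Once the recommender has watched the workload for a while, you can read its suggestions from the object:

```bash
kubectl describe vpa my-app-vpa
```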
Summary
OOMKilled errors happen when your container exceeds its memory limit. The fix is usually straightforward:
- Diagnose with `kubectl describe pod`
- Measure actual usage with `kubectl top pod`
- Adjust the memory limit to match real needs
- Monitor to prevent future occurrences
Remember: it's better to set slightly higher limits than to have your application constantly restarting. But don't go overboard—wasted resources mean higher costs and less efficient cluster utilization.

