Production Deployment (20 minutes)
Configure your load tests for production-grade reliability, observability, and performance.
What you'll learn
- How to size resources for master and worker pods
- How to isolate load test workloads on dedicated nodes
- How to export metrics to OpenTelemetry
- How to scale workers for high user counts
- How to monitor test health via status conditions
Prerequisites
- Completed the CI/CD Integration tutorial
- Understanding of Kubernetes resource management
- OpenTelemetry Collector deployed (optional, for Step 4)
The scenario
You're running a 1000-user production load test against your staging environment. The test needs:
- Dedicated nodes (no interference with production workloads)
- Resource limits (predictable cluster usage)
- Metrics export to your observability stack
- 30-minute sustained test with monitoring
Step 1: Enhanced test script
Building on the ecommerce_test.py from Tutorial 1, here's a production-ready version with realistic complexity:
# production_test.py
from locust import HttpUser, task, between
import logging
import uuid


class ProductionShopperUser(HttpUser):
    """Production-grade e-commerce load test with authentication and realistic behavior."""

    wait_time = between(1, 3)  # Realistic user pacing: 1-3 seconds between actions

    def on_start(self):
        """Called once when user starts - simulate authentication."""
        self.user_id = str(uuid.uuid4())
        # Authenticate with the API
        response = self.client.post("/api/v1/auth/login", json={
            "username": "loadtest@example.com",
            "password": "test-password"
        }, name="Login")
        if response.status_code == 200:
            # Store auth token for subsequent requests
            self.auth_token = response.json().get("token")
            self.client.headers.update({"Authorization": f"Bearer {self.auth_token}"})
            logging.info("User authenticated successfully")
        else:
            logging.error(f"Authentication failed: {response.status_code}")

    @task(5)  # 50% of requests - most common action
    def browse_products(self):
        """Browse product catalog - main landing page."""
        self.client.get("/api/v1/products", name="Browse Products")

    @task(2)  # 20% of requests
    def search_products(self):
        """Search for specific products."""
        search_terms = ["laptop", "phone", "tablet", "monitor"]
        term = search_terms[hash(str(self.user_id)) % len(search_terms)]
        self.client.get(f"/api/v1/products?q={term}", name="Search Products")

    @task(2)  # 20% of requests
    def view_product_detail(self):
        """View specific product details."""
        # Simulate viewing different products
        product_id = 1000 + (hash(str(self.user_id)) % 100)
        self.client.get(f"/api/v1/products/{product_id}", name="View Product Detail")

    @task(1)  # 10% of requests
    def add_to_cart(self):
        """Add product to shopping cart."""
        product_id = 1000 + (hash(str(self.user_id)) % 100)
        self.client.post("/api/v1/cart", json={
            "product_id": product_id,
            "quantity": 1
        }, name="Add to Cart")

    def on_stop(self):
        """Called when user stops - cleanup if needed."""
        logging.info("User session ended")
Key enhancements:
- on_start hook: authenticates once per user (realistic behavior)
- Weighted tasks: the 5:2:2:1 ratio approximates real user behavior (browsing > searching = viewing > adding to cart)
- Dynamic data: product IDs vary per user, which reduces the chance that every request hits the same cached entry
- Wait time: between(1, 3) simulates realistic user pacing
- Logging: helps debug test issues in production
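The percentages in the task comments follow from the weights: each task's expected share is its weight divided by the sum of all weights. A minimal sketch of that arithmetic, where `TASK_WEIGHTS` and `task_shares` are illustrative names that only restate the `@task` values above (Locust performs the weighted selection internally):

```python
# Task weights copied from the @task decorators in production_test.py.
TASK_WEIGHTS = {
    "browse_products": 5,
    "search_products": 2,
    "view_product_detail": 2,
    "add_to_cart": 1,
}


def task_shares(weights):
    """Return each task's expected share of task executions, in percent."""
    total = sum(weights.values())
    return {name: 100 * weight / total for name, weight in weights.items()}


shares = task_shares(TASK_WEIGHTS)
for name, share in shares.items():
    print(f"{name}: {share:.0f}%")
```

These are long-run expectations; over a short run the observed mix will fluctuate around them.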
Step 2: Size resources appropriately
Why resource sizing matters
Load tests need consistent performance to generate reliable results. Without resource limits:
- Worker pods compete for CPU with other workloads → inconsistent request rates
- Memory exhaustion can crash pods mid-test → incomplete results
- Cluster instability affects production services
Sizing guidelines
Master pod (coordinator):
- Memory: 512Mi request, 1Gi limit (handles test coordination and statistics)
- CPU: 500m request, 1000m limit (moderate processing needs)
Worker pods (load generators):
- Memory: 256Mi request, 512Mi limit (per-worker estimate for ~50 users)
- CPU: 250m request, no limit (maximizes request generation throughput)
- Replica count: total users ÷ 50 = worker count (e.g., 1000 users = 20 workers)
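The replica rule above can be written as a small helper. This is a sketch: `workers_needed` is a hypothetical name, and the default of 50 users per worker is the estimate from this tutorial, not a fixed property of Locust. Rounding up keeps every worker at or below the per-worker estimate when the user count is not a multiple of 50.

```python
import math


def workers_needed(total_users, users_per_worker=50):
    """Round up so that no worker carries more than users_per_worker users."""
    if total_users <= 0 or users_per_worker <= 0:
        raise ValueError("user counts must be positive")
    return math.ceil(total_users / users_per_worker)


print(workers_needed(1000))  # scenario from this tutorial
print(workers_needed(1010))  # a partial batch still needs its own worker
```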
Resource configuration example
apiVersion: locust.io/v2
kind: LocustTest
metadata:
  name: resource-sized-test
spec:
  image: locustio/locust:2.43.3
  testFiles:
    configMapRef: production-test
  master:
    command: |
      --locustfile /lotest/src/production_test.py
      --host https://api.staging.example.com
      --users 1000
      --spawn-rate 50
      --run-time 30m
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1000m"
  worker:
    command: "--locustfile /lotest/src/production_test.py"
    replicas: 20  # 1000 users ÷ 50 users/worker = 20 workers
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        # CPU limit intentionally omitted for maximum performance
Why omit CPU limit on workers? CPU limits can throttle request generation, reducing test accuracy. Workers with only CPU requests get maximum available CPU while still being schedulable.
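Before deploying, it can help to add up what the scheduler must find room for. The sketch below totals the resource requests from the example above (one master plus 20 workers). The helper names are hypothetical, and the parsers only handle the quantity suffixes used in this tutorial (`Mi`, `Gi`, `m`), not the full Kubernetes quantity syntax.

```python
def mem_mi(quantity):
    """Convert a memory quantity such as '512Mi' or '1Gi' to mebibytes."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def cpu_millis(quantity):
    """Convert a CPU quantity such as '250m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def total_requests(master, worker, replicas):
    """Sum the requests of one master pod and `replicas` worker pods."""
    return {
        "cpu_millicores": cpu_millis(master["cpu"]) + replicas * cpu_millis(worker["cpu"]),
        "memory_mi": mem_mi(master["memory"]) + replicas * mem_mi(worker["memory"]),
    }


totals = total_requests(
    master={"cpu": "500m", "memory": "512Mi"},
    worker={"cpu": "250m", "memory": "256Mi"},
    replicas=20,
)
print(totals)
```

For the values in this tutorial that comes to 5500 millicores and 5632Mi of requests, which the target nodes need to have available as allocatable capacity.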
Step 3: Isolate on dedicated nodes
Why dedicated nodes prevent interference
Running load tests on shared nodes can:
- Throttle production workloads — high CPU usage from workers affects critical services
- Skew test results — resource contention from other pods creates inconsistent performance
- Violate policies — some clusters prohibit non-production workloads on production nodes
Label nodes for load testing
# Identify nodes for load testing (e.g., separate node pool)
kubectl get nodes
# Label dedicated node(s)
kubectl label nodes worker-node-1 workload-type=load-testing
kubectl label nodes worker-node-2 workload-type=load-testing
kubectl label nodes worker-node-3 workload-type=load-testing
Configure node affinity
Add node affinity to ensure pods only run on labeled nodes:
apiVersion: locust.io/v2
kind: LocustTest
metadata:
  name: isolated-test
spec:
  image: locustio/locust:2.43.3
  testFiles:
    configMapRef: production-test
  master:
    command: |
      --locustfile /lotest/src/production_test.py
      --host https://api.staging.example.com
      --users 1000
      --spawn-rate 50
      --run-time 30m
  worker:
    command: "--locustfile /lotest/src/production_test.py"
    replicas: 20
  scheduling:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: workload-type
                  operator: In
                  values:
                    - load-testing  # Only schedule on labeled nodes
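To make the matching rule concrete, the sketch below models how a single node selector term is evaluated: every expression in `matchExpressions` must hold, and an `In` expression holds when the node's label value is one of the listed values. `node_matches` is a hypothetical helper that models only the `In` operator used here; the real evaluation happens in the Kubernetes scheduler.

```python
def node_matches(node_labels, match_expressions):
    """Return True if the node labels satisfy every expression in the term."""
    for expr in match_expressions:
        if expr["operator"] != "In":
            raise NotImplementedError("this sketch only models the In operator")
        # A missing label yields None, which is never in the values list.
        if node_labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


# The expression from the affinity rule above.
EXPRESSIONS = [{"key": "workload-type", "operator": "In", "values": ["load-testing"]}]

print(node_matches({"workload-type": "load-testing"}, EXPRESSIONS))
print(node_matches({"workload-type": "general"}, EXPRESSIONS))
```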
Add tolerations for tainted nodes
If your dedicated nodes are tainted to prevent accidental scheduling, add matching tolerations:
spec:
  scheduling:
    affinity:
      # ... (node affinity from above)
    tolerations:
      - key: "workload-type"
        operator: "Equal"
        value: "load-testing"
        effect: "NoSchedule"
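The toleration above is meant to pair with a taint such as `workload-type=load-testing:NoSchedule` on the dedicated nodes (that taint is an assumption here; the tutorial does not show the command that applies it). As a rough model of the Kubernetes matching rule, a toleration matches a taint when the effects agree (or the toleration names no effect), the keys agree, and either the operator is `Exists` or the values are equal. `tolerates` is a hypothetical helper illustrating that rule, not scheduler code.

```python
def tolerates(toleration, taint):
    """Return True if the toleration matches the taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if not key:
        # An empty key is only meaningful with Exists: it matches every taint key.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


TOLERATION = {
    "key": "workload-type",
    "operator": "Equal",
    "value": "load-testing",
    "effect": "NoSchedule",
}
TAINT = {"key": "workload-type", "value": "load-testing", "effect": "NoSchedule"}

print(tolerates(TOLERATION, TAINT))
```

Note that a toleration only permits scheduling onto tainted nodes; it is the node affinity rule that keeps the pods off every other node.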
Verification:
# Check where master and worker pods are scheduled
kubectl get pods -l performance-test-name=isolated-test -o wide
# You should see NODE column showing only your labeled nodes
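The same check can be automated. The sketch below takes the JSON produced by adding `-o json` to the `kubectl get pods` command above and reports pods that are not on an expected node. `pods_on_unexpected_nodes` is a hypothetical helper, and the node names are the ones labeled earlier in this step.

```python
def pods_on_unexpected_nodes(pod_list, allowed_nodes):
    """Return {pod name: node name} for pods scheduled outside allowed_nodes.

    Pods that are not scheduled yet have no nodeName and are reported with None.
    """
    misplaced = {}
    for pod in pod_list.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        if node not in allowed_nodes:
            misplaced[pod["metadata"]["name"]] = node
    return misplaced


ALLOWED_NODES = {"worker-node-1", "worker-node-2", "worker-node-3"}

# Trimmed stand-in for the output of `kubectl get pods ... -o json`.
SAMPLE = {
    "items": [
        {"metadata": {"name": "isolated-test-worker-a"}, "spec": {"nodeName": "worker-node-1"}},
        {"metadata": {"name": "isolated-test-worker-b"}, "spec": {"nodeName": "general-node-7"}},
        {"metadata": {"name": "isolated-test-worker-c"}, "spec": {}},
    ]
}

print(pods_on_unexpected_nodes(SAMPLE, ALLOWED_NODES))
```

With real data, load the command output with `json.load` and pass the result in place of `SAMPLE`; an empty result means every pod landed on a labeled node.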
Step 4: Enable OpenTelemetry
Why native OpenTelemetry beats sidecars
The v2 operator includes native OpenTelemetry support, eliminating the need for sidecar containers:
- Lower overhead — no extra containers per pod
- Simpler configuration — environment variables injected automatically
- Better performance — direct export from Locust to collector
Configure OpenTelemetry export
apiVersion: locust.io/v2
kind: LocustTest
metadata:
  name: otel-enabled-test
spec:
  image: locustio/locust:2.43.3
  testFiles:
    configMapRef: production-test
  master:
    command: |
      --locustfile /lotest/src/production_test.py
      --host https://api.staging.example.com
      --users 1000
      --spawn-rate 50
      --run-time 30m
  worker:
    command: "--locustfile /lotest/src/production_test.py"
    replicas: 20
  observability:
    openTelemetry:
      enabled: true
      endpoint: "otel-collector.monitoring:4317"  # Your OTel Collector endpoint
      protocol: "grpc"  # or "http/protobuf"
      # TLS is the default; set insecure: true only for development without TLS
      extraEnvVars:
        OTEL_SERVICE_NAME: "production-load-test"
        OTEL_RESOURCE_ATTRIBUTES: "environment=staging,team=platform,test.type=load"
Configuration details:
- endpoint: OpenTelemetry Collector gRPC endpoint (format: host:port)
- protocol: grpc (default) or http/protobuf
- insecure: TLS is the default; set true only for development without TLS
- extraEnvVars: custom attributes for trace/metric filtering
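`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs, so a typo in it is easy to miss. A small sketch for checking the string before it goes into the CR (`parse_resource_attributes` is a hypothetical helper, and it does not handle percent-encoded values):

```python
def parse_resource_attributes(raw):
    """Parse 'k1=v1,k2=v2' into a dict, rejecting malformed pairs."""
    attributes = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, separator, value = pair.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"malformed attribute: {pair!r}")
        attributes[key.strip()] = value.strip()
    return attributes


attrs = parse_resource_attributes("environment=staging,team=platform,test.type=load")
print(attrs)
```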
Verify OpenTelemetry injection
# Check environment variables in master pod
kubectl get pod -l performance-test-pod-name=otel-enabled-test-master \
-o yaml | grep OTEL_
# Expected output:
# OTEL_TRACES_EXPORTER: otlp
# OTEL_METRICS_EXPORTER: otlp
# OTEL_EXPORTER_OTLP_ENDPOINT: otel-collector.monitoring:4317
# OTEL_EXPORTER_OTLP_PROTOCOL: grpc
# OTEL_SERVICE_NAME: production-load-test
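Instead of reading the grep output by eye, the expected variables can be compared against the pod spec. The sketch below works on a single pod object as returned by `kubectl get pod <name> -o json`. `EXPECTED_ENV` restates the expected output listed above, and `env_mismatches` is a hypothetical helper.

```python
EXPECTED_ENV = {
    "OTEL_TRACES_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "otel-collector.monitoring:4317",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_SERVICE_NAME": "production-load-test",
}


def env_mismatches(pod, expected):
    """Return {name: actual value} for expected variables that are missing or differ."""
    actual = {}
    for container in pod.get("spec", {}).get("containers", []):
        for var in container.get("env", []):
            if "value" in var:  # skip valueFrom references
                actual[var["name"]] = var["value"]
    return {
        name: actual.get(name)
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }


# Trimmed stand-in for a master pod with one variable missing.
SAMPLE_POD = {
    "spec": {
        "containers": [
            {
                "name": "locust",
                "env": [
                    {"name": key, "value": value}
                    for key, value in EXPECTED_ENV.items()
                    if key != "OTEL_SERVICE_NAME"
                ],
            }
        ]
    }
}

print(env_mismatches(SAMPLE_POD, EXPECTED_ENV))
```

An empty result means all expected variables are present with the expected values.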
Step 5: Deploy the complete production test
Combining all previous steps, here's the full production-ready LocustTest CR:
apiVersion: locust.io/v2
kind: LocustTest
metadata:
  name: production-load-test
  namespace: load-testing
spec:
  image: locustio/locust:2.43.3
  testFiles:
    configMapRef: production-test
  master:
    command: |
      --locustfile /lotest/src/production_test.py
      --host https://api.staging.example.com
      --users 1000
      --spawn-rate 50
      --run-time 30m
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1000m"
  worker:
    command: "--locustfile /lotest/src/production_test.py"
    replicas: 20  # 1000 users ÷ 50 users per worker
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        # CPU limit omitted for maximum worker performance
  scheduling:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: workload-type
                  operator: In
                  values:
                    - load-testing
    tolerations:
      - key: "workload-type"
        operator: "Equal"
        value: "load-testing"
        effect: "NoSchedule"
  observability:
    openTelemetry:
      enabled: true
      endpoint: "otel-collector.monitoring:4317"
      protocol: "grpc"
      extraEnvVars:
        OTEL_SERVICE_NAME: "production-load-test"
        OTEL_RESOURCE_ATTRIBUTES: "environment=staging,team=platform"
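A quick sanity check on the master command: `--spawn-rate` is the number of users started per second, so the ramp-up time is users divided by spawn rate. The sketch below reads both flags from the command string and does the division. The helper names are hypothetical, and it assumes the space-separated `--flag value` form used in this CR.

```python
import shlex

MASTER_COMMAND = """
--locustfile /lotest/src/production_test.py
--host https://api.staging.example.com
--users 1000
--spawn-rate 50
--run-time 30m
"""


def flag_value(command, flag):
    """Return the token that follows `flag` in an argument string."""
    tokens = shlex.split(command)
    return tokens[tokens.index(flag) + 1]


def ramp_up_seconds(command):
    """Seconds needed to start all users at the configured spawn rate."""
    users = int(flag_value(command, "--users"))
    spawn_rate = float(flag_value(command, "--spawn-rate"))
    return users / spawn_rate


print(ramp_up_seconds(MASTER_COMMAND))
```

At 50 users per second the 1000 users are running after about 20 seconds, so nearly all of the 30-minute run is at full load.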
Deploy the test
# Create ConfigMap from enhanced test script
kubectl create configmap production-test \
--from-file=production_test.py \
--namespace load-testing
# Apply the LocustTest CR
kubectl apply -f production-load-test.yaml
Step 6: Monitor and verify
Watch test progression
# Monitor test status (watch mode)
kubectl get locusttest production-load-test -n load-testing -w
# Expected progression:
# NAME PHASE WORKERS CONNECTED AGE
# production-load-test Pending 20 0 5s
# production-load-test Running 20 20 45s
# production-load-test Succeeded 20 20 31m
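In a pipeline you would usually poll for the final phase rather than watch it. The sketch below is one way to do that, under two assumptions that are not confirmed by this tutorial: that the PHASE column is backed by `.status.phase`, and that `Failed` is the terminal phase for an unsuccessful run. The polling function takes the phase lookup as a parameter so it can be exercised without a cluster.

```python
import subprocess
import time

# "Succeeded" appears in the progression above; "Failed" is an assumed counterpart.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def kubectl_phase(name, namespace):
    """Read the test phase with kubectl (assumes the field is .status.phase)."""
    result = subprocess.run(
        ["kubectl", "get", "locusttest", name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_terminal_phase(get_phase, timeout_s=2400, poll_s=15,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll get_phase() until a terminal phase is seen or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if clock() >= deadline:
            raise TimeoutError(f"test still in phase {phase!r} after {timeout_s}s")
        sleep(poll_s)
```

A typical call would be `wait_for_terminal_phase(lambda: kubectl_phase("production-load-test", "load-testing"))`, with the 40-minute default timeout leaving headroom over the 30-minute run.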
Check status conditions
# View detailed status conditions
kubectl get locusttest production-load-test -n load-testing \
-o jsonpath='{.status.conditions[*]}' | jq
# Expected conditions:
# {
# "type": "PodsHealthy",
# "status": "True",
# "reason": "PodsHealthy",
# "message": "All pods are healthy"
# }
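For automated checks, the same condition can be read from the resource's JSON (`kubectl get locusttest production-load-test -n load-testing -o json`). The sketch below looks up a condition by type and treats only the string `"True"` as healthy. The helper names are hypothetical; the `PodsHealthy` type is taken from the expected output above.

```python
def condition_status(conditions, condition_type):
    """Return the status string of the named condition, or None if absent."""
    for condition in conditions:
        if condition.get("type") == condition_type:
            return condition.get("status")
    return None


def pods_healthy(resource):
    """True only when the PodsHealthy condition is present and "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return condition_status(conditions, "PodsHealthy") == "True"


# Trimmed stand-in for the LocustTest resource JSON.
SAMPLE_RESOURCE = {
    "status": {
        "conditions": [
            {
                "type": "PodsHealthy",
                "status": "True",
                "reason": "PodsHealthy",
                "message": "All pods are healthy",
            }
        ]
    }
}

print(pods_healthy(SAMPLE_RESOURCE))
```

Treating a missing condition as unhealthy is a deliberate choice here: early in the run the operator may not have reported any conditions yet.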
Verify worker health
# Check all worker pods are running
kubectl get pods -l performance-test-pod-name=production-load-test-worker \
-n load-testing
# Expected: 20 pods in Running state
Verify OpenTelemetry traces
If OpenTelemetry is configured, check your observability backend:
Prometheus (metrics):
# Query Locust request metrics (illustrative)
locust_requests_total{service_name="production-load-test"}
# Query response time metrics (illustrative)
locust_request_duration_seconds{service_name="production-load-test"}
Metric names are illustrative
Actual metric names depend on your OpenTelemetry/Prometheus setup and exporter configuration. Check your OTel Collector and Prometheus documentation for the exact names available in your environment.
Jaeger/Tempo (traces):
Filter by service.name=production-load-test to see:
- Individual request spans
- Request duration distribution
- Error traces
Access real-time Locust UI
# Port-forward to master pod
kubectl port-forward -n load-testing \
job/production-load-test-master 8089:8089
# Open http://localhost:8089 in browser
The UI shows:
- Live request statistics (RPS, response times, failures)
- Charts showing performance trends over time
- Worker connection status
- Test phase and remaining duration
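The numbers behind the UI can also be read programmatically while the port-forward is active. The sketch below assumes the Locust web UI serves its statistics as JSON at `/stats/requests` and that the payload contains `state`, `user_count`, `total_rps`, and `fail_ratio`; these field names may differ between Locust versions, so the summary uses `.get()` and returns `None` for anything missing.

```python
import json
from urllib.request import urlopen


def fetch_stats(base_url="http://localhost:8089"):
    """Fetch the statistics JSON from the port-forwarded Locust master."""
    with urlopen(f"{base_url}/stats/requests", timeout=5) as response:
        return json.load(response)


def summarize(stats):
    """Pick the headline numbers out of the statistics payload."""
    return {
        "state": stats.get("state"),
        "users": stats.get("user_count"),
        "rps": stats.get("total_rps"),
        "fail_ratio": stats.get("fail_ratio"),
    }


# Hand-written stand-in payload; real values come from fetch_stats().
SAMPLE_STATS = {"state": "running", "user_count": 1000, "total_rps": 250.5, "fail_ratio": 0.01}
print(summarize(SAMPLE_STATS))
```

With the port-forward running, `summarize(fetch_stats())` gives a one-line snapshot that is easy to log at intervals.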
What you learned
✓ How to size master and worker resources for production workloads
✓ How to isolate load tests on dedicated nodes using affinity and tolerations
✓ How to export traces and metrics to OpenTelemetry collectors
✓ How to scale worker replicas based on simulated user count
✓ How to monitor test health through status conditions
✓ How to deploy complete production-grade load tests
Next steps
- Configure resources — Deep dive into resource optimization
- Configure OpenTelemetry — Advanced observability setup
- Use node affinity — More scheduling strategies
- API Reference — Explore all configuration options
- Security best practices — Secure your load testing infrastructure