Creating Labs
Common Pitfalls
The seven most common lab-authoring mistakes, found while validating 230 POV Demo labs.
Introduction
Avoiding these mistakes will save hours of debugging and prevent learners from hitting unexplained failures mid-lab.
1. Missing set -e in Solve Scripts
The most common lab failure. Without set -e, a solve script continues running after a failed command, leaving the environment in a broken state that appears solved but isn't.
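The difference is easy to reproduce without a cluster. The sketch below is only an illustration, not part of any lab: `false` stands in for a failing kubectl command, and the variable names are hypothetical.

```shell
#!/bin/bash
# Run the same two-command script twice: once plain, once with set -e.
# `false` is a stand-in for a command such as a failing `kubectl apply`.

status_plain=0
out_plain=$(bash -c 'false; echo "reached end"') || status_plain=$?

status_strict=0
out_strict=$(bash -c 'set -e; false; echo "reached end"') || status_strict=$?

echo "without set -e: exit=$status_plain output='$out_plain'"
echo "with set -e:    exit=$status_strict output='$out_strict'"
```

The first run exits 0 and still prints `reached end`, which is why a broken solve script can look successful. The second run stops at the failing command and exits 1.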
Avoid
#!/bin/bash -l
kubectl apply -f solution.yaml # if this fails, script continues
kubectl wait --for=condition=ready pod/my-app --timeout=60s
Correct
#!/bin/bash -l
set -e # must be line 2
kubectl apply -f solution.yaml
kubectl wait --for=condition=ready pod/my-app --timeout=60s
2. Bash History with set -o history
Using set -o history in scripts (instead of history -s) causes HISTSIZE overflow when bash history is flushed. This has broken KCSA and OpenTelemetry labs.
Avoid
set -o history
HISTFILE=~/.bash_history
history -w
Correct
history -s "kubectl apply -f solution.yaml"
ensure_history_flushed
3. Non-Idempotent Commands
Setup and solve scripts may run more than once. Commands that fail on second run will break lab state.
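A quick way to check idempotency is to run the same step twice and confirm that the end state is unchanged. The sketch below does this for a file append. The helper name `append_line_once` and the sample line are hypothetical, and a temporary file stands in for a real config file.

```shell
#!/bin/bash
# append_line_once FILE LINE (hypothetical helper)
# Appends LINE to FILE only if no identical line is already present.
append_line_once() {
  local file="$1" line="$2"
  touch "$file"   # make sure the file exists so grep does not report an error
  # -x matches the whole line, -F treats LINE as a fixed string, -- ends options
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

config=$(mktemp)
append_line_once "$config" "max_connections=100"
append_line_once "$config" "max_connections=100"   # second run is a no-op

line_count=$(grep -c "" "$config")                 # counts lines in the file
echo "lines in file: $line_count"
```

Matching the whole line avoids the case where a shorter pattern is found inside an unrelated, longer line and the append is skipped by mistake.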
Avoid (these fail if run twice)
kubectl run my-pod --image=nginx # fails: pod already exists
helm install my-release ./chart # fails: release already exists
kubectl create namespace my-ns # fails: namespace already exists
echo "line" >> /etc/config # appends duplicate lines
Correct (idempotent alternatives)
kubectl run my-pod --image=nginx --dry-run=client -o yaml | kubectl apply -f -
helm upgrade --install my-release ./chart
kubectl create namespace my-ns --dry-run=client -o yaml | kubectl apply -f -
grep -qxF "line" /etc/config || echo "line" >> /etc/config # -x -F: match the exact whole line
4. Race Conditions in Kubernetes Labs
Kubernetes resources take time to become ready. Running kubectl wait immediately after helm install fails because the pods haven't started yet.
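Because kubectl wait reports an error when no pod matches the selector yet, one option is to poll until at least one pod exists before waiting on readiness. The sketch below assumes a hypothetical helper named `wait_for_pods`. The attempt count and delay are illustrative defaults, not values taken from the labs.

```shell
#!/bin/bash
# wait_for_pods SELECTOR [ATTEMPTS] [DELAY_SECONDS] (hypothetical helper)
# Polls until at least one pod matches SELECTOR, then returns 0.
# Returns 1 if nothing matches after ATTEMPTS tries.
wait_for_pods() {
  local selector="$1" attempts="${2:-30}" delay="${3:-2}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # `-o name` prints one line per matching pod and nothing when none exist
    if [ -n "$(kubectl get pods -l "$selector" -o name 2>/dev/null)" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no pods matched '$selector' after $attempts attempts" >&2
  return 1
}

# Possible use in a solve script:
#   wait_for_pods app=my-app
#   kubectl wait --for=condition=ready pod -l app=my-app --timeout=120s
```

Unlike a fixed pause, the loop returns as soon as a pod appears and fails loudly if none ever does.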
Avoid
helm install my-release ./chart
kubectl wait --for=condition=ready pod -l app=my-app --timeout=120s # often fails
Correct
helm install my-release ./chart
sleep 10 # allow pods to start
kubectl wait --for=condition=ready pod -l app=my-app --timeout=120s
5. Binary Copy “Text File Busy” Error
Copying a binary over a running executable fails with “Text file busy”. Stop the service before copying.
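Where stopping the service first is not practical, a commonly used alternative is to copy the new binary to a temporary file in the same directory and rename it into place. A rename replaces the directory entry instead of writing into the busy file, so it does not hit “Text file busy”. The running process keeps executing the old binary until it is restarted. The helper name `replace_binary`, the file mode, and the paths below are assumptions for illustration only.

```shell
#!/bin/bash
# replace_binary SRC DEST (hypothetical helper)
# Copies SRC next to DEST under a temporary name, then renames it over DEST.
replace_binary() {
  local src="$1" dest="$2" tmp
  tmp=$(mktemp "$(dirname "$dest")/.tmp.XXXXXX")   # same directory, so mv is a rename
  cp "$src" "$tmp"
  chmod 755 "$tmp"                                 # assumed mode for an executable
  mv -f "$tmp" "$dest"
}

# Demonstration with plain files standing in for real binaries
workdir=$(mktemp -d)
printf 'old\n' > "$workdir/myapp"
printf 'new\n' > "$workdir/new-binary"
replace_binary "$workdir/new-binary" "$workdir/myapp"
cat "$workdir/myapp"
```

The service still needs a restart afterwards to pick up the new binary.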
Avoid
cp /tmp/new-binary /usr/local/bin/myapp # fails if myapp is running
Correct
systemctl stop myapp
cp /tmp/new-binary /usr/local/bin/myapp
systemctl start myapp
6. Glob Patterns Matching Test Files
Glob patterns like rules/*.yml may match test fixture files (e.g. test_alerts.yml) that were not intended to be included. Always verify what a glob matches.
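When listing every file by hand is impractical, the glob can be filtered instead. The sketch below assumes test fixtures follow a `test_*` naming convention, which may not hold for every lab. The helper name `list_rule_files` is hypothetical.

```shell
#!/bin/bash
# list_rule_files DIR (hypothetical helper)
# Prints every *.yml file in DIR except those whose name starts with test_.
list_rule_files() {
  local dir="$1" file
  for file in "$dir"/*.yml; do
    [ -e "$file" ] || continue      # the glob matched nothing
    case "$(basename "$file")" in
      test_*) continue ;;           # assumed naming convention for fixtures
    esac
    printf '%s\n' "$file"
  done
}

# Demonstration with empty files in a temporary directory
rules_dir=$(mktemp -d)
touch "$rules_dir/alerts.yml" "$rules_dir/recording.yml" "$rules_dir/test_alerts.yml"
list_rule_files "$rules_dir"
```

Each printed path can then be passed to its own `kubectl apply -f` call, for example in a `while read` loop.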
Avoid
kubectl apply -f rules/*.yml # may apply test_alerts.yml unintentionally
Correct
# List what the glob matches before applying
ls rules/*.yml
kubectl apply -f rules/alerts.yml -f rules/recording.yml # explicit files, one -f per file
7. Commands Returning Non-Zero Under set -e
Some commands return non-zero exit codes even on success. Under set -e, these silently abort the script.
Avoid
set -e
kubectl auth can-i create pods # returns 1 if "no", aborting the script
history -s "kubectl apply" # may return non-zero in some environments
Correct
set -e
kubectl auth can-i create pods || true # ignore exit code
history -s "kubectl apply" || true # safe history addition
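Appending `|| true` discards the exit code entirely. When the script needs to act on the answer, testing the command inside an `if` keeps set -e from aborting while still recording the result. In the sketch below, `can_i_create_pods` is a hypothetical stand-in for `kubectl auth can-i create pods` answering "no".

```shell
#!/bin/bash
set -e

# Hypothetical stand-in for `kubectl auth can-i create pods` answering "no".
can_i_create_pods() {
  echo "no"
  return 1
}

# A command tested by `if` does not trigger set -e when it returns non-zero.
if answer=$(can_i_create_pods); then
  allowed=yes
else
  allowed=no
fi

echo "answer=$answer allowed=$allowed"
echo "script still running"
```

The script continues past the non-zero exit, and `allowed` holds the outcome for later steps.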