Creating Labs

Common Pitfalls

The seven most common lab authoring mistakes, found while validating 230 POV Demo labs.

Introduction

Note: Most of these pitfalls cause silent failures or broken lab state that learners run into but cannot diagnose.

Avoiding these mistakes will save hours of debugging and prevent learners from hitting unexplained failures mid-lab.

1. Missing set -e in Solve Scripts

The most common lab failure. Without set -e, a solve script continues running after a failed command, leaving the environment in a broken state that appears solved but isn't.

Avoid

#!/bin/bash -l
kubectl apply -f solution.yaml    # if this fails, script continues
kubectl wait --for=condition=ready pod/my-app --timeout=60s

Correct

#!/bin/bash -l
set -e                            # must be line 2
kubectl apply -f solution.yaml
kubectl wait --for=condition=ready pod/my-app --timeout=60s
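
The difference is easy to see in isolation. The sketch below runs the same two-step script twice through bash -c, using false as a stand-in for a failing kubectl command. The variable names are illustrative only.

```shell
#!/bin/bash
# Without set -e: the failing step is ignored and the script exits 0.
without_set_e=$(bash -c 'false; echo "reached end"'; echo "exit=$?")

# With set -e: the script stops at the failing step and exits non-zero.
with_set_e=$(bash -c 'set -e; false; echo "reached end"'; echo "exit=$?")

echo "without set -e: $without_set_e"
echo "with set -e:    $with_set_e"
```

In the first run the script reaches its last line and reports success even though a step failed, which is the "appears solved but isn't" state described above.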

2. Bash History with set -o history

Using set -o history in scripts (instead of history -s) causes HISTSIZE overflow when bash history is flushed. This has broken KCSA and OpenTelemetry labs.

Avoid

set -o history
HISTFILE=~/.bash_history
history -w

Correct

history -s "kubectl apply -f solution.yaml"
ensure_history_flushed

3. Non-Idempotent Commands

Setup and solve scripts may run more than once. Commands that fail on a second run will break lab state.

Avoid (these fail if run twice)

kubectl run my-pod --image=nginx          # fails: pod already exists
helm install my-release ./chart           # fails: release already exists
kubectl create namespace my-ns            # fails: namespace already exists
echo "line" >> /etc/config               # appends duplicate lines

Correct (idempotent alternatives)

kubectl run my-pod --image=nginx --dry-run=client -o yaml | kubectl apply -f -
helm upgrade --install my-release ./chart
kubectl create namespace my-ns --dry-run=client -o yaml | kubectl apply -f -
grep -qxF "line" /etc/config || echo "line" >> /etc/config   # -x -F: match the exact whole line
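
The append-if-missing pattern can be wrapped in a small function. This is a minimal sketch, not part of any lab framework: append_once is a hypothetical helper, and the config values are made up. It matches whole lines with grep -qxF, because a plain substring match would treat max_conns=100 as already present once max_conns=10 is in the file.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -x: whole-line match, -F: fixed string (no regex), --: end of options
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

config=$(mktemp)
append_once "max_conns=10" "$config"
append_once "max_conns=10" "$config"    # second run is a no-op
append_once "max_conns=100" "$config"   # different line, so it is appended
line_count=$(wc -l < "$config" | tr -d ' ')
echo "lines in config: $line_count"
```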

4. Race Conditions in Kubernetes Labs

Kubernetes resources take time to become ready. Running kubectl wait immediately after helm install can fail with “no matching resources found” because the pods may not have been created yet.

Avoid

helm install my-release ./chart
kubectl wait --for=condition=ready pod -l app=my-app --timeout=120s  # often fails

Correct

helm install my-release ./chart
sleep 10  # allow pods to start
kubectl wait --for=condition=ready pod -l app=my-app --timeout=120s
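
A fixed sleep is a guess: too short on a slow cluster, wasted time on a fast one. One alternative is to poll until the pods exist and only then call kubectl wait. This is a minimal sketch: retry_until and pods_exist are hypothetical helpers (not part of any lab framework), and app=my-app is the label assumed in the example above.

```shell
#!/bin/bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry_until <attempts> <delay-seconds> <command> [args...]
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Succeeds once at least one pod with the assumed label exists.
pods_exist() {
  [ -n "$(kubectl get pods -l app=my-app -o name 2>/dev/null)" ]
}

# In a lab script (not executed here):
#   helm install my-release ./chart
#   retry_until 30 2 pods_exist
#   kubectl wait --for=condition=ready pod -l app=my-app --timeout=120s
```

Because the command under retry is tested in an if, a failed attempt does not abort the script under set -e. Only the final return 1 does.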

5. Binary Copy “Text File Busy” Error

Copying a binary over a running executable fails with “Text file busy”. Stop the service before copying.

Avoid

cp /tmp/new-binary /usr/local/bin/myapp  # fails if myapp is running

Correct

systemctl stop myapp
cp /tmp/new-binary /usr/local/bin/myapp
systemctl start myapp
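
The error comes from writing into a file that is currently being executed. Renaming a new file over the old path does not write into the running file, so another approach is to copy to a temporary file in the same directory and then mv it into place. In this sketch, replace_binary is a hypothetical helper. The service still needs a restart afterwards to pick up the new binary.

```shell
#!/bin/bash
# Hypothetical helper: replace dest with src by renaming, not overwriting.
replace_binary() {
  local src="$1" dest="$2"
  local tmp
  # Create the temp file next to dest so mv is a rename on the same filesystem.
  tmp=$(mktemp "${dest}.XXXXXX")
  cp "$src" "$tmp"
  chmod 755 "$tmp"
  mv -f "$tmp" "$dest"
}

# In a lab script (not executed here):
#   replace_binary /tmp/new-binary /usr/local/bin/myapp
#   systemctl restart myapp
```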

6. Glob Patterns Matching Test Files

Glob patterns like rules/*.yml may match test fixture files (e.g. test_alerts.yml) that were not intended to be included. Always verify what a glob matches.

Avoid

kubectl apply -f rules/*.yml   # may apply test_alerts.yml unintentionally

Correct

# List what the glob matches before applying
ls rules/*.yml
kubectl apply -f rules/alerts.yml -f rules/recording.yml  # explicit files, one -f per file
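
When the set of rule files changes often, an explicit list can go stale. A possible middle ground is to keep the glob but filter out fixtures by name. In this sketch, list_rule_files is a hypothetical helper, and the test_ prefix is an assumption based on the test_alerts.yml example above.

```shell
#!/bin/bash
# Hypothetical helper: print the .yml files in a directory, one per line,
# skipping test fixtures (assumed to be named test_*.yml).
list_rule_files() {
  local dir="$1" f
  for f in "$dir"/*.yml; do
    [ -e "$f" ] || continue          # glob matched nothing
    case "$(basename "$f")" in
      test_*) continue ;;            # skip fixtures
    esac
    printf '%s\n' "$f"
  done
}

# In a lab script (not executed here), apply each file with its own -f:
#   while IFS= read -r f; do
#     kubectl apply -f "$f"
#   done < <(list_rule_files rules)
```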

7. Commands Returning Non-Zero Under set -e

Some commands use a non-zero exit code to report an ordinary result rather than an error. Under set -e, these abort the script at that line, often without any error message.

Avoid

set -e
kubectl auth can-i create pods    # returns 1 if "no", aborting the script
history -s "kubectl apply"        # may return non-zero in some environments

Correct

set -e
kubectl auth can-i create pods || true   # ignore exit code
history -s "kubectl apply" || true       # safe history addition
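
Appending || true discards the exit code. When the script needs to act on the answer, the code can be tested or captured instead. A minimal sketch, using a stand-in function (answers_no, hypothetical) in place of a command like kubectl auth can-i that exits 1 to mean "no":

```shell
#!/bin/bash
set -e

# Stand-in for a command that prints "no" and exits 1 as a normal result.
answers_no() { echo "no"; return 1; }

# Option 1: test the command in an if. set -e does not trigger on conditions.
if answers_no >/dev/null; then
  can_create="yes"
else
  can_create="no"
fi

# Option 2: capture the exit code without aborting the script.
rc=0
answers_no >/dev/null || rc=$?

echo "can_create=$can_create rc=$rc"
```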