Writing Robust Shell Scripts: Idempotency and Error Handling

Shell scripts have a reputation for being fragile: they fail silently, leave half-finished state behind, and break the moment a path contains a space. Most of that fragility is avoidable. This article covers the defensive techniques that turn a throwaway script into something you can run twice, run in CI, and trust in production.

Start with strict mode: set -euo pipefail

By default, Bash plows ahead after errors, treats unset variables as empty strings, and ignores failures in the middle of a pipeline. The conventional first line of a serious script counteracts all three:

#!/usr/bin/env bash
set -euo pipefail

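Each flag does something specific: -e (errexit) exits the script as soon as a command returns a non-zero status; -u (nounset) treats the expansion of an unset variable as an error; and -o pipefail makes a pipeline return the status of its last (rightmost) failing command instead of the status of its final stage.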
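One way to see what each flag changes is to run the same failing snippet with the flag off and on. The sketch below is only an illustration: run_snippet is a made-up helper, and each snippet runs in a child bash -c so that an abort in the child does not stop the demo itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a snippet in a child bash with the given
# "set" options and print the child's exit status.
run_snippet() {
  local opts="$1" snippet="$2" rc=0
  bash -c "set $opts; $snippet" >/dev/null 2>&1 || rc=$?
  echo "$rc"
}

# -e: a failing command stops the child before "exit 0" is reached
errexit_off="$(run_snippet '+e' 'false; exit 0')"
errexit_on="$(run_snippet '-e' 'false; exit 0')"

# -u: expanding an unset variable becomes a fatal error
nounset_off="$(run_snippet '+u' 'echo "$surely_unset_var"; exit 0')"
nounset_on="$(run_snippet '-u' 'echo "$surely_unset_var"; exit 0')"

# pipefail: a failure in an early pipeline stage is no longer masked
pipefail_off="$(run_snippet '+o pipefail' 'false | true')"
pipefail_on="$(run_snippet '-o pipefail' 'false | true')"

echo "errexit:  off=$errexit_off on=$errexit_on"
echo "nounset:  off=$nounset_off on=$nounset_on"
echo "pipefail: off=$pipefail_off on=$pipefail_on"
```

In every pair the "off" status should be 0 and the "on" status non-zero, which is the whole argument for turning the flags on.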

Strict mode is a sane default, not a silver bullet. -e has surprising edge cases: it does not fire for commands in an if, while, or until condition, for any command in an && or || chain except the last, or for a command negated with !. It is also suspended for the entire body of a function that is called from one of those contexts. When you genuinely expect a command to fail, handle it explicitly rather than fighting the flag:

# Don't let an expected non-zero exit kill the script
if ! grep -q "ready" status.txt; then
  echo "not ready yet"
fi

# Capture an exit code without tripping errexit
set +e
some_flaky_command
rc=$?
set -e
[[ $rc -eq 0 ]] || echo "command failed with $rc"
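The function case tends to surprise people the most, so here is a small sketch of it. The function names step and step_checked are hypothetical; the point is that errexit is switched off inside a function while it is being evaluated as an if condition, so a failure in the middle goes unnoticed unless the function returns it explicitly.

```bash
#!/usr/bin/env bash
set -euo pipefail

reached_end=no

# Hypothetical function whose first command fails.
step() {
  false              # would abort the script if step were called plainly
  reached_end=yes    # still runs when step is used as a condition
}

# Same work, but the failure is turned into an explicit return value.
step_checked() {
  false || return 1
  reached_end=unexpected
}

# As an if condition, step runs to the end and reports success.
if step; then
  outcome="step reported success"
else
  outcome="step reported failure"
fi

# The explicit return makes the failure visible to the caller.
if step_checked; then
  outcome_checked="step_checked reported success"
else
  outcome_checked="step_checked reported failure"
fi

echo "$outcome (reached_end=$reached_end)"
echo "$outcome_checked"
```

Writing cmd || return 1 on the steps that matter keeps a function honest regardless of how it is called.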

With -u enabled, reference variables that may legitimately be unset using a default: "${VAR:-}" expands to empty without error, and "${VAR:?must be set}" aborts with a clear message.
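A minimal sketch of both forms follows. The variable names APP_MODE and APP_REGION and the fallback value "local" are invented for the example, and the :? check runs in a subshell so that its abort does not end the demo.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical variables; make sure they are unset for the demo.
unset APP_MODE APP_REGION

# ${VAR:-} expands to an empty string instead of tripping -u
mode="${APP_MODE:-}"

# ${VAR:-default} substitutes a fallback without assigning it
region="${APP_REGION:-local}"

# ${VAR:?message} aborts the (sub)shell with the message on stderr
if ( : "${APP_MODE:?must be set}" ) 2>/dev/null; then
  required="set"
else
  required="missing"
fi

echo "mode='$mode' region='$region' APP_MODE is $required"
```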

Clean up with trap

Scripts that create temp files, lock files, or background processes need to clean them up even when they fail partway through. A trap on the EXIT pseudo-signal runs however the script ends: normal exit, an error under -e, or, in Bash, termination by a signal such as INT or TERM. The one exception is SIGKILL, which cannot be trapped.

#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cleanup() {
  rm -rf "$workdir"
}
trap cleanup EXIT

# ... do work in "$workdir"; it is removed on any exit

A single EXIT trap is usually enough, because in Bash the EXIT trap still fires when INT, TERM, and similar signals terminate the script with their default action. Other shells are not guaranteed to behave the same way. If you need different behaviour for interruption versus normal completion, trap signals separately. Keep cleanup handlers idempotent and defensive: they may run when setup only partially completed, so guard against missing variables and use rm -f rather than plain rm.
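One possible shape for separate signal handling is sketched below. The handler name on_interrupt and the message it prints are assumptions made for the example; the exit statuses follow the usual 128-plus-signal-number convention.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Declared before the trap so cleanup never sees it unset under -u.
workdir=""

cleanup() {
  # Safe to call repeatedly, and before workdir has been created.
  if [[ -n "${workdir:-}" && -d "$workdir" ]]; then
    rm -rf -- "$workdir"
  fi
}

# Hypothetical handler: report the interruption, then exit.
# Calling exit here is what triggers the EXIT trap afterwards.
on_interrupt() {
  echo "interrupted, cleaning up" >&2
  exit 130               # 128 + 2 (SIGINT)
}

trap cleanup EXIT
trap on_interrupt INT
trap 'exit 143' TERM     # 128 + 15 (SIGTERM)

workdir="$(mktemp -d)"
# ... do work in "$workdir" ...
```

Registering the traps before creating the directory closes the window in which a signal could arrive with nothing set up to handle it.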

Quoting and word-splitting

The single most common source of shell bugs is unquoted expansion. When you write $var without quotes, Bash splits the value into words on the characters in IFS (whitespace by default) and then performs glob expansion on each word. A filename like my report.txt becomes two arguments; a value containing * expands against the current directory.

# Wrong: breaks on spaces, expands globs
cp $src $dst

# Right: each variable is a single, literal argument
cp "$src" "$dst"

Rules of thumb that prevent the majority of quoting bugs:
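Three widely used rules are sketched here as a starting point, not as a complete list: quote every expansion, forward arguments with "$@", and keep lists of arguments in arrays. The helpers count_args and forward are hypothetical and exist only to make the argument counts visible.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

name="my report.txt"

# Rule 1: quote every expansion.
unquoted="$(count_args $name)"     # deliberately wrong: split in two
quoted="$(count_args "$name")"     # one argument, as intended

# Rule 2: forward arguments with "$@" so each one stays intact.
forward() { count_args "$@"; }
forwarded="$(forward "a b" "c d")"

# Rule 3: keep lists in arrays and expand them as "${array[@]}".
files=("first file.txt" "second file.txt")
from_array="$(count_args "${files[@]}")"

echo "unquoted=$unquoted quoted=$quoted forwarded=$forwarded from_array=$from_array"
```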

Making operations idempotent

An idempotent script produces the same end state whether it runs once or ten times. This matters because real scripts get re-run after partial failures, in retry loops, and during convergence-style provisioning. The pattern is always the same: check the desired state, then act only if needed.

Many standard tools have idempotent flags built in. Reach for them before writing your own checks:

# Creates the directory, succeeds silently if it already exists
mkdir -p /opt/app/config

# Creates an empty file or updates its timestamp; does not fail if the file exists
touch /var/run/app.lock

# Symlink that replaces an existing link rather than failing
ln -sfn /opt/app/releases/current /opt/app/live

For appending configuration, the naive echo "line" >> file is not idempotent: run it twice and you get duplicate lines. Guard the append with a check:

line="export PATH=/opt/app/bin:\$PATH"
file="$HOME/.bashrc"

# Append only if an exact-match line is not already present
grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"

The -q silences output, -x requires a whole-line match, and -F treats the pattern as a fixed string so regex metacharacters in the line are not interpreted. For more complex desired-state logic, check before acting:

# Only create the user if it doesn't exist
if ! id -u appuser >/dev/null 2>&1; then
  useradd --system appuser
fi

# Only download if the artifact is missing
[[ -f "$artifact" ]] || curl -fsSL "$url" -o "$artifact"
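If the guarded append shows up in several places, it can be wrapped in a small function. The sketch below assumes a helper named ensure_line (not a standard command) and exercises it against a temporary file, so nothing in your home directory is touched.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append "line" to "file" unless an identical
# whole line is already present. Creates the file if it is missing.
ensure_line() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

demo_file="$(mktemp)"
line='export PATH=/opt/app/bin:$PATH'

ensure_line "$line" "$demo_file"
ensure_line "$line" "$demo_file"   # second run changes nothing

# Count exact whole-line matches to confirm there is no duplicate.
matches="$(grep -cxF -- "$line" "$demo_file")"
echo "occurrences after two runs: $matches"

rm -f "$demo_file"
```

The -- before the pattern keeps grep from treating a line that begins with a dash as an option.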

Exit codes

Exit codes are how scripts communicate success and failure to whatever called them. By convention 0 means success and any non-zero value (1-255) means failure. Always exit non-zero on failure so callers, CI pipelines, and && chains can detect it.

if ! validate_config; then
  echo "config validation failed" >&2
  exit 1
fi

A few details worth knowing:
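Some of these details are easy to check directly in a shell. The following is a sketch of four of them rather than an exhaustive list: $? is overwritten by every command, statuses wrap modulo 256, 127 means the command was not found, and a process killed by a signal is reported as 128 plus the signal number. The bogus command name is deliberate.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Capture $? right away; the very next command overwrites it.
saved=0
( exit 3 ) || saved=$?

# Statuses are 8 bits wide: 256 wraps around to 0 and looks like success.
wrapped=0
( exit 256 ) || wrapped=$?

# 127: the shell could not find the command.
not_found=0
definitely-not-a-real-command-xyz 2>/dev/null || not_found=$?

# 128 + N: the child was killed by signal N (TERM is 15).
by_signal=0
bash -c 'kill -TERM $$' || by_signal=$?

echo "saved=$saved wrapped=$wrapped not_found=$not_found by_signal=$by_signal"
```

The wrap-around is the reason to keep custom exit codes well inside the 1-255 range.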

Pitfalls of pipelines and command substitution

Even with pipefail, pipelines have pitfalls. The most subtle is that each stage of a pipeline runs in a subshell, so variables assigned inside a while loop that reads from a pipe are lost when the pipeline finishes:

# BROKEN: count is modified in a subshell and lost
count=0
find . -name '*.log' | while read -r f; do
  count=$((count + 1))
done
echo "$count"   # prints 0

# FIX: avoid the pipe with process substitution
count=0
while read -r f; do
  count=$((count + 1))
done < <(find . -name '*.log')
echo "$count"   # correct

When reading lines, use while IFS= read -r line. The IFS= prevents leading and trailing whitespace from being stripped, and -r stops backslashes from being interpreted as escapes. For filenames specifically, drive the loop from find -print0 with read -d '' to survive newlines in names.
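A sketch of the NUL-delimited loop is shown below. It builds a throwaway directory with awkward filenames (the names are invented for the demo) and counts them; -print0 is available in GNU and BSD find.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory with names that break line-based loops.
demo_dir="$(mktemp -d)"
touch "$demo_dir/plain.log" \
      "$demo_dir/with space.log" \
      "$demo_dir/with"$'\n'"newline.log"

count=0
# -print0 ends each name with a NUL byte; read -d '' splits on NUL,
# so spaces and newlines inside a name are preserved.
while IFS= read -r -d '' f; do
  count=$((count + 1))
done < <(find "$demo_dir" -name '*.log' -print0)

echo "$count"   # prints 3

rm -rf "$demo_dir"
```

A newline-delimited loop over the same directory would see four lines instead of three files.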

Command substitution has its own gotchas. $(...) strips all trailing newlines, which is usually convenient but bites you when those newlines are significant. More importantly, an unquoted command substitution used as a command argument is word-split and glob-expanded like any other expansion. The right-hand side of a plain assignment is the exception (it is never split), but quoting it anyway is a harmless habit:

# Unquoted argument: output is split on whitespace and globbed
wc -l $(ls)             # fragile, and don't parse ls anyway

# Quote it to preserve the result as a single value
config="$(cat "$file")"

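Also remember that under set -e, a failing command inside a command substitution does not always abort the script. A plain assignment such as var=$(cmd) takes the exit status of cmd, so errexit fires. But in local var=$(cmd), export var=$(cmd), or echo "$(cmd)", the status that counts is that of local, export, or echo, which succeeds and masks the failure. If the inner command matters, declare the variable first and assign it on a separate line, or capture the output and test the exit status explicitly.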
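The case that shows this most reliably is an assignment combined with local. In the sketch below each variant runs in a child bash -c so that an abort does not stop the demo; the word "reached" is printed only if errexit did not fire.

```bash
#!/usr/bin/env bash

# local + assignment on one line: the exit status of "local" (0)
# hides the failure of the command substitution.
masked="$(bash -c 'set -e
  f() { local v="$(false)"; echo reached; }
  f' || true)"

# Declare first, assign second: the plain assignment carries the
# failure, so errexit stops the child before the echo.
split="$(bash -c 'set -e
  f() { local v; v="$(false)"; echo reached; }
  f' || true)"

echo "one line:  '${masked}'"
echo "two lines: '${split}'"
```

ShellCheck reports the one-line form as SC2155 for exactly this reason.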

A consolidated template

#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

main() {
  local target="${1:?usage: deploy.sh <target>}"
  mkdir -p "$workdir/build"
  # ... idempotent, well-quoted work here ...
  echo "deployed to $target"
}

main "$@"

Takeaway

Robust shell scripting comes down to a short checklist: start with set -euo pipefail and understand its edge cases, register a trap ... EXIT for cleanup, quote every expansion, make each operation check-before-act or use idempotent flags like mkdir -p and grep -qxF, return meaningful exit codes, and watch for subshell and word-splitting surprises in pipelines and command substitution. Run ShellCheck on everything. None of these techniques is advanced, but applied consistently they are the difference between a script that works once on your machine and one you can rerun anywhere without fear.
