Caching Strategies Every Backend Developer Should Know
Caching is one of the highest-leverage tools a backend engineer has: it can cut latency from hundreds of milliseconds to single-digit milliseconds and shield a database from traffic it could never survive. But a cache is a second copy of your data, and a second copy is a second source of truth that can drift, go stale, or fall over in surprising ways. This article walks through the patterns and failure modes worth understanding before you reach for one.
The four core patterns
Caching patterns differ mainly in who talks to the cache and the database, and when writes propagate. Knowing the difference is what separates a cache that helps from one that quietly serves wrong data.
Cache-aside (lazy loading)
The application owns the logic. On a read, it checks the cache first; on a miss, it loads from the database, populates the cache, and returns. This is the most common pattern because it is simple and the cache only ever holds data that was actually requested.
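# cache and db are assumed client objects provided by the application
def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # hit: skip the database entirely
    # miss: load from the database, then populate the cache
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    cache.set(key, user, ttl=300)  # populate with a TTL
    return user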
The tradeoff: the cache and database are only loosely coupled, so on a write you must explicitly invalidate or update the key, or readers will see stale data until the TTL expires. Every miss also pays the full database cost plus a cache write, so a cold or freshly invalidated key is slower than having no cache at all.
Read-through
Functionally similar to cache-aside, but the cache library or a provider sits inline and loads from the backing store itself on a miss. Your code just calls cache.get(key) and the loader runs transparently. This centralizes the load logic (one place to get the TTL and serialization right) at the cost of needing a cache that supports it.
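To make the difference from cache-aside concrete, here is a minimal in-process sketch of the read-through idea. The ReadThroughCache class, its loader argument, and load_user are hypothetical names invented for this illustration; real read-through providers expose their own APIs.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """In-process read-through cache: callers only ever call get()."""

    def __init__(self, loader: Callable[[str], Any], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader    # hypothetical function that reads the backing store
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]            # hit: still fresh
        value = self._loader(key)      # miss or expired: the cache loads it itself
        self._entries[key] = (now + self._ttl, value)
        return value


# Usage: the loader stands in for a database query.
calls = []

def load_user(key: str) -> dict:
    calls.append(key)
    return {"id": key.split(":")[1], "name": "example"}

users = ReadThroughCache(load_user, ttl=300)
first = users.get("user:42")    # runs the loader
second = users.get("user:42")   # served from the cache
```

The calling code never touches the database directly, which is the point: the TTL and load logic live in one place.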
Write-through
On every write, the application writes to the cache and the database synchronously, as one logical operation. As long as both writes succeed, the cache stays consistent with the database for the keys it holds, so reads of those keys do not see stale data. The cost is added write latency, and you cache data that may never be read.
def update_user(user_id, data):
    # data is assumed to be the full user record, not a partial update
    db.update("users", user_id, data)            # write the store first
    cache.set(f"user:{user_id}", data, ttl=300)  # then the cache
Write-back (write-behind)
The application writes only to the cache and returns immediately; the cache flushes to the database asynchronously, often batching writes. This gives the lowest write latency and absorbs bursts well, which is why it suits high-write workloads like counters and metrics. The danger is durability: if the cache node dies before a flush, those writes are gone. Use it only where you can tolerate or recover lost writes, or where the cache itself is durable.
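A rough sketch of the write-back flow, assuming a single-threaded process and a hypothetical flush_batch function that performs the bulk write. WriteBackCache is an illustrative name, and a production version would need locking and a scheduled flush.

```python
from typing import Any, Callable, Dict


class WriteBackCache:
    """Buffers writes in memory and flushes them to the store in batches."""

    def __init__(self, flush_batch: Callable[[Dict[str, Any]], None]):
        self._flush_batch = flush_batch   # hypothetical bulk-write function
        self._data: Dict[str, Any] = {}
        self._dirty: Dict[str, Any] = {}  # written but not yet persisted

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._dirty[key] = value          # later writes to a key overwrite earlier ones

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def flush(self) -> int:
        """Persist pending writes; call from a timer or background worker."""
        if not self._dirty:
            return 0
        batch, self._dirty = self._dirty, {}
        try:
            self._flush_batch(batch)
        except Exception:
            # Put the batch back without clobbering writes made in the meantime.
            for key, value in batch.items():
                self._dirty.setdefault(key, value)
            raise
        return len(batch)


store: Dict[str, Any] = {}           # stands in for the database
cache = WriteBackCache(store.update)
cache.set("counter:page_views", 1)
cache.set("counter:page_views", 2)   # coalesced with the previous write
pending_before = dict(store)         # nothing persisted yet
flushed = cache.flush()
```

Anything sitting in the dirty buffer when the process dies is lost, which is exactly the durability risk described above.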
TTL and eviction
A cache has finite memory, so it must decide what to keep. Two mechanisms do this: TTL (time-to-live) expires entries after a fixed duration regardless of access, and eviction removes entries when memory fills up.
- LRU (Least Recently Used) evicts the entry untouched for the longest time. It is the sensible default for most workloads because recency usually predicts reuse.
- LFU (Least Frequently Used) evicts the entry with the fewest accesses. It protects genuinely hot keys from being flushed by a one-off scan, but adapts slowly when access patterns shift.
- Random and FIFO policies exist too. Redis, for example, offers allkeys-lru, volatile-lru, allkeys-lfu, and others, where the "volatile" variants only consider keys that have a TTL set.
TTL is also your safety net for stale data: even if an invalidation is missed, a short TTL bounds how long the wrong value can live. Pick TTLs by how tolerant the data is of staleness — seconds for a fast-changing feed, hours for a rarely-changing config blob.
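The interaction between TTL expiry and LRU eviction can be sketched with the standard library's OrderedDict. LRUCache and its injectable clock are assumptions made for this example, not how any particular cache server implements eviction.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class LRUCache:
    def __init__(self, capacity: int, clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity
        self._clock = clock
        self._entries: OrderedDict = OrderedDict()   # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self._clock():
            del self._entries[key]        # TTL expiry, regardless of recency
            return None
        self._entries.move_to_end(key)    # mark as most recently used
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict the least recently used


now = [0.0]                         # fake clock so the example needs no sleeping
lru = LRUCache(capacity=2, clock=lambda: now[0])
lru.set("a", 1, ttl=60)
lru.set("b", 2, ttl=10)
lru.get("a")                        # "a" is now more recent than "b"
lru.set("c", 3, ttl=60)             # over capacity: "b" is evicted
evicted = lru.get("b")
kept = lru.get("c")
now[0] = 61.0                       # advance the fake clock past every TTL
expired = lru.get("a")
```

Here "b" disappears because memory ran out, while "a" disappears later because its TTL elapsed, even though it was recently used.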
Cache invalidation
Knowing when a cached value is no longer valid is famously hard. The practical approaches:
- TTL-based expiry: simplest; accept bounded staleness and let entries expire.
- Explicit invalidation: on a write, delete or update the affected keys. Reliable when one service owns the data, fragile when many writers exist.
- Versioned / key-based invalidation: embed a version in the key (user:42:v7) so a bump instantly makes old entries unreachable and they age out naturally. No delete coordination required.
Prefer deleting a key over updating it on writes. Deletion lets the next read repopulate from the source of truth, which avoids a class of races where two concurrent writers leave the cache holding a value that matches neither final database state.
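Versioned keys can be sketched with plain dictionaries standing in for the cache and the database. The key layout and helper names below are illustrative assumptions; a shared cache would use an atomic increment for the version and rely on TTLs to age out old entries.

```python
from typing import Any, Dict

cache: Dict[str, Any] = {}      # stands in for a shared cache
database: Dict[int, dict] = {42: {"name": "Ada"}}


def user_key(user_id: int) -> str:
    version = cache.get(f"user:{user_id}:version", 0)
    return f"user:{user_id}:v{version}"


def get_user(user_id: int) -> dict:
    key = user_key(user_id)
    if key not in cache:
        cache[key] = dict(database[user_id])   # repopulate from the source of truth
    return cache[key]


def update_user(user_id: int, fields: dict) -> None:
    database[user_id].update(fields)
    # Bump the version instead of deleting: old entries become unreachable.
    version_key = f"user:{user_id}:version"
    cache[version_key] = cache.get(version_key, 0) + 1


before = get_user(42)
update_user(42, {"name": "Grace"})
after = get_user(42)
```

The old user:42:v0 entry is still in the cache after the update, but nothing reads it any more.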
The thundering herd / cache stampede
When a popular key expires, every concurrent request misses at once and they all hit the database simultaneously to recompute the same value. Under load this spike can overwhelm the database and cascade into an outage — precisely when you needed the cache most. Several mitigations, often combined:
- Locking / request coalescing — the first miss acquires a lock and recomputes; others wait briefly and read the freshly cached value instead of also hitting the database. This is sometimes called single-flight or a "mutex" on the key.
- Early/probabilistic recomputation — refresh a key slightly before it expires, with a probability that rises as expiry approaches, so one request recomputes while the rest still serve the cached value.
- Stale-while-revalidate — serve the expired value to keep latency flat while a single background task refreshes it.
- Jittered TTLs — add randomness to expiry times so a batch of keys populated together does not all expire in the same instant.
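# Single-flight on a hot key
import time

def get_or_compute(key):
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = f"lock:{key}"
    if cache.set_nx(lock_key, "1", ttl=10):  # only one winner
        try:
            value = expensive_compute()
            cache.set(key, value, ttl=300)
        finally:
            cache.delete(lock_key)  # release even if the compute fails
        return value
    time.sleep(0.05)  # let the winner populate
    value = cache.get(key)
    # fall back to computing if the winner has not finished yet
    return value if value is not None else expensive_compute()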
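Jittered TTLs and probabilistic early recomputation are small enough to sketch as two helper functions. The names, the spread and beta parameters, and the exponential head start are assumptions chosen for this illustration, in the spirit of probabilistic early expiration rather than a reference implementation.

```python
import math
import random


def jittered_ttl(base_ttl: float, spread: float = 0.1, rng=random) -> float:
    """Return base_ttl varied by up to +/- spread (a fraction of base_ttl)."""
    return base_ttl * (1 + rng.uniform(-spread, spread))


def should_refresh_early(now: float, expires_at: float, compute_cost: float,
                         beta: float = 1.0, rng=random) -> bool:
    """Decide whether this request should recompute before expiry.

    The chance of returning True rises as now approaches expires_at, and a
    larger compute_cost (seconds the recompute takes) starts refreshes sooner.
    """
    # 1 - random() lies in (0, 1], so the log is defined and <= 0.
    head_start = -compute_cost * beta * math.log(1.0 - rng.random())
    return now + head_start >= expires_at


rng = random.Random(0)   # seeded only to make the demonstration repeatable
ttls = [jittered_ttl(300, spread=0.1, rng=rng) for _ in range(1000)]
far = sum(should_refresh_early(0, 300, 1.0, rng=rng) for _ in range(1000))
near = sum(should_refresh_early(299.5, 300, 1.0, rng=rng) for _ in range(1000))
```

With a one-second recompute cost, requests arriving minutes before expiry essentially never refresh, while a sizeable share of those arriving in the last half second do.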
A related failure is the cache miss storm for keys that do not exist at all (cache penetration): repeated lookups for missing IDs bypass the cache every time. Cache a short-lived negative result (a "not found" marker), or front the cache with a Bloom filter to reject impossible keys cheaply.
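Negative caching can be sketched as follows, with dictionaries standing in for the cache and database. The _MISSING sentinel is an in-process convenience assumed for this example; a shared cache would need a reserved marker value with a short TTL instead.

```python
from typing import Dict, Optional

_MISSING = object()          # sentinel meaning "known not to exist"
cache: Dict[str, object] = {}
database = {1: {"name": "Ada"}}
db_lookups = []              # records which IDs actually reached the database


def get_user(user_id: int) -> Optional[dict]:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is _MISSING:
        return None                      # negative hit: skip the database
    if cached is not None:
        return cached
    db_lookups.append(user_id)
    user = database.get(user_id)
    # A real cache would give the marker a short TTL (seconds, not minutes).
    cache[key] = user if user is not None else _MISSING
    return user


first = get_user(999)    # reaches the database, caches the "not found" marker
second = get_user(999)   # answered from the marker
```

Keep the marker's lifetime short so that a record created later becomes visible quickly.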
Cache layers
Caching is rarely a single tier. A request can be served — and should be served as early as possible — from several layers, each cheaper and faster than the next:
- Client / browser: HTTP caching via Cache-Control, ETag, and conditional requests. The cheapest hit is the one that never leaves the device.
- CDN / edge: caches static assets and cacheable responses geographically close to users. Great for read-heavy, shareable content; controlled with the same HTTP headers plus per-CDN rules.
- Application cache: an in-process cache (a local map) or a shared store like Redis or Memcached. In-process is fastest but each node has its own copy and its own consistency problem; a shared store trades a network hop for a single coherent view.
- Database cache: query/result caches and the database's own buffer pool that keeps hot pages in memory. Largely automatic, but worth understanding so you do not duplicate effort.
The general rule: cache as close to the user as the data's freshness requirements allow. Highly dynamic, per-user data stays in the application tier; static, shared data belongs at the edge.
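For the client and CDN layers, the mechanics come down to a couple of response headers and a conditional check. The sketch below is framework-independent; respond and make_etag are hypothetical helpers, and the comparison is simplified (a real If-None-Match header may carry several tags, weak validators, or "*").

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # An ETag derived from the content; HTTP requires the surrounding quotes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str],
            max_age: int = 60) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a cacheable GET."""
    etag = make_etag(body)
    headers = {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}
    if if_none_match == etag:
        return 304, headers, b""      # client copy is still valid: send no body
    return 200, headers, body


status1, headers1, _ = respond(b"hello", None)                  # first visit
status2, _, body2 = respond(b"hello", headers1["ETag"])         # revalidation
status3, _, _ = respond(b"changed", headers1["ETag"])           # content changed
```

Within max-age the client does not ask at all; after that, a matching ETag turns a full response into an empty 304.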
Consistency tradeoffs
Every cache is a deliberate trade of consistency for speed. Be explicit about which you are choosing:
- Strong-ish consistency — write-through plus invalidate-on-write keeps cached keys aligned with the database, at the cost of write latency and complexity.
- Eventual consistency — cache-aside with TTLs accepts a bounded window of staleness in exchange for simplicity and throughput. This is the right call far more often than people admit.
The hardest problems are concurrent reads and writes racing through the cache, and multi-node in-process caches that disagree. Mitigate with key-based versioning, short TTLs as a backstop, deleting rather than updating on writes, and a shared cache when nodes must agree. Decide what staleness window your product can tolerate before you design the cache, not after a bug report.
Practical takeaway
Start with cache-aside, a sensible LRU eviction policy, and short jittered TTLs — it covers the majority of real workloads with the least complexity. Invalidate by deleting keys on write, add single-flight or stale-while-revalidate the moment a key gets hot, and only reach for write-through or write-back when measurements (not guesses) tell you the default is not enough. Above all, name the staleness window you are willing to accept; a cache without that decision made on purpose is just a bug waiting for traffic.