Caching Strategies Every Backend Developer Should Know

Caching is one of the highest-leverage tools a backend engineer has: it can cut latency from hundreds of milliseconds to single digits and shield a database from traffic it could never survive. But a cache is a second copy of your data, and a second copy is a second source of truth that can drift, go stale, or fall over in surprising ways. This article walks through the patterns and failure modes worth understanding before you reach for one.

The four core patterns

Caching patterns differ mainly in who talks to the cache and the database, and when writes propagate. Knowing the difference is what separates a cache that helps from one that quietly serves wrong data.

Cache-aside (lazy loading)

The application owns the logic. On a read, it checks the cache first; on a miss, it loads from the database, populates the cache, and returns. This is the most common pattern because it is simple and the cache only ever holds data that was actually requested.

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    cache.set(key, user, ttl=300)   # populate with a TTL
    return user

The tradeoff: the cache and database are only loosely coupled, so on a write you must explicitly invalidate or update the key, or readers will see stale data until the TTL expires. The first request after a miss also pays the full database cost.

Read-through

Functionally similar to cache-aside, but the cache library or a provider sits inline and loads from the backing store itself on a miss. Your code just calls cache.get(key) and the loader runs transparently. This centralizes the load logic (one place to get the TTL and serialization right) at the cost of needing a cache that supports it.
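
To make the difference from cache-aside concrete, here is a minimal in-memory sketch of a read-through cache. The class name ReadThroughCache, the loader callback, and the injectable clock are all assumptions made for illustration, not the API of any particular library; it is also not thread-safe.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Tiny in-memory read-through cache (illustrative, not thread-safe)."""

    def __init__(self, loader: Callable[[str], Any], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # called on a miss to fetch from the backing store
        self._ttl = ttl
        self._clock = clock        # injectable so expiry can be exercised in tests
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]        # fresh hit
        value = self._loader(key)  # miss or expired: the cache loads the value itself
        self._entries[key] = (self._clock() + self._ttl, value)
        return value
```

Callers only ever see get(key); the TTL and the load path live in one place, which is the main appeal of the pattern.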

Write-through

On every write, the application writes to the cache and the database synchronously, as one logical operation. As long as both writes succeed, the cache stays consistent with the database for the keys it holds, so reads of cached keys do not see stale data. The cost is added write latency, and you cache data that may never be read.

def update_user(user_id, data):
    # data is assumed to be the full user record, not a partial patch
    db.update("users", user_id, data)            # write the store first
    cache.set(f"user:{user_id}", data, ttl=300)  # then the cache, in the same request

Write-back (write-behind)

The application writes only to the cache and returns immediately; the cache flushes to the database asynchronously, often batching writes. This gives the lowest write latency and absorbs bursts well, which is why it suits high-write workloads like counters and metrics. The danger is durability: if the cache node dies before a flush, those writes are gone. Use it only where you can tolerate or recover lost writes, or where the cache itself is durable.
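
A rough sketch of the write-behind idea follows, assuming a hypothetical flush_batch callback that persists a dict of pending writes in bulk. The class and method names are illustrative; a real deployment would call flush() from a timer or background worker and would need locking.

```python
from typing import Any, Callable, Dict


class WriteBackCache:
    """Write-behind sketch: writes land in memory and are flushed in batches."""

    def __init__(self, flush_batch: Callable[[Dict[str, Any]], None]) -> None:
        self._flush_batch = flush_batch    # hypothetical bulk writer to the database
        self._data: Dict[str, Any] = {}
        self._dirty: Dict[str, Any] = {}   # keys written since the last flush

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._dirty[key] = value           # repeated writes to one key coalesce

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def flush(self) -> int:
        """Persist pending writes and return how many keys were written."""
        if not self._dirty:
            return 0
        batch, self._dirty = self._dirty, {}
        try:
            self._flush_batch(batch)
        except Exception:
            # Put the batch back so a later flush can retry; newer writes win.
            batch.update(self._dirty)
            self._dirty = batch
            raise
        return len(batch)
```

Anything still sitting in the dirty set when the process dies is lost, which is exactly the durability risk described above.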

TTL and eviction

A cache has finite memory, so it must decide what to keep. Two mechanisms do this: TTL (time-to-live) expires entries after a fixed duration regardless of access, and eviction removes entries when memory fills up.

TTL is also your safety net for stale data: even if an invalidation is missed, a short TTL bounds how long the wrong value can live. Pick TTLs by how tolerant the data is of staleness — seconds for a fast-changing feed, hours for a rarely-changing config blob.
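
The two mechanisms can be seen side by side in a small in-process sketch. It assumes LRU as the eviction policy and uses illustrative names (TTLLRUCache, max_entries, an injectable clock); production stores such as Redis or Memcached implement their own variants of both.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLLRUCache:
    """Bounded in-memory cache with per-entry TTL and LRU eviction."""

    def __init__(self, max_entries: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl
        self._clock = clock
        self._entries = OrderedDict()   # key -> (expires_at, value), oldest first

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self._clock():
            del self._entries[key]          # TTL expiry, regardless of access
            return None
        self._entries.move_to_end(key)      # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)   # evict the least recently used
```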

Cache invalidation

Knowing when a cached value is no longer valid is famously hard. One practical rule goes a long way:

Prefer deleting a key over updating it on writes. Deletion lets the next read repopulate from the source of truth, which avoids a class of races where two concurrent writers leave the cache holding a value that matches neither final database state.
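
As a sketch of delete-on-write, the example below uses plain dicts as stand-ins for the database and the cache so it runs on its own. The function signatures are assumptions for illustration; with a real client the pop call would be a delete on the cache key.

```python
from typing import Any, Dict


def update_user(db: Dict[int, Dict[str, Any]], cache: Dict[str, Any],
                user_id: int, changes: Dict[str, Any]) -> None:
    """Write to the source of truth, then drop the cached copy."""
    db[user_id] = {**db.get(user_id, {}), **changes}
    cache.pop(f"user:{user_id}", None)   # delete rather than overwrite


def get_user(db: Dict[int, Dict[str, Any]], cache: Dict[str, Any],
             user_id: int) -> Dict[str, Any]:
    """Cache-aside read: the next read after a delete repopulates from db."""
    key = f"user:{user_id}"
    if key not in cache:
        cache[key] = db[user_id]
    return cache[key]
```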

The thundering herd / cache stampede

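When a popular key expires, every concurrent request misses at once and they all hit the database simultaneously to recompute the same value. Under load this spike can overwhelm the database and cascade into an outage, precisely when you needed the cache most. Several mitigations help and are often combined: single-flight locking so that only one request recomputes, jittered TTLs so that keys do not all expire together, and stale-while-revalidate. Single-flight looks like this: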

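import time

# Single-flight on a hot key
def get_or_compute(key):
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = f"lock:{key}"
    if cache.set_nx(lock_key, "1", ttl=10):   # only one winner takes the lock
        try:
            value = expensive_compute()
            cache.set(key, value, ttl=300)
        finally:
            cache.delete(lock_key)            # release even if the compute fails
        return value
    time.sleep(0.05)                          # let the winner populate
    value = cache.get(key)
    # Compute locally if the winner has not finished; falsy values still count as hits
    return value if value is not None else expensive_compute()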
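
Jittering TTLs is a complementary mitigation: if keys written at the same moment get slightly different lifetimes, they do not all expire in the same instant. A minimal sketch, where the function name and the 10% default spread are assumptions rather than recommended values:

```python
import random


def jittered_ttl(base_ttl: float, spread: float = 0.1, rng=random) -> float:
    """Scale base_ttl by a random factor in [1 - spread, 1 + spread]."""
    return base_ttl * (1 + rng.uniform(-spread, spread))
```

The result would be passed wherever a fixed TTL is used today, for example as the ttl argument of a cache set call.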

A related failure is the cache miss storm for keys that do not exist at all (cache penetration): repeated lookups for missing IDs bypass the cache every time. Cache a short-lived negative result (a "not found" marker), or front the cache with a Bloom filter to reject impossible keys cheaply.
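
Negative caching can be sketched as follows. The sentinel, the TTL constants, and the lookup signature are assumptions for illustration; with a networked cache the marker would be a reserved string or byte value rather than a Python object.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

NOT_FOUND = object()    # in-process sentinel meaning "known not to exist"
NEGATIVE_TTL = 30.0     # assumed: keep "not found" results only briefly
POSITIVE_TTL = 300.0    # assumed: normal lifetime for real values


def lookup(cache: Dict[str, Tuple[float, Any]], key: str,
           load: Callable[[str], Optional[Any]],
           now: Optional[float] = None) -> Optional[Any]:
    """Return the value for key, caching misses as well as hits."""
    now = time.monotonic() if now is None else now
    entry = cache.get(key)
    if entry is not None and entry[0] > now:
        return None if entry[1] is NOT_FOUND else entry[1]
    value = load(key)   # expected to return None when the row does not exist
    if value is None:
        cache[key] = (now + NEGATIVE_TTL, NOT_FOUND)
    else:
        cache[key] = (now + POSITIVE_TTL, value)
    return value
```

The short negative TTL matters: if the missing ID is created later, the "not found" marker should age out quickly or be deleted on insert.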

Cache layers

Caching is rarely a single tier. A request can be served — and should be served as early as possible — from several layers, each cheaper and faster than the next:

  1. Client / browser — HTTP caching via Cache-Control, ETag, and conditional requests. The cheapest hit is the one that never leaves the device.
  2. CDN / edge — caches static assets and cacheable responses geographically close to users. Great for read-heavy, shareable content; controlled with the same HTTP headers plus per-CDN rules.
  3. Application cache — an in-process cache (a local map) or a shared store like Redis or Memcached. In-process is fastest but each node has its own copy and its own consistency problem; a shared store trades a network hop for a single coherent view.
  4. Database cache — query/result caches and the database's own buffer pool that keeps hot pages in memory. Largely automatic, but worth understanding so you do not duplicate effort.

The general rule: cache as close to the user as the data's freshness requirements allow. Highly dynamic, per-user data stays in the application tier; static, shared data belongs at the edge.
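
Within the application tier, the layering idea can be sketched as a lookup that tries the in-process cache, then the shared store, then the database. Plain dicts stand in for both caches here, and tiered_get and load_from_db are hypothetical names; TTLs are omitted to keep the sketch short.

```python
from typing import Any, Callable, Dict


def tiered_get(key: str, local: Dict[str, Any], shared: Dict[str, Any],
               load_from_db: Callable[[str], Any]) -> Any:
    """Check the cheapest tier first and backfill faster tiers on the way out."""
    if key in local:
        return local[key]              # in-process hit: no network hop
    if key in shared:
        local[key] = shared[key]       # backfill the local tier
        return local[key]
    value = load_from_db(key)          # miss everywhere: go to the source
    shared[key] = value
    local[key] = value
    return value
```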

Consistency tradeoffs

Every cache is a deliberate trade of consistency for speed. Be explicit about which you are choosing.

The hardest problems are concurrent reads and writes racing through the cache, and multi-node in-process caches that disagree. Mitigate with key-based versioning, short TTLs as a backstop, deleting rather than updating on writes, and a shared cache when nodes must agree. Decide what staleness window your product can tolerate before you design the cache, not after a bug report.
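
Key-based versioning can be sketched in a few lines: the version number is part of the cache key, so bumping it on write makes every old entry unreachable without deleting anything. The helper names and the dict holding versions are assumptions; in practice the version might be a row's updated_at value or a counter in the shared cache.

```python
from typing import Dict


def versioned_key(versions: Dict[str, int], entity: str) -> str:
    """Build a cache key that embeds the entity's current version."""
    return f"{entity}:v{versions.get(entity, 0)}"


def bump_version(versions: Dict[str, int], entity: str) -> None:
    """Call on every write; readers then compute a key no stale entry matches."""
    versions[entity] = versions.get(entity, 0) + 1
```

Old entries are never read again and simply age out through TTL or eviction.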

Practical takeaway

Start with cache-aside, a sensible LRU eviction policy, and short jittered TTLs — it covers the majority of real workloads with the least complexity. Invalidate by deleting keys on write, add single-flight or stale-while-revalidate the moment a key gets hot, and only reach for write-through or write-back when measurements (not guesses) tell you the default is not enough. Above all, name the staleness window you are willing to accept; a cache without that decision made on purpose is just a bug waiting for traffic.
