What is Caching?
Caching is the practice of keeping a copy of data — or the result of an expensive computation — in fast-to-reach storage, so the next time someone asks for the same thing you hand it over instantly instead of doing the slow work again. It is one of the most effective ways to make software faster, and one of the easiest to get subtly wrong.
The fundamental trade-off
Caching exploits a simple economic fact: some data is requested far more often than it changes. If a product page is viewed ten thousand times an hour but updated once a day, recomputing it on every view is an enormous waste. The cache stores the computed answer and serves it cheaply. The catch is that you are now keeping a copy, and copies go stale. Every caching decision is really a trade between speed (serve the copy) and freshness (the copy might be wrong). Get the balance right and the system flies; get it wrong and either users see outdated data, or entries expire so aggressively that the cache misses most of the time and the benefit disappears.
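To put rough numbers on the product page example, here is a back-of-the-envelope sketch. The view and update rates come from the paragraph above; the 200 ms render cost and 2 ms cache read cost are made-up figures chosen only for illustration, and the model assumes the cached page is invalidated exactly when it is updated.

```python
# Back-of-the-envelope cost of rendering on every view vs. serving from a cache.
VIEWS_PER_HOUR = 10_000   # from the product page example
UPDATES_PER_DAY = 1       # from the product page example
RENDER_MS = 200           # assumed cost of computing the page (hypothetical)
CACHE_READ_MS = 2         # assumed cost of a cache hit (hypothetical)

views_per_day = VIEWS_PER_HOUR * 24

# Without a cache, every view pays the full render cost.
uncached_ms = views_per_day * RENDER_MS

# With a cache invalidated on update, only the first view after each update
# renders the page; every other view is a cheap cache read.
renders = UPDATES_PER_DAY
cached_ms = renders * RENDER_MS + (views_per_day - renders) * CACHE_READ_MS

hit_rate = (views_per_day - renders) / views_per_day
work_saved = 1 - cached_ms / uncached_ms

print(f"views per day: {views_per_day}")
print(f"hit rate:      {hit_rate:.6f}")
print(f"work saved:    {work_saved:.1%}")
```

The exact savings depend entirely on the assumed costs, but the shape of the result holds whenever reads vastly outnumber writes.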
Caching happens at every layer
Caches are everywhere in a modern stack. The CPU has hardware caches. The browser caches assets so a repeat visit loads instantly. A CDN caches content in data centers near users around the world, cutting network latency. The application caches query results and rendered fragments in memory or in a dedicated store like Redis. The database has its own buffer cache. Each layer shaves time off a different part of the journey, and a fast system usually has several working together rather than relying on one.
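The browser and CDN layers are usually steered through the HTTP Cache-Control response header. The sketch below shows one way an application might pick that header per response type. The three categories and the durations are assumptions for illustration, not recommendations; the directives themselves (public, max-age, s-maxage, immutable, no-store) are standard HTTP.

```python
def cache_headers(kind: str) -> dict:
    """Return response headers for a hypothetical response category."""
    if kind == "fingerprinted_asset":
        # The file name changes whenever the content changes, so browsers
        # and CDNs may keep this copy for a year without revalidating.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if kind == "public_page":
        # Browsers keep it for 60 seconds; shared caches such as a CDN
        # may keep it for 10 minutes (s-maxage applies to shared caches only).
        return {"Cache-Control": "public, max-age=60, s-maxage=600"}
    if kind == "per_user":
        # Personal data: tell every cache along the way not to store it.
        return {"Cache-Control": "no-store"}
    raise ValueError(f"unknown response kind: {kind!r}")


for kind in ("fingerprinted_asset", "public_page", "per_user"):
    print(kind, "->", cache_headers(kind)["Cache-Control"])
```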
Caching patterns
How data gets into the cache matters. Cache-aside (lazy loading) is the most common: the application checks the cache, and on a miss it reads the source, stores the result, and returns it. Read-through puts the cache in front of the source, so the cache itself fetches missing entries and the application only ever talks to the cache. Write-through writes to the cache and the source together, keeping them consistent at the cost of slower writes. Write-behind buffers writes in the cache and flushes them to the source later, which is fast but risks losing any writes still buffered if the cache fails before the flush. Choosing the pattern is about how much staleness and write latency the use case can tolerate.
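As a minimal sketch of the first and third patterns, the class below implements cache-aside reads and write-through writes over an in-process dictionary. Everything here is hypothetical: the class name, the `load` and `store` callables standing in for the real data source, and the 60-second TTL. A production system would more likely use a dedicated store and handle errors and concurrency.

```python
import time


class CacheAside:
    """In-process cache with cache-aside reads and write-through writes."""

    def __init__(self, load, store, ttl_seconds=60.0):
        self._load = load      # hypothetical: reads a value from the source
        self._store = store    # hypothetical: writes a value to the source
        self._ttl = ttl_seconds
        self._entries = {}     # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]            # hit: serve the copy
        value = self._load(key)        # miss: read the source...
        self._put(key, value)          # ...store the result...
        return value                   # ...and return it

    def write_through(self, key, value):
        self._store(key, value)        # write the source of truth first
        self._put(key, value)          # then refresh the cached copy

    def _put(self, key, value):
        self._entries[key] = (time.monotonic() + self._ttl, value)


# A dictionary stands in for the database in this example.
database = {"sku-1": "Blue mug"}
source_reads = []  # records every trip to the source


def load(key):
    source_reads.append(key)
    return database[key]


def store(key, value):
    database[key] = value


cache = CacheAside(load, store)
cache.get("sku-1")                        # miss: goes to the source
cache.get("sku-1")                        # hit: served from the cache
cache.write_through("sku-1", "Red mug")   # source and cache updated together
print(cache.get("sku-1"), source_reads)
```

Writing the source before the cache means a failed write never leaves the cache holding a value the source does not have.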
The hard part: invalidation and eviction
There is an old joke that the two hardest problems in computer science are cache invalidation and naming things. Invalidation, meaning deciding when a cached copy is no longer valid and removing it, is genuinely difficult, because the answer depends on business rules that are rarely clean. The blunt instrument is a TTL (time to live): expire each item after a set duration. More precise approaches invalidate on the specific event that changed the data. Separately, caches have finite space, so an eviction policy decides what to drop when the cache is full. LRU (least recently used) is the common default, alongside LFU (least frequently used) and FIFO (first in, first out). Both choices directly shape your cache hit rate, the fraction of requests the cache actually serves.
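The pieces named above (TTL expiry, event-driven invalidation, LRU eviction, and hit rate) fit together in a few dozen lines. This is a sketch under assumptions, not a reference implementation: the class and method names are hypothetical, the `clock` parameter exists only to make expiry testable, and a miss is signalled by returning None, so the sketch cannot distinguish a miss from a cached None.

```python
import time
from collections import OrderedDict


class LRUCache:
    """Bounded cache with per-entry TTL, LRU eviction, and hit-rate counters."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value), oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            self._entries.pop(key, None)   # drop the entry if it has expired
            self.misses += 1
            return None
        self._entries.move_to_end(key)     # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used

    def invalidate(self, key):
        """Event-driven invalidation: call this when the underlying data changes."""
        self._entries.pop(key, None)

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LRUCache(capacity=2, ttl_seconds=30)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes the most recently used entry
cache.put("c", 3)    # the cache is full, so "b" is evicted
print(cache.get("b"), cache.hit_rate)
```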
Failure modes worth knowing
Caches introduce their own pathologies. A cache stampede (or thundering herd) happens when a popular item expires and thousands of requests miss at the same moment and all hit the database together. Cache penetration is when requests for data that does not exist repeatedly bypass the cache and hit the source, because there is nothing to cache. These are solvable: request coalescing and jittered TTLs address stampedes, and negative caching addresses penetration. But they are solvable only if you know to look for them, which is exactly where observability and load testing earn their keep.
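The three mitigations can be sketched together in one small thread-safe cache. The names, the default TTLs, and the convention that the loader returns None for missing data are all assumptions made for this example. The per-key lock table is never pruned, which a real implementation would need to address.

```python
import random
import threading
import time

_MISSING = object()  # sentinel cached for keys the source does not have


class ProtectedCache:
    def __init__(self, load, ttl_seconds=60.0, jitter=0.1, negative_ttl_seconds=5.0):
        self._load = load                  # hypothetical: returns None if absent
        self._ttl = ttl_seconds
        self._jitter = jitter              # fraction of the TTL to randomize
        self._negative_ttl = negative_ttl_seconds
        self._entries = {}                 # key -> (expires_at, value or _MISSING)
        self._key_locks = {}               # key -> lock held while loading that key
        self._guard = threading.Lock()     # protects the lock table

    def _fresh(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is None:
            with self._guard:
                lock = self._key_locks.setdefault(key, threading.Lock())
            with lock:  # request coalescing: one loader per key, the rest wait
                entry = self._fresh(key)   # waiters find the entry already filled
                if entry is None:
                    entry = self._fill(key)
        return None if entry[1] is _MISSING else entry[1]

    def _fill(self, key):
        value = self._load(key)
        if value is None:
            # Negative caching: remember briefly that the key does not exist.
            entry = (time.monotonic() + self._negative_ttl, _MISSING)
        else:
            # Jittered TTL: spread expiry so popular keys do not expire together.
            spread = self._ttl * self._jitter
            ttl = self._ttl + random.uniform(-spread, spread)
            entry = (time.monotonic() + ttl, value)
        self._entries[key] = entry
        return entry


calls = []  # records every trip to the slow source


def slow_load(key):
    calls.append(key)
    time.sleep(0.05)  # simulate an expensive query
    return {"home": "rendered home page"}.get(key)


cache = ProtectedCache(slow_load)
threads = [threading.Thread(target=cache.get, args=("home",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), "load(s) for 20 concurrent requests")
```

The second check inside the lock is what makes coalescing work: threads that waited for the lock find a fresh entry and skip the load entirely.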
At QUANT LAB
Caching is often the highest-leverage performance change we make on the systems we build under SaaS platform development and operate under DevOps engineering. It is frequently the right answer to "the database is slow" before anyone contemplates sharding. But we are deliberate about invalidation, because a cache that serves stale data quietly is worse than no cache at all — and we keep an eye on the security edge cases too, since caching per-user data in a shared layer is a classic way to leak one customer's information to another.
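One common guard against that kind of leak is to build the owner's identity into every cache key for per-user data, so two customers can never resolve to the same entry. The sketch below is illustrative only: the function name and key layout are hypothetical and do not describe any particular system.

```python
def scoped_key(tenant_id: str, user_id: str, resource: str) -> str:
    """Build a cache key that is unique per tenant and per user."""
    if not tenant_id or not user_id:
        # Fail loudly rather than silently fall back to a shared key.
        raise ValueError("refusing to build an unscoped key for per-user data")
    return f"tenant:{tenant_id}:user:{user_id}:{resource}"


shared_cache = {}  # stands in for a shared layer such as Redis
shared_cache[scoped_key("acme", "alice", "dashboard")] = "Alice's dashboard"
shared_cache[scoped_key("acme", "bob", "dashboard")] = "Bob's dashboard"

# Each user reads back only their own entry.
print(shared_cache[scoped_key("acme", "alice", "dashboard")])
```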
Long-form deep-dives that use this term
Adding AI Features to Your SaaS (2026)
Where AI helps, build-vs-API trade-offs, evals, guardrails, and shipping without torching margins.
Building Multi-Tenant SaaS on Postgres RLS
Row-level security patterns for isolating tenant data without separate databases.
Caching Strategies for SaaS (2026)
Cache layers from CDN to Redis, invalidation that works, stampede protection, and what never to cache.