
What is Caching?

Caching is the practice of keeping a copy of data — or the result of an expensive computation — in fast-to-reach storage, so the next time someone asks for the same thing you hand it over instantly instead of doing the slow work again. It is one of the most effective ways to make software faster, and one of the easiest to get subtly wrong.

The fundamental trade-off

Caching exploits a simple economic fact: some data is requested far more often than it changes. If a product page is viewed ten thousand times an hour but updated once a day, recomputing it on every view is an enormous waste. The cache stores the computed answer and serves it cheaply. The catch is that you are now keeping a copy, and copies go stale. Every caching decision is really a trade between speed (serve the copy) and freshness (the copy might be wrong). Get the balance right and the system flies; get it wrong and either users see outdated data or entries are discarded so aggressively that the cache delivers little benefit.

Caching happens at every layer

Caches are everywhere in a modern stack. The CPU has hardware caches. The browser caches assets so a repeat visit loads instantly. A CDN caches content in data centers near users around the world, cutting network latency. The application caches query results and rendered fragments in memory or in a dedicated store like Redis. The database has its own buffer cache. Each layer shaves time off a different part of the journey, and a fast system usually has several working together rather than relying on one.

Caching patterns

How data gets into the cache matters. Cache-aside (lazy loading) is the most common: the application checks the cache, and on a miss it reads the source, stores the result, and returns it. Read-through puts the cache in front of the source, so on a miss the cache itself loads the value and the application only ever talks to the cache. Write-through writes to the cache and the source together, keeping them consistent at the cost of slower writes. Write-behind buffers writes in the cache and flushes them to the source later, which is fast but risks losing buffered writes if the cache fails before the flush. Choosing the pattern is about how much staleness and write latency the use case can tolerate.
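As a rough illustration of two of these patterns, here is a minimal Python sketch of cache-aside reads and write-through writes. The class name `CacheAsideStore`, the plain dictionaries standing in for the cache and the source, and the `source_reads` counter are all assumptions made for the example; a real system would put something like Redis and a database behind the same shape.

```python
from typing import Dict


class CacheAsideStore:
    """Illustrative only: dicts stand in for a real cache and a real database."""

    def __init__(self, source: Dict[str, str]) -> None:
        self.source = source              # hypothetical slow system of record
        self.cache: Dict[str, str] = {}   # hypothetical fast cache
        self.source_reads = 0             # counts how often the slow path ran

    def get(self, key: str) -> str:
        """Cache-aside read: check the cache, fall back to the source on a miss."""
        if key in self.cache:
            return self.cache[key]        # hit: serve the stored copy
        self.source_reads += 1            # miss: do the slow work once
        value = self.source[key]
        self.cache[key] = value           # store the result for next time
        return value

    def write_through(self, key: str, value: str) -> None:
        """Write-through: update source and cache together so reads stay consistent."""
        self.source[key] = value
        self.cache[key] = value


store = CacheAsideStore({"product:1": "rendered page v1"})
store.get("product:1")                    # miss, reads the source
store.get("product:1")                    # hit, served from the cache
store.write_through("product:1", "rendered page v2")
print(store.get("product:1"), store.source_reads)
```

The second read never touches the source, and the write-through keeps the cached copy current without a separate invalidation step, at the price of doing two writes per update.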

The hard part: invalidation and eviction

There is an old joke that the two hardest problems in computer science are cache invalidation and naming things. Invalidation — deciding when a cached copy is no longer valid and removing it — is genuinely difficult, because the answer depends on business rules that are rarely clean. The blunt instrument is a TTL (time to live): expire each item after a set duration. More precise approaches invalidate on the specific event that changed the data. Separately, caches have finite space, so an eviction policy decides what to drop when full — LRU (least recently used) is the common default, alongside LFU and FIFO. Both choices directly shape your cache hit rate, the fraction of requests the cache actually serves.
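To make TTL expiry, LRU eviction, event-driven invalidation, and hit rate concrete, here is a small Python sketch that combines them. Everything named here (`TTLLRUCache`, `max_items`, `ttl_seconds`, the injectable `clock`) is an assumption for illustration, not a reference to any particular library. One simplification to note: a stored value of `None` is indistinguishable from a miss.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TTLLRUCache:
    """Illustrative cache with per-item TTL and least-recently-used eviction."""

    def __init__(self, max_items: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._items: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is not None:
            expires_at, value = entry
            if self._clock() < expires_at:
                self._items.move_to_end(key)   # mark as most recently used
                self.hits += 1
                return value
            del self._items[key]               # TTL passed: drop the stale copy
        self.misses += 1
        return None

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self.ttl_seconds, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)    # evict the least recently used

    def invalidate(self, key: str) -> None:
        """Event-driven invalidation: call this when the underlying data changes."""
        self._items.pop(key, None)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A shorter TTL or a smaller `max_items` both push the hit rate down, which is the trade between freshness, memory, and speed described above.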

Failure modes worth knowing

Caches introduce their own pathologies. A cache stampede (or thundering herd) happens when a popular item expires and thousands of requests all miss at once and slam the database simultaneously. Cache penetration is when requests for data that does not exist repeatedly bypass the cache and hit the source. These are solvable — with request coalescing, jittered TTLs, and negative caching — but only if you know to look for them, which is exactly where observability and load testing earn their keep.
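The three mitigations can be sketched briefly in Python. This is a simplified illustration under stated assumptions: `CoalescingCache`, `jittered_ttl`, and the `load` callback are hypothetical names, entries never expire, and per-key locks are never cleaned up. The jitter helper is shown separately and would be applied wherever a TTL is assigned.

```python
import random
import threading
from typing import Callable, Dict, Optional

_MISSING = object()   # sentinel stored for keys the source says do not exist


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Randomize a TTL by +/- spread so popular items do not all expire together."""
    return base_seconds * (1 + random.uniform(-spread, spread))


class CoalescingCache:
    """Illustrative cache with request coalescing and negative caching."""

    def __init__(self, load: Callable[[str], Optional[str]]) -> None:
        self._load = load                      # hypothetical slow source lookup
        self._cache: Dict[str, object] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()         # protects _key_locks and the counter
        self.source_calls = 0

    def get(self, key: str) -> Optional[str]:
        if key not in self._cache:
            with self._guard:
                lock = self._key_locks.setdefault(key, threading.Lock())
            with lock:                         # concurrent misses queue up here
                if key not in self._cache:     # only the first caller loads
                    with self._guard:
                        self.source_calls += 1
                    value = self._load(key)
                    # Negative caching: remember "not found" as well as real values.
                    self._cache[key] = _MISSING if value is None else value
        cached = self._cache[key]
        return None if cached is _MISSING else cached  # type: ignore[return-value]
```

With this shape, many simultaneous requests for the same missing key result in one source call, and repeated requests for a nonexistent key are answered from the cache rather than reaching the source each time.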

At QUANT LAB

Caching is often the highest-leverage performance change we make on the systems we build under SaaS platform development and operate under DevOps engineering. It is frequently the right answer to "the database is slow" before anyone contemplates sharding. But we are deliberate about invalidation, because a cache that serves stale data quietly is worse than no cache at all — and we keep an eye on the security edge cases too, since caching per-user data in a shared layer is a classic way to leak one customer's information to another.

App slower than it should be?

We add the right caching at the right layers — with invalidation done properly — so your app stays fast and correct. Book a 30-minute call.
