The Invalidation Spectrum
Phil Karlton famously said there are only two hard things in Computer Science: cache invalidation and naming things. In distributed systems, keeping cached data consistent with the source of truth is a trade-off between consistency (how fresh every response is) and availability (how much traffic the cache can absorb without going back to the origin).
1. Purge vs. Invalidate
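While often used interchangeably, these terms have distinct semantics in advanced CDN and caching-proxy configurations (such as Fastly or Varnish):
- Purge: Removes the object from the cache immediately. The next request is a miss and must go to the origin. This increases origin load, but the cache never serves the old version again once the purge has reached it.
- Invalidate (or soft purge): Marks an object as stale instead of removing it. Depending on directives such as `stale-while-revalidate`, the cache may serve the stale copy while fetching the new version in the background, or revalidate it with a conditional request (`If-None-Match` or `If-Modified-Since`) to the origin. A Varnish ban is a related but different mechanism: it is a filter expression checked against cached objects, and an object that matches is discarded rather than served stale.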
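The difference between the two operations can be sketched with a toy single-node cache. This is a minimal illustration, not any CDN's API: `TinyCache`, `purge`, `soft_purge`, and `fetch_origin` are hypothetical names, and the refresh that a real cache would run in the background is done inline here to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    body: str
    stale: bool = False  # set by soft_purge(); the body is kept


class TinyCache:
    """Toy in-memory cache (hypothetical, for illustration only)."""

    def __init__(self, fetch_origin: Callable[[str], str]) -> None:
        self.fetch_origin = fetch_origin
        self.store: Dict[str, Entry] = {}
        self.origin_hits = 0

    def _refresh(self, url: str) -> str:
        # Fetch a fresh copy from the origin and store it as non-stale.
        self.origin_hits += 1
        body = self.fetch_origin(url)
        self.store[url] = Entry(body)
        return body

    def get(self, url: str) -> str:
        entry = self.store.get(url)
        if entry is None:
            # Miss (for example after a purge): the client waits for the origin.
            return self._refresh(url)
        if entry.stale:
            # Soft-purged: answer with the stale body and refresh the entry.
            # A real cache would refresh asynchronously; here it is inline.
            stale_body = entry.body
            self._refresh(url)
            return stale_body
        return entry.body

    def purge(self, url: str) -> None:
        """Hard purge: drop the object entirely."""
        self.store.pop(url, None)

    def soft_purge(self, url: str) -> None:
        """Soft purge: keep the object but mark it stale."""
        if url in self.store:
            self.store[url].stale = True
```

After `soft_purge`, the first reader still gets the old body and only later readers see the new one; after `purge`, the very next reader pays the origin round trip but gets the new body.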
2. Versioning and Cache Busting
For static assets, the most effective invalidation strategy is immutable versioning. Instead of trying to purge `app.js`, you deploy `app.a1b2c3.js`.
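This technique, often handled by bundlers (Webpack, Vite), allows you to set a long `Cache-Control: max-age` (e.g., `max-age=31536000`, one year), optionally with `immutable`, because the file contents never change at that specific URL. The HTML that references the asset must itself stay short-lived or be revalidated, so that clients discover the new file name after a deploy.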
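As a rough sketch of what a bundler does, the file name can be derived from a hash of the file contents. The helper names `hashed_name` and `cache_control_for`, the choice of SHA-256, and the 8-character digest length are assumptions made for this example; real bundlers use their own schemes.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def cache_control_for(is_fingerprinted: bool) -> str:
    """Pick a Cache-Control value for a response (illustrative policy)."""
    if is_fingerprinted:
        # The URL changes whenever the content changes, so cache for a year.
        return "public, max-age=31536000, immutable"
    # Unversioned entry points (e.g. index.html) are revalidated on each use.
    return "no-cache"
```

Because the name depends only on the bytes, an unchanged file keeps its URL across deploys and stays cached, while any edit produces a new URL that no cache has seen before.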
3. Surrogate Keys (Cache Tags)
Invalidating dynamic content is trickier. A single database record might appear on a product page, a category listing, a search result, and a homepage recommendation widget. Surrogate Keys (or Cache Tags) allow you to group cached objects by dependency.
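When the origin serves a response, it adds a header listing the keys that the response depends on, for example `Surrogate-Key: product-42 category-7` (the header name is CDN-specific: Fastly uses `Surrogate-Key`, Cloudflare uses `Cache-Tag`).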
Later, if Product 42 is updated, you issue a single API call to the CDN to purge the tag `product-42`. The CDN finds every cached object tagged with that key and invalidates them all in one operation, without you having to enumerate the URLs.
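The bookkeeping behind tag-based purging can be sketched as an index from tag to URLs. This is a simplified model under stated assumptions: `TaggedCache` and its methods are hypothetical, and the header value is assumed to be a space-separated list of keys (the convention Fastly uses for `Surrogate-Key`).

```python
from typing import Dict, List, Set


class TaggedCache:
    """Toy cache that indexes objects by surrogate key (illustrative only)."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}
        self.urls_by_tag: Dict[str, Set[str]] = {}
        self.tags_by_url: Dict[str, Set[str]] = {}

    def store(self, url: str, body: str, surrogate_key_header: str) -> None:
        # Assumed header format: space-separated keys, e.g. "product-42 category-7".
        tags = set(surrogate_key_header.split())
        self.evict(url)  # drop any previous copy and its tag links
        self.objects[url] = body
        self.tags_by_url[url] = tags
        for tag in tags:
            self.urls_by_tag.setdefault(tag, set()).add(url)

    def evict(self, url: str) -> None:
        # Remove one object and unlink it from every tag it carried.
        self.objects.pop(url, None)
        for tag in self.tags_by_url.pop(url, set()):
            urls = self.urls_by_tag.get(tag)
            if urls is not None:
                urls.discard(url)

    def purge_tag(self, tag: str) -> List[str]:
        """Evict every object carrying `tag`; return the affected URLs, sorted."""
        urls = sorted(self.urls_by_tag.pop(tag, set()))
        for url in urls:
            self.evict(url)
        return urls
```

With a product page, a category listing, and a second product page stored, `purge_tag("product-42")` evicts the first two (if both were tagged `product-42`) and leaves the unrelated product page cached.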
4. Cache Warming
Purging a hot key can lead to the Thundering Herd problem (also called a cache stampede), where thousands of concurrent requests miss at once and all hit the origin. Cache warming mitigates this by proactively fetching the new content before (or immediately after) invalidation. Two complementary techniques help as well: `stale-while-revalidate` lets the cache keep serving the stale copy while a single request refreshes it, and request coalescing queues concurrent misses for the same key behind one origin fetch.
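The idea of letting one request update the cache while the others wait, often called request coalescing, can be sketched with threads. This is a minimal single-process model, not how any particular CDN implements it; `CoalescingCache`, `_Flight`, and `fetch_origin` are hypothetical names.

```python
import threading
from typing import Callable, Dict, Optional


class _Flight:
    """State shared between the leader and the requests waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Optional[str] = None
        self.error: Optional[BaseException] = None


class CoalescingCache:
    """Toy cache that sends at most one origin fetch per key at a time."""

    def __init__(self, fetch_origin: Callable[[str], str]) -> None:
        self.fetch_origin = fetch_origin
        self.values: Dict[str, str] = {}
        self.in_flight: Dict[str, _Flight] = {}
        self.lock = threading.Lock()
        self.origin_hits = 0

    def get(self, key: str) -> str:
        with self.lock:
            if key in self.values:
                return self.values[key]          # plain cache hit
            flight = self.in_flight.get(key)
            leader = flight is None
            if leader:
                # First miss for this key: this request will call the origin.
                flight = _Flight()
                self.in_flight[key] = flight
                self.origin_hits += 1
        if not leader:
            flight.done.wait()                   # queue behind the leader
            if flight.error is not None:
                raise flight.error
            return flight.value
        try:
            flight.value = self.fetch_origin(key)
            with self.lock:
                self.values[key] = flight.value
        except BaseException as exc:
            flight.error = exc                   # waiters see the same failure
            raise
        finally:
            with self.lock:
                del self.in_flight[key]
            flight.done.set()                    # wake every queued request
        return flight.value
```

The origin call happens outside the lock, so requests for other keys are not blocked while a slow fetch is in progress. Combining this with a stale copy (serving it to the waiters instead of blocking them) gives the `stale-while-revalidate` variant.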