Building a Lightning-Quick Site with Large-Scale Caching
FFW recently built a large-scale Drupal site that was designed to stream a broadcast event. With numerous schedule pages that were updated each minute, and streams that started, stopped, or rolled over according to set schedules, building a site that felt like it was updating thousands of pages in real time was no small task.
I recently sat down with one of FFW’s Drupal architects to talk about how we approached the challenge. He walked me through the development of a cache invalidation solution powerful and nimble enough to rapidly refresh the site’s content, which was no small undertaking.
The problems with traditional caching
Caching is how a site stores rendered pages so they can be served quickly. The first time a page is requested, the server builds it and saves a copy for itself. On subsequent requests, the server serves that stored copy instead of rebuilding the page, unless enough time has elapsed that the page needs to be regenerated to reflect possible updates.
Traditional caching methods are built around a time-based cache expiration system. In this model, pages are cached for a set amount of time (perhaps one to five minutes), and when that period ends, a new copy of the page is generated.
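In rough terms, a time-based cache looks something like the sketch below. This is a simplified Python illustration; the helper names (render_page, get_page) and the five-minute TTL are ours, not FFW’s actual code.

```python
import time

CACHE_TTL_SECONDS = 300          # e.g. a five-minute expiration window
_cache = {}                      # url -> (rendered_html, cached_at)

def render_page(url):
    """Stand-in for the expensive work of building a page from the database."""
    return f"<html><!-- rendered {url} --></html>"

def get_page(url):
    entry = _cache.get(url)
    if entry is not None:
        html, cached_at = entry
        if time.time() - cached_at < CACHE_TTL_SECONDS:
            return html          # cache hit: serve the stored copy
    html = render_page(url)      # cache miss or expired copy: rebuild the page
    _cache[url] = (html, time.time())
    return html
```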
While this solution works for many Drupal sites, there are several problems with traditional time-based caching on large sites or news sites. First, whenever a cached page expires, the next person to request it has to wait for the page to be rebuilt, which makes the website feel slower overall.
Second, for big sites, time-based caching puts a heavy load on the servers. If a site has 15,000 unique URLs, each on a time-based cache, the server ends up handling roughly 46 refresh requests every second, most of them for pages that haven’t actually changed. (After all, once most pages are published, they’re rarely updated. Think of a site’s “About” page, or an old blog article or news item.)
Lastly, time-based caching makes it hard for sites to break news in a timely way. Big announcements, such as who won an award, or corrections to high-traffic news articles may take up to five minutes to appear under a time-based caching model, and that wait can feel like agony.
Building a better solution
FFW set out to build a solution that would make the site feel as though it was updating in real time while reducing the load on the servers. We designed a cache invalidation feature that we call event-based cache purging.
Here’s how it worked.
First, the FFW team set cache lifetimes extremely high: the shortest cache duration was half an hour, and some pages were cached for up to two days. This alone meant that 99.7% of requests to the Drupal site were served from cache, which reduced the load on the servers and made the site feel lightning fast.
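As a rough illustration, cache lifetimes tiered this way might look something like the following sketch. The page types and the one-hour default are assumptions based only on the ranges mentioned above, not FFW’s actual configuration.

```python
# Illustrative cache lifetimes (in seconds), tiered by how volatile a page is.
CACHE_LIFETIMES = {
    "schedule": 30 * 60,            # most volatile pages: half an hour
    "article": 24 * 60 * 60,        # standard content: one day
    "static": 2 * 24 * 60 * 60,     # rarely changing pages: up to two days
}

def ttl_for(page_type):
    """Return the cache lifetime, in seconds, for a given page type."""
    return CACHE_LIFETIMES.get(page_type, 60 * 60)  # assumed one-hour default
```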
Next, the team built a mechanism that explicitly removed pages from the cache each time content changed. It was no small undertaking: when a page was updated or deleted, it had to be refreshed or removed from the cache, and every other piece of content that referenced it had to be refreshed as well. That meant the caching solution had to maintain millions of records describing which pieces of content referenced which others, so that everything could be purged accordingly.
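Conceptually, event-based purging with relationship tracking works along these lines. This is a hedged sketch: the purge_url and url_for helpers are placeholders standing in for whatever the real cache layer exposes, not FFW’s implementation.

```python
from collections import defaultdict

# content_id -> set of content_ids that reference or embed it
referenced_by = defaultdict(set)

def register_reference(referencing_id, referenced_id):
    """Record that one piece of content displays or links to another."""
    referenced_by[referenced_id].add(referencing_id)

def purge_url(url):
    # Placeholder: in practice this would tell the cache layer (for example,
    # a reverse proxy) to drop its stored copy of the page.
    print(f"purging {url}")

def on_content_updated(content_id, url_for):
    """Purge the updated page plus every page that references it."""
    purge_url(url_for(content_id))
    for dependent_id in referenced_by.get(content_id, set()):
        purge_url(url_for(dependent_id))
```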
Furthermore, since streams could change from one minute to the next, some pages (such as stream pages or schedule pages) had to pull and display new information every minute. To handle this, the FFW team built a process that checked the Drupal database once a minute and updated only the videos whose status or stream had changed. Only the affected pages were refreshed, yet it appeared as though the entire site was updating in real time.
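A once-a-minute check of this kind might be sketched as follows. The query and page-mapping helpers here are placeholders we’ve invented for illustration, not FFW’s actual code.

```python
import time

def fetch_changed_streams(since):
    """Placeholder for a database query returning streams whose status
    or schedule changed after the given timestamp."""
    return []

def pages_for_stream(stream_id):
    """Placeholder mapping a stream to the pages that display it."""
    return [f"/streams/{stream_id}", "/schedule"]

def run_minute_check(purge_url):
    """Every minute, purge only the pages affected by stream changes."""
    last_run = time.time()
    while True:
        time.sleep(60)
        for stream_id in fetch_changed_streams(since=last_run):
            for url in pages_for_stream(stream_id):
                purge_url(url)       # refresh only the affected pages
        last_run = time.time()
```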
All in all, the solution FFW delivered was far more heavily cached than an ordinary Drupal site, while still delivering content to visitors in what felt like real time. Pages that purge themselves when saved, purging of related content, and the once-a-minute stream check combine for an experience that feels seamless to both end users and the editorial team.
To learn how the FFW team can build a similarly fast solution for your site, contact us and we’ll get back to you as soon as possible.