The Global FlyFisher
Simply the Best Place to go for Online Fly Fishing and Fly Tying
http://globalflyfisher.com/blog/a-little-about-caching
A little about caching
A site such as GFF needs some level of caching in order to deliver pages fast without bogging down the server or the database.
Nerd alert: This is NOT about fly fishing or fly tying, but about site development and nerdy stuff.
Drupal is a flexible and intricate system, which can build some really complex pages based on many rules and a lot of logic. The problem with all these rules and all the logic is that the server has to do a lot of calculations and database lookups every time a page is built. A single page can easily require 500-1,000 MySQL queries before the system has gathered all the information needed to properly build the page.
With a site such as this, that is simply crazy. When users request pages at a rate of one or several per second, the bottlenecks become too narrow, and the server starts delivering pages at a rate that's way too slow. And in some shared server environments you will get punished, or at least notified, for using too many resources such as CPU cycles, and you may even see your site shut down for overloading the server. Been there, done that...
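To put rough numbers on the load, here is the back-of-the-envelope arithmetic in Python, using the 500-1,000 queries per page mentioned above. The function names are made up for illustration, and `hit_ratio`, the share of requests answered from a cache, is a hypothetical value rather than a measurement from GFF.

```python
def queries_per_second(pages_per_second: float, queries_per_page: int) -> float:
    """Database queries per second if every page is built from scratch."""
    return pages_per_second * queries_per_page


def queries_with_cache(pages_per_second: float, queries_per_page: int,
                       hit_ratio: float) -> float:
    """Queries per second when a (hypothetical) share of requests is served
    from a finished-page cache and never reaches the database."""
    return pages_per_second * (1.0 - hit_ratio) * queries_per_page


# One page a second at the low end of 500-1,000 queries per page.
print(queries_per_second(1, 500))             # 500
# Three pages a second at the high end.
print(queries_per_second(3, 1_000))           # 3000
# Same traffic, but half the requests answered from a cache (hypothetical).
print(queries_with_cache(3, 1_000, 0.5))      # 1500.0
```

Even modest traffic multiplies into thousands of queries a second, and every request answered from a cache takes its full share of queries off the database.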
Luckily Drupal has several mechanisms to keep this load low. Internally Drupal caches a lot of elements: the large number of queries and calculations needed to create a single node, form or other element are spared by saving the result in a database table. When a node is needed again, the finished node element is read from the database in one swoop instead of being built from the bottom again. The same goes for some menus, forms and blocks. This does save some work, but it only cuts off part of the calculation. Placing blocks, finding related content, building dynamic menus and many other tasks are still done and take their toll on the server.
The most efficient way of caching is at the absolute outer level of the page building process, meaning handling the finished HTML. A lot of pages wind up exactly the same every time they are served, 100% identical, and in that case we might as well serve the already created HTML from a previous visit.
And rather than saving that in the database and reading it from there – adding load to the already busy database – we can simply save the pages as files. Good old fashioned flat HTML files.
The file system on a web server is highly efficient and able to deliver content in a split second. So stashing the HTML in a static file and serving that costs the system almost nothing.
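A minimal sketch of the flat-file idea in Python: the finished HTML for a URL path is written to a file the first time it is built, and later requests are answered by reading that file. The cache directory, the `render_page` function and the path-to-filename mapping are assumptions for illustration, not how any particular Drupal module lays out its files.

```python
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp())  # hypothetical cache directory

builds = 0  # how many times the expensive page build actually ran


def render_page(path: str) -> str:
    """Hypothetical stand-in for the full, expensive Drupal page build."""
    global builds
    builds += 1
    return f"<html><body>Page for {path}</body></html>"


def cache_file(path: str) -> Path:
    # "/blog/a-little-about-caching" -> <CACHE_DIR>/blog/a-little-about-caching.html,
    # and the site root "/" -> <CACHE_DIR>/index.html.
    # No guarding against ".." here; a real cache must sanitise paths.
    name = path.strip("/") or "index"
    return CACHE_DIR / f"{name}.html"


def serve(path: str) -> str:
    f = cache_file(path)
    if f.exists():
        return f.read_text(encoding="utf-8")  # flat file: no PHP, no database
    html = render_page(path)
    f.parent.mkdir(parents=True, exist_ok=True)
    f.write_text(html, encoding="utf-8")
    return html


serve("/blog/a-little-about-caching")
serve("/blog/a-little-about-caching")
print(builds)  # 1
```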
It's even more efficient if you can store the cached pages in memory, and for really busy sites, that's a very good solution. On GFF we don't need that kind of efficiency.
Drupal doesn't have a built-in system to handle caching outside of the CMS. The most common way to add that kind of full-page cache on really heavily loaded sites is Varnish, a so-called HTTP accelerator.
Varnish can cache finished pages for any system, not just Drupal. Since it sits in front of the web server, it doesn't have to know how the pages were built; it simply grabs the finished result before it's sent to the user, and saves it. The next user who requests the same page never even activates Drupal, because Varnish simply sends the finished page from its stash. No database queries are made and no Drupal PHP is executed, which takes a huge load off the server. Varnish is able to serve up to 3,000 pages a second, which is a crazy amount of traffic.
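Conceptually, an HTTP accelerator keeps copies of finished responses and answers repeat requests itself. The Python sketch below shows only that idea, with a time-to-live so stale copies eventually expire. It is not Varnish's implementation or configuration; the `CachingProxy` class, the backend function, the 120-second TTL and the rule of caching only GET requests are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy full-page cache in front of a backend (a stand-in for Drupal)."""

    def __init__(self, backend: Callable[[str], str], ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backend = backend
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)
        self.backend_calls = 0

    def handle(self, method: str, url: str) -> str:
        if method != "GET":
            # Requests that may change data always go to the backend.
            self.backend_calls += 1
            return self.backend(url)
        now = self.clock()
        cached = self.store.get(url)
        if cached is not None and cached[0] > now:
            return cached[1]  # hit: the backend is never touched
        self.backend_calls += 1
        body = self.backend(url)
        self.store[url] = (now + self.ttl, body)
        return body


proxy = CachingProxy(lambda url: f"<html>{url}</html>")
for _ in range(3):
    proxy.handle("GET", "/blog/a-little-about-caching")
print(proxy.backend_calls)  # 1
```

Injecting the clock makes the expiry easy to test; a real accelerator also has to respect cookies, cache headers and purge requests, which is where its complexity comes from.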
Efficient and complex
Varnish – which was originally designed by a Dane by the way, Poul-Henning Kamp – is also a pretty complex system, which requires a good deal of server knowledge and server control to harness. Varnish uses the virtual memory of the server to hold its cached pages, and lets the server decide what to send on to a physical disk.
Since Varnish has to know what to cache and when to serve it, there's a whole configuration language, VCL, to control it. When using Varnish with Drupal you also need a module to handle the communication between the two, so setting up a Varnish system isn't trivial.
Not 3,000 pages per second
Now, GFF doesn't need to be able to serve 3,000 pages per second, and I have no wish to wrestle a system as complex as Varnish on the server.
But the traffic load is high enough to call for some kind of extra cache on top of what Drupal offers out of the box. When the load is high, there might be a page requested per second, and that's enough to be noticeable on the server.
The virtual server that GFF runs on has 4 virtual CPU cores at its disposal, and sometimes two of these are maxed out and the third is on its way. The server itself copes, but page speed drops and the user experience suffers.
Boost to the rescue
Mike Carper is a Drupal developer who, together with a number of other contributors, has made and refined a Drupal module called Boost.
Boost does what Varnish does, but on a less complex scale and within the Apache and Drupal setup already on the web server. It plugs into Drupal, saves static HTML files of the finished pages, and lets Apache serve those files directly, bypassing Drupal, PHP and the database for most users.
It's not super simple, but certainly simpler than Varnish.
I installed Boost back when we were hosted by Hostgator, who would constantly nag me because our site used too much processing power. Now we're on Linode, and the base server performance is MUCH better, but Boost still makes a huge difference compared to the uncached site.
The load on the CPUs is about a quarter or less of what it is without Boost, and pages are obviously much faster.
It's pretty easy to install Boost. Copy over the module and activate it. Set the basic parameters and fire it up. It will then start building static pages from the Drupal page requests.
In order to serve these files rather than having Drupal regenerate them, you need to add some rewrite rules to your .htaccess file. They tell Apache to look for a static file first rather than invoking Drupal for every page.
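The rewrite rules boil down to a short decision Apache makes before PHP is ever started. The Python sketch below mirrors that decision. The `cache/normal/<host>/<path>_<query>.html` layout, the `logged_in` flag and the function name are assumptions for illustration, so check the rules your own Boost installation generates rather than copying these.

```python
import tempfile
from pathlib import Path
from typing import Optional


def static_file_for(doc_root: Path, method: str, host: str, path: str,
                    query: str, logged_in: bool) -> Optional[Path]:
    """Return the static file to send, or None to hand the request to Drupal.

    Hypothetical layout: <doc_root>/cache/normal/<host>/<path>_<query>.html
    """
    if method not in ("GET", "HEAD"):
        return None  # form posts and the like always go to Drupal
    if logged_in:
        return None  # logged-in users get freshly built pages
    candidate = doc_root / "cache" / "normal" / host / f"{path.lstrip('/')}_{query}.html"
    if candidate.is_file() and candidate.stat().st_size > 0:
        return candidate  # serve the flat file; PHP and MySQL are never involved
    return None  # no cached copy yet: Drupal builds the page


# Usage: fake a document root holding one cached page.
root = Path(tempfile.mkdtemp())
cached = root / "cache" / "normal" / "example.com" / "blog" / "post_.html"
cached.parent.mkdir(parents=True)
cached.write_text("<html>cached</html>", encoding="utf-8")
print(static_file_for(root, "GET", "example.com", "/blog/post", "", False) == cached)  # True
print(static_file_for(root, "GET", "example.com", "/blog/post", "", True))  # None
```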
Boosted pages are only served to anonymous users, and certain pages are exempt from caching out of the box. Logged-in users always get pages built by Drupal as usual, and pages that need to be updated on every visit are simply not cached.
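On the other side of the process, the module has to decide which freshly built pages are worth saving at all. Here is a minimal sketch of such a policy in Python. The exempt path patterns and the rule of only saving successful (HTTP 200) responses are assumptions for illustration, not Boost's actual defaults.

```python
from fnmatch import fnmatchcase

# Hypothetical exemptions: pages that must be rebuilt on every visit.
EXEMPT_PATTERNS = ["/user", "/user/*", "/admin", "/admin/*", "/search/*"]


def should_cache(path: str, logged_in: bool, status: int) -> bool:
    """Decide whether a page Drupal just built should be saved as a static file."""
    if logged_in:
        return False  # logged-in users always get pages built by Drupal
    if status != 200:
        return False  # don't freeze error pages or redirects into files
    return not any(fnmatchcase(path, pattern) for pattern in EXEMPT_PATTERNS)


print(should_cache("/blog/a-little-about-caching", False, 200))  # True
print(should_cache("/user/login", False, 200))                   # False
```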
The result is an obvious improvement in speed, and most pages are loaded in a second or two when you visit them the first time, and shown almost instantly when you return.
On the server side the load on resources is smaller, making the site run more smoothly for logged-in users, and for me, the admin, in particular.