Blargh. That didn’t exactly work out.
A few days back, DreamHost sent me the email they are sending so many others, saying that I’m an abusive bastard who needs to quit hogging so much of the CPU on my shared server. My hogging appears to be the result of using the one-click installer they wanted me to use to install WordPress, causing me to quite often use a quarter- to a half-second of CPU to run PHP for every requested page. Do that 16,000-some times a day and I use up three or four thousand CPU seconds, and if you only allow 24,000 CPU seconds per day for lusers, you can’t pack very many on a server.
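Spelled out, the arithmetic looks like this (using the 0.244-second pre-WP-Cache average measured below; the quarter-to-half-second range is the rough version of the same number):

```python
# Back-of-the-envelope numbers from the post, nothing more precise.
requests_per_day = 16_000      # PHP-rendered pages per day
avg_cpu_seconds = 0.244        # CPU seconds per PHP instance, pre-WP-Cache
daily_cpu = requests_per_day * avg_cpu_seconds

print(round(daily_cpu))        # 3904 -- "three or four thousand CPU seconds"

daily_budget = 24_000          # CPU seconds allowed per user per day
print(round(daily_cpu / daily_budget, 2))  # 0.16 -- one blog eats ~16% of a user's budget
```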
My first solution, installing WP-Cache 2, seems to have done some good, taking me from the 0.244 CPU seconds per PHP instance I averaged before I knew I was a sinner down to an average of 0.138 to 0.192, depending. They use a reporting tool that, well, apparently isn’t exactly suited to the task at hand (I’d call it a piece of crap, except that it’s something Hixie wrote back in 2002, so I assume it’s just intended to do something other than what they need), so if you are sinning with PHP running as a CGI (again, something they very strongly recommend) you get everything lumped into one “php5.cgi” category (despite the fact that for Gregarius I run PHP 4 as an Apache module). Partway through day 2, I tried to follow their instructions for getting more detailed results by not running PHP as a CGI, despite the fact that their instructions appear to actually run it as a CGI anyway, and that their Apache module is compiled with such silly restrictions that it won’t actually run WordPress. With that split, php5.cgi had a 0.138 second average and php had a 0.192 second average; then, after a full day of running just their CGI-pretending-not-to-be, php had a 0.191 second average, so boo to PHP 4 and I’m back to PHP 5.
Still, I’m an evildoer:
customers “should use less than 30-40 CPU minutes per day,” and dividing by 60 (because they certainly couldn’t report and mandate in the same unit), in the most recent day I used 52.8. So I started looking for big wins. The most obvious one would be to ban search engine crawlers: between googlebot, msnbot, Yahoo!’s Slurp, and whatever the hell Paul Allen is up to with Everest-Vulcan, I could cut out more than 3000 hits per day, in exchange for losing 56 search engine visitors interested in such things as “ho [sic] do I turn off smartfilter,” but that seems a little extreme. So I started looking at by far my biggest request: the feed.
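Had banning the crawlers not seemed so extreme, the conventional way to do it would be a robots.txt along these lines (a sketch covering the three whose User-agent tokens I know; I won’t guess at what Everest-Vulcan calls itself):

```
# Hypothetical robots.txt: turn away the big crawlers named above.
User-agent: Googlebot
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: Slurp
Disallow: /
```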
Oddly, despite the presence of a bunch of ETag and Last-Modified handling code in WordPress, I seem to be returning nothing but 200 OK, not a 304 Not Modified in the bunch. Not really a problem, since DreamHost gives me more bandwidth than I can possibly use, and then increases it every time I turn around, but the most obvious way to solve it, serving a static file and letting Apache handle conditional GETs, would also cut out a lot of PHP use.

That’s when I started foolishly button-pushing: a little WordPress plugin that hooked all the post-changing functions and deleted the static file, some mod_rewrite funky caching rules to serve the static file if it was present, or call wp-atom.php if it wasn’t, and a little hacking in wp-atom.php to buffer output and dump it to the static file, and I thought I was all set. Then I checked my aggregator, and saw that I’d just syndicated everything currently in my whole feed to Planet Mozilla, since I was both serving the static file for more than I wanted, and also saving various things like comment feeds and individual entry feeds as the static file for the main feed. Oopsie. A bit more traffic, as people in such odd places as DebianLinux.net’s Free Software Planet wonder about the source of this big wad of posts, isn’t exactly going to cut my resource use.
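The mod_rewrite half of that arrangement would look something like this (a sketch with made-up paths and patterns, not my actual rules):

```apache
RewriteEngine On
# If a pre-built static copy of the feed exists, serve it and let Apache
# handle conditional GETs; otherwise fall through to wp-atom.php.
RewriteCond %{DOCUMENT_ROOT}/feed-static/atom.xml -f
RewriteRule ^feed/atom/?$ /feed-static/atom.xml [L,T=application/atom+xml]
RewriteRule ^feed/atom/?$ /wp-atom.php [L]
```

The bug described above is visible in the sketch, too: unless the pattern is strict about which URL it matches, comment feeds and per-entry feeds will happily be answered with, or saved as, the main feed’s static file.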
So, I did the un-WordPressish thing, and put the burden on me: instead of the plugin deleting the file and you having to wait for the regeneration, when I save a post I have to wait (a fraction of a second) while the feed regenerates, from a separate file that won’t be building other random feeds. For some reason I don’t understand, which maybe has to do with all the redirecting and mod_rewriting in the background, the access log still reports sending a 200, but with “-” for the bytes sent, which I assume means an approximate 304. I hope.
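For reference, the conditional-GET behavior Apache supplies for free on a static file works roughly like this (a minimal sketch, not Apache’s or WordPress’s actual code):

```python
def respond(etag, last_modified, if_none_match=None, if_modified_since=None):
    """Return (status, body): 304 with no body when the client's copy is current."""
    if if_none_match is not None:
        if if_none_match == etag:
            return 304, b""          # validator matches: headers only, no body
    elif if_modified_since is not None and if_modified_since >= last_modified:
        return 304, b""              # client's copy is at least as new
    return 200, b"<feed>...</feed>"  # stale or unconditional: send everything

print(respond("v1", 100))                         # (200, b'<feed>...</feed>')
print(respond("v1", 100, if_none_match="v1"))     # (304, b'')
print(respond("v1", 100, if_modified_since=150))  # (304, b'')
```

The point of the static-file detour is exactly that last pair of lines: a 304 costs Apache a header comparison, while WordPress was paying a full PHP startup to send the same bytes again.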
While looking closely at accesses, I noticed a couple of other big wins from vastly too quickly refreshing feed consumers: someone in Italy with the oft-reviled UA string Java/1.4.2_08 who was fetching my feed (very rarely updated more than once a day, usually more like once a week or month) every minute, and two users of JetBrains’ Omea Reader (which I probably once knew was where Dmitry Jemerov went when he stopped developing the grand old aggregator Syndirella, but had since forgotten) who, from the random 1.5 to 4.5 minute refreshes, appear to have set their refresh rate to an interval shorter than the time their program needed to get through their whole set of feeds, so that they were literally refreshing as fast as possible. If you are a Brazilian or Hungarian user of Omea Reader, and you are willing to turn your refresh rate down to something reasonable (an hour pleases me most, for historical reasons, but even 30 minutes is quite tolerable), feel free to contact me, and I’ll stop turning you away with a 503 Service Unavailable.
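The turning-away itself doesn’t need to be fancy; a hypothetical sketch of the sort of logic involved, keyed on whatever identifies the client:

```python
import time

MIN_INTERVAL = 30 * 60   # half an hour: "even 30 minutes is quite tolerable"
last_seen = {}

def admit(client, now=None):
    """Answer 200 if this client has waited long enough since last time, else 503."""
    now = time.time() if now is None else now
    previous = last_seen.get(client)
    last_seen[client] = now
    if previous is not None and now - previous < MIN_INTERVAL:
        return 503   # Service Unavailable: come back later
    return 200

print(admit("omea-reader", now=0))     # 200 -- first visit
print(admit("omea-reader", now=90))    # 503 -- back 90 seconds later
print(admit("omea-reader", now=3600))  # 200 -- an hour later is fine
```

A real deployment would more likely do this in mod_rewrite or a .htaccess deny keyed on the offending IP or UA, but the decision is the same: too soon, 503.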
The strange part about the Java user in Italy is that, while he was fetching the WordPress-produced feed, with its missing or non-working conditional-GET information, once a minute, and continued trying to fetch it once a minute while I served him 503 errors for a day or so, as soon as I let him back in to get the static feed he switched to a reasonable enough 30-minute refresh schedule. Whether that’s in the aggregator code or an underlying HTTP library, defaulting to a one-minute refresh in the absence of any other information strikes me as a really awful idea.
So, what happens when I push this button that says “Publish” — will it notice that the feed contents have changed, and change out the static file, or will I just have broken something else? (Answer: nothing will happen, because I’m incompetent. Once more.)