Half-baked, and a little fried
I’ve been having a hard time making up my mind about whether I think baked or fried is better for a weblog (a baked page is a static file, like Movable Type/Blogger/Radio serve up; a fried page is built from the database (or text files, or what have you) when it is requested, like b2 or blosxom). Think about an individual entry archive page that gets heavily linked, or the main page when I forget to update for several days: it isn’t changing, so rebuilding it for every page view seems like a waste of time and trouble. On the other hand, making a tiny change in a template and then having to rebuild 490-some static files makes a fried page look pretty good.
While browsing through Rasmus’s Tips and Tricks slides from his talk at PHPCon, I think I found my answer (on page 25 of the accursed PDF file), in what he calls “funky caching.” It works like this: you redirect 404 errors to a PHP script, which looks at the requested URL, decides whether it should actually exist, and if it should, builds the page from the database, saves it to the filesystem, and then returns it to whoever requested it. The next time that URL is requested, the static file will be served.
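As a rough sketch of what that handler does (in Python rather than the real PHP, with a toy dictionary standing in for the weblog database and a made-up document root):

```python
import os

# Toy stand-in for the weblog database: entry slug -> body text.
ENTRIES = {"archives/000123": "Half-baked, and a little fried"}

DOCROOT = "/tmp/funky-docroot"  # hypothetical document root


def handle_404(request_path):
    """Funky-caching 404 handler: decide whether the URL should exist;
    if so, fry the page, bake it to disk, and return it with a 200."""
    slug = request_path.strip("/").removesuffix(".html")
    if slug not in ENTRIES:
        return 404, "Not Found"  # genuinely missing: a real 404
    html = f"<html><body><p>{ENTRIES[slug]}</p></body></html>"
    target = os.path.join(DOCROOT, slug + ".html")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:  # bake it: the next hit is static
        f.write(html)
    return 200, html
```

The first request for a page runs the script; every request after that, the webserver just hands out the static file without the script (or the database) ever being touched.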
What would that mean for a weblog? If a post gets slashdotted or scripting.commed, you’re just serving up static files, not having to run to your database every time a request comes in. The only time you would need to rebuild an entry page would be when someone adds a comment, or when you change the entry or the template. How do you rebuild after a template change? Unlike the Movable Type system of waiting while hundreds of static pages are built, hoping that your cheap shared host’s reaper doesn’t shut the script down, you just delete all the files. If you’re doing a redesign, making dozens of changes, you only need to rebuild the pages you are actually checking, by deleting them and then viewing them once in your browser. Everything else can wait until someone wants the page (and chances are it will be a search engine robot, which can just wait patiently while its pages are fried). Whether or not it’s really a good, efficient system, I love the idea of rebuilding the site by just deleting all the files and then walking away.
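The delete-to-rebuild half is even simpler; as a sketch (same made-up document root as the idea above, and a hypothetical helper name), invalidation is just unlinking the baked files so the 404 handler fries them again on their next request:

```python
import glob
import os

DOCROOT = "/tmp/funky-docroot"  # hypothetical document root


def invalidate(pattern="**/*.html"):
    """Template changed, or a comment came in: don't rebuild anything,
    just delete the baked files matching the pattern. Each one gets
    re-fried the next time somebody asks for it."""
    for path in glob.glob(os.path.join(DOCROOT, pattern), recursive=True):
        os.remove(path)

# After a comment on one entry, drop just that page:
#   invalidate("archives/000123.html")
# After a redesign, drop everything and walk away:
#   invalidate()
```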
For PHP applications, jpcache works pretty well for caching output. You can set it to always use the cache file if available. In addition, you get ETag and gzip support.
Phil, sounds like On-the-fly Content-Regeneration:
http://www.engelschall.com/pw/apache/rewriteguide/#ToC33
Certainly along the same lines, although at least in the example there, he’s talking about a single known file with a single CGI that regenerates it; you’d need to rewrite that to capture the requested filename and pass it along, so one CGI could regenerate any page.
The most interesting part of that to me is the part about “(or a cronjob) removes the static contents.” That would be a slick way to do an online RSS aggregator: every hour a cronjob deletes the static files, which only get regenerated the next time someone wants to view them. If you keep search engine bots out, then you wouldn’t need to keep hitting feeds when nobody’s looking at the results. Nice!
so, would it be possible to get movable type to do something like that, or do i have to start over with something else? i’d do just about anything to not have to rebuild, considering how often i redesign.
It would be possible for MT to do it, but I think Ben would have to do it, rather than someone hacking it in. Going from fried to half-baked is pretty easy (see Wari’s half-baked pyblosxom TrackBacked above), since you already have a script that builds any page on demand, and you just have to also have it save the result to the filesystem. But going from baked to half-baked means rethinking and rewriting the way everything works: you would have to redo the code for everything you do from MT’s interface, so that saving an entry or a template deletes files rather than building them; rewrite the comments and TrackBack scripts to delete files; and write scripts that would rebuild every type of page on demand. And once you did all that, I suspect (without being willing to actually go look) that you would end up with something that the MT license wouldn’t let you distribute.
Search engines tend not to cache 404 pages. Would this technique render one’s website uncrawled by spiders? What effect would this have?
-bill
While I admit I haven’t actually tried it to make sure, I would expect that any server which lets you redirect 404s to a script leaves it up to the script to set the result code. So you wouldn’t actually be returning a 404 for pages that you are funkily caching: if it’s a page that should exist, you create it and return a 200, and you only return a 404 for something that really doesn’t exist.
Okay, I’ve experimented a fair bit with funky caching and rewrite rules. The one thing that I don’t seem to be able to do is get a set of rules that does *both* funky caching *AND* fixes the trailing slash problem. In other words: (1) for url/ or url, serve up index.html; (2) if any file, including index.html, is missing, run the script.
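One way to sketch that combination in httpd.conf (the directives are standard mod_rewrite and core ones, but `fry.php` is a hypothetical name for the frying script, and this is a sketch rather than a tested answer; the interaction between the two halves is exactly the hard part):

```apache
DirectoryIndex index.html
RewriteEngine On

# Add the trailing slash to bare directory requests
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [R=301,L]

# Anything that doesn't exist as a file or directory goes to the fry script
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /fry.php?path=$1 [L,QSA]

# Caveat: a missing index.html inside a directory that *does* exist slips
# past the !-d test, so it would need its own condition, something like
# RewriteCond %{REQUEST_FILENAME}index.html !-f before a matching rule.
```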
http://www.zope.org/Members/andym/FSCacheManager
Andy McKay did this a few years ago. This 404 hackery is what Vignette uses (or used to use?).
as days pass by
Dynamic vs Static
“Funky caching” could be useful for a static publishing system as well. Weblog archives can consume a great deal of space, yet those pages are rarely needed. Why not GZip entries from previous months and use a 404 handler to extract pages as-needed?
Half Baked and a Mostly Fried Pyblosxom
Can pyblosxom bake pages? Why do you want to bake pages anyway? Got it to work!
Baking, funky caching, and tarballs for weblog cryo storage
So, while I was catching up on T Bryce Yehl’s blog since missing his transition to MovableType, I caught an interesting blurb he wrote with regards to Phil Ringnalda’s ponderings on FriedPages and BakedPages in weblogs:”Funky caching” could be useful f…
Blosxom 0+6i BETA 1
Rael Dornfest: It seems to me this could be mixed with a little funky caching. After all, why generate citations hourly when they can be generated on demand and cached?
The return of funky caching
Freezing baked content extends shelf life.
Funky Conundrum
I’ve been playing with Vellum a bit today. One of the features of Vellum is funky caching where if you
LRU caching weblogs
CornerHost provides servers more than capable of keeping up with the load of dynamically generating every page on this site on every request, with more than enough cycles to spare for other bloggers on the same machine. Despite this, caching provides b
Rewrite/Error Handler
Gary Burd writes, “The mod_rewrite approach is better than the ErrorDocument and DirectoryIndex approach because the mod_rewrite approach does not fill the access and error logs with spurious 404s.” Almost. If you’re writing portable code, then it’s a b…
Web application caching
Blogs, as most current web applications, need to address the server-side caching issue in order to reduce webserver load.
It looks like most people are quite happy with caching static versions of their pages for some defined amount of time. This metho…
Distractions
My weblog goes future-proof and cruft-free while migrating to a PHP + RSS + XSLT back-end.
Vellum – funky Blogware?
Vellum is yet another new piece of blogware. Yet another? The concept makes you sit up and take notice. (via Schockwellenreiter)…
News of the recent weeks and bloggage
Personal stuff soon, but interesting articles first: A speech given to MSFT research about the evils of DRM Senator Hatch (who used to understand stuff) has introduced the INDUCE Act, which will criminalize the act of inducing another to commit…
Referers, downtime and caching
As you may have noticed the site has been down for a few weeks now. While it’s an eternity on the internet it flies by in realtime, especially when you’ve got so many other balls in the air. To make a long story shorter…
A few weeks ago I noticed …