MT’s completely search-friendly dynamic URLs

The first few times I saw people worrying that the new dynamic PHP-based publishing in Movable Type 3.1x would mess up their permalinks and make Google stop loving them, I didn’t worry too much about it. But today I saw Mr. Dooce saying:

Why not use the dynamic page generation in MovableType 3.1?? Have you seen the instructions for that?? You have to do a lot of monkeying around and what happens to all those Google archived links in search results and links from others? Sure, we could try to generate a redirect .htaccess file, but again, pain pain pain. I think one of the biggest strengths of MovableType is that it generates a living breathing HTML page for the archives. It’s also one of the reasons that so many blogs show up so high in search results.

So maybe there’s a need for a little discussion of how URLs work with dynamic publishing. Please don’t think I’m talking down to you in this entry (or to Jon, who I’m sure only needed the prod of being told that the URLs don’t change, and probably won’t be reading this at all); it’s just that I’ve noticed how your eyes glaze over when you expect that someone’s going to geek beyond you. Fear not, it’s pretty simple.

Here’s an example of a URL from before dynamic publishing:

http://www.dooce.com/archives/nubbin/09_10_2004.html

And here’s an example of a URL with dynamic publishing:

http://www.dooce.com/archives/nubbin/09_10_2004.html

You’ve got to admit, that doesn’t look like it’s going to confuse Google, does it? Every single character is exactly the same, in the exact same order. How’s that happen?

When you ask Apache, the web server program most of you use (we’ll get to non-Apache and less-able Apache in a bit), for that URL, it doesn’t go straight to that directory on the disk and serve it up. Unless your server is run by someone cruel, who doesn’t allow it, first it will look for a file in / named .htaccess, and will do things to the URL based on what’s in there. Then it does the same in /archives/ and then in /archives/nubbin/, and only then, if you haven’t changed the URL around with the instructions in any of those .htaccess files, will it look for a file named 09_10_2004.html and send it off to the browser that asked for it.
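To make that walk concrete, here is a minimal Python sketch of the order in which the .htaccess files get consulted. It assumes the URL path maps straight onto directories under the document root, which glosses over aliases and server configuration; the function name htaccess_candidates is made up for illustration.

```python
from pathlib import PurePosixPath

def htaccess_candidates(url_path):
    """List the .htaccess locations for a URL path, outermost directory first."""
    directory = PurePosixPath(url_path).parent
    # .parents runs from the nearest directory out to "/", so reverse it.
    dirs = list(directory.parents)[::-1] + [directory]
    return [str(d / ".htaccess") for d in dirs]

for location in htaccess_candidates("/archives/nubbin/09_10_2004.html"):
    print(location)
# /.htaccess
# /archives/.htaccess
# /archives/nubbin/.htaccess
```

Only after all three have had their say does the server go looking for 09_10_2004.html itself.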

With .htaccess and an Apache extension called mod_rewrite, there’s absolutely no need for anyone at the other end to know just what you are doing when you send them /archives/nubbin/09_10_2004.html. For example, when you request my /index.php file, you actually get it from another directory, that you don’t even need to know the name of: it just looks like it’s in /.

MT’s dynamic publishing just takes that a step farther: for everything that you tell it you want to be dynamic, it moves the actual file aside (by renaming it to 09_10_2004.html.static), and then the .htaccess file and Apache have this conversation:

Apache: I want /archives/nubbin/09_10_2004.html
.htaccess: Is it a real file, on the disk?
Apache: (Looks) Nope, not there.
.htaccess: Is that the name of a directory?
Apache: (Peers at 09_10_2004.html, shakes head) D’oh. No.
.htaccess: Go ask this file named mtview.php for it – tell him what you want in the REQUEST_URI variable, and he’ll hook you up. And don’t bother telling the person who asked you that you didn’t just find the file.
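If you would rather read that conversation as code, here is a rough Python sketch of the same decision. It is not Apache's logic or MT's actual rewrite rules, just the three questions in order. The resolve function and its doc_root and handler parameters are names invented for this example.

```python
import os
import tempfile

def resolve(doc_root, url_path, handler="/mtview.php"):
    """Decide what answers a request: a real file, a real directory, or the handler."""
    on_disk = os.path.join(doc_root, url_path.lstrip("/"))
    if os.path.isfile(on_disk):   # "Is it a real file, on the disk?"
        return url_path
    if os.path.isdir(on_disk):    # "Is that the name of a directory?"
        return url_path
    return handler                # hand off quietly; the visitor still sees url_path

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "archives", "nubbin"))
    # Only the renamed .static copy is on disk, as after switching to dynamic.
    open(os.path.join(root, "archives", "nubbin", "09_10_2004.html.static"), "w").close()
    print(resolve(root, "/archives/nubbin/09_10_2004.html"))  # /mtview.php
    print(resolve(root, "/archives/nubbin/"))                 # /archives/nubbin/
```

The point to notice is that the return value is internal: nothing in it is ever shown to the browser, which keeps seeing the URL it asked for.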

Apache tells mtview.php what it was originally looking for, mtview.php looks in a database table called mt_fileinfo to see what would be in that file if it were really there, and builds the page from scratch. Apache grabs that, sends it to your browser (or Googlebot, or anything else that asks), and doesn’t let on in any way that it didn’t find the real file. (Well, except for an X-Powered-By: PHP/4.3.6 header that nobody will notice.)

If you don’t get to use mod_rewrite, but you can have custom error pages (either through a .htaccess file or through something like cPanel), you can probably still use dynamic publishing, though you may wind up with a whole lot of 404 reports in your stats: just tell your server to use mtview.php for your custom error page, and it will try to create whatever file someone asks for, only returning a 404 if it’s not a URL that it knows about. And that’s also how you do it if you’re on a Windows server running IIS instead of Apache: tell it to use mtview.php as a custom error page.
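The error-page route boils down to one decision, sketched here in Python rather than the PHP that mtview.php is actually written in. KNOWN_PAGES and error_page are hypothetical stand-ins: the first for whatever the publishing system knows how to build, the second for the script the server calls once it has already failed to find a file.

```python
# Hypothetical stand-in for the URLs the publishing system knows about.
KNOWN_PAGES = {
    "/archives/nubbin/09_10_2004.html": "<html>built on request</html>",
}

def error_page(requested_url):
    """Return (status, body) for a request the server could not find on disk."""
    body = KNOWN_PAGES.get(requested_url)
    if body is not None:
        return 200, body        # a known URL: answer as if the file existed
    return 404, "Not Found"     # genuinely unknown: a real 404

print(error_page("/archives/nubbin/09_10_2004.html")[0])  # 200
print(error_page("/no/such/page.html")[0])                # 404
```

The extra 404 reports in your stats come from the step before this one: the server has already recorded the miss by the time the error page gets its chance to produce the page anyway.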

Either way, there’s no reason for your permalinks to change, because they don’t really exist any more: once it’s working, you can delete all the old whatever.html.static files, and nothing changes. The URL is just a handy label that mtview.php uses to look up a row in the database that tells it things like “that’s a Monthly page, build it with template number 15, starting with 20040901” or “that’s an Individual entry, entry number 237, build it with template number 17.”
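Here is a small Python sketch of the "URL as a label" idea, using an in-memory SQLite table. The table layout, column names, and the two example URLs are assumptions made for illustration, not the real mt_fileinfo schema; only the template numbers, entry number, and start date come from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative columns only; the real mt_fileinfo table may differ.
conn.execute(
    "CREATE TABLE fileinfo (url TEXT PRIMARY KEY, archive_type TEXT, "
    "template_id INTEGER, entry_id INTEGER, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO fileinfo VALUES (?, ?, ?, ?, ?)",
    [
        ("/archives/2004_09.html", "Monthly", 15, None, "20040901"),
        ("/archives/000237.html", "Individual", 17, 237, None),
    ],
)

def describe(url):
    """Look up a URL and say how the page would be built, or None if unknown."""
    row = conn.execute(
        "SELECT archive_type, template_id, entry_id, start_date "
        "FROM fileinfo WHERE url = ?",
        (url,),
    ).fetchone()
    if row is None:
        return None
    archive_type, template_id, entry_id, start_date = row
    if archive_type == "Individual":
        return f"Individual entry {entry_id}, template {template_id}"
    return f"{archive_type} page, template {template_id}, starting {start_date}"

print(describe("/archives/2004_09.html"))  # Monthly page, template 15, starting 20040901
print(describe("/archives/000237.html"))   # Individual entry 237, template 17
```

Nothing in the lookup cares whether a file by that name ever existed on disk, which is why deleting the .static copies changes nothing.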

There are a few steps and requirements, and it’s not going to be right for everyone, but one thing you don’t have to worry about at all is your URLs.

16 Comments

Comment by Adam Kalsey
2004-09-10 22:11:47

I realize that you’re trying to show in a conversational way how all this tech wizardry happens, but your conversation between Apache and .htaccess isn’t exactly how it happens.

The file can be there, but when Apache goes to grab it off the disk, it first checks with .htaccess to see if there are any special instructions it should follow. .htaccess says, absolutely! You should ignore that file that’s on the disk and instead ask mtview.php for it, following all the other stuff about the REQUEST_URI and keeping it a big secret from the person on the other end.

Comment by Phil Ringnalda
2004-09-10 22:31:50